From moilanen at austin.ibm.com Tue Nov 1 08:45:40 2005
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Mon, 31 Oct 2005 15:45:40 -0600
Subject: [PATCH 2/2] Export Physical IO base address
In-Reply-To: <1130540087.29054.128.camel@gaston>
References: <20051028150035.3d1da846.moilanen@austin.ibm.com>
	<20051028150804.73b5cedb.moilanen@austin.ibm.com>
	<1130540087.29054.128.camel@gaston>
Message-ID: <20051031154540.195b0de0.moilanen@austin.ibm.com>

On Sat, 29 Oct 2005 08:54:47 +1000
Benjamin Herrenschmidt wrote:

> On Fri, 2005-10-28 at 15:08 -0500, Jake Moilanen wrote:
> > This patch exports the physical IO base address so drivers can pick it
> > up when using addresses from the device-tree.
>
> Why ? What is your driver exactly trying to do ?

TPM needs to get the base address for IO as an offset into IO space.

This base physical address is stored in the reg property in the
device-node.

To calculate the offset, need to do: TPM_base_phy_addr - io_base_phys.

Thus the need to export this address.

Jake

From arndb at de.ibm.com Tue Nov 1 12:08:38 2005
From: arndb at de.ibm.com (Arnd Bergmann)
Date: Mon, 31 Oct 2005 20:08:38 -0500
Subject: [patch 2/5] powerpc: create a new arch/powerpc/platforms/cell/smp.c
References: <20051101010836.771791000@localhost>
Message-ID: <20051101011133.300238000@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: cell-smp.diff
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051031/03409828/attachment.txt

From arndb at de.ibm.com Tue Nov 1 12:08:37 2005
From: arndb at de.ibm.com (Arnd Bergmann)
Date: Mon, 31 Oct 2005 20:08:37 -0500
Subject: [patch 1/5] powerpc: Rename BPA to Cell
References: <20051101010836.771791000@localhost>
Message-ID: <20051101011133.134984000@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: cell-kconfig.diff
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051031/751fab3b/attachment.txt

From arndb at de.ibm.com Tue Nov 1 12:08:36 2005
From: arndb at de.ibm.com (Arnd Bergmann)
Date: Mon, 31 Oct 2005 20:08:36 -0500
Subject: [patch 0/5] Move Cell stuff to arch/powerpc
Message-ID: <20051101010836.771791000@localhost>

As promised, here is my new patch set moving all Cell stuff over
to arch/powerpc. Please apply.

Arnd <><

From arndb at de.ibm.com Tue Nov 1 12:08:40 2005
From: arndb at de.ibm.com (Arnd Bergmann)
Date: Mon, 31 Oct 2005 20:08:40 -0500
Subject: [patch 4/5] powerpc: move mmio_nvram.c over to arch/powerpc
References: <20051101010836.771791000@localhost>
Message-ID: <20051101011133.623411000@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: mmio-nvram.diff
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051031/f47181fa/attachment.txt

From arndb at de.ibm.com Tue Nov 1 12:08:39 2005
From: arndb at de.ibm.com (Arnd Bergmann)
Date: Mon, 31 Oct 2005 20:08:39 -0500
Subject: [patch 3/5] powerpc: move rtas_fw.c out of platforms/pseries
References: <20051101010836.771791000@localhost>
Message-ID: <20051101011133.463223000@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: rtas-flash.diff
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051031/5894915b/attachment.txt

From arndb at de.ibm.com Tue Nov 1 12:08:41 2005
From: arndb at de.ibm.com (Arnd Bergmann)
Date: Mon, 31 Oct 2005 20:08:41 -0500
Subject: [patch 5/5] powerpc: move arch/ppc64/kernel/bpa* to arch/powerpc/platforms/cell
References: <20051101010836.771791000@localhost>
Message-ID: <20051101011133.788778000@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: cell-platform.diff
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051031/13827568/attachment.txt

From michael at ellerman.id.au Tue Nov 1 10:26:55 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Tue, 1 Nov 2005 10:26:55 +1100
Subject: [patch 2/5] powerpc: create a new arch/powerpc/platforms/cell/smp.c
In-Reply-To: <20051101011133.300238000@localhost>
References: <20051101010836.771791000@localhost>
	<20051101011133.300238000@localhost>
Message-ID: <200511011026.59266.michael@ellerman.id.au>

On Tue, 1 Nov 2005 12:08, Arnd Bergmann wrote:
> During the conversion to the merge tree, the Cell specific
> SMP initialization was removed from the pSeries code.
>
> This creates a new Cell specific SMP implementation file.
>
> Signed-off-by: Arnd Bergmann
>
> ---
>
>  arch/powerpc/platforms/Makefile      |    1
>  arch/powerpc/platforms/cell/Makefile |    1
>  arch/powerpc/platforms/cell/smp.c    |  230 ++++++++++++++++++++
>  include/asm-ppc64/smp.h              |    1
>  4 files changed, 233 insertions(+)

A lot of your smp routines are identical to the pSeries versions. Wouldn't it
be preferable to only have one implementation?

cheers

-- 
Michael Ellerman
IBM OzLabs

email: michael:ellerman.id.au
inmsg: mpe:jabber.org
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051101/0d649a53/attachment.pgp

From benh at kernel.crashing.org Tue Nov 1 10:27:23 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 01 Nov 2005 10:27:23 +1100
Subject: [PATCH 2/2] Export Physical IO base address
In-Reply-To: <20051031154540.195b0de0.moilanen@austin.ibm.com>
References: <20051028150035.3d1da846.moilanen@austin.ibm.com>
	<20051028150804.73b5cedb.moilanen@austin.ibm.com>
	<1130540087.29054.128.camel@gaston>
	<20051031154540.195b0de0.moilanen@austin.ibm.com>
Message-ID: <1130801243.29054.376.camel@gaston>

On Mon, 2005-10-31 at 15:45 -0600, Jake Moilanen wrote:
> On Sat, 29 Oct 2005 08:54:47 +1000
> Benjamin Herrenschmidt wrote:
>
> > On Fri, 2005-10-28 at 15:08 -0500, Jake Moilanen wrote:
> > > This patch exports the physical IO base address so drivers can pick it
> > > up when using addresses from the device-tree.
> >
> > Why ? What is your driver exactly trying to do ?
>
> TPM needs to get the base address for IO as an offset into IO space.
>
> This base physical address is stored in the reg property in the
> device-node.
>
> To calculate the offset, need to do: TPM_base_phy_addr - io_base_phys.
>
> Thus the need to export this address.

Hrm... that is sooo bogus. If the device-tree exposes a full physical
address in "reg", then just use that with ioremap (ignore the fact that
it's actually IO space).

Additionally, tell the firmware folks to fix their device-tree, this is
all very bogus to me. The TPM device is on the LPC bus right ? Thus it
should appear below the LPC/ISA bridge and thus get proper address
space. It's totally broken to put it anywhere else

Ben.
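
For illustration only (this is not code from the thread): a user-space
sketch of the arithmetic under discussion, assuming a "reg" entry whose
address is made of one or two 32-bit cells. The helper names and the
sample addresses are hypothetical. A real driver would read the cells
from the device tree and, as suggested above, could pass the physical
address to ioremap instead of subtracting the IO base.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: combine the address cells of a "reg" entry into
 * one physical address, most significant cell first.
 */
uint64_t reg_cells_to_phys(const uint32_t *cells, int naddrc)
{
	uint64_t addr = 0;
	int i;

	/* Only one- and two-cell addresses are handled in this sketch. */
	assert(naddrc == 1 || naddrc == 2);
	for (i = 0; i < naddrc; i++)
		addr = (addr << 32) | cells[i];
	return addr;
}

/* The subtraction from the mail: TPM_base_phy_addr - io_base_phys. */
uint64_t phys_to_io_offset(uint64_t tpm_base_phys, uint64_t io_base_phys)
{
	return tpm_base_phys - io_base_phys;
}
```

With the made-up cells { 0xf8, 0xe00 } and a made-up IO base of
0xf800000000, the resulting IO offset is 0xe00.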
From arnd at arndb.de Tue Nov 1 10:50:48 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Tue, 1 Nov 2005 00:50:48 +0100
Subject: [patch 2/5] powerpc: create a new arch/powerpc/platforms/cell/smp.c
In-Reply-To: <200511011026.59266.michael@ellerman.id.au>
References: <20051101010836.771791000@localhost>
	<20051101011133.300238000@localhost>
	<200511011026.59266.michael@ellerman.id.au>
Message-ID: <200511010050.48828.arnd@arndb.de>

On Tuesday 01 November 2005 00:26, Michael Ellerman wrote:
> A lot of your smp routines are identical to the pSeries versions. Wouldn't it
> be preferable to only have one implementation?

Yes it would. I'm not sure how that would best be done however. Until 2.6.14,
we've just used the pSeries implementation, which does not work any more now
that we want to keep the platform stuff in separate directories.

One idea might be to split out the rtas calls (startup_cpu, give_timebase,
take_timebase) to rtas.c so they can be included by all chrp-descendants.
smp_init_cell() can be further simplified under the assumption that we're
always SMT and never LPAR, although the latter might change in the future.

Arnd <><

From benh at kernel.crashing.org Tue Nov 1 11:05:08 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 01 Nov 2005 11:05:08 +1100
Subject: [PATCH] tpm: support PPC64 hardware
In-Reply-To: <1130769479.4882.35.camel@localhost.localdomain>
References: <1130769479.4882.35.camel@localhost.localdomain>
Message-ID: <1130803508.29054.388.camel@gaston>

On Mon, 2005-10-31 at 08:37 -0600, Kylene Jo Hall wrote:
> The TPM is discovered differently on PPC64 because the device must be
> discovered through the device tree in order to open the proper holes in
> the io_page_mask for reading and writing in the low memory space. This
> does not happen automatically like most devices because the tpm is not a
> normal pci device and lives under the root node.
>
> This patch contains the necessary changes to the tpm logic.
>
> This depends on patches submitted by Jake Moilanen (10/28) to allow for
> the opening of holes in the io_page_mask for this device.

Please submit to the appropriate list (linuxppc64-dev at ozlabs.org).

There are some issues with that patch. One, I intend to get rid of the
io_page_mask, so that part at least is gone. I don't like the exporting
of io_base_phys either, it's an ugly hack. Other comments inline.

> +/* Verify this is a 1.1 Atmel TPM */
> +static int atmel_verify_tpm11(void)
> +{
> +	struct device_node * dn;
> +	char *compat;
> +	int compat_len;
> +
> +	dn = find_devices("tpm");

find_devices() is a deprecated interface. Use the of_find_node_* series
and do an of_node_put() once done.

> +	if (!dn)
> +		return 1;
> +
> +	compat = (char *) get_property(dn, "compatible", &compat_len);
> +	if (!compat)
> +		return 1;
> +
> +	if ( strcmp( compat,"AT97SC3201_r") == 0 )
> +		return 0;
> +

Testing the "compatible" property this way is bogus. Use
device_is_compatible(). Or better, use of_find_compatible_node() which
allows you to find by type & compatible in one step.

> +	dn = find_devices("tpm");

Same comment as above. In addition, why do you have to do it twice ? You
should rethink your changes. Only one "probe" should be needed that
retrieves the base addresses.

> +	if (!dn)
> +		return 0;
> +
> +	reg = (unsigned int *) get_property(dn, "reg", &reglen);
> +	naddrc = prom_n_addr_cells(dn);
> +	nsizec = prom_n_size_cells(dn);
> +
> +	for (i = 0; i < reglen; i = i + naddrc + nsizec) {
> +
> +		if (naddrc == 2)
> +			address = ((unsigned long)reg[i] << 32) | reg[i+1];
> +		else
> +			address = reg[i];
> +
> +		address = address - pci_io_base_phys;

That is bogosity. Your address is an ISA IO address. It should be
relative to the parent LPC bus and thus useable as is. It looks like the
firmware folks crapped the device-tree. Please check that with them. If
they decide to stick with a broken device-tree, then you'll have to
consider the address as an MMIO address.
That means you'll have to change the IO accesses of the TPM driver to
use the iomap API, thus making it immune to the IO vs. MMIO distinction.

> +		allow_isa_address(address, address+size-1);

That is going away.

Ben.

From michael at ellerman.id.au Tue Nov 1 11:13:23 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Tue, 1 Nov 2005 11:13:23 +1100
Subject: [patch 2/5] powerpc: create a new arch/powerpc/platforms/cell/smp.c
In-Reply-To: <200511010050.48828.arnd@arndb.de>
References: <20051101010836.771791000@localhost>
	<200511011026.59266.michael@ellerman.id.au>
	<200511010050.48828.arnd@arndb.de>
Message-ID: <200511011113.26609.michael@ellerman.id.au>

On Tue, 1 Nov 2005 10:50, Arnd Bergmann wrote:
> On Tuesday 01 November 2005 00:26, Michael Ellerman wrote:
> > A lot of your smp routines are identical to the pSeries versions.
> > Wouldn't it be preferable to only have one implementation?
>
> Yes it would. I'm not sure how that would best be done however. Until
> 2.6.14, we've just used the pSeries implementation, which does not work any
> more now that we want to keep the platform stuff in separate directories.
>
> One idea might be to split out the rtas calls (startup_cpu, give_timebase,
> take_timebase) to rtas.c so they can be included by all chrp-descendants.
> smp_init_cell() can be further simplified under the assumption that we're
> always SMT and never LPAR, although the latter might change in the future.

OK, I'm not sure what the best spot is. arch/powerpc/sysdev is apparently the
place for stuff that's not core-kernel but shared between platforms, although
maybe smp ops are core, I dunno.

cheers

-- 
Michael Ellerman
IBM OzLabs

email: michael:ellerman.id.au
inmsg: mpe:jabber.org
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051101/dc38ac71/attachment.pgp From david at gibson.dropbear.id.au Tue Nov 1 15:30:26 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Tue, 1 Nov 2005 15:30:26 +1100 Subject: powerpc: Move naca.h to platforms/iseries Message-ID: <20051101043026.GB27961@localhost.localdomain> Paulus, please apply. These days, the NACA only exists on iSeries. Therefore, this patch moves naca.h from include/asm-ppc64 to arch/powerpc/platforms/iseries. There was one file including naca.h outside of platforms/iseries - arch/ppc64/kernel/udbg_scc.c. However, that's obviously a hangover from older days. The include is not necessary, so this patch simply removes it. Built and booted on iSeries, built for G5 (which uses udbg_scc.o). Signed-off-by: David Gibson Index: working-2.6/arch/powerpc/platforms/iseries/lpardata.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/iseries/lpardata.c 2005-10-31 15:20:20.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/iseries/lpardata.c 2005-11-01 15:15:57.000000000 +1100 @@ -13,7 +13,6 @@ #include #include #include -#include #include #include #include @@ -23,6 +22,7 @@ #include #include +#include "naca.h" #include "vpd_areas.h" #include "spcomm_area.h" #include "ipl_parms.h" Index: working-2.6/arch/powerpc/platforms/iseries/naca.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/arch/powerpc/platforms/iseries/naca.h 2005-11-01 15:28:03.000000000 +1100 @@ -0,0 +1,24 @@ +#ifndef _PLATFORMS_ISERIES_NACA_H +#define _PLATFORMS_ISERIES_NACA_H + +/* + * c 2001 PPC 64 Team, IBM Corp + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free 
Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include + +struct naca_struct { + /* Kernel only data - undefined for user space */ + void *xItVpdAreas; /* VPD Data 0x00 */ + void *xRamDisk; /* iSeries ramdisk 0x08 */ + u64 xRamDiskSize; /* In pages 0x10 */ +}; + +extern struct naca_struct naca; + +#endif /* _PLATFORMS_ISERIES_NACA_H */ Index: working-2.6/arch/powerpc/platforms/iseries/release_data.h =================================================================== --- working-2.6.orig/arch/powerpc/platforms/iseries/release_data.h 2005-10-31 15:20:20.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/iseries/release_data.h 2005-11-01 15:15:57.000000000 +1100 @@ -24,7 +24,7 @@ * address of the OS's NACA). */ #include -#include +#include "naca.h" /* * When we IPL a secondary partition, we will check if if the Index: working-2.6/arch/powerpc/platforms/iseries/setup.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/iseries/setup.c 2005-10-31 15:44:59.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/iseries/setup.c 2005-11-01 15:15:57.000000000 +1100 @@ -40,7 +40,6 @@ #include #include -#include #include #include #include @@ -53,6 +52,7 @@ #include #include +#include "naca.h" #include "setup.h" #include "irq.h" #include "vpd_areas.h" Index: working-2.6/arch/ppc64/kernel/udbg_scc.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/udbg_scc.c 2005-10-25 11:59:53.000000000 +1000 +++ working-2.6/arch/ppc64/kernel/udbg_scc.c 2005-11-01 15:15:57.000000000 +1100 @@ -12,7 +12,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/include/asm-ppc64/naca.h =================================================================== --- working-2.6.orig/include/asm-ppc64/naca.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ 
-1,24 +0,0 @@ -#ifndef _NACA_H -#define _NACA_H - -/* - * c 2001 PPC 64 Team, IBM Corp - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#include - -struct naca_struct { - /* Kernel only data - undefined for user space */ - void *xItVpdAreas; /* VPD Data 0x00 */ - void *xRamDisk; /* iSeries ramdisk 0x08 */ - u64 xRamDiskSize; /* In pages 0x10 */ -}; - -extern struct naca_struct naca; - -#endif /* _NACA_H */ -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson From david at gibson.dropbear.id.au Tue Nov 1 16:53:24 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Tue, 1 Nov 2005 16:53:24 +1100 Subject: powerpc: Merge ipcbuf.h Message-ID: <20051101055324.GA3551@localhost.localdomain> Paulus, please apply. This patch merges ppc32 and ppc64 versions of ipcbuf.h. The merge is essentially trivial, since the structure defined in each version was already identical. Only wrinkle is that the merged version now includes linux/types.h in order to get the fixed width integer types. In fact, the old versions probably should have been including that anyway, since the file uses various __kernel_*_t types. Built and booted on G5, built for 32-bit pmac, but not booted, since the merge tree currently doesn't boot there. 
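
As a side note for readers (not part of the patch): the claim that the
ppc32 and ppc64 layouts were already identical can be sanity-checked in
user space. The sketch below uses <stdint.h> types as stand-ins for the
__kernel_*_t types, on the assumption that key, uid, gid and mode are
all 32 bits wide on powerpc. The struct and field names are made up for
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the merged struct ipc64_perm (assumed widths). */
struct ipc64_perm_sketch {
	int32_t  key;
	uint32_t uid;
	uint32_t gid;
	uint32_t cuid;
	uint32_t cgid;
	uint32_t mode;
	uint32_t seq;
	uint32_t pad1;		/* fills up to 8-byte alignment */
	uint64_t unused1;	/* fixed width: same on 32- and 64-bit */
	uint64_t unused2;
};

/* Eight 32-bit fields (32 bytes) followed by two 64-bit values. */
static_assert(offsetof(struct ipc64_perm_sketch, unused1) == 32,
	      "64-bit members should start at offset 32");
static_assert(sizeof(struct ipc64_perm_sketch) == 48,
	      "structure should be 48 bytes on 32- and 64-bit builds");
```

Because the two trailing members use a fixed-width 64-bit type rather
than unsigned long or unsigned long long, the same definition gives the
same size and offsets whether it is compiled for 32 or 64 bits.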
Signed-off-by: David Gibson Index: working-2.6/include/asm-powerpc/ipcbuf.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-powerpc/ipcbuf.h 2005-11-01 15:44:01.000000000 +1100 @@ -0,0 +1,34 @@ +#ifndef _ASM_POWERPC_IPCBUF_H +#define _ASM_POWERPC_IPCBUF_H + +/* + * The ipc64_perm structure for the powerpc is identical to + * kern_ipc_perm as we have always had 32-bit UIDs and GIDs in the + * kernel. Note extra padding because this structure is passed back + * and forth between kernel and user space. Pad space is left for: + * - 1 32-bit value to fill up for 8-byte alignment + * - 2 miscellaneous 64-bit values + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include + +struct ipc64_perm +{ + __kernel_key_t key; + __kernel_uid_t uid; + __kernel_gid_t gid; + __kernel_uid_t cuid; + __kernel_gid_t cgid; + __kernel_mode_t mode; + unsigned int seq; + unsigned int __pad1; + u64 __unused1; + u64 __unused2; +}; + +#endif /* _ASM_POWERPC_IPCBUF_H */ Index: working-2.6/include/asm-ppc64/ipcbuf.h =================================================================== --- working-2.6.orig/include/asm-ppc64/ipcbuf.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,28 +0,0 @@ -#ifndef __PPC64_IPCBUF_H__ -#define __PPC64_IPCBUF_H__ - -/* - * The ipc64_perm structure for the PPC is identical to kern_ipc_perm - * as we have always had 32-bit UIDs and GIDs in the kernel. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. 
- */ - -struct ipc64_perm -{ - __kernel_key_t key; - __kernel_uid_t uid; - __kernel_gid_t gid; - __kernel_uid_t cuid; - __kernel_gid_t cgid; - __kernel_mode_t mode; - unsigned int seq; - unsigned int __pad1; - unsigned long __unused1; - unsigned long __unused2; -}; - -#endif /* __PPC64_IPCBUF_H__ */ Index: working-2.6/include/asm-ppc/ipcbuf.h =================================================================== --- working-2.6.orig/include/asm-ppc/ipcbuf.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,29 +0,0 @@ -#ifndef __PPC_IPCBUF_H__ -#define __PPC_IPCBUF_H__ - -/* - * The ipc64_perm structure for PPC architecture. - * Note extra padding because this structure is passed back and forth - * between kernel and user space. - * - * Pad space is left for: - * - 1 32-bit value to fill up for 8-byte alignment - * - 2 miscellaneous 64-bit values (so that this structure matches - * PPC64 ipc64_perm) - */ - -struct ipc64_perm -{ - __kernel_key_t key; - __kernel_uid_t uid; - __kernel_gid_t gid; - __kernel_uid_t cuid; - __kernel_gid_t cgid; - __kernel_mode_t mode; - unsigned long seq; - unsigned int __pad2; - unsigned long long __unused1; - unsigned long long __unused2; -}; - -#endif /* __PPC_IPCBUF_H__ */ -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson From david at gibson.dropbear.id.au Tue Nov 1 17:28:10 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Tue, 1 Nov 2005 17:28:10 +1100 Subject: powerpc: Merge bitops.h In-Reply-To: <20051031064823.GD6622@localhost.localdomain> References: <20051031064823.GD6622@localhost.localdomain> Message-ID: <20051101062810.GC3551@localhost.localdomain> Here's a revised version. This re-introduces the set_bits() function from ppc64, which I removed because I thought it was unused (it exists on no other arch). 
In fact it is used in the powermac interrupt code (but not on pSeries).

This seems to be running fine on my G5 (ARCH=powerpc), but it still
hasn't been tested on 32-bit, which should probably happen before merging.

- We use LARXL/STCXL macros to generate the right (32 or 64 bit)
  instructions, similar to LDL/STL from ppc_asm.h, used in fpu.S

- ppc32 previously used a full "sync" barrier at the end of
  test_and_*_bit(), whereas ppc64 used an "isync". The merged version
  uses "isync", since I believe that's sufficient.

- The ppc64 versions of the minix_*() bitmap functions have changed
  semantics. Previously on ppc64, these functions were big-endian
  (that is bit 0 was the LSB in the first 64-bit, big-endian word).
  On ppc32 (and x86, for that matter), they were little-endian. As far
  as I can tell, the big-endian usage was simply wrong - I guess
  no-one ever tried to use minixfs on ppc64.

- On ppc32 find_next_bit() and find_next_zero_bit() are no longer
  inline (they were already out-of-line on ppc64).

- For ppc64, sched_find_first_bit() has moved from mmu_context.h to
  the merged bitops. What it was doing in mmu_context.h in the first
  place, I have no idea.

- The fls() function is now implemented using the cntlzw instruction
  on ppc64, instead of generic_fls(), as it already was on ppc32.

- For ARCH=ppc, this patch requires adding arch/powerpc/lib to the
  arch/ppc/Makefile. This in turn requires some changes to
  arch/powerpc/lib/Makefile which didn't correctly handle ARCH=ppc.

Built and running on G5.
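
To make the fls() item concrete for readers without PowerPC hardware,
here is a portable user-space sketch (not the patch's code).
cntlzw_sketch() is a hypothetical stand-in for the cntlzw instruction,
written as a loop and returning 32 for a zero input; fls_sketch() then
mirrors the "32 - lz" computation used by the ppc32 header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for cntlzw: leading zeros in a 32-bit word. */
int cntlzw_sketch(uint32_t x)
{
	int lz = 0;

	if (x == 0)
		return 32;	/* every bit is a leading zero */
	while (!(x & 0x80000000u)) {
		x <<= 1;
		lz++;
	}
	assert(lz < 32);	/* a non-zero word has at most 31 leading zeros */
	return lz;
}

/* fls: position of the most significant set bit, counting from 1. */
int fls_sketch(uint32_t x)
{
	return 32 - cntlzw_sketch(x);
}
```

This reproduces the values documented in the old header's comment:
fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.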
Signed-off-by: David Gibson Index: working-2.6/include/asm-ppc/bitops.h =================================================================== --- working-2.6.orig/include/asm-ppc/bitops.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,460 +0,0 @@ -/* - * bitops.h: Bit string operations on the ppc - */ - -#ifdef __KERNEL__ -#ifndef _PPC_BITOPS_H -#define _PPC_BITOPS_H - -#include -#include -#include -#include - -/* - * The test_and_*_bit operations are taken to imply a memory barrier - * on SMP systems. - */ -#ifdef CONFIG_SMP -#define SMP_WMB "eieio\n" -#define SMP_MB "\nsync" -#else -#define SMP_WMB -#define SMP_MB -#endif /* CONFIG_SMP */ - -static __inline__ void set_bit(int nr, volatile unsigned long * addr) -{ - unsigned long old; - unsigned long mask = 1 << (nr & 0x1f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 5); - - __asm__ __volatile__("\n\ -1: lwarx %0,0,%3 \n\ - or %0,%0,%2 \n" - PPC405_ERR77(0,%3) -" stwcx. %0,0,%3 \n\ - bne- 1b" - : "=&r" (old), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc" ); -} - -/* - * non-atomic version - */ -static __inline__ void __set_bit(int nr, volatile unsigned long *addr) -{ - unsigned long mask = 1 << (nr & 0x1f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 5); - - *p |= mask; -} - -/* - * clear_bit doesn't imply a memory barrier - */ -#define smp_mb__before_clear_bit() smp_mb() -#define smp_mb__after_clear_bit() smp_mb() - -static __inline__ void clear_bit(int nr, volatile unsigned long *addr) -{ - unsigned long old; - unsigned long mask = 1 << (nr & 0x1f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 5); - - __asm__ __volatile__("\n\ -1: lwarx %0,0,%3 \n\ - andc %0,%0,%2 \n" - PPC405_ERR77(0,%3) -" stwcx. 
%0,0,%3 \n\ - bne- 1b" - : "=&r" (old), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); -} - -/* - * non-atomic version - */ -static __inline__ void __clear_bit(int nr, volatile unsigned long *addr) -{ - unsigned long mask = 1 << (nr & 0x1f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 5); - - *p &= ~mask; -} - -static __inline__ void change_bit(int nr, volatile unsigned long *addr) -{ - unsigned long old; - unsigned long mask = 1 << (nr & 0x1f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 5); - - __asm__ __volatile__("\n\ -1: lwarx %0,0,%3 \n\ - xor %0,%0,%2 \n" - PPC405_ERR77(0,%3) -" stwcx. %0,0,%3 \n\ - bne- 1b" - : "=&r" (old), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); -} - -/* - * non-atomic version - */ -static __inline__ void __change_bit(int nr, volatile unsigned long *addr) -{ - unsigned long mask = 1 << (nr & 0x1f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 5); - - *p ^= mask; -} - -/* - * test_and_*_bit do imply a memory barrier (?) - */ -static __inline__ int test_and_set_bit(int nr, volatile unsigned long *addr) -{ - unsigned int old, t; - unsigned int mask = 1 << (nr & 0x1f); - volatile unsigned int *p = ((volatile unsigned int *)addr) + (nr >> 5); - - __asm__ __volatile__(SMP_WMB "\n\ -1: lwarx %0,0,%4 \n\ - or %1,%0,%3 \n" - PPC405_ERR77(0,%4) -" stwcx. 
%1,0,%4 \n\ - bne 1b" - SMP_MB - : "=&r" (old), "=&r" (t), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc", "memory"); - - return (old & mask) != 0; -} - -/* - * non-atomic version - */ -static __inline__ int __test_and_set_bit(int nr, volatile unsigned long *addr) -{ - unsigned long mask = 1 << (nr & 0x1f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 5); - unsigned long old = *p; - - *p = old | mask; - return (old & mask) != 0; -} - -static __inline__ int test_and_clear_bit(int nr, volatile unsigned long *addr) -{ - unsigned int old, t; - unsigned int mask = 1 << (nr & 0x1f); - volatile unsigned int *p = ((volatile unsigned int *)addr) + (nr >> 5); - - __asm__ __volatile__(SMP_WMB "\n\ -1: lwarx %0,0,%4 \n\ - andc %1,%0,%3 \n" - PPC405_ERR77(0,%4) -" stwcx. %1,0,%4 \n\ - bne 1b" - SMP_MB - : "=&r" (old), "=&r" (t), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc", "memory"); - - return (old & mask) != 0; -} - -/* - * non-atomic version - */ -static __inline__ int __test_and_clear_bit(int nr, volatile unsigned long *addr) -{ - unsigned long mask = 1 << (nr & 0x1f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 5); - unsigned long old = *p; - - *p = old & ~mask; - return (old & mask) != 0; -} - -static __inline__ int test_and_change_bit(int nr, volatile unsigned long *addr) -{ - unsigned int old, t; - unsigned int mask = 1 << (nr & 0x1f); - volatile unsigned int *p = ((volatile unsigned int *)addr) + (nr >> 5); - - __asm__ __volatile__(SMP_WMB "\n\ -1: lwarx %0,0,%4 \n\ - xor %1,%0,%3 \n" - PPC405_ERR77(0,%4) -" stwcx. 
%1,0,%4 \n\ - bne 1b" - SMP_MB - : "=&r" (old), "=&r" (t), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc", "memory"); - - return (old & mask) != 0; -} - -/* - * non-atomic version - */ -static __inline__ int __test_and_change_bit(int nr, volatile unsigned long *addr) -{ - unsigned long mask = 1 << (nr & 0x1f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 5); - unsigned long old = *p; - - *p = old ^ mask; - return (old & mask) != 0; -} - -static __inline__ int test_bit(int nr, __const__ volatile unsigned long *addr) -{ - return ((addr[nr >> 5] >> (nr & 0x1f)) & 1) != 0; -} - -/* Return the bit position of the most significant 1 bit in a word */ -static __inline__ int __ilog2(unsigned long x) -{ - int lz; - - asm ("cntlzw %0,%1" : "=r" (lz) : "r" (x)); - return 31 - lz; -} - -static __inline__ int ffz(unsigned long x) -{ - if ((x = ~x) == 0) - return 32; - return __ilog2(x & -x); -} - -static inline int __ffs(unsigned long x) -{ - return __ilog2(x & -x); -} - -/* - * ffs: find first bit set. This is defined the same way as - * the libc and compiler builtin ffs routines, therefore - * differs in spirit from the above ffz (man ffs). - */ -static __inline__ int ffs(int x) -{ - return __ilog2(x & -x) + 1; -} - -/* - * fls: find last (most-significant) bit set. - * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32. - */ -static __inline__ int fls(unsigned int x) -{ - int lz; - - asm ("cntlzw %0,%1" : "=r" (lz) : "r" (x)); - return 32 - lz; -} - -/* - * hweightN: returns the hamming weight (i.e. the number - * of bits set) of a N-bit word - */ - -#define hweight32(x) generic_hweight32(x) -#define hweight16(x) generic_hweight16(x) -#define hweight8(x) generic_hweight8(x) - -/* - * Find the first bit set in a 140-bit bitmap. - * The first 100 bits are unlikely to be set. 
- */ -static inline int sched_find_first_bit(const unsigned long *b) -{ - if (unlikely(b[0])) - return __ffs(b[0]); - if (unlikely(b[1])) - return __ffs(b[1]) + 32; - if (unlikely(b[2])) - return __ffs(b[2]) + 64; - if (b[3]) - return __ffs(b[3]) + 96; - return __ffs(b[4]) + 128; -} - -/** - * find_next_bit - find the next set bit in a memory region - * @addr: The address to base the search on - * @offset: The bitnumber to start searching at - * @size: The maximum size to search - */ -static __inline__ unsigned long find_next_bit(const unsigned long *addr, - unsigned long size, unsigned long offset) -{ - unsigned int *p = ((unsigned int *) addr) + (offset >> 5); - unsigned int result = offset & ~31UL; - unsigned int tmp; - - if (offset >= size) - return size; - size -= result; - offset &= 31UL; - if (offset) { - tmp = *p++; - tmp &= ~0UL << offset; - if (size < 32) - goto found_first; - if (tmp) - goto found_middle; - size -= 32; - result += 32; - } - while (size >= 32) { - if ((tmp = *p++) != 0) - goto found_middle; - result += 32; - size -= 32; - } - if (!size) - return result; - tmp = *p; - -found_first: - tmp &= ~0UL >> (32 - size); - if (tmp == 0UL) /* Are any bits set? */ - return result + size; /* Nope. */ -found_middle: - return result + __ffs(tmp); -} - -/** - * find_first_bit - find the first set bit in a memory region - * @addr: The address to start the search at - * @size: The maximum size to search - * - * Returns the bit-number of the first set bit, not the number of the byte - * containing a bit. - */ -#define find_first_bit(addr, size) \ - find_next_bit((addr), (size), 0) - -/* - * This implementation of find_{first,next}_zero_bit was stolen from - * Linus' asm-alpha/bitops.h. 
- */ -#define find_first_zero_bit(addr, size) \ - find_next_zero_bit((addr), (size), 0) - -static __inline__ unsigned long find_next_zero_bit(const unsigned long *addr, - unsigned long size, unsigned long offset) -{ - unsigned int * p = ((unsigned int *) addr) + (offset >> 5); - unsigned int result = offset & ~31UL; - unsigned int tmp; - - if (offset >= size) - return size; - size -= result; - offset &= 31UL; - if (offset) { - tmp = *p++; - tmp |= ~0UL >> (32-offset); - if (size < 32) - goto found_first; - if (tmp != ~0U) - goto found_middle; - size -= 32; - result += 32; - } - while (size >= 32) { - if ((tmp = *p++) != ~0U) - goto found_middle; - result += 32; - size -= 32; - } - if (!size) - return result; - tmp = *p; -found_first: - tmp |= ~0UL << size; - if (tmp == ~0UL) /* Are any bits zero? */ - return result + size; /* Nope. */ -found_middle: - return result + ffz(tmp); -} - - -#define ext2_set_bit(nr, addr) __test_and_set_bit((nr) ^ 0x18, (unsigned long *)(addr)) -#define ext2_set_bit_atomic(lock, nr, addr) test_and_set_bit((nr) ^ 0x18, (unsigned long *)(addr)) -#define ext2_clear_bit(nr, addr) __test_and_clear_bit((nr) ^ 0x18, (unsigned long *)(addr)) -#define ext2_clear_bit_atomic(lock, nr, addr) test_and_clear_bit((nr) ^ 0x18, (unsigned long *)(addr)) - -static __inline__ int ext2_test_bit(int nr, __const__ void * addr) -{ - __const__ unsigned char *ADDR = (__const__ unsigned char *) addr; - - return (ADDR[nr >> 3] >> (nr & 7)) & 1; -} - -/* - * This implementation of ext2_find_{first,next}_zero_bit was stolen from - * Linus' asm-alpha/bitops.h and modified for a big-endian machine. 
- */ - -#define ext2_find_first_zero_bit(addr, size) \ - ext2_find_next_zero_bit((addr), (size), 0) - -static __inline__ unsigned long ext2_find_next_zero_bit(const void *addr, - unsigned long size, unsigned long offset) -{ - unsigned int *p = ((unsigned int *) addr) + (offset >> 5); - unsigned int result = offset & ~31UL; - unsigned int tmp; - - if (offset >= size) - return size; - size -= result; - offset &= 31UL; - if (offset) { - tmp = cpu_to_le32p(p++); - tmp |= ~0UL >> (32-offset); - if (size < 32) - goto found_first; - if (tmp != ~0U) - goto found_middle; - size -= 32; - result += 32; - } - while (size >= 32) { - if ((tmp = cpu_to_le32p(p++)) != ~0U) - goto found_middle; - result += 32; - size -= 32; - } - if (!size) - return result; - tmp = cpu_to_le32p(p); -found_first: - tmp |= ~0U << size; - if (tmp == ~0UL) /* Are any bits zero? */ - return result + size; /* Nope. */ -found_middle: - return result + ffz(tmp); -} - -/* Bitmap functions for the minix filesystem. */ -#define minix_test_and_set_bit(nr,addr) ext2_set_bit(nr,addr) -#define minix_set_bit(nr,addr) ((void)ext2_set_bit(nr,addr)) -#define minix_test_and_clear_bit(nr,addr) ext2_clear_bit(nr,addr) -#define minix_test_bit(nr,addr) ext2_test_bit(nr,addr) -#define minix_find_first_zero_bit(addr,size) ext2_find_first_zero_bit(addr,size) - -#endif /* _PPC_BITOPS_H */ -#endif /* __KERNEL__ */ Index: working-2.6/include/asm-ppc64/bitops.h =================================================================== --- working-2.6.orig/include/asm-ppc64/bitops.h 2005-10-31 15:20:22.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,360 +0,0 @@ -/* - * PowerPC64 atomic bit operations. - * Dave Engebretsen, Todd Inglett, Don Reed, Pat McCarthy, Peter Bergner, - * Anton Blanchard - * - * Originally taken from the 32b PPC code. Modified to use 64b values for - * the various counters & memory references. - * - * Bitops are odd when viewed on big-endian systems. 
They were designed - * on little endian so the size of the bitset doesn't matter (low order bytes - * come first) as long as the bit in question is valid. - * - * Bits are "tested" often using the C expression (val & (1< - -/* - * clear_bit doesn't imply a memory barrier - */ -#define smp_mb__before_clear_bit() smp_mb() -#define smp_mb__after_clear_bit() smp_mb() - -static __inline__ int test_bit(unsigned long nr, __const__ volatile unsigned long *addr) -{ - return (1UL & (addr[nr >> 6] >> (nr & 63))); -} - -static __inline__ void set_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long old; - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - __asm__ __volatile__( -"1: ldarx %0,0,%3 # set_bit\n\ - or %0,%0,%2\n\ - stdcx. %0,0,%3\n\ - bne- 1b" - : "=&r" (old), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); -} - -static __inline__ void clear_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long old; - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - __asm__ __volatile__( -"1: ldarx %0,0,%3 # clear_bit\n\ - andc %0,%0,%2\n\ - stdcx. %0,0,%3\n\ - bne- 1b" - : "=&r" (old), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); -} - -static __inline__ void change_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long old; - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - __asm__ __volatile__( -"1: ldarx %0,0,%3 # change_bit\n\ - xor %0,%0,%2\n\ - stdcx. 
%0,0,%3\n\ - bne- 1b" - : "=&r" (old), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); -} - -static __inline__ int test_and_set_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long old, t; - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - __asm__ __volatile__( - EIEIO_ON_SMP -"1: ldarx %0,0,%3 # test_and_set_bit\n\ - or %1,%0,%2 \n\ - stdcx. %1,0,%3 \n\ - bne- 1b" - ISYNC_ON_SMP - : "=&r" (old), "=&r" (t) - : "r" (mask), "r" (p) - : "cc", "memory"); - - return (old & mask) != 0; -} - -static __inline__ int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long old, t; - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - __asm__ __volatile__( - EIEIO_ON_SMP -"1: ldarx %0,0,%3 # test_and_clear_bit\n\ - andc %1,%0,%2\n\ - stdcx. %1,0,%3\n\ - bne- 1b" - ISYNC_ON_SMP - : "=&r" (old), "=&r" (t) - : "r" (mask), "r" (p) - : "cc", "memory"); - - return (old & mask) != 0; -} - -static __inline__ int test_and_change_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long old, t; - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - __asm__ __volatile__( - EIEIO_ON_SMP -"1: ldarx %0,0,%3 # test_and_change_bit\n\ - xor %1,%0,%2\n\ - stdcx. %1,0,%3\n\ - bne- 1b" - ISYNC_ON_SMP - : "=&r" (old), "=&r" (t) - : "r" (mask), "r" (p) - : "cc", "memory"); - - return (old & mask) != 0; -} - -static __inline__ void set_bits(unsigned long mask, unsigned long *addr) -{ - unsigned long old; - - __asm__ __volatile__( -"1: ldarx %0,0,%3 # set_bit\n\ - or %0,%0,%2\n\ - stdcx. 
%0,0,%3\n\ - bne- 1b" - : "=&r" (old), "=m" (*addr) - : "r" (mask), "r" (addr), "m" (*addr) - : "cc"); -} - -/* - * non-atomic versions - */ -static __inline__ void __set_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - *p |= mask; -} - -static __inline__ void __clear_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - *p &= ~mask; -} - -static __inline__ void __change_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - - *p ^= mask; -} - -static __inline__ int __test_and_set_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - unsigned long old = *p; - - *p = old | mask; - return (old & mask) != 0; -} - -static __inline__ int __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - unsigned long old = *p; - - *p = old & ~mask; - return (old & mask) != 0; -} - -static __inline__ int __test_and_change_bit(unsigned long nr, volatile unsigned long *addr) -{ - unsigned long mask = 1UL << (nr & 0x3f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 6); - unsigned long old = *p; - - *p = old ^ mask; - return (old & mask) != 0; -} - -/* - * Return the zero-based bit position (from RIGHT TO LEFT, 63 -> 0) of the - * most significant (left-most) 1-bit in a double word. - */ -static __inline__ int __ilog2(unsigned long x) -{ - int lz; - - asm ("cntlzd %0,%1" : "=r" (lz) : "r" (x)); - return 63 - lz; -} - -/* - * Determines the bit position of the least significant (rightmost) 0 bit - * in the specified double word. 
The returned bit position will be zero-based, - * starting from the right side (63 - 0). - */ -static __inline__ unsigned long ffz(unsigned long x) -{ - /* no zero exists anywhere in the 8 byte area. */ - if ((x = ~x) == 0) - return 64; - - /* - * Calculate the bit position of the least signficant '1' bit in x - * (since x has been changed this will actually be the least signficant - * '0' bit in * the original x). Note: (x & -x) gives us a mask that - * is the least significant * (RIGHT-most) 1-bit of the value in x. - */ - return __ilog2(x & -x); -} - -static __inline__ int __ffs(unsigned long x) -{ - return __ilog2(x & -x); -} - -/* - * ffs: find first bit set. This is defined the same way as - * the libc and compiler builtin ffs routines, therefore - * differs in spirit from the above ffz (man ffs). - */ -static __inline__ int ffs(int x) -{ - unsigned long i = (unsigned long)x; - return __ilog2(i & -i) + 1; -} - -/* - * fls: find last (most-significant) bit set. - * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32. - */ -#define fls(x) generic_fls(x) - -/* - * hweightN: returns the hamming weight (i.e. 
the number - * of bits set) of a N-bit word - */ -#define hweight64(x) generic_hweight64(x) -#define hweight32(x) generic_hweight32(x) -#define hweight16(x) generic_hweight16(x) -#define hweight8(x) generic_hweight8(x) - -extern unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size, unsigned long offset); -#define find_first_zero_bit(addr, size) \ - find_next_zero_bit((addr), (size), 0) - -extern unsigned long find_next_bit(const unsigned long *addr, unsigned long size, unsigned long offset); -#define find_first_bit(addr, size) \ - find_next_bit((addr), (size), 0) - -extern unsigned long find_next_zero_le_bit(const unsigned long *addr, unsigned long size, unsigned long offset); -#define find_first_zero_le_bit(addr, size) \ - find_next_zero_le_bit((addr), (size), 0) - -static __inline__ int test_le_bit(unsigned long nr, __const__ unsigned long * addr) -{ - __const__ unsigned char *ADDR = (__const__ unsigned char *) addr; - return (ADDR[nr >> 3] >> (nr & 7)) & 1; -} - -#define test_and_clear_le_bit(nr, addr) \ - test_and_clear_bit((nr) ^ 0x38, (addr)) -#define test_and_set_le_bit(nr, addr) \ - test_and_set_bit((nr) ^ 0x38, (addr)) - -/* - * non-atomic versions - */ - -#define __set_le_bit(nr, addr) \ - __set_bit((nr) ^ 0x38, (addr)) -#define __clear_le_bit(nr, addr) \ - __clear_bit((nr) ^ 0x38, (addr)) -#define __test_and_clear_le_bit(nr, addr) \ - __test_and_clear_bit((nr) ^ 0x38, (addr)) -#define __test_and_set_le_bit(nr, addr) \ - __test_and_set_bit((nr) ^ 0x38, (addr)) - -#define ext2_set_bit(nr,addr) \ - __test_and_set_le_bit((nr), (unsigned long*)addr) -#define ext2_clear_bit(nr, addr) \ - __test_and_clear_le_bit((nr), (unsigned long*)addr) - -#define ext2_set_bit_atomic(lock, nr, addr) \ - test_and_set_le_bit((nr), (unsigned long*)addr) -#define ext2_clear_bit_atomic(lock, nr, addr) \ - test_and_clear_le_bit((nr), (unsigned long*)addr) - - -#define ext2_test_bit(nr, addr) test_le_bit((nr),(unsigned long*)addr) -#define 
ext2_find_first_zero_bit(addr, size) \ - find_first_zero_le_bit((unsigned long*)addr, size) -#define ext2_find_next_zero_bit(addr, size, off) \ - find_next_zero_le_bit((unsigned long*)addr, size, off) - -#define minix_test_and_set_bit(nr,addr) test_and_set_bit(nr,addr) -#define minix_set_bit(nr,addr) set_bit(nr,addr) -#define minix_test_and_clear_bit(nr,addr) test_and_clear_bit(nr,addr) -#define minix_test_bit(nr,addr) test_bit(nr,addr) -#define minix_find_first_zero_bit(addr,size) find_first_zero_bit(addr,size) - -#endif /* __KERNEL__ */ -#endif /* _PPC64_BITOPS_H */ Index: working-2.6/arch/powerpc/lib/bitops.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/arch/powerpc/lib/bitops.c 2005-11-01 15:51:56.000000000 +1100 @@ -0,0 +1,150 @@ +#include +#include +#include +#include + +/** + * find_next_bit - find the next set bit in a memory region + * @addr: The address to base the search on + * @offset: The bitnumber to start searching at + * @size: The maximum size to search + */ +unsigned long find_next_bit(const unsigned long *addr, unsigned long size, + unsigned long offset) +{ + const unsigned long *p = addr + BITOP_WORD(offset); + unsigned long result = offset & ~(BITS_PER_LONG-1); + unsigned long tmp; + + if (offset >= size) + return size; + size -= result; + offset %= BITS_PER_LONG; + if (offset) { + tmp = *(p++); + tmp &= (~0UL << offset); + if (size < BITS_PER_LONG) + goto found_first; + if (tmp) + goto found_middle; + size -= BITS_PER_LONG; + result += BITS_PER_LONG; + } + while (size & ~(BITS_PER_LONG-1)) { + if ((tmp = *(p++))) + goto found_middle; + result += BITS_PER_LONG; + size -= BITS_PER_LONG; + } + if (!size) + return result; + tmp = *p; + +found_first: + tmp &= (~0UL >> (BITS_PER_LONG - size)); + if (tmp == 0UL) /* Are any bits set? */ + return result + size; /* Nope.
*/ +found_middle: + return result + __ffs(tmp); +} +EXPORT_SYMBOL(find_next_bit); + +/* + * This implementation of find_{first,next}_zero_bit was stolen from + * Linus' asm-alpha/bitops.h. + */ +unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size, + unsigned long offset) +{ + const unsigned long *p = addr + BITOP_WORD(offset); + unsigned long result = offset & ~(BITS_PER_LONG-1); + unsigned long tmp; + + if (offset >= size) + return size; + size -= result; + offset %= BITS_PER_LONG; + if (offset) { + tmp = *(p++); + tmp |= ~0UL >> (BITS_PER_LONG - offset); + if (size < BITS_PER_LONG) + goto found_first; + if (~tmp) + goto found_middle; + size -= BITS_PER_LONG; + result += BITS_PER_LONG; + } + while (size & ~(BITS_PER_LONG-1)) { + if (~(tmp = *(p++))) + goto found_middle; + result += BITS_PER_LONG; + size -= BITS_PER_LONG; + } + if (!size) + return result; + tmp = *p; + +found_first: + tmp |= ~0UL << size; + if (tmp == ~0UL) /* Are any bits zero? */ + return result + size; /* Nope. */ +found_middle: + return result + ffz(tmp); +} +EXPORT_SYMBOL(find_next_zero_bit); + +static inline unsigned int ext2_ilog2(unsigned int x) +{ + int lz; + + asm("cntlzw %0,%1": "=r"(lz):"r"(x)); + return 31 - lz; +} + +static inline unsigned int ext2_ffz(unsigned int x) +{ + u32 rc; + if ((x = ~x) == 0) + return 32; + rc = ext2_ilog2(x & -x); + return rc; +} + +unsigned long find_next_zero_le_bit(const unsigned long *addr, + unsigned long size, unsigned long offset) +{ + const unsigned int *p = ((const unsigned int *)addr) + (offset >> 5); + unsigned int result = offset & ~31; + unsigned int tmp; + + if (offset >= size) + return size; + size -= result; + offset &= 31; + if (offset) { + tmp = cpu_to_le32p(p++); + tmp |= ~0U >> (32 - offset); /* bug or feature ? 
*/ + if (size < 32) + goto found_first; + if (tmp != ~0) + goto found_middle; + size -= 32; + result += 32; + } + while (size >= 32) { + if ((tmp = cpu_to_le32p(p++)) != ~0) + goto found_middle; + result += 32; + size -= 32; + } + if (!size) + return result; + tmp = cpu_to_le32p(p); +found_first: + tmp |= ~0 << size; + if (tmp == ~0) /* Are any bits zero? */ + return result + size; /* Nope. */ +found_middle: + return result + ext2_ffz(tmp); +} +EXPORT_SYMBOL(find_next_zero_le_bit); Index: working-2.6/arch/ppc64/kernel/bitops.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/bitops.c 2005-10-25 11:59:53.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,147 +0,0 @@ -/* - * These are too big to be inlined. - */ - -#include -#include -#include -#include - -unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size, - unsigned long offset) -{ - const unsigned long *p = addr + (offset >> 6); - unsigned long result = offset & ~63UL; - unsigned long tmp; - - if (offset >= size) - return size; - size -= result; - offset &= 63UL; - if (offset) { - tmp = *(p++); - tmp |= ~0UL >> (64 - offset); - if (size < 64) - goto found_first; - if (~tmp) - goto found_middle; - size -= 64; - result += 64; - } - while (size & ~63UL) { - if (~(tmp = *(p++))) - goto found_middle; - result += 64; - size -= 64; - } - if (!size) - return result; - tmp = *p; - -found_first: - tmp |= ~0UL << size; - if (tmp == ~0UL) /* Are any bits zero? */ - return result + size; /* Nope. 
*/ -found_middle: - return result + ffz(tmp); -} - -EXPORT_SYMBOL(find_next_zero_bit); - -unsigned long find_next_bit(const unsigned long *addr, unsigned long size, - unsigned long offset) -{ - const unsigned long *p = addr + (offset >> 6); - unsigned long result = offset & ~63UL; - unsigned long tmp; - - if (offset >= size) - return size; - size -= result; - offset &= 63UL; - if (offset) { - tmp = *(p++); - tmp &= (~0UL << offset); - if (size < 64) - goto found_first; - if (tmp) - goto found_middle; - size -= 64; - result += 64; - } - while (size & ~63UL) { - if ((tmp = *(p++))) - goto found_middle; - result += 64; - size -= 64; - } - if (!size) - return result; - tmp = *p; - -found_first: - tmp &= (~0UL >> (64 - size)); - if (tmp == 0UL) /* Are any bits set? */ - return result + size; /* Nope. */ -found_middle: - return result + __ffs(tmp); -} - -EXPORT_SYMBOL(find_next_bit); - -static inline unsigned int ext2_ilog2(unsigned int x) -{ - int lz; - - asm("cntlzw %0,%1": "=r"(lz):"r"(x)); - return 31 - lz; -} - -static inline unsigned int ext2_ffz(unsigned int x) -{ - u32 rc; - if ((x = ~x) == 0) - return 32; - rc = ext2_ilog2(x & -x); - return rc; -} - -unsigned long find_next_zero_le_bit(const unsigned long *addr, unsigned long size, - unsigned long offset) -{ - const unsigned int *p = ((const unsigned int *)addr) + (offset >> 5); - unsigned int result = offset & ~31; - unsigned int tmp; - - if (offset >= size) - return size; - size -= result; - offset &= 31; - if (offset) { - tmp = cpu_to_le32p(p++); - tmp |= ~0U >> (32 - offset); /* bug or feature ? */ - if (size < 32) - goto found_first; - if (tmp != ~0) - goto found_middle; - size -= 32; - result += 32; - } - while (size >= 32) { - if ((tmp = cpu_to_le32p(p++)) != ~0) - goto found_middle; - result += 32; - size -= 32; - } - if (!size) - return result; - tmp = cpu_to_le32p(p); -found_first: - tmp |= ~0 << size; - if (tmp == ~0) /* Are any bits zero? */ - return result + size; /* Nope. 
*/ -found_middle: - return result + ext2_ffz(tmp); -} - -EXPORT_SYMBOL(find_next_zero_le_bit); Index: working-2.6/arch/powerpc/kernel/ppc_ksyms.c =================================================================== --- working-2.6.orig/arch/powerpc/kernel/ppc_ksyms.c 2005-10-31 15:20:57.000000000 +1100 +++ working-2.6/arch/powerpc/kernel/ppc_ksyms.c 2005-11-01 15:51:56.000000000 +1100 @@ -81,15 +81,6 @@ EXPORT_SYMBOL(ucSystemType); #endif -#if !defined(__INLINE_BITOPS) -EXPORT_SYMBOL(set_bit); -EXPORT_SYMBOL(clear_bit); -EXPORT_SYMBOL(change_bit); -EXPORT_SYMBOL(test_and_set_bit); -EXPORT_SYMBOL(test_and_clear_bit); -EXPORT_SYMBOL(test_and_change_bit); -#endif /* __INLINE_BITOPS */ - EXPORT_SYMBOL(strcpy); EXPORT_SYMBOL(strncpy); EXPORT_SYMBOL(strcat); Index: working-2.6/arch/ppc/kernel/bitops.c =================================================================== --- working-2.6.orig/arch/ppc/kernel/bitops.c 2005-10-25 11:59:53.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,126 +0,0 @@ -/* - * Copyright (C) 1996 Paul Mackerras. - */ - -#include -#include - -/* - * If the bitops are not inlined in bitops.h, they are defined here. - * -- paulus - */ -#if !__INLINE_BITOPS -void set_bit(int nr, volatile void * addr) -{ - unsigned long old; - unsigned long mask = 1 << (nr & 0x1f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 5); - - __asm__ __volatile__(SMP_WMB "\n\ -1: lwarx %0,0,%3 \n\ - or %0,%0,%2 \n" - PPC405_ERR77(0,%3) -" stwcx. %0,0,%3 \n\ - bne 1b" - SMP_MB - : "=&r" (old), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc" ); -} - -void clear_bit(int nr, volatile void *addr) -{ - unsigned long old; - unsigned long mask = 1 << (nr & 0x1f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 5); - - __asm__ __volatile__(SMP_WMB "\n\ -1: lwarx %0,0,%3 \n\ - andc %0,%0,%2 \n" - PPC405_ERR77(0,%3) -" stwcx. 
%0,0,%3 \n\ - bne 1b" - SMP_MB - : "=&r" (old), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); -} - -void change_bit(int nr, volatile void *addr) -{ - unsigned long old; - unsigned long mask = 1 << (nr & 0x1f); - unsigned long *p = ((unsigned long *)addr) + (nr >> 5); - - __asm__ __volatile__(SMP_WMB "\n\ -1: lwarx %0,0,%3 \n\ - xor %0,%0,%2 \n" - PPC405_ERR77(0,%3) -" stwcx. %0,0,%3 \n\ - bne 1b" - SMP_MB - : "=&r" (old), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); -} - -int test_and_set_bit(int nr, volatile void *addr) -{ - unsigned int old, t; - unsigned int mask = 1 << (nr & 0x1f); - volatile unsigned int *p = ((volatile unsigned int *)addr) + (nr >> 5); - - __asm__ __volatile__(SMP_WMB "\n\ -1: lwarx %0,0,%4 \n\ - or %1,%0,%3 \n" - PPC405_ERR77(0,%4) -" stwcx. %1,0,%4 \n\ - bne 1b" - SMP_MB - : "=&r" (old), "=&r" (t), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); - - return (old & mask) != 0; -} - -int test_and_clear_bit(int nr, volatile void *addr) -{ - unsigned int old, t; - unsigned int mask = 1 << (nr & 0x1f); - volatile unsigned int *p = ((volatile unsigned int *)addr) + (nr >> 5); - - __asm__ __volatile__(SMP_WMB "\n\ -1: lwarx %0,0,%4 \n\ - andc %1,%0,%3 \n" - PPC405_ERR77(0,%4) -" stwcx. %1,0,%4 \n\ - bne 1b" - SMP_MB - : "=&r" (old), "=&r" (t), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); - - return (old & mask) != 0; -} - -int test_and_change_bit(int nr, volatile void *addr) -{ - unsigned int old, t; - unsigned int mask = 1 << (nr & 0x1f); - volatile unsigned int *p = ((volatile unsigned int *)addr) + (nr >> 5); - - __asm__ __volatile__(SMP_WMB "\n\ -1: lwarx %0,0,%4 \n\ - xor %1,%0,%3 \n" - PPC405_ERR77(0,%4) -" stwcx. 
%1,0,%4 \n\ - bne 1b" - SMP_MB - : "=&r" (old), "=&r" (t), "=m" (*p) - : "r" (mask), "r" (p), "m" (*p) - : "cc"); - - return (old & mask) != 0; -} -#endif /* !__INLINE_BITOPS */ Index: working-2.6/arch/ppc64/kernel/Makefile =================================================================== --- working-2.6.orig/arch/ppc64/kernel/Makefile 2005-10-31 15:20:57.000000000 +1100 +++ working-2.6/arch/ppc64/kernel/Makefile 2005-11-01 15:51:56.000000000 +1100 @@ -13,7 +13,7 @@ obj-y += irq.o idle.o dma.o \ signal.o \ - align.o bitops.o pacaData.o \ + align.o pacaData.o \ udbg.o ioctl32.o \ rtc.o \ cpu_setup_power4.o \ Index: working-2.6/include/asm-powerpc/bitops.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-powerpc/bitops.h 2005-11-01 15:51:56.000000000 +1100 @@ -0,0 +1,437 @@ +/* + * PowerPC atomic bit operations. + * + * Merged version by David Gibson . + * Based on ppc64 versions by: Dave Engebretsen, Todd Inglett, Don + * Reed, Pat McCarthy, Peter Bergner, Anton Blanchard. They + * originally took it from the ppc32 code. + * + * Within a word, bits are numbered LSB first. Lots of places make + * this assumption by directly testing bits with (val & (1<<nr)). + * This can cause confusion for large (> 1 word) bitmaps on a + * big-endian system because, unlike little endian, the number of each + * bit depends on the word size.
+ * + * The bitop functions are defined to work on unsigned longs, so for a + * ppc64 system the bits end up numbered: + * |63..............0|127............64|191...........128|255...........192| + * and on ppc32: + * |31.....0|63....32|95....64|127...96|159..128|191..160|223..192|255..224| + * + * There are a few little-endian macros used mostly for filesystem + * bitmaps, these work on similar bit array layouts, but + * byte-oriented: + * |7...0|15...8|23...16|31...24|39...32|47...40|55...48|63...56| + * + * The main difference is that bit 3-5 (64b) or 3-4 (32b) in the bit + * number field needs to be reversed compared to the big-endian bit + * fields. This can be achieved by XOR with 0x38 (64b) or 0x18 (32b). + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#ifndef _ASM_POWERPC_BITOPS_H +#define _ASM_POWERPC_BITOPS_H + +#ifdef __KERNEL__ + +#include +#include +#include + +/* + * clear_bit doesn't imply a memory barrier + */ +#define smp_mb__before_clear_bit() smp_mb() +#define smp_mb__after_clear_bit() smp_mb() + +#define BITOP_MASK(nr) (1UL << ((nr) % BITS_PER_LONG)) +#define BITOP_WORD(nr) ((nr) / BITS_PER_LONG) +#define BITOP_LE_SWIZZLE ((BITS_PER_LONG-1) & ~0x7) + +#ifdef CONFIG_PPC64 +#define LARXL "ldarx" +#define STCXL "stdcx." +#define CNTLZL "cntlzd" +#else +#define LARXL "lwarx" +#define STCXL "stwcx."
+#define CNTLZL "cntlzw" +#endif + +static __inline__ void set_bit(int nr, volatile unsigned long *addr) +{ + unsigned long old; + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + __asm__ __volatile__( +"1:" LARXL " %0,0,%3 # set_bit\n" + "or %0,%0,%2\n" + PPC405_ERR77(0,%3) + STCXL " %0,0,%3\n" + "bne- 1b" + : "=&r"(old), "=m"(*p) + : "r"(mask), "r"(p), "m"(*p) + : "cc" ); +} + +static __inline__ void clear_bit(int nr, volatile unsigned long *addr) +{ + unsigned long old; + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + __asm__ __volatile__( +"1:" LARXL " %0,0,%3 # clear_bit\n" + "andc %0,%0,%2\n" + PPC405_ERR77(0,%3) + STCXL " %0,0,%3\n" + "bne- 1b" + : "=&r"(old), "=m"(*p) + : "r"(mask), "r"(p), "m"(*p) + : "cc" ); +} + +static __inline__ void change_bit(int nr, volatile unsigned long *addr) +{ + unsigned long old; + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + __asm__ __volatile__( +"1:" LARXL " %0,0,%3 # change_bit\n" + "xor %0,%0,%2\n" + PPC405_ERR77(0,%3) + STCXL " %0,0,%3\n" + "bne- 1b" + : "=&r"(old), "=m"(*p) + : "r"(mask), "r"(p), "m"(*p) + : "cc" ); +} + +static __inline__ int test_and_set_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned long old, t; + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + __asm__ __volatile__( + EIEIO_ON_SMP +"1:" LARXL " %0,0,%3 # test_and_set_bit\n" + "or %1,%0,%2 \n" + PPC405_ERR77(0,%3) + STCXL " %1,0,%3 \n" + "bne- 1b" + ISYNC_ON_SMP + : "=&r" (old), "=&r" (t) + : "r" (mask), "r" (p) + : "cc", "memory"); + + return (old & mask) != 0; +} + +static __inline__ int test_and_clear_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned long old, t; + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + __asm__ __volatile__( +
EIEIO_ON_SMP +"1:" LARXL " %0,0,%3 # test_and_clear_bit\n" + "andc %1,%0,%2 \n" + PPC405_ERR77(0,%3) + STCXL " %1,0,%3 \n" + "bne- 1b" + ISYNC_ON_SMP + : "=&r" (old), "=&r" (t) + : "r" (mask), "r" (p) + : "cc", "memory"); + + return (old & mask) != 0; +} + +static __inline__ int test_and_change_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned long old, t; + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + __asm__ __volatile__( + EIEIO_ON_SMP +"1:" LARXL " %0,0,%3 # test_and_change_bit\n" + "xor %1,%0,%2 \n" + PPC405_ERR77(0,%3) + STCXL " %1,0,%3 \n" + "bne- 1b" + ISYNC_ON_SMP + : "=&r" (old), "=&r" (t) + : "r" (mask), "r" (p) + : "cc", "memory"); + + return (old & mask) != 0; +} + +static __inline__ void set_bits(unsigned long mask, unsigned long *addr) +{ + unsigned long old; + + __asm__ __volatile__( +"1:" LARXL " %0,0,%3 # set_bit\n" + "or %0,%0,%2\n" + STCXL " %0,0,%3\n" + "bne- 1b" + : "=&r" (old), "=m" (*addr) + : "r" (mask), "r" (addr), "m" (*addr) + : "cc"); +} + +/* Non-atomic versions */ +static __inline__ int test_bit(unsigned long nr, + __const__ volatile unsigned long *addr) +{ + return 1UL & (addr[BITOP_WORD(nr)] >> (nr & (BITS_PER_LONG-1))); +} + +static __inline__ void __set_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + *p |= mask; +} + +static __inline__ void __clear_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + *p &= ~mask; +} + +static __inline__ void __change_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + + *p ^= mask; +} + +static __inline__ int __test_and_set_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned 
long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + unsigned long old = *p; + + *p = old | mask; + return (old & mask) != 0; +} + +static __inline__ int __test_and_clear_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + unsigned long old = *p; + + *p = old & ~mask; + return (old & mask) != 0; +} + +static __inline__ int __test_and_change_bit(unsigned long nr, + volatile unsigned long *addr) +{ + unsigned long mask = BITOP_MASK(nr); + unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr); + unsigned long old = *p; + + *p = old ^ mask; + return (old & mask) != 0; +} + +/* + * Return the zero-based bit position (LE, not IBM bit numbering) of + * the most significant 1-bit in a double word. + */ +static __inline__ int __ilog2(unsigned long x) +{ + int lz; + + asm (CNTLZL " %0,%1" : "=r" (lz) : "r" (x)); + return BITS_PER_LONG - 1 - lz; +} + +/* + * Determines the bit position of the least significant 0 bit in the + * specified double word. The returned bit position will be + * zero-based, starting from the right side (63/31 - 0). + */ +static __inline__ unsigned long ffz(unsigned long x) +{ + /* no zero exists anywhere in the 8 byte area. */ + if ((x = ~x) == 0) + return BITS_PER_LONG; + + /* + * Calculate the bit position of the least significant '1' bit in x + * (since x has been changed this will actually be the least significant + * '0' bit in the original x). Note: (x & -x) gives us a mask that + * is the least significant (RIGHT-most) 1-bit of the value in x. + */ + return __ilog2(x & -x); +} + +static __inline__ int __ffs(unsigned long x) +{ + return __ilog2(x & -x); +} + +/* + * ffs: find first bit set. This is defined the same way as + * the libc and compiler builtin ffs routines, therefore + * differs in spirit from the above ffz (man ffs).
+ */ +static __inline__ int ffs(int x) +{ + unsigned long i = (unsigned long)x; + return __ilog2(i & -i) + 1; +} + +/* + * fls: find last (most-significant) bit set. + * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32. + */ +static __inline__ int fls(unsigned int x) +{ + int lz; + + asm ("cntlzw %0,%1" : "=r" (lz) : "r" (x)); + return 32 - lz; +} + +/* + * hweightN: returns the hamming weight (i.e. the number + * of bits set) of a N-bit word + */ +#define hweight64(x) generic_hweight64(x) +#define hweight32(x) generic_hweight32(x) +#define hweight16(x) generic_hweight16(x) +#define hweight8(x) generic_hweight8(x) + +#define find_first_zero_bit(addr, size) find_next_zero_bit((addr), (size), 0) +unsigned long find_next_zero_bit(const unsigned long *addr, + unsigned long size, unsigned long offset); +/** + * find_first_bit - find the first set bit in a memory region + * @addr: The address to start the search at + * @size: The maximum size to search + * + * Returns the bit-number of the first set bit, not the number of the byte + * containing a bit. 
+ */ +#define find_first_bit(addr, size) find_next_bit((addr), (size), 0) +unsigned long find_next_bit(const unsigned long *addr, + unsigned long size, unsigned long offset); + +/* Little-endian versions */ + +static __inline__ int test_le_bit(unsigned long nr, + __const__ unsigned long *addr) +{ + __const__ unsigned char *tmp = (__const__ unsigned char *) addr; + return (tmp[nr >> 3] >> (nr & 7)) & 1; +} + +#define __set_le_bit(nr, addr) \ + __set_bit((nr) ^ BITOP_LE_SWIZZLE, (addr)) +#define __clear_le_bit(nr, addr) \ + __clear_bit((nr) ^ BITOP_LE_SWIZZLE, (addr)) + +#define test_and_set_le_bit(nr, addr) \ + test_and_set_bit((nr) ^ BITOP_LE_SWIZZLE, (addr)) +#define test_and_clear_le_bit(nr, addr) \ + test_and_clear_bit((nr) ^ BITOP_LE_SWIZZLE, (addr)) + +#define __test_and_set_le_bit(nr, addr) \ + __test_and_set_bit((nr) ^ BITOP_LE_SWIZZLE, (addr)) +#define __test_and_clear_le_bit(nr, addr) \ + __test_and_clear_bit((nr) ^ BITOP_LE_SWIZZLE, (addr)) + +#define find_first_zero_le_bit(addr, size) find_next_zero_le_bit((addr), (size), 0) +unsigned long find_next_zero_le_bit(const unsigned long *addr, + unsigned long size, unsigned long offset); + +/* Bitmap functions for the ext2 filesystem */ + +#define ext2_set_bit(nr,addr) \ + __test_and_set_le_bit((nr), (unsigned long*)addr) +#define ext2_clear_bit(nr, addr) \ + __test_and_clear_le_bit((nr), (unsigned long*)addr) + +#define ext2_set_bit_atomic(lock, nr, addr) \ + test_and_set_le_bit((nr), (unsigned long*)addr) +#define ext2_clear_bit_atomic(lock, nr, addr) \ + test_and_clear_le_bit((nr), (unsigned long*)addr) + +#define ext2_test_bit(nr, addr) test_le_bit((nr),(unsigned long*)addr) + +#define ext2_find_first_zero_bit(addr, size) \ + find_first_zero_le_bit((unsigned long*)addr, size) +#define ext2_find_next_zero_bit(addr, size, off) \ + find_next_zero_le_bit((unsigned long*)addr, size, off) + +/* Bitmap functions for the minix filesystem. 
*/ + +#define minix_test_and_set_bit(nr,addr) \ + __test_and_set_le_bit(nr, (unsigned long *)addr) +#define minix_set_bit(nr,addr) \ + __set_le_bit(nr, (unsigned long *)addr) +#define minix_test_and_clear_bit(nr,addr) \ + __test_and_clear_le_bit(nr, (unsigned long *)addr) +#define minix_test_bit(nr,addr) \ + test_le_bit(nr, (unsigned long *)addr) + +#define minix_find_first_zero_bit(addr,size) \ + find_first_zero_le_bit((unsigned long *)addr, size) + +/* + * Every architecture must define this function. It's the fastest + * way of searching a 140-bit bitmap where the first 100 bits are + * unlikely to be set. It's guaranteed that at least one of the 140 + * bits is cleared. + */ +static inline int sched_find_first_bit(const unsigned long *b) +{ +#ifdef CONFIG_PPC64 + if (unlikely(b[0])) + return __ffs(b[0]); + if (unlikely(b[1])) + return __ffs(b[1]) + 64; + return __ffs(b[2]) + 128; +#else + if (unlikely(b[0])) + return __ffs(b[0]); + if (unlikely(b[1])) + return __ffs(b[1]) + 32; + if (unlikely(b[2])) + return __ffs(b[2]) + 64; + if (b[3]) + return __ffs(b[3]) + 96; + return __ffs(b[4]) + 128; +#endif +} + +#endif /* __KERNEL__ */ + +#endif /* _ASM_POWERPC_BITOPS_H */ Index: working-2.6/include/asm-ppc64/mmu_context.h =================================================================== --- working-2.6.orig/include/asm-ppc64/mmu_context.h 2005-10-25 11:59:59.000000000 +1000 +++ working-2.6/include/asm-ppc64/mmu_context.h 2005-11-01 15:51:56.000000000 +1100 @@ -16,21 +16,6 @@ * 2 of the License, or (at your option) any later version. */ -/* - * Every architecture must define this function. It's the fastest - * way of searching a 140-bit bitmap where the first 100 bits are - * unlikely to be set. It's guaranteed that at least one of the 140 - * bits is cleared. 
- */ -static inline int sched_find_first_bit(unsigned long *b) -{ - if (unlikely(b[0])) - return __ffs(b[0]); - if (unlikely(b[1])) - return __ffs(b[1]) + 64; - return __ffs(b[2]) + 128; -} - static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk) { } Index: working-2.6/arch/ppc/Makefile =================================================================== --- working-2.6.orig/arch/ppc/Makefile 2005-10-31 15:20:57.000000000 +1100 +++ working-2.6/arch/ppc/Makefile 2005-11-01 15:51:56.000000000 +1100 @@ -66,7 +66,8 @@ core-y += arch/ppc/kernel/ arch/powerpc/kernel/ \ arch/ppc/platforms/ \ arch/ppc/mm/ arch/ppc/lib/ \ - arch/ppc/syslib/ arch/powerpc/sysdev/ + arch/ppc/syslib/ arch/powerpc/sysdev/ \ + arch/powerpc/lib/ core-$(CONFIG_4xx) += arch/ppc/platforms/4xx/ core-$(CONFIG_83xx) += arch/ppc/platforms/83xx/ core-$(CONFIG_85xx) += arch/ppc/platforms/85xx/ Index: working-2.6/arch/powerpc/lib/Makefile =================================================================== --- working-2.6.orig/arch/powerpc/lib/Makefile 2005-10-31 15:20:57.000000000 +1100 +++ working-2.6/arch/powerpc/lib/Makefile 2005-11-01 15:51:56.000000000 +1100 @@ -3,13 +3,14 @@ # ifeq ($(CONFIG_PPC_MERGE),y) -obj-y := string.o +obj-y := string.o strcase.o +obj-$(CONFIG_PPC32) += div64.o copy_32.o checksum_32.o endif -obj-y += strcase.o -obj-$(CONFIG_PPC32) += div64.o copy_32.o checksum_32.o +obj-y += bitops.o obj-$(CONFIG_PPC64) += checksum_64.o copypage_64.o copyuser_64.o \ - memcpy_64.o usercopy_64.o mem_64.o + memcpy_64.o usercopy_64.o mem_64.o \ + strcase.o obj-$(CONFIG_PPC_ISERIES) += e2a.o obj-$(CONFIG_XMON) += sstep.o -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! 
http://www.ozlabs.org/people/dgibson

From mikey at neuling.org Tue Nov 1 18:14:40 2005
From: mikey at neuling.org (Michael Neuling)
Date: Tue, 1 Nov 2005 18:14:40 +1100 (EST)
Subject: [PATCH 0/3] powerpc: Fix legacy drivers for remove io_page_mask patch
Message-ID: <1130829279.790724.373020147318.qpush@coopers>

These patches are the same as the ones I sent the other day, except updated
for the merge tree. The series now also contains an updated version of
Anton's original patch which will apply cleanly to the merge tree.

Anton's patch is necessary for running kexec with some e1000 revisions.
The issue we have is that the reset is not being sent to the e1000
correctly, resulting in it still running during the second boot.

The other two patches fix drivers which have issues with the first patch.

Mikey

From mikey at neuling.org Tue Nov 1 18:14:41 2005
From: mikey at neuling.org (Michael Neuling)
Date: Tue, 1 Nov 2005 18:14:41 +1100 (EST)
Subject: [PATCH 1/3] powerpc: Updated remove io_page_mask
In-Reply-To: <1130829279.790724.373020147318.qpush@coopers>
Message-ID: <20051101071441.1C53668662@ozlabs.org>

From: Anton Blanchard

Retransmit of Anton's patch from here:

http://ozlabs.org/pipermail/linuxppc64-dev/2005-May/003922.html

Updated for merge tree.
Signed-off-by: Michael Neuling arch/powerpc/platforms/iseries/pci.c | 3 --- arch/powerpc/platforms/maple/pci.c | 3 --- arch/powerpc/platforms/powermac/pci.c | 3 --- arch/ppc64/kernel/iomap.c | 2 -- arch/ppc64/kernel/pci.c | 30 +++--------------------------- include/asm-ppc64/eeh.h | 15 +++------------ include/asm-ppc64/io.h | 6 ------ 7 files changed, 6 insertions(+), 56 deletions(-) Index: linux-2.6/arch/powerpc/platforms/iseries/pci.c =================================================================== --- linux-2.6.orig/arch/powerpc/platforms/iseries/pci.c 2005-11-01 10:30:35.000000000 +1100 +++ linux-2.6/arch/powerpc/platforms/iseries/pci.c 2005-11-01 11:04:27.000000000 +1100 @@ -45,8 +45,6 @@ #include "pci.h" #include "call_pci.h" -extern unsigned long io_page_mask; - /* * Forward declares of prototypes. */ @@ -288,7 +286,6 @@ PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_init Entry.\n"); iomm_table_initialize(); find_and_init_phbs(); - io_page_mask = -1; PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_init Exit.\n"); } Index: linux-2.6/arch/powerpc/platforms/maple/pci.c =================================================================== --- linux-2.6.orig/arch/powerpc/platforms/maple/pci.c 2005-11-01 10:30:35.000000000 +1100 +++ linux-2.6/arch/powerpc/platforms/maple/pci.c 2005-11-01 11:06:30.000000000 +1100 @@ -455,9 +455,6 @@ /* Tell pci.c to use the common resource allocation mecanism */ pci_probe_only = 0; - - /* Allow all IO */ - io_page_mask = -1; } int maple_pci_get_legacy_ide_irq(struct pci_dev *pdev, int channel) Index: linux-2.6/arch/powerpc/platforms/powermac/pci.c =================================================================== --- linux-2.6.orig/arch/powerpc/platforms/powermac/pci.c 2005-11-01 10:30:35.000000000 +1100 +++ linux-2.6/arch/powerpc/platforms/powermac/pci.c 2005-11-01 11:10:07.000000000 +1100 @@ -926,9 +926,6 @@ /* Tell pci.c to not use the common resource allocation mechanism */ pci_probe_only = 1; - /* Allow all IO */ - io_page_mask = -1; - 
#else /* CONFIG_PPC64 */ init_p2pbridge(); fixup_nec_usb2(); Index: linux-2.6/arch/ppc64/kernel/iomap.c =================================================================== --- linux-2.6.orig/arch/ppc64/kernel/iomap.c 2005-11-01 10:30:32.000000000 +1100 +++ linux-2.6/arch/ppc64/kernel/iomap.c 2005-11-01 11:05:01.000000000 +1100 @@ -108,8 +108,6 @@ void __iomem *ioport_map(unsigned long port, unsigned int len) { - if (!_IO_IS_VALID(port)) - return NULL; return (void __iomem *) (port+pci_io_base); } Index: linux-2.6/arch/ppc64/kernel/pci.c =================================================================== --- linux-2.6.orig/arch/ppc64/kernel/pci.c 2005-11-01 10:30:35.000000000 +1100 +++ linux-2.6/arch/ppc64/kernel/pci.c 2005-11-01 11:08:44.000000000 +1100 @@ -42,14 +42,6 @@ unsigned long pci_probe_only = 1; unsigned long pci_assign_all_buses = 0; -/* - * legal IO pages under MAX_ISA_PORT. This is to ensure we don't touch - * devices we don't have access to. - */ -unsigned long io_page_mask; - -EXPORT_SYMBOL(io_page_mask); - #ifdef CONFIG_PPC_MULTIPLATFORM static void fixup_resource(struct resource *res, struct pci_dev *dev); static void do_bus_setup(struct pci_bus *bus); @@ -995,8 +987,6 @@ pci_process_ISA_OF_ranges(isa_dn, hose->io_base_phys, hose->io_base_virt); of_node_put(isa_dn); - /* Allow all IO */ - io_page_mask = -1; } } @@ -1132,27 +1122,13 @@ static void __devinit fixup_resource(struct resource *res, struct pci_dev *dev) { struct pci_controller *hose = pci_bus_to_host(dev->bus); - unsigned long start, end, mask, offset; + unsigned long offset; if (res->flags & IORESOURCE_IO) { offset = (unsigned long)hose->io_base_virt - pci_io_base; - start = res->start += offset; - end = res->end += offset; - - /* Need to allow IO access to pages that are in the - ISA range */ - if (start < MAX_ISA_PORT) { - if (end > MAX_ISA_PORT) - end = MAX_ISA_PORT; - - start >>= PAGE_SHIFT; - end >>= PAGE_SHIFT; - - /* get the range of pages for the map */ - mask = ((1 << (end+1)) - 
1) ^ ((1 << start) - 1); - io_page_mask |= mask; - } + res->start += offset; + res->end += offset; } else if (res->flags & IORESOURCE_MEM) { res->start += hose->pci_mem_offset; res->end += hose->pci_mem_offset; Index: linux-2.6/include/asm-ppc64/eeh.h =================================================================== --- linux-2.6.orig/include/asm-ppc64/eeh.h 2005-11-01 10:30:32.000000000 +1100 +++ linux-2.6/include/asm-ppc64/eeh.h 2005-11-01 11:13:05.000000000 +1100 @@ -311,8 +311,6 @@ static inline u8 eeh_inb(unsigned long port) { u8 val; - if (!_IO_IS_VALID(port)) - return ~0; val = in_8((u8 __iomem *)(port+pci_io_base)); if (EEH_POSSIBLE_ERROR(val, u8)) return eeh_check_failure((void __iomem *)(port), val); @@ -321,15 +319,12 @@ static inline void eeh_outb(u8 val, unsigned long port) { - if (_IO_IS_VALID(port)) - out_8((u8 __iomem *)(port+pci_io_base), val); + out_8((u8 __iomem *)(port+pci_io_base), val); } static inline u16 eeh_inw(unsigned long port) { u16 val; - if (!_IO_IS_VALID(port)) - return ~0; val = in_le16((u16 __iomem *)(port+pci_io_base)); if (EEH_POSSIBLE_ERROR(val, u16)) return eeh_check_failure((void __iomem *)(port), val); @@ -338,15 +333,12 @@ static inline void eeh_outw(u16 val, unsigned long port) { - if (_IO_IS_VALID(port)) - out_le16((u16 __iomem *)(port+pci_io_base), val); + out_le16((u16 __iomem *)(port+pci_io_base), val); } static inline u32 eeh_inl(unsigned long port) { u32 val; - if (!_IO_IS_VALID(port)) - return ~0; val = in_le32((u32 __iomem *)(port+pci_io_base)); if (EEH_POSSIBLE_ERROR(val, u32)) return eeh_check_failure((void __iomem *)(port), val); @@ -355,8 +347,7 @@ static inline void eeh_outl(u32 val, unsigned long port) { - if (_IO_IS_VALID(port)) - out_le32((u32 __iomem *)(port+pci_io_base), val); + out_le32((u32 __iomem *)(port+pci_io_base), val); } /* in-string eeh macros */ Index: linux-2.6/include/asm-ppc64/io.h =================================================================== --- linux-2.6.orig/include/asm-ppc64/io.h 
2005-11-01 10:30:35.000000000 +1100 +++ linux-2.6/include/asm-ppc64/io.h 2005-11-01 11:13:40.000000000 +1100 @@ -33,12 +33,6 @@ extern unsigned long isa_io_base; extern unsigned long pci_io_base; -extern unsigned long io_page_mask; - -#define MAX_ISA_PORT 0x10000 - -#define _IO_IS_VALID(port) ((port) >= MAX_ISA_PORT || (1 << (port>>PAGE_SHIFT)) \ - & io_page_mask) #ifdef CONFIG_PPC_ISERIES /* __raw_* accessors aren't supported on iSeries */ From mikey at neuling.org Tue Nov 1 18:14:41 2005 From: mikey at neuling.org (Michael Neuling) Date: Tue, 1 Nov 2005 18:14:41 +1100 (EST) Subject: [PATCH 2/3] powerpc: Updated parallel port init fix In-Reply-To: <1130829279.790724.373020147318.qpush@coopers> Message-ID: <20051101071441.1B6BB68665@ozlabs.org> Updated for powerpc merge tree. Signed-off-by: Michael Neuling include/asm-powerpc/parport.h | 28 ++++++++++++++++++++++++++-- 1 files changed, 26 insertions(+), 2 deletions(-) Index: linux-2.6/include/asm-powerpc/parport.h =================================================================== --- linux-2.6.orig/include/asm-powerpc/parport.h 2005-11-01 10:30:35.000000000 +1100 +++ linux-2.6/include/asm-powerpc/parport.h 2005-11-01 11:35:05.000000000 +1100 @@ -9,10 +9,34 @@ #ifndef _ASM_POWERPC_PARPORT_H #define _ASM_POWERPC_PARPORT_H -static int __devinit parport_pc_find_isa_ports (int autoirq, int autodma); +#include + +extern struct parport *parport_pc_probe_port (unsigned long int base, + unsigned long int base_hi, + int irq, int dma, + struct pci_dev *dev); + static int __devinit parport_pc_find_nonpci_ports (int autoirq, int autodma) { - return parport_pc_find_isa_ports (autoirq, autodma); + struct device_node *np; + u32 *prop; + u32 io1, io2; + int propsize; + int count = 0; + for (np = NULL; (np = of_find_compatible_node(np, + "parallel", + "pnpPNP,400")) != NULL;) { + prop = (u32 *)get_property(np, "reg", &propsize); + if (!prop || propsize > 6*sizeof(u32)) + continue; + io1 = prop[1]; io2 = prop[2]; + prop = (u32 
*)get_property(np, "interrupts", NULL); + if (!prop) + continue; + if (parport_pc_probe_port(io1, io2, prop[0], autodma, NULL) != NULL) + count++; + } + return count; } #endif /* !(_ASM_POWERPC_PARPORT_H) */ From mikey at neuling.org Tue Nov 1 18:14:42 2005 From: mikey at neuling.org (Michael Neuling) Date: Tue, 1 Nov 2005 18:14:42 +1100 (EST) Subject: [PATCH 3/3] powerpc: Updated PC speaker init fix In-Reply-To: <1130829279.790724.373020147318.qpush@coopers> Message-ID: <20051101071442.CD1C66866A@ozlabs.org> Updated for powerpc merge tree. Adds architecture specific init to pcspkr. Signed-off-by: Michael Neuling Acked-by: Paul Mackerras drivers/input/misc/pcspkr.c | 5 +++++ include/asm-powerpc/8253pit.h | 13 +++++++++++++ 2 files changed, 18 insertions(+) Index: linux-2.6/drivers/input/misc/pcspkr.c =================================================================== --- linux-2.6.orig/drivers/input/misc/pcspkr.c 2005-10-31 15:16:39.000000000 +1100 +++ linux-2.6/drivers/input/misc/pcspkr.c 2005-10-31 15:21:13.000000000 +1100 @@ -66,6 +66,11 @@ static int __init pcspkr_init(void) { +#ifdef HAS_PCSPKR_ARCH_INIT + int rc = pcspkr_arch_init(); + if (rc) + return rc; +#endif pcspkr_dev = input_allocate_device(); if (!pcspkr_dev) return -ENOMEM; Index: linux-2.6/include/asm-powerpc/8253pit.h =================================================================== --- linux-2.6.orig/include/asm-powerpc/8253pit.h 2005-10-31 15:02:18.000000000 +1100 +++ linux-2.6/include/asm-powerpc/8253pit.h 2005-10-31 15:20:30.000000000 +1100 @@ -5,6 +5,19 @@ * 8253/8254 Programmable Interval Timer */ +#include + #define PIT_TICK_RATE 1193182UL +#define HAS_PCSPKR_ARCH_INIT + +static inline int pcspkr_arch_init(void) +{ + struct device_node *np; + + np = of_find_compatible_node(NULL, NULL, "pnpPNP,100"); + of_node_put(np); + return np ? 
0 : -ENODEV; +} + #endif /* _ASM_POWERPC_8253PIT_H */ From paulus at samba.org Tue Nov 1 16:53:37 2005 From: paulus at samba.org (Paul Mackerras) Date: Tue, 1 Nov 2005 16:53:37 +1100 Subject: Patches for 2.6.15 In-Reply-To: <20051028103041.B15268@cox.net> References: <17250.8725.358204.62510@cargo.ozlabs.ibm.com> <20051028103041.B15268@cox.net> Message-ID: <17255.737.832440.137041@cargo.ozlabs.ibm.com> Matt Porter writes: > Ok, we have a set of 4xx patches that I plan to send to Andrew. > They are some existing 4xx SoC/board updates as well as a new > SoC/board. They are obviously mostly confined to the 4xx code paths > but there's likely conflicts in changes to Makefiles, etc. > > Would you prefer these going upstream before or after the > powerpc-merge pull? Did you send them yet? Linus has pulled the powerpc-merge tree, as I'm sure you've noticed. Paul. From paulus at samba.org Tue Nov 1 16:46:29 2005 From: paulus at samba.org (Paul Mackerras) Date: Tue, 1 Nov 2005 16:46:29 +1100 Subject: [PATCH] VMX get_user w/ irq disabled In-Reply-To: <20051028115509.1bb23cb6.moilanen@austin.ibm.com> References: <20051028115509.1bb23cb6.moilanen@austin.ibm.com> Message-ID: <17255.309.688169.531174@cargo.ozlabs.ibm.com> Jake Moilanen writes: > Looks like we have a get_user() call with interrupts disabled. While I > haven't seen the problem, I believe we have the same hole in mainline. > > The patch below fixed the problem on Redhat (rebased at 2.6.14). The problem is that altivec_assist_exception gets called with interrupts disabled on ppc64. I haven't decided whether to change head_64.S or just do a local_irq_enable() inside altivec_assist_exception(). Paul. 
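As a rough illustration of the second option Paul mentions above (turning interrupts back on at the top of the handler, before anything like get_user() runs), here is a small userspace model in C. It is only a sketch under stated assumptions: every model_* name is invented for this note, none of it is kernel code, and the rule that a user access needs interrupts on is modelled simply as an error return. The eventual fix could just as well live in head_64.S.

```c
#include <assert.h>
#include <stdbool.h>

/* Stands in for the interrupt-enable state; a model, not kernel state. */
static bool model_irqs_on;

static void model_irq_enable(void)  { model_irqs_on = true; }
static void model_irq_disable(void) { model_irqs_on = false; }

/*
 * Hypothetical stand-in for get_user(): a user access may fault and
 * sleep, so the model refuses to run while interrupts are off.
 */
static int model_get_user(unsigned int *dst, const unsigned int *uaddr)
{
	if (!model_irqs_on)
		return -1;
	*dst = *uaddr;
	return 0;
}

/*
 * Model of the assist handler. It is always entered with interrupts
 * off, which is the situation described for ppc64 in the mail above.
 */
int model_assist_exception(const unsigned int *uaddr, bool enable_first,
			   unsigned int *insn)
{
	model_irq_disable();		/* state on entry */
	if (enable_first)
		model_irq_enable();	/* the local_irq_enable() option */
	return model_get_user(insn, uaddr);
}

/* Exercises both orderings; 0x12345678 is an arbitrary test value. */
void model_selftest(void)
{
	unsigned int word = 0x12345678u, insn = 0;

	/* Without the enable, the user access is rejected. */
	assert(model_assist_exception(&word, false, &insn) == -1);
	/* With the enable first, the word is fetched. */
	assert(model_assist_exception(&word, true, &insn) == 0);
	assert(insn == word);
}
```

Calling model_selftest() from a test harness runs both cases. The only point of the model is the ordering: the enable has to happen before the first user access, wherever it ends up being done.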
From paulus at samba.org Tue Nov 1 16:57:58 2005
From: paulus at samba.org (Paul Mackerras)
Date: Tue, 1 Nov 2005 16:57:58 +1100
Subject: Patches for 2.6.15
In-Reply-To:
References: <17250.8725.358204.62510@cargo.ozlabs.ibm.com>
Message-ID: <17255.998.459385.399953@cargo.ozlabs.ibm.com>

Kumar Gala writes:

> Can you merge this in:
>
> http://patchwork.ozlabs.org/linuxppc/patch?id=2931

Having the same extern declaration in several C files raises a red
flag. Could we have that in a suitable header file instead please?

Paul.

From paulus at samba.org Tue Nov 1 20:55:17 2005
From: paulus at samba.org (Paul Mackerras)
Date: Tue, 1 Nov 2005 20:55:17 +1100
Subject: [patch 2/5] powerpc: create a new arch/powerpc/platforms/cell/smp.c
In-Reply-To: <200511010050.48828.arnd@arndb.de>
References: <20051101010836.771791000@localhost> <20051101011133.300238000@localhost> <200511011026.59266.michael@ellerman.id.au> <200511010050.48828.arnd@arndb.de>
Message-ID: <17255.15237.695554.410362@cargo.ozlabs.ibm.com>

Arnd Bergmann writes:

> On Tuesday 01 November 2005 00:26, Michael Ellerman wrote:
> > A lot of your smp routines are identical to the pSeries versions. Wouldn't it
> > be preferable to only have one implementation?
>
> Yes it would. I'm not sure how that would best be done however. Until 2.6.14,
> we've just used the pSeries implementation, which does not work any more now
> that we want to keep the platform stuff in separate directories.

I don't mind putting generic rtas stuff in arch/powerpc/kernel.

Paul.
From mporter at kernel.crashing.org Wed Nov 2 00:22:50 2005 From: mporter at kernel.crashing.org (Matt Porter) Date: Tue, 1 Nov 2005 06:22:50 -0700 Subject: Patches for 2.6.15 In-Reply-To: <17255.737.832440.137041@cargo.ozlabs.ibm.com>; from paulus@samba.org on Tue, Nov 01, 2005 at 04:53:37PM +1100 References: <17250.8725.358204.62510@cargo.ozlabs.ibm.com> <20051028103041.B15268@cox.net> <17255.737.832440.137041@cargo.ozlabs.ibm.com> Message-ID: <20051101062250.A28639@cox.net> On Tue, Nov 01, 2005 at 04:53:37PM +1100, Paul Mackerras wrote: > Matt Porter writes: > > > Ok, we have a set of 4xx patches that I plan to send to Andrew. > > They are some existing 4xx SoC/board updates as well as a new > > SoC/board. They are obviously mostly confined to the 4xx code paths > > but there's likely conflicts in changes to Makefiles, etc. > > > > Would you prefer these going upstream before or after the > > powerpc-merge pull? > > Did you send them yet? Linus has pulled the powerpc-merge tree, as > I'm sure you've noticed. Yes I did. I saw the merge go into mainline and rebased what was necessary off of that. Andrew now has them queued up for Linus so we are set. BTW, we're starting to look at merging 4xx to arch/powerpc/ now. -Matt From dwmw2 at infradead.org Wed Nov 2 02:54:04 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Tue, 01 Nov 2005 15:54:04 +0000 Subject: please pull the powerpc-merge.git tree In-Reply-To: <17253.39993.502458.390760@cargo.ozlabs.ibm.com> References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com> Message-ID: <1130860444.21212.52.camel@hades.cambridge.redhat.com> On Mon, 2005-10-31 at 15:23 +1100, Paul Mackerras wrote: > It is now possible to build kernels for powermac, pSeries, iSeries and > maple with ARCH=powerpc, and for powermac, both 32-bit and 64-bit > build and run. Hm. Not entirely in line with my experience. Can you share the configs you used? 
Using http://david.woodhou.se/powerpc-merge-32.config it doesn't
actually boot on my powerbook. I'll try it on the Pegasos later or
tomorrow, where I have a serial console; it dies very early.

Aside from disabling CONFIG_NVRAM because call_rtas() isn't implemented
anywhere, I also needed to do this to make that config build:

--- linux-2.6.14/arch/powerpc/kernel/setup-common.c.orig	2005-11-01 10:14:32.000000000 +0000
+++ linux-2.6.14/arch/powerpc/kernel/setup-common.c	2005-11-01 10:15:03.000000000 +0000
@@ -203,11 +203,11 @@ static int show_cpuinfo(struct seq_file
 #ifdef CONFIG_TAU_AVERAGE
 		/* more straightforward, but potentially misleading */
 		seq_printf(m, "temperature \t: %u C (uncalibrated)\n",
-			   cpu_temp(i));
+			   cpu_temp(cpu_id));
 #else
 		/* show the actual temp sensor range */
 		u32 temp;
-		temp = cpu_temp_both(i);
+		temp = cpu_temp_both(cpu_id);
 		seq_printf(m, "temperature \t: %u-%u C (uncalibrated)\n",
 			   temp & 0xff, temp >> 16);
 #endif

-- 
dwmw2

From dwmw2 at infradead.org Wed Nov 2 03:06:47 2005
From: dwmw2 at infradead.org (David Woodhouse)
Date: Tue, 01 Nov 2005 16:06:47 +0000
Subject: please pull the powerpc-merge.git tree
In-Reply-To: <17253.39993.502458.390760@cargo.ozlabs.ibm.com>
References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com>
Message-ID: <1130861207.21212.66.camel@hades.cambridge.redhat.com>

On Mon, 2005-10-31 at 15:23 +1100, Paul Mackerras wrote:
> It is now possible to build kernels for powermac, pSeries, iSeries and
> maple with ARCH=powerpc, and for powermac, both 32-bit and 64-bit
> build and run.

The ppc64 build (http://david.woodhou.se/powerpc-merge-64.config) fares
worse than ppc32 for me -- it doesn't even build.

arch/powerpc/platforms/powermac/pic.c:614: error: 'ppc_cached_irq_mask' undeclared (first use in this function)
arch/powerpc/platforms/powermac/pic.c:620: error: 'pmac_irq_hw' undeclared (first use in this function)
arch/powerpc/platforms/powermac/pic.c:621: error: 'max_real_irqs' undeclared (first use in this function)
arch/powerpc/platforms/powermac/pic.c:641: warning: implicit declaration of function 'pmac_unmask_irq'

If I leave CONFIG_ADB_PMU enabled (as I think I should since some G5s
have it?) I also see this:

drivers/macintosh/via-pmu.c:2410: undefined reference to `.pmac_tweak_clock_spreading'
drivers/macintosh/via-pmu.c:2494: undefined reference to `.set_context'
drivers/macintosh/via-pmu.c:2670: undefined reference to `._nmask_and_or_msr'
drivers/macintosh/via-pmu.c:2592: undefined reference to `.set_context'

If I turn CONFIG_ADB_PMU off, I see this:

arch/powerpc/platforms/powermac/time.c:335: undefined reference to `.pmu_register_sleep_notifier'

I think I'll leave the task of switching the Fedora rawhide kernel to
arch/powerpc to another day :)

-- 
dwmw2

From miltonm at bga.com Wed Nov 2 02:02:08 2005
From: miltonm at bga.com (Milton Miller)
Date: Tue, 1 Nov 2005 09:02:08 -0600
Subject: Patches for 2.6.15
Message-ID:

Paul wrote:
> Kumar Gala writes:
> > http://patchwork.ozlabs.org/linuxppc/patch?id=2931
>
> Having the same extern declaration in several C files raises a red
> flag. Could we have that in a suitable header file instead please?

And the patch header says:
> Having a prototype that uses seq_file without always including
> seq_file.h generates a lot of warnings. This happened when asm/irq.h
> was merged.

Hint: a simple struct seq_file; in the appropriate header should suffice.
milton

From arnd at arndb.de Wed Nov 2 06:13:50 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Tue, 1 Nov 2005 20:13:50 +0100
Subject: powerpc: Merge ipcbuf.h
In-Reply-To: <20051101055324.GA3551@localhost.localdomain>
References: <20051101055324.GA3551@localhost.localdomain>
Message-ID: <200511012013.51443.arnd@arndb.de>

On Tuesday 01 November 2005 06:53, David Gibson wrote:
> +struct ipc64_perm
> +{
> +       __kernel_key_t  key;
> +       __kernel_uid_t  uid;
> +       __kernel_gid_t  gid;
> +       __kernel_uid_t  cuid;
> +       __kernel_gid_t  cgid;
> +       __kernel_mode_t mode;
> +       unsigned int    seq;
> +       unsigned int    __pad1;
> +       u64             __unused1;
> +       u64             __unused2;
> +};

ipc64_perm is a user visible structure, so you have to use __u64 here
instead of u64. Even that does not exist if you build with 32 bit and
__STRICT_ANSI__, so it might be better yet to use four __u32 for the
unused fields.

	Arnd <><

From ingvar at linpro.no Wed Nov 2 09:49:26 2005
From: ingvar at linpro.no (Ingvar Hagelund)
Date: 01 Nov 2005 23:49:26 +0100
Subject: dlpar problem on sles9/openpower
Message-ID:

This is a pure user question.

We have an IBM OpenPower 720 with hypervisor and a set of lpars. Earlier
we got dlpar to work, at least a couple of times, but not anymore.

Here's an example: Ran this on the hmc:

~> chhwres -r virtualio -m Server-9124-720-SNXXXXXXX -p mgmt -o a \
   --rsubtype eth -s 11 \
   -a "ieee_virtual_eth=1,port_vlan_id=1,is_trunk=1,\"addl_vlan_ids=600,601\""
HSCL294C DLPAR ADD Virtual I/O resources failed:
HMC adding Virtual I/O ......
HMC Virtual slot DLPAR operation failed. Here are the virtual slot IDs
that failed and the reasons for failure:
11 The dynamic logical partitioning operation failed.

On the lpar "mgmt", running sles9 sp2 kernel 2.6.5-7.193-pseries64, we
have the magic rpms from IBM installed:

evlog-drv-tmpl-0.8-1
diagela-1.3.0.0-6
lsvpd-0.12.7-1
ppc64-utils-2.5-2
librtas-1.2-1
...
and the following IBM services are running:

# lssrc -a
Subsystem         Group            PID     Status
ctrmc             rsct             5011    active
IBM.ServiceRM     rsct_rm          5103    active
IBM.DRM           rsct_rm          5110    active
IBM.HostRM        rsct_rm          5186    active
IBM.CSMAgentRM    rsct_rm          5217    active
IBM.ERRM          rsct_rm          5221    active
IBM.AuditRM       rsct_rm          5266    active
ctcas             rsct             7705    active
IBM.SensorRM      rsct_rm          7721    active
IBM.FSRM          rsct_rm          7722    active
IBM.ConfigRM      rsct_rm          7723    active

The hmc seems to be able to talk to the managed system

~> lshwres -r virtualio --rsubtype eth -m Server-9124-720-SNXXXXXXX \
   --level lpar --filter "lpar_names=mgmt"
lpar_name=mgmt,lpar_id=1,slot_num=10,state=null,ieee_virtual_eth=1,port_vlan_id=1,"addl_vlan_ids=500,501,502,503",is_trunk=1,is_required=0,mac_addr=AE808000100A
...

and to the lpar:

~> lspartition -dlpar
<#0> Partition:<1, power0.somewhere.tld, 10.0.0.2>
       Active:<1>, OS:, DCaps:<0xf>, CmdCaps:<0x1, 0x1>, PinnedMem:<0>

So, where can I start digging?

Regards,
Ingvar

-- 
Many that live deserve death. And some that die deserve life. Can you
give it to them? Then do not be too eager to deal out death in
judgement. For even the very wise cannot see all ends.

Gandalf

From tdgarcia at us.ibm.com Wed Nov 2 09:51:24 2005
From: tdgarcia at us.ibm.com (tdgarcia)
Date: Tue, 01 Nov 2005 16:51:24 -0600
Subject: lockmeter port for ppc64
Message-ID: <4367F16C.5080003@us.ibm.com>

My team and I adapted your lockmeter code to the ppc64 architecture. I
am attaching the ppc64 specific code to this email. This code should
patch cleanly to the 2.6.13 kernel.

diff -Narup linux-2.6.13/arch/ppc64/Kconfig.debug linux-2.6.13-lockmeter/arch/ppc64/Kconfig.debug
--- linux-2.6.13/arch/ppc64/Kconfig.debug	2005-08-28 16:41:01.000000000 -0700
+++ linux-2.6.13-lockmeter/arch/ppc64/Kconfig.debug	2005-10-11 07:40:28.000000000 -0700
@@ -19,6 +19,13 @@ config KPROBES
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
+config LOCKMETER + bool "Kernel lock metering" + depends on SMP && !PREEMPT + help + Say Y to enable kernel lock metering, which adds overhead to SMP locks, + but allows you to see various statistics using the lockstat command. + config DEBUG_STACK_USAGE bool "Stack utilization instrumentation" depends on DEBUG_KERNEL diff -Narup linux-2.6.13/arch/ppc64/lib/dec_and_lock.c linux-2.6.13-lockmeter/arch/ppc64/lib/dec_and_lock.c --- linux-2.6.13/arch/ppc64/lib/dec_and_lock.c 2005-08-28 16:41:01.000000000 -0700 +++ linux-2.6.13-lockmeter/arch/ppc64/lib/dec_and_lock.c 2005-10-10 13:02:47.000000000 -0700 @@ -28,7 +28,13 @@ */ #ifndef ATOMIC_DEC_AND_LOCK + +#ifndef CONFIG_LOCKMETER +int atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock) +#else int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock) +#endif /* CONFIG_LOCKMETER */ + { int counter; int newcount; @@ -51,5 +57,10 @@ int _atomic_dec_and_lock(atomic_t *atomi return 0; } +#ifndef CONFIG_LOCKMETER +EXPORT_SYMBOL(atomic_dec_and_lock); +#else EXPORT_SYMBOL(_atomic_dec_and_lock); +#endif /* CONFIG_LOCKMETER */ + #endif /* ATOMIC_DEC_AND_LOCK */ diff -Narup linux-2.6.13/include/asm-ppc64/lockmeter.h linux-2.6.13-lockmeter/include/asm-ppc64/lockmeter.h --- linux-2.6.13/include/asm-ppc64/lockmeter.h 1969-12-31 16:00:00.000000000 -0800 +++ linux-2.6.13-lockmeter/include/asm-ppc64/lockmeter.h 2005-10-11 08:48:48.000000000 -0700 @@ -0,0 +1,110 @@ +/* + * Copyright (C) 1999,2000 Silicon Graphics, Inc. + * + * Written by John Hawkes (hawkes at sgi.com) + * Based on klstat.h by Jack Steiner (steiner at sgi.com) + * + * Modified by Ray Bryant (raybry at us.ibm.com) + * Changes Copyright (C) 2000 IBM, Inc. + * Added save of index in spinlock_t to improve efficiency + * of "hold" time reporting for spinlocks. + * Added support for hold time statistics for read and write + * locks. + * Moved machine dependent code here from include/lockmeter.h. 
+ * + * Modified by Tony Garcia (garcia1 at us.ibm.com) + * Ported to Power PC 64 + */ + +#ifndef _PPC64_LOCKMETER_H +#define _PPC64_LOCKMETER_H + + +#include +#include +#include + +#include /* definitions for SPRN_TBRL + SPRN_TBRU, mftb() */ +extern unsigned long ppc_proc_freq; + +#define CPU_CYCLE_FREQUENCY ppc_proc_freq + +#define THIS_CPU_NUMBER smp_processor_id() + +/* + * macros to cache and retrieve an index value inside of a spin lock + * these macros assume that there are less than 65536 simultaneous + * (read mode) holders of a rwlock. Not normally a problem!! + * we also assume that the hash table has less than 65535 entries. + */ +/* + * instrumented spinlock structure -- never used to allocate storage + * only used in macros below to overlay a spinlock_t + */ +typedef struct inst_spinlock_s { + volatile unsigned int lock; + unsigned int index; +} inst_spinlock_t; + +#define PUT_INDEX(lock_ptr,indexv) ((inst_spinlock_t *)(lock_ptr))->index = indexv +#define GET_INDEX(lock_ptr) ((inst_spinlock_t *)(lock_ptr))->index + +/* + * macros to cache and retrieve an index value in a read/write lock + * as well as the cpu where a reader busy period started + * we use the 2nd word (the debug word) for this, so require the + * debug word to be present + */ +/* + * instrumented rwlock structure -- never used to allocate storage + * only used in macros below to overlay a rwlock_t + */ +typedef struct inst_rwlock_s { + volatile signed int lock; + unsigned int index; + unsigned int cpu; +} inst_rwlock_t; + +#define PUT_RWINDEX(rwlock_ptr,indexv) ((inst_rwlock_t *)(rwlock_ptr))->index = indexv +#define GET_RWINDEX(rwlock_ptr) ((inst_rwlock_t *)(rwlock_ptr))->index +#define PUT_RW_CPU(rwlock_ptr,cpuv) ((inst_rwlock_t *)(rwlock_ptr))->cpu = cpuv +#define GET_RW_CPU(rwlock_ptr) ((inst_rwlock_t *)(rwlock_ptr))->cpu + +/* + * return the number of readers for a rwlock_t + */ +#define RWLOCK_READERS(rwlock_ptr) rwlock_readers(rwlock_ptr) + +/* Return number of readers */ 
+extern inline int rwlock_readers(rwlock_t *rwlock_ptr) +{ + signed int tmp = rwlock_ptr->lock; + + if ( tmp > 0 ) + return tmp; + else + return 0; +} + +/* + * return true if rwlock is write locked + * (note that other lock attempts can cause the lock value to be negative) + */ +#define RWLOCK_IS_WRITE_LOCKED(rwlock_ptr) ((signed int)(rwlock_ptr)->lock < 0) +#define RWLOCK_IS_READ_LOCKED(rwlock_ptr) ((signed int)(rwlock_ptr)->lock > 0 ) + +/*Written by Carl L. to get the time base counters on ppc, + rplaces the Intel only call rtds*/ +static inline long get_cycles64 (void) +{ + unsigned long tb; + + /* read the upper and lower 32 bit Time base counter */ + tb = mfspr(SPRN_TBRU); + tb = (tb << 32) | mfspr(SPRN_TBRL); + + return(tb); +} + +#endif /* _PPC64_LOCKMETER_H */ diff -Narup linux-2.6.13/include/asm-ppc64/spinlock.h linux-2.6.13-lockmeter/include/asm-ppc64/spinlock.h --- linux-2.6.13/include/asm-ppc64/spinlock.h 2005-08-28 16:41:01.000000000 -0700 +++ linux-2.6.13-lockmeter/include/asm-ppc64/spinlock.h 2005-10-10 14:04:25.000000000 -0700 @@ -23,6 +23,9 @@ typedef struct { volatile unsigned int lock; +#ifdef CONFIG_LOCKMETER + unsigned int lockmeter_magic; +#endif /* CONFIG_LOCKMETER */ #ifdef CONFIG_PREEMPT unsigned int break_lock; #endif @@ -30,13 +33,20 @@ typedef struct { typedef struct { volatile signed int lock; +#ifdef CONFIG_LOCKMETER + unsigned int index; + unsigned int cpu; +#endif /* CONFIG_LOCKMETER */ #ifdef CONFIG_PREEMPT unsigned int break_lock; #endif } rwlock_t; -#ifdef __KERNEL__ -#define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 } +#ifdef CONFIG_LOCKMETER + #define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 } +#else + #define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 , 0} +#endif /* CONFIG_LOCKMETER */ #define spin_is_locked(x) ((x)->lock != 0) #define spin_lock_init(x) do { *(x) = SPIN_LOCK_UNLOCKED; } while(0) @@ -144,7 +154,7 @@ static void __inline__ _raw_spin_lock_fl * irq-safe write-lock, but readers can get non-irqsafe * read-locks. 
*/ -#define RW_LOCK_UNLOCKED (rwlock_t) { 0 } +#define RW_LOCK_UNLOCKED (rwlock_t) { 0 , 0 , 0 } #define rwlock_init(x) do { *(x) = RW_LOCK_UNLOCKED; } while(0) @@ -157,6 +167,44 @@ static __inline__ void _raw_write_unlock rw->lock = 0; } +#if defined(CONFIG_LOCKMETER) && defined(CONFIG_HAVE_DEC_LOCK) +extern void _metered_spin_lock (spinlock_t *lock, void *caller_pc); +extern void _metered_spin_unlock(spinlock_t *lock); + +/* + * Matches what is in arch/ppc64/lib/dec_and_lock.c, except this one is + * "static inline" so that the spin_lock(), if actually invoked, is charged + * against the real caller, not against the catch-all atomic_dec_and_lock + */ +static inline int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock) +{ + int counter; + int newcount; + + for (;;) { + counter = atomic_read(atomic); + newcount = counter - 1; + if (!newcount) + break; /* do it the slow way */ + + newcount = cmpxchg(&atomic->counter, counter, newcount); + if (newcount == counter) + return 0; + } + + preempt_disable(); + _metered_spin_lock(lock, __builtin_return_address(0)); + if (atomic_dec_and_test(atomic)) + return 1; + _metered_spin_unlock(lock); + preempt_enable(); + + return 0; +} + +#define ATOMIC_DEC_AND_LOCK +#endif /* CONFIG_LOCKMETER and CONFIG_HAVE_DEC_LOCK */ + /* * This returns the old value in the lock + 1, * so we got a read lock if the return value is > 0. @@ -256,5 +304,4 @@ static void __inline__ _raw_write_lock(r } } -#endif /* __KERNEL__ */ #endif /* __ASM_SPINLOCK_H */ From hien at us.ibm.com Wed Nov 2 11:14:42 2005 From: hien at us.ibm.com (Hien Nguyen) Date: Tue, 01 Nov 2005 16:14:42 -0800 Subject: [PATCH] exporting validate_sp Message-ID: <1130890483.4032.20.camel@dyn9047022138.beaverton.ibm.com> This patch will export the validate_sp() function (part of dump_stack code). I am a developer on the SystemTap project: http://sourceware.org/systemtap/ The SystemTap runtime includes a function for capturing a stack trace as a string.
For the ppc64 port, we need to have the validate_sp() function exported so it is accessible to our stack-trace function, which is part of a SystemTap-generated kernel module. This patch should apply to kernel 2.6.14-rc5-mm1. Thanks, Hien. Signed-off-by: Hien Nguyen --- linux-2.6.14-rc5.org/arch/ppc64/kernel/process.c 2005-10-19 23:23:05.000000000 -0700 +++ linux-2.6.14-rc5/arch/ppc64/kernel/process.c 2005-11-01 12:54:23.000000000 -0800 @@ -626,6 +626,7 @@ return 0; } +EXPORT_SYMBOL_GPL(validate_sp); unsigned long get_wchan(struct task_struct *p) { From david at gibson.dropbear.id.au Wed Nov 2 11:06:44 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 2 Nov 2005 11:06:44 +1100 Subject: please pull the powerpc-merge.git tree In-Reply-To: <1130860444.21212.52.camel@hades.cambridge.redhat.com> References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com> <1130860444.21212.52.camel@hades.cambridge.redhat.com> Message-ID: <20051102000644.GB8308@localhost.localdomain> On Tue, Nov 01, 2005 at 03:54:04PM +0000, David Woodhouse wrote: > On Mon, 2005-10-31 at 15:23 +1100, Paul Mackerras wrote: > > It is now possible to build kernels for powermac, pSeries, iSeries and > > maple with ARCH=powerpc, and for powermac, both 32-bit and 64-bit > > build and run. > > Hm. Not entirely in line with my experience. Can you share the configs > you used? I gather paulus doesn't believe in CONFIG_TAU. > Using http://david/woodhou.se/powerpc-merge-32.config it doesn't > actually boot on my powerbook. I'll try it on the Pegasos later or > tomorrow, where I have a serial console; it dies very early. 
> > Aside from disabling CONFIG_NVRAM because call_rtas() isn't implemented > anywhere, I also needed to do this to make that config build: > > --- linux-2.6.14/arch/powerpc/kernel/setup-common.c.orig 2005-11-01 10:14:32.000000000 +0000 > +++ linux-2.6.14/arch/powerpc/kernel/setup-common.c 2005-11-01 10:15:03.000000000 +0000 > @@ -203,11 +203,11 @@ static int show_cpuinfo(struct seq_file > #ifdef CONFIG_TAU_AVERAGE > /* more straightforward, but potentially misleading */ > seq_printf(m, "temperature \t: %u C (uncalibrated)\n", > - cpu_temp(i)); > + cpu_temp(cpu_id)); > #else > /* show the actual temp sensor range */ > u32 temp; > - temp = cpu_temp_both(i); > + temp = cpu_temp_both(cpu_id); > seq_printf(m, "temperature \t: %u-%u C (uncalibrated)\n", > temp & 0xff, temp >> 16); > #endif > > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson From david at gibson.dropbear.id.au Wed Nov 2 11:44:26 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 2 Nov 2005 11:44:26 +1100 Subject: powerpc: Merge ipcbuf.h In-Reply-To: <200511012013.51443.arnd@arndb.de> References: <20051101055324.GA3551@localhost.localdomain> <200511012013.51443.arnd@arndb.de> Message-ID: <20051102004426.GC8308@localhost.localdomain> On Tue, Nov 01, 2005 at 08:13:50PM +0100, Arnd Bergmann wrote: > On Tuesday 01 November 2005 06:53, David Gibson wrote: > > +struct ipc64_perm > > +{ > > + __kernel_key_t key; > > + __kernel_uid_t uid; > > + __kernel_gid_t gid; > > + __kernel_uid_t cuid; > > + __kernel_gid_t cgid; > > + __kernel_mode_t mode; > > + unsigned int seq; > > + unsigned int __pad1; > > + u64 __unused1; > > + u64 __unused2; > > +}; > > ipc64_perm is a user visible structure, so you have to use > __u64 here instead of u64.
Even that does not exist if > you build with 32 bit and __STRICT_ANSI__, so it might > be better yet to use four __u32 for the unused fields. Oops. I realised it was user visible, but forgot the wrinkle that the 'uXX' names can't be used there. Here's a patch to correct it. Paulus, please apply. Oops, when merging ipcbuf.h, I forgot that 'u64' can't be used in user-visible headers. This patch corrects the problem, replacing the unused fields with an array of four __u32s. Signed-off-by: David Gibson Index: working-2.6/include/asm-powerpc/ipcbuf.h =================================================================== --- working-2.6.orig/include/asm-powerpc/ipcbuf.h 2005-11-02 10:41:06.000000000 +1100 +++ working-2.6/include/asm-powerpc/ipcbuf.h 2005-11-02 11:41:36.000000000 +1100 @@ -27,8 +27,7 @@ __kernel_mode_t mode; unsigned int seq; unsigned int __pad1; - u64 __unused1; - u64 __unused2; + __u32 __unused[4]; }; #endif /* _ASM_POWERPC_IPCBUF_H */ -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson From paulus at samba.org Wed Nov 2 12:03:26 2005 From: paulus at samba.org (Paul Mackerras) Date: Wed, 2 Nov 2005 12:03:26 +1100 Subject: please pull the powerpc-merge.git tree In-Reply-To: <1130860444.21212.52.camel@hades.cambridge.redhat.com> References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com> <1130860444.21212.52.camel@hades.cambridge.redhat.com> Message-ID: <17256.4190.184855.821331@cargo.ozlabs.ibm.com> David Woodhouse writes: > Hm. Not entirely in line with my experience. Can you share the configs > you used? Sure, attached (as a .tar.gz). For 32-bit pmac, you currently have to disable CONFIG_PREP and (I believe) the TAU options. For the 64-bit configs I basically just used the defconfigs in arch/ppc64/configs. > Using http://david/woodhou.se/powerpc-merge-32.config it doesn't > actually boot on my powerbook.
I'll try it on the Pegasos later or > tomorrow, where I have a serial console; it dies very early. That's probably either the pci quirk that got added to do USB host controller handoff unconditionally on all platforms, and which touches the device without doing pci_enable_device or checking whether MMIO is enabled. A fix has gone into Linus' tree for that. There was also a bug added to the adbhid.c driver which would cause an oops when you pressed a key if you had an ADB keyboard (which powerbooks do). That's also fixed in Linus' tree. > Aside from disabling CONFIG_NVRAM because call_rtas() isn't implemented > anywhere, I also needed to do this to make that config build: > > --- linux-2.6.14/arch/powerpc/kernel/setup-common.c.orig 2005-11-01 10:14:32.000000000 +0000 > +++ linux-2.6.14/arch/powerpc/kernel/setup-common.c 2005-11-01 10:15:03.000000000 +0000 > @@ -203,11 +203,11 @@ static int show_cpuinfo(struct seq_file > #ifdef CONFIG_TAU_AVERAGE > /* more straightforward, but potentially misleading */ > seq_printf(m, "temperature \t: %u C (uncalibrated)\n", > - cpu_temp(i)); > + cpu_temp(cpu_id)); > #else > /* show the actual temp sensor range */ > u32 temp; > - temp = cpu_temp_both(i); > + temp = cpu_temp_both(cpu_id); > seq_printf(m, "temperature \t: %u-%u C (uncalibrated)\n", > temp & 0xff, temp >> 16); > #endif Thanks, I'll put that in. Paul. From paulus at samba.org Wed Nov 2 12:04:07 2005 From: paulus at samba.org (Paul Mackerras) Date: Wed, 2 Nov 2005 12:04:07 +1100 Subject: please pull the powerpc-merge.git tree In-Reply-To: <1130860444.21212.52.camel@hades.cambridge.redhat.com> References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com> <1130860444.21212.52.camel@hades.cambridge.redhat.com> Message-ID: <17256.4231.124394.723713@cargo.ozlabs.ibm.com> David Woodhouse writes: > Hm. Not entirely in line with my experience. Can you share the configs > you used? Forgot to attach the configs on my previous reply. Paul. 
-------------- next part -------------- A non-text attachment was scrubbed... Name: configs.tar.gz Type: application/octet-stream Size: 19644 bytes Desc: config tarball Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051102/55f01850/attachment.obj From david at gibson.dropbear.id.au Wed Nov 2 13:58:22 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 2 Nov 2005 13:58:22 +1100 Subject: powerpc: Merge futex.h Message-ID: <20051102025822.GB10682@localhost.localdomain> This patch merges the ppc32 and ppc64 versions of futex.h, essentially by taking the ppc64 version as the powerpc version. The old ppc32 version did not implement the futex_atomic_op_inuser() callback (it always returned -ENOSYS), so FUTEX_WAKE_OP would not work on ppc32. In fact the ppc64 version of this function is almost suitable for ppc32 as well - the only change needed is to extend ppc_asm.h with a macro expanding to to the right pseudo-op to store a pointer (either ".long" or ".llong"). Built and booted on pSeries. Built for 32-bit powermac. Signed-off-by: David Gibson Index: working-2.6/include/asm-powerpc/futex.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-powerpc/futex.h 2005-11-02 13:43:08.000000000 +1100 @@ -0,0 +1,84 @@ +#ifndef _ASM_POWERPC_FUTEX_H +#define _ASM_POWERPC_FUTEX_H + +#ifdef __KERNEL__ + +#include +#include +#include +#include +#include + +#define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \ + __asm__ __volatile ( \ + SYNC_ON_SMP \ +"1: lwarx %0,0,%2\n" \ + insn \ +"2: stwcx. 
%1,0,%2\n" \ + "bne- 1b\n" \ + "li %1,0\n" \ +"3: .section .fixup,\"ax\"\n" \ +"4: li %1,%3\n" \ + "b 3b\n" \ + ".previous\n" \ + ".section __ex_table,\"a\"\n" \ + ".align 3\n" \ + DATAL " 1b,4b,2b,4b\n" \ + ".previous" \ + : "=&r" (oldval), "=&r" (ret) \ + : "b" (uaddr), "i" (-EFAULT), "1" (oparg) \ + : "cr0", "memory") + +static inline int futex_atomic_op_inuser (int encoded_op, int __user *uaddr) +{ + int op = (encoded_op >> 28) & 7; + int cmp = (encoded_op >> 24) & 15; + int oparg = (encoded_op << 8) >> 20; + int cmparg = (encoded_op << 20) >> 20; + int oldval = 0, ret; + if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) + oparg = 1 << oparg; + + if (! access_ok (VERIFY_WRITE, uaddr, sizeof(int))) + return -EFAULT; + + inc_preempt_count(); + + switch (op) { + case FUTEX_OP_SET: + __futex_atomic_op("", ret, oldval, uaddr, oparg); + break; + case FUTEX_OP_ADD: + __futex_atomic_op("add %1,%0,%1\n", ret, oldval, uaddr, oparg); + break; + case FUTEX_OP_OR: + __futex_atomic_op("or %1,%0,%1\n", ret, oldval, uaddr, oparg); + break; + case FUTEX_OP_ANDN: + __futex_atomic_op("andc %1,%0,%1\n", ret, oldval, uaddr, oparg); + break; + case FUTEX_OP_XOR: + __futex_atomic_op("xor %1,%0,%1\n", ret, oldval, uaddr, oparg); + break; + default: + ret = -ENOSYS; + } + + dec_preempt_count(); + + if (!ret) { + switch (cmp) { + case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; + case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; + case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; + case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; + case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; + case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; + default: ret = -ENOSYS; + } + } + return ret; +} + +#endif /* __KERNEL__ */ +#endif /* _ASM_POWERPC_FUTEX_H */ Index: working-2.6/include/asm-ppc/futex.h =================================================================== --- working-2.6.orig/include/asm-ppc/futex.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 
00:00:00.000000000 +0000 @@ -1,53 +0,0 @@ -#ifndef _ASM_FUTEX_H -#define _ASM_FUTEX_H - -#ifdef __KERNEL__ - -#include -#include -#include - -static inline int -futex_atomic_op_inuser (int encoded_op, int __user *uaddr) -{ - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; - int oldval = 0, ret; - if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (! access_ok (VERIFY_WRITE, uaddr, sizeof(int))) - return -EFAULT; - - inc_preempt_count(); - - switch (op) { - case FUTEX_OP_SET: - case FUTEX_OP_ADD: - case FUTEX_OP_OR: - case FUTEX_OP_ANDN: - case FUTEX_OP_XOR: - default: - ret = -ENOSYS; - } - - dec_preempt_count(); - - if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; - case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; - case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; - case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; - case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; - case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; - default: ret = -ENOSYS; - } - } - return ret; -} - -#endif -#endif Index: working-2.6/include/asm-ppc64/futex.h =================================================================== --- working-2.6.orig/include/asm-ppc64/futex.h 2005-10-31 15:20:22.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,83 +0,0 @@ -#ifndef _ASM_FUTEX_H -#define _ASM_FUTEX_H - -#ifdef __KERNEL__ - -#include -#include -#include -#include - -#define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \ - __asm__ __volatile (SYNC_ON_SMP \ -"1: lwarx %0,0,%2\n" \ - insn \ -"2: stwcx. 
%1,0,%2\n\ - bne- 1b\n\ - li %1,0\n\ -3: .section .fixup,\"ax\"\n\ -4: li %1,%3\n\ - b 3b\n\ - .previous\n\ - .section __ex_table,\"a\"\n\ - .align 3\n\ - .llong 1b,4b,2b,4b\n\ - .previous" \ - : "=&r" (oldval), "=&r" (ret) \ - : "b" (uaddr), "i" (-EFAULT), "1" (oparg) \ - : "cr0", "memory") - -static inline int -futex_atomic_op_inuser (int encoded_op, int __user *uaddr) -{ - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; - int oldval = 0, ret; - if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (! access_ok (VERIFY_WRITE, uaddr, sizeof(int))) - return -EFAULT; - - inc_preempt_count(); - - switch (op) { - case FUTEX_OP_SET: - __futex_atomic_op("", ret, oldval, uaddr, oparg); - break; - case FUTEX_OP_ADD: - __futex_atomic_op("add %1,%0,%1\n", ret, oldval, uaddr, oparg); - break; - case FUTEX_OP_OR: - __futex_atomic_op("or %1,%0,%1\n", ret, oldval, uaddr, oparg); - break; - case FUTEX_OP_ANDN: - __futex_atomic_op("andc %1,%0,%1\n", ret, oldval, uaddr, oparg); - break; - case FUTEX_OP_XOR: - __futex_atomic_op("xor %1,%0,%1\n", ret, oldval, uaddr, oparg); - break; - default: - ret = -ENOSYS; - } - - dec_preempt_count(); - - if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; - case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; - case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; - case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; - case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; - case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; - default: ret = -ENOSYS; - } - } - return ret; -} - -#endif -#endif Index: working-2.6/include/asm-powerpc/ppc_asm.h =================================================================== --- working-2.6.orig/include/asm-powerpc/ppc_asm.h 2005-10-31 15:20:57.000000000 +1100 +++ working-2.6/include/asm-powerpc/ppc_asm.h 2005-11-02 13:48:08.000000000 +1100 @@ 
-506,6 +506,13 @@ #else #define __ASM_CONST(x) x##UL #define ASM_CONST(x) __ASM_CONST(x) + +#ifdef CONFIG_PPC64 +#define DATAL ".llong" +#else +#define DATAL ".long" +#endif + #endif /* __ASSEMBLY__ */ #endif /* _ASM_POWERPC_PPC_ASM_H */ -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson From david at gibson.dropbear.id.au Wed Nov 2 15:13:20 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 2 Nov 2005 15:13:20 +1100 Subject: powerpc: Move dart.h Message-ID: <20051102041320.GA15666@localhost.localdomain> asm-ppc64/dart.h is included in exactly one place - arch/powerpc/sysdev/u3_iommu.c. This patch, therefore, moves it into arch/powerpc/sysdev. While we're at it, update the #ifndef/#define protecting the include, and the filename in the comments of u3_iommu.c. Built and booted on pSeries and G5, built for ppc32 powermac. Signed-off-by: David Gibson Index: working-2.6/arch/powerpc/sysdev/dart.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/arch/powerpc/sysdev/dart.h 2005-11-02 14:53:28.000000000 +1100 @@ -0,0 +1,59 @@ +/* + * Copyright (C) 2004 Olof Johansson , IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#ifndef _POWERPC_SYSDEV_DART_H +#define _POWERPC_SYSDEV_DART_H + + +/* physical base of DART registers */ +#define DART_BASE 0xf8033000UL + +/* Offset from base to control register */ +#define DARTCNTL 0 +/* Offset from base to exception register */ +#define DARTEXCP 0x10 +/* Offset from base to TLB tag registers */ +#define DARTTAG 0x1000 + + +/* Control Register fields */ + +/* base address of table (pfn) */ +#define DARTCNTL_BASE_MASK 0xfffff +#define DARTCNTL_BASE_SHIFT 12 + +#define DARTCNTL_FLUSHTLB 0x400 +#define DARTCNTL_ENABLE 0x200 + +/* size of table in pages */ +#define DARTCNTL_SIZE_MASK 0x1ff +#define DARTCNTL_SIZE_SHIFT 0 + + +/* DART table fields */ + +#define DARTMAP_VALID 0x80000000 +#define DARTMAP_RPNMASK 0x00ffffff + + +#define DART_PAGE_SHIFT 12 +#define DART_PAGE_SIZE (1 << DART_PAGE_SHIFT) +#define DART_PAGE_FACTOR (PAGE_SHIFT - DART_PAGE_SHIFT) + + +#endif /* _POWERPC_SYSDEV_DART_H */ Index: working-2.6/include/asm-ppc64/dart.h =================================================================== --- working-2.6.orig/include/asm-ppc64/dart.h 2005-10-31 15:20:22.000000000 +1100 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,59 +0,0 @@ -/* - * Copyright (C) 2004 Olof Johansson , IBM Corporation - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. 
- * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ - -#ifndef _ASM_DART_H -#define _ASM_DART_H - - -/* physical base of DART registers */ -#define DART_BASE 0xf8033000UL - -/* Offset from base to control register */ -#define DARTCNTL 0 -/* Offset from base to exception register */ -#define DARTEXCP 0x10 -/* Offset from base to TLB tag registers */ -#define DARTTAG 0x1000 - - -/* Control Register fields */ - -/* base address of table (pfn) */ -#define DARTCNTL_BASE_MASK 0xfffff -#define DARTCNTL_BASE_SHIFT 12 - -#define DARTCNTL_FLUSHTLB 0x400 -#define DARTCNTL_ENABLE 0x200 - -/* size of table in pages */ -#define DARTCNTL_SIZE_MASK 0x1ff -#define DARTCNTL_SIZE_SHIFT 0 - - -/* DART table fields */ - -#define DARTMAP_VALID 0x80000000 -#define DARTMAP_RPNMASK 0x00ffffff - - -#define DART_PAGE_SHIFT 12 -#define DART_PAGE_SIZE (1 << DART_PAGE_SHIFT) -#define DART_PAGE_FACTOR (PAGE_SHIFT - DART_PAGE_SHIFT) - - -#endif Index: working-2.6/arch/powerpc/sysdev/u3_iommu.c =================================================================== --- working-2.6.orig/arch/powerpc/sysdev/u3_iommu.c 2005-10-31 15:20:20.000000000 +1100 +++ working-2.6/arch/powerpc/sysdev/u3_iommu.c 2005-11-02 14:54:16.000000000 +1100 @@ -1,5 +1,5 @@ /* - * arch/ppc64/kernel/u3_iommu.c + * arch/powerpc/sysdev/u3_iommu.c * * Copyright (C) 2004 Olof Johansson , IBM Corporation * @@ -44,9 +44,10 @@ #include #include #include -#include #include +#include "dart.h" + extern int iommu_force_on; /* Physical base address and size of the DART table */ -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! 
http://www.ozlabs.org/people/dgibson From paulus at samba.org Wed Nov 2 15:43:44 2005 From: paulus at samba.org (Paul Mackerras) Date: Wed, 2 Nov 2005 15:43:44 +1100 Subject: please pull the powerpc-merge.git tree In-Reply-To: <1130861207.21212.66.camel@hades.cambridge.redhat.com> References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com> <1130861207.21212.66.camel@hades.cambridge.redhat.com> Message-ID: <17256.17408.247708.622755@cargo.ozlabs.ibm.com> David Woodhouse writes: > The ppc64 build (http://david.woodhou.se/powerpc-merge-64.config) fares > worse than ppc32 for me -- it doesn't even build. Those errors were all due to getting powerbook sleep code included because you have CONFIG_PM=y. I have changed things so that that code doesn't get included on a 64-bit build (at least until BenH gets sleep going on the G5 :). I also pulled in Linus' tree, and now drivers/char/tlclk.c fails to build for some reason. I claim that's not my fault, however. :) Paul. From michael at ellerman.id.au Wed Nov 2 18:23:33 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Wed, 2 Nov 2005 18:23:33 +1100 (EST) Subject: [PATCH] powerpc: Fix random memory corruption in merged elf.h Message-ID: <20051102072333.B91D568664@ozlabs.org> The merged version of ELF_CORE_COPY_REGS is basically the PPC64 version, with a memset that came from PPC and a few types abstracted out into #defines. But it's not _quite_ right. The first problem is we calculate the number of registers with: nregs = sizeof(struct pt_regs) / sizeof(ELF_GREG_TYPE) For a 32-bit process on a 64-bit kernel that's bogus because the registers are 64 bits, but ELF_GREG_TYPE is u32, so nregs == 88 which is wrong. The other problem is the memset, which assumes a struct pt_regs is smaller than a struct elf_regs. For a 32-bit process on a 64-bit kernel that's false.
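The register-count arithmetic behind the first problem can be sketched in user space. This is only a model, not the kernel code: it assumes a pt_regs made of 44 64-bit slots (the count implied by the figure 88 with a 32-bit ELF_GREG_TYPE), and the names model_pt_regs, model_elf_greg32_t, nregs_by_greg_type and nregs_by_slot_size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 44 register slots, each 64 bits wide, standing in for
 * struct pt_regs on a 64-bit kernel. 44 is the count implied by
 * "nregs == 88" when the divisor is a 4-byte type. */
#define MODEL_NSLOTS 44

struct model_pt_regs {
	uint64_t slot[MODEL_NSLOTS];
};

/* Hypothetical stand-in for ELF_GREG_TYPE as seen by a 32-bit process. */
typedef uint32_t model_elf_greg32_t;

/* Divides by the size of the ELF register type: counts every 64-bit
 * slot twice when that type is only 32 bits wide. */
size_t nregs_by_greg_type(void)
{
	return sizeof(struct model_pt_regs) / sizeof(model_elf_greg32_t);
}

/* Divides by the size of one stored slot: gives the number of slots
 * that are actually present in the structure. */
size_t nregs_by_slot_size(void)
{
	return sizeof(struct model_pt_regs) / sizeof(uint64_t);
}
```

Under these assumptions the first helper reports twice as many registers as the structure holds, so a copy loop bounded by it would index past the last real slot unless another limit stops it first.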
The fix is to calculate the number of regs using sizeof(unsigned long), which should always be right, and just memset the whole damn thing _before_ copying the registers in. Signed-off-by: Michael Ellerman --- include/asm-powerpc/elf.h | 22 +++++++++++++--------- 1 files changed, 13 insertions(+), 9 deletions(-) Index: kexec/include/asm-powerpc/elf.h =================================================================== --- kexec.orig/include/asm-powerpc/elf.h +++ kexec/include/asm-powerpc/elf.h @@ -178,18 +178,22 @@ typedef elf_vrreg_t elf_vrregset_t32[ELF static inline void ppc_elf_core_copy_regs(elf_gregset_t elf_regs, struct pt_regs *regs) { - int i; - int gprs = sizeof(struct pt_regs)/sizeof(ELF_GREG_TYPE); + int i, nregs; - if (gprs > ELF_NGREG) - gprs = ELF_NGREG; + memset((void *)elf_regs, 0, sizeof(elf_gregset_t)); - for (i=0; i < gprs; i++) + /* Our registers are always unsigned longs, whether we're a 32 bit + * process or 64 bit, on either a 64 bit or 32 bit kernel. + * Don't use ELF_GREG_TYPE here. */ + nregs = sizeof(struct pt_regs) / sizeof(unsigned long); + if (nregs > ELF_NGREG) + nregs = ELF_NGREG; + + for (i = 0; i < nregs; i++) { + /* This will correctly truncate 64 bit registers to 32 bits + * for a 32 bit process on a 64 bit kernel. 
*/ elf_regs[i] = (elf_greg_t)((ELF_GREG_TYPE *)regs)[i]; - - memset((char *)(elf_regs) + sizeof(struct pt_regs), 0, \ - sizeof(elf_gregset_t) - sizeof(struct pt_regs)); - + } } #define ELF_CORE_COPY_REGS(gregs, regs) ppc_elf_core_copy_regs(gregs, regs); From benh at kernel.crashing.org Wed Nov 2 18:23:18 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 02 Nov 2005 18:23:18 +1100 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <1130915220.20136.14.camel@gaston> References: <1130915220.20136.14.camel@gaston> Message-ID: <1130916198.20136.17.camel@gaston> On Wed, 2005-11-02 at 18:07 +1100, Benjamin Herrenschmidt wrote: > It took a while, but finally, here is the 64K pages support patch for > ppc64. This patch adds a new CONFIG_PPC_64K_PAGES which, when enabled, > changes the kernel base page size to 64K. The resulting kernel still > boots on any hardware. On current machines with 4K pages support only, > the kernel will maintain 16 "subpages" for each 64K page > transparently. > > Note that while real 64K capable HW has been tested, the current patch > will not enable it yet as such hardware is not released yet, and I'm > still verifying with the firmware architects the proper way to get the > information from the newer hypervisors. > > Signed-off-by: Benjamin Herrenschmidt Oh, and since the mailing lists are probably filtering this out due to the patch size, here's a URL where you can find it too: http://gate.crashing.org/~benh/ppc64-64k-pages.diff Ben.
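As a rough illustration of the "16 subpages for each 64K page" figure in the announcement above, the arithmetic can be sketched as follows. This is not code from the 64K pages patch; the macro and function names (PAGE_SHIFT_64K, HW_PAGE_SHIFT_4K, subpages_per_page, subpage_index, base_page_number) are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names: a 64K kernel base page backed by 4K hardware pages. */
#define PAGE_SHIFT_64K   16	/* 1 << 16 = 64K base page */
#define HW_PAGE_SHIFT_4K 12	/* 1 << 12 = 4K hardware page */

/* Number of 4K hardware pages that make up one 64K base page. */
unsigned int subpages_per_page(void)
{
	return 1u << (PAGE_SHIFT_64K - HW_PAGE_SHIFT_4K);
}

/* Which 4K subpage inside its 64K base page an address falls into. */
unsigned int subpage_index(uint64_t addr)
{
	return (unsigned int)((addr >> HW_PAGE_SHIFT_4K) &
			      (subpages_per_page() - 1));
}

/* Number of the 64K base page that contains the address. */
uint64_t base_page_number(uint64_t addr)
{
	return addr >> PAGE_SHIFT_64K;
}
```

For example, address 0x12345 would fall in base page 1, subpage 2 under this model. How the real patch tracks per-subpage state is not shown here.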
From dwmw2 at infradead.org Wed Nov 2 18:35:12 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Wed, 02 Nov 2005 07:35:12 +0000 Subject: please pull the powerpc-merge.git tree In-Reply-To: <17256.17408.247708.622755@cargo.ozlabs.ibm.com> References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com> <1130861207.21212.66.camel@hades.cambridge.redhat.com> <17256.17408.247708.622755@cargo.ozlabs.ibm.com> Message-ID: <1130916912.10031.143.camel@baythorne.infradead.org> On Wed, 2005-11-02 at 15:43 +1100, Paul Mackerras wrote: > I also pulled in Linus' tree, and now drivers/char/tlclk.c fails to > build for some reason. I claim that's not my fault, however. :) It just needs included. That reminds me -- I needed that in platforms/chrp/pegasos_eth.c too. diff --git a/arch/powerpc/platforms/chrp/pegasos_eth.c b/arch/powerpc/platforms/chrp/pegasos_eth.c --- a/arch/powerpc/platforms/chrp/pegasos_eth.c +++ b/arch/powerpc/platforms/chrp/pegasos_eth.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #define PEGASOS2_MARVELL_REGBASE (0xf1000000) diff --git a/drivers/char/tlclk.c b/drivers/char/tlclk.c --- a/drivers/char/tlclk.c +++ b/drivers/char/tlclk.c @@ -43,6 +43,7 @@ #include #include #include +#include #include /* inb/outb */ #include -- dwmw2 From dwmw2 at infradead.org Wed Nov 2 20:06:19 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Wed, 02 Nov 2005 09:06:19 +0000 Subject: please pull the powerpc-merge.git tree In-Reply-To: <17256.4190.184855.821331@cargo.ozlabs.ibm.com> References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com> <1130860444.21212.52.camel@hades.cambridge.redhat.com> <17256.4190.184855.821331@cargo.ozlabs.ibm.com> Message-ID: <1130922379.10031.154.camel@baythorne.infradead.org> On Wed, 2005-11-02 at 12:03 +1100, Paul Mackerras wrote: > That's probably either the pci quirk that got added to do USB host > controller handoff unconditionally on all platforms, and which touches > the device without doing pci_enable_device or 
checking whether MMIO is > enabled. A fix has gone into Linus' tree for that. > > There was also a bug added to the adbhid.c driver which would cause an > oops when you pressed a key if you had an ADB keyboard (which > powerbooks do). That's also fixed in Linus' tree. It was neither of those -- after a few warnings about sleeping in inappropriate contexts it just seems to stop. The Pegasos is a little more informative -- lots of 'hda: lost interrupt' on that. Keyboard seems to work though, and Bogomips calculation -- so maybe it's just PCI interrupts which are missing. I'll poke at it further. I'll also try again on the powerbook today and see if I can get anything more useful out of it. -- dwmw2 From vst at vlnb.net Wed Nov 2 17:51:41 2005 From: vst at vlnb.net (Vladislav Bolkhovitin) Date: Wed, 02 Nov 2005 09:51:41 +0300 Subject: [PATCH 0/3] ibmvscsis scsi target In-Reply-To: <435EE61C.8020404@torque.net> References: <20051017143644.GA9992@cs.umn.edu> <435EE61C.8020404@torque.net> Message-ID: <436861FD.6080300@vlnb.net> Douglas Gilbert wrote: > Dave Boutcher wrote: > >>James, >> >>Here's the ibmvscsis SCSI target submitted for inclusion in 2.4.15. >>This driver meets a couple of akpm's criteria for worthiness, in that >>its actually been shipping for a while in a distro kernel, and (given >>the posts when I broke compatibility) is being used. >> >>This version is basically the same as the recent RFC version I sent >>out, with a few bug fixes. It addresses a comment from Anton about >>using gratuitously small max_sectors limits, and has a few other >>miscellanious fixes. >> >>The only other significant comment generated by the the RFC was from >>Christoph, and requested that this work be combined with the sgtg work >>that Mike Christie and Tomonori Fujita are working on. I definitely >>will start contributing to that work, and will convert this driver to >>their framework when it becomes complete. 
I would rather not keep >>this driver out of mainline for the amount of time that may take. > > > Dave, > While I'm partial to things that start with "sg...", I > had problems finding that project until I tried "stgt". Doug, Dave, Have you seen SCST (SCSI target mid-layer for Linux) on http://scst.sourceforge.net? It's much more advanced than stgt, and much further along. Vlad From schwab at suse.de Thu Nov 3 00:12:01 2005 From: schwab at suse.de (Andreas Schwab) Date: Wed, 02 Nov 2005 14:12:01 +0100 Subject: powerpc: Merge ipcbuf.h In-Reply-To: <20051102004426.GC8308@localhost.localdomain> (David Gibson's message of "Wed, 2 Nov 2005 11:44:26 +1100") References: <20051101055324.GA3551@localhost.localdomain> <200511012013.51443.arnd@arndb.de> <20051102004426.GC8308@localhost.localdomain> Message-ID: David Gibson writes: > Oops, when merging ipcbuf.h, I forgot that 'u64' can't be used in > user-visible headers. This patch corrects the problem, replacing the > unused fields with an array of four __u32s. > > Signed-off-by: David Gibson > > Index: working-2.6/include/asm-powerpc/ipcbuf.h > =================================================================== > --- working-2.6.orig/include/asm-powerpc/ipcbuf.h 2005-11-02 10:41:06.000000000 +1100 > +++ working-2.6/include/asm-powerpc/ipcbuf.h 2005-11-02 11:41:36.000000000 +1100 > @@ -27,8 +27,7 @@ > __kernel_mode_t mode; > unsigned int seq; > unsigned int __pad1; > - u64 __unused1; > - u64 __unused2; > + __u32 __unused[4]; I think you are changing the alignment of the structure. A u64 has bigger alignment than a u32[2]. Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
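A minimal C sketch of the alignment point raised in the reply above. The two structs (perm_u64, perm_u32x4) and the helper functions are hypothetical stand-ins, not the kernel's definitions; they keep only the three members visible in the quoted hunk and differ in the trailing "unused" fields. The sketch assumes a C11 compiler for _Alignof and _Static_assert. Exact numbers depend on the ABI, so the code only checks relations that should hold on any common target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant with two 64-bit trailing members,
 * as in the layout before the patch quoted above. */
struct perm_u64 {
	unsigned int mode;
	unsigned int seq;
	unsigned int pad1;
	uint64_t unused1;
	uint64_t unused2;
};

/* Hypothetical variant with four 32-bit trailing members,
 * as in the layout the quoted patch proposed. */
struct perm_u32x4 {
	unsigned int mode;
	unsigned int seq;
	unsigned int pad1;
	uint32_t unused[4];
};

/* A struct takes the alignment of its most strictly aligned member,
 * so the 32-bit variant can never be more strictly aligned. */
_Static_assert(_Alignof(struct perm_u32x4) <= _Alignof(struct perm_u64),
	       "u32[4] tail must not raise the alignment");

/* Alignment the compiler requires for each variant. */
size_t perm_u64_align(void)
{
	return _Alignof(struct perm_u64);
}

size_t perm_u32x4_align(void)
{
	return _Alignof(struct perm_u32x4);
}

/* Total size of each variant, including any padding. */
size_t perm_u64_size(void)
{
	return sizeof(struct perm_u64);
}

size_t perm_u32x4_size(void)
{
	return sizeof(struct perm_u32x4);
}
```

Calling these helpers from a small main() and printing the results shows what a given ABI does. Where 64-bit integers are 8-byte aligned, the first variant reports a stricter alignment than the second, which is the difference user space embedding the structure could observe.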
From johnrose at austin.ibm.com Thu Nov 3 03:29:55 2005 From: johnrose at austin.ibm.com (John Rose) Date: Wed, 02 Nov 2005 10:29:55 -0600 Subject: [PATCH] fix add notifier crashes Message-ID: <1130948995.32348.35.camel@sinatra.austin.ibm.com> Hi Paul- The extraction of PCI stuff from struct device_node left some false assumptions in notifier code. As a result, dynamic add crashes when non-PCI nodes are added. This patch fixes these assumptions. Thanks- John Signed-off-by: John Rose diff -puN arch/ppc64/kernel/pci_dn.c~add_crash_fix arch/ppc64/kernel/pci_dn.c --- 2_6_linus_2/arch/ppc64/kernel/pci_dn.c~add_crash_fix 2005-10-31 10:51:19.000000000 -0600 +++ 2_6_linus_2-johnrose/arch/ppc64/kernel/pci_dn.c 2005-10-31 10:56:47.000000000 -0600 @@ -181,13 +181,14 @@ EXPORT_SYMBOL(fetch_dev_dn); static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *node) { struct device_node *np = node; - struct pci_dn *pci; + struct pci_dn *pci = NULL; int err = NOTIFY_OK; switch (action) { case PSERIES_RECONFIG_ADD: pci = np->parent->data; - update_dn_pci_info(np, pci->phb); + if (pci) + update_dn_pci_info(np, pci->phb); break; default: err = NOTIFY_DONE; diff -puN arch/powerpc/platforms/pseries/iommu.c~add_crash_fix arch/powerpc/platforms/pseries/iommu.c --- 2_6_linus_2/arch/powerpc/platforms/pseries/iommu.c~add_crash_fix 2005-10-31 15:19:14.000000000 -0600 +++ 2_6_linus_2-johnrose/arch/powerpc/platforms/pseries/iommu.c 2005-10-31 15:20:44.000000000 -0600 @@ -498,7 +498,7 @@ static int iommu_reconfig_notifier(struc switch (action) { case PSERIES_RECONFIG_REMOVE: - if (pci->iommu_table && + if (pci && pci->iommu_table && get_property(np, "ibm,dma-window", NULL)) iommu_free_table(np); break; _ From johnrose at austin.ibm.com Thu Nov 3 03:40:06 2005 From: johnrose at austin.ibm.com (John Rose) Date: Wed, 02 Nov 2005 10:40:06 -0600 Subject: dlpar problem on sles9/openpower In-Reply-To: References: Message-ID: 
<1130949606.32348.45.camel@sinatra.austin.ibm.com> Hi Ingvar- > On the lpar "mgmt", running sles9 sp2 kernel 2.6.5-7.193-pseries64, we > have the magic rpms from IBM installed: > > evlog-drv-tmpl-0.8-1 > diagela-1.3.0.0-6 > lsvpd-0.12.7-1 > ppc64-utils-2.5-2 > librtas-1.2-1 I assume that you have the rpa-dlpar package as well, since you said DLPAR worked at an earlier point. :) I would check two things. First, check the dmesg for any blurbs around the time of the failure. Second, use "rpttr /var/ct/IW/log/mc/IBM.DRM/trace". The output from this includes much gibberish, but the translated hex strings can include the inputs to the "drmgr" command and any error messages. Check near the bottom of the file. Unfortunately, the HMC annoyingly hides error messages for DLPAR of virtual adapters, so we have to dig here. Good luck- John From dwmw2 at infradead.org Thu Nov 3 03:54:46 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Wed, 02 Nov 2005 16:54:46 +0000 Subject: please pull the powerpc-merge.git tree In-Reply-To: <17256.17408.247708.622755@cargo.ozlabs.ibm.com> References: <17253.39993.502458.390760@cargo.ozlabs.ibm.com> <1130861207.21212.66.camel@hades.cambridge.redhat.com> <17256.17408.247708.622755@cargo.ozlabs.ibm.com> Message-ID: <1130950487.21212.89.camel@hades.cambridge.redhat.com> On Wed, 2005-11-02 at 15:43 +1100, Paul Mackerras wrote: > Those errors were all due to getting powerbook sleep code included > because you have CONFIG_PM=y. I have changed things so that that code > doesn't get included on a 64-bit build (at least until BenH gets sleep > going on the G5 :). OK, now the Fedora rawhide kernel builds for ppc64 with arch/powerpc and runs on both my POWER5 and G5 test boxes. I need this if I want nvram support on the G5 though. Should we be using CONFIG_GENERIC_NVRAM on ppc64, and actually allowing the nvram support to be optional? 
--- a/arch/powerpc/platforms/powermac/setup.c +++ b/arch/powerpc/platforms/powermac/setup.c @@ -351,7 +350,7 @@ void __init pmac_setup_arch(void) find_via_pmu(); smu_init(); -#ifdef CONFIG_NVRAM +#if defined(CONFIG_NVRAM) || defined(CONFIG_PPC64) pmac_nvram_init(); #endif -- dwmw2 From hch at lst.de Thu Nov 3 04:40:37 2005 From: hch at lst.de (Christoph Hellwig) Date: Wed, 2 Nov 2005 18:40:37 +0100 Subject: [PATCH] exporting validate_sp In-Reply-To: <1130890483.4032.20.camel@dyn9047022138.beaverton.ibm.com> References: <1130890483.4032.20.camel@dyn9047022138.beaverton.ibm.com> Message-ID: <20051102174037.GA23650@lst.de> On Tue, Nov 01, 2005 at 04:14:42PM -0800, Hien Nguyen wrote: > This patch will export the validate_sp() function (part of dump_stack > code). > > I am a developer for the systemtap project: > http://sourceware.org/systemtap/ > > The SystemTap runtime includes a function for capturing a stack trace as > a string. For the ppc64 port, we need to have the validate_sp() > function exported so it is accessible to our stack-trace function, which > is part of a SystemTap-generated kernel module. > > This patch should apply to kernel 2.6.14-rc5-mm1. NACK. this is not something that should be exported. especially not for some odd crap that hopefully never will get merged. From hien at us.ibm.com Thu Nov 3 04:55:22 2005 From: hien at us.ibm.com (Hien Nguyen) Date: Wed, 02 Nov 2005 09:55:22 -0800 Subject: [PATCH] exporting validate_sp In-Reply-To: <20051102174037.GA23650@lst.de> References: <1130890483.4032.20.camel@dyn9047022138.beaverton.ibm.com> <20051102174037.GA23650@lst.de> Message-ID: <4368FD8A.3090000@us.ibm.com> Christoph Hellwig wrote: > especially not for >some odd crap that hopefully never will get merged. > > > > I disagree, systemtap is not some odd crap (Redhat, IBM, Intel, Hitachi actively work on this project for a while). And systemtap itself does not try to merge any code to the main kernel.
From david at gibson.dropbear.id.au Thu Nov 3 10:13:58 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Thu, 3 Nov 2005 10:13:58 +1100 Subject: powerpc: Merge ipcbuf.h In-Reply-To: References: <20051101055324.GA3551@localhost.localdomain> <200511012013.51443.arnd@arndb.de> <20051102004426.GC8308@localhost.localdomain> Message-ID: <20051102231358.GA24772@localhost.localdomain> On Wed, Nov 02, 2005 at 02:12:01PM +0100, Andreas Schwab wrote: > David Gibson writes: > > > Oops, when merging ipcbuf.h, I forgot that 'u64' can't be used in > > user-visible headers. This patch corrects the problem, replacing the > > unused fields with an array of four __u32s. > > > > Signed-off-by: David Gibson > > > > Index: working-2.6/include/asm-powerpc/ipcbuf.h > > =================================================================== > > --- working-2.6.orig/include/asm-powerpc/ipcbuf.h 2005-11-02 10:41:06.000000000 +1100 > > +++ working-2.6/include/asm-powerpc/ipcbuf.h 2005-11-02 11:41:36.000000000 +1100 > > @@ -27,8 +27,7 @@ > > __kernel_mode_t mode; > > unsigned int seq; > > unsigned int __pad1; > > - u64 __unused1; > > - u64 __unused2; > > + __u32 __unused[4]; > > I think you are changing the alignment of the structure. A u64 has bigger > alignment than a u32[2]. Bother, so it does. Paulus, please apply. powerpc: Keep fixing merged ipcbuf.h Oops, replacing the two u64s in struct ipc64_perm with __u32s changed the alignment of that structure, which could mess up userspace. Revert to using two unsigned long longs (which is what ppc32 had originally). ppc64 originally had two unsigned longs, but long long is the same size on 64 bit, so this should be ok there too.
Signed-off-by: David Gibson Index: working-2.6/include/asm-powerpc/ipcbuf.h =================================================================== --- working-2.6.orig/include/asm-powerpc/ipcbuf.h 2005-11-02 15:47:11.000000000 +1100 +++ working-2.6/include/asm-powerpc/ipcbuf.h 2005-11-03 10:10:58.000000000 +1100 @@ -27,7 +27,8 @@ __kernel_mode_t mode; unsigned int seq; unsigned int __pad1; - __u32 __unused[4]; + unsigned long long __unused1; + unsigned long long __unused2; }; #endif /* _ASM_POWERPC_IPCBUF_H */ -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson From hch at lst.de Thu Nov 3 10:30:11 2005 From: hch at lst.de (Christoph Hellwig) Date: Thu, 3 Nov 2005 00:30:11 +0100 Subject: [PATCH] exporting validate_sp In-Reply-To: <4368FD8A.3090000@us.ibm.com> References: <1130890483.4032.20.camel@dyn9047022138.beaverton.ibm.com> <20051102174037.GA23650@lst.de> <4368FD8A.3090000@us.ibm.com> Message-ID: <20051102233011.GA29200@lst.de> On Wed, Nov 02, 2005 at 09:55:22AM -0800, Hien Nguyen wrote: > Christoph Hellwig wrote: > > > especially not for > >some odd crap that hopefully never will get merged. > > > > > > > > > I disagree, systemtap is not some odd crap (Redhat, IBM, Intel, Hitachi > actively work on this project for a while). all these companies are known for producing lots of crap. > And systemtap itself does not try to merge any code to the main kernel. and we're never adding exports for out of tree code. you're out of luck.
From paulus at samba.org Thu Nov 3 14:16:29 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 3 Nov 2005 14:16:29 +1100 Subject: [PATCH] ppc64: 64K pages support In-Reply-To: <1130916198.20136.17.camel@gaston> References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> Message-ID: <17257.33037.210237.986072@cargo.ozlabs.ibm.com> Benjamin Herrenschmidt writes: > It took a while, but finally, here is the 64K pages support patch for > ppc64. This patch adds a new CONFIG_PPC_64K_PAGES which, when enabled, > changes the kernel base page size to 64K. The resulting kernel still > boots on any hardware. On current machines with 4K pages support only, > the kernel will maintain 16 "subpages" for each 64K page > transparently. > > Note that while real 64K capable HW has been tested, the current patch > will not enable it yet as such hardware is not released yet, and I'm > still verifying with the firmware architects the proper to get the > information from the newer hypervisors. > > Signed-off-by: Benjamin Herrenschmidt Acked-by: Paul Mackerras From david at gibson.dropbear.id.au Thu Nov 3 16:26:34 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Thu, 3 Nov 2005 16:26:34 +1100 Subject: ppc64: Fix bug in SLB miss handler for hugepages In-Reply-To: <17257.33037.210237.986072@cargo.ozlabs.ibm.com> References: <1130915220.20136.14.camel@gaston> <1130916198.20136.17.camel@gaston> <17257.33037.210237.986072@cargo.ozlabs.ibm.com> Message-ID: <20051103052634.GD24772@localhost.localdomain> On Thu, Nov 03, 2005 at 02:16:29PM +1100, Paul Mackerras wrote: > Benjamin Herrenschmidt writes: > > > It took a while, but finally, here is the 64K pages support patch for > > ppc64. This patch adds a new CONFIG_PPC_64K_PAGES which, when enabled, > > changes the kernel base page size to 64K. The resulting kernel still > > boots on any hardware. 
On current machines with 4K pages support only, > > the kernel will maintain 16 "subpages" for each 64K page > > transparently. > > > > Note that while real 64K capable HW has been tested, the current patch > > will not enable it yet as such hardware is not released yet, and I'm > > still verifying with the firmware architects the proper to get the > > information from the newer hypervisors. > > > > Signed-off-by: Benjamin Herrenschmidt > > Acked-by: Paul Mackerras This patch, however, should be applied on top to fix some problems with hugepage (some pre-existing, another introduced by this patch). The patch fixes a bug in the SLB miss handler for hugepages on ppc64 introduced by the dynamic hugepage patch (commit id c594adad5653491813959277fb87a2fef54c4e05) due to a misunderstanding of the srd instruction's behaviour (mea culpa). The problem arises when a 64-bit process maps some hugepages in the low 4GB of the address space (unusual). In this case, as well as the 256M segment in question being marked for hugepages, other segments at 32G intervals will be incorrectly marked for hugepages. In the process, this patch tweaks the semantics of the hugepage bitmaps to be more sensible. Previously, an address below 4G was marked for hugepages if the appropriate segment bit in the "low areas" bitmask was set *or* if the low bit in the "high areas" bitmap was set (which would mark all addresses below 1TB for hugepage). With this patch, any given address is governed by a single bitmap. Addresses below 4GB are marked for hugepage if and only if their bit is set in the "low areas" bitmap (256M granularity). Addresses between 4GB and 1TB are marked for hugepage iff the low bit in the "high areas" bitmap is set. Higher addresses are marked for hugepage iff their bit in the "high areas" bitmap is set (1TB granularity). To avoid conflicts, this patch must be applied on top of BenH's pending patch for 64k base page size [0]. 
As such, this patch also addresses a hugepage problem introduced by that patch. That patch allows hugepages of 1MB in size on hardware which supports it, however, that won't work when using 4k pages (4 level pagetable), because in that case hugepage PTEs are stored at the PMD level, and each PMD entry maps 2MB. This patch simply disallows hugepages in that case (we can do something cleverer to re-enable them some other day). Built, booted, and a handful of hugepage related tests passed on POWER5 LPAR (both ARCH=powerpc and ARCH=ppc64). [0] http://gate.crashing.org/~benh/ppc64-64k-pages.diff Signed-off-by: David Gibson Index: working-2.6/arch/powerpc/mm/slb_low.S =================================================================== --- working-2.6.orig/arch/powerpc/mm/slb_low.S 2005-11-03 14:52:16.000000000 +1100 +++ working-2.6/arch/powerpc/mm/slb_low.S 2005-11-03 14:55:56.000000000 +1100 @@ -80,12 +80,17 @@ BEGIN_FTR_SECTION b 1f END_FTR_SECTION_IFCLR(CPU_FTR_16M_PAGE) + cmpldi r10,16 + + lhz r9,PACALOWHTLBAREAS(r13) + mr r11,r10 + blt 5f + lhz r9,PACAHIGHHTLBAREAS(r13) srdi r11,r10,(HTLB_AREA_SHIFT-SID_SHIFT) - srd r9,r9,r11 - lhz r11,PACALOWHTLBAREAS(r13) - srd r11,r11,r10 - or. r9,r9,r11 + +5: srd r9,r9,r11 + andi. 
r9,r9,1 beq 1f _GLOBAL(slb_miss_user_load_huge) li r11,0 Index: working-2.6/arch/powerpc/mm/hash_utils_64.c =================================================================== --- working-2.6.orig/arch/powerpc/mm/hash_utils_64.c 2005-11-03 14:52:16.000000000 +1100 +++ working-2.6/arch/powerpc/mm/hash_utils_64.c 2005-11-03 15:40:56.000000000 +1100 @@ -329,12 +329,14 @@ */ if (mmu_psize_defs[MMU_PAGE_16M].shift) mmu_huge_psize = MMU_PAGE_16M; + /* With 4k/4level pagetables, we can't (for now) cope with a + * huge page size < PMD_SIZE */ else if (mmu_psize_defs[MMU_PAGE_1M].shift) mmu_huge_psize = MMU_PAGE_1M; /* Calculate HPAGE_SHIFT and sanity check it */ - if (mmu_psize_defs[mmu_huge_psize].shift > 16 && - mmu_psize_defs[mmu_huge_psize].shift < 28) + if (mmu_psize_defs[mmu_huge_psize].shift > MIN_HUGEPTE_SHIFT && + mmu_psize_defs[mmu_huge_psize].shift < SID_SHIFT) HPAGE_SHIFT = mmu_psize_defs[mmu_huge_psize].shift; else HPAGE_SHIFT = 0; /* No huge pages dude ! */ Index: working-2.6/include/asm-ppc64/pgtable-4k.h =================================================================== --- working-2.6.orig/include/asm-ppc64/pgtable-4k.h 2005-11-03 14:52:16.000000000 +1100 +++ working-2.6/include/asm-ppc64/pgtable-4k.h 2005-11-03 15:38:40.000000000 +1100 @@ -23,6 +23,9 @@ #define PMD_SIZE (1UL << PMD_SHIFT) #define PMD_MASK (~(PMD_SIZE-1)) +/* With 4k base page size, hugepage PTEs go at the PMD level */ +#define MIN_HUGEPTE_SHIFT PMD_SHIFT + /* PUD_SHIFT determines what a third-level page table entry can map */ #define PUD_SHIFT (PMD_SHIFT + PMD_INDEX_SIZE) #define PUD_SIZE (1UL << PUD_SHIFT) Index: working-2.6/include/asm-ppc64/pgtable-64k.h =================================================================== --- working-2.6.orig/include/asm-ppc64/pgtable-64k.h 2005-11-03 14:52:16.000000000 +1100 +++ working-2.6/include/asm-ppc64/pgtable-64k.h 2005-11-03 15:39:07.000000000 +1100 @@ -14,6 +14,9 @@ #define PTRS_PER_PMD (1 << PMD_INDEX_SIZE) #define PTRS_PER_PGD (1 << 
PGD_INDEX_SIZE) +/* With 4k base page size, hugepage PTEs go at the PMD level */ +#define MIN_HUGEPTE_SHIFT PAGE_SHIFT + /* PMD_SHIFT determines what a second-level page table entry can map */ #define PMD_SHIFT (PAGE_SHIFT + PTE_INDEX_SIZE) #define PMD_SIZE (1UL << PMD_SHIFT) Index: working-2.6/arch/powerpc/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/powerpc/mm/hugetlbpage.c 2005-11-03 14:52:16.000000000 +1100 +++ working-2.6/arch/powerpc/mm/hugetlbpage.c 2005-11-03 15:56:34.000000000 +1100 @@ -212,6 +212,12 @@ BUG_ON(area >= NUM_HIGH_AREAS); + /* Hack, so that each addresses is controlled by exactly one + * of the high or low area bitmaps, the first high area starts + * at 4GB, not 0 */ + if (start == 0) + start = 0x100000000UL; + /* Check no VMAs are in the region */ vma = find_vma(mm, start); if (vma && (vma->vm_start < end)) -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson From olof at lixom.net Fri Nov 4 06:49:27 2005 From: olof at lixom.net (Olof Johansson) Date: Thu, 3 Nov 2005 11:49:27 -0800 Subject: [PATCH] POWERPC/PPC64: Fix CONFIG_SMP=n build for ppc64 Message-ID: <20051103194927.GC8515@pb15.lixom.net> Hi, Below is against 2.6.14-git5: --- Two CONFIG_SMP=n build fixes due to missing includes. 
Signed-off-by: Olof Johansson Index: 2.6/arch/ppc64/kernel/sysfs.c =================================================================== --- 2.6.orig/arch/ppc64/kernel/sysfs.c 2005-11-03 10:33:42.000000000 -0800 +++ 2.6/arch/ppc64/kernel/sysfs.c 2005-11-03 10:33:51.000000000 -0800 @@ -20,6 +20,7 @@ #include #include #include +#include static DEFINE_PER_CPU(struct cpu, cpu_devices); Index: 2.6/arch/powerpc/kernel/time.c =================================================================== --- 2.6.orig/arch/powerpc/kernel/time.c 2005-11-03 10:45:43.000000000 -0800 +++ 2.6/arch/powerpc/kernel/time.c 2005-11-03 10:49:52.000000000 -0800 @@ -69,6 +69,7 @@ #include #include #endif +#include /* keep track of when we need to update the rtc */ time_t last_rtc_update; From tim.bird at am.sony.com Fri Nov 4 06:59:31 2005 From: tim.bird at am.sony.com (Tim Bird) Date: Thu, 03 Nov 2005 11:59:31 -0800 Subject: [PATCH] exporting validate_sp In-Reply-To: <20051102233011.GA29200@lst.de> References: <1130890483.4032.20.camel@dyn9047022138.beaverton.ibm.com> <20051102174037.GA23650@lst.de> <4368FD8A.3090000@us.ibm.com> <20051102233011.GA29200@lst.de> Message-ID: <436A6C23.2050604@am.sony.com> Christoph Hellwig wrote: > On Wed, Nov 02, 2005 at 09:55:22AM -0800, Hien Nguyen wrote: > >>Christoph Hellwig wrote: >>>especially not for >>>some odd crap that hopefully never will get merged. >> >>I disagree, systemtap is not some odd crap (Redhat, IBM, Intel, Hitachi >>actively work on this project for a while). > > all these companies are known for producing lots of crap. > >>And systemtap itself does not try to merge any code to the main kernel. > > and we're never adding exports for out of tree code. you're out of > luck. These lines must be directly from Christoph's "How to motivate people" management handbook. Don't worry Hien. Other people (though maybe quieter than Christoph) see value in the SystemTap work. I hope it will continue to be developed and improved.
-- Tim ============================= Tim Bird Architecture Group Chair, CE Linux Forum Senior Staff Engineer, Sony Electronics ============================= From hien at us.ibm.com Fri Nov 4 07:42:28 2005 From: hien at us.ibm.com (Hien Nguyen) Date: Thu, 03 Nov 2005 12:42:28 -0800 Subject: [PATCH] exporting validate_sp In-Reply-To: <436A6C23.2050604@am.sony.com> References: <1130890483.4032.20.camel@dyn9047022138.beaverton.ibm.com> <20051102174037.GA23650@lst.de> <4368FD8A.3090000@us.ibm.com> <20051102233011.GA29200@lst.de> <436A6C23.2050604@am.sony.com> Message-ID: <436A7634.20602@us.ibm.com> Tim Bird wrote: >These lines must be directly from Christoph's >"How to motivate people" management handbook. > >Don't worry Hien. Other people (though maybe quieter than >Christoph) see value in the SystemTap work. I hope it will >continue to be developed and improved. > -- Tim > >============================= >Tim Bird >Architecture Group Chair, CE Linux Forum >Senior Staff Engineer, Sony Electronics >============================= > > > > Thanks for the kind words. Yes, our intention is to make systemtap better, safer. Hien. From linas at austin.ibm.com Fri Nov 4 07:53:31 2005 From: linas at austin.ibm.com (linas) Date: Thu, 3 Nov 2005 14:53:31 -0600 Subject: [PATCH] fix add notifier crashes In-Reply-To: <1130948995.32348.35.camel@sinatra.austin.ibm.com> References: <1130948995.32348.35.camel@sinatra.austin.ibm.com> Message-ID: <20051103205331.GN19593@austin.ibm.com> On Wed, Nov 02, 2005 at 10:29:55AM -0600, John Rose was heard to remark: > Hi Paul- > > The extraction of PCI stuff from struct device_node left some false > assumptions in notifier code. As a result, dynamic add crashes when > non-PCI nodes are added. This patch fixes these assumptions. This is more or less the same as the patch I sent on 4 October.
It was called "crash-on-pci-slot-add.patch" There's another closely related null ptr deref that is fixed in the patch "rpaphp-crashing.patch" that was the next one in that series. Anyway, other than that, it looks good to me. --linas From david at gibson.dropbear.id.au Fri Nov 4 11:16:53 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Fri, 4 Nov 2005 11:16:53 +1100 Subject: powerpc: Kill ppcdebug Message-ID: <20051104001653.GC29025@localhost.localdomain> The ancient ppcdebug/PPCDBG mechanism is now only used in two places. First, in the hash setup code, one of the bits allows the size of the hash table to be reduced by a factor of 8 - which would be better accomplished with a command line option for that purpose. The other was a bunch of bus walking related messages in the iSeries code, which would seem to be insufficient reason to keep the mechanism. This patch removes the last traces of this mechanism. Built and booted on iSeries and pSeries POWER5 LPAR (ARCH=powerpc). Signed-off-by: David Gibson Index: working-2.6/arch/powerpc/kernel/signal_32.c =================================================================== --- working-2.6.orig/arch/powerpc/kernel/signal_32.c 2005-11-04 10:21:12.000000000 +1100 +++ working-2.6/arch/powerpc/kernel/signal_32.c 2005-11-04 10:23:36.000000000 +1100 @@ -44,7 +44,6 @@ #include #ifdef CONFIG_PPC64 #include "ppc32.h" -#include #include #include #else Index: working-2.6/arch/powerpc/mm/init_64.c =================================================================== --- working-2.6.orig/arch/powerpc/mm/init_64.c 2005-10-31 15:20:20.000000000 +1100 +++ working-2.6/arch/powerpc/mm/init_64.c 2005-11-04 10:23:20.000000000 +1100 @@ -57,7 +57,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/powerpc/mm/pgtable_64.c =================================================================== --- working-2.6.orig/arch/powerpc/mm/pgtable_64.c 2005-10-31 15:44:59.000000000 +1100 +++ 
working-2.6/arch/powerpc/mm/pgtable_64.c 2005-11-04 10:23:20.000000000 +1100 @@ -59,7 +59,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/powerpc/platforms/iseries/smp.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/iseries/smp.c 2005-11-03 16:26:57.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/iseries/smp.c 2005-11-04 10:23:20.000000000 +1100 @@ -40,7 +40,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/powerpc/platforms/pseries/iommu.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/pseries/iommu.c 2005-11-04 10:21:12.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/pseries/iommu.c 2005-11-04 10:23:20.000000000 +1100 @@ -37,7 +37,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/powerpc/platforms/pseries/lpar.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/pseries/lpar.c 2005-10-31 15:20:20.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/pseries/lpar.c 2005-11-04 10:23:20.000000000 +1100 @@ -31,7 +31,6 @@ #include #include #include -#include #include #include #include @@ -39,6 +38,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) 
udbg_printf(fmt) Index: working-2.6/arch/powerpc/platforms/pseries/ras.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/pseries/ras.c 2005-10-31 15:20:20.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/pseries/ras.c 2005-11-04 10:23:20.000000000 +1100 @@ -48,7 +48,7 @@ #include #include #include -#include +#include static unsigned char ras_log_buf[RTAS_ERROR_LOG_MAX]; static DEFINE_SPINLOCK(ras_log_buf_lock); Index: working-2.6/arch/ppc64/kernel/prom.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/prom.c 2005-10-31 15:44:59.000000000 +1100 +++ working-2.6/arch/ppc64/kernel/prom.c 2005-11-04 10:23:20.000000000 +1100 @@ -46,7 +46,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/ppc64/kernel/prom_init.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/prom_init.c 2005-11-03 16:26:57.000000000 +1100 +++ working-2.6/arch/ppc64/kernel/prom_init.c 2005-11-04 10:23:20.000000000 +1100 @@ -44,7 +44,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/powerpc/sysdev/u3_iommu.c =================================================================== --- working-2.6.orig/arch/powerpc/sysdev/u3_iommu.c 2005-11-03 16:26:57.000000000 +1100 +++ working-2.6/arch/powerpc/sysdev/u3_iommu.c 2005-11-04 10:23:20.000000000 +1100 @@ -37,7 +37,6 @@ #include #include #include -#include #include #include #include Index: working-2.6/arch/powerpc/kernel/setup_64.c =================================================================== --- working-2.6.orig/arch/powerpc/kernel/setup_64.c 2005-11-03 16:26:57.000000000 +1100 +++ working-2.6/arch/powerpc/kernel/setup_64.c 2005-11-04 10:23:20.000000000 +1100 @@ -41,7 +41,6 @@ #include #include #include -#include #include #include #include @@ -60,6 +59,7 @@ #include #include #include 
+#include #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -352,12 +352,6 @@ DBG(" -> early_setup()\n"); /* - * Fill the default DBG level (do we want to keep - * that old mecanism around forever ?) - */ - ppcdbg_initialize(); - - /* * Do early initializations using the flattened device * tree, like retreiving the physical memory map or * calculating/retreiving the hash table size @@ -605,7 +599,6 @@ printk("-----------------------------------------------------\n"); printk("ppc64_pft_size = 0x%lx\n", ppc64_pft_size); - printk("ppc64_debug_switch = 0x%lx\n", ppc64_debug_switch); printk("ppc64_interrupt_controller = 0x%ld\n", ppc64_interrupt_controller); printk("systemcfg = 0x%p\n", systemcfg); printk("systemcfg->platform = 0x%x\n", systemcfg->platform); Index: working-2.6/include/asm-ppc64/ppcdebug.h =================================================================== --- working-2.6.orig/include/asm-ppc64/ppcdebug.h 2005-10-25 11:59:59.000000000 +1000 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,108 +0,0 @@ -#ifndef __PPCDEBUG_H -#define __PPCDEBUG_H -/******************************************************************** - * Author: Adam Litke, IBM Corp - * (c) 2001 - * - * This file contains definitions and macros for a runtime debugging - * system for ppc64 (This should also work on 32 bit with a few - * adjustments. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - * - ********************************************************************/ - -#include -#include -#include -#include - -#define PPCDBG_BITVAL(X) ((1UL)<<((unsigned long)(X))) - -/* Defined below are the bit positions of various debug flags in the - * ppc64_debug_switch variable. 
- * -- When adding new values, please enter them into trace names below -- - * - * Values 62 & 63 can be used to stress the hardware page table management - * code. They must be set statically, any attempt to change them dynamically - * would be a very bad idea. - */ -#define PPCDBG_MMINIT PPCDBG_BITVAL(0) -#define PPCDBG_MM PPCDBG_BITVAL(1) -#define PPCDBG_SYS32 PPCDBG_BITVAL(2) -#define PPCDBG_SYS32NI PPCDBG_BITVAL(3) -#define PPCDBG_SYS32X PPCDBG_BITVAL(4) -#define PPCDBG_SYS32M PPCDBG_BITVAL(5) -#define PPCDBG_SYS64 PPCDBG_BITVAL(6) -#define PPCDBG_SYS64NI PPCDBG_BITVAL(7) -#define PPCDBG_SYS64X PPCDBG_BITVAL(8) -#define PPCDBG_SIGNAL PPCDBG_BITVAL(9) -#define PPCDBG_SIGNALXMON PPCDBG_BITVAL(10) -#define PPCDBG_BINFMT32 PPCDBG_BITVAL(11) -#define PPCDBG_BINFMT64 PPCDBG_BITVAL(12) -#define PPCDBG_BINFMTXMON PPCDBG_BITVAL(13) -#define PPCDBG_BINFMT_32ADDR PPCDBG_BITVAL(14) -#define PPCDBG_ALIGNFIXUP PPCDBG_BITVAL(15) -#define PPCDBG_TCEINIT PPCDBG_BITVAL(16) -#define PPCDBG_TCE PPCDBG_BITVAL(17) -#define PPCDBG_PHBINIT PPCDBG_BITVAL(18) -#define PPCDBG_SMP PPCDBG_BITVAL(19) -#define PPCDBG_BOOT PPCDBG_BITVAL(20) -#define PPCDBG_BUSWALK PPCDBG_BITVAL(21) -#define PPCDBG_PROM PPCDBG_BITVAL(22) -#define PPCDBG_RTAS PPCDBG_BITVAL(23) -#define PPCDBG_HTABSTRESS PPCDBG_BITVAL(62) -#define PPCDBG_HTABSIZE PPCDBG_BITVAL(63) -#define PPCDBG_NONE (0UL) -#define PPCDBG_ALL (0xffffffffUL) - -/* The default initial value for the debug switch */ -#define PPC_DEBUG_DEFAULT 0 -/* #define PPC_DEBUG_DEFAULT PPCDBG_ALL */ - -#define PPCDBG_NUM_FLAGS 64 - -extern u64 ppc64_debug_switch; - -#ifdef WANT_PPCDBG_TAB -/* A table of debug switch names to allow name lookup in xmon - * (and whoever else wants it. 
- */ -char *trace_names[PPCDBG_NUM_FLAGS] = { - /* Known debug names */ - "mminit", "mm", - "syscall32", "syscall32_ni", "syscall32x", "syscall32m", - "syscall64", "syscall64_ni", "syscall64x", - "signal", "signal_xmon", - "binfmt32", "binfmt64", "binfmt_xmon", "binfmt_32addr", - "alignfixup", "tceinit", "tce", "phb_init", - "smp", "boot", "buswalk", "prom", - "rtas" -}; -#else -extern char *trace_names[64]; -#endif /* WANT_PPCDBG_TAB */ - -#ifdef CONFIG_PPCDBG -/* Macro to conditionally print debug based on debug_switch */ -#define PPCDBG(...) udbg_ppcdbg(__VA_ARGS__) - -/* Macro to conditionally call a debug routine based on debug_switch */ -#define PPCDBGCALL(FLAGS,FUNCTION) ifppcdebug(FLAGS) FUNCTION - -/* Macros to test for debug states */ -#define ifppcdebug(FLAGS) if (udbg_ifdebug(FLAGS)) -#define ppcdebugset(FLAGS) (udbg_ifdebug(FLAGS)) -#define PPCDBG_BINFMT (test_thread_flag(TIF_32BIT) ? PPCDBG_BINFMT32 : PPCDBG_BINFMT64) - -#else -#define PPCDBG(...) do {;} while (0) -#define PPCDBGCALL(FLAGS,FUNCTION) do {;} while (0) -#define ifppcdebug(...) if (0) -#define ppcdebugset(FLAGS) (0) -#endif /* CONFIG_PPCDBG */ - -#endif /*__PPCDEBUG_H */ Index: working-2.6/arch/ppc64/kernel/udbg.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/udbg.c 2005-10-25 11:59:53.000000000 +1000 +++ working-2.6/arch/ppc64/kernel/udbg.c 2005-11-04 10:23:20.000000000 +1100 @@ -10,12 +10,10 @@ */ #include -#define WANT_PPCDBG_TAB /* Only defined here */ #include #include #include #include -#include #include void (*udbg_putc)(unsigned char c); @@ -89,59 +87,6 @@ va_end(args); } -/* PPCDBG stuff */ - -u64 ppc64_debug_switch; - -/* Special print used by PPCDBG() macro */ -void udbg_ppcdbg(unsigned long debug_flags, const char *fmt, ...) 
-{ - unsigned long active_debugs = debug_flags & ppc64_debug_switch; - - if (active_debugs) { - va_list ap; - unsigned char buf[UDBG_BUFSIZE]; - unsigned long i, len = 0; - - for (i=0; i < PPCDBG_NUM_FLAGS; i++) { - if (((1U << i) & active_debugs) && - trace_names[i]) { - len += strlen(trace_names[i]); - udbg_puts(trace_names[i]); - break; - } - } - - snprintf(buf, UDBG_BUFSIZE, " [%s]: ", current->comm); - len += strlen(buf); - udbg_puts(buf); - - while (len < 18) { - udbg_puts(" "); - len++; - } - - va_start(ap, fmt); - vsnprintf(buf, UDBG_BUFSIZE, fmt, ap); - udbg_puts(buf); - va_end(ap); - } -} - -unsigned long udbg_ifdebug(unsigned long flags) -{ - return (flags & ppc64_debug_switch); -} - -/* - * Initialize the PPCDBG state. Called before relocation has been enabled. - */ -void __init ppcdbg_initialize(void) -{ - ppc64_debug_switch = PPC_DEBUG_DEFAULT; /* | PPCDBG_BUSWALK | */ - /* PPCDBG_PHBINIT | PPCDBG_MM | PPCDBG_MMINIT | PPCDBG_TCEINIT | PPCDBG_TCE */; -} - /* * Early boot console based on udbg */ Index: working-2.6/include/asm-ppc64/udbg.h =================================================================== --- working-2.6.orig/include/asm-ppc64/udbg.h 2005-10-31 15:20:22.000000000 +1100 +++ working-2.6/include/asm-ppc64/udbg.h 2005-11-04 10:23:20.000000000 +1100 @@ -23,9 +23,6 @@ extern void register_early_udbg_console(void); extern void udbg_printf(const char *fmt, ...); -extern void udbg_ppcdbg(unsigned long flags, const char *fmt, ...); -extern unsigned long udbg_ifdebug(unsigned long flags); -extern void __init ppcdbg_initialize(void); extern void udbg_init_uart(void __iomem *comport, unsigned int speed); Index: working-2.6/arch/powerpc/mm/hash_utils_64.c =================================================================== --- working-2.6.orig/arch/powerpc/mm/hash_utils_64.c 2005-10-31 15:20:20.000000000 +1100 +++ working-2.6/arch/powerpc/mm/hash_utils_64.c 2005-11-04 10:23:20.000000000 +1100 @@ -32,7 +32,6 @@ #include #include -#include #include 
#include #include @@ -194,12 +193,6 @@ htab_size_bytes = get_hashtable_size(); pteg_count = htab_size_bytes >> 7; - /* For debug, make the HTAB 1/8 as big as it normally would be. */ - ifppcdebug(PPCDBG_HTABSIZE) { - pteg_count >>= 3; - htab_size_bytes = pteg_count << 7; - } - htab_hash_mask = pteg_count - 1; if (systemcfg->platform & PLATFORM_LPAR) { Index: working-2.6/arch/powerpc/platforms/iseries/irq.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/iseries/irq.c 2005-11-03 16:26:57.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/iseries/irq.c 2005-11-04 10:23:20.000000000 +1100 @@ -35,7 +35,6 @@ #include #include -#include #include #include #include @@ -227,8 +226,6 @@ /* Unmask secondary INTA */ mask = 0x80000000; HvCallPci_unmaskInterrupts(bus, subBus, deviceId, mask); - PPCDBG(PPCDBG_BUSWALK, "iSeries_enable_IRQ 0x%02X.%02X.%02X 0x%04X\n", - bus, subBus, deviceId, irq); } /* This is called by iSeries_activate_IRQs */ @@ -310,8 +307,6 @@ /* Mask secondary INTA */ mask = 0x80000000; HvCallPci_maskInterrupts(bus, subBus, deviceId, mask); - PPCDBG(PPCDBG_BUSWALK, "iSeries_disable_IRQ 0x%02X.%02X.%02X 0x%04X\n", - bus, subBus, deviceId, irq); } /* Index: working-2.6/arch/powerpc/platforms/iseries/pci.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/iseries/pci.c 2005-11-03 16:26:57.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/iseries/pci.c 2005-11-04 10:23:20.000000000 +1100 @@ -32,7 +32,6 @@ #include #include #include -#include #include #include @@ -207,10 +206,6 @@ struct device_node *node; struct pci_dn *pdn; - PPCDBG(PPCDBG_BUSWALK, - "-build_device_node 0x%02X.%02X.%02X Function: %02X\n", - Bus, SubBus, AgentId, Function); - node = kmalloc(sizeof(struct device_node), GFP_KERNEL); if (node == NULL) return NULL; @@ -243,8 +238,6 @@ struct pci_controller *phb; HvBusNumber bus; - PPCDBG(PPCDBG_BUSWALK, 
"find_and_init_phbs Entry\n"); - /* Check all possible buses. */ for (bus = 0; bus < 256; bus++) { int ret = HvCallXm_testBus(bus); @@ -261,9 +254,6 @@ phb->last_busno = bus; phb->ops = &iSeries_pci_ops; - PPCDBG(PPCDBG_BUSWALK, "PCI:Create iSeries pci_controller(%p), Bus: %04X\n", - phb, bus); - /* Find and connect the devices. */ scan_PHB_slots(phb); } @@ -285,11 +275,9 @@ */ void iSeries_pcibios_init(void) { - PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_init Entry.\n"); iomm_table_initialize(); find_and_init_phbs(); io_page_mask = -1; - PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_init Exit.\n"); } /* @@ -301,8 +289,6 @@ struct device_node *node; int DeviceCount = 0; - PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_fixup Entry.\n"); - /* Fix up at the device node and pci_dev relationship */ mf_display_src(0xC9000100); @@ -316,9 +302,6 @@ ++DeviceCount; pdev->sysdata = (void *)node; PCI_DN(node)->pcidev = pdev; - PPCDBG(PPCDBG_BUSWALK, - "pdev 0x%p <==> DevNode 0x%p\n", - pdev, node); allocate_device_bars(pdev); iSeries_Device_Information(pdev, DeviceCount); iommu_devnode_init_iSeries(node); @@ -333,13 +316,10 @@ void pcibios_fixup_bus(struct pci_bus *PciBus) { - PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_fixup_bus(0x%04X) Entry.\n", - PciBus->number); } void pcibios_fixup_resources(struct pci_dev *pdev) { - PPCDBG(PPCDBG_BUSWALK, "fixup_resources pdev %p\n", pdev); } /* @@ -401,9 +381,6 @@ printk("found device at bus %d idsel %d func %d (AgentId %x)\n", bus, IdSel, Function, AgentId); /* Connect EADs: 0x18.00.12 = 0x00 */ - PPCDBG(PPCDBG_BUSWALK, - "PCI:Connect EADs: 0x%02X.%02X.%02X\n", - bus, SubBus, AgentId); HvRc = HvCallPci_getBusUnitInfo(bus, SubBus, AgentId, iseries_hv_addr(BridgeInfo), sizeof(struct HvCallPci_BridgeInfo)); @@ -414,14 +391,6 @@ BridgeInfo->maxAgents, BridgeInfo->maxSubBusNumber, BridgeInfo->logicalSlotNumber); - PPCDBG(PPCDBG_BUSWALK, - "PCI: BridgeInfo, Type:0x%02X, SubBus:0x%02X, MaxAgents:0x%02X, MaxSubBus: 0x%02X, LSlot: 0x%02X\n", - 
BridgeInfo->busUnitInfo.deviceType, - BridgeInfo->subBusNumber, - BridgeInfo->maxAgents, - BridgeInfo->maxSubBusNumber, - BridgeInfo->logicalSlotNumber); - if (BridgeInfo->busUnitInfo.deviceType == HvCallPci_BridgeDevice) { /* Scan_Bridge_Slot...: 0x18.00.12 */ @@ -454,9 +423,6 @@ /* iSeries_allocate_IRQ.: 0x18.00.12(0xA3) */ Irq = iSeries_allocate_IRQ(Bus, 0, EADsIdSel); - PPCDBG(PPCDBG_BUSWALK, - "PCI:- allocate and assign IRQ 0x%02X.%02X.%02X = 0x%02X\n", - Bus, 0, EADsIdSel, Irq); /* * Connect all functions of any device found. @@ -482,9 +448,6 @@ printk("read vendor ID: %x\n", VendorId); /* FoundDevice: 0x18.28.10 = 0x12AE */ - PPCDBG(PPCDBG_BUSWALK, - "PCI:- FoundDevice: 0x%02X.%02X.%02X = 0x%04X, irq %d\n", - Bus, SubBus, AgentId, VendorId, Irq); HvRc = HvCallPci_configStore8(Bus, SubBus, AgentId, PCI_INTERRUPT_LINE, Irq); if (HvRc != 0) Index: working-2.6/arch/powerpc/platforms/iseries/setup.c =================================================================== --- working-2.6.orig/arch/powerpc/platforms/iseries/setup.c 2005-11-03 16:26:57.000000000 +1100 +++ working-2.6/arch/powerpc/platforms/iseries/setup.c 2005-11-04 10:23:20.000000000 +1100 @@ -71,8 +71,6 @@ #endif /* Function Prototypes */ -extern void ppcdbg_initialize(void); - static void build_iSeries_Memory_Map(void); static void iseries_shared_idle(void); static void iseries_dedicated_idle(void); @@ -309,8 +307,6 @@ ppc64_firmware_features = FW_FEATURE_ISERIES; - ppcdbg_initialize(); - ppc64_interrupt_controller = IC_ISERIES; #if defined(CONFIG_BLK_DEV_INITRD) Index: working-2.6/arch/ppc64/Kconfig.debug =================================================================== --- working-2.6.orig/arch/ppc64/Kconfig.debug 2005-10-25 11:59:53.000000000 +1000 +++ working-2.6/arch/ppc64/Kconfig.debug 2005-11-04 10:23:20.000000000 +1100 @@ -55,10 +55,6 @@ xmon is normally disabled unless booted with 'xmon=on'. Use 'xmon=off' to disable xmon init during runtime. 
-config PPCDBG - bool "Include PPCDBG realtime debugging" - depends on DEBUG_KERNEL - config IRQSTACKS bool "Use separate kernel stacks when processing interrupts" help Index: working-2.6/arch/powerpc/kernel/signal_64.c =================================================================== --- working-2.6.orig/arch/powerpc/kernel/signal_64.c 2005-11-04 10:21:12.000000000 +1100 +++ working-2.6/arch/powerpc/kernel/signal_64.c 2005-11-04 10:23:47.000000000 +1100 @@ -33,7 +33,6 @@ #include #include #include -#include #include #include #include -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson From linas at linas.org Fri Nov 4 10:59:18 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 17:59:18 -0600 Subject: [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Message-ID: <20051103235918.GA25616@mail.gnucash.org> What follows is a long sequence of mostly small patches to implement PCI Error Recovery by adding notification callbacks to the PCI device driver structure, implementing the recovery in 5 device drivers (3 ethernet, two scsi drivers), and adding the actual error detection and recovery code to the ppc64/powerpc arch tree. 
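As a rough illustration of the callback idea described above, here is a minimal, self-contained C sketch. It is not code from this patch series: every name in it (sketch_err_handlers, sketch_handle_error, the demo_* callbacks, the result codes) is hypothetical, and the slot reset is only simulated. It assumes that a driver exposes an optional table of callbacks and that the platform walks through them after an error has been detected.

```c
#include <stddef.h>

/* Hypothetical result codes a driver callback might hand back. */
enum sketch_result {
	SKETCH_RECOVERED,	/* device is usable again */
	SKETCH_NEED_RESET,	/* driver asks the platform to reset the slot */
	SKETCH_DISCONNECT	/* driver gives up on the device */
};

struct sketch_dev;

/* Hypothetical callback table that a driver structure could point to. */
struct sketch_err_handlers {
	enum sketch_result (*error_detected)(struct sketch_dev *dev);
	enum sketch_result (*slot_reset)(struct sketch_dev *dev);
	void (*resume)(struct sketch_dev *dev);
};

struct sketch_dev {
	const struct sketch_err_handlers *handlers; /* NULL: no recovery support */
	int resets_seen;	/* bookkeeping for the demo callbacks */
	int resumed;
};

/* Platform side: notify the driver, reset if asked to, then resume. */
enum sketch_result sketch_handle_error(struct sketch_dev *dev)
{
	enum sketch_result res;

	if (dev->handlers == NULL || dev->handlers->error_detected == NULL)
		return SKETCH_DISCONNECT;

	res = dev->handlers->error_detected(dev);
	if (res == SKETCH_NEED_RESET) {
		/* A real platform would reset the slot hardware at this point. */
		if (dev->handlers->slot_reset == NULL)
			return SKETCH_DISCONNECT;
		res = dev->handlers->slot_reset(dev);
	}
	if (res == SKETCH_RECOVERED && dev->handlers->resume != NULL)
		dev->handlers->resume(dev);
	return res;
}

/* Demo driver callbacks, also hypothetical. */
enum sketch_result demo_error_detected(struct sketch_dev *dev)
{
	(void)dev;
	return SKETCH_NEED_RESET;	/* this demo driver always wants a reset */
}

enum sketch_result demo_slot_reset(struct sketch_dev *dev)
{
	dev->resets_seen++;
	return SKETCH_RECOVERED;
}

void demo_resume(struct sketch_dev *dev)
{
	dev->resumed = 1;
}

const struct sketch_err_handlers demo_handlers = {
	.error_detected = demo_error_detected,
	.slot_reset = demo_slot_reset,
	.resume = demo_resume,
};
```

A device whose handlers pointer is NULL is treated as having no recovery support, so the sketch reports it as disconnected rather than guessing at a recovery path.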
Highlights:
-- Patches 1-14: Misc required ppc64/powerpc cleanup/bugfixes/restructuring
-- Patch 15: Overview documentation
-- Patch 16: changes to include/linux/pci.h
-- Patches 17-26: error detection and recovery for pSeries PCI bridge chips
-- Patches 27-32: recovery patches for ethernet, scsi device drivers
-- Patches 33-42: More misc ppc64-specific changes

Signed-off-by: Linas Vepstas

From michael at ellerman.id.au Fri Nov 4 11:34:26 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Fri, 4 Nov 2005 11:34:26 +1100
Subject: powerpc: Kill ppcdebug
In-Reply-To: <20051104001653.GC29025@localhost.localdomain>
References: <20051104001653.GC29025@localhost.localdomain>
Message-ID: <200511041134.30106.michael@ellerman.id.au>

On Fri, 4 Nov 2005 11:16, David Gibson wrote:
> The ancient ppcdebug/PPCDBG mechanism is now only used in two places.
> First, in the hash setup code, one of the bits allows the size of the
> hash table to be reduced by a factor of 8 - which would be better
> accomplished with a command line option for that purpose. The other
> was a bunch of bus walking related messages in the iSeries code, which
> would seem to be insufficient reason to keep the mechanism.
>
> This patch removes the last traces of this mechanism.

I agree it's pretty ugly, but I thought the concept was at least nice, i.e. runtime-enablable debugging. The current scheme of having to #define DEBUG in a gazillion different files is pretty painful. Oh well.

--
Michael Ellerman
IBM OzLabs

email: michael:ellerman.id.au
inmsg: mpe:jabber.org
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20051104/204e9ca5/attachment.pgp From linas at austin.ibm.com Fri Nov 4 11:42:26 2005 From: linas at austin.ibm.com (linas) Date: Thu, 3 Nov 2005 18:42:26 -0600 Subject: [PATCH 1/42] ppc64: uniform usage of bus unit id interfaces In-Reply-To: <20051103235918.GA25616@mail.gnucash.org> References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004226.GQ19593@austin.ibm.com> 01-pci-dn-uniformization.patch This patch changes the rtas_pci interface to use the new struct pci_dn structure for two routines that work with pci device nodes. This patch also does some minor janitorial work: it uses some handy macros and cleans up some trailing whitespace in the affected file. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 11:59:11.879644789 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:01:21.403477910 -0600 @@ -71,10 +71,6 @@ * and sent out for processing. */ -/** Bus Unit ID macros; get low and hi 32-bits of the 64-bit BUID */ -#define BUID_HI(buid) ((buid) >> 32) -#define BUID_LO(buid) ((buid) & 0xffffffff) - /* EEH event workqueue setup. 
*/ static DEFINE_SPINLOCK(eeh_eventlist_lock); LIST_HEAD(eeh_eventlist); Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-10-31 11:59:11.880644649 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-10-31 12:01:21.404477769 -0600 @@ -26,6 +26,10 @@ extern struct pci_dev *ppc64_isabridge_dev; /* may be NULL if no ISA bus */ +/** Bus Unit ID macros; get low and hi 32-bits of the 64-bit BUID */ +#define BUID_HI(buid) ((buid) >> 32) +#define BUID_LO(buid) ((buid) & 0xffffffff) + /* PCI device_node operations */ struct device_node; typedef void *(*traverse_func)(struct device_node *me, void *data); Index: linux-2.6.14-git3/arch/ppc64/kernel/rtas_pci.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/rtas_pci.c 2005-10-31 11:59:11.879644789 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/rtas_pci.c 2005-10-31 12:01:21.407477349 -0600 @@ -5,19 +5,19 @@ * Copyright (C) 2003 Anton Blanchard , IBM * * RTAS specific routines for PCI. - * + * * Based on code from pci.c, chrp_pci.c and pSeries_pci.c * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. - * + * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
- * + * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA @@ -47,7 +47,7 @@ static int ibm_read_pci_config; static int ibm_write_pci_config; -static int config_access_valid(struct pci_dn *dn, int where) +static inline int config_access_valid(struct pci_dn *dn, int where) { if (where < 256) return 1; @@ -72,16 +72,14 @@ return 0; } -static int rtas_read_config(struct device_node *dn, int where, int size, u32 *val) +static int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val) { int returnval = -1; unsigned long buid, addr; int ret; - struct pci_dn *pdn; - if (!dn || !dn->data) + if (!pdn) return PCIBIOS_DEVICE_NOT_FOUND; - pdn = dn->data; if (!config_access_valid(pdn, where)) return PCIBIOS_BAD_REGISTER_NUMBER; @@ -90,7 +88,7 @@ buid = pdn->phb->buid; if (buid) { ret = rtas_call(ibm_read_pci_config, 4, 2, &returnval, - addr, buid >> 32, buid & 0xffffffff, size); + addr, BUID_HI(buid), BUID_LO(buid), size); } else { ret = rtas_call(read_pci_config, 2, 2, &returnval, addr, size); } @@ -100,7 +98,7 @@ return PCIBIOS_DEVICE_NOT_FOUND; if (returnval == EEH_IO_ERROR_VALUE(size) && - eeh_dn_check_failure (dn, NULL)) + eeh_dn_check_failure (pdn->node, NULL)) return PCIBIOS_DEVICE_NOT_FOUND; return PCIBIOS_SUCCESSFUL; @@ -118,23 +116,23 @@ busdn = bus->sysdata; /* must be a phb */ /* Search only direct children of the bus */ - for (dn = busdn->child; dn; dn = dn->sibling) - if (dn->data && PCI_DN(dn)->devfn == devfn + for (dn = busdn->child; dn; dn = dn->sibling) { + struct pci_dn *pdn = PCI_DN(dn); + if (pdn && pdn->devfn == devfn && of_device_available(dn)) - return rtas_read_config(dn, where, size, val); + return rtas_read_config(pdn, where, size, val); + } return PCIBIOS_DEVICE_NOT_FOUND; } -int rtas_write_config(struct device_node *dn, int where, int size, u32 val) +int rtas_write_config(struct pci_dn 
*pdn, int where, int size, u32 val) { unsigned long buid, addr; int ret; - struct pci_dn *pdn; - if (!dn || !dn->data) + if (!pdn) return PCIBIOS_DEVICE_NOT_FOUND; - pdn = dn->data; if (!config_access_valid(pdn, where)) return PCIBIOS_BAD_REGISTER_NUMBER; @@ -142,7 +140,8 @@ (pdn->devfn << 8) | (where & 0xff); buid = pdn->phb->buid; if (buid) { - ret = rtas_call(ibm_write_pci_config, 5, 1, NULL, addr, buid >> 32, buid & 0xffffffff, size, (ulong) val); + ret = rtas_call(ibm_write_pci_config, 5, 1, NULL, addr, + BUID_HI(buid), BUID_LO(buid), size, (ulong) val); } else { ret = rtas_call(write_pci_config, 3, 1, NULL, addr, size, (ulong)val); } @@ -165,10 +164,12 @@ busdn = bus->sysdata; /* must be a phb */ /* Search only direct children of the bus */ - for (dn = busdn->child; dn; dn = dn->sibling) - if (dn->data && PCI_DN(dn)->devfn == devfn + for (dn = busdn->child; dn; dn = dn->sibling) { + struct pci_dn *pdn = PCI_DN(dn); + if (pdn && pdn->devfn == devfn && of_device_available(dn)) - return rtas_write_config(dn, where, size, val); + return rtas_write_config(pdn, where, size, val); + } return PCIBIOS_DEVICE_NOT_FOUND; } @@ -221,7 +222,7 @@ /* Python's register file is 1 MB in size. 
*/ chip_regs = ioremap(reg_struct.address & ~(0xfffffUL), 0x100000); - /* + /* * Firmware doesn't always clear this bit which is critical * for good performance - Anton */ @@ -292,7 +293,7 @@ if (bus_range == NULL || len < 2 * sizeof(int)) { return 1; } - + phb->first_busno = bus_range[0]; phb->last_busno = bus_range[1]; From linas at linas.org Fri Nov 4 11:47:50 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:47:50 -0600 Subject: [PATCH 2/42]: ppc64: misc minor cleanup References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004750.GA26782@mail.gnucash.org> 02-eeh-minor-cleanup.patch This patch performs some minor cleanup of the eeh.c file, including: -- trim some trailing whitespace -- remove extraneous #includes -- use the macro PCI_DN uniformly, instead of the void pointer chase. -- typos in comments -- improved debug printk's Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 12:01:21.403477910 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:06:16.222121166 -0600 @@ -1,32 +1,31 @@ /* * eeh.c * Copyright (C) 2001 Dave Engebretsen & Todd Inglett IBM Corporation - * + * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. - * + * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
- * + * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ -#include #include #include -#include #include #include #include #include #include #include +#include #include #include #include @@ -49,8 +48,8 @@ * were "empty": all reads return 0xff's and all writes are silently * ignored. EEH slot isolation events can be triggered by parity * errors on the address or data busses (e.g. during posted writes), - * which in turn might be caused by dust, vibration, humidity, - * radioactivity or plain-old failed hardware. + * which in turn might be caused by low voltage on the bus, dust, + * vibration, humidity, radioactivity or plain-old failed hardware. * * Note, however, that one of the leading causes of EEH slot * freeze events are buggy device drivers, buggy device microcode, @@ -256,18 +255,17 @@ dn = pci_device_to_OF_node(dev); if (!dn) { - printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", - pci_name(dev)); + printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev)); return; } /* Skip any devices for which EEH is not enabled. 
*/ - pdn = dn->data; + pdn = PCI_DN(dn); if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || pdn->eeh_mode & EEH_MODE_NOCHECK) { #ifdef DEBUG - printk(KERN_INFO "PCI: skip building address cache for=%s\n", - pci_name(dev)); + printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n", + pci_name(dev), pdn->node->full_name); #endif return; } @@ -410,16 +408,16 @@ * @dn: device node to read * @rets: array to return results in */ -static int read_slot_reset_state(struct device_node *dn, int rets[]) +static int read_slot_reset_state(struct pci_dn *pdn, int rets[]) { int token, outputs; - struct pci_dn *pdn = dn->data; if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) { token = ibm_read_slot_reset_state2; outputs = 4; } else { token = ibm_read_slot_reset_state; + rets[2] = 0; /* fake PE Unavailable info */ outputs = 3; } @@ -496,7 +494,7 @@ /** * eeh_token_to_phys - convert EEH address token to phys address - * @token i/o token, should be address in the form 0xE.... + * @token i/o token, should be address in the form 0xA.... */ static inline unsigned long eeh_token_to_phys(unsigned long token) { @@ -522,7 +520,7 @@ * will query firmware for the EEH status. * * Returns 0 if there has not been an EEH error; otherwise returns - * a non-zero value and queues up a solt isolation event notification. + * a non-zero value and queues up a slot isolation event notification. * * It is safe to call this routine in an interrupt context. */ @@ -542,7 +540,7 @@ if (!dn) return 0; - pdn = dn->data; + pdn = PCI_DN(dn); /* Access to IO BARs might get this far and still not want checking. 
*/ if (!pdn->eeh_capable || !(pdn->eeh_mode & EEH_MODE_SUPPORTED) || @@ -562,7 +560,7 @@ atomic_inc(&eeh_fail_count); if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) { /* re-read the slot reset state */ - if (read_slot_reset_state(dn, rets) != 0) + if (read_slot_reset_state(pdn, rets) != 0) rets[0] = -1; /* reset state unknown */ eeh_panic(dev, rets[0]); } @@ -576,7 +574,7 @@ * function zero of a multi-function device. * In any case they must share a common PHB. */ - ret = read_slot_reset_state(dn, rets); + ret = read_slot_reset_state(pdn, rets); if (!(ret == 0 && rets[1] == 1 && (rets[0] == 2 || rets[0] == 4))) { __get_cpu_var(false_positives)++; return 0; @@ -635,7 +633,6 @@ * @token i/o token, should be address in the form 0xA.... * @val value, should be all 1's (XXX why do we need this arg??) * - * Check for an eeh failure at the given token address. * Check for an EEH failure at the given token address. Call this * routine if the result of a read was all 0xff's and you want to * find out if this is due to an EEH slot freeze event. This routine @@ -680,7 +677,7 @@ u32 *device_id = (u32 *)get_property(dn, "device-id", NULL); u32 *regs; int enable; - struct pci_dn *pdn = dn->data; + struct pci_dn *pdn = PCI_DN(dn); pdn->eeh_mode = 0; @@ -732,7 +729,7 @@ /* This device doesn't support EEH, but it may have an * EEH parent, in which case we mark it as supported. */ - if (dn->parent && dn->parent->data + if (dn->parent && PCI_DN(dn->parent) && (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { /* Parent supports EEH. 
*/ pdn->eeh_mode |= EEH_MODE_SUPPORTED; @@ -745,7 +742,7 @@ dn->full_name); } - return NULL; + return NULL; } /* @@ -793,13 +790,11 @@ for (phb = of_find_node_by_name(NULL, "pci"); phb; phb = of_find_node_by_name(phb, "pci")) { unsigned long buid; - struct pci_dn *pci; buid = get_phb_buid(phb); - if (buid == 0 || phb->data == NULL) + if (buid == 0 || PCI_DN(phb) == NULL) continue; - pci = phb->data; info.buid_lo = BUID_LO(buid); info.buid_hi = BUID_HI(buid); traverse_pci_devices(phb, early_enable_eeh, &info); @@ -828,11 +823,13 @@ struct pci_controller *phb; struct eeh_early_enable_info info; - if (!dn || !dn->data) + if (!dn || !PCI_DN(dn)) return; phb = PCI_DN(dn)->phb; if (NULL == phb || 0 == phb->buid) { - printk(KERN_WARNING "EEH: Expected buid but found none\n"); + printk(KERN_WARNING "EEH: Expected buid but found none for %s\n", + dn->full_name); + dump_stack(); return; } From linas at linas.org Fri Nov 4 11:48:45 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:48:45 -0600 Subject: [PATCH 3/42]: ppc64: PCI address cache minor fixes References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004845.GA26803@mail.gnucash.org> 03-eeh-addr-cache-cleanup.patch This is a minor patch to clean up a buglet related to the PCI address cache. (The buglet doesn't manifest itself unless there are also bugs elsewhere, which is why it's minor.) Also: -- Improved debug printing.
-- Declare some private routines as static -- Adds reference counting to struct pci_dn->pcidev structure Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 12:07:15.072864803 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:10:23.985360685 -0600 @@ -219,9 +219,9 @@ while (*p) { parent = *p; piar = rb_entry(parent, struct pci_io_addr_range, rb_node); - if (alo < piar->addr_lo) { + if (ahi < piar->addr_lo) { p = &parent->rb_left; - } else if (ahi > piar->addr_hi) { + } else if (alo > piar->addr_hi) { p = &parent->rb_right; } else { if (dev != piar->pcidev || @@ -240,6 +240,11 @@ piar->pcidev = dev; piar->flags = flags; +#ifdef DEBUG + printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", + alo, ahi, pci_name (dev)); +#endif + rb_link_node(&piar->rb_node, parent, p); rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); @@ -301,7 +306,7 @@ * we maintain a cache of devices that can be quickly searched. * This routine adds a device to that cache. */ -void pci_addr_cache_insert_device(struct pci_dev *dev) +static void pci_addr_cache_insert_device(struct pci_dev *dev) { unsigned long flags; @@ -344,7 +349,7 @@ * the tree multiple times (once per resource). * But so what; device removal doesn't need to be that fast. 
*/ -void pci_addr_cache_remove_device(struct pci_dev *dev) +static void pci_addr_cache_remove_device(struct pci_dev *dev) { unsigned long flags; @@ -366,6 +371,9 @@ { struct pci_dev *dev = NULL; + if (!eeh_subsystem_enabled) + return; + spin_lock_init(&pci_io_addr_cache_root.piar_lock); while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) { @@ -837,7 +845,7 @@ info.buid_lo = BUID_LO(phb->buid); early_enable_eeh(dn, &info); } -EXPORT_SYMBOL(eeh_add_device_early); +EXPORT_SYMBOL_GPL(eeh_add_device_early); /** * eeh_add_device_late - perform EEH initialization for the indicated pci device @@ -848,6 +856,8 @@ */ void eeh_add_device_late(struct pci_dev *dev) { + struct device_node *dn; + if (!dev || !eeh_subsystem_enabled) return; @@ -855,9 +865,13 @@ printk(KERN_DEBUG "EEH: adding device %s\n", pci_name(dev)); #endif + pci_dev_get (dev); + dn = pci_device_to_OF_node(dev); + PCI_DN(dn)->pcidev = dev; + pci_addr_cache_insert_device (dev); } -EXPORT_SYMBOL(eeh_add_device_late); +EXPORT_SYMBOL_GPL(eeh_add_device_late); /** * eeh_remove_device - undo EEH setup for the indicated pci device @@ -868,6 +882,7 @@ */ void eeh_remove_device(struct pci_dev *dev) { + struct device_node *dn; if (!dev || !eeh_subsystem_enabled) return; @@ -876,8 +891,12 @@ printk(KERN_DEBUG "EEH: remove device %s\n", pci_name(dev)); #endif pci_addr_cache_remove_device(dev); + + dn = pci_device_to_OF_node(dev); + PCI_DN(dn)->pcidev = NULL; + pci_dev_put (dev); } -EXPORT_SYMBOL(eeh_remove_device); +EXPORT_SYMBOL_GPL(eeh_remove_device); static int proc_eeh_show(struct seq_file *m, void *v) { Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-10-31 12:01:21.404477769 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-10-31 12:10:06.152862619 -0600 @@ -39,10 +39,6 @@ void pci_devs_phb_init(void); void pci_devs_phb_init_dynamic(struct pci_controller 
*phb); -/* PCI address cache management routines */ -void pci_addr_cache_insert_device(struct pci_dev *dev); -void pci_addr_cache_remove_device(struct pci_dev *dev); - /* From rtas_pci.h */ void init_pci_config_tokens (void); unsigned long get_phb_buid (struct device_node *); From linas at linas.org Fri Nov 4 11:48:52 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:48:52 -0600 Subject: [PATCH 4/42]: ppc64: PCI error rate statistics References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004852.GA26811@mail.gnucash.org> 04-eeh-statistics.patch This minor patch adds some statistics-gathering counters that allow the behaviour of the EEH subsystem to be monitored. While far from perfect, it does provide a rudimentary device that makes understanding of the current state of the system a bit easier. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 12:10:23.985360685 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:11:57.134291514 -0600 @@ -102,6 +102,10 @@ static int eeh_error_buf_size; /* System monitoring statistics */ +static DEFINE_PER_CPU(unsigned long, no_device); +static DEFINE_PER_CPU(unsigned long, no_dn); +static DEFINE_PER_CPU(unsigned long, no_cfg_addr); +static DEFINE_PER_CPU(unsigned long, ignored_check); static DEFINE_PER_CPU(unsigned long, total_mmio_ffs); static DEFINE_PER_CPU(unsigned long, false_positives); static DEFINE_PER_CPU(unsigned long, ignored_failures); @@ -493,8 +497,6 @@ notifier_call_chain (&eeh_notifier_chain, EEH_NOTIFY_FREEZE, event); - __get_cpu_var(slot_resets)++; - pci_dev_put(event->dev); kfree(event); } @@ -546,17 +548,24 @@ if (!eeh_subsystem_enabled) return 0; - if (!dn) + if (!dn) { + __get_cpu_var(no_dn)++; return 0; + } pdn = PCI_DN(dn); /* Access to IO BARs might get this far and still not want checking.
*/ if (!pdn->eeh_capable || !(pdn->eeh_mode & EEH_MODE_SUPPORTED) || pdn->eeh_mode & EEH_MODE_NOCHECK) { + __get_cpu_var(ignored_check)++; +#ifdef DEBUG + printk ("EEH:ignored check for %s %s\n", pci_name (dev), dn->full_name); +#endif return 0; } if (!pdn->eeh_config_addr) { + __get_cpu_var(no_cfg_addr)++; return 0; } @@ -590,6 +599,7 @@ /* prevent repeated reports of this failure */ pdn->eeh_mode |= EEH_MODE_ISOLATED; + __get_cpu_var(slot_resets)++; reset_state = rets[0]; @@ -657,8 +667,10 @@ /* Finding the phys addr + pci device; this is pretty quick. */ addr = eeh_token_to_phys((unsigned long __force) token); dev = pci_get_device_by_addr(addr); - if (!dev) + if (!dev) { + __get_cpu_var(no_device)++; return val; + } dn = pci_device_to_OF_node(dev); eeh_dn_check_failure (dn, dev); @@ -903,12 +915,17 @@ unsigned int cpu; unsigned long ffs = 0, positives = 0, failures = 0; unsigned long resets = 0; + unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0; for_each_cpu(cpu) { ffs += per_cpu(total_mmio_ffs, cpu); positives += per_cpu(false_positives, cpu); failures += per_cpu(ignored_failures, cpu); resets += per_cpu(slot_resets, cpu); + no_dev += per_cpu(no_device, cpu); + no_dn += per_cpu(no_dn, cpu); + no_cfg += per_cpu(no_cfg_addr, cpu); + no_check += per_cpu(ignored_check, cpu); } if (0 == eeh_subsystem_enabled) { @@ -916,13 +933,17 @@ seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs); } else { seq_printf(m, "EEH Subsystem is enabled\n"); - seq_printf(m, "eeh_total_mmio_ffs=%ld\n" - "eeh_false_positives=%ld\n" - "eeh_ignored_failures=%ld\n" - "eeh_slot_resets=%ld\n" - "eeh_fail_count=%d\n", - ffs, positives, failures, resets, - eeh_fail_count.counter); + seq_printf(m, + "no device=%ld\n" + "no device node=%ld\n" + "no config address=%ld\n" + "check not wanted=%ld\n" + "eeh_total_mmio_ffs=%ld\n" + "eeh_false_positives=%ld\n" + "eeh_ignored_failures=%ld\n" + "eeh_slot_resets=%ld\n", + no_dev, no_dn, no_cfg, no_check, + ffs, positives, failures, resets); } 
return 0; From linas at linas.org Fri Nov 4 11:49:01 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:49:01 -0600 Subject: [PATCH 5/42]: ppc64: RTAS error reporting restructuring References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004901.GA26819@mail.gnucash.org> 05-eeh-slot-error-detail.patch This patch encapsulates a section of code that reports the EEH event. The new subroutine can be used in several places to report the error. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 12:11:57.134291514 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:13:09.282168648 -0600 @@ -397,6 +397,28 @@ /* --------------------------------------------------------------- */ /* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */ +void eeh_slot_error_detail (struct pci_dn *pdn, int severity) +{ + unsigned long flags; + int rc; + + /* Log the error with the rtas logger */ + spin_lock_irqsave(&slot_errbuf_lock, flags); + memset(slot_errbuf, 0, eeh_error_buf_size); + + rc = rtas_call(ibm_slot_error_detail, + 8, 1, NULL, pdn->eeh_config_addr, + BUID_HI(pdn->phb->buid), + BUID_LO(pdn->phb->buid), NULL, 0, + virt_to_phys(slot_errbuf), + eeh_error_buf_size, + severity); + + if (rc == 0) + log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); + spin_unlock_irqrestore(&slot_errbuf_lock, flags); +} + /** * eeh_register_notifier - Register to find out about EEH events. * @nb: notifier block to callback on events @@ -454,9 +476,12 @@ * Since the panic_on_oops sysctl is used to halt the system * in light of potential corruption, we can use it here. 
*/ - if (panic_on_oops) + if (panic_on_oops) { + struct device_node *dn = pci_device_to_OF_node(dev); + eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */); panic("EEH: MMIO failure (%d) on device:%s\n", reset_state, pci_name(dev)); + } else { __get_cpu_var(ignored_failures)++; printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n", @@ -539,7 +564,7 @@ int ret; int rets[3]; unsigned long flags; - int rc, reset_state; + int reset_state; struct eeh_event *event; struct pci_dn *pdn; @@ -603,20 +628,7 @@ reset_state = rets[0]; - spin_lock_irqsave(&slot_errbuf_lock, flags); - memset(slot_errbuf, 0, eeh_error_buf_size); - - rc = rtas_call(ibm_slot_error_detail, - 8, 1, NULL, pdn->eeh_config_addr, - BUID_HI(pdn->phb->buid), - BUID_LO(pdn->phb->buid), NULL, 0, - virt_to_phys(slot_errbuf), - eeh_error_buf_size, - 1 /* Temporary Error */); - - if (rc == 0) - log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); - spin_unlock_irqrestore(&slot_errbuf_lock, flags); + eeh_slot_error_detail (pdn, 1 /* Temporary Error */); printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n", rets[0], dn->name, dn->full_name); @@ -783,6 +795,8 @@ struct device_node *phb, *np; struct eeh_early_enable_info info; + spin_lock_init(&slot_errbuf_lock); + np = of_find_node_by_path("/rtas"); if (np == NULL) return; From linas at linas.org Fri Nov 4 11:49:15 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:49:15 -0600 Subject: [PATCH 6/42]: ppc64: avoid PCI error reporting for empty slots References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004915.GA26827@mail.gnucash.org> 06-eeh-empty-slot-error.patch Performing PCI config-space reads to empty PCI slots can lead to reports of "permanent failure" from the firmware. Ignore permanent failures on empty slots. 
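The filtering described above can be sketched as a small user-space C model. This is only an illustration under stated assumptions, not the kernel code: struct node and classify_slot_state() are hypothetical names, and the meaning of the firmware return values is inferred from the checks in the diff below (rets[1] == 1 taken to mean EEH is supported, rets[0] of 2, 4 or 5 taken to be the states the code acts on, with 5 apparently the permanent-failure case).

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct device_node;
 * only the child pointer matters for this sketch. */
struct node {
	struct node *child;
};

/* Returns 1 when the slot state should be reported as an EEH event,
 * 0 when it should be counted as a false positive.
 * ret  models the return code of the firmware call,
 * rets models the values the firmware call filled in. */
static int classify_slot_state(int ret, const int rets[3],
			       const struct node *dn)
{
	/* The call to firmware failed: nothing reliable to act on. */
	if (ret != 0)
		return 0;

	/* EEH is not supported on this device. */
	if (rets[1] != 1)
		return 0;

	/* Not one of the states the code knows how to handle. */
	if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5)
		return 0;

	/* A state-5 report on a node without children is assumed to be
	 * an empty slot, so it is ignored rather than reported. */
	if (rets[0] == 5 && dn->child == NULL)
		return 0;

	return 1;
}
```

In this model a state-5 report is still treated as a real event when the node has children, which matches the intent of ignoring only empty slots.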
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 12:13:09.282168648 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:15:26.162962756 -0600 @@ -617,7 +617,32 @@ * In any case they must share a common PHB. */ ret = read_slot_reset_state(pdn, rets); - if (!(ret == 0 && rets[1] == 1 && (rets[0] == 2 || rets[0] == 4))) { + + /* If the call to firmware failed, punt */ + if (ret != 0) { + printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n", + ret, dn->full_name); + __get_cpu_var(false_positives)++; + return 0; + } + + /* If EEH is not supported on this device, punt. */ + if (rets[1] != 1) { + printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n", + ret, dn->full_name); + __get_cpu_var(false_positives)++; + return 0; + } + + /* If not the kind of error we know about, punt. */ + if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) { + __get_cpu_var(false_positives)++; + return 0; + } + + /* Note that config-io to empty slots may fail; + * we recognize empty because they don't have children. */ + if ((rets[0] == 5) && (dn->child == NULL)) { __get_cpu_var(false_positives)++; return 0; } @@ -650,7 +675,7 @@ /* Most EEH events are due to device driver bugs. Having * a stack trace will help the device-driver authors figure * out what happened. So print that out. */ - dump_stack(); + if (rets[0] != 5) dump_stack(); schedule_work(&eeh_event_wq); return 0; From linas at linas.org Fri Nov 4 11:49:23 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:49:23 -0600 Subject: [PATCH 7/42]: ppc64: serialize reports of PCI errors References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004923.GA26835@mail.gnucash.org> 07-eeh-report-race.patch When a PCI slot is isolated, all PCI functions under that slot are affected. 
If these functions have separate device drivers, the EEH isolation event might be reported multiple times. This patch adds a lock to prevent the racing of such multiple reports. It also marks every device under the slot as having experienced an EEH event, so that multiple reports may be recognized more easily. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 12:15:26.162962756 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:16:19.766441392 -0600 @@ -96,6 +96,9 @@ static int eeh_subsystem_enabled; +/* Lock to avoid races due to multiple reports of an error */ +static DEFINE_SPINLOCK(confirm_error_lock); + /* Buffer for reporting slot-error-detail rtas calls */ static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX]; static DEFINE_SPINLOCK(slot_errbuf_lock); @@ -544,6 +547,55 @@ return pa | (token & (PAGE_SIZE-1)); } +/** + * Return the "partitionable endpoint" (pe) under which this device lies + */ +static struct device_node * find_device_pe(struct device_node *dn) +{ + while ((dn->parent) && PCI_DN(dn->parent) && + (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { + dn = dn->parent; + } + return dn; +} + +/** Mark all devices that are peers of this device as failed. + * Mark the device driver too, so that it can see the failure + * immediately; this is critical, since some drivers poll + * status registers in interrupts ... If a driver is polling, + * and the slot is frozen, then the driver can deadlock in + * an interrupt context, which is bad.
+ */ + +static inline void __eeh_mark_slot (struct device_node *dn) +{ + while (dn) { + PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED; + + if (dn->child) + __eeh_mark_slot (dn->child); + dn = dn->sibling; + } +} + +static inline void __eeh_clear_slot (struct device_node *dn) +{ + while (dn) { + PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED; + if (dn->child) + __eeh_clear_slot (dn->child); + dn = dn->sibling; + } +} + +static inline void eeh_clear_slot (struct device_node *dn) +{ + unsigned long flags; + spin_lock_irqsave(&confirm_error_lock, flags); + __eeh_clear_slot (dn); + spin_unlock_irqrestore(&confirm_error_lock, flags); +} + /** * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze * @dn device node @@ -567,6 +619,8 @@ int reset_state; struct eeh_event *event; struct pci_dn *pdn; + struct device_node *pe_dn; + int rc = 0; __get_cpu_var(total_mmio_ffs)++; @@ -594,10 +648,14 @@ return 0; } - /* - * If we already have a pending isolation event for this - * slot, we know it's bad already, we don't need to check... + /* If we already have a pending isolation event for this + * slot, we know it's bad already, we don't need to check. + * Do this checking under a lock; as multiple PCI devices + * in one slot might report errors simultaneously, and we + * only want one error recovery routine running. */ + spin_lock_irqsave(&confirm_error_lock, flags); + rc = 1; if (pdn->eeh_mode & EEH_MODE_ISOLATED) { atomic_inc(&eeh_fail_count); if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) { @@ -606,7 +664,7 @@ rets[0] = -1; /* reset state unknown */ eeh_panic(dev, rets[0]); } - return 0; + goto dn_unlock; } /* @@ -623,7 +681,8 @@ printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n", ret, dn->full_name); __get_cpu_var(false_positives)++; - return 0; + rc = 0; + goto dn_unlock; } /* If EEH is not supported on this device, punt. 
*/ @@ -631,25 +690,33 @@ printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n", ret, dn->full_name); __get_cpu_var(false_positives)++; - return 0; + rc = 0; + goto dn_unlock; } /* If not the kind of error we know about, punt. */ if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) { __get_cpu_var(false_positives)++; - return 0; + rc = 0; + goto dn_unlock; } /* Note that config-io to empty slots may fail; * we recognize empty because they don't have children. */ if ((rets[0] == 5) && (dn->child == NULL)) { __get_cpu_var(false_positives)++; - return 0; + rc = 0; + goto dn_unlock; } - /* prevent repeated reports of this failure */ - pdn->eeh_mode |= EEH_MODE_ISOLATED; - __get_cpu_var(slot_resets)++; + __get_cpu_var(slot_resets)++; + + /* Avoid repeated reports of this failure, including problems + * with other functions on this device, and functions under + * bridges. */ + pe_dn = find_device_pe (dn); + __eeh_mark_slot (pe_dn); + spin_unlock_irqrestore(&confirm_error_lock, flags); reset_state = rets[0]; @@ -678,10 +745,14 @@ if (rets[0] != 5) dump_stack(); schedule_work(&eeh_event_wq); - return 0; + return 1; + +dn_unlock: + spin_unlock_irqrestore(&confirm_error_lock, flags); + return rc; } -EXPORT_SYMBOL(eeh_dn_check_failure); +EXPORT_SYMBOL_GPL(eeh_dn_check_failure); /** * eeh_check_failure - check if all 1's data is due to EEH slot freeze @@ -820,6 +891,7 @@ struct device_node *phb, *np; struct eeh_early_enable_info info; + spin_lock_init(&confirm_error_lock); spin_lock_init(&slot_errbuf_lock); np = of_find_node_by_path("/rtas"); From linas at linas.org Fri Nov 4 11:49:31 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:49:31 -0600 Subject: [PATCH 8/42]: ppc64: escape hatch for spinning interrupt deadlocks References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004931.GA26844@mail.gnucash.org> 08-eeh-spin-counter.patch Once an EEH event is triggered, all further I/O to a device is blocked (until reset).
Bad device drivers may end up spinning in their interrupt handlers, trying to read an interrupt status register that will never change state. This patch moves that spin counter to a per-device structure, and adds some diagnostic prints to help locate the bad driver. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 12:16:19.766441392 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:18:21.924300428 -0600 @@ -78,14 +78,12 @@ static struct notifier_block *eeh_notifier_chain; -/* - * If a device driver keeps reading an MMIO register in an interrupt +/* If a device driver keeps reading an MMIO register in an interrupt * handler after a slot isolation event has occurred, we assume it * is broken and panic. This sets the threshold for how many read * attempts we allow before panicking. */ -#define EEH_MAX_FAILS 1000 -static atomic_t eeh_fail_count; +#define EEH_MAX_FAILS 100000 /* RTAS tokens */ static int ibm_set_eeh_option; @@ -521,7 +519,6 @@ "%s\n", event->reset_state, pci_name(event->dev)); - atomic_set(&eeh_fail_count, 0); notifier_call_chain (&eeh_notifier_chain, EEH_NOTIFY_FREEZE, event); @@ -657,12 +654,18 @@ spin_lock_irqsave(&confirm_error_lock, flags); rc = 1; if (pdn->eeh_mode & EEH_MODE_ISOLATED) { - atomic_inc(&eeh_fail_count); - if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) { + pdn->eeh_check_count ++; + if (pdn->eeh_check_count >= EEH_MAX_FAILS) { + printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicing\n", + pdn->eeh_check_count); + dump_stack(); + /* re-read the slot reset state */ if (read_slot_reset_state(pdn, rets) != 0) rets[0] = -1; /* reset state unknown */ - eeh_panic(dev, rets[0]); + + /* If we are here, then we hit an infinite loop. Stop. 
*/ + panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], pci_name(dev)); } goto dn_unlock; } @@ -808,6 +811,8 @@ struct pci_dn *pdn = PCI_DN(dn); pdn->eeh_mode = 0; + pdn->eeh_check_count = 0; + pdn->eeh_freeze_count = 0; if (status && strcmp(status, "ok") != 0) return NULL; /* ignore devices with bad status */ From linas at linas.org Fri Nov 4 11:49:38 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:49:38 -0600 Subject: [PATCH 9/42]: ppc64: bugfix: crash on PCI hotplug References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004938.GA26852@mail.gnucash.org> 09-hotplug-bugfix.patch In the current 2.6.14-rc2-git6 kernel, performing a Dynamic LPAR Add of a hotplug slot will crash the system, with the following (abbreviated) stack trace: cpu 0x3: Vector: 700 (Program Check) at [c000000053dff7f0] pc: c0000000004f5974: .__alloc_bootmem+0x0/0xb0 lr: c0000000000258a0: .update_dn_pci_info+0x108/0x118 c0000000000257c8 .update_dn_pci_info+0x30/0x118 (unreliable) c0000000000258fc .pci_dn_reconfig_notifier+0x4c/0x64 c000000000060754 .notifier_call_chain+0x68/0x9c The root cause was that __init __alloc_bootmem() was called long after boot had finished, resulting in a crash because this routine is undefined after boot time. The patch below fixes this crash, and adds some docs to clarify the code. p.s. congrats to all for getting slashdotted on this yesterday! 
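The allocator choice at the heart of this fix can be modelled in user space. The sketch below is an assumption-laden illustration, not the kernel implementation: mem_init_done_model, boot_alloc(), alloc_pdn_model() and from_boot_arena() are hypothetical names, a static arena stands in for the boot-time allocator, and calloc() stands in for kmalloc(). The point it shows is that the decision is keyed on whether the memory subsystem is up, not on a property of the PHB.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's mem_init_done flag. */
static int mem_init_done_model;

/* A small static arena playing the role of boot-time memory. */
static _Alignas(max_align_t) unsigned char boot_arena[4096];
static size_t boot_used;

/* Bump allocator: only meaningful before the "real" allocator is up. */
static void *boot_alloc(size_t size)
{
	size_t aligned;
	void *p;

	if (size == 0 || size > sizeof(boot_arena))
		return NULL;
	aligned = (size + 15u) & ~(size_t)15u;
	if (aligned > sizeof(boot_arena) - boot_used)
		return NULL;
	p = boot_arena + boot_used;
	boot_used += aligned;
	memset(p, 0, size);
	return p;
}

/* Pick the allocator by system state, as the patch does with
 * mem_init_done, rather than by a per-bridge flag. */
static void *alloc_pdn_model(size_t size)
{
	if (mem_init_done_model)
		return calloc(1, size);   /* models kmalloc() */
	return boot_alloc(size);          /* models alloc_bootmem() */
}

/* Helper for inspection: did this pointer come from the boot arena? */
static int from_boot_arena(const void *p)
{
	uintptr_t a = (uintptr_t)p;
	uintptr_t lo = (uintptr_t)boot_arena;

	return a >= lo && a < lo + sizeof(boot_arena);
}
```

Calling alloc_pdn_model() before setting mem_init_done_model returns memory from the arena; after the flag is set, the same call goes to the heap, which is the behaviour a hotplug add after boot needs.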
Signed-off-by: Linas Vepstas Mailed to: paulus at samba.org CC: linuxppc64-dev at ozlabs.org, linux-kernel at vger.kernel.org, johnrose at linux.ibm.com On Monday 3 October 2005 revised on 4 October to [PATCH 1/2] ppc64: Crash in DLPAR code on PCI hotplug add Index: linux-2.6.14-git3/arch/ppc64/kernel/pci_dn.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/pci_dn.c 2005-10-31 12:19:03.211506966 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/pci_dn.c 2005-10-31 12:19:47.420303479 -0600 @@ -43,7 +43,7 @@ u32 *regs; struct pci_dn *pdn; - if (phb->is_dynamic) + if (mem_init_done) pdn = kmalloc(sizeof(*pdn), GFP_KERNEL); else pdn = alloc_bootmem(sizeof(*pdn)); @@ -120,6 +120,14 @@ return NULL; } +/** + * pci_devs_phb_init_dynamic - setup pci devices under this PHB + * phb: pci-to-host bridge (top-level bridge connecting to cpu) + * + * This routine is called both during boot, (before the memory + * subsystem is set up, before kmalloc is valid) and during the + * dynamic lpar operation of adding a PHB to a running system. + */ void __devinit pci_devs_phb_init_dynamic(struct pci_controller *phb) { struct device_node * dn = (struct device_node *) phb->arch_data; @@ -200,9 +208,14 @@ .notifier_call = pci_dn_reconfig_notifier, }; -/* - * Actually initialize the phbs. - * The buswalk on this phb has not happened yet. +/** + * pci_devs_phb_init - Initialize phbs and pci devs under them. + * + * This routine walks over all phb's (pci-host bridges) on the + * system, and sets up assorted pci-related structures + * (including pci info in the device node structs) for each + * pci device found underneath. This routine runs once, + * early in the boot sequence.
*/ void __init pci_devs_phb_init(void) { From linas at linas.org Fri Nov 4 11:49:45 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:49:45 -0600 Subject: [PATCH 10/42]: ppc64: bugfix: don't silently ignore PCI errors References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004945.GA26860@mail.gnucash.org> 10-EEH-enable-bugfix.patch Bugfix: With the current linux-2.6.14-rc2-git6, EEH errors are ignored because their detection requires an unused, uninitialized flag to be set. This patch removes the unused flag. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-10-31 12:54:20.919034814 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c 2005-10-31 12:54:48.165215962 -0600 @@ -631,11 +631,12 @@ pdn = PCI_DN(dn); /* Access to IO BARs might get this far and still not want checking. */ - if (!pdn->eeh_capable || !(pdn->eeh_mode & EEH_MODE_SUPPORTED) || + if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || pdn->eeh_mode & EEH_MODE_NOCHECK) { __get_cpu_var(ignored_check)++; #ifdef DEBUG - printk ("EEH:ignored check for %s %s\n", pci_name (dev), dn->full_name); + printk ("EEH:ignored check (%x) for %s %s\n", + pdn->eeh_mode, pci_name (dev), dn->full_name); #endif return 0; } Index: linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-ppc64/pci-bridge.h 2005-10-31 12:54:20.919034814 -0600 +++ linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h 2005-10-31 12:54:48.167215682 -0600 @@ -63,7 +63,6 @@ int devfn; /* for pci devices */ int eeh_mode; /* See eeh.h for possible EEH_MODEs */ int eeh_config_addr; - int eeh_capable; /* from firmware */ int eeh_check_count; /* # times driver ignored error */ int eeh_freeze_count; /* # times this device froze up.
*/ int eeh_is_bridge; /* device is pci-to-pci bridge */ From linas at linas.org Fri Nov 4 11:49:51 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:49:51 -0600 Subject: [PATCH 11/42]: ppc64: move code to powerpc directory from ppc64 References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104004951.GA26868@mail.gnucash.org> 11-eeh-move-to-powerpc.patch Move arch/ppc64/kernel/eeh.c to arch/powerpc/platforms/pseries/eeh.c No other changes (except for Makefile to build it) Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-11-02 14:29:22.485829789 -0600 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,1093 +0,0 @@ -/* - * eeh.c - * Copyright (C) 2001 Dave Engebretsen & Todd Inglett IBM Corporation - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details.
- * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#undef DEBUG - -/** Overview: - * EEH, or "Extended Error Handling" is a PCI bridge technology for - * dealing with PCI bus errors that can't be dealt with within the - * usual PCI framework, except by check-stopping the CPU. Systems - * that are designed for high-availability/reliability cannot afford - * to crash due to a "mere" PCI error, thus the need for EEH. - * An EEH-capable bridge operates by converting a detected error - * into a "slot freeze", taking the PCI adapter off-line, making - * the slot behave, from the OS'es point of view, as if the slot - * were "empty": all reads return 0xff's and all writes are silently - * ignored. EEH slot isolation events can be triggered by parity - * errors on the address or data busses (e.g. during posted writes), - * which in turn might be caused by low voltage on the bus, dust, - * vibration, humidity, radioactivity or plain-old failed hardware. - * - * Note, however, that one of the leading causes of EEH slot - * freeze events are buggy device drivers, buggy device microcode, - * or buggy device hardware. This is because any attempt by the - * device to bus-master data to a memory address that is not - * assigned to the device will trigger a slot freeze. (The idea - * is to prevent devices-gone-wild from corrupting system memory). - * Buggy hardware/drivers will have a miserable time co-existing - * with EEH. - * - * Ideally, a PCI device driver, when suspecting that an isolation - * event has occured (e.g. 
by reading 0xff's), will then ask EEH - * whether this is the case, and then take appropriate steps to - * reset the PCI slot, the PCI device, and then resume operations. - * However, until that day, the checking is done here, with the - * eeh_check_failure() routine embedded in the MMIO macros. If - * the slot is found to be isolated, an "EEH Event" is synthesized - * and sent out for processing. - */ - -/* EEH event workqueue setup. */ -static DEFINE_SPINLOCK(eeh_eventlist_lock); -LIST_HEAD(eeh_eventlist); -static void eeh_event_handler(void *); -DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL); - -static struct notifier_block *eeh_notifier_chain; - -/* If a device driver keeps reading an MMIO register in an interrupt - * handler after a slot isolation event has occurred, we assume it - * is broken and panic. This sets the threshold for how many read - * attempts we allow before panicking. - */ -#define EEH_MAX_FAILS 100000 - -/* RTAS tokens */ -static int ibm_set_eeh_option; -static int ibm_set_slot_reset; -static int ibm_read_slot_reset_state; -static int ibm_read_slot_reset_state2; -static int ibm_slot_error_detail; - -static int eeh_subsystem_enabled; - -/* Lock to avoid races due to multiple reports of an error */ -static DEFINE_SPINLOCK(confirm_error_lock); - -/* Buffer for reporting slot-error-detail rtas calls */ -static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX]; -static DEFINE_SPINLOCK(slot_errbuf_lock); -static int eeh_error_buf_size; - -/* System monitoring statistics */ -static DEFINE_PER_CPU(unsigned long, no_device); -static DEFINE_PER_CPU(unsigned long, no_dn); -static DEFINE_PER_CPU(unsigned long, no_cfg_addr); -static DEFINE_PER_CPU(unsigned long, ignored_check); -static DEFINE_PER_CPU(unsigned long, total_mmio_ffs); -static DEFINE_PER_CPU(unsigned long, false_positives); -static DEFINE_PER_CPU(unsigned long, ignored_failures); -static DEFINE_PER_CPU(unsigned long, slot_resets); - -/** - * The pci address cache subsystem. 
This subsystem places - * PCI device address resources into a red-black tree, sorted - * according to the address range, so that given only an i/o - * address, the corresponding PCI device can be **quickly** - * found. It is safe to perform an address lookup in an interrupt - * context; this ability is an important feature. - * - * Currently, the only customer of this code is the EEH subsystem; - * thus, this code has been somewhat tailored to suit EEH better. - * In particular, the cache does *not* hold the addresses of devices - * for which EEH is not enabled. - * - * (Implementation Note: The RB tree seems to be better/faster - * than any hash algo I could think of for this problem, even - * with the penalty of slow pointer chases for d-cache misses). - */ -struct pci_io_addr_range -{ - struct rb_node rb_node; - unsigned long addr_lo; - unsigned long addr_hi; - struct pci_dev *pcidev; - unsigned int flags; -}; - -static struct pci_io_addr_cache -{ - struct rb_root rb_root; - spinlock_t piar_lock; -} pci_io_addr_cache_root; - -static inline struct pci_dev *__pci_get_device_by_addr(unsigned long addr) -{ - struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node; - - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - - if (addr < piar->addr_lo) { - n = n->rb_left; - } else { - if (addr > piar->addr_hi) { - n = n->rb_right; - } else { - pci_dev_get(piar->pcidev); - return piar->pcidev; - } - } - } - - return NULL; -} - -/** - * pci_get_device_by_addr - Get device, given only address - * @addr: mmio (PIO) phys address or i/o port number - * - * Given an mmio phys address, or a port number, find a pci device - * that implements this address. Be sure to pci_dev_put the device - * when finished. I/O port numbers are assumed to be offset - * from zero (that is, they do *not* have pci_io_addr added in). - * It is safe to call this function within an interrupt. 
- */ -static struct pci_dev *pci_get_device_by_addr(unsigned long addr) -{ - struct pci_dev *dev; - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - dev = __pci_get_device_by_addr(addr); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); - return dev; -} - -#ifdef DEBUG -/* - * Handy-dandy debug print routine, does nothing more - * than print out the contents of our addr cache. - */ -static void pci_addr_cache_print(struct pci_io_addr_cache *cache) -{ - struct rb_node *n; - int cnt = 0; - - n = rb_first(&cache->rb_root); - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - printk(KERN_DEBUG "PCI: %s addr range %d [%lx-%lx]: %s\n", - (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt, - piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev)); - cnt++; - n = rb_next(n); - } -} -#endif - -/* Insert address range into the rb tree. */ -static struct pci_io_addr_range * -pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo, - unsigned long ahi, unsigned int flags) -{ - struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node; - struct rb_node *parent = NULL; - struct pci_io_addr_range *piar; - - /* Walk tree, find a place to insert into tree */ - while (*p) { - parent = *p; - piar = rb_entry(parent, struct pci_io_addr_range, rb_node); - if (ahi < piar->addr_lo) { - p = &parent->rb_left; - } else if (alo > piar->addr_hi) { - p = &parent->rb_right; - } else { - if (dev != piar->pcidev || - alo != piar->addr_lo || ahi != piar->addr_hi) { - printk(KERN_WARNING "PIAR: overlapping address range\n"); - } - return piar; - } - } - piar = (struct pci_io_addr_range *)kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC); - if (!piar) - return NULL; - - piar->addr_lo = alo; - piar->addr_hi = ahi; - piar->pcidev = dev; - piar->flags = flags; - -#ifdef DEBUG - printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", - alo, ahi, pci_name (dev)); -#endif - - 
rb_link_node(&piar->rb_node, parent, p); - rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); - - return piar; -} - -static void __pci_addr_cache_insert_device(struct pci_dev *dev) -{ - struct device_node *dn; - struct pci_dn *pdn; - int i; - int inserted = 0; - - dn = pci_device_to_OF_node(dev); - if (!dn) { - printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev)); - return; - } - - /* Skip any devices for which EEH is not enabled. */ - pdn = PCI_DN(dn); - if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || - pdn->eeh_mode & EEH_MODE_NOCHECK) { -#ifdef DEBUG - printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n", - pci_name(dev), pdn->node->full_name); -#endif - return; - } - - /* The cache holds a reference to the device... */ - pci_dev_get(dev); - - /* Walk resources on this device, poke them into the tree */ - for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { - unsigned long start = pci_resource_start(dev,i); - unsigned long end = pci_resource_end(dev,i); - unsigned int flags = pci_resource_flags(dev,i); - - /* We are interested only bus addresses, not dma or other stuff */ - if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM))) - continue; - if (start == 0 || ~start == 0 || end == 0 || ~end == 0) - continue; - pci_addr_cache_insert(dev, start, end, flags); - inserted = 1; - } - - /* If there was nothing to add, the cache has no reference... */ - if (!inserted) - pci_dev_put(dev); -} - -/** - * pci_addr_cache_insert_device - Add a device to the address cache - * @dev: PCI device whose I/O addresses we are interested in. - * - * In order to support the fast lookup of devices based on addresses, - * we maintain a cache of devices that can be quickly searched. - * This routine adds a device to that cache. 
- */ -static void pci_addr_cache_insert_device(struct pci_dev *dev) -{ - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - __pci_addr_cache_insert_device(dev); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); -} - -static inline void __pci_addr_cache_remove_device(struct pci_dev *dev) -{ - struct rb_node *n; - int removed = 0; - -restart: - n = rb_first(&pci_io_addr_cache_root.rb_root); - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - - if (piar->pcidev == dev) { - rb_erase(n, &pci_io_addr_cache_root.rb_root); - removed = 1; - kfree(piar); - goto restart; - } - n = rb_next(n); - } - - /* The cache no longer holds its reference to this device... */ - if (removed) - pci_dev_put(dev); -} - -/** - * pci_addr_cache_remove_device - remove pci device from addr cache - * @dev: device to remove - * - * Remove a device from the addr-cache tree. - * This is potentially expensive, since it will walk - * the tree multiple times (once per resource). - * But so what; device removal doesn't need to be that fast. - */ -static void pci_addr_cache_remove_device(struct pci_dev *dev) -{ - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - __pci_addr_cache_remove_device(dev); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); -} - -/** - * pci_addr_cache_build - Build a cache of I/O addresses - * - * Build a cache of pci i/o addresses. This cache will be used to - * find the pci device that corresponds to a given address. - * This routine scans all pci busses to build the cache. - * Must be run late in boot process, after the pci controllers - * have been scaned for devices (after all device resources are known). 
- */ -void __init pci_addr_cache_build(void) -{ - struct pci_dev *dev = NULL; - - if (!eeh_subsystem_enabled) - return; - - spin_lock_init(&pci_io_addr_cache_root.piar_lock); - - while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) { - /* Ignore PCI bridges ( XXX why ??) */ - if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE) { - continue; - } - pci_addr_cache_insert_device(dev); - } - -#ifdef DEBUG - /* Verify tree built up above, echo back the list of addrs. */ - pci_addr_cache_print(&pci_io_addr_cache_root); -#endif -} - -/* --------------------------------------------------------------- */ -/* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */ - -void eeh_slot_error_detail (struct pci_dn *pdn, int severity) -{ - unsigned long flags; - int rc; - - /* Log the error with the rtas logger */ - spin_lock_irqsave(&slot_errbuf_lock, flags); - memset(slot_errbuf, 0, eeh_error_buf_size); - - rc = rtas_call(ibm_slot_error_detail, - 8, 1, NULL, pdn->eeh_config_addr, - BUID_HI(pdn->phb->buid), - BUID_LO(pdn->phb->buid), NULL, 0, - virt_to_phys(slot_errbuf), - eeh_error_buf_size, - severity); - - if (rc == 0) - log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); - spin_unlock_irqrestore(&slot_errbuf_lock, flags); -} - -/** - * eeh_register_notifier - Register to find out about EEH events. - * @nb: notifier block to callback on events - */ -int eeh_register_notifier(struct notifier_block *nb) -{ - return notifier_chain_register(&eeh_notifier_chain, nb); -} - -/** - * eeh_unregister_notifier - Unregister to an EEH event notifier. 
- * @nb: notifier block to callback on events - */ -int eeh_unregister_notifier(struct notifier_block *nb) -{ - return notifier_chain_unregister(&eeh_notifier_chain, nb); -} - -/** - * read_slot_reset_state - Read the reset state of a device node's slot - * @dn: device node to read - * @rets: array to return results in - */ -static int read_slot_reset_state(struct pci_dn *pdn, int rets[]) -{ - int token, outputs; - - if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) { - token = ibm_read_slot_reset_state2; - outputs = 4; - } else { - token = ibm_read_slot_reset_state; - rets[2] = 0; /* fake PE Unavailable info */ - outputs = 3; - } - - return rtas_call(token, 3, outputs, rets, pdn->eeh_config_addr, - BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid)); -} - -/** - * eeh_panic - call panic() for an eeh event that cannot be handled. - * The philosophy of this routine is that it is better to panic and - * halt the OS than it is to risk possible data corruption by - * oblivious device drivers that don't know better. - * - * @dev pci device that had an eeh event - * @reset_state current reset state of the device slot - */ -static void eeh_panic(struct pci_dev *dev, int reset_state) -{ - /* - * XXX We should create a separate sysctl for this. - * - * Since the panic_on_oops sysctl is used to halt the system - * in light of potential corruption, we can use it here. - */ - if (panic_on_oops) { - struct device_node *dn = pci_device_to_OF_node(dev); - eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */); - panic("EEH: MMIO failure (%d) on device:%s\n", reset_state, - pci_name(dev)); - } - else { - __get_cpu_var(ignored_failures)++; - printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n", - reset_state, pci_name(dev)); - } -} - -/** - * eeh_event_handler - dispatch EEH events. The detection of a frozen - * slot can occur inside an interrupt, where it can be hard to do - * anything about it. 
The goal of this routine is to pull these - * detection events out of the context of the interrupt handler, and - * re-dispatch them for processing at a later time in a normal context. - * - * @dummy - unused - */ -static void eeh_event_handler(void *dummy) -{ - unsigned long flags; - struct eeh_event *event; - - while (1) { - spin_lock_irqsave(&eeh_eventlist_lock, flags); - event = NULL; - if (!list_empty(&eeh_eventlist)) { - event = list_entry(eeh_eventlist.next, struct eeh_event, list); - list_del(&event->list); - } - spin_unlock_irqrestore(&eeh_eventlist_lock, flags); - if (event == NULL) - break; - - printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device " - "%s\n", event->reset_state, - pci_name(event->dev)); - - notifier_call_chain (&eeh_notifier_chain, - EEH_NOTIFY_FREEZE, event); - - pci_dev_put(event->dev); - kfree(event); - } -} - -/** - * eeh_token_to_phys - convert EEH address token to phys address - * @token i/o token, should be address in the form 0xA.... - */ -static inline unsigned long eeh_token_to_phys(unsigned long token) -{ - pte_t *ptep; - unsigned long pa; - - ptep = find_linux_pte(init_mm.pgd, token); - if (!ptep) - return token; - pa = pte_pfn(*ptep) << PAGE_SHIFT; - - return pa | (token & (PAGE_SIZE-1)); -} - -/** - * Return the "partitionable endpoint" (pe) under which this device lies - */ -static struct device_node * find_device_pe(struct device_node *dn) -{ - while ((dn->parent) && PCI_DN(dn->parent) && - (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { - dn = dn->parent; - } - return dn; -} - -/** Mark all devices that are peers of this device as failed. - * Mark the device driver too, so that it can see the failure - * immediately; this is critical, since some drivers poll - * status registers in interrupts ... If a driver is polling, - * and the slot is frozen, then the driver can deadlock in - * an interrupt context, which is bad. 
- */ - -static inline void __eeh_mark_slot (struct device_node *dn) -{ - while (dn) { - PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED; - - if (dn->child) - __eeh_mark_slot (dn->child); - dn = dn->sibling; - } -} - -static inline void __eeh_clear_slot (struct device_node *dn) -{ - while (dn) { - PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED; - if (dn->child) - __eeh_clear_slot (dn->child); - dn = dn->sibling; - } -} - -static inline void eeh_clear_slot (struct device_node *dn) -{ - unsigned long flags; - spin_lock_irqsave(&confirm_error_lock, flags); - __eeh_clear_slot (dn); - spin_unlock_irqrestore(&confirm_error_lock, flags); -} - -/** - * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze - * @dn device node - * @dev pci device, if known - * - * Check for an EEH failure for the given device node. Call this - * routine if the result of a read was all 0xff's and you want to - * find out if this is due to an EEH slot freeze. This routine - * will query firmware for the EEH status. - * - * Returns 0 if there has not been an EEH error; otherwise returns - * a non-zero value and queues up a slot isolation event notification. - * - * It is safe to call this routine in an interrupt context. - */ -int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev) -{ - int ret; - int rets[3]; - unsigned long flags; - int reset_state; - struct eeh_event *event; - struct pci_dn *pdn; - struct device_node *pe_dn; - int rc = 0; - - __get_cpu_var(total_mmio_ffs)++; - - if (!eeh_subsystem_enabled) - return 0; - - if (!dn) { - __get_cpu_var(no_dn)++; - return 0; - } - pdn = PCI_DN(dn); - - /* Access to IO BARs might get this far and still not want checking. 
*/ - if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || - pdn->eeh_mode & EEH_MODE_NOCHECK) { - __get_cpu_var(ignored_check)++; -#ifdef DEBUG - printk ("EEH:ignored check (%x) for %s %s\n", - pdn->eeh_mode, pci_name (dev), dn->full_name); -#endif - return 0; - } - - if (!pdn->eeh_config_addr) { - __get_cpu_var(no_cfg_addr)++; - return 0; - } - - /* If we already have a pending isolation event for this - * slot, we know it's bad already, we don't need to check. - * Do this checking under a lock; as multiple PCI devices - * in one slot might report errors simultaneously, and we - * only want one error recovery routine running. - */ - spin_lock_irqsave(&confirm_error_lock, flags); - rc = 1; - if (pdn->eeh_mode & EEH_MODE_ISOLATED) { - pdn->eeh_check_count ++; - if (pdn->eeh_check_count >= EEH_MAX_FAILS) { - printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicing\n", - pdn->eeh_check_count); - dump_stack(); - - /* re-read the slot reset state */ - if (read_slot_reset_state(pdn, rets) != 0) - rets[0] = -1; /* reset state unknown */ - - /* If we are here, then we hit an infinite loop. Stop. */ - panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], pci_name(dev)); - } - goto dn_unlock; - } - - /* - * Now test for an EEH failure. This is VERY expensive. - * Note that the eeh_config_addr may be a parent device - * in the case of a device behind a bridge, or it may be - * function zero of a multi-function device. - * In any case they must share a common PHB. - */ - ret = read_slot_reset_state(pdn, rets); - - /* If the call to firmware failed, punt */ - if (ret != 0) { - printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n", - ret, dn->full_name); - __get_cpu_var(false_positives)++; - rc = 0; - goto dn_unlock; - } - - /* If EEH is not supported on this device, punt. 
*/ - if (rets[1] != 1) { - printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n", - ret, dn->full_name); - __get_cpu_var(false_positives)++; - rc = 0; - goto dn_unlock; - } - - /* If not the kind of error we know about, punt. */ - if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) { - __get_cpu_var(false_positives)++; - rc = 0; - goto dn_unlock; - } - - /* Note that config-io to empty slots may fail; - * we recognize empty because they don't have children. */ - if ((rets[0] == 5) && (dn->child == NULL)) { - __get_cpu_var(false_positives)++; - rc = 0; - goto dn_unlock; - } - - __get_cpu_var(slot_resets)++; - - /* Avoid repeated reports of this failure, including problems - * with other functions on this device, and functions under - * bridges. */ - pe_dn = find_device_pe (dn); - __eeh_mark_slot (pe_dn); - spin_unlock_irqrestore(&confirm_error_lock, flags); - - reset_state = rets[0]; - - eeh_slot_error_detail (pdn, 1 /* Temporary Error */); - - printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n", - rets[0], dn->name, dn->full_name); - event = kmalloc(sizeof(*event), GFP_ATOMIC); - if (event == NULL) { - eeh_panic(dev, reset_state); - return 1; - } - - event->dev = dev; - event->dn = dn; - event->reset_state = reset_state; - - /* We may or may not be called in an interrupt context */ - spin_lock_irqsave(&eeh_eventlist_lock, flags); - list_add(&event->list, &eeh_eventlist); - spin_unlock_irqrestore(&eeh_eventlist_lock, flags); - - /* Most EEH events are due to device driver bugs. Having - * a stack trace will help the device-driver authors figure - * out what happened. So print that out. */ - if (rets[0] != 5) dump_stack(); - schedule_work(&eeh_event_wq); - - return 1; - -dn_unlock: - spin_unlock_irqrestore(&confirm_error_lock, flags); - return rc; -} - -EXPORT_SYMBOL_GPL(eeh_dn_check_failure); - -/** - * eeh_check_failure - check if all 1's data is due to EEH slot freeze - * @token i/o token, should be address in the form 0xA.... 
- * @val value, should be all 1's (XXX why do we need this arg??) - * - * Check for an EEH failure at the given token address. Call this - * routine if the result of a read was all 0xff's and you want to - * find out if this is due to an EEH slot freeze event. This routine - * will query firmware for the EEH status. - * - * Note this routine is safe to call in an interrupt context. - */ -unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val) -{ - unsigned long addr; - struct pci_dev *dev; - struct device_node *dn; - - /* Finding the phys addr + pci device; this is pretty quick. */ - addr = eeh_token_to_phys((unsigned long __force) token); - dev = pci_get_device_by_addr(addr); - if (!dev) { - __get_cpu_var(no_device)++; - return val; - } - - dn = pci_device_to_OF_node(dev); - eeh_dn_check_failure (dn, dev); - - pci_dev_put(dev); - return val; -} - -EXPORT_SYMBOL(eeh_check_failure); - -struct eeh_early_enable_info { - unsigned int buid_hi; - unsigned int buid_lo; -}; - -/* Enable eeh for the given device node. */ -static void *early_enable_eeh(struct device_node *dn, void *data) -{ - struct eeh_early_enable_info *info = data; - int ret; - char *status = get_property(dn, "status", NULL); - u32 *class_code = (u32 *)get_property(dn, "class-code", NULL); - u32 *vendor_id = (u32 *)get_property(dn, "vendor-id", NULL); - u32 *device_id = (u32 *)get_property(dn, "device-id", NULL); - u32 *regs; - int enable; - struct pci_dn *pdn = PCI_DN(dn); - - pdn->eeh_mode = 0; - pdn->eeh_check_count = 0; - pdn->eeh_freeze_count = 0; - - if (status && strcmp(status, "ok") != 0) - return NULL; /* ignore devices with bad status */ - - /* Ignore bad nodes. 
*/ - if (!class_code || !vendor_id || !device_id) - return NULL; - - /* There is nothing to check on PCI to ISA bridges */ - if (dn->type && !strcmp(dn->type, "isa")) { - pdn->eeh_mode |= EEH_MODE_NOCHECK; - return NULL; - } - - /* - * Now decide if we are going to "Disable" EEH checking - * for this device. We still run with the EEH hardware active, - * but we won't be checking for ff's. This means a driver - * could return bad data (very bad!), an interrupt handler could - * hang waiting on status bits that won't change, etc. - * But there are a few cases like display devices that make sense. - */ - enable = 1; /* i.e. we will do checking */ - if ((*class_code >> 16) == PCI_BASE_CLASS_DISPLAY) - enable = 0; - - if (!enable) - pdn->eeh_mode |= EEH_MODE_NOCHECK; - - /* Ok... see if this device supports EEH. Some do, some don't, - * and the only way to find out is to check each and every one. */ - regs = (u32 *)get_property(dn, "reg", NULL); - if (regs) { - /* First register entry is addr (00BBSS00) */ - /* Try to enable eeh */ - ret = rtas_call(ibm_set_eeh_option, 4, 1, NULL, - regs[0], info->buid_hi, info->buid_lo, - EEH_ENABLE); - if (ret == 0) { - eeh_subsystem_enabled = 1; - pdn->eeh_mode |= EEH_MODE_SUPPORTED; - pdn->eeh_config_addr = regs[0]; -#ifdef DEBUG - printk(KERN_DEBUG "EEH: %s: eeh enabled\n", dn->full_name); -#endif - } else { - - /* This device doesn't support EEH, but it may have an - * EEH parent, in which case we mark it as supported. */ - if (dn->parent && PCI_DN(dn->parent) - && (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { - /* Parent supports EEH. */ - pdn->eeh_mode |= EEH_MODE_SUPPORTED; - pdn->eeh_config_addr = PCI_DN(dn->parent)->eeh_config_addr; - return NULL; - } - } - } else { - printk(KERN_WARNING "EEH: %s: unable to get reg property.\n", - dn->full_name); - } - - return NULL; -} - -/* - * Initialize EEH by trying to enable it for all of the adapters in the system. 
- * As a side effect we can determine here if eeh is supported at all. - * Note that we leave EEH on so failed config cycles won't cause a machine - * check. If a user turns off EEH for a particular adapter they are really - * telling Linux to ignore errors. Some hardware (e.g. POWER5) won't - * grant access to a slot if EEH isn't enabled, and so we always enable - * EEH for all slots/all devices. - * - * The eeh-force-off option disables EEH checking globally, for all slots. - * Even if force-off is set, the EEH hardware is still enabled, so that - * newer systems can boot. - */ -void __init eeh_init(void) -{ - struct device_node *phb, *np; - struct eeh_early_enable_info info; - - spin_lock_init(&confirm_error_lock); - spin_lock_init(&slot_errbuf_lock); - - np = of_find_node_by_path("/rtas"); - if (np == NULL) - return; - - ibm_set_eeh_option = rtas_token("ibm,set-eeh-option"); - ibm_set_slot_reset = rtas_token("ibm,set-slot-reset"); - ibm_read_slot_reset_state2 = rtas_token("ibm,read-slot-reset-state2"); - ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state"); - ibm_slot_error_detail = rtas_token("ibm,slot-error-detail"); - - if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE) - return; - - eeh_error_buf_size = rtas_token("rtas-error-log-max"); - if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) { - eeh_error_buf_size = 1024; - } - if (eeh_error_buf_size > RTAS_ERROR_LOG_MAX) { - printk(KERN_WARNING "EEH: rtas-error-log-max is bigger than allocated " - "buffer ! (%d vs %d)", eeh_error_buf_size, RTAS_ERROR_LOG_MAX); - eeh_error_buf_size = RTAS_ERROR_LOG_MAX; - } - - /* Enable EEH for all adapters. 
Note that eeh requires buid's */ - for (phb = of_find_node_by_name(NULL, "pci"); phb; - phb = of_find_node_by_name(phb, "pci")) { - unsigned long buid; - - buid = get_phb_buid(phb); - if (buid == 0 || PCI_DN(phb) == NULL) - continue; - - info.buid_lo = BUID_LO(buid); - info.buid_hi = BUID_HI(buid); - traverse_pci_devices(phb, early_enable_eeh, &info); - } - - if (eeh_subsystem_enabled) - printk(KERN_INFO "EEH: PCI Enhanced I/O Error Handling Enabled\n"); - else - printk(KERN_WARNING "EEH: No capable adapters found\n"); -} - -/** - * eeh_add_device_early - enable EEH for the indicated device_node - * @dn: device node for which to set up EEH - * - * This routine must be used to perform EEH initialization for PCI - * devices that were added after system boot (e.g. hotplug, dlpar). - * This routine must be called before any i/o is performed to the - * adapter (inluding any config-space i/o). - * Whether this actually enables EEH or not for this device depends - * on the CEC architecture, type of the device, on earlier boot - * command-line arguments & etc. - */ -void eeh_add_device_early(struct device_node *dn) -{ - struct pci_controller *phb; - struct eeh_early_enable_info info; - - if (!dn || !PCI_DN(dn)) - return; - phb = PCI_DN(dn)->phb; - if (NULL == phb || 0 == phb->buid) { - printk(KERN_WARNING "EEH: Expected buid but found none for %s\n", - dn->full_name); - dump_stack(); - return; - } - - info.buid_hi = BUID_HI(phb->buid); - info.buid_lo = BUID_LO(phb->buid); - early_enable_eeh(dn, &info); -} -EXPORT_SYMBOL_GPL(eeh_add_device_early); - -/** - * eeh_add_device_late - perform EEH initialization for the indicated pci device - * @dev: pci device for which to set up EEH - * - * This routine must be used to complete EEH initialization for PCI - * devices that were added after system boot (e.g. hotplug, dlpar). 
- */ -void eeh_add_device_late(struct pci_dev *dev) -{ - struct device_node *dn; - - if (!dev || !eeh_subsystem_enabled) - return; - -#ifdef DEBUG - printk(KERN_DEBUG "EEH: adding device %s\n", pci_name(dev)); -#endif - - pci_dev_get (dev); - dn = pci_device_to_OF_node(dev); - PCI_DN(dn)->pcidev = dev; - - pci_addr_cache_insert_device (dev); -} -EXPORT_SYMBOL_GPL(eeh_add_device_late); - -/** - * eeh_remove_device - undo EEH setup for the indicated pci device - * @dev: pci device to be removed - * - * This routine should be when a device is removed from a running - * system (e.g. by hotplug or dlpar). - */ -void eeh_remove_device(struct pci_dev *dev) -{ - struct device_node *dn; - if (!dev || !eeh_subsystem_enabled) - return; - - /* Unregister the device with the EEH/PCI address search system */ -#ifdef DEBUG - printk(KERN_DEBUG "EEH: remove device %s\n", pci_name(dev)); -#endif - pci_addr_cache_remove_device(dev); - - dn = pci_device_to_OF_node(dev); - PCI_DN(dn)->pcidev = NULL; - pci_dev_put (dev); -} -EXPORT_SYMBOL_GPL(eeh_remove_device); - -static int proc_eeh_show(struct seq_file *m, void *v) -{ - unsigned int cpu; - unsigned long ffs = 0, positives = 0, failures = 0; - unsigned long resets = 0; - unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0; - - for_each_cpu(cpu) { - ffs += per_cpu(total_mmio_ffs, cpu); - positives += per_cpu(false_positives, cpu); - failures += per_cpu(ignored_failures, cpu); - resets += per_cpu(slot_resets, cpu); - no_dev += per_cpu(no_device, cpu); - no_dn += per_cpu(no_dn, cpu); - no_cfg += per_cpu(no_cfg_addr, cpu); - no_check += per_cpu(ignored_check, cpu); - } - - if (0 == eeh_subsystem_enabled) { - seq_printf(m, "EEH Subsystem is globally disabled\n"); - seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs); - } else { - seq_printf(m, "EEH Subsystem is enabled\n"); - seq_printf(m, - "no device=%ld\n" - "no device node=%ld\n" - "no config address=%ld\n" - "check not wanted=%ld\n" - "eeh_total_mmio_ffs=%ld\n" - 
"eeh_false_positives=%ld\n" - "eeh_ignored_failures=%ld\n" - "eeh_slot_resets=%ld\n", - no_dev, no_dn, no_cfg, no_check, - ffs, positives, failures, resets); - } - - return 0; -} - -static int proc_eeh_open(struct inode *inode, struct file *file) -{ - return single_open(file, proc_eeh_show, NULL); -} - -static struct file_operations proc_eeh_operations = { - .open = proc_eeh_open, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; - -static int __init eeh_init_proc(void) -{ - struct proc_dir_entry *e; - - if (systemcfg->platform & PLATFORM_PSERIES) { - e = create_proc_entry("ppc64/eeh", 0, NULL); - if (e) - e->proc_fops = &proc_eeh_operations; - } - - return 0; -} -__initcall(eeh_init_proc); Index: linux-2.6.14-git3/arch/ppc64/kernel/Makefile =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/Makefile 2005-11-02 14:29:22.485829789 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/Makefile 2005-11-02 14:30:49.805589414 -0600 @@ -35,7 +35,6 @@ bpa_iic.o spider-pic.o obj-$(CONFIG_KEXEC) += machine_kexec.o -obj-$(CONFIG_EEH) += eeh.o obj-$(CONFIG_PROC_FS) += proc_ppc64.o obj-$(CONFIG_RTAS_FLASH) += rtas_flash.o obj-$(CONFIG_SMP) += smp.o Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/Makefile 2005-10-31 11:19:47.000000000 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:31:36.150092654 -0600 @@ -3,3 +3,4 @@ obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_XICS) += xics.o +obj-$(CONFIG_EEH) += eeh.o Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:30:49.790591516 -0600 @@ -0,0 +1,1093 @@ +/* + 
* eeh.c + * Copyright (C) 2001 Dave Engebretsen & Todd Inglett IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#undef DEBUG + +/** Overview: + * EEH, or "Extended Error Handling" is a PCI bridge technology for + * dealing with PCI bus errors that can't be dealt with within the + * usual PCI framework, except by check-stopping the CPU. Systems + * that are designed for high-availability/reliability cannot afford + * to crash due to a "mere" PCI error, thus the need for EEH. + * An EEH-capable bridge operates by converting a detected error + * into a "slot freeze", taking the PCI adapter off-line, making + * the slot behave, from the OS'es point of view, as if the slot + * were "empty": all reads return 0xff's and all writes are silently + * ignored. EEH slot isolation events can be triggered by parity + * errors on the address or data busses (e.g. during posted writes), + * which in turn might be caused by low voltage on the bus, dust, + * vibration, humidity, radioactivity or plain-old failed hardware. 
+ * + * Note, however, that one of the leading causes of EEH slot + * freeze events are buggy device drivers, buggy device microcode, + * or buggy device hardware. This is because any attempt by the + * device to bus-master data to a memory address that is not + * assigned to the device will trigger a slot freeze. (The idea + * is to prevent devices-gone-wild from corrupting system memory). + * Buggy hardware/drivers will have a miserable time co-existing + * with EEH. + * + * Ideally, a PCI device driver, when suspecting that an isolation + * event has occurred (e.g. by reading 0xff's), will then ask EEH + * whether this is the case, and then take appropriate steps to + * reset the PCI slot, the PCI device, and then resume operations. + * However, until that day, the checking is done here, with the + * eeh_check_failure() routine embedded in the MMIO macros. If + * the slot is found to be isolated, an "EEH Event" is synthesized + * and sent out for processing. + */ + +/* EEH event workqueue setup. */ +static DEFINE_SPINLOCK(eeh_eventlist_lock); +LIST_HEAD(eeh_eventlist); +static void eeh_event_handler(void *); +DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL); + +static struct notifier_block *eeh_notifier_chain; + +/* If a device driver keeps reading an MMIO register in an interrupt + * handler after a slot isolation event has occurred, we assume it + * is broken and panic. This sets the threshold for how many read + * attempts we allow before panicking. 
+ */ +#define EEH_MAX_FAILS 100000 + +/* RTAS tokens */ +static int ibm_set_eeh_option; +static int ibm_set_slot_reset; +static int ibm_read_slot_reset_state; +static int ibm_read_slot_reset_state2; +static int ibm_slot_error_detail; + +static int eeh_subsystem_enabled; + +/* Lock to avoid races due to multiple reports of an error */ +static DEFINE_SPINLOCK(confirm_error_lock); + +/* Buffer for reporting slot-error-detail rtas calls */ +static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX]; +static DEFINE_SPINLOCK(slot_errbuf_lock); +static int eeh_error_buf_size; + +/* System monitoring statistics */ +static DEFINE_PER_CPU(unsigned long, no_device); +static DEFINE_PER_CPU(unsigned long, no_dn); +static DEFINE_PER_CPU(unsigned long, no_cfg_addr); +static DEFINE_PER_CPU(unsigned long, ignored_check); +static DEFINE_PER_CPU(unsigned long, total_mmio_ffs); +static DEFINE_PER_CPU(unsigned long, false_positives); +static DEFINE_PER_CPU(unsigned long, ignored_failures); +static DEFINE_PER_CPU(unsigned long, slot_resets); + +/** + * The pci address cache subsystem. This subsystem places + * PCI device address resources into a red-black tree, sorted + * according to the address range, so that given only an i/o + * address, the corresponding PCI device can be **quickly** + * found. It is safe to perform an address lookup in an interrupt + * context; this ability is an important feature. + * + * Currently, the only customer of this code is the EEH subsystem; + * thus, this code has been somewhat tailored to suit EEH better. + * In particular, the cache does *not* hold the addresses of devices + * for which EEH is not enabled. + * + * (Implementation Note: The RB tree seems to be better/faster + * than any hash algo I could think of for this problem, even + * with the penalty of slow pointer chases for d-cache misses). 
+ */ +struct pci_io_addr_range +{ + struct rb_node rb_node; + unsigned long addr_lo; + unsigned long addr_hi; + struct pci_dev *pcidev; + unsigned int flags; +}; + +static struct pci_io_addr_cache +{ + struct rb_root rb_root; + spinlock_t piar_lock; +} pci_io_addr_cache_root; + +static inline struct pci_dev *__pci_get_device_by_addr(unsigned long addr) +{ + struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node; + + while (n) { + struct pci_io_addr_range *piar; + piar = rb_entry(n, struct pci_io_addr_range, rb_node); + + if (addr < piar->addr_lo) { + n = n->rb_left; + } else { + if (addr > piar->addr_hi) { + n = n->rb_right; + } else { + pci_dev_get(piar->pcidev); + return piar->pcidev; + } + } + } + + return NULL; +} + +/** + * pci_get_device_by_addr - Get device, given only address + * @addr: mmio (PIO) phys address or i/o port number + * + * Given an mmio phys address, or a port number, find a pci device + * that implements this address. Be sure to pci_dev_put the device + * when finished. I/O port numbers are assumed to be offset + * from zero (that is, they do *not* have pci_io_addr added in). + * It is safe to call this function within an interrupt. + */ +static struct pci_dev *pci_get_device_by_addr(unsigned long addr) +{ + struct pci_dev *dev; + unsigned long flags; + + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); + dev = __pci_get_device_by_addr(addr); + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); + return dev; +} + +#ifdef DEBUG +/* + * Handy-dandy debug print routine, does nothing more + * than print out the contents of our addr cache. + */ +static void pci_addr_cache_print(struct pci_io_addr_cache *cache) +{ + struct rb_node *n; + int cnt = 0; + + n = rb_first(&cache->rb_root); + while (n) { + struct pci_io_addr_range *piar; + piar = rb_entry(n, struct pci_io_addr_range, rb_node); + printk(KERN_DEBUG "PCI: %s addr range %d [%lx-%lx]: %s\n", + (piar->flags & IORESOURCE_IO) ? 
"i/o" : "mem", cnt, + piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev)); + cnt++; + n = rb_next(n); + } +} +#endif + +/* Insert address range into the rb tree. */ +static struct pci_io_addr_range * +pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo, + unsigned long ahi, unsigned int flags) +{ + struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node; + struct rb_node *parent = NULL; + struct pci_io_addr_range *piar; + + /* Walk tree, find a place to insert into tree */ + while (*p) { + parent = *p; + piar = rb_entry(parent, struct pci_io_addr_range, rb_node); + if (ahi < piar->addr_lo) { + p = &parent->rb_left; + } else if (alo > piar->addr_hi) { + p = &parent->rb_right; + } else { + if (dev != piar->pcidev || + alo != piar->addr_lo || ahi != piar->addr_hi) { + printk(KERN_WARNING "PIAR: overlapping address range\n"); + } + return piar; + } + } + piar = (struct pci_io_addr_range *)kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC); + if (!piar) + return NULL; + + piar->addr_lo = alo; + piar->addr_hi = ahi; + piar->pcidev = dev; + piar->flags = flags; + +#ifdef DEBUG + printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", + alo, ahi, pci_name (dev)); +#endif + + rb_link_node(&piar->rb_node, parent, p); + rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); + + return piar; +} + +static void __pci_addr_cache_insert_device(struct pci_dev *dev) +{ + struct device_node *dn; + struct pci_dn *pdn; + int i; + int inserted = 0; + + dn = pci_device_to_OF_node(dev); + if (!dn) { + printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev)); + return; + } + + /* Skip any devices for which EEH is not enabled. */ + pdn = PCI_DN(dn); + if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || + pdn->eeh_mode & EEH_MODE_NOCHECK) { +#ifdef DEBUG + printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n", + pci_name(dev), pdn->node->full_name); +#endif + return; + } + + /* The cache holds a reference to the device... 
*/ + pci_dev_get(dev); + + /* Walk resources on this device, poke them into the tree */ + for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { + unsigned long start = pci_resource_start(dev,i); + unsigned long end = pci_resource_end(dev,i); + unsigned int flags = pci_resource_flags(dev,i); + + /* We are interested only bus addresses, not dma or other stuff */ + if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM))) + continue; + if (start == 0 || ~start == 0 || end == 0 || ~end == 0) + continue; + pci_addr_cache_insert(dev, start, end, flags); + inserted = 1; + } + + /* If there was nothing to add, the cache has no reference... */ + if (!inserted) + pci_dev_put(dev); +} + +/** + * pci_addr_cache_insert_device - Add a device to the address cache + * @dev: PCI device whose I/O addresses we are interested in. + * + * In order to support the fast lookup of devices based on addresses, + * we maintain a cache of devices that can be quickly searched. + * This routine adds a device to that cache. + */ +static void pci_addr_cache_insert_device(struct pci_dev *dev) +{ + unsigned long flags; + + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); + __pci_addr_cache_insert_device(dev); + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); +} + +static inline void __pci_addr_cache_remove_device(struct pci_dev *dev) +{ + struct rb_node *n; + int removed = 0; + +restart: + n = rb_first(&pci_io_addr_cache_root.rb_root); + while (n) { + struct pci_io_addr_range *piar; + piar = rb_entry(n, struct pci_io_addr_range, rb_node); + + if (piar->pcidev == dev) { + rb_erase(n, &pci_io_addr_cache_root.rb_root); + removed = 1; + kfree(piar); + goto restart; + } + n = rb_next(n); + } + + /* The cache no longer holds its reference to this device... */ + if (removed) + pci_dev_put(dev); +} + +/** + * pci_addr_cache_remove_device - remove pci device from addr cache + * @dev: device to remove + * + * Remove a device from the addr-cache tree. 
+ * This is potentially expensive, since it will walk + * the tree multiple times (once per resource). + * But so what; device removal doesn't need to be that fast. + */ +static void pci_addr_cache_remove_device(struct pci_dev *dev) +{ + unsigned long flags; + + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); + __pci_addr_cache_remove_device(dev); + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); +} + +/** + * pci_addr_cache_build - Build a cache of I/O addresses + * + * Build a cache of pci i/o addresses. This cache will be used to + * find the pci device that corresponds to a given address. + * This routine scans all pci busses to build the cache. + * Must be run late in boot process, after the pci controllers + * have been scaned for devices (after all device resources are known). + */ +void __init pci_addr_cache_build(void) +{ + struct pci_dev *dev = NULL; + + if (!eeh_subsystem_enabled) + return; + + spin_lock_init(&pci_io_addr_cache_root.piar_lock); + + while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) { + /* Ignore PCI bridges ( XXX why ??) */ + if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE) { + continue; + } + pci_addr_cache_insert_device(dev); + } + +#ifdef DEBUG + /* Verify tree built up above, echo back the list of addrs. */ + pci_addr_cache_print(&pci_io_addr_cache_root); +#endif +} + +/* --------------------------------------------------------------- */ +/* Above lies the PCI Address Cache. 
Below lies the EEH event infrastructure */ + +void eeh_slot_error_detail (struct pci_dn *pdn, int severity) +{ + unsigned long flags; + int rc; + + /* Log the error with the rtas logger */ + spin_lock_irqsave(&slot_errbuf_lock, flags); + memset(slot_errbuf, 0, eeh_error_buf_size); + + rc = rtas_call(ibm_slot_error_detail, + 8, 1, NULL, pdn->eeh_config_addr, + BUID_HI(pdn->phb->buid), + BUID_LO(pdn->phb->buid), NULL, 0, + virt_to_phys(slot_errbuf), + eeh_error_buf_size, + severity); + + if (rc == 0) + log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); + spin_unlock_irqrestore(&slot_errbuf_lock, flags); +} + +/** + * eeh_register_notifier - Register to find out about EEH events. + * @nb: notifier block to callback on events + */ +int eeh_register_notifier(struct notifier_block *nb) +{ + return notifier_chain_register(&eeh_notifier_chain, nb); +} + +/** + * eeh_unregister_notifier - Unregister to an EEH event notifier. + * @nb: notifier block to callback on events + */ +int eeh_unregister_notifier(struct notifier_block *nb) +{ + return notifier_chain_unregister(&eeh_notifier_chain, nb); +} + +/** + * read_slot_reset_state - Read the reset state of a device node's slot + * @dn: device node to read + * @rets: array to return results in + */ +static int read_slot_reset_state(struct pci_dn *pdn, int rets[]) +{ + int token, outputs; + + if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) { + token = ibm_read_slot_reset_state2; + outputs = 4; + } else { + token = ibm_read_slot_reset_state; + rets[2] = 0; /* fake PE Unavailable info */ + outputs = 3; + } + + return rtas_call(token, 3, outputs, rets, pdn->eeh_config_addr, + BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid)); +} + +/** + * eeh_panic - call panic() for an eeh event that cannot be handled. + * The philosophy of this routine is that it is better to panic and + * halt the OS than it is to risk possible data corruption by + * oblivious device drivers that don't know better. 
+ * + * @dev pci device that had an eeh event + * @reset_state current reset state of the device slot + */ +static void eeh_panic(struct pci_dev *dev, int reset_state) +{ + /* + * XXX We should create a separate sysctl for this. + * + * Since the panic_on_oops sysctl is used to halt the system + * in light of potential corruption, we can use it here. + */ + if (panic_on_oops) { + struct device_node *dn = pci_device_to_OF_node(dev); + eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */); + panic("EEH: MMIO failure (%d) on device:%s\n", reset_state, + pci_name(dev)); + } + else { + __get_cpu_var(ignored_failures)++; + printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n", + reset_state, pci_name(dev)); + } +} + +/** + * eeh_event_handler - dispatch EEH events. The detection of a frozen + * slot can occur inside an interrupt, where it can be hard to do + * anything about it. The goal of this routine is to pull these + * detection events out of the context of the interrupt handler, and + * re-dispatch them for processing at a later time in a normal context. + * + * @dummy - unused + */ +static void eeh_event_handler(void *dummy) +{ + unsigned long flags; + struct eeh_event *event; + + while (1) { + spin_lock_irqsave(&eeh_eventlist_lock, flags); + event = NULL; + if (!list_empty(&eeh_eventlist)) { + event = list_entry(eeh_eventlist.next, struct eeh_event, list); + list_del(&event->list); + } + spin_unlock_irqrestore(&eeh_eventlist_lock, flags); + if (event == NULL) + break; + + printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device " + "%s\n", event->reset_state, + pci_name(event->dev)); + + notifier_call_chain (&eeh_notifier_chain, + EEH_NOTIFY_FREEZE, event); + + pci_dev_put(event->dev); + kfree(event); + } +} + +/** + * eeh_token_to_phys - convert EEH address token to phys address + * @token i/o token, should be address in the form 0xA.... 
+ */ +static inline unsigned long eeh_token_to_phys(unsigned long token) +{ + pte_t *ptep; + unsigned long pa; + + ptep = find_linux_pte(init_mm.pgd, token); + if (!ptep) + return token; + pa = pte_pfn(*ptep) << PAGE_SHIFT; + + return pa | (token & (PAGE_SIZE-1)); +} + +/** + * Return the "partitionable endpoint" (pe) under which this device lies + */ +static struct device_node * find_device_pe(struct device_node *dn) +{ + while ((dn->parent) && PCI_DN(dn->parent) && + (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { + dn = dn->parent; + } + return dn; +} + +/** Mark all devices that are peers of this device as failed. + * Mark the device driver too, so that it can see the failure + * immediately; this is critical, since some drivers poll + * status registers in interrupts ... If a driver is polling, + * and the slot is frozen, then the driver can deadlock in + * an interrupt context, which is bad. + */ + +static inline void __eeh_mark_slot (struct device_node *dn) +{ + while (dn) { + PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED; + + if (dn->child) + __eeh_mark_slot (dn->child); + dn = dn->sibling; + } +} + +static inline void __eeh_clear_slot (struct device_node *dn) +{ + while (dn) { + PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED; + if (dn->child) + __eeh_clear_slot (dn->child); + dn = dn->sibling; + } +} + +static inline void eeh_clear_slot (struct device_node *dn) +{ + unsigned long flags; + spin_lock_irqsave(&confirm_error_lock, flags); + __eeh_clear_slot (dn); + spin_unlock_irqrestore(&confirm_error_lock, flags); +} + +/** + * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze + * @dn device node + * @dev pci device, if known + * + * Check for an EEH failure for the given device node. Call this + * routine if the result of a read was all 0xff's and you want to + * find out if this is due to an EEH slot freeze. This routine + * will query firmware for the EEH status. 
+ * + * Returns 0 if there has not been an EEH error; otherwise returns + * a non-zero value and queues up a slot isolation event notification. + * + * It is safe to call this routine in an interrupt context. + */ +int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev) +{ + int ret; + int rets[3]; + unsigned long flags; + int reset_state; + struct eeh_event *event; + struct pci_dn *pdn; + struct device_node *pe_dn; + int rc = 0; + + __get_cpu_var(total_mmio_ffs)++; + + if (!eeh_subsystem_enabled) + return 0; + + if (!dn) { + __get_cpu_var(no_dn)++; + return 0; + } + pdn = PCI_DN(dn); + + /* Access to IO BARs might get this far and still not want checking. */ + if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || + pdn->eeh_mode & EEH_MODE_NOCHECK) { + __get_cpu_var(ignored_check)++; +#ifdef DEBUG + printk ("EEH:ignored check (%x) for %s %s\n", + pdn->eeh_mode, pci_name (dev), dn->full_name); +#endif + return 0; + } + + if (!pdn->eeh_config_addr) { + __get_cpu_var(no_cfg_addr)++; + return 0; + } + + /* If we already have a pending isolation event for this + * slot, we know it's bad already, we don't need to check. + * Do this checking under a lock; as multiple PCI devices + * in one slot might report errors simultaneously, and we + * only want one error recovery routine running. + */ + spin_lock_irqsave(&confirm_error_lock, flags); + rc = 1; + if (pdn->eeh_mode & EEH_MODE_ISOLATED) { + pdn->eeh_check_count ++; + if (pdn->eeh_check_count >= EEH_MAX_FAILS) { + printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicing\n", + pdn->eeh_check_count); + dump_stack(); + + /* re-read the slot reset state */ + if (read_slot_reset_state(pdn, rets) != 0) + rets[0] = -1; /* reset state unknown */ + + /* If we are here, then we hit an infinite loop. Stop. */ + panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], pci_name(dev)); + } + goto dn_unlock; + } + + /* + * Now test for an EEH failure. This is VERY expensive. 
+ * Note that the eeh_config_addr may be a parent device + * in the case of a device behind a bridge, or it may be + * function zero of a multi-function device. + * In any case they must share a common PHB. + */ + ret = read_slot_reset_state(pdn, rets); + + /* If the call to firmware failed, punt */ + if (ret != 0) { + printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n", + ret, dn->full_name); + __get_cpu_var(false_positives)++; + rc = 0; + goto dn_unlock; + } + + /* If EEH is not supported on this device, punt. */ + if (rets[1] != 1) { + printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n", + ret, dn->full_name); + __get_cpu_var(false_positives)++; + rc = 0; + goto dn_unlock; + } + + /* If not the kind of error we know about, punt. */ + if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) { + __get_cpu_var(false_positives)++; + rc = 0; + goto dn_unlock; + } + + /* Note that config-io to empty slots may fail; + * we recognize empty because they don't have children. */ + if ((rets[0] == 5) && (dn->child == NULL)) { + __get_cpu_var(false_positives)++; + rc = 0; + goto dn_unlock; + } + + __get_cpu_var(slot_resets)++; + + /* Avoid repeated reports of this failure, including problems + * with other functions on this device, and functions under + * bridges. 
*/ + pe_dn = find_device_pe (dn); + __eeh_mark_slot (pe_dn); + spin_unlock_irqrestore(&confirm_error_lock, flags); + + reset_state = rets[0]; + + eeh_slot_error_detail (pdn, 1 /* Temporary Error */); + + printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n", + rets[0], dn->name, dn->full_name); + event = kmalloc(sizeof(*event), GFP_ATOMIC); + if (event == NULL) { + eeh_panic(dev, reset_state); + return 1; + } + + event->dev = dev; + event->dn = dn; + event->reset_state = reset_state; + + /* We may or may not be called in an interrupt context */ + spin_lock_irqsave(&eeh_eventlist_lock, flags); + list_add(&event->list, &eeh_eventlist); + spin_unlock_irqrestore(&eeh_eventlist_lock, flags); + + /* Most EEH events are due to device driver bugs. Having + * a stack trace will help the device-driver authors figure + * out what happened. So print that out. */ + if (rets[0] != 5) dump_stack(); + schedule_work(&eeh_event_wq); + + return 1; + +dn_unlock: + spin_unlock_irqrestore(&confirm_error_lock, flags); + return rc; +} + +EXPORT_SYMBOL_GPL(eeh_dn_check_failure); + +/** + * eeh_check_failure - check if all 1's data is due to EEH slot freeze + * @token i/o token, should be address in the form 0xA.... + * @val value, should be all 1's (XXX why do we need this arg??) + * + * Check for an EEH failure at the given token address. Call this + * routine if the result of a read was all 0xff's and you want to + * find out if this is due to an EEH slot freeze event. This routine + * will query firmware for the EEH status. + * + * Note this routine is safe to call in an interrupt context. + */ +unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val) +{ + unsigned long addr; + struct pci_dev *dev; + struct device_node *dn; + + /* Finding the phys addr + pci device; this is pretty quick. 
*/ + addr = eeh_token_to_phys((unsigned long __force) token); + dev = pci_get_device_by_addr(addr); + if (!dev) { + __get_cpu_var(no_device)++; + return val; + } + + dn = pci_device_to_OF_node(dev); + eeh_dn_check_failure (dn, dev); + + pci_dev_put(dev); + return val; +} + +EXPORT_SYMBOL(eeh_check_failure); + +struct eeh_early_enable_info { + unsigned int buid_hi; + unsigned int buid_lo; +}; + +/* Enable eeh for the given device node. */ +static void *early_enable_eeh(struct device_node *dn, void *data) +{ + struct eeh_early_enable_info *info = data; + int ret; + char *status = get_property(dn, "status", NULL); + u32 *class_code = (u32 *)get_property(dn, "class-code", NULL); + u32 *vendor_id = (u32 *)get_property(dn, "vendor-id", NULL); + u32 *device_id = (u32 *)get_property(dn, "device-id", NULL); + u32 *regs; + int enable; + struct pci_dn *pdn = PCI_DN(dn); + + pdn->eeh_mode = 0; + pdn->eeh_check_count = 0; + pdn->eeh_freeze_count = 0; + + if (status && strcmp(status, "ok") != 0) + return NULL; /* ignore devices with bad status */ + + /* Ignore bad nodes. */ + if (!class_code || !vendor_id || !device_id) + return NULL; + + /* There is nothing to check on PCI to ISA bridges */ + if (dn->type && !strcmp(dn->type, "isa")) { + pdn->eeh_mode |= EEH_MODE_NOCHECK; + return NULL; + } + + /* + * Now decide if we are going to "Disable" EEH checking + * for this device. We still run with the EEH hardware active, + * but we won't be checking for ff's. This means a driver + * could return bad data (very bad!), an interrupt handler could + * hang waiting on status bits that won't change, etc. + * But there are a few cases like display devices that make sense. + */ + enable = 1; /* i.e. we will do checking */ + if ((*class_code >> 16) == PCI_BASE_CLASS_DISPLAY) + enable = 0; + + if (!enable) + pdn->eeh_mode |= EEH_MODE_NOCHECK; + + /* Ok... see if this device supports EEH. Some do, some don't, + * and the only way to find out is to check each and every one. 
*/ + regs = (u32 *)get_property(dn, "reg", NULL); + if (regs) { + /* First register entry is addr (00BBSS00) */ + /* Try to enable eeh */ + ret = rtas_call(ibm_set_eeh_option, 4, 1, NULL, + regs[0], info->buid_hi, info->buid_lo, + EEH_ENABLE); + if (ret == 0) { + eeh_subsystem_enabled = 1; + pdn->eeh_mode |= EEH_MODE_SUPPORTED; + pdn->eeh_config_addr = regs[0]; +#ifdef DEBUG + printk(KERN_DEBUG "EEH: %s: eeh enabled\n", dn->full_name); +#endif + } else { + + /* This device doesn't support EEH, but it may have an + * EEH parent, in which case we mark it as supported. */ + if (dn->parent && PCI_DN(dn->parent) + && (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { + /* Parent supports EEH. */ + pdn->eeh_mode |= EEH_MODE_SUPPORTED; + pdn->eeh_config_addr = PCI_DN(dn->parent)->eeh_config_addr; + return NULL; + } + } + } else { + printk(KERN_WARNING "EEH: %s: unable to get reg property.\n", + dn->full_name); + } + + return NULL; +} + +/* + * Initialize EEH by trying to enable it for all of the adapters in the system. + * As a side effect we can determine here if eeh is supported at all. + * Note that we leave EEH on so failed config cycles won't cause a machine + * check. If a user turns off EEH for a particular adapter they are really + * telling Linux to ignore errors. Some hardware (e.g. POWER5) won't + * grant access to a slot if EEH isn't enabled, and so we always enable + * EEH for all slots/all devices. + * + * The eeh-force-off option disables EEH checking globally, for all slots. + * Even if force-off is set, the EEH hardware is still enabled, so that + * newer systems can boot. 
+ */ +void __init eeh_init(void) +{ + struct device_node *phb, *np; + struct eeh_early_enable_info info; + + spin_lock_init(&confirm_error_lock); + spin_lock_init(&slot_errbuf_lock); + + np = of_find_node_by_path("/rtas"); + if (np == NULL) + return; + + ibm_set_eeh_option = rtas_token("ibm,set-eeh-option"); + ibm_set_slot_reset = rtas_token("ibm,set-slot-reset"); + ibm_read_slot_reset_state2 = rtas_token("ibm,read-slot-reset-state2"); + ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state"); + ibm_slot_error_detail = rtas_token("ibm,slot-error-detail"); + + if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE) + return; + + eeh_error_buf_size = rtas_token("rtas-error-log-max"); + if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) { + eeh_error_buf_size = 1024; + } + if (eeh_error_buf_size > RTAS_ERROR_LOG_MAX) { + printk(KERN_WARNING "EEH: rtas-error-log-max is bigger than allocated " + "buffer ! (%d vs %d)", eeh_error_buf_size, RTAS_ERROR_LOG_MAX); + eeh_error_buf_size = RTAS_ERROR_LOG_MAX; + } + + /* Enable EEH for all adapters. Note that eeh requires buid's */ + for (phb = of_find_node_by_name(NULL, "pci"); phb; + phb = of_find_node_by_name(phb, "pci")) { + unsigned long buid; + + buid = get_phb_buid(phb); + if (buid == 0 || PCI_DN(phb) == NULL) + continue; + + info.buid_lo = BUID_LO(buid); + info.buid_hi = BUID_HI(buid); + traverse_pci_devices(phb, early_enable_eeh, &info); + } + + if (eeh_subsystem_enabled) + printk(KERN_INFO "EEH: PCI Enhanced I/O Error Handling Enabled\n"); + else + printk(KERN_WARNING "EEH: No capable adapters found\n"); +} + +/** + * eeh_add_device_early - enable EEH for the indicated device_node + * @dn: device node for which to set up EEH + * + * This routine must be used to perform EEH initialization for PCI + * devices that were added after system boot (e.g. hotplug, dlpar). + * This routine must be called before any i/o is performed to the + * adapter (inluding any config-space i/o). 
+ * Whether this actually enables EEH or not for this device depends + * on the CEC architecture, type of the device, on earlier boot + * command-line arguments & etc. + */ +void eeh_add_device_early(struct device_node *dn) +{ + struct pci_controller *phb; + struct eeh_early_enable_info info; + + if (!dn || !PCI_DN(dn)) + return; + phb = PCI_DN(dn)->phb; + if (NULL == phb || 0 == phb->buid) { + printk(KERN_WARNING "EEH: Expected buid but found none for %s\n", + dn->full_name); + dump_stack(); + return; + } + + info.buid_hi = BUID_HI(phb->buid); + info.buid_lo = BUID_LO(phb->buid); + early_enable_eeh(dn, &info); +} +EXPORT_SYMBOL_GPL(eeh_add_device_early); + +/** + * eeh_add_device_late - perform EEH initialization for the indicated pci device + * @dev: pci device for which to set up EEH + * + * This routine must be used to complete EEH initialization for PCI + * devices that were added after system boot (e.g. hotplug, dlpar). + */ +void eeh_add_device_late(struct pci_dev *dev) +{ + struct device_node *dn; + + if (!dev || !eeh_subsystem_enabled) + return; + +#ifdef DEBUG + printk(KERN_DEBUG "EEH: adding device %s\n", pci_name(dev)); +#endif + + pci_dev_get (dev); + dn = pci_device_to_OF_node(dev); + PCI_DN(dn)->pcidev = dev; + + pci_addr_cache_insert_device (dev); +} +EXPORT_SYMBOL_GPL(eeh_add_device_late); + +/** + * eeh_remove_device - undo EEH setup for the indicated pci device + * @dev: pci device to be removed + * + * This routine should be when a device is removed from a running + * system (e.g. by hotplug or dlpar). 
+ */ +void eeh_remove_device(struct pci_dev *dev) +{ + struct device_node *dn; + if (!dev || !eeh_subsystem_enabled) + return; + + /* Unregister the device with the EEH/PCI address search system */ +#ifdef DEBUG + printk(KERN_DEBUG "EEH: remove device %s\n", pci_name(dev)); +#endif + pci_addr_cache_remove_device(dev); + + dn = pci_device_to_OF_node(dev); + PCI_DN(dn)->pcidev = NULL; + pci_dev_put (dev); +} +EXPORT_SYMBOL_GPL(eeh_remove_device); + +static int proc_eeh_show(struct seq_file *m, void *v) +{ + unsigned int cpu; + unsigned long ffs = 0, positives = 0, failures = 0; + unsigned long resets = 0; + unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0; + + for_each_cpu(cpu) { + ffs += per_cpu(total_mmio_ffs, cpu); + positives += per_cpu(false_positives, cpu); + failures += per_cpu(ignored_failures, cpu); + resets += per_cpu(slot_resets, cpu); + no_dev += per_cpu(no_device, cpu); + no_dn += per_cpu(no_dn, cpu); + no_cfg += per_cpu(no_cfg_addr, cpu); + no_check += per_cpu(ignored_check, cpu); + } + + if (0 == eeh_subsystem_enabled) { + seq_printf(m, "EEH Subsystem is globally disabled\n"); + seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs); + } else { + seq_printf(m, "EEH Subsystem is enabled\n"); + seq_printf(m, + "no device=%ld\n" + "no device node=%ld\n" + "no config address=%ld\n" + "check not wanted=%ld\n" + "eeh_total_mmio_ffs=%ld\n" + "eeh_false_positives=%ld\n" + "eeh_ignored_failures=%ld\n" + "eeh_slot_resets=%ld\n", + no_dev, no_dn, no_cfg, no_check, + ffs, positives, failures, resets); + } + + return 0; +} + +static int proc_eeh_open(struct inode *inode, struct file *file) +{ + return single_open(file, proc_eeh_show, NULL); +} + +static struct file_operations proc_eeh_operations = { + .open = proc_eeh_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static int __init eeh_init_proc(void) +{ + struct proc_dir_entry *e; + + if (systemcfg->platform & PLATFORM_PSERIES) { + e = create_proc_entry("ppc64/eeh", 0, 
NULL); + if (e) + e->proc_fops = &proc_eeh_operations; + } + + return 0; +} +__initcall(eeh_init_proc); From linas at linas.org Fri Nov 4 11:50:04 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:50:04 -0600 Subject: [PATCH 12/42]: ppc64: PCI error event dispatcher References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005004.GA26878@mail.gnucash.org> 12-eeh-event-dispatcher.patch ppc64: EEH Recovery dispatcher thread This patch adds a mechanism to create recovery threads when an EEH event is received. Since an EEH freeze state may be detected within an interrupt context, we need to get out of the interrupt context before starting recovery. This dispatcher does this in two steps: first, it uses a workqueue to get out, and then launches a kernel thread, so that the recovery routine can sleep for extended periods without upsetting the keventd. A kernel thread is created with each EEH event, rather than having one long-running daemon started at boot time. This is because it is anticipated that EEH events will be very rare (very very rare, ideally) and so it's pointless to clutter the process tables with a daemon that will almost never run. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:30:49.790591516 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:32:35.713742506 -0600 @@ -19,7 +19,6 @@ #include #include -#include #include #include #include @@ -27,12 +26,12 @@ #include #include #include +#include #include #include +#include #include -#include #include -#include #undef DEBUG @@ -70,14 +69,6 @@ * and sent out for processing. */ -/* EEH event workqueue setup.
*/ -static DEFINE_SPINLOCK(eeh_eventlist_lock); -LIST_HEAD(eeh_eventlist); -static void eeh_event_handler(void *); -DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL); - -static struct notifier_block *eeh_notifier_chain; - /* If a device driver keeps reading an MMIO register in an interrupt * handler after a slot isolation event has occurred, we assume it * is broken and panic. This sets the threshold for how many read @@ -421,24 +412,6 @@ } /** - * eeh_register_notifier - Register to find out about EEH events. - * @nb: notifier block to callback on events - */ -int eeh_register_notifier(struct notifier_block *nb) -{ - return notifier_chain_register(&eeh_notifier_chain, nb); -} - -/** - * eeh_unregister_notifier - Unregister to an EEH event notifier. - * @nb: notifier block to callback on events - */ -int eeh_unregister_notifier(struct notifier_block *nb) -{ - return notifier_chain_unregister(&eeh_notifier_chain, nb); -} - -/** * read_slot_reset_state - Read the reset state of a device node's slot * @dn: device node to read * @rets: array to return results in @@ -461,73 +434,6 @@ } /** - * eeh_panic - call panic() for an eeh event that cannot be handled. - * The philosophy of this routine is that it is better to panic and - * halt the OS than it is to risk possible data corruption by - * oblivious device drivers that don't know better. - * - * @dev pci device that had an eeh event - * @reset_state current reset state of the device slot - */ -static void eeh_panic(struct pci_dev *dev, int reset_state) -{ - /* - * XXX We should create a separate sysctl for this. - * - * Since the panic_on_oops sysctl is used to halt the system - * in light of potential corruption, we can use it here. 
- */ - if (panic_on_oops) { - struct device_node *dn = pci_device_to_OF_node(dev); - eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */); - panic("EEH: MMIO failure (%d) on device:%s\n", reset_state, - pci_name(dev)); - } - else { - __get_cpu_var(ignored_failures)++; - printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n", - reset_state, pci_name(dev)); - } -} - -/** - * eeh_event_handler - dispatch EEH events. The detection of a frozen - * slot can occur inside an interrupt, where it can be hard to do - * anything about it. The goal of this routine is to pull these - * detection events out of the context of the interrupt handler, and - * re-dispatch them for processing at a later time in a normal context. - * - * @dummy - unused - */ -static void eeh_event_handler(void *dummy) -{ - unsigned long flags; - struct eeh_event *event; - - while (1) { - spin_lock_irqsave(&eeh_eventlist_lock, flags); - event = NULL; - if (!list_empty(&eeh_eventlist)) { - event = list_entry(eeh_eventlist.next, struct eeh_event, list); - list_del(&event->list); - } - spin_unlock_irqrestore(&eeh_eventlist_lock, flags); - if (event == NULL) - break; - - printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device " - "%s\n", event->reset_state, - pci_name(event->dev)); - - notifier_call_chain (&eeh_notifier_chain, - EEH_NOTIFY_FREEZE, event); - - pci_dev_put(event->dev); - kfree(event); - } -} - -/** * eeh_token_to_phys - convert EEH address token to phys address * @token i/o token, should be address in the form 0xA.... 
*/ @@ -613,8 +519,6 @@ int ret; int rets[3]; unsigned long flags; - int reset_state; - struct eeh_event *event; struct pci_dn *pdn; struct device_node *pe_dn; int rc = 0; @@ -722,33 +626,12 @@ __eeh_mark_slot (pe_dn); spin_unlock_irqrestore(&confirm_error_lock, flags); - reset_state = rets[0]; - - eeh_slot_error_detail (pdn, 1 /* Temporary Error */); - - printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n", - rets[0], dn->name, dn->full_name); - event = kmalloc(sizeof(*event), GFP_ATOMIC); - if (event == NULL) { - eeh_panic(dev, reset_state); - return 1; - } - - event->dev = dev; - event->dn = dn; - event->reset_state = reset_state; - - /* We may or may not be called in an interrupt context */ - spin_lock_irqsave(&eeh_eventlist_lock, flags); - list_add(&event->list, &eeh_eventlist); - spin_unlock_irqrestore(&eeh_eventlist_lock, flags); - + eeh_send_failure_event (dn, dev, rets[0], rets[2]); + /* Most EEH events are due to device driver bugs. Having * a stack trace will help the device-driver authors figure * out what happened. So print that out. */ if (rets[0] != 5) dump_stack(); - schedule_work(&eeh_event_wq); - return 1; dn_unlock: @@ -793,6 +676,14 @@ EXPORT_SYMBOL(eeh_check_failure); +/* ------------------------------------------------------------- */ +/* The code below deals with enabling EEH for devices during the + * early boot sequence. EEH must be enabled before any PCI probing + * can be done. 
+ */ + +#define EEH_ENABLE 1 + struct eeh_early_enable_info { unsigned int buid_hi; unsigned int buid_lo; @@ -850,8 +741,9 @@ /* First register entry is addr (00BBSS00) */ /* Try to enable eeh */ ret = rtas_call(ibm_set_eeh_option, 4, 1, NULL, - regs[0], info->buid_hi, info->buid_lo, - EEH_ENABLE); + regs[0], info->buid_hi, info->buid_lo, + EEH_ENABLE); + if (ret == 0) { eeh_subsystem_enabled = 1; pdn->eeh_mode |= EEH_MODE_SUPPORTED; Index: linux-2.6.14-git3/include/asm-powerpc/eeh_event.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.14-git3/include/asm-powerpc/eeh_event.h 2005-11-02 14:32:35.718741805 -0600 @@ -0,0 +1,52 @@ +/* + * eeh_event.h + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Copyright (c) 2005 Linas Vepstas + */ + +#ifndef ASM_PPC64_EEH_EVENT_H +#define ASM_PPC64_EEH_EVENT_H + +/** EEH event -- structure holding pci controller data that describes + * a change in the isolation status of a PCI slot. A pointer + * to this struct is passed as the data pointer in a notify callback. 
+ */ +struct eeh_event { + struct list_head list; + struct device_node *dn; /* struct device node */ + struct pci_dev *dev; /* affected device */ + int state; + int time_unavail; /* milliseconds until device might be available */ +}; + +/** + * eeh_send_failure_event - generate a PCI error event + * @dev pci device + * + * This routine builds a PCI error event which will be delivered + * to all listeners on the peh_notifier_chain. + * + * This routine can be called within an interrupt context; + * the actual event will be delivered in a normal context + * (from a workqueue). + */ +int eeh_send_failure_event (struct device_node *dn, + struct pci_dev *dev, + int reset_state, + int time_unavail); + +#endif /* ASM_PPC64_EEH_EVENT_H */ Index: linux-2.6.14-git3/include/asm-ppc64/eeh.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-ppc64/eeh.h 2005-11-02 14:29:21.496968403 -0600 +++ linux-2.6.14-git3/include/asm-ppc64/eeh.h 2005-11-02 14:32:35.725740824 -0600 @@ -1,4 +1,4 @@ -/* +/* * eeh.h * Copyright (C) 2001 Dave Engebretsen & Todd Inglett IBM Corporation. * @@ -6,12 +6,12 @@ * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. - * + * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
- * + * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA @@ -27,8 +27,6 @@ struct pci_dev; struct device_node; -struct device_node; -struct notifier_block; #ifdef CONFIG_EEH @@ -37,6 +35,10 @@ #define EEH_MODE_NOCHECK (1<<1) #define EEH_MODE_ISOLATED (1<<2) +/* Max number of EEH freezes allowed before we consider the device + * to be permanently disabled. */ +#define EEH_MAX_ALLOWED_FREEZES 5 + void __init eeh_init(void); unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val); @@ -59,36 +61,14 @@ * eeh_remove_device - undo EEH setup for the indicated pci device * @dev: pci device to be removed * - * This routine should be when a device is removed from a running - * system (e.g. by hotplug or dlpar). + * This routine should be called when a device is removed from + * a running system (e.g. by hotplug or dlpar). It unregisters + * the PCI device from the EEH subsystem. I/O errors affecting + * this device will no longer be detected after this call; thus, + * i/o errors affecting this slot may leave this device unusable. */ void eeh_remove_device(struct pci_dev *); -#define EEH_DISABLE 0 -#define EEH_ENABLE 1 -#define EEH_RELEASE_LOADSTORE 2 -#define EEH_RELEASE_DMA 3 - -/** - * Notifier event flags. - */ -#define EEH_NOTIFY_FREEZE 1 - -/** EEH event -- structure holding pci slot data that describes - * a change in the isolation status of a PCI slot. A pointer - * to this struct is passed as the data pointer in a notify callback. - */ -struct eeh_event { - struct list_head list; - struct pci_dev *dev; - struct device_node *dn; - int reset_state; -}; - -/** Register to find out about EEH events. */ -int eeh_register_notifier(struct notifier_block *nb); -int eeh_unregister_notifier(struct notifier_block *nb); - /** * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure. 
* @@ -129,7 +109,7 @@ #define EEH_IO_ERROR_VALUE(size) (-1UL) #endif /* CONFIG_EEH */ -/* +/* * MMIO read/write operations with EEH support. */ static inline u8 eeh_readb(const volatile void __iomem *addr) Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_event.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_event.c 2005-11-02 14:32:35.731739983 -0600 @@ -0,0 +1,155 @@ +/* + * eeh_event.c + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Copyright (c) 2005 Linas Vepstas + */ + +#include +#include +#include + +/** Overview: + * EEH error states may be detected within exception handlers; + * however, the recovery processing needs to occur asynchronously + * in a normal kernel context and not an interrupt context. + * This pair of routines creates an event and queues it onto a + * work-queue, where a worker thread can drive recovery. + */ + +/* EEH event workqueue setup. */ +static spinlock_t eeh_eventlist_lock = SPIN_LOCK_UNLOCKED; +LIST_HEAD(eeh_eventlist); +static void eeh_thread_launcher(void *); +DECLARE_WORK(eeh_event_wq, eeh_thread_launcher, NULL); + +/** + * eeh_panic - call panic() for an eeh event that cannot be handled. 
+ * The philosophy of this routine is that it is better to panic and + * halt the OS than it is to risk possible data corruption by + * oblivious device drivers that don't know better. + * + * @dev pci device that had an eeh event + * @reset_state current reset state of the device slot + */ +static void eeh_panic(struct pci_dev *dev, int reset_state) +{ + /* + * Since the panic_on_oops sysctl is used to halt the system + * in light of potential corruption, we can use it here. + */ + if (panic_on_oops) { + panic("EEH: MMIO failure (%d) on device:%s\n", reset_state, + pci_name(dev)); + } + else { + printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n", + reset_state, pci_name(dev)); + } +} + +/** + * eeh_event_handler - dispatch EEH events. The detection of a frozen + * slot can occur inside an interrupt, where it can be hard to do + * anything about it. The goal of this routine is to pull these + * detection events out of the context of the interrupt handler, and + * re-dispatch them for processing at a later time in a normal context. 
+ * + * @dummy - unused + */ +static int eeh_event_handler(void * dummy) +{ + unsigned long flags; + struct eeh_event *event; + + daemonize ("eehd"); + + while (1) { + set_current_state(TASK_INTERRUPTIBLE); + + spin_lock_irqsave(&eeh_eventlist_lock, flags); + event = NULL; + if (!list_empty(&eeh_eventlist)) { + event = list_entry(eeh_eventlist.next, struct eeh_event, list); + list_del(&event->list); + } + spin_unlock_irqrestore(&eeh_eventlist_lock, flags); + if (event == NULL) + break; + + printk(KERN_INFO "EEH: Detected PCI bus error on device %s\n", + pci_name(event->dev)); + + eeh_panic (event->dev, event->state); + + kfree(event); + } + + return 0; +} + +/** + * eeh_thread_launcher + * + * @dummy - unused + */ +static void eeh_thread_launcher(void *dummy) +{ + if (kernel_thread(eeh_event_handler, NULL, CLONE_KERNEL) < 0) + printk(KERN_ERR "Failed to start EEH daemon\n"); +} + +/** + * eeh_send_failure_event - generate a PCI error event + * @dev pci device + * + * This routine can be called within an interrupt context; + * the actual event will be delivered in a normal context + * (from a workqueue). 
+ */ +int eeh_send_failure_event (struct device_node *dn, + struct pci_dev *dev, + int state, + int time_unavail) +{ + unsigned long flags; + struct eeh_event *event; + + event = kmalloc(sizeof(*event), GFP_ATOMIC); + if (event == NULL) { + printk (KERN_ERR "EEH: out of memory, event not handled\n"); + return 1; + } + + if (dev) + pci_dev_get(dev); + + event->dn = dn; + event->dev = dev; + event->state = state; + event->time_unavail = time_unavail; + + /* We may or may not be called in an interrupt context */ + spin_lock_irqsave(&eeh_eventlist_lock, flags); + list_add(&event->list, &eeh_eventlist); + spin_unlock_irqrestore(&eeh_eventlist_lock, flags); + + schedule_work(&eeh_event_wq); + + return 0; +} + +/********************** END OF FILE ******************************/ Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:31:36.150092654 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:32:55.306995693 -0600 @@ -3,4 +3,4 @@ obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_XICS) += xics.o -obj-$(CONFIG_EEH) += eeh.o +obj-$(CONFIG_EEH) += eeh.o eeh_event.o From linas at linas.org Fri Nov 4 11:50:10 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:50:10 -0600 Subject: [PATCH 13/42]: ppc64: PCI reset support routines References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005010.GA26901@mail.gnucash.org> 13-eeh-recovery-support-routines.patch EEH Recovery support routines This patch adds routines required to help drive the recovery of EEH-frozen slots. The main function is to drive the PCI #RST signal line high for a quarter of a second, and then allow for a second & a half of settle time.
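The reset timing described above can be sketched in plain user-space C. This is only an illustration under stated assumptions: `fake_msleep()`, `reset_slot()`, `always_ready()` and `busy_then_ready()` are hypothetical names that do not appear in the patch, the firmware is simulated by a function pointer, and the sleep only advances a counter. The constants (250 ms hold, 1800 ms settle, at most 10 polls, wait of rc+100 ms) are the ones the patch itself uses.

```c
/* Simulated clock, so the sequence can be checked without sleeping. */
static unsigned long elapsed_msec;

static void fake_msleep(unsigned int ms)
{
	elapsed_msec += ms;
}

#define RST_HOLD_MSEC    250	/* hold #RST asserted for a quarter second */
#define BUS_SETTLE_MSEC 1800	/* settle time used by the patch */
#define MAX_POLLS         10	/* firmware availability polls */

/* Stand-in for the firmware query: returns a number of milliseconds
 * to wait, 0 when the slot is ready, or a negative value when the
 * slot is permanently unavailable. */
typedef int (*availability_fn)(void);

static int always_ready(void)
{
	return 0;
}

static int busy_polls_left;

static int busy_then_ready(void)
{
	if (busy_polls_left > 0) {
		busy_polls_left--;
		return 500;	/* ask the caller to wait 500 ms */
	}
	return 0;
}

/* Hypothetical mirror of the sequence: assert, hold, release,
 * settle, then poll the (simulated) firmware. */
static int reset_slot(availability_fn avail)
{
	int i, rc = 0;

	/* #RST would be asserted here */
	fake_msleep(RST_HOLD_MSEC);
	/* #RST would be released here */
	fake_msleep(BUS_SETTLE_MSEC);

	for (i = 0; i < MAX_POLLS; i++) {
		rc = avail();
		if (rc <= 0)
			break;
		fake_msleep(rc + 100);
	}
	return rc;
}
```

With a slot that is ready on the first poll, the simulated sequence takes 250 + 1800 = 2050 ms; each "busy" answer of 500 ms adds another 600 ms.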
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-11-02 14:29:20.596094683 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-11-02 14:33:42.083437903 -0600 @@ -51,4 +51,18 @@ extern unsigned long pci_assign_all_buses; extern int pci_read_irq_line(struct pci_dev *pci_dev); +/* ---- EEH internal-use-only related routines ---- */ +#ifdef CONFIG_EEH +/** + * rtas_set_slot_reset -- unfreeze a frozen slot + * + * Clear the EEH-frozen condition on a slot. This routine + * does this by asserting the PCI #RST line for 1/8th of + * a second; this routine will sleep while the adapter is + * being reset. + */ +void rtas_set_slot_reset (struct pci_dn *); + +#endif + #endif /* _ASM_POWERPC_PPC_PCI_H */ Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:32:35.713742506 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:33:42.096436081 -0600 @@ -17,6 +17,7 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ +#include #include #include #include @@ -677,6 +678,104 @@ EXPORT_SYMBOL(eeh_check_failure); /* ------------------------------------------------------------- */ +/* The code below deals with error recovery */ + +/** Return negative value if a permanent error, else return + * a number of milliseconds to wait until the PCI slot is + * ready to be used. 
+ */ +static int +eeh_slot_availability(struct pci_dn *pdn) +{ + int rc; + int rets[3]; + + rc = read_slot_reset_state(pdn, rets); + + if (rc) return rc; + + if (rets[1] == 0) return -1; /* EEH is not supported */ + if (rets[0] == 0) return 0; /* Oll Korrect */ + if (rets[0] == 5) { + if (rets[2] == 0) return -1; /* permanently unavailable */ + return rets[2]; /* number of millisecs to wait */ + } + return -1; +} + +/** rtas_pci_slot_reset raises/lowers the pci #RST line + * state: 1/0 to raise/lower the #RST + * + * Clear the EEH-frozen condition on a slot. This routine + * asserts the PCI #RST line if the 'state' argument is '1', + * and drops the #RST line if 'state is '0'. This routine is + * safe to call in an interrupt context. + * + */ + +static void +rtas_pci_slot_reset(struct pci_dn *pdn, int state) +{ + int rc; + + BUG_ON (pdn==NULL); + + if (!pdn->phb) { + printk (KERN_WARNING "EEH: in slot reset, device node %s has no phb\n", + pdn->node->full_name); + return; + } + + rc = rtas_call(ibm_set_slot_reset,4,1, NULL, + pdn->eeh_config_addr, + BUID_HI(pdn->phb->buid), + BUID_LO(pdn->phb->buid), + state); + if (rc) { + printk (KERN_WARNING "EEH: Unable to reset the failed slot, (%d) #RST=%d dn=%s\n", + rc, state, pdn->node->full_name); + return; + } + + if (state == 0) + eeh_clear_slot (pdn->node->parent->child); +} + +/** rtas_set_slot_reset -- assert the pci #RST line for 1/4 second + * dn -- device node to be reset. + */ + +void +rtas_set_slot_reset(struct pci_dn *pdn) +{ + int i, rc; + + rtas_pci_slot_reset (pdn, 1); + + /* The PCI bus requires that the reset be held high for at least + * a 100 milliseconds. We wait a bit longer 'just in case'. */ + +#define PCI_BUS_RST_HOLD_TIME_MSEC 250 + msleep (PCI_BUS_RST_HOLD_TIME_MSEC); + rtas_pci_slot_reset (pdn, 0); + + /* After a PCI slot has been reset, the PCI Express spec requires + * a 1.5 second idle time for the bus to stabilize, before starting + * up traffic. 
*/ +#define PCI_BUS_SETTLE_TIME_MSEC 1800 + msleep (PCI_BUS_SETTLE_TIME_MSEC); + + /* Now double check with the firmware to make sure the device is + * ready to be used; if not, wait for recovery. */ + for (i=0; i<10; i++) { + rc = eeh_slot_availability (pdn); + if (rc <= 0) break; + + msleep (rc+100); + } +} + +/* ------------------------------------------------------------- */ /* The code below deals with enabling EEH for devices during the * early boot sequence. EEH must be enabled before any PCI probing * can be done. From linas at linas.org Fri Nov 4 11:50:17 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:50:17 -0600 Subject: [PATCH 14/42]: ppc64: Save & restore of PCI device BARS References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005017.GA26911@mail.gnucash.org> 14-eeh-device-bar-save.patch After a PCI device has been reset, the device BARs and other config space info must be restored to the same state as they were in when the firmware first handed us this device. This will allow the PCI device driver, when restarted, to correctly recognize and set up the device. This patch saves the device config space as early as reasonable after the firmware has handed over the device. The state restore function is intended for use by the EEH recovery routines.
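As a rough illustration of the save/restore idea, here is a user-space sketch that models the first 16 dwords of config space as an array. It is an assumption-laden simplification, not the patch's code: `struct fake_dev`, `save_config()`, `fake_reset()` and `restore_bars()` are hypothetical names, the "hardware" is just memory, and the simulated reset simply zeroes everything after dword 0. The set of restored registers (dwords 4 to 9 for the BARs, dword 12 for the expansion ROM, the cache line size and latency timer bytes, and dword 15) follows what the patch restores.

```c
#include <stdint.h>
#include <string.h>

#define CFG_DWORDS 16	/* the standard 64-byte config header */

/* Hypothetical stand-in for a device plus its saved state. */
struct fake_dev {
	uint32_t hw[CFG_DWORDS];	/* simulated live config space */
	uint32_t saved[CFG_DWORDS];	/* copy taken at device-add time */
	int is_bridge;
};

/* Take a snapshot of the whole header, as early as possible. */
static void save_config(struct fake_dev *d)
{
	memcpy(d->saved, d->hw, sizeof(d->saved));
}

/* Simulated reset: everything but the vendor/device ID is lost. */
static void fake_reset(struct fake_dev *d)
{
	int i;

	for (i = 1; i < CFG_DWORDS; i++)
		d->hw[i] = 0;
}

/* Put back only the registers the firmware would have set up. */
static void restore_bars(struct fake_dev *d)
{
	int i;

	if (d->is_bridge)
		return;		/* bridges are left to the firmware */

	for (i = 4; i < 10; i++)	/* the six BARs */
		d->hw[i] = d->saved[i];

	d->hw[12] = d->saved[12];	/* expansion ROM address */

	/* cache line size (offset 0x0c) and latency timer (0x0d)
	 * are the low two bytes of dword 3 */
	d->hw[3] = (d->hw[3] & ~0xffffu) | (d->saved[3] & 0xffffu);

	/* max latency, min grant, interrupt pin and line */
	d->hw[15] = d->saved[15];
}
```

Note that the sketch deliberately leaves the command/status dword untouched, so the restarted driver still has to enable the device itself.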
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:33:42.096436081 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:34:19.926132452 -0600 @@ -77,6 +77,9 @@ */ #define EEH_MAX_FAILS 100000 +/* Misc forward declaraions */ +static void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn); + /* RTAS tokens */ static int ibm_set_eeh_option; static int ibm_set_slot_reset; @@ -366,6 +369,7 @@ */ void __init pci_addr_cache_build(void) { + struct device_node *dn; struct pci_dev *dev = NULL; if (!eeh_subsystem_enabled) @@ -379,6 +383,10 @@ continue; } pci_addr_cache_insert_device(dev); + + /* Save the BAR's; firmware doesn't restore these after EEH reset */ + dn = pci_device_to_OF_node(dev); + eeh_save_bars(dev, PCI_DN(dn)); } #ifdef DEBUG @@ -775,6 +783,108 @@ } } +/* ------------------------------------------------------- */ +/** Save and restore of PCI BARs + * + * Although firmware will set up BARs during boot, it doesn't + * set up device BAR's after a device reset, although it will, + * if requested, set up bridge configuration. Thus, we need to + * configure the PCI devices ourselves. + */ + +/** + * __restore_bars - Restore the Base Address Registers + * Loads the PCI configuration space base address registers, + * the expansion ROM base address, the latency timer, and etc. + * from the saved values in the device node. 
+ */ +static inline void __restore_bars (struct pci_dn *pdn) +{ + int i; + + if (NULL==pdn->phb) return; + for (i=4; i<10; i++) { + rtas_write_config(pdn, i*4, 4, pdn->config_space[i]); + } + + /* 12 == Expansion ROM Address */ + rtas_write_config(pdn, 12*4, 4, pdn->config_space[12]); + +#define BYTE_SWAP(OFF) (8*((OFF)/4)+3-(OFF)) +#define SAVED_BYTE(OFF) (((u8 *)(pdn->config_space))[BYTE_SWAP(OFF)]) + + rtas_write_config (pdn, PCI_CACHE_LINE_SIZE, 1, + SAVED_BYTE(PCI_CACHE_LINE_SIZE)); + + rtas_write_config (pdn, PCI_LATENCY_TIMER, 1, + SAVED_BYTE(PCI_LATENCY_TIMER)); + + /* max latency, min grant, interrupt pin and line */ + rtas_write_config(pdn, 15*4, 4, pdn->config_space[15]); +} + +/** + * eeh_restore_bars - restore the PCI config space info + * + * This routine performs a recursive walk to the children + * of this device as well. + */ +void eeh_restore_bars(struct pci_dn *pdn) +{ + struct device_node *dn; + if (!pdn) + return; + + if (! pdn->eeh_is_bridge) + __restore_bars (pdn); + + dn = pdn->node->child; + while (dn) { + eeh_restore_bars (PCI_DN(dn)); + dn = dn->sibling; + } +} + +/** + * eeh_save_bars - save device bars + * + * Save the values of the device bars. Unlike the restore + * routine, this routine is *not* recursive. This is because + * PCI devices are added individuallly; but, for the restore, + * an entire slot is reset at a time. 
+ */ +static void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn) +{ + int i; + + if (!pdev || !pdn ) + return; + + for (i = 0; i < 16; i++) + pci_read_config_dword(pdev, i * 4, &pdn->config_space[i]); + + if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) + pdn->eeh_is_bridge = 1; +} + +void +rtas_configure_bridge(struct pci_dn *pdn) +{ + int token = rtas_token ("ibm,configure-bridge"); + int rc; + + if (token == RTAS_UNKNOWN_SERVICE) + return; + rc = rtas_call(token,3,1, NULL, + pdn->eeh_config_addr, + BUID_HI(pdn->phb->buid), + BUID_LO(pdn->phb->buid)); + if (rc) { + printk (KERN_WARNING "EEH: Unable to configure device bridge (%d) for %s\n", + rc, pdn->node->full_name); + } +} + /* ------------------------------------------------------------- */ /* The code below deals with enabling EEH for devices during the * early boot sequence. EEH must be enabled before any PCI probing @@ -977,6 +1087,7 @@ void eeh_add_device_late(struct pci_dev *dev) { struct device_node *dn; + struct pci_dn *pdn; if (!dev || !eeh_subsystem_enabled) return; @@ -987,9 +1098,11 @@ pci_dev_get (dev); dn = pci_device_to_OF_node(dev); - PCI_DN(dn)->pcidev = dev; + pdn = PCI_DN(dn); + pdn->pcidev = dev; pci_addr_cache_insert_device (dev); + eeh_save_bars(dev, pdn); } EXPORT_SYMBOL_GPL(eeh_add_device_late); Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-11-02 14:33:42.083437903 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-11-02 14:34:19.931131751 -0600 @@ -63,6 +63,29 @@ */ void rtas_set_slot_reset (struct pci_dn *); +/** + * eeh_restore_bars - Restore device configuration info. + * + * A reset of a PCI device will clear out its config space. + * This routines will restore the config space for this + * device, and is children, to values previously obtained + * from the firmware. 
+ */ +void eeh_restore_bars(struct pci_dn *); + +/** + * rtas_configure_bridge -- firmware initialization of pci bridge + * + * Ask the firmware to configure all PCI bridge devices + * located behind the indicated node. Required after a + * pci device reset. Does essentially the same thing as + * eeh_restore_bars, but for bridges, and lets firmware + * do the work. + */ +void rtas_configure_bridge(struct pci_dn *); + +int rtas_write_config(struct pci_dn *, int where, int size, u32 val); + +#endif #endif /* _ASM_POWERPC_PPC_PCI_H */ From linas at linas.org Fri Nov 4 11:50:26 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:50:26 -0600 Subject: [PATCH 15/42]: Documentation: PCI Error Recovery References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005026.GA26919@mail.gnucash.org> 215-pci-error-recovery_docs.patch PCI Error Recovery: documentation patch Various PCI bus errors can be signaled by newer PCI controllers. Recovering from those errors requires an infrastructure to notify affected device drivers of the error, and a way of walking through a reset sequence. This patch adds documentation describing the current error recovery proposal. Signed-off-by: Linas Vepstas Documentation/pci-error-recovery.txt | 246 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ MAINTAINERS | 7 2 files changed, 253 insertions(+) Index: linux-2.6.14-git3/Documentation/pci-error-recovery.txt =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.14-git3/Documentation/pci-error-recovery.txt 2005-11-02 14:34:25.663328101 -0600 @@ -0,0 +1,246 @@ + + PCI Error Recovery + ------------------ + May 31, 2005 + + Current document maintainer: + Linas Vepstas + + +Some PCI bus controllers are able to detect certain "hard" PCI errors +on the bus, such as parity errors on the data and address busses, as +well as SERR and PERR errors.
These chipsets are then able to disable +I/O to/from the affected device, so that, for example, a bad DMA +address doesn't end up corrupting system memory. These same chipsets +are also able to reset the affected PCI device, and return it to +working condition. This document describes a generic API for +performing error recovery. + +The core idea is that after a PCI error has been detected, there must +be a way for the kernel to coordinate with all affected device drivers +so that the pci card can be made operational again, possibly after +performing a full electrical #RST of the PCI card. The API below +provides a generic way for device drivers to be notified of PCI +errors, and to be notified of, and respond to, a reset sequence. + +Preliminary sketch of API, cut-n-pasted-n-modified email from +Ben Herrenschmidt, circa 5 April 2005 + +The error recovery API support is exposed to the driver in the form of +a structure of function pointers pointed to by a new field in struct +pci_driver. The absence of this pointer in pci_driver denotes a +"non-aware" driver; behaviour for these is platform dependent. +Platforms like ppc64 can try to simulate pci hotplug remove/add. + +The definition of "pci_error_token" is not covered here. It is based on +Seto's work on the synchronous error detection. We still need to define +functions for extracting information out of an opaque error token. This is +separate from this API. + +This structure has the form: + +struct pci_error_handlers +{ + int (*error_detected)(struct pci_dev *dev, pci_error_token error); + int (*mmio_enabled)(struct pci_dev *dev); + int (*resume)(struct pci_dev *dev); + int (*link_reset)(struct pci_dev *dev); + int (*slot_reset)(struct pci_dev *dev); +}; + +A driver doesn't have to implement all of these callbacks. The +only mandatory one is error_detected(). If a callback is not +implemented, the corresponding feature is considered unsupported.
+For example, if mmio_enabled() and resume() aren't there, then the +driver is assumed not to do any direct recovery and to require +a reset. If link_reset() is not implemented, the card is assumed +not to care about link resets, in which case, if recovery is supported, +the core can try recovery (but not slot_reset() unless it really did +reset the slot). If slot_reset() is not supported, link_reset() can +be called instead on a slot reset. + +At first, the call will always be: + + 1) error_detected() + + Error detected. This is sent once after an error has been detected. At +this point, the device might not be accessible anymore depending on the +platform (the slot will be isolated on ppc64). The driver may already +have "noticed" the error because of a failing IO, but this is the proper +"synchronisation point", that is, it gives a chance to the driver to +clean up, waiting for pending stuff (timers, whatever, etc...) to +complete; it can take semaphores, schedule, etc... everything but touch +the device. Within this function and after it returns, the driver +shouldn't do any new IOs. Called in task context. This is sort of a +"quiesce" point. See note about interrupts at the end of this doc. + + Result codes: + - PCIERR_RESULT_CAN_RECOVER: + Driver returns this if it thinks it might be able to recover + the HW by just banging IOs or if it wants to be given + a chance to extract some diagnostic information (see + below). + - PCIERR_RESULT_NEED_RESET: + Driver returns this if it thinks it can't recover unless the + slot is reset. + - PCIERR_RESULT_DISCONNECT: + Return this if driver thinks it won't recover at all, + (this will detach the driver ? or just leave it + dangling ? to be decided) + +So at this point, we have called error_detected() for all drivers +on the segment that had the error. On ppc64, the slot is isolated. What +happens now typically depends on the result from the drivers.
If all +drivers on the segment/slot return PCIERR_RESULT_CAN_RECOVER, we would +re-enable IOs on the slot (or do nothing special if the platform doesn't +isolate slots) and call 2). If not and we can reset slots, we go to 4); +if neither, we have a dead slot. If it's a hotplug slot, we might +"simulate" reset by triggering HW unplug/replug though. + +>>> Current ppc64 implementation assumes that a device driver will +>>> *not* schedule or semaphore in this routine; the current ppc64 +>>> implementation uses one kernel thread to notify all devices; +>>> thus, if one device sleeps/schedules, all devices are affected. +>>> Doing better requires complex multi-threaded logic in the error +>>> recovery implementation (e.g. waiting for all notification threads +>>> to "join" before proceeding with recovery.) This seems excessively +>>> complex and not worth implementing. + +>>> The current ppc64 implementation doesn't much care if the device +>>> attempts i/o at this point, or not. I/O's will fail, returning +>>> a value of 0xff on read, and writes will be dropped. If the device +>>> driver attempts more than 10K I/O's to a frozen adapter, it will +>>> assume that the device driver has gone into an infinite loop, and +>>> it will panic the kernel. + + 2) mmio_enabled() + + This is the "early recovery" call. IOs are allowed again, but DMA is +not (hrm... to be discussed, I prefer not), with some restrictions. This +is NOT a callback for the driver to start operations again, only to +peek/poke at the device, extract diagnostic information, if any, and +eventually do things like trigger a device local reset or some such, +but not restart operations. This is sent if all drivers on a segment +agree that they can try to recover and no automatic link reset was +performed by the HW. If the platform can't just re-enable IOs without +a slot reset or a link reset, it doesn't call this callback and goes +directly to 3) or 4).
All IOs should be done _synchronously_ from +within this callback; errors triggered by them will be returned via +the normal pci_check_whatever() api, and no new error_detected() callback +will be issued due to an error happening here. However, such an error +might cause IOs to be re-blocked for the whole segment, and thus +invalidate the recovery that other devices on the same segment might +have done, forcing the whole segment into one of the next states, +that is link reset or slot reset. + + Result codes: + - PCIERR_RESULT_RECOVERED + Driver returns this if it thinks the device is fully + functional and thinks it is ready to start + normal driver operations again. There is no + guarantee that the driver will actually be + allowed to proceed, as another driver on the + same segment might have failed and thus triggered a + slot reset on platforms that support it. + + - PCIERR_RESULT_NEED_RESET + Driver returns this if it thinks the device is not + recoverable in its current state and it needs a slot + reset to proceed. + + - PCIERR_RESULT_DISCONNECT + Same as above. Total failure, no recovery even after + reset; driver dead. (To be defined more precisely) + +>>> The current ppc64 implementation does not implement this callback. + + 3) link_reset() + + This is called after the link has been reset. This is typically +a PCI Express specific state at this point and is done whenever a +non-fatal error has been detected that can be "solved" by resetting +the link. This call informs the driver of the reset and the driver +should check if the device appears to be in working condition. +This function acts a bit like 2) mmio_enabled(), in that the driver +is not supposed to restart normal driver I/O operations right away. +Instead, it should just "probe" the device to check its recoverability +status. If all is right, then the core will call resume() once all +drivers have ack'd link_reset().
+ + Result codes: + (identical to mmio_enabled) + +>>> The current ppc64 implementation does not implement this callback. + + 4) slot_reset() + + This is called after the slot has been soft or hard reset by the +platform. A soft reset consists of asserting the adapter #RST line +and then restoring the PCI BARs and PCI configuration header. If the +platform supports PCI hotplug, then it might instead perform a hard +reset by toggling power on the slot off/on. This call gives drivers +the chance to re-initialize the hardware (re-download firmware, etc.), +but drivers shouldn't restart normal I/O processing operations at +this point. (See note about interrupts; interrupts aren't guaranteed +to be delivered until the resume() callback has been called). If all +device drivers report success on this callback, the platform will call +resume() to complete the error handling and let the driver restart +normal I/O processing. + +A driver can still return a critical failure for this function if +it can't get the device operational after reset. If the platform +previously tried a soft reset, it might now try a hard reset (power +cycle) and then call slot_reset() again. If the device still can't +be recovered, there is nothing more that can be done; the platform +will typically report a "permanent failure" in such a case. The +device will be considered "dead" in this case. + + Result codes: + - PCIERR_RESULT_DISCONNECT + Same as above. + +>>> The current ppc64 implementation does not try a power-cycle reset +>>> if the driver returned PCIERR_RESULT_DISCONNECT. However, it should. + + 5) resume() + + This is called if all drivers on the segment have returned +PCIERR_RESULT_RECOVERED from one of the 3 previous callbacks. +That basically tells the driver to restart activity, that everything +is back up and running. No result code is taken into account here. If +a new error happens, it will restart a new error handling process. + +That's it. I think this covers all the possibilities.
The way those +callbacks are called is platform policy. A platform with no slot reset +capability for example may want to just "ignore" drivers that can't +recover (disconnect them) and try to let other cards on the same segment +recover. Keep in mind that in most real life cases, though, there will +be only one driver per segment. + +Now, there is a note about interrupts. If you get an interrupt and your +device is dead or has been isolated, there is a problem :) + +After much thinking, I decided to leave that to the platform. That is, +the recovery API only specifies that: + + - There is no guarantee that interrupt delivery can proceed from any +device on the segment starting from the error detection and until the +restart callback is sent, at which point interrupts are expected to be +fully operational. + + - There is no guarantee that interrupt delivery is stopped, that is, a +driver that gets an interrupt after detecting an error, or that detects +an error within the interrupt handler such that it prevents proper +ack'ing of the interrupt (and thus removal of the source) should just +return IRQ_NOTHANDLED. It's up to the platform to deal with that +condition, typically by masking the irq source during the duration of +the error handling. It is expected that the platform "knows" which +interrupts are routed to error-management capable slots and can deal +with temporarily disabling that irq number during error processing (this +isn't terribly complex). That means some IRQ latency for other devices +sharing the interrupt, but there is simply no other way.
High end +platforms aren't supposed to share interrupts between many devices +anyway :) + + +Revised: 31 May 2005 Linas Vepstas Index: linux-2.6.14-git3/MAINTAINERS =================================================================== --- linux-2.6.14-git3.orig/MAINTAINERS 2005-11-02 14:29:19.433257684 -0600 +++ linux-2.6.14-git3/MAINTAINERS 2005-11-02 14:34:25.700322915 -0600 @@ -1885,6 +1885,13 @@ L: linux-abi-devel at lists.sourceforge.net S: Maintained +PCI ERROR RECOVERY +P: Linas Vepstas +M: linas at austin.ibm.com +L: linux-kernel at vger.kernel.org +L: linux-pci at atrey.karlin.mff.cuni.cz +S: Supported + PCI SOUND DRIVERS (ES1370, ES1371 and SONICVIBES) P: Thomas Sailer M: sailer at ife.ee.ethz.ch From linas at linas.org Fri Nov 4 11:50:35 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:50:35 -0600 Subject: [PATCH 16/42]: PCI: PCI Error reporting callbacks References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005035.GA26929@mail.gnucash.org> 16-pci-error-recovery_header.patch PCI Error Recovery: header file patch Various PCI bus errors can be signaled by newer PCI controllers. Recovering from those errors requires an infrastructure to notify affected device drivers of the error, and a way of walking through a reset sequence. This patch adds a set of callbacks to be used by error recovery routines to notify device drivers of the various stages of recovery. 
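To make the callback flow more concrete, here is a small self-contained sketch of how a driver might fill in the handler table and how a platform might walk a single device through recovery. It is a toy model under explicit assumptions: `struct pci_dev` is a mock with invented fields, and `demo_error_detected()`, `demo_slot_reset()`, `demo_resume()` and `recover_one()` are hypothetical names. Only the enums and the layout of `struct pci_error_handlers` follow the header patch; the real platform logic coordinates all drivers on a segment, which this sketch does not attempt.

```c
#include <stddef.h>

enum pci_channel_state {
	pci_channel_io_normal = 0,
	pci_channel_io_frozen = 1,
	pci_channel_io_perm_failure,
};

enum pcierr_result {
	PCIERR_RESULT_NONE = 0,
	PCIERR_RESULT_CAN_RECOVER = 1,
	PCIERR_RESULT_NEED_RESET,
	PCIERR_RESULT_DISCONNECT,
	PCIERR_RESULT_RECOVERED,
};

/* Mock device: the fields are invented for this example. */
struct pci_dev {
	enum pci_channel_state error_state;
	int resets;
	int resumed;
};

struct pci_error_handlers {
	int (*error_detected)(struct pci_dev *dev,
			      enum pci_channel_state error);
	int (*mmio_enabled)(struct pci_dev *dev);
	int (*link_reset)(struct pci_dev *dev);
	int (*slot_reset)(struct pci_dev *dev);
	void (*resume)(struct pci_dev *dev);
};

/* Hypothetical driver callbacks: quiesce, then ask for a reset. */
static int demo_error_detected(struct pci_dev *dev,
			       enum pci_channel_state error)
{
	dev->error_state = error;	/* stop issuing new I/O */
	return PCIERR_RESULT_NEED_RESET;
}

static int demo_slot_reset(struct pci_dev *dev)
{
	dev->resets++;			/* re-initialize hardware here */
	return PCIERR_RESULT_RECOVERED;
}

static void demo_resume(struct pci_dev *dev)
{
	dev->error_state = pci_channel_io_normal;
	dev->resumed = 1;		/* normal I/O may restart */
}

static struct pci_error_handlers demo_handlers = {
	.error_detected	= demo_error_detected,
	.slot_reset	= demo_slot_reset,
	.resume		= demo_resume,
};

/* Simplified single-device walk through the recovery steps. */
static int recover_one(struct pci_dev *dev, struct pci_error_handlers *h)
{
	int rc;

	if (!h || !h->error_detected)
		return PCIERR_RESULT_NONE;	/* "non-aware" driver */

	rc = h->error_detected(dev, pci_channel_io_frozen);
	if (rc == PCIERR_RESULT_CAN_RECOVER && h->mmio_enabled)
		rc = h->mmio_enabled(dev);
	if (rc == PCIERR_RESULT_NEED_RESET && h->slot_reset)
		rc = h->slot_reset(dev);	/* slot reset assumed done */
	if (rc == PCIERR_RESULT_RECOVERED && h->resume)
		h->resume(dev);
	return rc;
}
```

A driver that leaves `mmio_enabled` and `link_reset` unset, as here, simply skips those stages, which matches the "unimplemented callback means unsupported feature" rule in the documentation patch.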
Signed-off-by: Linas Vepstas -- include/linux/pci.h | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 49 insertions(+) Index: linux-2.6.14-git3/include/linux/pci.h =================================================================== --- linux-2.6.14-git3.orig/include/linux/pci.h 2005-11-02 14:29:18.856338553 -0600 +++ linux-2.6.14-git3/include/linux/pci.h 2005-11-02 14:34:32.272401512 -0600 @@ -78,6 +78,16 @@ #define PCI_UNKNOWN ((pci_power_t __force) 5) #define PCI_POWER_ERROR ((pci_power_t __force) -1) +/** The pci_channel state describes connectivity between the CPU and + * the pci device. If some PCI bus between here and the pci device + * has crashed or locked up, this info is reflected here. + */ +enum pci_channel_state { + pci_channel_io_normal = 0, /* I/O channel is in normal state */ + pci_channel_io_frozen = 1, /* I/O to channel is blocked */ + pci_channel_io_perm_failure, /* PCI card is dead */ +}; + /* * The pci_dev structure is used to describe PCI devices. */ @@ -110,6 +120,7 @@ this is D0-D3, D0 being fully functional, and D3 being off. */ + enum pci_channel_state error_state; /* current connectivity state */ struct device dev; /* Generic device interface */ /* device is compatible with these IDs */ @@ -232,6 +243,43 @@ unsigned int use_driver_data:1; /* pci_driver->driver_data is used */ }; +/* ---------------------------------------------------------------- */ +/** PCI error recovery infrastructure. If a PCI device driver provides + * a set of callbacks in struct pci_error_handlers, then that device driver + * will be notified of PCI bus errors, and will be driven to recovery + * when an error occurs. + */ + +enum pcierr_result { + PCIERR_RESULT_NONE=0, /* no result/none/not supported in device driver */ + PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */ + PCIERR_RESULT_NEED_RESET, /* Device driver wants slot to be reset.
*/ + PCIERR_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */ + PCIERR_RESULT_RECOVERED, /* Device driver is fully recovered and operational */ +}; + +/* PCI bus error event callbacks */ +struct pci_error_handlers +{ + /* PCI bus error detected on this device */ + int (*error_detected)(struct pci_dev *dev, + enum pci_channel_state error); + + /* MMIO has been re-enabled, but not DMA */ + int (*mmio_enabled)(struct pci_dev *dev); + + /* PCI Express link has been reset */ + int (*link_reset)(struct pci_dev *dev); + + /* PCI slot has been reset */ + int (*slot_reset)(struct pci_dev *dev); + + /* Device driver may resume normal operations */ + void (*resume)(struct pci_dev *dev); +}; + +/* ---------------------------------------------------------------- */ + struct module; struct pci_driver { struct list_head node; @@ -245,6 +293,7 @@ int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); /* Enable wake event */ void (*shutdown) (struct pci_dev *dev); + struct pci_error_handlers *err_handler; struct device_driver driver; struct pci_dynids dynids; }; From linas at linas.org Fri Nov 4 11:50:48 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:50:48 -0600 Subject: [PATCH 17/42]: ppc64: mark failed devices References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005048.GA26970@mail.gnucash.org> 17-eeh-slot-marking-bug.patch A device that experiences a PCI outage may be just one device out of many that were affected. In order to avoid repeated reports of a failure, the entire tree of affected devices should be marked as failed. This patch marks up the entire tree.
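The marking walk can be sketched in user space as follows. This is only an illustration under stated assumptions: demo_node and demo_pci_dn are cut-down stand-ins for struct device_node and struct pci_dn, DEMO_MODE_ISOLATED is a made-up flag value, and the caller is assumed to have already located the top node of the affected tree (the job find_device_pe() does in the patch).

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MODE_ISOLATED 0x1	/* hypothetical flag value */

/* Stand-in for struct pci_dn: just the mode bits. */
struct demo_pci_dn {
	int eeh_mode;
};

/* Stand-in for struct device_node: child/sibling links plus PCI data. */
struct demo_node {
	struct demo_pci_dn *pdn;	/* NULL when the node is not a PCI node */
	struct demo_node *child;
	struct demo_node *sibling;
};

/*
 * Mark a node and all of its siblings, descending into children.
 * Nodes without PCI data are skipped together with their subtree,
 * as in the patched __eeh_mark_slot().
 */
void demo_mark_siblings(struct demo_node *dn, int mode_flag)
{
	while (dn) {
		if (dn->pdn) {
			dn->pdn->eeh_mode |= mode_flag;
			if (dn->child)
				demo_mark_siblings(dn->child, mode_flag);
		}
		dn = dn->sibling;
	}
}

/* Mark the top node of the affected tree, then everything beneath it. */
void demo_mark_slot(struct demo_node *pe, int mode_flag)
{
	assert(pe != NULL && pe->pdn != NULL);
	pe->pdn->eeh_mode |= mode_flag;
	demo_mark_siblings(pe->child, mode_flag);
}
```

Starting the sibling walk at the top node's child, rather than at the top node itself, keeps unrelated siblings of the top node from being marked.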
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:34:19.926132452 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:35:39.290005477 -0600 @@ -479,32 +479,47 @@ * an interrupt context, which is bad. */ -static inline void __eeh_mark_slot (struct device_node *dn) +static inline void __eeh_mark_slot (struct device_node *dn, int mode_flag) { while (dn) { - PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED; + if (PCI_DN(dn)) { + PCI_DN(dn)->eeh_mode |= mode_flag; - if (dn->child) - __eeh_mark_slot (dn->child); + if (dn->child) + __eeh_mark_slot (dn->child, mode_flag); + } dn = dn->sibling; } } -static inline void __eeh_clear_slot (struct device_node *dn) +void eeh_mark_slot (struct device_node *dn, int mode_flag) +{ + dn = find_device_pe (dn); + PCI_DN(dn)->eeh_mode |= mode_flag; + __eeh_mark_slot (dn->child, mode_flag); +} + +static inline void __eeh_clear_slot (struct device_node *dn, int mode_flag) { while (dn) { - PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED; - if (dn->child) - __eeh_clear_slot (dn->child); + if (PCI_DN(dn)) { + PCI_DN(dn)->eeh_mode &= ~mode_flag; + PCI_DN(dn)->eeh_check_count = 0; + if (dn->child) + __eeh_clear_slot (dn->child, mode_flag); + } dn = dn->sibling; } } -static inline void eeh_clear_slot (struct device_node *dn) +void eeh_clear_slot (struct device_node *dn, int mode_flag) { unsigned long flags; spin_lock_irqsave(&confirm_error_lock, flags); - __eeh_clear_slot (dn); + dn = find_device_pe (dn); + PCI_DN(dn)->eeh_mode &= ~mode_flag; + PCI_DN(dn)->eeh_check_count = 0; + __eeh_clear_slot (dn->child, mode_flag); spin_unlock_irqrestore(&confirm_error_lock, flags); } @@ -529,7 +544,6 @@ int rets[3]; unsigned long flags; struct pci_dn *pdn; - struct device_node *pe_dn; int rc = 0; __get_cpu_var(total_mmio_ffs)++; @@ -631,8 +645,7 @@ /* Avoid 
repeated reports of this failure, including problems * with other functions on this device, and functions under * bridges. */ - pe_dn = find_device_pe (dn); - __eeh_mark_slot (pe_dn); + eeh_mark_slot (dn, EEH_MODE_ISOLATED); spin_unlock_irqrestore(&confirm_error_lock, flags); eeh_send_failure_event (dn, dev, rets[0], rets[2]); @@ -744,9 +757,6 @@ rc, state, pdn->node->full_name); return; } - - if (state == 0) - eeh_clear_slot (pdn->node->parent->child); } /** rtas_set_slot_reset -- assert the pci #RST line for 1/4 second @@ -765,6 +775,12 @@ #define PCI_BUS_RST_HOLD_TIME_MSEC 250 msleep (PCI_BUS_RST_HOLD_TIME_MSEC); + + /* We might get hit with another EEH freeze as soon as the + * pci slot reset line is dropped. Make sure we don't miss + * these, and clear the flag now. */ + eeh_clear_slot (pdn->node, EEH_MODE_ISOLATED); + rtas_pci_slot_reset (pdn, 0); /* After a PCI slot has been reset, the PCI Express spec requires Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-11-02 14:34:19.931131751 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-11-02 14:35:39.295004776 -0600 @@ -86,6 +86,13 @@ int rtas_write_config(struct pci_dn *, int where, int size, u32 val); +/** + * mark and clear slots: find "partition endpoint" PE and set or + * clear the flags for each subnode of the PE. 
+ */ +void eeh_mark_slot (struct device_node *dn, int mode_flag); +void eeh_clear_slot (struct device_node *dn, int mode_flag); + #endif #endif /* _ASM_POWERPC_PPC_PCI_H */ From linas at linas.org Fri Nov 4 11:51:03 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:51:03 -0600 Subject: [PATCH 18/42]: ppc64: bugfix: crash on dlpar slot add, remove References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005103.GA26983@mail.gnucash.org> 18-crash-on-pci-slot-add.patch This patch fixes a bug related to dlpar slot add. -- The crash is due to the fact that some children of pci nodes are not pci nodes themselves, and thus do not have pci_dn structures. For example: /pci@800000020000002/pci@2,3/usb@1/hub@1 /pci@800000020000002/pci@2,3/usb@1,1/hub@1 A typical stack trace: Vector: 300 (Data Access) at [c0000000555637d0] pc: c000000000202a50: .dlpar_add_slot+0x108/0x410 c000000000202e78 .add_slot_store+0x7c/0xac c000000000202da0 .dlpar_attr_store+0x48/0x64 c0000000000f8ee4 .sysfs_write_file+0x100/0x1a0 A similar stack trace is seen for the slot remove. This code has survived testing of adding and removing different slots, 23 times each, as of this writing.
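The shape of the fix can be sketched in user space like this. All names here are hypothetical stand-ins: demo_node replaces struct device_node, DEMO_PCI_DN() plays the role of PCI_DN(), and DEMO_NOTIFY_OK is an arbitrary value standing in for NOTIFY_OK. The point is only the early return when the parent carries no PCI data, as with a hub node sitting under a usb node.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct pci_dn; phb_id replaces the real phb pointer. */
struct demo_pci_dn {
	int phb_id;
};

/* Stand-in for struct device_node. */
struct demo_node {
	void *data;			/* PCI data, or NULL for non-PCI nodes */
	struct demo_node *parent;
};

/* Hypothetical equivalent of the kernel's PCI_DN() accessor. */
#define DEMO_PCI_DN(dn) ((struct demo_pci_dn *)(dn)->data)

enum { DEMO_NOTIFY_OK = 1 };		/* arbitrary stand-in for NOTIFY_OK */

/*
 * Sketch of the "add" path: inherit the PHB from the parent node,
 * but only when the parent really is a PCI node.
 */
int demo_reconfig_add(struct demo_node *np, int *phb_out)
{
	struct demo_pci_dn *pci = DEMO_PCI_DN(np->parent);

	if (!pci)
		return DEMO_NOTIFY_OK;	/* parent is not a PCI node: nothing to do */

	*phb_out = pci->phb_id;
	return DEMO_NOTIFY_OK;
}
```

Without the NULL check, the dereference of pci for a child of a non-PCI node is the kind of data access fault shown in the stack trace.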
Signed-off-by: Linas Vepstas emailed to To: paulus at samba.org Cc: linuxppc64-dev at ozlabs.org, johnrose at linux.ibm.com, linux-kernel at vger.kernel.org Subject: [PATCH 2/2] ppc64: Crash in DLPAR code on remove operation on 4 October 2005 Index: linux-2.6.14-git6/arch/ppc64/kernel/pci_dn.c =================================================================== --- linux-2.6.14-git6.orig/arch/ppc64/kernel/pci_dn.c 2005-11-03 14:15:40.520737607 -0600 +++ linux-2.6.14-git6/arch/ppc64/kernel/pci_dn.c 2005-11-03 14:15:45.182083115 -0600 @@ -194,7 +194,10 @@ switch (action) { case PSERIES_RECONFIG_ADD: - pci = np->parent->data; + pci = PCI_DN(np->parent); + if (!pci) + return NOTIFY_OK; + update_dn_pci_info(np, pci->phb); break; default: Index: linux-2.6.14-git6/arch/powerpc/platforms/pseries/iommu.c =================================================================== --- linux-2.6.14-git6.orig/arch/powerpc/platforms/pseries/iommu.c 2005-11-03 14:14:32.131340002 -0600 +++ linux-2.6.14-git6/arch/powerpc/platforms/pseries/iommu.c 2005-11-03 14:49:42.621970876 -0600 @@ -494,10 +494,13 @@ { int err = NOTIFY_OK; struct device_node *np = node; - struct pci_dn *pci = np->data; + struct pci_dn *pci; switch (action) { case PSERIES_RECONFIG_REMOVE: + pci = PCI_DN(np); + if (!pci) + return NOTIFY_OK; if (pci->iommu_table && get_property(np, "ibm,dma-window", NULL)) iommu_free_table(np); From linas at linas.org Fri Nov 4 11:51:17 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:51:17 -0600 Subject: [PATCH 19/42]: ppc64: bugfix: crash on PHB add References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005117.GA26991@mail.gnucash.org> 19-rpaphp-crashing.patch This patch fixes a bug related to dlpar PHB add, after a PHB removal. -- The crash was due to the PHB not yet having a pci_dn structure when the PHB is being added. This code has survived testing of adding and removing the PHB and all slots underneath it, 17 times so far, as of this writing.
Signed-off-by: Linas Vepstas emailed to To: paulus at samba.org Cc: linuxppc64-dev at ozlabs.org, linux-pci at atrey.karlin.mff.cuni.cz, johnrose at linux.ibm.com, linux-kernel at vger.kernel.org Subject: [PATCH] rpaphp: PCI Hotplug crash on PHB DLPAR add on 4 October 2005 Index: linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpadlpar_core.c 2005-11-02 14:29:02.115685162 -0600 +++ linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c 2005-11-02 14:35:52.800111285 -0600 @@ -306,7 +306,7 @@ { struct pci_controller *phb; - if (PCI_DN(dn)->phb) { + if (PCI_DN(dn) && PCI_DN(dn)->phb) { /* PHB already exists */ return -EINVAL; } From linas at linas.org Fri Nov 4 11:51:31 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:51:31 -0600 Subject: [PATCH 20/42]: ppc64: PCI hotplug common code elimination References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005131.GA27000@mail.gnucash.org> 20-rpaphp-eeh-cleanup.patch This patch moves some code from the rpaphp directory to the ppc64 directory, where it should have been all along. (Among other things, I need it in the ppc64 directory for the PCI error recovery.) Please note that this patch affects TWO maintainers: Paul, after applying the ppc64 part, please ask that GregKH apply the PCI part. It is safe to have the ppc64 part go in first. It would be bad to have the PCI part go in first.
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:35:39.290005477 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:36:41.255317484 -0600 @@ -1093,6 +1093,15 @@ } EXPORT_SYMBOL_GPL(eeh_add_device_early); +void eeh_add_device_tree_early(struct device_node *dn) +{ + struct device_node *sib; + for (sib = dn->child; sib; sib = sib->sibling) + eeh_add_device_tree_early(sib); + eeh_add_device_early(dn); +} +EXPORT_SYMBOL_GPL(eeh_add_device_tree_early); + /** * eeh_add_device_late - perform EEH initialization for the indicated pci device * @dev: pci device for which to set up EEH @@ -1147,6 +1156,23 @@ } EXPORT_SYMBOL_GPL(eeh_remove_device); +void eeh_remove_bus_device(struct pci_dev *dev) +{ + eeh_remove_device(dev); + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { + struct pci_bus *bus = dev->subordinate; + struct list_head *ln; + if (!bus) + return; + for (ln = bus->devices.next; ln != &bus->devices; ln = ln->next) { + struct pci_dev *pdev = pci_dev_b(ln); + if (pdev) + eeh_remove_bus_device(pdev); + } + } +} +EXPORT_SYMBOL_GPL(eeh_remove_bus_device); + static int proc_eeh_show(struct seq_file *m, void *v) { unsigned int cpu; Index: linux-2.6.14-git3/include/asm-ppc64/eeh.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-ppc64/eeh.h 2005-11-02 14:32:35.725740824 -0600 +++ linux-2.6.14-git3/include/asm-ppc64/eeh.h 2005-11-02 14:36:41.263316362 -0600 @@ -55,6 +55,7 @@ * to finish the eeh setup for this device. */ void eeh_add_device_early(struct device_node *); +void eeh_add_device_tree_early(struct device_node *); void eeh_add_device_late(struct pci_dev *); /** @@ -70,6 +71,15 @@ void eeh_remove_device(struct pci_dev *); /** + * eeh_remove_device_recursive - undo EEH for device & children. 
+ * @dev: pci device to be removed + * + * As above, this removes the device; it also removes child + * pci devices as well. + */ +void eeh_remove_bus_device(struct pci_dev *); + +/** * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure. * * If this macro yields TRUE, the caller relays to eeh_check_failure() Index: linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpaphp_pci.c 2005-11-02 14:28:58.955128188 -0600 +++ linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c 2005-11-02 14:36:41.271315241 -0600 @@ -253,17 +253,6 @@ return dev; } -static void enable_eeh(struct device_node *dn) -{ - struct device_node *sib; - - for (sib = dn->child; sib; sib = sib->sibling) - enable_eeh(sib); - eeh_add_device_early(dn); - return; - -} - static void print_slot_pci_funcs(struct pci_bus *bus) { struct device_node *dn; @@ -289,7 +278,7 @@ if (!dn) goto exit; - enable_eeh(dn); + eeh_add_device_tree_early(dn); dev = rpaphp_pci_config_slot(bus); if (!dev) { err("%s: can't find any devices.\n", __FUNCTION__); @@ -303,30 +292,12 @@ } EXPORT_SYMBOL_GPL(rpaphp_config_pci_adapter); -static void rpaphp_eeh_remove_bus_device(struct pci_dev *dev) -{ - eeh_remove_device(dev); - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { - struct pci_bus *bus = dev->subordinate; - struct list_head *ln; - if (!bus) - return; - for (ln = bus->devices.next; ln != &bus->devices; ln = ln->next) { - struct pci_dev *pdev = pci_dev_b(ln); - if (pdev) - rpaphp_eeh_remove_bus_device(pdev); - } - - } - return; -} - int rpaphp_unconfig_pci_adapter(struct pci_bus *bus) { struct pci_dev *dev, *tmp; list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) { - rpaphp_eeh_remove_bus_device(dev); + eeh_remove_bus_device(dev); pci_remove_bus_device(dev); } return 0; From linas at linas.org Fri Nov 4 11:51:46 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:51:46 -0600 Subject: 
[PATCH 21/42]: PCI: cleanup/simplify ppc64 PCI hotplug code References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005146.GA27008@mail.gnucash.org> 21-rpaphp-eeh-cleanup.patch This patch cleans up some rpa dlpar code. Basically, rpaphp_config_pci_adapter() was a wrapper routine that made two calls and wrapped a bunch of verbose no-op code around them. It has been consolidated with the routine it called. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpaphp_pci.c 2005-11-02 14:36:41.271315241 -0600 +++ linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c 2005-11-02 14:36:48.081360405 -0600 @@ -221,18 +221,21 @@ rpaphp_pci_config_slot() will configure all devices under the given slot->dn and return the the first pci_dev. *****************************************************************************/ -static struct pci_dev * -rpaphp_pci_config_slot(struct pci_bus *bus) +int +rpaphp_config_pci_adapter(struct pci_bus *bus) { struct device_node *dn = pci_bus_to_OF_node(bus); struct pci_dev *dev = NULL; + int rc = -ENODEV; int slotno; int num; dbg("Enter %s: dn=%s bus=%s\n", __FUNCTION__, dn->full_name, bus->name); if (!dn || !dn->child) - return NULL; + goto exit; + eeh_add_device_tree_early(dn); + slotno = PCI_SLOT(PCI_DN(dn->child)->devfn); /* pci_scan_slot should find all children */ @@ -243,15 +246,23 @@ } if (list_empty(&bus->devices)) { err("%s: No new device found\n", __FUNCTION__); - return NULL; + goto exit; } list_for_each_entry(dev, &bus->devices, bus_list) { if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) rpaphp_pci_config_bridge(dev); } - return dev; + dbg("%s: pci_devs of slot[%s]\n", __FUNCTION__, dn->full_name); + list_for_each_entry (dev, &bus->devices, bus_list) + dbg("\t%s\n", pci_name(dev)); + + rc = 0; +exit: + dbg("Exit %s: rc=%d\n", __FUNCTION__, rc); + return rc; }
+EXPORT_SYMBOL_GPL(rpaphp_config_pci_adapter); static void print_slot_pci_funcs(struct pci_bus *bus) { @@ -268,30 +279,6 @@ return; } -int rpaphp_config_pci_adapter(struct pci_bus *bus) -{ - struct device_node *dn = pci_bus_to_OF_node(bus); - struct pci_dev *dev; - int rc = -ENODEV; - - dbg("Entry %s: slot[%s]\n", __FUNCTION__, dn->full_name); - if (!dn) - goto exit; - - eeh_add_device_tree_early(dn); - dev = rpaphp_pci_config_slot(bus); - if (!dev) { - err("%s: can't find any devices.\n", __FUNCTION__); - goto exit; - } - print_slot_pci_funcs(bus); - rc = 0; -exit: - dbg("Exit %s: rc=%d\n", __FUNCTION__, rc); - return rc; -} -EXPORT_SYMBOL_GPL(rpaphp_config_pci_adapter); - int rpaphp_unconfig_pci_adapter(struct pci_bus *bus) { struct pci_dev *dev, *tmp; From linas at linas.org Fri Nov 4 11:52:01 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:52:01 -0600 Subject: [PATCH 22/42]: PCI: remove duplicted pci hotplug code References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005201.GA27016@mail.gnucash.org> 22-rpaphp-eliminate-dupe-code.patch The RPAPHP code contains two routines that appear to be gratuitous copies of very similar pci code. In particular, rpaphp_claim_resource ~~ pci_claim_resource rpadlpar_claim_one_bus == pcibios_claim_one_bus This patch removes the rpaphp versions of the code. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpaphp_pci.c 2005-11-02 14:36:48.081360405 -0600 +++ linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c 2005-11-02 14:36:51.785840999 -0600 @@ -62,28 +62,6 @@ } EXPORT_SYMBOL_GPL(rpaphp_find_pci_bus); -int rpaphp_claim_resource(struct pci_dev *dev, int resource) -{ - struct resource *res = &dev->resource[resource]; - struct resource *root = pci_find_parent_resource(dev, res); - char *dtype = resource < PCI_BRIDGE_RESOURCES ?
"device" : "bridge"; - int err = -EINVAL; - - if (root != NULL) { - err = request_resource(root, res); - } - - if (err) { - err("PCI: %s region %d of %s %s [%lx:%lx]\n", - root ? "Address space collision on" : - "No parent found for", - resource, dtype, pci_name(dev), res->start, res->end); - } - return err; -} - -EXPORT_SYMBOL_GPL(rpaphp_claim_resource); - static int rpaphp_get_sensor_state(struct slot *slot, int *state) { int rc; @@ -178,7 +156,7 @@ if (r->parent || !r->start || !r->flags) continue; - rpaphp_claim_resource(dev, i); + pci_claim_resource(dev, i); } } } Index: linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpadlpar_core.c 2005-11-02 14:35:52.800111285 -0600 +++ linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c 2005-11-02 14:36:51.793839877 -0600 @@ -112,28 +112,6 @@ return NULL; } -static void rpadlpar_claim_one_bus(struct pci_bus *b) -{ - struct list_head *ld; - struct pci_bus *child_bus; - - for (ld = b->devices.next; ld != &b->devices; ld = ld->next) { - struct pci_dev *dev = pci_dev_b(ld); - int i; - - for (i = 0; i < PCI_NUM_RESOURCES; i++) { - struct resource *r = &dev->resource[i]; - - if (r->parent || !r->start || !r->flags) - continue; - rpaphp_claim_resource(dev, i); - } - } - - list_for_each_entry(child_bus, &b->children, node) - rpadlpar_claim_one_bus(child_bus); -} - static int pci_add_secondary_bus(struct device_node *dn, struct pci_dev *bridge_dev) { @@ -158,7 +136,7 @@ pcibios_fixup_bus(child); /* Claim new bus resources */ - rpadlpar_claim_one_bus(bridge_dev->bus); + pcibios_claim_one_bus(bridge_dev->bus); if (hose->last_busno < child->number) hose->last_busno = child->number; Index: linux-2.6.14-git3/arch/ppc64/kernel/pci.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/pci.c 2005-11-02 14:28:57.119385510 -0600 +++ 
linux-2.6.14-git3/arch/ppc64/kernel/pci.c 2005-11-02 14:36:51.808837774 -0600 @@ -197,7 +197,7 @@ spin_unlock(&hose_spinlock); } -static void __init pcibios_claim_one_bus(struct pci_bus *b) +void __devinit pcibios_claim_one_bus(struct pci_bus *b) { struct pci_dev *dev; struct pci_bus *child_bus; Index: linux-2.6.14-git3/include/asm-ppc64/pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-ppc64/pci.h 2005-11-02 14:28:57.119385510 -0600 +++ linux-2.6.14-git3/include/asm-ppc64/pci.h 2005-11-02 14:36:51.813837073 -0600 @@ -160,6 +160,8 @@ extern void pcibios_fixup_device_resources(struct pci_dev *dev, struct pci_bus *bus); +extern void pcibios_claim_one_bus(struct pci_bus *b); + extern struct pci_controller *init_phb_dynamic(struct device_node *dn); extern int pci_read_irq_line(struct pci_dev *dev); From linas at linas.org Fri Nov 4 11:52:16 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:52:16 -0600 Subject: [PATCH 23/42]: ppc64: migrate common PCI hotplug code References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005216.GA27025@mail.gnucash.org> 23-rpaphp-migrate.patch This patch moves some pci device add & remove code from the PCI hotplug directory to the arch/ppc64/kernel directory, and cleans it up a tad. The primary reason for this is that the code performs some fairly generic operations that are shared with the PCI error recovery code (living in the arch/ppc64/kernel directory). Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/pci_dlpar.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/pci_dlpar.c 2005-11-02 14:39:24.724396565 -0600 @@ -0,0 +1,174 @@ +/* + * PCI Dynamic LPAR, PCI Hot Plug and PCI EEH recovery code + * for RPA-compliant PPC64 platform. 
+ * Copyright (C) 2003 Linda Xie + * Copyright (C) 2005 International Business Machines + * + * Updates, 2005, John Rose + * Updates, 2005, Linas Vepstas + * + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or (at + * your option) any later version. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or + * NON INFRINGEMENT. See the GNU General Public License for more + * details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include +#include + +static struct pci_bus * +find_bus_among_children(struct pci_bus *bus, + struct device_node *dn) +{ + struct pci_bus *child = NULL; + struct list_head *tmp; + struct device_node *busdn; + + busdn = pci_bus_to_OF_node(bus); + if (busdn == dn) + return bus; + + list_for_each(tmp, &bus->children) { + child = find_bus_among_children(pci_bus_b(tmp), dn); + if (child) + break; + }; + return child; +} + +struct pci_bus * +pcibios_find_pci_bus(struct device_node *dn) +{ + struct pci_dn *pdn = dn->data; + + if (!pdn || !pdn->phb || !pdn->phb->bus) + return NULL; + + return find_bus_among_children(pdn->phb->bus, dn); +} + +/** + * pcibios_remove_pci_devices - remove all devices under this bus + * + * Remove all of the PCI devices under this bus both from the + * linux pci device tree, and from the ppc64 EEH address cache. 
+ */ +void +pcibios_remove_pci_devices(struct pci_bus *bus) +{ + struct pci_dev *dev, *tmp; + + list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) { + eeh_remove_bus_device(dev); + pci_remove_bus_device(dev); + } +} + +/* Must be called before pci_bus_add_devices */ +static void +pcibios_fixup_new_pci_devices(struct pci_bus *bus, int fix_bus) +{ + struct pci_dev *dev; + + list_for_each_entry(dev, &bus->devices, bus_list) { + /* + * Skip already-present devices (which are on the + * global device list.) + */ + if (list_empty(&dev->global_list)) { + int i; + + /* Need to setup IOMMU tables */ + ppc_md.iommu_dev_setup(dev); + + if(fix_bus) + pcibios_fixup_device_resources(dev, bus); + pci_read_irq_line(dev); + for (i = 0; i < PCI_NUM_RESOURCES; i++) { + struct resource *r = &dev->resource[i]; + + if (r->parent || !r->start || !r->flags) + continue; + pci_claim_resource(dev, i); + } + } + } +} + +static int +pcibios_pci_config_bridge(struct pci_dev *dev) +{ + u8 sec_busno; + struct pci_bus *child_bus; + struct pci_dev *child_dev; + + /* Get busno of downstream bus */ + pci_read_config_byte(dev, PCI_SECONDARY_BUS, &sec_busno); + + /* Add to children of PCI bridge dev->bus */ + child_bus = pci_add_new_bus(dev->bus, dev, sec_busno); + if (!child_bus) { + printk (KERN_ERR "%s: could not add second bus\n", __FUNCTION__); + return -EIO; + } + sprintf(child_bus->name, "PCI Bus #%02x", child_bus->number); + + pci_scan_child_bus(child_bus); + + list_for_each_entry(child_dev, &child_bus->devices, bus_list) { + eeh_add_device_late(child_dev); + } + + /* Fixup new pci devices without touching bus struct */ + pcibios_fixup_new_pci_devices(child_bus, 0); + + /* Make the discovered devices available */ + pci_bus_add_devices(child_bus); + return 0; +} + +/** + * pcibios_add_pci_devices - adds new pci devices to bus + * + * This routine will find and fixup new pci devices under + * the indicated bus. 
This routine presumes that there + * might already be some devices under this bridge, so + * it carefully tries to add only new devices. (And that + * is how this routine differs from other, similar pcibios + * routines.) + */ +void +pcibios_add_pci_devices(struct pci_bus * bus) +{ + int slotno, num; + struct pci_dev *dev; + struct device_node *dn = pci_bus_to_OF_node(bus); + + eeh_add_device_tree_early(dn); + + /* pci_scan_slot should find all children */ + slotno = PCI_SLOT(PCI_DN(dn->child)->devfn); + num = pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); + if (num) { + pcibios_fixup_new_pci_devices(bus, 1); + pci_bus_add_devices(bus); + } + + list_for_each_entry(dev, &bus->devices, bus_list) { + eeh_add_device_late (dev); + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) + pcibios_pci_config_bridge(dev); + } +} Index: linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpaphp_pci.c 2005-11-02 14:36:51.785840999 -0600 +++ linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c 2005-11-02 14:39:24.730395724 -0600 @@ -32,36 +32,6 @@ #include "../pci.h" /* for pci_add_new_bus */ #include "rpaphp.h" -static struct pci_bus *find_bus_among_children(struct pci_bus *bus, - struct device_node *dn) -{ - struct pci_bus *child = NULL; - struct list_head *tmp; - struct device_node *busdn; - - busdn = pci_bus_to_OF_node(bus); - if (busdn == dn) - return bus; - - list_for_each(tmp, &bus->children) { - child = find_bus_among_children(pci_bus_b(tmp), dn); - if (child) - break; - } - return child; -} - -struct pci_bus *rpaphp_find_pci_bus(struct device_node *dn) -{ - struct pci_dn *pdn = dn->data; - - if (!pdn || !pdn->phb || !pdn->phb->bus) - return NULL; - - return find_bus_among_children(pdn->phb->bus, dn); -} -EXPORT_SYMBOL_GPL(rpaphp_find_pci_bus); - static int rpaphp_get_sensor_state(struct slot *slot, int *state) { int rc; @@ -120,7 +90,7 @@ /* config/unconfig adapter */ 
*value = slot->state; } else { - bus = rpaphp_find_pci_bus(slot->dn); + bus = pcibios_find_pci_bus(slot->dn); if (bus && !list_empty(&bus->devices)) *value = CONFIGURED; else @@ -131,117 +101,6 @@ return rc; } -/* Must be called before pci_bus_add_devices */ -static void -rpaphp_fixup_new_pci_devices(struct pci_bus *bus, int fix_bus) -{ - struct pci_dev *dev; - - list_for_each_entry(dev, &bus->devices, bus_list) { - /* - * Skip already-present devices (which are on the - * global device list.) - */ - if (list_empty(&dev->global_list)) { - int i; - - /* Need to setup IOMMU tables */ - ppc_md.iommu_dev_setup(dev); - - if(fix_bus) - pcibios_fixup_device_resources(dev, bus); - pci_read_irq_line(dev); - for (i = 0; i < PCI_NUM_RESOURCES; i++) { - struct resource *r = &dev->resource[i]; - - if (r->parent || !r->start || !r->flags) - continue; - pci_claim_resource(dev, i); - } - } - } -} - -static int rpaphp_pci_config_bridge(struct pci_dev *dev) -{ - u8 sec_busno; - struct pci_bus *child_bus; - struct pci_dev *child_dev; - - dbg("Enter %s: BRIDGE dev=%s\n", __FUNCTION__, pci_name(dev)); - - /* get busno of downstream bus */ - pci_read_config_byte(dev, PCI_SECONDARY_BUS, &sec_busno); - - /* add to children of PCI bridge dev->bus */ - child_bus = pci_add_new_bus(dev->bus, dev, sec_busno); - if (!child_bus) { - err("%s: could not add second bus\n", __FUNCTION__); - return -EIO; - } - sprintf(child_bus->name, "PCI Bus #%02x", child_bus->number); - /* do pci_scan_child_bus */ - pci_scan_child_bus(child_bus); - - list_for_each_entry(child_dev, &child_bus->devices, bus_list) { - eeh_add_device_late(child_dev); - } - - /* fixup new pci devices without touching bus struct */ - rpaphp_fixup_new_pci_devices(child_bus, 0); - - /* Make the discovered devices available */ - pci_bus_add_devices(child_bus); - return 0; -} - -/***************************************************************************** - rpaphp_pci_config_slot() will configure all devices under the - given slot->dn and 
return the the first pci_dev. - *****************************************************************************/ -int -rpaphp_config_pci_adapter(struct pci_bus *bus) -{ - struct device_node *dn = pci_bus_to_OF_node(bus); - struct pci_dev *dev = NULL; - int rc = -ENODEV; - int slotno; - int num; - - dbg("Enter %s: dn=%s bus=%s\n", __FUNCTION__, dn->full_name, bus->name); - if (!dn || !dn->child) - goto exit; - - eeh_add_device_tree_early(dn); - - slotno = PCI_SLOT(PCI_DN(dn->child)->devfn); - - /* pci_scan_slot should find all children */ - num = pci_scan_slot(bus, PCI_DEVFN(slotno, 0)); - if (num) { - rpaphp_fixup_new_pci_devices(bus, 1); - pci_bus_add_devices(bus); - } - if (list_empty(&bus->devices)) { - err("%s: No new device found\n", __FUNCTION__); - goto exit; - } - list_for_each_entry(dev, &bus->devices, bus_list) { - if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) - rpaphp_pci_config_bridge(dev); - } - - dbg("%s: pci_devs of slot[%s]\n", __FUNCTION__, dn->full_name); - list_for_each_entry (dev, &bus->devices, bus_list) - dbg("\t%s\n", pci_name(dev)); - - rc = 0; -exit: - dbg("Exit %s: rc=%d\n", __FUNCTION__, rc); - return rc; -} -EXPORT_SYMBOL_GPL(rpaphp_config_pci_adapter); - static void print_slot_pci_funcs(struct pci_bus *bus) { struct device_node *dn; @@ -257,17 +116,6 @@ return; } -int rpaphp_unconfig_pci_adapter(struct pci_bus *bus) -{ - struct pci_dev *dev, *tmp; - - list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) { - eeh_remove_bus_device(dev); - pci_remove_bus_device(dev); - } - return 0; -} - static int setup_pci_hotplug_slot_info(struct slot *slot) { dbg("%s Initilize the PCI slot's hotplug->info structure ...\n", @@ -303,7 +151,7 @@ struct pci_bus *bus; BUG_ON(!dn); - bus = rpaphp_find_pci_bus(dn); + bus = pcibios_find_pci_bus(dn); if (!bus) { err("%s: no pci_bus for dn %s\n", __FUNCTION__, dn->full_name); goto exit_rc; @@ -328,10 +176,7 @@ if (slot->hotplug_slot->info->adapter_status == NOT_CONFIGURED) { dbg("%s CONFIGURING pci adapter 
in slot[%s]\n", __FUNCTION__, slot->name); - if (rpaphp_config_pci_adapter(slot->bus)) { - err("%s: CONFIG pci adapter failed\n", __FUNCTION__); - goto exit_rc; - } + pcibios_add_pci_devices(slot->bus); } else if (slot->hotplug_slot->info->adapter_status != CONFIGURED) { err("%s: slot[%s]'s adapter_status is NOT_VALID.\n", @@ -377,16 +222,10 @@ /* if slot is not empty, enable the adapter */ if (state == PRESENT) { dbg("%s : slot[%s] is occupied.\n", __FUNCTION__, slot->name); - retval = rpaphp_config_pci_adapter(slot->bus); - if (!retval) { - slot->state = CONFIGURED; - dbg("%s: PCI devices in slot[%s] has been configured\n", + pcibios_add_pci_devices(slot->bus); + slot->state = CONFIGURED; + dbg("%s: PCI devices in slot[%s] has been configured\n", __FUNCTION__, slot->name); - } else { - slot->state = NOT_CONFIGURED; - dbg("%s: no pci_dev struct for adapter in slot[%s]\n", - __FUNCTION__, slot->name); - } } else if (state == EMPTY) { dbg("%s : slot[%s] is empty\n", __FUNCTION__, slot->name); slot->state = EMPTY; Index: linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpadlpar_core.c 2005-11-02 14:36:51.793839877 -0600 +++ linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c 2005-11-02 14:39:24.737394743 -0600 @@ -197,9 +197,8 @@ static int dlpar_add_pci_slot(char *drc_name, struct device_node *dn) { struct pci_dev *dev; - int rc; - if (rpaphp_find_pci_bus(dn)) + if (pcibios_find_pci_bus(dn)) return -EINVAL; /* Add pci bus */ @@ -211,12 +210,7 @@ } if (dn->child) { - rc = rpaphp_config_pci_adapter(dev->subordinate); - if (rc < 0) { - printk(KERN_ERR "%s: unable to enable slot %s\n", - __FUNCTION__, drc_name); - return -EIO; - } + pcibios_add_pci_devices(dev->subordinate); } /* Add hotplug slot */ @@ -255,7 +249,7 @@ struct pci_dn *pdn; int rc = 0; - if (!rpaphp_find_pci_bus(dn)) + if (!pcibios_find_pci_bus(dn)) return -EINVAL; slot = 
find_slot(dn); @@ -400,7 +394,7 @@ struct pci_bus *bus; struct slot *slot; - bus = rpaphp_find_pci_bus(dn); + bus = pcibios_find_pci_bus(dn); if (!bus) return -EINVAL; Index: linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_core.c =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpaphp_core.c 2005-11-02 14:28:55.984544585 -0600 +++ linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_core.c 2005-11-02 14:39:24.744393761 -0600 @@ -426,7 +426,8 @@ dbg("DISABLING SLOT %s\n", slot->name); down(&rpaphp_sem); - retval = rpaphp_unconfig_pci_adapter(slot->bus); + pcibios_remove_pci_devices(slot->bus); + retval = 0; up(&rpaphp_sem); slot->state = NOT_CONFIGURED; info("%s: devices in slot[%s] unconfigured.\n", __FUNCTION__, Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:32:55.306995693 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:40:05.531674439 -0600 @@ -1,6 +1,6 @@ obj-y := pci.o lpar.o hvCall.o nvram.o reconfig.o \ - setup.o iommu.o rtas-fw.o ras.o + setup.o iommu.o rtas-fw.o ras.o pci_dlpar.o obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_XICS) += xics.o -obj-$(CONFIG_EEH) += eeh.o eeh_event.o +obj-$(CONFIG_EEH) += eeh.o eeh_event.o Index: linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-ppc64/pci-bridge.h 2005-11-02 14:28:55.984544585 -0600 +++ linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h 2005-11-02 14:39:24.755392219 -0600 @@ -121,9 +121,18 @@ return bus->sysdata; /* Must be root bus (PHB) */ } +/** Find the bus corresponding to the indicated device node */ +struct pci_bus * pcibios_find_pci_bus(struct device_node *dn); + extern void pci_process_bridge_OF_ranges(struct pci_controller 
*hose, struct device_node *dev, int primary); +/** Remove all of the PCI devices under this bus */ +void pcibios_remove_pci_devices(struct pci_bus *bus); + +/** Discover new pci devices under this bus, and add them */ +void pcibios_add_pci_devices(struct pci_bus * bus); + extern int pcibios_remove_root_bus(struct pci_controller *phb); extern void phbs_remap_io(void); From linas at linas.org Fri Nov 4 11:52:49 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:52:49 -0600 Subject: [PATCH 24/42]: ppc64: PCI Error Recovery: PPC64 core recovery routines References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005249.GA27034@mail.gnucash.org> Various PCI bus errors can be signaled by newer PCI controllers. The core error recovery routines are architecture dependent. This patch adds a recovery infrastructure for the PPC64 pSeries systems. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:36:41.255317484 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:41:18.427452474 -0600 @@ -485,6 +485,11 @@ if (PCI_DN(dn)) { PCI_DN(dn)->eeh_mode |= mode_flag; + /* Mark the pci device driver too */ + struct pci_dev *dev = PCI_DN(dn)->pcidev; + if (dev && dev->driver) + dev->error_state = pci_channel_io_frozen; + if (dn->child) __eeh_mark_slot (dn->child, mode_flag); } @@ -544,6 +549,7 @@ int rets[3]; unsigned long flags; struct pci_dn *pdn; + enum pci_channel_state state; int rc = 0; __get_cpu_var(total_mmio_ffs)++; @@ -648,8 +654,13 @@ eeh_mark_slot (dn, EEH_MODE_ISOLATED); spin_unlock_irqrestore(&confirm_error_lock, flags); - eeh_send_failure_event (dn, dev, rets[0], rets[2]); - + state = pci_channel_io_normal; + if ((rets[0] == 2) || (rets[0] == 4)) + state = pci_channel_io_frozen; + if (rets[0] == 5) + state = 
pci_channel_io_perm_failure; + eeh_send_failure_event (dn, dev, state, rets[2]); + /* Most EEH events are due to device driver bugs. Having * a stack trace will help the device-driver authors figure * out what happened. So print that out. */ @@ -953,8 +964,10 @@ * But there are a few cases like display devices that make sense. */ enable = 1; /* i.e. we will do checking */ +#if 0 if ((*class_code >> 16) == PCI_BASE_CLASS_DISPLAY) enable = 0; +#endif if (!enable) pdn->eeh_mode |= EEH_MODE_NOCHECK; Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c 2005-11-02 14:41:18.435451353 -0600 @@ -0,0 +1,366 @@ +/* + * PCI Error Recovery Driver for RPA-compliant PPC64 platform. + * Copyright (C) 2004, 2005 Linas Vepstas + * + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or (at + * your option) any later version. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or + * NON INFRINGEMENT. See the GNU General Public License for more + * details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
+ * + * Send feedback to + * + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +static inline const char * pcid_name (struct pci_dev *pdev) +{ + if (pdev->dev.driver) + return pdev->dev.driver->name; + return ""; +} + +/** + * Return the "partitionable endpoint" (pe) under which this device lies + */ +static struct device_node * find_device_pe(struct device_node *dn) +{ + while ((dn->parent) && PCI_DN(dn->parent) && + (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { + dn = dn->parent; + } + return dn; +} + + +#ifdef DEBUG +static void print_device_node_tree (struct pci_dn *pdn, int dent) +{ + int i; + if (!pdn) return; + for (i=0;inode->name, pdn->eeh_mode, pdn->eeh_config_addr, + pdn->eeh_pe_config_addr, pdn->node->full_name); + dent += 3; + struct device_node *pc = pdn->node->child; + while (pc) { + print_device_node_tree(PCI_DN(pc), dent); + pc = pc->sibling; + } +} +#endif + +/** + * irq_in_use - return true if this irq is being used + */ +static int irq_in_use(unsigned int irq) +{ + int rc = 0; + unsigned long flags; + struct irq_desc *desc = irq_desc + irq; + + spin_lock_irqsave(&desc->lock, flags); + if (desc->action) + rc = 1; + spin_unlock_irqrestore(&desc->lock, flags); + return rc; +} + +/* ------------------------------------------------------- */ +/** eeh_report_error - report an EEH error to each device, + * collect up and merge the device responses. 
+ */ + +static void eeh_report_error(struct pci_dev *dev, void *userdata) +{ + enum pcierr_result rc, *res = userdata; + struct pci_driver *driver = dev->driver; + + dev->error_state = pci_channel_io_frozen; + + if (!driver) + return; + + if (irq_in_use (dev->irq)) { + struct device_node *dn = pci_device_to_OF_node(dev); + PCI_DN(dn)->eeh_mode |= EEH_MODE_IRQ_DISABLED; + disable_irq_nosync(dev->irq); + } + if (!driver->err_handler) + return; + if (!driver->err_handler->error_detected) + return; + + rc = driver->err_handler->error_detected (dev, pci_channel_io_frozen); + if (*res == PCIERR_RESULT_NONE) *res = rc; + if (*res == PCIERR_RESULT_NEED_RESET) return; + if (*res == PCIERR_RESULT_DISCONNECT && + rc == PCIERR_RESULT_NEED_RESET) *res = rc; +} + +/** eeh_report_reset -- tell this device that the pci slot + * has been reset. + */ + +static void eeh_report_reset(struct pci_dev *dev, void *userdata) +{ + struct pci_driver *driver = dev->driver; + struct device_node *dn = pci_device_to_OF_node(dev); + + if (!driver) + return; + + if ((PCI_DN(dn)->eeh_mode) & EEH_MODE_IRQ_DISABLED) { + PCI_DN(dn)->eeh_mode &= ~EEH_MODE_IRQ_DISABLED; + enable_irq(dev->irq); + } + if (!driver->err_handler) + return; + if (!driver->err_handler->slot_reset) + return; + + driver->err_handler->slot_reset(dev); +} + +static void eeh_report_resume(struct pci_dev *dev, void *userdata) +{ + struct pci_driver *driver = dev->driver; + + dev->error_state = pci_channel_io_normal; + + if (!driver) + return; + if (!driver->err_handler) + return; + if (!driver->err_handler->resume) + return; + + driver->err_handler->resume(dev); +} + +static void eeh_report_failure(struct pci_dev *dev, void *userdata) +{ + struct pci_driver *driver = dev->driver; + + dev->error_state = pci_channel_io_perm_failure; + + if (!driver) + return; + + if (irq_in_use (dev->irq)) { + struct device_node *dn = pci_device_to_OF_node(dev); + PCI_DN(dn)->eeh_mode |= EEH_MODE_IRQ_DISABLED; + disable_irq_nosync(dev->irq); + } + if 
(!driver->err_handler) + return; + if (!driver->err_handler->error_detected) + return; + driver->err_handler->error_detected(dev, pci_channel_io_perm_failure); +} + +/* ------------------------------------------------------- */ +/** + * handle_eeh_events -- reset a PCI device after hard lockup. + * + * pSeries systems will isolate a PCI slot if the PCI-Host + * bridge detects address or data parity errors, DMA's + * occuring to wild addresses (which usually happen due to + * bugs in device drivers or in PCI adapter firmware). + * Slot isolations also occur if #SERR, #PERR or other misc + * PCI-related errors are detected. + * + * Recovery process consists of unplugging the device driver + * (which generated hotplug events to userspace), then issuing + * a PCI #RST to the device, then reconfiguring the PCI config + * space for all bridges & devices under this slot, and then + * finally restarting the device drivers (which cause a second + * set of hotplug events to go out to userspace). + */ + +/** + * eeh_reset_device() -- perform actual reset of a pci slot + * Args: bus: pointer to the pci bus structure corresponding + * to the isolated slot. A non-null value will + * cause all devices under the bus to be removed + * and then re-added. + * pe_dn: pointer to a "Partionable Endpoint" device node. + * This is the top-level structure on which pci + * bus resets can be performed. + */ + +static void eeh_reset_device (struct pci_dn *pe_dn, struct pci_bus *bus) +{ + if (bus) + pcibios_remove_pci_devices(bus); + + /* Reset the pci controller. (Asserts RST#; resets config space). + * Reconfigure bridges and devices */ + rtas_set_slot_reset(pe_dn); + + /* Walk over all functions on this device */ + rtas_configure_bridge(pe_dn); + eeh_restore_bars(pe_dn); + + /* Give the system 5 seconds to finish running the user-space + * hotplug shutdown scripts, e.g. ifdown for ethernet. 
Yes, + * this is a hack, but if we don't do this, and try to bring + * the device up before the scripts have taken it down, + * potentially weird things happen. + */ + if (bus) { + ssleep (5); + pcibios_add_pci_devices(bus); + } +} + +/* The longest amount of time to wait for a pci device + * to come back on line, in seconds. + */ +#define MAX_WAIT_FOR_RECOVERY 15 + +void handle_eeh_events (struct eeh_event *event) +{ + struct device_node *frozen_dn; + struct pci_dn *frozen_pdn; + struct pci_bus *frozen_bus; + int perm_failure = 0; + + frozen_dn = find_device_pe(event->dn); + frozen_bus = pcibios_find_pci_bus(frozen_dn); + + if (!frozen_dn) { + printk(KERN_ERR "EEH: Error: Cannot find partition endpoint for %s\n", + pci_name(event->dev)); + return; + } + + /* There are two different styles for coming up with the PE. + * In the old style, it was the highest EEH-capable device + * which was always an EADS pci bridge. In the new style, + * there might not be any EADS bridges, and even when there are, + * the firmware marks them as "EEH incapable". So another + * two-step is needed to find the pci bus.. */ + if (!frozen_bus) + frozen_bus = pcibios_find_pci_bus (frozen_dn->parent); + + if (!frozen_bus) { + printk(KERN_ERR "EEH: Cannot find PCI bus for %s\n", + frozen_dn->full_name); + return; + } + +#if 0 + /* We may get "permanent failure" messages on empty slots. + * These are false alarms. Empty slots have no child dn. */ + if ((event->state == pci_channel_io_perm_failure) && (frozen_device == NULL)) + return; +#endif + + frozen_pdn = PCI_DN(frozen_dn); + frozen_pdn->eeh_freeze_count++; + + if (frozen_pdn->eeh_freeze_count > EEH_MAX_ALLOWED_FREEZES) + perm_failure = 1; + + /* If the reset state is a '5' and the time to reset is 0 (infinity) + * or is more then 15 seconds, then mark this as a permanent failure. 
+ */ + if ((event->state == pci_channel_io_perm_failure) && + ((event->time_unavail <= 0) || + (event->time_unavail > MAX_WAIT_FOR_RECOVERY*1000))) + { + perm_failure = 1; + } + + /* Log the error with the rtas logger. */ + if (perm_failure) { + /* + * About 90% of all real-life EEH failures in the field + * are due to poorly seated PCI cards. Only 10% or so are + * due to actual, failed cards. + */ + printk(KERN_ERR + "EEH: PCI device %s - %s has failed %d times \n" + "and has been permanently disabled. Please try reseating\n" + "this device or replacing it.\n", + pci_name (frozen_pdn->pcidev), + pcid_name(frozen_pdn->pcidev), + frozen_pdn->eeh_freeze_count); + + eeh_slot_error_detail(frozen_pdn, 2 /* Permanent Error */); + + /* Notify all devices that they're about to go down. */ + pci_walk_bus(frozen_bus, eeh_report_failure, 0); + + /* Shut down the device drivers for good. */ + pcibios_remove_pci_devices(frozen_bus); + return; + } + + eeh_slot_error_detail(frozen_pdn, 1 /* Temporary Error */); + printk(KERN_WARNING + "EEH: This PCI device has failed %d times since last reboot: %s - %s\n", + frozen_pdn->eeh_freeze_count, + pci_name (frozen_pdn->pcidev), + pcid_name(frozen_pdn->pcidev)); + + /* Walk the various device drivers attached to this slot through + * a reset sequence, giving each an opportunity to do what it needs + * to accomplish the reset. Each child gets a report of the + * status ... if any child can't handle the reset, then the entire + * slot is dlpar removed and added. + */ + enum pcierr_result result = PCIERR_RESULT_NONE; + pci_walk_bus(frozen_bus, eeh_report_error, &result); + + /* If all device drivers were EEH-unaware, then shut + * down all of the device drivers, and hope they + * go down willingly, without panicing the system. 
+ */ + if (result == PCIERR_RESULT_NONE) { + eeh_reset_device(frozen_pdn, frozen_bus); + } + + /* If any device called out for a reset, then reset the slot */ + if (result == PCIERR_RESULT_NEED_RESET) { + eeh_reset_device(frozen_pdn, NULL); + pci_walk_bus(frozen_bus, eeh_report_reset, 0); + } + + /* If all devices reported they can proceed, the re-enable PIO */ + if (result == PCIERR_RESULT_CAN_RECOVER) { + /* XXX Not supported; we brute-force reset the device */ + eeh_reset_device(frozen_pdn, NULL); + pci_walk_bus(frozen_bus, eeh_report_reset, 0); + } + + /* Tell all device drivers that they can resume operations */ + pci_walk_bus(frozen_bus, eeh_report_resume, 0); +} + +/* ---------- end of file ---------- */ Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_event.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_event.c 2005-11-02 14:32:35.731739983 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_event.c 2005-11-02 14:41:18.440450652 -0600 @@ -21,6 +21,7 @@ #include #include #include +#include /** Overview: * EEH error states may be detected within exception handlers; @@ -37,31 +38,6 @@ DECLARE_WORK(eeh_event_wq, eeh_thread_launcher, NULL); /** - * eeh_panic - call panic() for an eeh event that cannot be handled. - * The philosophy of this routine is that it is better to panic and - * halt the OS than it is to risk possible data corruption by - * oblivious device drivers that don't know better. - * - * @dev pci device that had an eeh event - * @reset_state current reset state of the device slot - */ -static void eeh_panic(struct pci_dev *dev, int reset_state) -{ - /* - * Since the panic_on_oops sysctl is used to halt the system - * in light of potential corruption, we can use it here. 
- */ - if (panic_on_oops) { - panic("EEH: MMIO failure (%d) on device:%s\n", reset_state, - pci_name(dev)); - } - else { - printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n", - reset_state, pci_name(dev)); - } -} - -/** * eeh_event_handler - dispatch EEH events. The detection of a frozen * slot can occur inside an interrupt, where it can be hard to do * anything about it. The goal of this routine is to pull these @@ -82,10 +58,16 @@ spin_lock_irqsave(&eeh_eventlist_lock, flags); event = NULL; + + /* Unqueue the event, get ready to process. */ if (!list_empty(&eeh_eventlist)) { event = list_entry(eeh_eventlist.next, struct eeh_event, list); list_del(&event->list); } + + if (event) + eeh_mark_slot(event->dn, EEH_MODE_RECOVERING); + spin_unlock_irqrestore(&eeh_eventlist_lock, flags); if (event == NULL) break; @@ -93,8 +75,11 @@ printk(KERN_INFO "EEH: Detected PCI bus error on device %s\n", pci_name(event->dev)); - eeh_panic (event->dev, event->state); + handle_eeh_events(event); + + eeh_clear_slot(event->dn, EEH_MODE_RECOVERING); + pci_dev_put(event->dev); kfree(event); } @@ -122,7 +107,7 @@ */ int eeh_send_failure_event (struct device_node *dn, struct pci_dev *dev, - int state, + enum pci_channel_state state, int time_unavail) { unsigned long flags; Index: linux-2.6.14-git3/include/asm-powerpc/eeh_event.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/eeh_event.h 2005-11-02 14:32:35.718741805 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/eeh_event.h 2005-11-02 14:41:18.444450091 -0600 @@ -29,7 +29,7 @@ struct list_head list; struct device_node *dn; /* struct device node */ struct pci_dev *dev; /* affected device */ - int state; + enum pci_channel_state state; /* PCI bus state for the affected device */ int time_unavail; /* milliseconds until device might be available */ }; @@ -46,7 +46,10 @@ */ int eeh_send_failure_event (struct device_node *dn, struct pci_dev *dev, - int reset_state, 
+ enum pci_channel_state state, int time_unavail); +/* Main recovery function */ +void handle_eeh_events (struct eeh_event *); + #endif /* ASM_PPC64_EEH_EVENT_H */ Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:40:05.531674439 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:41:48.393250352 -0600 @@ -3,4 +3,4 @@ obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_XICS) += xics.o -obj-$(CONFIG_EEH) += eeh.o eeh_event.o +obj-$(CONFIG_EEH) += eeh.o eeh_driver.o eeh_event.o Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-11-02 14:35:39.295004776 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-11-02 14:41:18.454448689 -0600 @@ -54,6 +54,15 @@ /* ---- EEH internal-use-only related routines ---- */ #ifdef CONFIG_EEH /** + * eeh_slot_error_detail -- record and EEH error condition to the log + * @severity: 1 if temporary, 2 if permanent failure. + * + * Obtains the the EEH error details from the RTAS subsystem, + * and then logs these details with the RTAS error log system. + */ +void eeh_slot_error_detail (struct pci_dn *pdn, int severity); + +/** * rtas_set_slot_reset -- unfreeze a frozen slot * * Clear the EEH-frozen condition on a slot. 
This routine Index: linux-2.6.14-git3/include/asm-ppc64/eeh.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-ppc64/eeh.h 2005-11-02 14:36:41.263316362 -0600 +++ linux-2.6.14-git3/include/asm-ppc64/eeh.h 2005-11-02 14:41:18.461447707 -0600 @@ -31,9 +31,11 @@ #ifdef CONFIG_EEH /* Values for eeh_mode bits in device_node */ -#define EEH_MODE_SUPPORTED (1<<0) -#define EEH_MODE_NOCHECK (1<<1) -#define EEH_MODE_ISOLATED (1<<2) +#define EEH_MODE_SUPPORTED (1<<0) +#define EEH_MODE_NOCHECK (1<<1) +#define EEH_MODE_ISOLATED (1<<2) +#define EEH_MODE_RECOVERING (1<<3) +#define EEH_MODE_IRQ_DISABLED (1<<4) /* Max number of EEH freezes allowed before we consider the device * to be permanently disabled. */ From linas at linas.org Fri Nov 4 11:53:07 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:53:07 -0600 Subject: [PATCH 25/42]: ppc64: Split out PCI address cache to its own file References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005307.GA27041@mail.gnucash.org> 25-pci-address-cache.patch The core EEH file is rather large. This patch splits out a self-contained chunk of it into its own file. This is the chunk that performs the caching and lookup of pci devices based on the i/o addresses of their resources. This code is almost architecture-independent and could be used by any system that wants to find a pci device based only on the i/o address used by the device.
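For readers who want the lookup idea in isolation, here is a minimal userspace sketch. It is not the code in this patch: the patch keeps the ranges in a red-black tree under a spinlock, while the sketch assumes a plain sorted array with a binary search, which uses the same three-way comparison against addr_lo and addr_hi. The names addr_range, find_range and example_ranges, and the addresses in the table, are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one cache entry: an inclusive address
 * range owned by a single device. */
struct addr_range {
	unsigned long addr_lo;
	unsigned long addr_hi;
	const char *dev_name;	/* stand-in for a struct pci_dev pointer */
};

/* Find the entry whose range contains addr.  The table must be sorted
 * by addr_lo and its ranges must not overlap.  Returns NULL when no
 * range contains addr. */
static const struct addr_range *
find_range(const struct addr_range *tbl, size_t n, unsigned long addr)
{
	size_t lo = 0, hi = n;	/* search window is [lo, hi) */

	assert(tbl != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < tbl[mid].addr_lo)
			hi = mid;		/* like going to rb_left */
		else if (addr > tbl[mid].addr_hi)
			lo = mid + 1;		/* like going to rb_right */
		else
			return &tbl[mid];	/* addr lies inside this range */
	}
	return NULL;
}

/* Made-up example table, sorted by addr_lo. */
static const struct addr_range example_ranges[] = {
	{ 0x1000, 0x1fff, "0000:00:01.0" },
	{ 0x4000, 0x40ff, "0000:00:02.0" },
	{ 0x8000, 0xffff, "0000:01:00.0" },
};
```

Calling find_range(example_ranges, 3, 0x4080) returns the second entry, and an address that falls in a gap between ranges, such as 0x3000, returns NULL. A sorted array is only reasonable here because the sketch never inserts or removes entries; the tree in the patch presumably exists because devices come and go at hotplug time.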
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:41:48.393250352 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile 2005-11-02 14:42:58.323443756 -0600 @@ -3,4 +3,4 @@ obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_XICS) += xics.o -obj-$(CONFIG_EEH) += eeh.o eeh_driver.o eeh_event.o +obj-$(CONFIG_EEH) += eeh.o eeh_cache.o eeh_driver.o eeh_event.o Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:41:18.427452474 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:42:38.986155538 -0600 @@ -77,9 +77,6 @@ */ #define EEH_MAX_FAILS 100000 -/* Misc forward declaraions */ -static void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn); - /* RTAS tokens */ static int ibm_set_eeh_option; static int ibm_set_slot_reset; @@ -107,296 +104,8 @@ static DEFINE_PER_CPU(unsigned long, ignored_failures); static DEFINE_PER_CPU(unsigned long, slot_resets); -/** - * The pci address cache subsystem. This subsystem places - * PCI device address resources into a red-black tree, sorted - * according to the address range, so that given only an i/o - * address, the corresponding PCI device can be **quickly** - * found. It is safe to perform an address lookup in an interrupt - * context; this ability is an important feature. - * - * Currently, the only customer of this code is the EEH subsystem; - * thus, this code has been somewhat tailored to suit EEH better. - * In particular, the cache does *not* hold the addresses of devices - * for which EEH is not enabled. 
- * - * (Implementation Note: The RB tree seems to be better/faster - * than any hash algo I could think of for this problem, even - * with the penalty of slow pointer chases for d-cache misses). - */ -struct pci_io_addr_range -{ - struct rb_node rb_node; - unsigned long addr_lo; - unsigned long addr_hi; - struct pci_dev *pcidev; - unsigned int flags; -}; - -static struct pci_io_addr_cache -{ - struct rb_root rb_root; - spinlock_t piar_lock; -} pci_io_addr_cache_root; - -static inline struct pci_dev *__pci_get_device_by_addr(unsigned long addr) -{ - struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node; - - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - - if (addr < piar->addr_lo) { - n = n->rb_left; - } else { - if (addr > piar->addr_hi) { - n = n->rb_right; - } else { - pci_dev_get(piar->pcidev); - return piar->pcidev; - } - } - } - - return NULL; -} - -/** - * pci_get_device_by_addr - Get device, given only address - * @addr: mmio (PIO) phys address or i/o port number - * - * Given an mmio phys address, or a port number, find a pci device - * that implements this address. Be sure to pci_dev_put the device - * when finished. I/O port numbers are assumed to be offset - * from zero (that is, they do *not* have pci_io_addr added in). - * It is safe to call this function within an interrupt. - */ -static struct pci_dev *pci_get_device_by_addr(unsigned long addr) -{ - struct pci_dev *dev; - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - dev = __pci_get_device_by_addr(addr); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); - return dev; -} - -#ifdef DEBUG -/* - * Handy-dandy debug print routine, does nothing more - * than print out the contents of our addr cache. 
- */ -static void pci_addr_cache_print(struct pci_io_addr_cache *cache) -{ - struct rb_node *n; - int cnt = 0; - - n = rb_first(&cache->rb_root); - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - printk(KERN_DEBUG "PCI: %s addr range %d [%lx-%lx]: %s\n", - (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt, - piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev)); - cnt++; - n = rb_next(n); - } -} -#endif - -/* Insert address range into the rb tree. */ -static struct pci_io_addr_range * -pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo, - unsigned long ahi, unsigned int flags) -{ - struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node; - struct rb_node *parent = NULL; - struct pci_io_addr_range *piar; - - /* Walk tree, find a place to insert into tree */ - while (*p) { - parent = *p; - piar = rb_entry(parent, struct pci_io_addr_range, rb_node); - if (ahi < piar->addr_lo) { - p = &parent->rb_left; - } else if (alo > piar->addr_hi) { - p = &parent->rb_right; - } else { - if (dev != piar->pcidev || - alo != piar->addr_lo || ahi != piar->addr_hi) { - printk(KERN_WARNING "PIAR: overlapping address range\n"); - } - return piar; - } - } - piar = (struct pci_io_addr_range *)kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC); - if (!piar) - return NULL; - - piar->addr_lo = alo; - piar->addr_hi = ahi; - piar->pcidev = dev; - piar->flags = flags; - -#ifdef DEBUG - printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", - alo, ahi, pci_name (dev)); -#endif - - rb_link_node(&piar->rb_node, parent, p); - rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); - - return piar; -} - -static void __pci_addr_cache_insert_device(struct pci_dev *dev) -{ - struct device_node *dn; - struct pci_dn *pdn; - int i; - int inserted = 0; - - dn = pci_device_to_OF_node(dev); - if (!dn) { - printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev)); - return; - } - - /* Skip any devices for 
which EEH is not enabled. */ - pdn = PCI_DN(dn); - if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || - pdn->eeh_mode & EEH_MODE_NOCHECK) { -#ifdef DEBUG - printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n", - pci_name(dev), pdn->node->full_name); -#endif - return; - } - - /* The cache holds a reference to the device... */ - pci_dev_get(dev); - - /* Walk resources on this device, poke them into the tree */ - for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { - unsigned long start = pci_resource_start(dev,i); - unsigned long end = pci_resource_end(dev,i); - unsigned int flags = pci_resource_flags(dev,i); - - /* We are interested only bus addresses, not dma or other stuff */ - if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM))) - continue; - if (start == 0 || ~start == 0 || end == 0 || ~end == 0) - continue; - pci_addr_cache_insert(dev, start, end, flags); - inserted = 1; - } - - /* If there was nothing to add, the cache has no reference... */ - if (!inserted) - pci_dev_put(dev); -} - -/** - * pci_addr_cache_insert_device - Add a device to the address cache - * @dev: PCI device whose I/O addresses we are interested in. - * - * In order to support the fast lookup of devices based on addresses, - * we maintain a cache of devices that can be quickly searched. - * This routine adds a device to that cache. 
- */ -static void pci_addr_cache_insert_device(struct pci_dev *dev) -{ - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - __pci_addr_cache_insert_device(dev); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); -} - -static inline void __pci_addr_cache_remove_device(struct pci_dev *dev) -{ - struct rb_node *n; - int removed = 0; - -restart: - n = rb_first(&pci_io_addr_cache_root.rb_root); - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - - if (piar->pcidev == dev) { - rb_erase(n, &pci_io_addr_cache_root.rb_root); - removed = 1; - kfree(piar); - goto restart; - } - n = rb_next(n); - } - - /* The cache no longer holds its reference to this device... */ - if (removed) - pci_dev_put(dev); -} - -/** - * pci_addr_cache_remove_device - remove pci device from addr cache - * @dev: device to remove - * - * Remove a device from the addr-cache tree. - * This is potentially expensive, since it will walk - * the tree multiple times (once per resource). - * But so what; device removal doesn't need to be that fast. - */ -static void pci_addr_cache_remove_device(struct pci_dev *dev) -{ - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - __pci_addr_cache_remove_device(dev); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); -} - -/** - * pci_addr_cache_build - Build a cache of I/O addresses - * - * Build a cache of pci i/o addresses. This cache will be used to - * find the pci device that corresponds to a given address. - * This routine scans all pci busses to build the cache. - * Must be run late in boot process, after the pci controllers - * have been scaned for devices (after all device resources are known). 
- */ -void __init pci_addr_cache_build(void) -{ - struct device_node *dn; - struct pci_dev *dev = NULL; - - if (!eeh_subsystem_enabled) - return; - - spin_lock_init(&pci_io_addr_cache_root.piar_lock); - - while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) { - /* Ignore PCI bridges ( XXX why ??) */ - if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE) { - continue; - } - pci_addr_cache_insert_device(dev); - - /* Save the BAR's; firmware doesn't restore these after EEH reset */ - dn = pci_device_to_OF_node(dev); - eeh_save_bars(dev, PCI_DN(dn)); - } - -#ifdef DEBUG - /* Verify tree built up above, echo back the list of addrs. */ - pci_addr_cache_print(&pci_io_addr_cache_root); -#endif -} - /* --------------------------------------------------------------- */ -/* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */ +/* Below lies the EEH event infrastructure */ void eeh_slot_error_detail (struct pci_dn *pdn, int severity) { @@ -880,7 +589,7 @@ * PCI devices are added individuallly; but, for the restore, * an entire slot is reset at a time. */ -static void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn) +void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn) { int i; Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c 2005-11-02 14:42:38.994154417 -0600 @@ -0,0 +1,317 @@ +/* + * eeh_cache.c + * PCI address cache; allows the lookup of PCI devices based on I/O address + * + * Copyright (C) 2004 Linas Vepstas IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#undef DEBUG + +/** + * The pci address cache subsystem. This subsystem places + * PCI device address resources into a red-black tree, sorted + * according to the address range, so that given only an i/o + * address, the corresponding PCI device can be **quickly** + * found. It is safe to perform an address lookup in an interrupt + * context; this ability is an important feature. + * + * Currently, the only customer of this code is the EEH subsystem; + * thus, this code has been somewhat tailored to suit EEH better. + * In particular, the cache does *not* hold the addresses of devices + * for which EEH is not enabled. + * + * (Implementation Note: The RB tree seems to be better/faster + * than any hash algo I could think of for this problem, even + * with the penalty of slow pointer chases for d-cache misses). 
+ */ +struct pci_io_addr_range +{ + struct rb_node rb_node; + unsigned long addr_lo; + unsigned long addr_hi; + struct pci_dev *pcidev; + unsigned int flags; +}; + +static struct pci_io_addr_cache +{ + struct rb_root rb_root; + spinlock_t piar_lock; +} pci_io_addr_cache_root; + +static inline struct pci_dev *__pci_get_device_by_addr(unsigned long addr) +{ + struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node; + + while (n) { + struct pci_io_addr_range *piar; + piar = rb_entry(n, struct pci_io_addr_range, rb_node); + + if (addr < piar->addr_lo) { + n = n->rb_left; + } else { + if (addr > piar->addr_hi) { + n = n->rb_right; + } else { + pci_dev_get(piar->pcidev); + return piar->pcidev; + } + } + } + + return NULL; +} + +/** + * pci_get_device_by_addr - Get device, given only address + * @addr: mmio (PIO) phys address or i/o port number + * + * Given an mmio phys address, or a port number, find a pci device + * that implements this address. Be sure to pci_dev_put the device + * when finished. I/O port numbers are assumed to be offset + * from zero (that is, they do *not* have pci_io_addr added in). + * It is safe to call this function within an interrupt. + */ +struct pci_dev *pci_get_device_by_addr(unsigned long addr) +{ + struct pci_dev *dev; + unsigned long flags; + + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); + dev = __pci_get_device_by_addr(addr); + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); + return dev; +} + +#ifdef DEBUG +/* + * Handy-dandy debug print routine, does nothing more + * than print out the contents of our addr cache. + */ +static void pci_addr_cache_print(struct pci_io_addr_cache *cache) +{ + struct rb_node *n; + int cnt = 0; + + n = rb_first(&cache->rb_root); + while (n) { + struct pci_io_addr_range *piar; + piar = rb_entry(n, struct pci_io_addr_range, rb_node); + printk(KERN_DEBUG "PCI: %s addr range %d [%lx-%lx]: %s\n", + (piar->flags & IORESOURCE_IO) ? 
"i/o" : "mem", cnt, + piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev)); + cnt++; + n = rb_next(n); + } +} +#endif + +/* Insert address range into the rb tree. */ +static struct pci_io_addr_range * +pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo, + unsigned long ahi, unsigned int flags) +{ + struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node; + struct rb_node *parent = NULL; + struct pci_io_addr_range *piar; + + /* Walk tree, find a place to insert into tree */ + while (*p) { + parent = *p; + piar = rb_entry(parent, struct pci_io_addr_range, rb_node); + if (ahi < piar->addr_lo) { + p = &parent->rb_left; + } else if (alo > piar->addr_hi) { + p = &parent->rb_right; + } else { + if (dev != piar->pcidev || + alo != piar->addr_lo || ahi != piar->addr_hi) { + printk(KERN_WARNING "PIAR: overlapping address range\n"); + } + return piar; + } + } + piar = (struct pci_io_addr_range *)kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC); + if (!piar) + return NULL; + + piar->addr_lo = alo; + piar->addr_hi = ahi; + piar->pcidev = dev; + piar->flags = flags; + +#ifdef DEBUG + printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", + alo, ahi, pci_name (dev)); +#endif + + rb_link_node(&piar->rb_node, parent, p); + rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); + + return piar; +} + +static void __pci_addr_cache_insert_device(struct pci_dev *dev) +{ + struct device_node *dn; + struct pci_dn *pdn; + int i; + int inserted = 0; + + dn = pci_device_to_OF_node(dev); + if (!dn) { + printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev)); + return; + } + + /* Skip any devices for which EEH is not enabled. */ + pdn = PCI_DN(dn); + if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || + pdn->eeh_mode & EEH_MODE_NOCHECK) { +#ifdef DEBUG + printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n", + pci_name(dev), pdn->node->full_name); +#endif + return; + } + + /* The cache holds a reference to the device... 
*/ + pci_dev_get(dev); + + /* Walk resources on this device, poke them into the tree */ + for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { + unsigned long start = pci_resource_start(dev,i); + unsigned long end = pci_resource_end(dev,i); + unsigned int flags = pci_resource_flags(dev,i); + + /* We are interested only bus addresses, not dma or other stuff */ + if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM))) + continue; + if (start == 0 || ~start == 0 || end == 0 || ~end == 0) + continue; + pci_addr_cache_insert(dev, start, end, flags); + inserted = 1; + } + + /* If there was nothing to add, the cache has no reference... */ + if (!inserted) + pci_dev_put(dev); +} + +/** + * pci_addr_cache_insert_device - Add a device to the address cache + * @dev: PCI device whose I/O addresses we are interested in. + * + * In order to support the fast lookup of devices based on addresses, + * we maintain a cache of devices that can be quickly searched. + * This routine adds a device to that cache. + */ +void pci_addr_cache_insert_device(struct pci_dev *dev) +{ + unsigned long flags; + + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); + __pci_addr_cache_insert_device(dev); + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); +} + +static inline void __pci_addr_cache_remove_device(struct pci_dev *dev) +{ + struct rb_node *n; + int removed = 0; + +restart: + n = rb_first(&pci_io_addr_cache_root.rb_root); + while (n) { + struct pci_io_addr_range *piar; + piar = rb_entry(n, struct pci_io_addr_range, rb_node); + + if (piar->pcidev == dev) { + rb_erase(n, &pci_io_addr_cache_root.rb_root); + removed = 1; + kfree(piar); + goto restart; + } + n = rb_next(n); + } + + /* The cache no longer holds its reference to this device... */ + if (removed) + pci_dev_put(dev); +} + +/** + * pci_addr_cache_remove_device - remove pci device from addr cache + * @dev: device to remove + * + * Remove a device from the addr-cache tree. 
+ * This is potentially expensive, since it will walk + * the tree multiple times (once per resource). + * But so what; device removal doesn't need to be that fast. + */ +void pci_addr_cache_remove_device(struct pci_dev *dev) +{ + unsigned long flags; + + spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); + __pci_addr_cache_remove_device(dev); + spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); +} + +/** + * pci_addr_cache_build - Build a cache of I/O addresses + * + * Build a cache of pci i/o addresses. This cache will be used to + * find the pci device that corresponds to a given address. + * This routine scans all pci busses to build the cache. + * Must be run late in boot process, after the pci controllers + * have been scaned for devices (after all device resources are known). + */ +void __init pci_addr_cache_build(void) +{ + struct device_node *dn; + struct pci_dev *dev = NULL; + + spin_lock_init(&pci_io_addr_cache_root.piar_lock); + + while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) { + /* Ignore PCI bridges */ + if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE) + continue; + + pci_addr_cache_insert_device(dev); + + /* Save the BAR's; firmware doesn't restore these after EEH reset */ + dn = pci_device_to_OF_node(dev); + eeh_save_bars(dev, PCI_DN(dn)); + } + +#ifdef DEBUG + /* Verify tree built up above, echo back the list of addrs. 
*/ + pci_addr_cache_print(&pci_io_addr_cache_root); +#endif +} + Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-11-02 14:41:18.454448689 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-11-02 14:42:38.998153856 -0600 @@ -53,6 +53,14 @@ /* ---- EEH internal-use-only related routines ---- */ #ifdef CONFIG_EEH + +void pci_addr_cache_insert_device(struct pci_dev *dev); +void pci_addr_cache_remove_device(struct pci_dev *dev); +void pci_addr_cache_build(void); +struct pci_dev *pci_get_device_by_addr(unsigned long addr); + +void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn); + /** * eeh_slot_error_detail -- record and EEH error condition to the log * @severity: 1 if temporary, 2 if permanent failure. From linas at linas.org Fri Nov 4 11:53:20 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:53:20 -0600 Subject: [PATCH 26/42]: ppc64: Add "partion endpoint" support References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005320.GA27049@mail.gnucash.org> 26-eeh-partition-endpoint.patch New versions of firmware introduce a new method for looking up the "partition endpoint" (the point at which the PCI bus is cut). This patch adds support for this (mandatory) new feature.
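For illustration only (this is not part of the patch): the address selection that the patch adds to rtas_pci_slot_reset() can be sketched in stand-alone C. The struct and function names below (pe_addr_sketch, pick_reset_config_addr) are hypothetical stand-ins; only the two field names come from the patch, and treating zero as "no PE address reported" is an assumption based on the patch initializing eeh_pe_config_addr to 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields the patch reads from struct pci_dn. */
struct pe_addr_sketch {
	int eeh_config_addr;     /* classic per-device configuration address */
	int eeh_pe_config_addr;  /* partition-endpoint address; 0 = not reported */
};

/*
 * Prefer the partition-endpoint address when firmware supplied one,
 * otherwise fall back to the device's own configuration address.
 */
static int pick_reset_config_addr(const struct pe_addr_sketch *pdn)
{
	assert(pdn != NULL);
	if (pdn->eeh_pe_config_addr)
		return pdn->eeh_pe_config_addr;
	return pdn->eeh_config_addr;
}
```

With this fallback, firmware that lacks ibm,get-config-addr-info keeps using the old address unchanged.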
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:42:38.986155538 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:43:49.212307192 -0600 @@ -83,6 +83,7 @@ static int ibm_read_slot_reset_state; static int ibm_read_slot_reset_state2; static int ibm_slot_error_detail; +static int ibm_get_config_addr_info; static int eeh_subsystem_enabled; @@ -457,6 +458,7 @@ static void rtas_pci_slot_reset(struct pci_dn *pdn, int state) { + int config_addr; int rc; BUG_ON (pdn==NULL); @@ -467,8 +469,13 @@ return; } + /* Use PE configuration address, if present */ + config_addr = pdn->eeh_config_addr; + if (pdn->eeh_pe_config_addr) + config_addr = pdn->eeh_pe_config_addr; + rc = rtas_call(ibm_set_slot_reset,4,1, NULL, - pdn->eeh_config_addr, + config_addr, BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid), state); @@ -695,8 +702,22 @@ eeh_subsystem_enabled = 1; pdn->eeh_mode |= EEH_MODE_SUPPORTED; pdn->eeh_config_addr = regs[0]; + + /* If the newer, better, ibm,get-config-addr-info is supported, + * then use that instead. 
*/ + pdn->eeh_pe_config_addr = 0; + if (ibm_get_config_addr_info != RTAS_UNKNOWN_SERVICE) { + unsigned int rets[2]; + ret = rtas_call (ibm_get_config_addr_info, 4, 2, rets, + pdn->eeh_config_addr, + info->buid_hi, info->buid_lo, + 0); + if (ret == 0) + pdn->eeh_pe_config_addr = rets[0]; + } #ifdef DEBUG - printk(KERN_DEBUG "EEH: %s: eeh enabled\n", dn->full_name); + printk(KERN_DEBUG "EEH: %s: eeh enabled, config=%x pe_config=%x\n", + dn->full_name, pdn->eeh_config_addr, pdn->eeh_pe_config_addr); #endif } else { @@ -748,6 +769,7 @@ ibm_read_slot_reset_state2 = rtas_token("ibm,read-slot-reset-state2"); ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state"); ibm_slot_error_detail = rtas_token("ibm,slot-error-detail"); + ibm_get_config_addr_info = rtas_token("ibm,get-config-addr-info"); if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE) return; Index: linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-ppc64/pci-bridge.h 2005-11-02 14:39:24.755392219 -0600 +++ linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h 2005-11-02 14:43:49.218306351 -0600 @@ -63,6 +63,7 @@ int devfn; /* for pci devices */ int eeh_mode; /* See eeh.h for possible EEH_MODEs */ int eeh_config_addr; + int eeh_pe_config_addr; /* new-style partition endpoint address */ int eeh_check_count; /* # times driver ignored error */ int eeh_freeze_count; /* # times this device froze up. */ int eeh_is_bridge; /* device is pci-to-pci bridge */ From linas at linas.org Fri Nov 4 11:53:36 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:53:36 -0600 Subject: [PATCH 27/42]: SCSI: add PCI error recovery to IPR dev driver References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005336.GA27057@mail.gnucash.org> 27-pci-error-recovery_IPR-driver.patch Subject: PCI Error Recovery: IPR SCSI device driver Various PCI bus errors can be signaled by newer PCI controllers. 
This patch adds the PCI error recovery callbacks to the IPR SCSI device driver. The patch has been tested, and appears to work well. Signed-off-by: Linas Vepstas Signed-off-by: Brian King -- Index: linux-2.6.14-git3/drivers/scsi/ipr.c =================================================================== --- linux-2.6.14-git3.orig/drivers/scsi/ipr.c 2005-11-02 14:28:53.284922999 -0600 +++ linux-2.6.14-git3/drivers/scsi/ipr.c 2005-11-02 14:43:52.782806465 -0600 @@ -5328,6 +5328,94 @@ shutdown_type); } +/* --------------- PCI Error Recovery infrastructure ----------- */ +/** If the PCI slot is frozen, hold off all i/o + * activity; then, as soon as the slot is available again, + * initiate an adapter reset. + */ +static int ipr_reset_freeze(struct ipr_cmnd *ipr_cmd) +{ + /* Disallow new interrupts, avoid loop */ + ipr_cmd->ioa_cfg->allow_interrupts = 0; + list_add_tail(&ipr_cmd->queue, &ipr_cmd->ioa_cfg->pending_q); + ipr_cmd->done = ipr_reset_ioa_job; + return IPR_RC_JOB_RETURN; +} + +/** ipr_eeh_frozen -- called when slot has experience PCI bus error. + * This routine is called to tell us that the PCI bus is down. + * Can't do anything here, except put the device driver into a + * holding pattern, waiting for the PCI bus to come back. + */ +static void ipr_eeh_frozen (struct pci_dev *pdev) +{ + unsigned long flags = 0; + struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev); + + spin_lock_irqsave(ioa_cfg->host->host_lock, flags); + _ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_freeze, IPR_SHUTDOWN_NONE); + spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags); +} + +/** ipr_eeh_slot_reset - called when pci slot has been reset. + * + * This routine is called by the pci error recovery recovery + * code after the PCI slot has been reset, just before we + * should resume normal operations. 
+ */ +static int ipr_eeh_slot_reset(struct pci_dev *pdev) +{ + unsigned long flags = 0; + struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev); + + // pci_enable_device(pdev); + // pci_set_master(pdev); + spin_lock_irqsave(ioa_cfg->host->host_lock, flags); + _ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_restore_cfg_space, + IPR_SHUTDOWN_NONE); + spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags); + + return PCIERR_RESULT_RECOVERED; +} + +/** This routine is called when the PCI bus has permanently + * failed. This routine should purge all pending I/O and + * shut down the device driver (close and unload). + */ +static void ipr_eeh_perm_failure(struct pci_dev *pdev) +{ + unsigned long flags = 0; + struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev); + + spin_lock_irqsave(ioa_cfg->host->host_lock, flags); + if (ioa_cfg->sdt_state == WAIT_FOR_DUMP) + ioa_cfg->sdt_state = ABORT_DUMP; + ioa_cfg->reset_retries = IPR_NUM_RESET_RELOAD_RETRIES; + ioa_cfg->in_ioa_bringdown = 1; + ipr_initiate_ioa_reset(ioa_cfg, IPR_SHUTDOWN_NONE); + spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags); +} + +static int ipr_eeh_error_detected(struct pci_dev *pdev, + enum pci_channel_state state) +{ + switch (state) { + case pci_channel_io_frozen: + ipr_eeh_frozen (pdev); + return PCIERR_RESULT_NEED_RESET; + + case pci_channel_io_perm_failure: + ipr_eeh_perm_failure (pdev); + return PCIERR_RESULT_DISCONNECT; + break; + default: + break; + } + return PCIERR_RESULT_NEED_RESET; +} + +/* ------------- end of PCI Error Recovery suport ----------- */ + /** * ipr_probe_ioa_part2 - Initializes IOAs found in ipr_probe_ioa(..) 
* @ioa_cfg: ioa cfg struct @@ -6065,12 +6153,18 @@ }; MODULE_DEVICE_TABLE(pci, ipr_pci_table); +static struct pci_error_handlers ipr_err_handler = { + .error_detected = ipr_eeh_error_detected, + .slot_reset = ipr_eeh_slot_reset, +}; + static struct pci_driver ipr_driver = { .name = IPR_NAME, .id_table = ipr_pci_table, .probe = ipr_probe, .remove = ipr_remove, .shutdown = ipr_shutdown, + .err_handler = &ipr_err_handler, }; /** From linas at linas.org Fri Nov 4 11:53:46 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:53:46 -0600 Subject: [PATCH 28/42]: SCSI: add PCI error recovery to Symbios dev driver References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005346.GA27066@mail.gnucash.org> Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the Symbios SCSI device driver. The patch has been tested, and appears to work well. Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.c =================================================================== --- linux-2.6.14-git3.orig/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-02 14:28:52.512031337 -0600 +++ linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-02 14:43:56.084343457 -0600 @@ -686,6 +686,10 @@ if (DEBUG_FLAGS & DEBUG_TINY) printf_debug ("["); + /* Avoid spinloop trying to handle interrupts on frozen device */ + if (np->s.io_state != pci_channel_io_normal) + return IRQ_HANDLED; + spin_lock_irqsave(np->s.host->host_lock, flags); sym_interrupt(np); spin_unlock_irqrestore(np->s.host->host_lock, flags); @@ -759,6 +763,25 @@ */ static void sym_eh_timeout(u_long p) { __sym_eh_done((struct scsi_cmnd *)p, 1); } +static void sym_eeh_timeout(u_long p) +{ + struct sym_eh_wait *ep = (struct sym_eh_wait *) p; + if (!ep) + return; + complete(&ep->done); +} + +static void sym_eeh_done(struct sym_eh_wait *ep) +{ + if (!ep) + return; + ep->timed_out = 0; + if 
(!del_timer(&ep->timer)) + return; + + complete(&ep->done); +} + /* * Generic method for our eh processing. * The 'op' argument tells what we have to do. @@ -799,6 +822,35 @@ /* Try to proceed the operation we have been asked for */ sts = -1; + + /* We may be in an error condition because the PCI bus + * went down. In this case, we need to wait until the + * PCI bus is reset, the card is reset, and only then + * proceed with the scsi error recovery. We'll wait + * for 15 seconds for this to happen. + */ +#define WAIT_FOR_PCI_RECOVERY 15 + if (np->s.io_state != pci_channel_io_normal) { + struct sym_eh_wait eeh, *eep = &eeh; + np->s.io_reset_wait = eep; + init_completion(&eep->done); + init_timer(&eep->timer); + eep->to_do = SYM_EH_DO_WAIT; + eep->timer.expires = jiffies + (WAIT_FOR_PCI_RECOVERY*HZ); + eep->timer.function = sym_eeh_timeout; + eep->timer.data = (u_long)eep; + eep->timed_out = 1; /* Be pessimistic for once :) */ + add_timer(&eep->timer); + spin_unlock_irq(np->s.host->host_lock); + wait_for_completion(&eep->done); + spin_lock_irq(np->s.host->host_lock); + if (eep->timed_out) { + printk (KERN_ERR "%s: Timed out waiting for PCI reset\n", + sym_name(np)); + } + np->s.io_reset_wait = NULL; + } + switch(op) { case SYM_EH_ABORT: sts = sym_abort_scsiio(np, cmd, 1); @@ -1584,6 +1636,8 @@ np->maxoffs = dev->chip.offset_max; np->maxburst = dev->chip.burst_max; np->myaddr = dev->host_id; + np->s.io_state = pci_channel_io_normal; + np->s.io_reset_wait = NULL; /* * Edit its name. @@ -1916,6 +1970,58 @@ return 1; } +/* ------------- PCI Error Recovery infrastructure -------------- */ +/** sym2_io_error_detected() is called when PCI error is detected */ +static int sym2_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state) +{ + struct sym_hcb *np = pci_get_drvdata(pdev); + + np->s.io_state = state; + // XXX If slot is permanently frozen, then what? + // Should we scsi_remove_host() maybe ?? + + /* Request a slot slot reset. 
*/ + return PCIERR_RESULT_NEED_RESET; +} + +/** sym2_io_slot_reset is called when the pci bus has been reset. + * Restart the card from scratch. */ +static int sym2_io_slot_reset (struct pci_dev *pdev) +{ + struct sym_hcb *np = pci_get_drvdata(pdev); + + printk (KERN_INFO "%s: recovering from a PCI slot reset\n", + sym_name(np)); + + if (pci_enable_device(pdev)) + printk (KERN_ERR "%s: device setup failed most egregiously\n", + sym_name(np)); + + pci_set_master(pdev); + enable_irq (pdev->irq); + + /* Perform host reset only on one instance of the card */ + if (0 == PCI_FUNC (pdev->devfn)) + sym_reset_scsi_bus(np, 0); + + return PCIERR_RESULT_RECOVERED; +} + +/** sym2_io_resume is called when the error recovery driver + * tells us that its OK to resume normal operation. + */ +static void sym2_io_resume (struct pci_dev *pdev) +{ + struct sym_hcb *np = pci_get_drvdata(pdev); + + /* Perform device startup only once for this card. */ + if (0 == PCI_FUNC (pdev->devfn)) + sym_start_up (np, 1); + + np->s.io_state = pci_channel_io_normal; + sym_eeh_done (np->s.io_reset_wait); +} + /* * Driver host template. 
*/ @@ -2169,11 +2275,18 @@ MODULE_DEVICE_TABLE(pci, sym2_id_table); +static struct pci_error_handlers sym2_err_handler = { + .error_detected = sym2_io_error_detected, + .slot_reset = sym2_io_slot_reset, + .resume = sym2_io_resume, +}; + static struct pci_driver sym2_driver = { .name = NAME53C8XX, .id_table = sym2_id_table, .probe = sym2_probe, .remove = __devexit_p(sym2_remove), + .err_handler = &sym2_err_handler, }; static int __init sym2_init(void) Index: linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.h =================================================================== --- linux-2.6.14-git3.orig/drivers/scsi/sym53c8xx_2/sym_glue.h 2005-11-02 14:28:52.513031197 -0600 +++ linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.h 2005-11-02 14:43:56.089342756 -0600 @@ -181,6 +181,10 @@ char chip_name[8]; struct pci_dev *device; + /* pci bus i/o state; waiter for clearing of i/o state */ + enum pci_channel_state io_state; + struct sym_eh_wait *io_reset_wait; + struct Scsi_Host *host; void __iomem * ioaddr; /* MMIO kernel io address */ Index: linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_hipd.c =================================================================== --- linux-2.6.14-git3.orig/drivers/scsi/sym53c8xx_2/sym_hipd.c 2005-11-02 14:28:52.513031197 -0600 +++ linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_hipd.c 2005-11-02 14:43:56.141335464 -0600 @@ -2809,6 +2809,7 @@ u_char istat, istatc; u_char dstat; u_short sist; + u_int icnt; /* * interrupt on the fly ? @@ -2850,6 +2851,7 @@ sist = 0; dstat = 0; istatc = istat; + icnt = 0; do { if (istatc & SIP) sist |= INW(np, nc_sist); @@ -2857,6 +2859,19 @@ dstat |= INB(np, nc_dstat); istatc = INB(np, nc_istat); istat |= istatc; + + /* Prevent deadlock waiting on a condition that may never clear. */ + /* XXX this is a temporary kludge; the correct to detect + * a PCI bus error would be to use the io_check interfaces + * proposed by Hidetoshi Seto + * Problem with polling like that is the state flag might not + * be set. 
+ */ + icnt ++; + if (100 < icnt) { + if (np->s.device->error_state != pci_channel_io_normal) + return; + } } while (istatc & (SIP|DIP)); if (DEBUG_FLAGS & DEBUG_TINY) From linas at linas.org Fri Nov 4 11:53:53 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:53:53 -0600 Subject: [PATCH 29/42]: ethernet: add PCI error recovery to e100 dev driver References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005353.GA27074@mail.gnucash.org> Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the intel ethernet e100 device driver. The patch has been tested, and appears to work well. Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git3/drivers/net/e100.c =================================================================== --- linux-2.6.14-git3.orig/drivers/net/e100.c 2005-11-02 14:28:51.524169808 -0600 +++ linux-2.6.14-git3/drivers/net/e100.c 2005-11-02 14:43:58.890949857 -0600 @@ -2465,6 +2465,75 @@ } +/* ------------------ PCI Error Recovery infrastructure -------------- */ +/** e100_io_error_detected() is called when PCI error is detected */ +static int e100_io_error_detected(struct pci_dev *pdev, enum pci_channel_state state) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + + /* Same as calling e100_down(netdev_priv(netdev)), but generic */ + netdev->stop(netdev); + + /* Is a detach needed ?? */ + // netif_device_detach(netdev); + + /* Request a slot reset. */ + return PCIERR_RESULT_NEED_RESET; +} + +/** e100_io_slot_reset is called after the pci bus has been reset. + * Restart the card from scratch. 
*/ +static int e100_io_slot_reset(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct nic *nic = netdev_priv(netdev); + + if(pci_enable_device(pdev)) { + printk(KERN_ERR "e100: Cannot re-enable PCI device after reset.\n"); + return PCIERR_RESULT_DISCONNECT; + } + pci_set_master(pdev); + + /* Only one device per card can do a reset */ + if (0 != PCI_FUNC (pdev->devfn)) + return PCIERR_RESULT_RECOVERED; + + e100_hw_reset(nic); + e100_phy_init(nic); + + if(e100_hw_init(nic)) { + DPRINTK(HW, ERR, "e100_hw_init failed\n"); + return PCIERR_RESULT_DISCONNECT; + } + + return PCIERR_RESULT_RECOVERED; +} + +/** e100_io_resume is called when the error recovery driver + * tells us that its OK to resume normal operation. + */ +static void e100_io_resume(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct nic *nic = netdev_priv(netdev); + + /* ack any pending wake events, disable PME */ + pci_enable_wake(pdev, 0, 0); + + netif_device_attach(netdev); + if(netif_running(netdev)) { + e100_open (netdev); + mod_timer(&nic->watchdog, jiffies); + } +} + +static struct pci_error_handlers e100_err_handler = { + .error_detected = e100_io_error_detected, + .slot_reset = e100_io_slot_reset, + .resume = e100_io_resume, +}; + + static struct pci_driver e100_driver = { .name = DRV_NAME, .id_table = e100_id_table, @@ -2475,6 +2544,7 @@ .resume = e100_resume, #endif .shutdown = e100_shutdown, + .err_handler = &e100_err_handler, }; static int __init e100_init_module(void) From linas at linas.org Fri Nov 4 11:54:04 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:04 -0600 Subject: [PATCH 30/42]: ethernet: add PCI error recovery to e1000 dev driver References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005404.GA27082@mail.gnucash.org> Various PCI bus errors can be signaled by newer PCI controllers. 
This patch adds the PCI error recovery callbacks to the intel gigabit ethernet e1000 device driver. The patch has been tested, and appears to work well. Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git3/drivers/net/e1000/e1000_main.c =================================================================== --- linux-2.6.14-git3.orig/drivers/net/e1000/e1000_main.c 2005-11-02 14:28:50.471317390 -0600 +++ linux-2.6.14-git3/drivers/net/e1000/e1000_main.c 2005-11-02 14:44:00.730691851 -0600 @@ -206,6 +206,16 @@ void e1000_rx_schedule(void *data); #endif +static int e1000_io_error_detected(struct pci_dev *pdev, enum pci_channel_state state); +static int e1000_io_slot_reset(struct pci_dev *pdev); +static void e1000_io_resume(struct pci_dev *pdev); + +static struct pci_error_handlers e1000_err_handler = { + .error_detected = e1000_io_error_detected, + .slot_reset = e1000_io_slot_reset, + .resume = e1000_io_resume, +}; + /* Exported from other modules */ extern void e1000_check_options(struct e1000_adapter *adapter); @@ -218,8 +228,9 @@ /* Power Managment Hooks */ #ifdef CONFIG_PM .suspend = e1000_suspend, - .resume = e1000_resume + .resume = e1000_resume, #endif + .err_handler = &e1000_err_handler, }; MODULE_AUTHOR("Intel Corporation, "); @@ -2937,6 +2948,10 @@ #define PHY_IDLE_ERROR_COUNT_MASK 0x00FF + /* Prevent stats update while adapter is being reset */ + if (adapter->link_speed == 0) + return; + spin_lock_irqsave(&adapter->stats_lock, flags); /* these counters are modified from e1000_adjust_tbi_stats, @@ -4358,4 +4373,88 @@ } #endif +/* --------------- PCI Error Recovery infrastructure ------------ */ +/** e1000_io_error_detected() is called when PCI error is detected */ +static int e1000_io_error_detected(struct pci_dev *pdev, enum pci_channel_state state) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct e1000_adapter *adapter = netdev->priv; + + if (netif_running(netdev)) + e1000_down(adapter); + + /* Request a slot slot reset. 
*/ + return PCIERR_RESULT_NEED_RESET; +} + +/** e1000_io_slot_reset is called after the pci bus has been reset. + * Restart the card from scratch. + * Implementation resembles the first-half of the + * e1000_resume routine. + */ +static int e1000_io_slot_reset(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct e1000_adapter *adapter = netdev->priv; + + if (pci_enable_device(pdev)) { + printk(KERN_ERR "e1000: Cannot re-enable PCI device after reset.\n"); + return PCIERR_RESULT_DISCONNECT; + } + pci_set_master(pdev); + + pci_enable_wake(pdev, 3, 0); + pci_enable_wake(pdev, 4, 0); /* 4 == D3 cold */ + + /* Perform card reset only on one instance of the card */ + if(0 != PCI_FUNC (pdev->devfn)) + return PCIERR_RESULT_RECOVERED; + + e1000_reset(adapter); + E1000_WRITE_REG(&adapter->hw, WUS, ~0); + + return PCIERR_RESULT_RECOVERED; +} + +/** e1000_io_resume is called when the error recovery driver + * tells us that its OK to resume normal operation. + * Implementation resembles the second-half of the + * e1000_resume routine. 
+ */ +static void e1000_io_resume(struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct e1000_adapter *adapter = netdev->priv; + uint32_t manc, swsm; + + if(netif_running(netdev)) { + if (e1000_up(adapter)) { + printk("e1000: can't bring device back up after reset\n"); + return; + } + } + + netif_device_attach(netdev); + + if(adapter->hw.mac_type >= e1000_82540 && + adapter->hw.media_type == e1000_media_type_copper) { + manc = E1000_READ_REG(&adapter->hw, MANC); + manc &= ~(E1000_MANC_ARP_EN); + E1000_WRITE_REG(&adapter->hw, MANC, manc); + } + + switch(adapter->hw.mac_type) { + case e1000_82573: + swsm = E1000_READ_REG(&adapter->hw, SWSM); + E1000_WRITE_REG(&adapter->hw, SWSM, + swsm | E1000_SWSM_DRV_LOAD); + break; + default: + break; + } + + if(netif_running(netdev)) + mod_timer(&adapter->watchdog_timer, jiffies); +} + /* e1000_main.c */ From linas at linas.org Fri Nov 4 11:54:11 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:11 -0600 Subject: [PATCH 31/42]: ethernet: add PCI error recovery to ixgb dev driver References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005411.GA27090@mail.gnucash.org> Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the intel ten-gigabit ethernet ixgb device driver. The patch has been tested, and appears to work well. 
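For illustration only (this is not part of the patch): the three callbacks wired up in these driver patches can be modelled in user-space C as below. All names prefixed sketch_ or sk_ are hypothetical and are not kernel definitions. The driving order (error_detected, then slot_reset, then resume) is an assumption taken from the comments on the callbacks in this series; the real recovery code may sequence things differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirrors of the result codes used in this patch series. */
enum sketch_result { SKETCH_NEED_RESET, SKETCH_RECOVERED, SKETCH_DISCONNECT };

struct sketch_dev {
	int running;      /* interface was up when the error hit */
	int hw_up;        /* hardware is currently passing traffic */
	int enable_fails; /* set to simulate a failed re-enable after reset */
};

struct sketch_handlers {
	enum sketch_result (*error_detected)(struct sketch_dev *);
	enum sketch_result (*slot_reset)(struct sketch_dev *);
	void (*resume)(struct sketch_dev *);
};

/* Step 1: quiesce the device and ask for a slot reset. */
static enum sketch_result sk_error_detected(struct sketch_dev *d)
{
	d->hw_up = 0;
	return SKETCH_NEED_RESET;
}

/* Step 2: re-enable the device once the slot has been reset. */
static enum sketch_result sk_slot_reset(struct sketch_dev *d)
{
	return d->enable_fails ? SKETCH_DISCONNECT : SKETCH_RECOVERED;
}

/* Step 3: restart traffic, but only if the interface was running. */
static void sk_resume(struct sketch_dev *d)
{
	if (d->running)
		d->hw_up = 1;
}

static const struct sketch_handlers sk_handlers = {
	.error_detected = sk_error_detected,
	.slot_reset     = sk_slot_reset,
	.resume         = sk_resume,
};

/* Drive one recovery pass in the assumed callback order. */
static enum sketch_result sketch_recover(const struct sketch_handlers *h,
					 struct sketch_dev *d)
{
	enum sketch_result r;

	assert(h != NULL && d != NULL);
	r = h->error_detected(d);
	if (r == SKETCH_NEED_RESET)
		r = h->slot_reset(d);
	if (r == SKETCH_RECOVERED)
		h->resume(d);
	return r;
}
```

Splitting the reset and resume steps this way mirrors how the driver callbacks above divide the work of their respective resume routines into a first half and a second half.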
Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git3/drivers/net/ixgb/ixgb_main.c =================================================================== --- linux-2.6.14-git3.orig/drivers/net/ixgb/ixgb_main.c 2005-11-02 14:28:49.225492020 -0600 +++ linux-2.6.14-git3/drivers/net/ixgb/ixgb_main.c 2005-11-02 14:44:02.380460486 -0600 @@ -132,6 +132,16 @@ static void ixgb_netpoll(struct net_device *dev); #endif +static int ixgb_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state); +static int ixgb_io_slot_reset (struct pci_dev *pdev); +static void ixgb_io_resume (struct pci_dev *pdev); + +static struct pci_error_handlers ixgb_err_handler = { + .error_detected = ixgb_io_error_detected, + .slot_reset = ixgb_io_slot_reset, + .resume = ixgb_io_resume, +}; + /* Exported from other modules */ extern void ixgb_check_options(struct ixgb_adapter *adapter); @@ -141,6 +151,8 @@ .id_table = ixgb_pci_tbl, .probe = ixgb_probe, .remove = __devexit_p(ixgb_remove), + .err_handler = &ixgb_err_handler, + }; MODULE_AUTHOR("Intel Corporation, "); @@ -1654,8 +1666,16 @@ unsigned int i; #endif +#ifdef XXX_CONFIG_IXGB_EEH_RECOVERY + if(unlikely(icr==EEH_IO_ERROR_VALUE(4))) { + if (eeh_slot_is_isolated (adapter->pdev)) + // disable_irq_nosync (adapter->pdev->irq); + return IRQ_NONE; /* Not our interrupt */ + } +#else if(unlikely(!icr)) return IRQ_NONE; /* Not our interrupt */ +#endif /* CONFIG_IXGB_EEH_RECOVERY */ if(unlikely(icr & (IXGB_INT_RXSEQ | IXGB_INT_LSC))) { mod_timer(&adapter->watchdog_timer, jiffies); @@ -2125,4 +2145,70 @@ } #endif +/* -------------- PCI Error Recovery infrastructure ---------------- */ +/** ixgb_io_error_detected() is called when PCI error is detected */ +static int ixgb_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct ixgb_adapter *adapter = netdev->priv; + + if(netif_running(netdev)) + ixgb_down(adapter, TRUE); + + /* Request a slot reset. 
*/ + return PCIERR_RESULT_NEED_RESET; +} + +/** ixgb_io_slot_reset is called after the pci bus has been reset. + * Restart the card from scratch. + * Implementation resembles the first-half of the + * ixgb_resume routine. + */ +static int ixgb_io_slot_reset (struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct ixgb_adapter *adapter = netdev->priv; + + if(pci_enable_device(pdev)) { + printk(KERN_ERR "ixgb: Cannot re-enable PCI device after reset.\n"); + return PCIERR_RESULT_DISCONNECT; + } + pci_set_master(pdev); + + /* Perform card reset only on one instance of the card */ + if (0 != PCI_FUNC (pdev->devfn)) + return PCIERR_RESULT_RECOVERED; + + ixgb_reset(adapter); + + return PCIERR_RESULT_RECOVERED; +} + +/** ixgb_io_resume is called when the error recovery driver + * tells us that its OK to resume normal operation. + * Implementation resembles the second-half of the + * ixgb_resume routine. + */ +static void ixgb_io_resume (struct pci_dev *pdev) +{ + struct net_device *netdev = pci_get_drvdata(pdev); + struct ixgb_adapter *adapter = netdev->priv; + + if(netif_running(netdev)) { + if(ixgb_up(adapter)) { + printk ("ixgb: can't bring device back up after reset\n"); + return; + } + } + + netif_device_attach(netdev); + if(netif_running(netdev)) + mod_timer(&adapter->watchdog_timer, jiffies); + + /* Reading all-ff's from the adapter will completely hose + * the counts and statistics. 
So just clear them out */ + memset(&adapter->stats, 0, sizeof(struct ixgb_hw_stats)); + ixgb_update_stats(adapter); +} + /* ixgb_main.c */ From linas at linas.org Fri Nov 4 11:54:17 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:17 -0600 Subject: [PATCH 32/42]: RFC: Add compile-time config options References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005417.GA27098@mail.gnucash.org> 32-pci-error-recovery_config-option.patch This OPTIONAL/RFC patch adds ifdef's around the PCI error recovery code in the various device drivers. This patch is "optional" in that it's a little bit messy, but it does solve a little problem. -- The good news: this gives some users (e.g. embedded systems) the option of not compiling in this code, thus making their device drivers a tiny bit smaller. -- The bad news: This also clutters up the drivers with extraneous markup and the config process with yet another config. I don't know if this patch is worth it. Apply or reject, as desired ... It's up to you ... :-) Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/drivers/scsi/ipr.c =================================================================== --- linux-2.6.14-git3.orig/drivers/scsi/ipr.c 2005-11-02 14:43:52.782806465 -0600 +++ linux-2.6.14-git3/drivers/scsi/ipr.c 2005-11-02 14:44:04.167209911 -0600 @@ -5329,6 +5329,8 @@ } /* --------------- PCI Error Recovery infrastructure ----------- */ +#ifdef CONFIG_PCIERR_RECOVERY + /** If the PCI slot is frozen, hold off all i/o * activity; then, as soon as the slot is available again, * initiate an adapter reset.
@@ -5414,6 +5416,7 @@ return PCIERR_RESULT_NEED_RESET; } +#endif /* CONFIG_PCIERR_RECOVERY */ /* ------------- end of PCI Error Recovery suport ----------- */ /** @@ -6153,10 +6156,12 @@ }; MODULE_DEVICE_TABLE(pci, ipr_pci_table); +#ifdef CONFIG_PCIERR_RECOVERY static struct pci_error_handlers ipr_err_handler = { .error_detected = ipr_eeh_error_detected, .slot_reset = ipr_eeh_slot_reset, }; +#endif /* CONFIG_PCIERR_RECOVERY */ static struct pci_driver ipr_driver = { .name = IPR_NAME, @@ -6164,7 +6169,9 @@ .probe = ipr_probe, .remove = ipr_remove, .shutdown = ipr_shutdown, +#ifdef CONFIG_PCIERR_RECOVERY .err_handler = &ipr_err_handler, +#endif /* CONFIG_PCIERR_RECOVERY */ }; /** Index: linux-2.6.14-git3/drivers/pci/Kconfig =================================================================== --- linux-2.6.14-git3.orig/drivers/pci/Kconfig 2005-11-02 14:28:48.597580036 -0600 +++ linux-2.6.14-git3/drivers/pci/Kconfig 2005-11-02 14:44:04.172209210 -0600 @@ -13,6 +13,21 @@ If you don't know what to do here, say N. +config PCIERR_RECOVERY + bool "PCI Error Recovery support" + depends on PCI + depends on PPC_PSERIES + default y + help + PCI Error Recovery is a mechanism by which crashed/hung + PCI adapters are automatically detected and rebooted without + otherwise disturbing the operation of the system. Support + for this recovery requires special PCI bridge chips (some + PCI-E chips may have this support) as well as support in + the device drivers (not all device drivers can handle this). + + When in doubt, say Y. 
+ config PCI_LEGACY_PROC bool "Legacy /proc/pci interface" depends on PCI Index: linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.c =================================================================== --- linux-2.6.14-git3.orig/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-02 14:43:56.084343457 -0600 +++ linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-11-02 14:44:04.195205985 -0600 @@ -763,6 +763,7 @@ */ static void sym_eh_timeout(u_long p) { __sym_eh_done((struct scsi_cmnd *)p, 1); } +#ifdef CONFIG_PCIERR_RECOVERY static void sym_eeh_timeout(u_long p) { struct sym_eh_wait *ep = (struct sym_eh_wait *) p; @@ -781,6 +782,7 @@ complete(&ep->done); } +#endif /* CONFIG_PCIERR_RECOVERY */ /* * Generic method for our eh processing. @@ -823,6 +825,7 @@ /* Try to proceed the operation we have been asked for */ sts = -1; +#ifdef CONFIG_PCIERR_RECOVERY /* We may be in an error condition because the PCI bus * went down. In this case, we need to wait until the * PCI bus is reset, the card is reset, and only then @@ -850,6 +853,7 @@ } np->s.io_reset_wait = NULL; } +#endif /* CONFIG_PCIERR_RECOVERY */ switch(op) { case SYM_EH_ABORT: @@ -1971,6 +1975,7 @@ } /* ------------- PCI Error Recovery infrastructure -------------- */ +#ifdef CONFIG_PCIERR_RECOVERY /** sym2_io_error_detected() is called when PCI error is detected */ static int sym2_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state) { @@ -2021,6 +2026,7 @@ np->s.io_state = pci_channel_io_normal; sym_eeh_done (np->s.io_reset_wait); } +#endif /* CONFIG_PCIERR_RECOVERY */ /* * Driver host template. 
@@ -2275,18 +2281,22 @@ MODULE_DEVICE_TABLE(pci, sym2_id_table); +#ifdef CONFIG_PCIERR_RECOVERY static struct pci_error_handlers sym2_err_handler = { .error_detected = sym2_io_error_detected, .slot_reset = sym2_io_slot_reset, .resume = sym2_io_resume, }; +#endif /* CONFIG_PCIERR_RECOVERY */ static struct pci_driver sym2_driver = { .name = NAME53C8XX, .id_table = sym2_id_table, .probe = sym2_probe, .remove = __devexit_p(sym2_remove), +#ifdef CONFIG_PCIERR_RECOVERY .err_handler = &sym2_err_handler, +#endif /* CONFIG_PCIERR_RECOVERY */ }; static int __init sym2_init(void) Index: linux-2.6.14-git3/drivers/net/e100.c =================================================================== --- linux-2.6.14-git3.orig/drivers/net/e100.c 2005-11-02 14:43:58.890949857 -0600 +++ linux-2.6.14-git3/drivers/net/e100.c 2005-11-02 14:44:04.222202199 -0600 @@ -2466,6 +2466,7 @@ /* ------------------ PCI Error Recovery infrastructure -------------- */ +#ifdef CONFIG_PCIERR_RECOVERY /** e100_io_error_detected() is called when PCI error is detected */ static int e100_io_error_detected(struct pci_dev *pdev, enum pci_channel_state state) { @@ -2532,6 +2533,7 @@ .slot_reset = e100_io_slot_reset, .resume = e100_io_resume, }; +#endif /* CONFIG_PCIERR_RECOVERY */ static struct pci_driver e100_driver = { @@ -2544,7 +2546,9 @@ .resume = e100_resume, #endif .shutdown = e100_shutdown, +#ifdef CONFIG_PCIERR_RECOVERY .err_handler = &e100_err_handler, +#endif /* CONFIG_PCIERR_RECOVERY */ }; static int __init e100_init_module(void) Index: linux-2.6.14-git3/drivers/net/e1000/e1000_main.c =================================================================== --- linux-2.6.14-git3.orig/drivers/net/e1000/e1000_main.c 2005-11-02 14:44:00.730691851 -0600 +++ linux-2.6.14-git3/drivers/net/e1000/e1000_main.c 2005-11-02 14:44:04.266196029 -0600 @@ -206,6 +206,7 @@ void e1000_rx_schedule(void *data); #endif +#ifdef CONFIG_PCIERR_RECOVERY static int e1000_io_error_detected(struct pci_dev *pdev, enum pci_channel_state 
state); static int e1000_io_slot_reset(struct pci_dev *pdev); static void e1000_io_resume(struct pci_dev *pdev); @@ -215,6 +216,7 @@ .slot_reset = e1000_io_slot_reset, .resume = e1000_io_resume, }; +#endif /* CONFIG_PCIERR_RECOVERY */ /* Exported from other modules */ @@ -230,7 +232,9 @@ .suspend = e1000_suspend, .resume = e1000_resume, #endif +#ifdef CONFIG_PCIERR_RECOVERY .err_handler = &e1000_err_handler, +#endif /* CONFIG_PCIERR_RECOVERY */ }; MODULE_AUTHOR("Intel Corporation, "); @@ -4374,6 +4378,7 @@ #endif /* --------------- PCI Error Recovery infrastructure ------------ */ +#ifdef CONFIG_PCIERR_RECOVERY /** e1000_io_error_detected() is called when PCI error is detected */ static int e1000_io_error_detected(struct pci_dev *pdev, enum pci_channel_state state) { @@ -4456,5 +4461,6 @@ if(netif_running(netdev)) mod_timer(&adapter->watchdog_timer, jiffies); } +#endif /* CONFIG_PCIERR_RECOVERY */ /* e1000_main.c */ Index: linux-2.6.14-git3/drivers/net/ixgb/ixgb_main.c =================================================================== --- linux-2.6.14-git3.orig/drivers/net/ixgb/ixgb_main.c 2005-11-02 14:44:02.380460486 -0600 +++ linux-2.6.14-git3/drivers/net/ixgb/ixgb_main.c 2005-11-02 14:44:04.289192804 -0600 @@ -132,6 +132,7 @@ static void ixgb_netpoll(struct net_device *dev); #endif +#ifdef CONFIG_PCIERR_RECOVERY static int ixgb_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state); static int ixgb_io_slot_reset (struct pci_dev *pdev); static void ixgb_io_resume (struct pci_dev *pdev); @@ -141,6 +142,7 @@ .slot_reset = ixgb_io_slot_reset, .resume = ixgb_io_resume, }; +#endif /* CONFIG_PCIERR_RECOVERY */ /* Exported from other modules */ @@ -151,8 +153,9 @@ .id_table = ixgb_pci_tbl, .probe = ixgb_probe, .remove = __devexit_p(ixgb_remove), +#ifdef CONFIG_PCIERR_RECOVERY .err_handler = &ixgb_err_handler, - +#endif /* CONFIG_PCIERR_RECOVERY */ }; MODULE_AUTHOR("Intel Corporation, "); @@ -2146,6 +2149,7 @@ #endif /* -------------- PCI Error Recovery 
infrastructure ---------------- */ +#ifdef CONFIG_PCIERR_RECOVERY /** ixgb_io_error_detected() is called when PCI error is detected */ static int ixgb_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state) { @@ -2210,5 +2214,6 @@ memset(&adapter->stats, 0, sizeof(struct ixgb_hw_stats)); ixgb_update_stats(adapter); } +#endif /* CONFIG_PCIERR_RECOVERY */ /* ixgb_main.c */ From linas at linas.org Fri Nov 4 11:54:23 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:23 -0600 Subject: [PATCH 33/42]: ppc64: remove bogus printk References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005423.GA27106@mail.gnucash.org> 233-eeh-buid-fix.patch Remove un-desired warning print from EEH code. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:43:49.212307192 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:45:00.429319560 -0600 @@ -824,12 +824,10 @@ if (!dn || !PCI_DN(dn)) return; phb = PCI_DN(dn)->phb; - if (NULL == phb || 0 == phb->buid) { - printk(KERN_WARNING "EEH: Expected buid but found none for %s\n", - dn->full_name); - dump_stack(); + + /* USB Bus children of PCI devices will not have BUID's */ + if (NULL == phb || 0 == phb->buid) return; - } info.buid_hi = BUID_HI(phb->buid); info.buid_lo = BUID_LO(phb->buid); From linas at linas.org Fri Nov 4 11:54:29 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:29 -0600 Subject: [PATCH 34/42]: ppc64: Remove duplicate code References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005429.GA27114@mail.gnucash.org> 234-eeh-find-pe.patch The find_device_pe() routine is duplicated in two files. Remove one of the two copies, declare the other extern. 
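For readers unfamiliar with the routine being de-duplicated, the parent walk it performs can be sketched in user space as below. This is only an illustration under stated assumptions: `struct sketch_node`, `SKETCH_EEH_SUPPORTED`, and `sketch_find_pe()` are hypothetical stand-ins for the kernel's `struct device_node`, `EEH_MODE_SUPPORTED`, and `find_device_pe()`, and the real code goes through `PCI_DN()` rather than reading the mode from the node directly.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_EEH_SUPPORTED 0x1  /* placeholder for EEH_MODE_SUPPORTED */

/* Hypothetical, simplified device-tree node. */
struct sketch_node {
	struct sketch_node *parent;
	int eeh_mode;
};

/* Climb while the parent is EEH-capable; the last such node is the PE. */
static struct sketch_node *sketch_find_pe(struct sketch_node *dn)
{
	assert(dn != NULL);
	while (dn->parent && (dn->parent->eeh_mode & SKETCH_EEH_SUPPORTED))
		dn = dn->parent;
	return dn;
}
```

Keeping a single non-static definition and declaring it in a shared header avoids the two copies drifting apart if the walk ever changes.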
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_driver.c 2005-11-02 14:41:18.435451353 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c 2005-11-02 14:45:43.638259683 -0600 @@ -42,19 +42,6 @@ return ""; } -/** - * Return the "partitionable endpoint" (pe) under which this device lies - */ -static struct device_node * find_device_pe(struct device_node *dn) -{ - while ((dn->parent) && PCI_DN(dn->parent) && - (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { - dn = dn->parent; - } - return dn; -} - - #ifdef DEBUG static void print_device_node_tree (struct pci_dn *pdn, int dent) { Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:45:00.429319560 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:45:43.651257860 -0600 @@ -172,7 +172,7 @@ /** * Return the "partitionable endpoint" (pe) under which this device lies */ -static struct device_node * find_device_pe(struct device_node *dn) +struct device_node * find_device_pe(struct device_node *dn) { while ((dn->parent) && PCI_DN(dn->parent) && (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-11-02 14:42:38.998153856 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-11-02 14:45:43.656257159 -0600 @@ -110,6 +110,9 @@ void eeh_mark_slot (struct device_node *dn, int mode_flag); void eeh_clear_slot (struct device_node *dn, int mode_flag); +/* Find the associated "Partiationable Endpoint" PE */ +struct device_node * find_device_pe(struct 
device_node *dn); + #endif #endif /* _ASM_POWERPC_PPC_PCI_H */ From linas at linas.org Fri Nov 4 11:54:34 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:34 -0600 Subject: [PATCH 35/42]: ppc64: bugfix: fill in un-initialzed field References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005434.GA27122@mail.gnucash.org> 235-eeh-set-pcidev-bugfix.patch The pci device field should be initialized to a valid value. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_cache.c 2005-11-02 14:42:38.994154417 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c 2005-11-02 14:46:23.687642815 -0600 @@ -307,6 +307,9 @@ /* Save the BAR's; firmware doesn't restore these after EEH reset */ dn = pci_device_to_OF_node(dev); eeh_save_bars(dev, PCI_DN(dn)); + + pci_dev_get (dev); /* matching put is in eeh_remove_device() */ + PCI_DN(dn)->pcidev = dev; } #ifdef DEBUG From linas at linas.org Fri Nov 4 11:54:39 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:39 -0600 Subject: [PATCH 36/42]: ppc64: Use PE configuration address consistently References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005439.GA27130@mail.gnucash.org> 236-eeh-config-addr.patch The PE configuration address wasn't being consistently used in all locations where a config address is called for. This patch adds it to the places it should have appeared in.
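The selection rule the patch repeats at each call site (prefer the PE configuration address when it is set, otherwise fall back to the device's own configuration address) can be sketched as a small helper. This is an illustration only: the patch itself open-codes the choice at each site, and `struct sketch_pdn` and `sketch_pick_config_addr()` are hypothetical names that model just the two `struct pci_dn` fields involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the two fields the patch consults. */
struct sketch_pdn {
	int eeh_config_addr;     /* per-device configuration address */
	int eeh_pe_config_addr;  /* new-style PE address, 0 if absent */
};

/* Use the PE configuration address if present, else the device address. */
static int sketch_pick_config_addr(const struct sketch_pdn *pdn)
{
	assert(pdn != NULL);
	if (pdn->eeh_pe_config_addr)
		return pdn->eeh_pe_config_addr;
	return pdn->eeh_config_addr;
}
```

A helper like this would keep the fallback rule in one place; whether that is preferable to the open-coded form in the patch is a matter of taste.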
Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:45:43.651257860 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:47:07.798456202 -0600 @@ -110,6 +110,7 @@ void eeh_slot_error_detail (struct pci_dn *pdn, int severity) { + int config_addr; unsigned long flags; int rc; @@ -117,8 +118,13 @@ spin_lock_irqsave(&slot_errbuf_lock, flags); memset(slot_errbuf, 0, eeh_error_buf_size); + /* Use PE configuration address, if present */ + config_addr = pdn->eeh_config_addr; + if (pdn->eeh_pe_config_addr) + config_addr = pdn->eeh_pe_config_addr; + rc = rtas_call(ibm_slot_error_detail, - 8, 1, NULL, pdn->eeh_config_addr, + 8, 1, NULL, config_addr, BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid), NULL, 0, virt_to_phys(slot_errbuf), @@ -138,6 +144,7 @@ static int read_slot_reset_state(struct pci_dn *pdn, int rets[]) { int token, outputs; + int config_addr; if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) { token = ibm_read_slot_reset_state2; @@ -148,7 +155,12 @@ outputs = 3; } - return rtas_call(token, 3, outputs, rets, pdn->eeh_config_addr, + /* Use PE configuration address, if present */ + config_addr = pdn->eeh_config_addr; + if (pdn->eeh_pe_config_addr) + config_addr = pdn->eeh_pe_config_addr; + + return rtas_call(token, 3, outputs, rets, config_addr, BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid)); } @@ -284,7 +296,7 @@ return 0; } - if (!pdn->eeh_config_addr) { + if (!pdn->eeh_config_addr && !pdn->eeh_pe_config_addr) { __get_cpu_var(no_cfg_addr)++; return 0; } @@ -613,13 +625,20 @@ void rtas_configure_bridge(struct pci_dn *pdn) { + int config_addr; int token = rtas_token ("ibm,configure-bridge"); int rc; if (token == RTAS_UNKNOWN_SERVICE) return; + + /* Use PE configuration address, if present */ + config_addr = pdn->eeh_config_addr; + if 
(pdn->eeh_pe_config_addr) + config_addr = pdn->eeh_pe_config_addr; + rc = rtas_call(token,3,1, NULL, - pdn->eeh_config_addr, + config_addr, BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid)); if (rc) { From linas at linas.org Fri Nov 4 11:54:47 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:47 -0600 Subject: [PATCH 37/42]: ppc64: set up the RTAS token just like the rest of them. References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005447.GA27138@mail.gnucash.org> 237-eeh-bridge-token.patch Minor: the rtas-bridge token should be set up the same way that all the other RTAS tokens are set up. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:47:07.798456202 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:47:38.997080468 -0600 @@ -84,6 +84,7 @@ static int ibm_read_slot_reset_state2; static int ibm_slot_error_detail; static int ibm_get_config_addr_info; +static int ibm_configure_bridge; static int eeh_subsystem_enabled; @@ -626,18 +627,14 @@ rtas_configure_bridge(struct pci_dn *pdn) { int config_addr; - int token = rtas_token ("ibm,configure-bridge"); int rc; - if (token == RTAS_UNKNOWN_SERVICE) - return; - /* Use PE configuration address, if present */ config_addr = pdn->eeh_config_addr; if (pdn->eeh_pe_config_addr) config_addr = pdn->eeh_pe_config_addr; - rc = rtas_call(token,3,1, NULL, + rc = rtas_call(ibm_configure_bridge,3,1, NULL, config_addr, BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid)); @@ -789,6 +786,7 @@ ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state"); ibm_slot_error_detail = rtas_token("ibm,slot-error-detail"); ibm_get_config_addr_info = rtas_token("ibm,get-config-addr-info"); + ibm_configure_bridge = rtas_token ("ibm,configure-bridge"); if (ibm_set_eeh_option ==
RTAS_UNKNOWN_SERVICE) return; From linas at linas.org Fri Nov 4 11:54:54 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:54:54 -0600 Subject: [PATCH 38/42]: ppc64: Don't continue with PCI Error recovery if slot reset failed. References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005454.GA27146@mail.gnucash.org> 238-eeh-stop-if-reset_failed.patch If the firmware is unable to reset the PCI slot for some reason, then don't attempt any further recovery steps after that point. Instead, mark the device as permanently failed. Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:47:38.997080468 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:48:13.093298267 -0600 @@ -450,11 +450,16 @@ if (rc) return rc; if (rets[1] == 0) return -1; /* EEH is not supported */ - if (rets[0] == 0) return 0; /* Oll Korrect */ + if (rets[0] == 0) return 0; /* Oll Korrect */ if (rets[0] == 5) { if (rets[2] == 0) return -1; /* permanently unavailable */ return rets[2]; /* number of millisecs to wait */ } + if (rets[0] == 1) + return 250; + + printk (KERN_ERR "EEH: Slot unavailable: rc=%d, rets=%d %d %d\n", + rc, rets[0], rets[1], rets[2]); return -1; } @@ -501,9 +506,11 @@ /** rtas_set_slot_reset -- assert the pci #RST line for 1/4 second * dn -- device node to be reset. + * + * Return 0 if success, else a non-zero value. */ -void +int rtas_set_slot_reset(struct pci_dn *pdn) { int i, rc; @@ -533,10 +540,21 @@ * ready to be used; if not, wait for recovery. 
*/ for (i=0; i<10; i++) { rc = eeh_slot_availability (pdn); - if (rc <= 0) break; + if (rc < 0) + printk (KERN_ERR "EEH: failed (%d) to reset slot %s\n", rc, pdn->node->full_name); + if (rc == 0) + return 0; + if (rc < 0) + return -1; msleep (rc+100); } + + rc = eeh_slot_availability (pdn); + if (rc) + printk (KERN_ERR "EEH: timeout resetting slot %s\n", pdn->node->full_name); + + return rc; } /* ------------------------------------------------------- */ Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_driver.c 2005-11-02 14:45:43.638259683 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c 2005-11-02 14:48:13.100297285 -0600 @@ -200,14 +200,18 @@ * bus resets can be performed. */ -static void eeh_reset_device (struct pci_dn *pe_dn, struct pci_bus *bus) +static int eeh_reset_device (struct pci_dn *pe_dn, struct pci_bus *bus) { + int rc; if (bus) pcibios_remove_pci_devices(bus); /* Reset the pci controller. (Asserts RST#; resets config space). - * Reconfigure bridges and devices */ - rtas_set_slot_reset(pe_dn); + * Reconfigure bridges and devices. Don't try to bring the system + * up if the reset failed for some reason. 
*/ + rc = rtas_set_slot_reset(pe_dn); + if (rc) + return rc; /* Walk over all functions on this device */ rtas_configure_bridge(pe_dn); @@ -223,6 +227,8 @@ ssleep (5); pcibios_add_pci_devices(bus); } + + return 0; } /* The longest amount of time to wait for a pci device @@ -235,7 +241,7 @@ struct device_node *frozen_dn; struct pci_dn *frozen_pdn; struct pci_bus *frozen_bus; - int perm_failure = 0; + int rc = 0; frozen_dn = find_device_pe(event->dn); frozen_bus = pcibios_find_pci_bus(frozen_dn); @@ -272,7 +278,7 @@ frozen_pdn->eeh_freeze_count++; if (frozen_pdn->eeh_freeze_count > EEH_MAX_ALLOWED_FREEZES) - perm_failure = 1; + goto hard_fail; /* If the reset state is a '5' and the time to reset is 0 (infinity) * or is more then 15 seconds, then mark this as a permanent failure. @@ -280,34 +286,7 @@ if ((event->state == pci_channel_io_perm_failure) && ((event->time_unavail <= 0) || (event->time_unavail > MAX_WAIT_FOR_RECOVERY*1000))) - { - perm_failure = 1; - } - - /* Log the error with the rtas logger. */ - if (perm_failure) { - /* - * About 90% of all real-life EEH failures in the field - * are due to poorly seated PCI cards. Only 10% or so are - * due to actual, failed cards. - */ - printk(KERN_ERR - "EEH: PCI device %s - %s has failed %d times \n" - "and has been permanently disabled. Please try reseating\n" - "this device or replacing it.\n", - pci_name (frozen_pdn->pcidev), - pcid_name(frozen_pdn->pcidev), - frozen_pdn->eeh_freeze_count); - - eeh_slot_error_detail(frozen_pdn, 2 /* Permanent Error */); - - /* Notify all devices that they're about to go down. */ - pci_walk_bus(frozen_bus, eeh_report_failure, 0); - - /* Shut down the device drivers for good. */ - pcibios_remove_pci_devices(frozen_bus); - return; - } + goto hard_fail; eeh_slot_error_detail(frozen_pdn, 1 /* Temporary Error */); printk(KERN_WARNING @@ -330,24 +309,54 @@ * go down willingly, without panicing the system. 
*/ if (result == PCIERR_RESULT_NONE) { - eeh_reset_device(frozen_pdn, frozen_bus); + rc = eeh_reset_device(frozen_pdn, frozen_bus); + if (rc) + goto hard_fail; } /* If any device called out for a reset, then reset the slot */ if (result == PCIERR_RESULT_NEED_RESET) { - eeh_reset_device(frozen_pdn, NULL); + rc = eeh_reset_device(frozen_pdn, NULL); + if (rc) + goto hard_fail; pci_walk_bus(frozen_bus, eeh_report_reset, 0); } /* If all devices reported they can proceed, the re-enable PIO */ if (result == PCIERR_RESULT_CAN_RECOVER) { /* XXX Not supported; we brute-force reset the device */ - eeh_reset_device(frozen_pdn, NULL); + rc = eeh_reset_device(frozen_pdn, NULL); + if (rc) + goto hard_fail; pci_walk_bus(frozen_bus, eeh_report_reset, 0); } /* Tell all device drivers that they can resume operations */ pci_walk_bus(frozen_bus, eeh_report_resume, 0); + + return; + +hard_fail: + /* + * About 90% of all real-life EEH failures in the field + * are due to poorly seated PCI cards. Only 10% or so are + * due to actual, failed cards. + */ + printk(KERN_ERR + "EEH: PCI device %s - %s has failed %d times \n" + "and has been permanently disabled. Please try reseating\n" + "this device or replacing it.\n", + pci_name (frozen_pdn->pcidev), + pcid_name(frozen_pdn->pcidev), + frozen_pdn->eeh_freeze_count); + + eeh_slot_error_detail(frozen_pdn, 2 /* Permanent Error */); + + /* Notify all devices that they're about to go down. */ + pci_walk_bus(frozen_bus, eeh_report_failure, 0); + + /* Shut down the device drivers for good. 
*/ + pcibios_remove_pci_devices(frozen_bus); } /* ---------- end of file ---------- */ Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-11-02 14:45:43.656257159 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-11-02 14:48:13.104296724 -0600 @@ -77,8 +77,10 @@ * does this by asserting the PCI #RST line for 1/8th of * a second; this routine will sleep while the adapter is * being reset. + * + * Returns a non-zero value if the reset failed. */ -void rtas_set_slot_reset (struct pci_dn *); +int rtas_set_slot_reset (struct pci_dn *); /** * eeh_restore_bars - Restore device configuration info. From linas at linas.org Fri Nov 4 11:55:01 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:55:01 -0600 Subject: [PATCH 39/42]: ppc64: handle multifunction PCI devices properly References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005501.GA27154@mail.gnucash.org> 239-eeh-multifunction-consolidate.patch New-style firmware will often place multiple different functions under a non-EEH-aware parent. However, these devices might share a common PE "partition endpoint" and config address, and thus any EEH events will affect all of the devices in common. This patch makes the effort to find all of these common devices and handle them together.
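The idea of handling every device that shares a PE configuration address can be sketched as a walk over a sibling list, as below. This is a user-space illustration under stated assumptions: `struct sketch_fn` and `sketch_restore_shared_pe()` are hypothetical, the `restored` flag stands in for the bridge reconfiguration and BAR restore done per device, and the real code walks `struct device_node` siblings and reads the address via `PCI_DN()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one function under a common parent. */
struct sketch_fn {
	int pe_config_addr;        /* PE configuration address, 0 if none */
	int restored;              /* set when this function was handled */
	struct sketch_fn *sibling; /* next function under the same parent */
};

/*
 * Visit every sibling starting at first_child and handle those whose
 * PE configuration address matches pe_addr. Returns how many matched.
 */
static int sketch_restore_shared_pe(struct sketch_fn *first_child, int pe_addr)
{
	int count = 0;
	struct sketch_fn *fn;

	assert(pe_addr != 0);  /* the shared walk only applies to new-style addrs */
	for (fn = first_child; fn != NULL; fn = fn->sibling) {
		if (fn->pe_config_addr == pe_addr) {
			fn->restored = 1;
			count++;
		}
	}
	return count;
}
```

Functions with a different PE address are left untouched, which is what keeps a reset of one partitionable endpoint from disturbing its neighbours.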
Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:48:13.093298267 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 14:48:44.941831253 -0600 @@ -223,6 +223,11 @@ void eeh_mark_slot (struct device_node *dn, int mode_flag) { dn = find_device_pe (dn); + + /* Back up one, since config addrs might be shared */ + if (PCI_DN(dn) && PCI_DN(dn)->eeh_pe_config_addr) + dn = dn->parent; + PCI_DN(dn)->eeh_mode |= mode_flag; __eeh_mark_slot (dn->child, mode_flag); } @@ -244,7 +249,13 @@ { unsigned long flags; spin_lock_irqsave(&confirm_error_lock, flags); + dn = find_device_pe (dn); + + /* Back up one, since config addrs might be shared */ + if (PCI_DN(dn) && PCI_DN(dn)->eeh_pe_config_addr) + dn = dn->parent; + PCI_DN(dn)->eeh_mode &= ~mode_flag; PCI_DN(dn)->eeh_check_count = 0; __eeh_clear_slot (dn->child, mode_flag); @@ -609,7 +620,7 @@ if (!pdn) return; - if (! 
pdn->eeh_is_bridge) + if ((pdn->eeh_mode & EEH_MODE_SUPPORTED) && (!pdn->eeh_is_bridge)) __restore_bars (pdn); dn = pdn->node->child; Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_driver.c 2005-11-02 14:48:13.100297285 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c 2005-11-02 14:48:44.950829991 -0600 @@ -213,9 +213,23 @@ if (rc) return rc; - /* Walk over all functions on this device */ - rtas_configure_bridge(pe_dn); - eeh_restore_bars(pe_dn); + /* New-style config addrs might be shared across multiple devices, + * Walk over all functions on this device */ + if (pe_dn->eeh_pe_config_addr) { + struct device_node *pe = pe_dn->node; + pe = pe->parent->child; + while (pe) { + struct pci_dn *ppe = PCI_DN(pe); + if (pe_dn->eeh_pe_config_addr == ppe->eeh_pe_config_addr) { + rtas_configure_bridge(ppe); + eeh_restore_bars(ppe); + } + pe = pe->sibling; + } + } else { + rtas_configure_bridge(pe_dn); + eeh_restore_bars(pe_dn); + } /* Give the system 5 seconds to finish running the user-space * hotplug shutdown scripts, e.g. ifdown for ethernet. Yes, From linas at linas.org Fri Nov 4 11:55:14 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:55:14 -0600 Subject: [PATCH 40/42]: ppc64: IOMMU: don't ioremap null pointers References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005514.GA27179@mail.gnucash.org> 240-ioremap-null-ptr-test.patch Under highly unusual circumstances, a buggy driver will ask for a null pointer to be ioremapped, an operation that currently succeeds but leads to later trouble. Instead, refuse to remap the null pointer.
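The check being added can be sketched outside the kernel as follows. This is an illustration only: the 4 KiB page size and the `SKETCH_*` macros are assumptions made for the example (the kernel uses its own `PAGE_MASK` and `PAGE_ALIGN`), and `sketch_remap_range_ok()` is a hypothetical helper that models just the range computation and the rejection test, not the remapping itself.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 4 KiB pages for this sketch only. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_PAGE_ALIGN(x) (((x) + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK)

/*
 * Compute the page-aligned range for a remap request.
 * Returns 1 and fills the outputs if the request is acceptable,
 * or 0 if it must be refused (empty range, or a base in page zero).
 */
static int sketch_remap_range_ok(unsigned long addr, unsigned long size,
				 unsigned long *pa_out, unsigned long *size_out)
{
	unsigned long pa = addr & SKETCH_PAGE_MASK;
	unsigned long aligned = SKETCH_PAGE_ALIGN(addr + size) - pa;

	assert(pa_out != NULL && size_out != NULL);
	if (aligned == 0 || pa == 0)
		return 0;  /* refuse null (page-zero) or empty requests */
	*pa_out = pa;
	*size_out = aligned;
	return 1;
}
```

Note that because the test is made on the page-aligned base, any address inside the first page is refused, not only an exact zero.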
Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git3/arch/powerpc/mm/pgtable_64.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/mm/pgtable_64.c 2005-11-02 14:59:56.507624778 -0600 +++ linux-2.6.14-git3/arch/powerpc/mm/pgtable_64.c 2005-11-02 15:01:04.284115774 -0600 @@ -185,7 +185,7 @@ pa = addr & PAGE_MASK; size = PAGE_ALIGN(addr + size) - pa; - if (size == 0) + if ((size == 0) || (pa == 0)) return NULL; if (mem_init_done) { From linas at linas.org Fri Nov 4 11:55:19 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:55:19 -0600 Subject: [PATCH 41/42]: ppc64: Save device BARS much earlier in the boot sequence References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005519.GA27189@mail.gnucash.org> 241-eeh-save-bars-earlier.patch Save the PCI device bars *before* any PCI probing is done. Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git3/arch/ppc64/kernel/rtas_pci.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/rtas_pci.c 2005-10-31 12:01:21.000000000 -0600 +++ linux-2.6.14-git3/arch/ppc64/kernel/rtas_pci.c 2005-11-02 16:52:48.556202006 -0600 @@ -72,7 +72,7 @@ return 0; } -static int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val) +int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val) { int returnval = -1; unsigned long buid, addr; Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h 2005-11-02 16:53:29.000000000 -0600 +++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h 2005-11-02 17:28:14.843073955 -0600 @@ -59,8 +59,6 @@ void pci_addr_cache_build(void); struct pci_dev *pci_get_device_by_addr(unsigned long addr); -void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn); - /** * eeh_slot_error_detail -- record and EEH error 
condition to the log * @severity: 1 if temporary, 2 if permanent failure. @@ -104,6 +102,7 @@ void rtas_configure_bridge(struct pci_dn *); int rtas_write_config(struct pci_dn *, int where, int size, u32 val); +int rtas_read_config(struct pci_dn *, int where, int size, u32 *val); /** * mark and clear slots: find "partition endpoint" PE and set or Index: linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h =================================================================== --- linux-2.6.14-git3.orig/include/asm-ppc64/pci-bridge.h 2005-11-02 14:43:49.000000000 -0600 +++ linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h 2005-11-02 17:13:07.358586231 -0600 @@ -58,15 +58,15 @@ struct iommu_table; struct pci_dn { - int busno; /* for pci devices */ - int bussubno; /* for pci devices */ - int devfn; /* for pci devices */ + int busno; /* pci bus number */ + int bussubno; /* pci subordinate bus number */ + int devfn; /* pci device and function number */ + int class_code; /* pci device class */ int eeh_mode; /* See eeh.h for possible EEH_MODEs */ int eeh_config_addr; int eeh_pe_config_addr; /* new-style partition endpoint address */ int eeh_check_count; /* # times driver ignored error */ int eeh_freeze_count; /* # times this device froze up. 
*/ - int eeh_is_bridge; /* device is pci-to-pci bridge */ int pci_ext_config_space; /* for pci devices */ struct pci_controller *phb; /* for pci devices */ Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 16:45:55.000000000 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 18:42:28.243139205 -0600 @@ -106,6 +106,8 @@ static DEFINE_PER_CPU(unsigned long, ignored_failures); static DEFINE_PER_CPU(unsigned long, slot_resets); +#define IS_BRIDGE(class_code) (((class_code)<<16) == PCI_BASE_CLASS_BRIDGE) + /* --------------------------------------------------------------- */ /* Below lies the EEH event infrastructure */ @@ -620,7 +622,7 @@ if (!pdn) return; - if ((pdn->eeh_mode & EEH_MODE_SUPPORTED) && (!pdn->eeh_is_bridge)) + if ((pdn->eeh_mode & EEH_MODE_SUPPORTED) && !IS_BRIDGE(pdn->class_code)) __restore_bars (pdn); dn = pdn->node->child; @@ -638,18 +640,15 @@ * PCI devices are added individuallly; but, for the restore, * an entire slot is reset at a time. 
*/ -void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn) +static void eeh_save_bars(struct pci_dn *pdn) { int i; - if (!pdev || !pdn ) + if (!pdn ) return; for (i = 0; i < 16; i++) - pci_read_config_dword(pdev, i * 4, &pdn->config_space[i]); - - if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) - pdn->eeh_is_bridge = 1; + rtas_read_config(pdn, i * 4, 4, &pdn->config_space[i]); } void @@ -699,6 +698,7 @@ int enable; struct pci_dn *pdn = PCI_DN(dn); + pdn->class_code = *class_code; pdn->eeh_mode = 0; pdn->eeh_check_count = 0; pdn->eeh_freeze_count = 0; @@ -781,6 +781,7 @@ dn->full_name); } + eeh_save_bars(pdn); return NULL; } @@ -915,7 +916,6 @@ pdn->pcidev = dev; pci_addr_cache_insert_device (dev); - eeh_save_bars(dev, pdn); } EXPORT_SYMBOL_GPL(eeh_add_device_late); Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_cache.c 2005-11-02 16:45:55.000000000 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c 2005-11-02 18:40:54.893242771 -0600 @@ -304,10 +304,7 @@ pci_addr_cache_insert_device(dev); - /* Save the BAR's; firmware doesn't restore these after EEH reset */ dn = pci_device_to_OF_node(dev); - eeh_save_bars(dev, PCI_DN(dn)); - pci_dev_get (dev); /* matching put is in eeh_remove_device() */ PCI_DN(dn)->pcidev = dev; } From linas at linas.org Fri Nov 4 11:55:25 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:55:25 -0600 Subject: [PATCH 42/42]: ppc64: get rid of per_cpu counters References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005525.GA27197@mail.gnucash.org> 242-eeh-no-percpu-counters.patch Remove per-cpu counters from the EEH code. These statistics counters are incremented at a very low frequency, and the performance gains of per-cpu variables are negligible.
By contrast, the counters weren't safe against cpu gard operations, and it's not worth the effort to make them so (other than to turn them into plain globals). Signed-off-by: Linas Vepstas -- Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 18:42:28.243139205 -0600 +++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c 2005-11-02 18:49:24.196716323 -0600 @@ -97,14 +97,14 @@ static int eeh_error_buf_size; /* System monitoring statistics */ -static DEFINE_PER_CPU(unsigned long, no_device); -static DEFINE_PER_CPU(unsigned long, no_dn); -static DEFINE_PER_CPU(unsigned long, no_cfg_addr); -static DEFINE_PER_CPU(unsigned long, ignored_check); -static DEFINE_PER_CPU(unsigned long, total_mmio_ffs); -static DEFINE_PER_CPU(unsigned long, false_positives); -static DEFINE_PER_CPU(unsigned long, ignored_failures); -static DEFINE_PER_CPU(unsigned long, slot_resets); +static unsigned long no_device; +static unsigned long no_dn; +static unsigned long no_cfg_addr; +static unsigned long ignored_check; +static unsigned long total_mmio_ffs; +static unsigned long false_positives; +static unsigned long ignored_failures; +static unsigned long slot_resets; #define IS_BRIDGE(class_code) (((class_code)<<16) == PCI_BASE_CLASS_BRIDGE) @@ -288,13 +288,13 @@ enum pci_channel_state state; int rc = 0; - __get_cpu_var(total_mmio_ffs)++; + total_mmio_ffs++; if (!eeh_subsystem_enabled) return 0; if (!dn) { - __get_cpu_var(no_dn)++; + no_dn++; return 0; } pdn = PCI_DN(dn); @@ -302,7 +302,7 @@ /* Access to IO BARs might get this far and still not want checking.
*/ if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || pdn->eeh_mode & EEH_MODE_NOCHECK) { - __get_cpu_var(ignored_check)++; + ignored_check++; #ifdef DEBUG printk ("EEH:ignored check (%x) for %s %s\n", pdn->eeh_mode, pci_name (dev), dn->full_name); @@ -311,7 +311,7 @@ } if (!pdn->eeh_config_addr && !pdn->eeh_pe_config_addr) { - __get_cpu_var(no_cfg_addr)++; + no_cfg_addr++; return 0; } @@ -353,7 +353,7 @@ if (ret != 0) { printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n", ret, dn->full_name); - __get_cpu_var(false_positives)++; + false_positives++; rc = 0; goto dn_unlock; } @@ -362,14 +362,14 @@ if (rets[1] != 1) { printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n", ret, dn->full_name); - __get_cpu_var(false_positives)++; + false_positives++; rc = 0; goto dn_unlock; } /* If not the kind of error we know about, punt. */ if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) { - __get_cpu_var(false_positives)++; + false_positives++; rc = 0; goto dn_unlock; } @@ -377,12 +377,12 @@ /* Note that config-io to empty slots may fail; * we recognize empty because they don't have children. 
*/ if ((rets[0] == 5) && (dn->child == NULL)) { - __get_cpu_var(false_positives)++; + false_positives++; rc = 0; goto dn_unlock; } - __get_cpu_var(slot_resets)++; + slot_resets++; /* Avoid repeated reports of this failure, including problems * with other functions on this device, and functions under @@ -432,7 +432,7 @@ addr = eeh_token_to_phys((unsigned long __force) token); dev = pci_get_device_by_addr(addr); if (!dev) { - __get_cpu_var(no_device)++; + no_device++; return val; } @@ -963,25 +963,9 @@ static int proc_eeh_show(struct seq_file *m, void *v) { - unsigned int cpu; - unsigned long ffs = 0, positives = 0, failures = 0; - unsigned long resets = 0; - unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0; - - for_each_cpu(cpu) { - ffs += per_cpu(total_mmio_ffs, cpu); - positives += per_cpu(false_positives, cpu); - failures += per_cpu(ignored_failures, cpu); - resets += per_cpu(slot_resets, cpu); - no_dev += per_cpu(no_device, cpu); - no_dn += per_cpu(no_dn, cpu); - no_cfg += per_cpu(no_cfg_addr, cpu); - no_check += per_cpu(ignored_check, cpu); - } - if (0 == eeh_subsystem_enabled) { seq_printf(m, "EEH Subsystem is globally disabled\n"); - seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs); + seq_printf(m, "eeh_total_mmio_ffs=%ld\n", total_mmio_ffs); } else { seq_printf(m, "EEH Subsystem is enabled\n"); seq_printf(m, @@ -993,8 +977,10 @@ "eeh_false_positives=%ld\n" "eeh_ignored_failures=%ld\n" "eeh_slot_resets=%ld\n", - no_dev, no_dn, no_cfg, no_check, - ffs, positives, failures, resets); + no_device, no_dn, no_cfg_addr, + ignored_check, total_mmio_ffs, + false_positives, ignored_failures, + slot_resets); } return 0; From linas at linas.org Fri Nov 4 11:57:35 2005 From: linas at linas.org (Linas Vepstas) Date: Thu, 3 Nov 2005 18:57:35 -0600 Subject: [PATCH 11/42]: ppc64: move code to powerpc directory from ppc64 References: <20051103235918.GA25616@mail.gnucash.org> Message-ID: <20051104005735.GA27243@mail.gnucash.org> 11-eeh-move-to-powerpc.patch Move 
arch/ppc64/kernel/eeh.c to arch/powerpc/platforms/pseries/eeh.c. No other changes (except for the Makefile to build it). Signed-off-by: Linas Vepstas Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c 2005-11-02 14:29:22.485829789 -0600 +++ /dev/null 1970-01-01 00:00:00.000000000 +0000 @@ -1,1093 +0,0 @@ -/* - * eeh.c - * Copyright (C) 2001 Dave Engebretsen & Todd Inglett IBM Corporation - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#undef DEBUG - -/** Overview: - * EEH, or "Extended Error Handling" is a PCI bridge technology for - * dealing with PCI bus errors that can't be dealt with within the - * usual PCI framework, except by check-stopping the CPU. Systems - * that are designed for high-availability/reliability cannot afford - * to crash due to a "mere" PCI error, thus the need for EEH.
- * An EEH-capable bridge operates by converting a detected error - * into a "slot freeze", taking the PCI adapter off-line, making - * the slot behave, from the OS'es point of view, as if the slot - * were "empty": all reads return 0xff's and all writes are silently - * ignored. EEH slot isolation events can be triggered by parity - * errors on the address or data busses (e.g. during posted writes), - * which in turn might be caused by low voltage on the bus, dust, - * vibration, humidity, radioactivity or plain-old failed hardware. - * - * Note, however, that one of the leading causes of EEH slot - * freeze events are buggy device drivers, buggy device microcode, - * or buggy device hardware. This is because any attempt by the - * device to bus-master data to a memory address that is not - * assigned to the device will trigger a slot freeze. (The idea - * is to prevent devices-gone-wild from corrupting system memory). - * Buggy hardware/drivers will have a miserable time co-existing - * with EEH. - * - * Ideally, a PCI device driver, when suspecting that an isolation - * event has occured (e.g. by reading 0xff's), will then ask EEH - * whether this is the case, and then take appropriate steps to - * reset the PCI slot, the PCI device, and then resume operations. - * However, until that day, the checking is done here, with the - * eeh_check_failure() routine embedded in the MMIO macros. If - * the slot is found to be isolated, an "EEH Event" is synthesized - * and sent out for processing. - */ - -/* EEH event workqueue setup. */ -static DEFINE_SPINLOCK(eeh_eventlist_lock); -LIST_HEAD(eeh_eventlist); -static void eeh_event_handler(void *); -DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL); - -static struct notifier_block *eeh_notifier_chain; - -/* If a device driver keeps reading an MMIO register in an interrupt - * handler after a slot isolation event has occurred, we assume it - * is broken and panic. 
This sets the threshold for how many read - * attempts we allow before panicking. - */ -#define EEH_MAX_FAILS 100000 - -/* RTAS tokens */ -static int ibm_set_eeh_option; -static int ibm_set_slot_reset; -static int ibm_read_slot_reset_state; -static int ibm_read_slot_reset_state2; -static int ibm_slot_error_detail; - -static int eeh_subsystem_enabled; - -/* Lock to avoid races due to multiple reports of an error */ -static DEFINE_SPINLOCK(confirm_error_lock); - -/* Buffer for reporting slot-error-detail rtas calls */ -static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX]; -static DEFINE_SPINLOCK(slot_errbuf_lock); -static int eeh_error_buf_size; - -/* System monitoring statistics */ -static DEFINE_PER_CPU(unsigned long, no_device); -static DEFINE_PER_CPU(unsigned long, no_dn); -static DEFINE_PER_CPU(unsigned long, no_cfg_addr); -static DEFINE_PER_CPU(unsigned long, ignored_check); -static DEFINE_PER_CPU(unsigned long, total_mmio_ffs); -static DEFINE_PER_CPU(unsigned long, false_positives); -static DEFINE_PER_CPU(unsigned long, ignored_failures); -static DEFINE_PER_CPU(unsigned long, slot_resets); - -/** - * The pci address cache subsystem. This subsystem places - * PCI device address resources into a red-black tree, sorted - * according to the address range, so that given only an i/o - * address, the corresponding PCI device can be **quickly** - * found. It is safe to perform an address lookup in an interrupt - * context; this ability is an important feature. - * - * Currently, the only customer of this code is the EEH subsystem; - * thus, this code has been somewhat tailored to suit EEH better. - * In particular, the cache does *not* hold the addresses of devices - * for which EEH is not enabled. - * - * (Implementation Note: The RB tree seems to be better/faster - * than any hash algo I could think of for this problem, even - * with the penalty of slow pointer chases for d-cache misses). 
- */ -struct pci_io_addr_range -{ - struct rb_node rb_node; - unsigned long addr_lo; - unsigned long addr_hi; - struct pci_dev *pcidev; - unsigned int flags; -}; - -static struct pci_io_addr_cache -{ - struct rb_root rb_root; - spinlock_t piar_lock; -} pci_io_addr_cache_root; - -static inline struct pci_dev *__pci_get_device_by_addr(unsigned long addr) -{ - struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node; - - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - - if (addr < piar->addr_lo) { - n = n->rb_left; - } else { - if (addr > piar->addr_hi) { - n = n->rb_right; - } else { - pci_dev_get(piar->pcidev); - return piar->pcidev; - } - } - } - - return NULL; -} - -/** - * pci_get_device_by_addr - Get device, given only address - * @addr: mmio (PIO) phys address or i/o port number - * - * Given an mmio phys address, or a port number, find a pci device - * that implements this address. Be sure to pci_dev_put the device - * when finished. I/O port numbers are assumed to be offset - * from zero (that is, they do *not* have pci_io_addr added in). - * It is safe to call this function within an interrupt. - */ -static struct pci_dev *pci_get_device_by_addr(unsigned long addr) -{ - struct pci_dev *dev; - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - dev = __pci_get_device_by_addr(addr); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); - return dev; -} - -#ifdef DEBUG -/* - * Handy-dandy debug print routine, does nothing more - * than print out the contents of our addr cache. - */ -static void pci_addr_cache_print(struct pci_io_addr_cache *cache) -{ - struct rb_node *n; - int cnt = 0; - - n = rb_first(&cache->rb_root); - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - printk(KERN_DEBUG "PCI: %s addr range %d [%lx-%lx]: %s\n", - (piar->flags & IORESOURCE_IO) ? 
"i/o" : "mem", cnt, - piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev)); - cnt++; - n = rb_next(n); - } -} -#endif - -/* Insert address range into the rb tree. */ -static struct pci_io_addr_range * -pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo, - unsigned long ahi, unsigned int flags) -{ - struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node; - struct rb_node *parent = NULL; - struct pci_io_addr_range *piar; - - /* Walk tree, find a place to insert into tree */ - while (*p) { - parent = *p; - piar = rb_entry(parent, struct pci_io_addr_range, rb_node); - if (ahi < piar->addr_lo) { - p = &parent->rb_left; - } else if (alo > piar->addr_hi) { - p = &parent->rb_right; - } else { - if (dev != piar->pcidev || - alo != piar->addr_lo || ahi != piar->addr_hi) { - printk(KERN_WARNING "PIAR: overlapping address range\n"); - } - return piar; - } - } - piar = (struct pci_io_addr_range *)kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC); - if (!piar) - return NULL; - - piar->addr_lo = alo; - piar->addr_hi = ahi; - piar->pcidev = dev; - piar->flags = flags; - -#ifdef DEBUG - printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", - alo, ahi, pci_name (dev)); -#endif - - rb_link_node(&piar->rb_node, parent, p); - rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); - - return piar; -} - -static void __pci_addr_cache_insert_device(struct pci_dev *dev) -{ - struct device_node *dn; - struct pci_dn *pdn; - int i; - int inserted = 0; - - dn = pci_device_to_OF_node(dev); - if (!dn) { - printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev)); - return; - } - - /* Skip any devices for which EEH is not enabled. */ - pdn = PCI_DN(dn); - if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || - pdn->eeh_mode & EEH_MODE_NOCHECK) { -#ifdef DEBUG - printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n", - pci_name(dev), pdn->node->full_name); -#endif - return; - } - - /* The cache holds a reference to the device... 
*/ - pci_dev_get(dev); - - /* Walk resources on this device, poke them into the tree */ - for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { - unsigned long start = pci_resource_start(dev,i); - unsigned long end = pci_resource_end(dev,i); - unsigned int flags = pci_resource_flags(dev,i); - - /* We are interested only bus addresses, not dma or other stuff */ - if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM))) - continue; - if (start == 0 || ~start == 0 || end == 0 || ~end == 0) - continue; - pci_addr_cache_insert(dev, start, end, flags); - inserted = 1; - } - - /* If there was nothing to add, the cache has no reference... */ - if (!inserted) - pci_dev_put(dev); -} - -/** - * pci_addr_cache_insert_device - Add a device to the address cache - * @dev: PCI device whose I/O addresses we are interested in. - * - * In order to support the fast lookup of devices based on addresses, - * we maintain a cache of devices that can be quickly searched. - * This routine adds a device to that cache. - */ -static void pci_addr_cache_insert_device(struct pci_dev *dev) -{ - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - __pci_addr_cache_insert_device(dev); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); -} - -static inline void __pci_addr_cache_remove_device(struct pci_dev *dev) -{ - struct rb_node *n; - int removed = 0; - -restart: - n = rb_first(&pci_io_addr_cache_root.rb_root); - while (n) { - struct pci_io_addr_range *piar; - piar = rb_entry(n, struct pci_io_addr_range, rb_node); - - if (piar->pcidev == dev) { - rb_erase(n, &pci_io_addr_cache_root.rb_root); - removed = 1; - kfree(piar); - goto restart; - } - n = rb_next(n); - } - - /* The cache no longer holds its reference to this device... */ - if (removed) - pci_dev_put(dev); -} - -/** - * pci_addr_cache_remove_device - remove pci device from addr cache - * @dev: device to remove - * - * Remove a device from the addr-cache tree. 
- * This is potentially expensive, since it will walk - * the tree multiple times (once per resource). - * But so what; device removal doesn't need to be that fast. - */ -static void pci_addr_cache_remove_device(struct pci_dev *dev) -{ - unsigned long flags; - - spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags); - __pci_addr_cache_remove_device(dev); - spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags); -} - -/** - * pci_addr_cache_build - Build a cache of I/O addresses - * - * Build a cache of pci i/o addresses. This cache will be used to - * find the pci device that corresponds to a given address. - * This routine scans all pci busses to build the cache. - * Must be run late in boot process, after the pci controllers - * have been scaned for devices (after all device resources are known). - */ -void __init pci_addr_cache_build(void) -{ - struct pci_dev *dev = NULL; - - if (!eeh_subsystem_enabled) - return; - - spin_lock_init(&pci_io_addr_cache_root.piar_lock); - - while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) { - /* Ignore PCI bridges ( XXX why ??) */ - if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE) { - continue; - } - pci_addr_cache_insert_device(dev); - } - -#ifdef DEBUG - /* Verify tree built up above, echo back the list of addrs. */ - pci_addr_cache_print(&pci_io_addr_cache_root); -#endif -} - -/* --------------------------------------------------------------- */ -/* Above lies the PCI Address Cache. 
Below lies the EEH event infrastructure */ - -void eeh_slot_error_detail (struct pci_dn *pdn, int severity) -{ - unsigned long flags; - int rc; - - /* Log the error with the rtas logger */ - spin_lock_irqsave(&slot_errbuf_lock, flags); - memset(slot_errbuf, 0, eeh_error_buf_size); - - rc = rtas_call(ibm_slot_error_detail, - 8, 1, NULL, pdn->eeh_config_addr, - BUID_HI(pdn->phb->buid), - BUID_LO(pdn->phb->buid), NULL, 0, - virt_to_phys(slot_errbuf), - eeh_error_buf_size, - severity); - - if (rc == 0) - log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); - spin_unlock_irqrestore(&slot_errbuf_lock, flags); -} - -/** - * eeh_register_notifier - Register to find out about EEH events. - * @nb: notifier block to callback on events - */ -int eeh_register_notifier(struct notifier_block *nb) -{ - return notifier_chain_register(&eeh_notifier_chain, nb); -} - -/** - * eeh_unregister_notifier - Unregister to an EEH event notifier. - * @nb: notifier block to callback on events - */ -int eeh_unregister_notifier(struct notifier_block *nb) -{ - return notifier_chain_unregister(&eeh_notifier_chain, nb); -} - -/** - * read_slot_reset_state - Read the reset state of a device node's slot - * @dn: device node to read - * @rets: array to return results in - */ -static int read_slot_reset_state(struct pci_dn *pdn, int rets[]) -{ - int token, outputs; - - if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) { - token = ibm_read_slot_reset_state2; - outputs = 4; - } else { - token = ibm_read_slot_reset_state; - rets[2] = 0; /* fake PE Unavailable info */ - outputs = 3; - } - - return rtas_call(token, 3, outputs, rets, pdn->eeh_config_addr, - BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid)); -} - -/** - * eeh_panic - call panic() for an eeh event that cannot be handled. - * The philosophy of this routine is that it is better to panic and - * halt the OS than it is to risk possible data corruption by - * oblivious device drivers that don't know better. 
- * - * @dev pci device that had an eeh event - * @reset_state current reset state of the device slot - */ -static void eeh_panic(struct pci_dev *dev, int reset_state) -{ - /* - * XXX We should create a separate sysctl for this. - * - * Since the panic_on_oops sysctl is used to halt the system - * in light of potential corruption, we can use it here. - */ - if (panic_on_oops) { - struct device_node *dn = pci_device_to_OF_node(dev); - eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */); - panic("EEH: MMIO failure (%d) on device:%s\n", reset_state, - pci_name(dev)); - } - else { - __get_cpu_var(ignored_failures)++; - printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n", - reset_state, pci_name(dev)); - } -} - -/** - * eeh_event_handler - dispatch EEH events. The detection of a frozen - * slot can occur inside an interrupt, where it can be hard to do - * anything about it. The goal of this routine is to pull these - * detection events out of the context of the interrupt handler, and - * re-dispatch them for processing at a later time in a normal context. - * - * @dummy - unused - */ -static void eeh_event_handler(void *dummy) -{ - unsigned long flags; - struct eeh_event *event; - - while (1) { - spin_lock_irqsave(&eeh_eventlist_lock, flags); - event = NULL; - if (!list_empty(&eeh_eventlist)) { - event = list_entry(eeh_eventlist.next, struct eeh_event, list); - list_del(&event->list); - } - spin_unlock_irqrestore(&eeh_eventlist_lock, flags); - if (event == NULL) - break; - - printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device " - "%s\n", event->reset_state, - pci_name(event->dev)); - - notifier_call_chain (&eeh_notifier_chain, - EEH_NOTIFY_FREEZE, event); - - pci_dev_put(event->dev); - kfree(event); - } -} - -/** - * eeh_token_to_phys - convert EEH address token to phys address - * @token i/o token, should be address in the form 0xA.... 
- */ -static inline unsigned long eeh_token_to_phys(unsigned long token) -{ - pte_t *ptep; - unsigned long pa; - - ptep = find_linux_pte(init_mm.pgd, token); - if (!ptep) - return token; - pa = pte_pfn(*ptep) << PAGE_SHIFT; - - return pa | (token & (PAGE_SIZE-1)); -} - -/** - * Return the "partitionable endpoint" (pe) under which this device lies - */ -static struct device_node * find_device_pe(struct device_node *dn) -{ - while ((dn->parent) && PCI_DN(dn->parent) && - (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) { - dn = dn->parent; - } - return dn; -} - -/** Mark all devices that are peers of this device as failed. - * Mark the device driver too, so that it can see the failure - * immediately; this is critical, since some drivers poll - * status registers in interrupts ... If a driver is polling, - * and the slot is frozen, then the driver can deadlock in - * an interrupt context, which is bad. - */ - -static inline void __eeh_mark_slot (struct device_node *dn) -{ - while (dn) { - PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED; - - if (dn->child) - __eeh_mark_slot (dn->child); - dn = dn->sibling; - } -} - -static inline void __eeh_clear_slot (struct device_node *dn) -{ - while (dn) { - PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED; - if (dn->child) - __eeh_clear_slot (dn->child); - dn = dn->sibling; - } -} - -static inline void eeh_clear_slot (struct device_node *dn) -{ - unsigned long flags; - spin_lock_irqsave(&confirm_error_lock, flags); - __eeh_clear_slot (dn); - spin_unlock_irqrestore(&confirm_error_lock, flags); -} - -/** - * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze - * @dn device node - * @dev pci device, if known - * - * Check for an EEH failure for the given device node. Call this - * routine if the result of a read was all 0xff's and you want to - * find out if this is due to an EEH slot freeze. This routine - * will query firmware for the EEH status. 
- * - * Returns 0 if there has not been an EEH error; otherwise returns - * a non-zero value and queues up a slot isolation event notification. - * - * It is safe to call this routine in an interrupt context. - */ -int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev) -{ - int ret; - int rets[3]; - unsigned long flags; - int reset_state; - struct eeh_event *event; - struct pci_dn *pdn; - struct device_node *pe_dn; - int rc = 0; - - __get_cpu_var(total_mmio_ffs)++; - - if (!eeh_subsystem_enabled) - return 0; - - if (!dn) { - __get_cpu_var(no_dn)++; - return 0; - } - pdn = PCI_DN(dn); - - /* Access to IO BARs might get this far and still not want checking. */ - if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) || - pdn->eeh_mode & EEH_MODE_NOCHECK) { - __get_cpu_var(ignored_check)++; -#ifdef DEBUG - printk ("EEH:ignored check (%x) for %s %s\n", - pdn->eeh_mode, pci_name (dev), dn->full_name); -#endif - return 0; - } - - if (!pdn->eeh_config_addr) { - __get_cpu_var(no_cfg_addr)++; - return 0; - } - - /* If we already have a pending isolation event for this - * slot, we know it's bad already, we don't need to check. - * Do this checking under a lock; as multiple PCI devices - * in one slot might report errors simultaneously, and we - * only want one error recovery routine running. - */ - spin_lock_irqsave(&confirm_error_lock, flags); - rc = 1; - if (pdn->eeh_mode & EEH_MODE_ISOLATED) { - pdn->eeh_check_count ++; - if (pdn->eeh_check_count >= EEH_MAX_FAILS) { - printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicing\n", - pdn->eeh_check_count); - dump_stack(); - - /* re-read the slot reset state */ - if (read_slot_reset_state(pdn, rets) != 0) - rets[0] = -1; /* reset state unknown */ - - /* If we are here, then we hit an infinite loop. Stop. */ - panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], pci_name(dev)); - } - goto dn_unlock; - } - - /* - * Now test for an EEH failure. This is VERY expensive. 
-	 * Note that the eeh_config_addr may be a parent device
-	 * in the case of a device behind a bridge, or it may be
-	 * function zero of a multi-function device.
-	 * In any case they must share a common PHB.
-	 */
-	ret = read_slot_reset_state(pdn, rets);
-
-	/* If the call to firmware failed, punt */
-	if (ret != 0) {
-		printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n",
-		       ret, dn->full_name);
-		__get_cpu_var(false_positives)++;
-		rc = 0;
-		goto dn_unlock;
-	}
-
-	/* If EEH is not supported on this device, punt. */
-	if (rets[1] != 1) {
-		printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n",
-		       ret, dn->full_name);
-		__get_cpu_var(false_positives)++;
-		rc = 0;
-		goto dn_unlock;
-	}
-
-	/* If not the kind of error we know about, punt. */
-	if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) {
-		__get_cpu_var(false_positives)++;
-		rc = 0;
-		goto dn_unlock;
-	}
-
-	/* Note that config-io to empty slots may fail;
-	 * we recognize empty because they don't have children. */
-	if ((rets[0] == 5) && (dn->child == NULL)) {
-		__get_cpu_var(false_positives)++;
-		rc = 0;
-		goto dn_unlock;
-	}
-
-	__get_cpu_var(slot_resets)++;
-
-	/* Avoid repeated reports of this failure, including problems
-	 * with other functions on this device, and functions under
-	 * bridges. */
-	pe_dn = find_device_pe (dn);
-	__eeh_mark_slot (pe_dn);
-	spin_unlock_irqrestore(&confirm_error_lock, flags);
-
-	reset_state = rets[0];
-
-	eeh_slot_error_detail (pdn, 1 /* Temporary Error */);
-
-	printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n",
-	       rets[0], dn->name, dn->full_name);
-	event = kmalloc(sizeof(*event), GFP_ATOMIC);
-	if (event == NULL) {
-		eeh_panic(dev, reset_state);
-		return 1;
-	}
-
-	event->dev = dev;
-	event->dn = dn;
-	event->reset_state = reset_state;
-
-	/* We may or may not be called in an interrupt context */
-	spin_lock_irqsave(&eeh_eventlist_lock, flags);
-	list_add(&event->list, &eeh_eventlist);
-	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
-
-	/* Most EEH events are due to device driver bugs. Having
-	 * a stack trace will help the device-driver authors figure
-	 * out what happened. So print that out. */
-	if (rets[0] != 5) dump_stack();
-	schedule_work(&eeh_event_wq);
-
-	return 1;
-
-dn_unlock:
-	spin_unlock_irqrestore(&confirm_error_lock, flags);
-	return rc;
-}
-
-EXPORT_SYMBOL_GPL(eeh_dn_check_failure);
-
-/**
- * eeh_check_failure - check if all 1's data is due to EEH slot freeze
- * @token i/o token, should be address in the form 0xA....
- * @val value, should be all 1's (XXX why do we need this arg??)
- *
- * Check for an EEH failure at the given token address. Call this
- * routine if the result of a read was all 0xff's and you want to
- * find out if this is due to an EEH slot freeze event. This routine
- * will query firmware for the EEH status.
- *
- * Note this routine is safe to call in an interrupt context.
- */
-unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val)
-{
-	unsigned long addr;
-	struct pci_dev *dev;
-	struct device_node *dn;
-
-	/* Finding the phys addr + pci device; this is pretty quick. */
-	addr = eeh_token_to_phys((unsigned long __force) token);
-	dev = pci_get_device_by_addr(addr);
-	if (!dev) {
-		__get_cpu_var(no_device)++;
-		return val;
-	}
-
-	dn = pci_device_to_OF_node(dev);
-	eeh_dn_check_failure (dn, dev);
-
-	pci_dev_put(dev);
-	return val;
-}
-
-EXPORT_SYMBOL(eeh_check_failure);
-
-struct eeh_early_enable_info {
-	unsigned int buid_hi;
-	unsigned int buid_lo;
-};
-
-/* Enable eeh for the given device node. */
-static void *early_enable_eeh(struct device_node *dn, void *data)
-{
-	struct eeh_early_enable_info *info = data;
-	int ret;
-	char *status = get_property(dn, "status", NULL);
-	u32 *class_code = (u32 *)get_property(dn, "class-code", NULL);
-	u32 *vendor_id = (u32 *)get_property(dn, "vendor-id", NULL);
-	u32 *device_id = (u32 *)get_property(dn, "device-id", NULL);
-	u32 *regs;
-	int enable;
-	struct pci_dn *pdn = PCI_DN(dn);
-
-	pdn->eeh_mode = 0;
-	pdn->eeh_check_count = 0;
-	pdn->eeh_freeze_count = 0;
-
-	if (status && strcmp(status, "ok") != 0)
-		return NULL; /* ignore devices with bad status */
-
-	/* Ignore bad nodes. */
-	if (!class_code || !vendor_id || !device_id)
-		return NULL;
-
-	/* There is nothing to check on PCI to ISA bridges */
-	if (dn->type && !strcmp(dn->type, "isa")) {
-		pdn->eeh_mode |= EEH_MODE_NOCHECK;
-		return NULL;
-	}
-
-	/*
-	 * Now decide if we are going to "Disable" EEH checking
-	 * for this device. We still run with the EEH hardware active,
-	 * but we won't be checking for ff's. This means a driver
-	 * could return bad data (very bad!), an interrupt handler could
-	 * hang waiting on status bits that won't change, etc.
-	 * But there are a few cases like display devices that make sense.
-	 */
-	enable = 1; /* i.e. we will do checking */
-	if ((*class_code >> 16) == PCI_BASE_CLASS_DISPLAY)
-		enable = 0;
-
-	if (!enable)
-		pdn->eeh_mode |= EEH_MODE_NOCHECK;
-
-	/* Ok... see if this device supports EEH. Some do, some don't,
-	 * and the only way to find out is to check each and every one. */
-	regs = (u32 *)get_property(dn, "reg", NULL);
-	if (regs) {
-		/* First register entry is addr (00BBSS00) */
-		/* Try to enable eeh */
-		ret = rtas_call(ibm_set_eeh_option, 4, 1, NULL,
-				regs[0], info->buid_hi, info->buid_lo,
-				EEH_ENABLE);
-		if (ret == 0) {
-			eeh_subsystem_enabled = 1;
-			pdn->eeh_mode |= EEH_MODE_SUPPORTED;
-			pdn->eeh_config_addr = regs[0];
-#ifdef DEBUG
-			printk(KERN_DEBUG "EEH: %s: eeh enabled\n", dn->full_name);
-#endif
-		} else {
-
-			/* This device doesn't support EEH, but it may have an
-			 * EEH parent, in which case we mark it as supported. */
-			if (dn->parent && PCI_DN(dn->parent)
-			    && (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) {
-				/* Parent supports EEH. */
-				pdn->eeh_mode |= EEH_MODE_SUPPORTED;
-				pdn->eeh_config_addr = PCI_DN(dn->parent)->eeh_config_addr;
-				return NULL;
-			}
-		}
-	} else {
-		printk(KERN_WARNING "EEH: %s: unable to get reg property.\n",
-		       dn->full_name);
-	}
-
-	return NULL;
-}
-
-/*
- * Initialize EEH by trying to enable it for all of the adapters in the system.
- * As a side effect we can determine here if eeh is supported at all.
- * Note that we leave EEH on so failed config cycles won't cause a machine
- * check. If a user turns off EEH for a particular adapter they are really
- * telling Linux to ignore errors. Some hardware (e.g. POWER5) won't
- * grant access to a slot if EEH isn't enabled, and so we always enable
- * EEH for all slots/all devices.
- *
- * The eeh-force-off option disables EEH checking globally, for all slots.
- * Even if force-off is set, the EEH hardware is still enabled, so that
- * newer systems can boot.
- */
-void __init eeh_init(void)
-{
-	struct device_node *phb, *np;
-	struct eeh_early_enable_info info;
-
-	spin_lock_init(&confirm_error_lock);
-	spin_lock_init(&slot_errbuf_lock);
-
-	np = of_find_node_by_path("/rtas");
-	if (np == NULL)
-		return;
-
-	ibm_set_eeh_option = rtas_token("ibm,set-eeh-option");
-	ibm_set_slot_reset = rtas_token("ibm,set-slot-reset");
-	ibm_read_slot_reset_state2 = rtas_token("ibm,read-slot-reset-state2");
-	ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state");
-	ibm_slot_error_detail = rtas_token("ibm,slot-error-detail");
-
-	if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE)
-		return;
-
-	eeh_error_buf_size = rtas_token("rtas-error-log-max");
-	if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) {
-		eeh_error_buf_size = 1024;
-	}
-	if (eeh_error_buf_size > RTAS_ERROR_LOG_MAX) {
-		printk(KERN_WARNING "EEH: rtas-error-log-max is bigger than allocated "
-		       "buffer ! (%d vs %d)", eeh_error_buf_size, RTAS_ERROR_LOG_MAX);
-		eeh_error_buf_size = RTAS_ERROR_LOG_MAX;
-	}
-
-	/* Enable EEH for all adapters. Note that eeh requires buid's */
-	for (phb = of_find_node_by_name(NULL, "pci"); phb;
-	     phb = of_find_node_by_name(phb, "pci")) {
-		unsigned long buid;
-
-		buid = get_phb_buid(phb);
-		if (buid == 0 || PCI_DN(phb) == NULL)
-			continue;
-
-		info.buid_lo = BUID_LO(buid);
-		info.buid_hi = BUID_HI(buid);
-		traverse_pci_devices(phb, early_enable_eeh, &info);
-	}
-
-	if (eeh_subsystem_enabled)
-		printk(KERN_INFO "EEH: PCI Enhanced I/O Error Handling Enabled\n");
-	else
-		printk(KERN_WARNING "EEH: No capable adapters found\n");
-}
-
-/**
- * eeh_add_device_early - enable EEH for the indicated device_node
- * @dn: device node for which to set up EEH
- *
- * This routine must be used to perform EEH initialization for PCI
- * devices that were added after system boot (e.g. hotplug, dlpar).
- * This routine must be called before any i/o is performed to the
- * adapter (inluding any config-space i/o).
- * Whether this actually enables EEH or not for this device depends
- * on the CEC architecture, type of the device, on earlier boot
- * command-line arguments & etc.
- */
-void eeh_add_device_early(struct device_node *dn)
-{
-	struct pci_controller *phb;
-	struct eeh_early_enable_info info;
-
-	if (!dn || !PCI_DN(dn))
-		return;
-	phb = PCI_DN(dn)->phb;
-	if (NULL == phb || 0 == phb->buid) {
-		printk(KERN_WARNING "EEH: Expected buid but found none for %s\n",
-		       dn->full_name);
-		dump_stack();
-		return;
-	}
-
-	info.buid_hi = BUID_HI(phb->buid);
-	info.buid_lo = BUID_LO(phb->buid);
-	early_enable_eeh(dn, &info);
-}
-EXPORT_SYMBOL_GPL(eeh_add_device_early);
-
-/**
- * eeh_add_device_late - perform EEH initialization for the indicated pci device
- * @dev: pci device for which to set up EEH
- *
- * This routine must be used to complete EEH initialization for PCI
- * devices that were added after system boot (e.g. hotplug, dlpar).
- */
-void eeh_add_device_late(struct pci_dev *dev)
-{
-	struct device_node *dn;
-
-	if (!dev || !eeh_subsystem_enabled)
-		return;
-
-#ifdef DEBUG
-	printk(KERN_DEBUG "EEH: adding device %s\n", pci_name(dev));
-#endif
-
-	pci_dev_get (dev);
-	dn = pci_device_to_OF_node(dev);
-	PCI_DN(dn)->pcidev = dev;
-
-	pci_addr_cache_insert_device (dev);
-}
-EXPORT_SYMBOL_GPL(eeh_add_device_late);
-
-/**
- * eeh_remove_device - undo EEH setup for the indicated pci device
- * @dev: pci device to be removed
- *
- * This routine should be when a device is removed from a running
- * system (e.g. by hotplug or dlpar).
- */
-void eeh_remove_device(struct pci_dev *dev)
-{
-	struct device_node *dn;
-	if (!dev || !eeh_subsystem_enabled)
-		return;
-
-	/* Unregister the device with the EEH/PCI address search system */
-#ifdef DEBUG
-	printk(KERN_DEBUG "EEH: remove device %s\n", pci_name(dev));
-#endif
-	pci_addr_cache_remove_device(dev);
-
-	dn = pci_device_to_OF_node(dev);
-	PCI_DN(dn)->pcidev = NULL;
-	pci_dev_put (dev);
-}
-EXPORT_SYMBOL_GPL(eeh_remove_device);
-
-static int proc_eeh_show(struct seq_file *m, void *v)
-{
-	unsigned int cpu;
-	unsigned long ffs = 0, positives = 0, failures = 0;
-	unsigned long resets = 0;
-	unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0;
-
-	for_each_cpu(cpu) {
-		ffs += per_cpu(total_mmio_ffs, cpu);
-		positives += per_cpu(false_positives, cpu);
-		failures += per_cpu(ignored_failures, cpu);
-		resets += per_cpu(slot_resets, cpu);
-		no_dev += per_cpu(no_device, cpu);
-		no_dn += per_cpu(no_dn, cpu);
-		no_cfg += per_cpu(no_cfg_addr, cpu);
-		no_check += per_cpu(ignored_check, cpu);
-	}
-
-	if (0 == eeh_subsystem_enabled) {
-		seq_printf(m, "EEH Subsystem is globally disabled\n");
-		seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs);
-	} else {
-		seq_printf(m, "EEH Subsystem is enabled\n");
-		seq_printf(m,
-				"no device=%ld\n"
-				"no device node=%ld\n"
-				"no config address=%ld\n"
-				"check not wanted=%ld\n"
-				"eeh_total_mmio_ffs=%ld\n"
-				"eeh_false_positives=%ld\n"
-				"eeh_ignored_failures=%ld\n"
-				"eeh_slot_resets=%ld\n",
-				no_dev, no_dn, no_cfg, no_check,
-				ffs, positives, failures, resets);
-	}
-
-	return 0;
-}
-
-static int proc_eeh_open(struct inode *inode,