From starox at free.fr Tue Feb 1 00:11:44 2005
From: starox at free.fr (Frederic Leroy)
Date: Mon, 31 Jan 2005 14:11:44 +0100 (CET)
Subject: [minor] Apple Pmac G5 - ATA performance problem
Message-ID: <20050131141146.69a0a9e5@miss>

Hello,

I put a hard drive and a 'superdrive' on the ATA bus of a PowerMac G5.
It works very well, but I notice the hard drive is running at half speed.

The hard drive and the optical drive are both in UltraDMA2. The ATA driver
doesn't accept UltraDMA modes above 2.

Here are the results of Bonnie with a 2G test file on Linux and MacOSX:

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
G5-linux 2000 23867 99.3 42912  9.3 13562  3.4 16871 67.3 23292  3.7 316.6  0.7
G5-macos    0 31504 99.1 54656 18.0 22615 13.7 34970 98.6 54632 23.9 224.4  3.3

-- 
Frederic Leroy
Lost in Germany

From greg at kroah.com Tue Feb 1 06:15:46 2005
From: greg at kroah.com (Greg KH)
Date: Mon, 31 Jan 2005 11:15:46 -0800
Subject: pci: Arch hook to determine config space size
In-Reply-To: <41FE82B6.9060407@us.ibm.com>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050128185234.GB21760@infradead.org> <20050129040647.GA6261@kroah.com> <41FE82B6.9060407@us.ibm.com>
Message-ID: <20050131191546.GA22428@kroah.com>

On Mon, Jan 31, 2005 at 01:10:46PM -0600, Brian King wrote:
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> +

Kernel functions traditionally return 0 for success and -ESOMETHING for
error. Care to fix this up to match that convention?
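
For illustration, the convention described above can be sketched in plain C. This is a minimal userspace sketch, not the patch under review: `struct fake_pci_dev`, its `ext_cfg_capable` field and `check_exp_cfg_space()` are hypothetical names, and `EOPNOTSUPP` from `<errno.h>` is only an assumed choice of error code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; only the field this sketch needs. */
struct fake_pci_dev {
	int ext_cfg_capable;	/* nonzero if expanded config space is usable */
};

/*
 * Returns 0 on success and a negative errno value on failure,
 * instead of a 1/0 "boolean" result.
 */
static int check_exp_cfg_space(const struct fake_pci_dev *dev)
{
	if (dev == NULL)
		return -EINVAL;		/* bad argument */
	if (!dev->ext_cfg_capable)
		return -EOPNOTSUPP;	/* feature not available (assumed code) */
	return 0;			/* success */
}
```

With this shape a caller tests for a nonzero result to detect failure, rather than testing for zero as the original hook required.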

thanks,

greg k-h

From brking at us.ibm.com Tue Feb 1 06:10:46 2005
From: brking at us.ibm.com (Brian King)
Date: Mon, 31 Jan 2005 13:10:46 -0600
Subject: pci: Arch hook to determine config space size
In-Reply-To: <20050129040647.GA6261@kroah.com>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050128185234.GB21760@infradead.org> <20050129040647.GA6261@kroah.com>
Message-ID: <41FE82B6.9060407@us.ibm.com>

Greg KH wrote:
> On Fri, Jan 28, 2005 at 06:52:34PM +0000, Christoph Hellwig wrote:
>
>>>+int __attribute__ ((weak)) pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
>>
>> - prototypes belong to headers
>> - weak linkage is the perfect way for total obfuscation
>>
>>please make this a regular arch hook
>
>
> I agree. Also, when sending PCI related patches, please cc the
> linux-pci mailing list.

How about this?

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pci_get_cfg_size_all.patch
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050131/374a0f61/attachment.txt

From matthew at wil.cx Tue Feb 1 06:29:55 2005
From: matthew at wil.cx (Matthew Wilcox)
Date: Mon, 31 Jan 2005 19:29:55 +0000
Subject: pci: Arch hook to determine config space size
In-Reply-To: <41FE82B6.9060407@us.ibm.com>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050128185234.GB21760@infradead.org> <20050129040647.GA6261@kroah.com> <41FE82B6.9060407@us.ibm.com>
Message-ID: <20050131192955.GJ31145@parcelfarce.linux.theplanet.co.uk>

On Mon, Jan 31, 2005 at 01:10:46PM -0600, Brian King wrote:
> Greg KH wrote:
> >On Fri, Jan 28, 2005 at 06:52:34PM +0000, Christoph Hellwig wrote:
> >
> >>>+int __attribute__ ((weak)) pcibios_exp_cfg_space(struct pci_dev *dev) {
> >>>return 1; }
> >>
> >>- prototypes belong to headers
> >>- weak linkage is the perfect way for total obfuscation
> >>
> >>please make this a regular arch hook
> >
> >
> >I agree. Also, when sending PCI related patches, please cc the
> >linux-pci mailing list.
>
> How about this?

Thanks for copying linux-pci. I hate this patch.

Basically, ppc64's config ops are broken and need to check the offset
being read. Here's i386:

static int pci_conf1_write (int seg, int bus, int devfn, int reg, int len, u32 value)
{
	unsigned long flags;

	if ((bus > 255) || (devfn > 255) || (reg > 255))
		return -EINVAL;

I think all the config ops in ppc64 are broken and need to check for these
limits.
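
The range check being asked for can be sketched as a small standalone helper. This is only an illustration of the limits in the i386 excerpt above, not ppc64 code: `check_cfg_args()` and the `MAX_*` constants are hypothetical names introduced here.

```c
#include <errno.h>

/* Limits for conventional (non-extended) PCI configuration access. */
#define MAX_BUS   255
#define MAX_DEVFN 255
#define MAX_REG   255

/*
 * Hypothetical helper: returns 0 if the bus/devfn/reg triple is in range
 * and -EINVAL otherwise, mirroring the test in the i386 excerpt.
 */
static int check_cfg_args(unsigned int bus, unsigned int devfn, unsigned int reg)
{
	if (bus > MAX_BUS || devfn > MAX_DEVFN || reg > MAX_REG)
		return -EINVAL;
	return 0;
}
```

A config accessor would call such a check first and bail out before building any hardware or firmware address from the arguments.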
Also, it does some checks that are already performed by upper layers:

	if (where & (size - 1))
		return PCIBIOS_BAD_REGISTER_NUMBER;

is checked for in drivers/pci/access.c

-- 
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain

From brking at us.ibm.com Tue Feb 1 06:40:04 2005
From: brking at us.ibm.com (Brian King)
Date: Mon, 31 Jan 2005 13:40:04 -0600
Subject: pci: Arch hook to determine config space size
In-Reply-To: <41FE82B6.9060407@us.ibm.com>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050128185234.GB21760@infradead.org> <20050129040647.GA6261@kroah.com> <41FE82B6.9060407@us.ibm.com>
Message-ID: <41FE8994.4040802@us.ibm.com>

Brian King wrote:
> Greg KH wrote:
>
>> On Fri, Jan 28, 2005 at 06:52:34PM +0000, Christoph Hellwig wrote:
>>
>>>> +int __attribute__ ((weak)) pcibios_exp_cfg_space(struct pci_dev
>>>> *dev) { return 1; }
>>>
>>>
>>> - prototypes belong to headers
>>> - weak linkage is the perfect way for total obfuscation
>>>
>>> please make this a regular arch hook
>>
>>
>>
>> I agree. Also, when sending PCI related patches, please cc the
>> linux-pci mailing list.

CC'ing the linux-pci mailing list...

-brian

> How about this?
>
>
> ------------------------------------------------------------------------
>
>
> When working with a PCI-X Mode 2 adapter on a PCI-X Mode 1 PPC64
> system, the current code used to determine the config space size
> of a device results in a PCI Master abort and an EEH error, resulting
> in the device being taken offline.
> This patch adds an arch hook so
> that individual archs can indicate if the underlying system supports
> expanded config space accesses or not.
>
> Signed-off-by: Brian King
> ---
>
>  linux-2.6.11-rc2-bk9-bjking1/arch/alpha/kernel/pci.c | 2 +
>  linux-2.6.11-rc2-bk9-bjking1/arch/arm/kernel/bios32.c | 2 +
>  linux-2.6.11-rc2-bk9-bjking1/arch/frv/mb93090-mb00/pci-frv.c | 2 +
>  linux-2.6.11-rc2-bk9-bjking1/arch/i386/pci/common.c | 2 +
>  linux-2.6.11-rc2-bk9-bjking1/arch/ia64/pci/pci.c | 2 +
>  linux-2.6.11-rc2-bk9-bjking1/arch/m68knommu/kernel/comempci.c | 2 +
>  linux-2.6.11-rc2-bk9-bjking1/arch/mips/pci/pci.c | 2 +
>  linux-2.6.11-rc2-bk9-bjking1/arch/mips/pmc-sierra/yosemite/ht.c | 2 +
>  linux-2.6.11-rc2-bk9-bjking1/arch/parisc/kernel/pci.c | 1
>  linux-2.6.11-rc2-bk9-bjking1/arch/ppc/kernel/pci.c | 2 +
>  linux-2.6.11-rc2-bk9-bjking1/arch/ppc64/kernel/iSeries_pci.c | 2 +
>  linux-2.6.11-rc2-bk9-bjking1/arch/ppc64/kernel/pci.c | 18 ++++++++++
>  linux-2.6.11-rc2-bk9-bjking1/arch/sh/boards/mpc1211/pci.c | 1
>  linux-2.6.11-rc2-bk9-bjking1/arch/sh/boards/overdrive/galileo.c | 2 +
>  linux-2.6.11-rc2-bk9-bjking1/arch/sh/drivers/pci/pci.c | 2 +
>  linux-2.6.11-rc2-bk9-bjking1/arch/sh64/kernel/pcibios.c | 2 +
>  linux-2.6.11-rc2-bk9-bjking1/arch/sparc/kernel/pcic.c | 2 +
>  linux-2.6.11-rc2-bk9-bjking1/arch/sparc64/kernel/pci.c | 2 +
>  linux-2.6.11-rc2-bk9-bjking1/arch/v850/kernel/rte_mb_a_pci.c | 2 +
>  linux-2.6.11-rc2-bk9-bjking1/drivers/pci/probe.c | 2 +
>  linux-2.6.11-rc2-bk9-bjking1/include/linux/pci.h | 1
>  21 files changed, 55 insertions(+)
>
> diff -puN drivers/pci/probe.c~pci_get_cfg_size_all drivers/pci/probe.c
> --- linux-2.6.11-rc2-bk9/drivers/pci/probe.c~pci_get_cfg_size_all 2005-01-31 11:16:22.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/drivers/pci/probe.c 2005-01-31 11:22:07.000000000 -0600
> @@ -653,6 +653,8 @@ static int pci_cfg_space_size(struct pci
>  		goto fail;
>  	}
>
> +	if (!pcibios_exp_cfg_space(dev))
> +		goto fail;
>  	if (pci_read_config_dword(dev, 256, &status) != PCIBIOS_SUCCESSFUL)
>  		goto fail;
>  	if (status == 0xffffffff)
> diff -puN arch/alpha/kernel/pci.c~pci_get_cfg_size_all arch/alpha/kernel/pci.c
> --- linux-2.6.11-rc2-bk9/arch/alpha/kernel/pci.c~pci_get_cfg_size_all 2005-01-31 11:16:33.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/alpha/kernel/pci.c 2005-01-31 11:22:27.000000000 -0600
> @@ -202,6 +202,8 @@ pcibios_setup(char *str)
>  	return str;
>  }
>
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> +
>  #ifdef ALPHA_RESTORE_SRM_SETUP
>  static struct pdev_srm_saved_conf *srm_saved_configs;
>
> diff -puN arch/arm/kernel/bios32.c~pci_get_cfg_size_all arch/arm/kernel/bios32.c
> --- linux-2.6.11-rc2-bk9/arch/arm/kernel/bios32.c~pci_get_cfg_size_all 2005-01-31 11:16:43.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/arm/kernel/bios32.c 2005-01-31 11:22:27.000000000 -0600
> @@ -67,6 +67,8 @@ void pcibios_report_status(u_int status_
>  	}
>  }
>
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> +
>  /*
>   * We don't use this to fix the device, but initialisation of it.
>   * It's not the correct use for this, but it works.
> diff -puN arch/frv/mb93090-mb00/pci-frv.c~pci_get_cfg_size_all arch/frv/mb93090-mb00/pci-frv.c
> --- linux-2.6.11-rc2-bk9/arch/frv/mb93090-mb00/pci-frv.c~pci_get_cfg_size_all 2005-01-31 11:16:55.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/frv/mb93090-mb00/pci-frv.c 2005-01-31 11:22:27.000000000 -0600
> @@ -286,3 +286,5 @@ void pcibios_set_master(struct pci_dev *
>  	printk(KERN_DEBUG "PCI: Setting latency timer of device %s to %d\n", pci_name(dev), lat);
>  	pci_write_config_byte(dev, PCI_LATENCY_TIMER, lat);
>  }
> +
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> diff -puN arch/i386/pci/common.c~pci_get_cfg_size_all arch/i386/pci/common.c
> --- linux-2.6.11-rc2-bk9/arch/i386/pci/common.c~pci_get_cfg_size_all 2005-01-31 11:17:01.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/i386/pci/common.c 2005-01-31 11:22:27.000000000 -0600
> @@ -249,3 +249,5 @@ int pcibios_enable_device(struct pci_dev
>
>  	return pcibios_enable_irq(dev);
>  }
> +
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> diff -puN arch/ia64/pci/pci.c~pci_get_cfg_size_all arch/ia64/pci/pci.c
> --- linux-2.6.11-rc2-bk9/arch/ia64/pci/pci.c~pci_get_cfg_size_all 2005-01-31 11:17:09.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/ia64/pci/pci.c 2005-01-31 11:22:27.000000000 -0600
> @@ -744,3 +744,5 @@ int pci_vector_resources(int last, int n
>
>  	return count;
>  }
> +
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> diff -puN arch/m68knommu/kernel/comempci.c~pci_get_cfg_size_all arch/m68knommu/kernel/comempci.c
> --- linux-2.6.11-rc2-bk9/arch/m68knommu/kernel/comempci.c~pci_get_cfg_size_all 2005-01-31 11:17:23.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/m68knommu/kernel/comempci.c 2005-01-31 11:22:27.000000000 -0600
> @@ -987,3 +987,5 @@ void pci_free_consistent(struct pci_dev
>  }
>
>  /*****************************************************************************/
> +
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> diff -puN arch/mips/pci/pci.c~pci_get_cfg_size_all arch/mips/pci/pci.c
> --- linux-2.6.11-rc2-bk9/arch/mips/pci/pci.c~pci_get_cfg_size_all 2005-01-31 11:17:33.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/mips/pci/pci.c 2005-01-31 11:22:27.000000000 -0600
> @@ -300,3 +300,5 @@ char *pcibios_setup(char *str)
>  {
>  	return str;
>  }
> +
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> diff -puN arch/mips/pmc-sierra/yosemite/ht.c~pci_get_cfg_size_all arch/mips/pmc-sierra/yosemite/ht.c
> --- linux-2.6.11-rc2-bk9/arch/mips/pmc-sierra/yosemite/ht.c~pci_get_cfg_size_all 2005-01-31 11:17:44.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/mips/pmc-sierra/yosemite/ht.c 2005-01-31 11:22:27.000000000 -0600
> @@ -451,4 +451,6 @@ unsigned __init int pcibios_assign_all_b
>  	return 0;
>  }
>
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> +
>  #endif /* CONFIG_HYPERTRANSPORT */
> diff -puN arch/parisc/kernel/pci.c~pci_get_cfg_size_all arch/parisc/kernel/pci.c
> --- linux-2.6.11-rc2-bk9/arch/parisc/kernel/pci.c~pci_get_cfg_size_all 2005-01-31 11:17:50.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/parisc/kernel/pci.c 2005-01-31 11:22:27.000000000 -0600
> @@ -330,6 +330,7 @@ int pcibios_enable_device(struct pci_dev
>  	return 0;
>  }
>
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
>
>  /* PA-RISC specific */
>  void pcibios_register_hba(struct pci_hba_data *hba)
> diff -puN arch/ppc/kernel/pci.c~pci_get_cfg_size_all arch/ppc/kernel/pci.c
> --- linux-2.6.11-rc2-bk9/arch/ppc/kernel/pci.c~pci_get_cfg_size_all 2005-01-31 11:18:02.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/ppc/kernel/pci.c 2005-01-31 11:22:27.000000000 -0600
> @@ -1728,6 +1728,8 @@ void pci_iounmap(struct pci_dev *dev, vo
>  EXPORT_SYMBOL(pci_iomap);
>  EXPORT_SYMBOL(pci_iounmap);
>
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> +
>  /*
>   * Null PCI config access functions, for the case when we can't
>   * find a hose.
> diff -puN arch/ppc64/kernel/iSeries_pci.c~pci_get_cfg_size_all arch/ppc64/kernel/iSeries_pci.c
> --- linux-2.6.11-rc2-bk9/arch/ppc64/kernel/iSeries_pci.c~pci_get_cfg_size_all 2005-01-31 11:18:09.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/ppc64/kernel/iSeries_pci.c 2005-01-31 11:22:20.000000000 -0600
> @@ -348,6 +348,8 @@ void pcibios_fixup_resources(struct pci_
>  	PPCDBG(PPCDBG_BUSWALK, "fixup_resources pdev %p\n", pdev);
>  }
>
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 0; }
> +
>  /*
>   * Loop through each node function to find usable EADs bridges.
>   */
> diff -puN arch/ppc64/kernel/pci.c~pci_get_cfg_size_all arch/ppc64/kernel/pci.c
> --- linux-2.6.11-rc2-bk9/arch/ppc64/kernel/pci.c~pci_get_cfg_size_all 2005-01-31 11:18:13.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/ppc64/kernel/pci.c 2005-01-31 11:22:20.000000000 -0600
> @@ -467,6 +467,24 @@ void pcibios_add_platform_entries(struct
>
>  #ifdef CONFIG_PPC_MULTIPLATFORM
>
> +int pcibios_exp_cfg_space(struct pci_dev *dev)
> +{
> +	int *type;
> +	struct device_node *dn;
> +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> +
> +	if (!hose)
> +		return 0;
> +
> +	dn = (struct device_node *) hose->arch_data;
> +	type = (int *)get_property(dn, "ibm,pci-config-space-type", NULL);
> +
> +	if (type && *type == 1)
> +		return 1;
> +
> +	return 0;
> +}
> +
>  #define ISA_SPACE_MASK 0x1
>  #define ISA_SPACE_IO 0x1
>
> diff -puN arch/sh/boards/mpc1211/pci.c~pci_get_cfg_size_all arch/sh/boards/mpc1211/pci.c
> --- linux-2.6.11-rc2-bk9/arch/sh/boards/mpc1211/pci.c~pci_get_cfg_size_all 2005-01-31 11:18:24.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/sh/boards/mpc1211/pci.c 2005-01-31 11:22:27.000000000 -0600
> @@ -294,3 +294,4 @@ void pcibios_align_resource(void *data,
>  	}
>  }
>
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> diff -puN arch/sh/boards/overdrive/galileo.c~pci_get_cfg_size_all arch/sh/boards/overdrive/galileo.c
> --- linux-2.6.11-rc2-bk9/arch/sh/boards/overdrive/galileo.c~pci_get_cfg_size_all 2005-01-31 11:18:33.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/sh/boards/overdrive/galileo.c 2005-01-31 11:22:27.000000000 -0600
> @@ -586,3 +586,5 @@ void pcibios_set_master(struct pci_dev *
>  	printk("PCI: Setting latency timer of device %s to %d\n", pci_name(dev), lat);
>  	pci_write_config_byte(dev, PCI_LATENCY_TIMER, lat);
>  }
> +
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> diff -puN arch/sh/drivers/pci/pci.c~pci_get_cfg_size_all arch/sh/drivers/pci/pci.c
> --- linux-2.6.11-rc2-bk9/arch/sh/drivers/pci/pci.c~pci_get_cfg_size_all 2005-01-31 11:18:49.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/sh/drivers/pci/pci.c 2005-01-31 11:22:27.000000000 -0600
> @@ -153,3 +153,5 @@ void __init pcibios_update_irq(struct pc
>  {
>  	pci_write_config_byte(dev, PCI_INTERRUPT_LINE, irq);
>  }
> +
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> diff -puN arch/sh64/kernel/pcibios.c~pci_get_cfg_size_all arch/sh64/kernel/pcibios.c
> --- linux-2.6.11-rc2-bk9/arch/sh64/kernel/pcibios.c~pci_get_cfg_size_all 2005-01-31 11:19:47.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/sh64/kernel/pcibios.c 2005-01-31 11:22:27.000000000 -0600
> @@ -166,3 +166,5 @@ void __init pcibios_update_irq(struct pc
>  {
>  	pci_write_config_byte(dev, PCI_INTERRUPT_LINE, irq);
>  }
> +
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> diff -puN arch/sparc/kernel/pcic.c~pci_get_cfg_size_all arch/sparc/kernel/pcic.c
> --- linux-2.6.11-rc2-bk9/arch/sparc/kernel/pcic.c~pci_get_cfg_size_all 2005-01-31 11:19:52.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/sparc/kernel/pcic.c 2005-01-31 11:22:27.000000000 -0600
> @@ -1033,3 +1033,5 @@ void insl(void * __iomem addr, void *dst
>  }
>
>  subsys_initcall(pcic_init);
> +
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> diff -puN arch/sparc64/kernel/pci.c~pci_get_cfg_size_all arch/sparc64/kernel/pci.c
> --- linux-2.6.11-rc2-bk9/arch/sparc64/kernel/pci.c~pci_get_cfg_size_all 2005-01-31 11:20:02.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/sparc64/kernel/pci.c 2005-01-31 11:22:27.000000000 -0600
> @@ -809,4 +809,6 @@ int pcibios_prep_mwi(struct pci_dev *dev
>  	return 0;
>  }
>
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> +
>  #endif /* !(CONFIG_PCI) */
> diff -puN arch/v850/kernel/rte_mb_a_pci.c~pci_get_cfg_size_all arch/v850/kernel/rte_mb_a_pci.c
> --- linux-2.6.11-rc2-bk9/arch/v850/kernel/rte_mb_a_pci.c~pci_get_cfg_size_all 2005-01-31 11:20:15.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/arch/v850/kernel/rte_mb_a_pci.c 2005-01-31 11:22:27.000000000 -0600
> @@ -337,6 +337,8 @@ void pcibios_set_master (struct pci_dev
>  {
>  }
>
> +int pcibios_exp_cfg_space(struct pci_dev *dev) { return 1; }
> +
>
>  /* Mother-A SRAM memory allocation. This is a simple first-fit allocator. */
>
> diff -puN include/linux/pci.h~pci_get_cfg_size_all include/linux/pci.h
> --- linux-2.6.11-rc2-bk9/include/linux/pci.h~pci_get_cfg_size_all 2005-01-31 11:20:30.000000000 -0600
> +++ linux-2.6.11-rc2-bk9-bjking1/include/linux/pci.h 2005-01-31 11:22:07.000000000 -0600
> @@ -723,6 +723,7 @@ extern struct list_head pci_devices; /*
>  void pcibios_fixup_bus(struct pci_bus *);
>  int pcibios_enable_device(struct pci_dev *, int mask);
>  char *pcibios_setup (char *str);
> +int pcibios_exp_cfg_space(struct pci_dev *dev);
>
>  /* Used only when drivers/pci/setup.c is used */
>  void pcibios_align_resource(void *, struct resource *,
> _

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

From sam at ravnborg.org Tue Feb 1 06:27:13 2005
From: sam at ravnborg.org (Sam Ravnborg)
Date: Mon, 31 Jan 2005 20:27:13 +0100
Subject: [PATCH] ppc64: Implement a vDSO and use it for signal trampoline
In-Reply-To: <1107151447.5712.81.camel@gaston>
References: <1107151447.5712.81.camel@gaston>
Message-ID: <20050131192713.GA16268@mars.ravnborg.org>

> Index: linux-work/arch/ppc64/kernel/vdso32/Makefile
> ===================================================================
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ linux-work/arch/ppc64/kernel/vdso32/Makefile 2005-01-31 16:25:56.000000000 +1100
> @@ -0,0 +1,50 @@
> +# Choose compiler
> +#
> +# XXX FIXME: We probably want to enforce using a biarch compiler by default
> +#            and thus use (CC) with -m64, while letting the user pass a
> +#            CROSS32_COMPILE prefix if wanted. Same goes for the zImage
> +#            wrappers
> +#
> +
> +CROSS32_COMPILE ?=
> +
> +CROSS32CC := $(CROSS32_COMPILE)gcc
> +CROSS32AS := $(CROSS32_COMPILE)as

This needs to go into arch/ppc64/Makefile

> +
> +# List of files in the vdso, has to be asm only for now
> +
> +src-vdso32 = sigtramp.S gettimeofday.S datapage.S cacheflush.S

It is normal kbuild practice to list .o files.
So it would be:

obj-vdso32 := sigtramp.o gettimeofday.o datapage.o cacheflush.o
targets := $(obj-vdso32)
obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))

One line saved compared to below (not counting the src-vdso32 assignment
that is unused).
Also notice that ':=' is used all over. No need to use late evaluation when
no dynamic references are used ($ $@ etc.).

> +# Build rules
> +
> +obj-vdso32 := $(addsuffix .o, $(basename $(src-vdso32)))
> +targets := $(obj-vdso32) vdso32.so
> +obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
> +src-vdso32 := $(addprefix $(src)/, $(src-vdso32))

Same comments to the vdso64/Makefile

	Sam

From arnd at arndb.de Tue Feb 1 07:51:04 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Mon, 31 Jan 2005 21:51:04 +0100
Subject: pci: Arch hook to determine config space size
In-Reply-To: <20050131192955.GJ31145@parcelfarce.linux.theplanet.co.uk>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <41FE82B6.9060407@us.ibm.com> <20050131192955.GJ31145@parcelfarce.linux.theplanet.co.uk>
Message-ID: <200501312151.05323.arnd@arndb.de>

On Monday 31 January 2005 20:29, Matthew Wilcox wrote:
> Thanks for copying linux-pci. I hate this patch.
>
> Basically, ppc64's config ops are broken and need to check the offset
> being read.

To make things worse, simply allowing the larger config space will silently
access the wrong device. The least that needs to be done is to pass the
correct address to the firmware. This patch should do the right thing,
though I don't have any PCIe card to test with.

Note that at least for the rtas pci config access, the bus/devfn values
come from the device tree, which makes it somewhat harder to screw them up,
and rtas ought to check for obviously wrong addresses as well.

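
The "wrong device" problem can be seen with a small standalone sketch of the two address encodings from the patch that follows. The helper names `legacy_cfg_addr()` and `ext_cfg_addr()` are hypothetical; only the bit layout is taken from the patch.

```c
#include <stdint.h>

/* Old encoding: the register offset is OR-ed in unmasked. */
static uint32_t legacy_cfg_addr(uint32_t busno, uint32_t devfn, uint32_t where)
{
	return (busno << 16) | (devfn << 8) | where;
}

/*
 * New encoding: bits 8-11 of the offset move to bits 28-31 of the
 * address, and only the low 8 bits stay in the register field.
 */
static uint32_t ext_cfg_addr(uint32_t busno, uint32_t devfn, uint32_t where)
{
	return ((where & 0xf00) << 20) | (busno << 16)
		| (devfn << 8) | (where & 0x0ff);
}
```

With the old encoding, offset 0x100 on devfn 0x08 yields the same address as offset 0 on devfn 0x09, because bit 8 of the offset spills into the devfn field. For offsets below 256 both encodings produce the same value.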
Signed-off-by: Arnd Bergmann

--- linux-mm.orig/arch/ppc64/kernel/pSeries_pci.c 2005-01-28 07:21:15.000000000 -0500
+++ linux-mm/arch/ppc64/kernel/pSeries_pci.c 2005-01-31 15:56:10.244983464 -0500
@@ -63,7 +63,8 @@
 	if (where & (size - 1))
 		return PCIBIOS_BAD_REGISTER_NUMBER;
 
-	addr = (dn->busno << 16) | (dn->devfn << 8) | where;
+	addr = ((where & 0xf00) << 20) | (dn->busno << 16)
+		| (dn->devfn << 8) | (where & 0x0ff);
 	buid = dn->phb->buid;
 	if (buid) {
 		ret = rtas_call(ibm_read_pci_config, 4, 2, &returnval,
@@ -111,7 +112,8 @@
 	if (where & (size - 1))
 		return PCIBIOS_BAD_REGISTER_NUMBER;
 
-	addr = (dn->busno << 16) | (dn->devfn << 8) | where;
+	addr = ((where & 0xf00) << 20) | (dn->busno << 16)
+		| (dn->devfn << 8) | (where & 0x0ff);
 	buid = dn->phb->buid;
 	if (buid) {
 		ret = rtas_call(ibm_write_pci_config, 5, 1, NULL, addr, buid >> 32, buid & 0xffffffff, size, (ulong) val);
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: signature
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050131/97d5360f/attachment.pgp

From brking at us.ibm.com Tue Feb 1 08:35:38 2005
From: brking at us.ibm.com (Brian King)
Date: Mon, 31 Jan 2005 15:35:38 -0600
Subject: pci: Arch hook to determine config space size
In-Reply-To: <20050131192955.GJ31145@parcelfarce.linux.theplanet.co.uk>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050128185234.GB21760@infradead.org> <20050129040647.GA6261@kroah.com> <41FE82B6.9060407@us.ibm.com> <20050131192955.GJ31145@parcelfarce.linux.theplanet.co.uk>
Message-ID: <41FEA4AA.1080407@us.ibm.com>

Matthew Wilcox wrote:
> Basically, ppc64's config ops are broken and need to check the offset
> being read.
> Here's i386:
>
> static int pci_conf1_write (int seg, int bus, int devfn, int reg, int len, u32 value)
> {
> 	unsigned long flags;
>
> 	if ((bus > 255) || (devfn > 255) || (reg > 255))
> 		return -EINVAL;

Here is a pure ppc64 implementation that does this.

>
> I think all the config ops in ppc64 are broken and need to check for these
> limits. Also, it does some checks that are already performed by upper layers:
>
> 	if (where & (size - 1))
> 		return PCIBIOS_BAD_REGISTER_NUMBER;
>
> is checked for in drivers/pci/access.c

I can submit a separate patch to clean that up.

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ppc64_pcix_mode2_cfg.patch
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050131/6840b689/attachment.txt

From arnd at arndb.de Tue Feb 1 08:56:44 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Mon, 31 Jan 2005 22:56:44 +0100
Subject: pci: Arch hook to determine config space size
In-Reply-To: <41FEA4AA.1080407@us.ibm.com>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050131192955.GJ31145@parcelfarce.linux.theplanet.co.uk> <41FEA4AA.1080407@us.ibm.com>
Message-ID: <200501312256.44692.arnd@arndb.de>

On Monday 31 January 2005 22:35, Brian King wrote:
> Matthew Wilcox wrote:
> > Basically, ppc64's config ops are broken and need to check the offset
> > being read. Here's i386:
> >
> > static int pci_conf1_write (int seg, int bus, int devfn, int reg, int len, u32 value)
> > {
> > 	unsigned long flags;
> >
> > 	if ((bus > 255) || (devfn > 255) || (reg > 255))
> > 		return -EINVAL;
>
> Here is a pure ppc64 implementation that does this.

Actually, it doesn't:

> +static int config_access_valid(struct device_node *dn, int where)
> +{
> +	struct device_node *hose_dn = dn->phb->arch_data;
> +
> +	if (where < 256 || hose_dn->pci_ext_config_space)
> +		return 1;

This needs a check for (where < 4096) in case of PCIe or PCI-X.

> @@ -62,6 +72,8 @@ static int rtas_read_config(struct devic
>  		return PCIBIOS_DEVICE_NOT_FOUND;
>  	if (where & (size - 1))
>  		return PCIBIOS_BAD_REGISTER_NUMBER;
> +	if (!config_access_valid(dn, where))
> +		return PCIBIOS_BAD_REGISTER_NUMBER;
>
>  	addr = (dn->busno << 16) | (dn->devfn << 8) | where;

addr is still wrong, see my previous mail.

> @@ -110,6 +122,8 @@ static int rtas_write_config(struct devi
>  		return PCIBIOS_DEVICE_NOT_FOUND;
>  	if (where & (size - 1))
>  		return PCIBIOS_BAD_REGISTER_NUMBER;
> +	if (!config_access_valid(dn, where))
> +		return PCIBIOS_BAD_REGISTER_NUMBER;
>
>  	addr = (dn->busno << 16) | (dn->devfn << 8) | where;

same here

> @@ -285,6 +309,7 @@ static int __devinit setup_phb(struct de
>  	phb->arch_data = dev;
>  	phb->ops = &rtas_pci_ops;
>  	phb->buid = get_phb_buid(dev);
> +	get_phb_config_space_type(dev);
>
>  	return 0;
>  }

Isn't the config space size a property of the PCI device instead of the
host bridge? For a PCI device behind a PCIe host bridge, this could
still lead to incorrect config space accesses.

	Arnd <><

PS: I got a permanent fatal error from , does
that list actually exist?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: signature
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050131/3bf6d47e/attachment.pgp

From greg at kroah.com Tue Feb 1 09:13:46 2005
From: greg at kroah.com (Greg KH)
Date: Mon, 31 Jan 2005 14:13:46 -0800
Subject: pci: Arch hook to determine config space size
In-Reply-To: <200501312256.44692.arnd@arndb.de>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050131192955.GJ31145@parcelfarce.linux.theplanet.co.uk> <41FEA4AA.1080407@us.ibm.com> <200501312256.44692.arnd@arndb.de>
Message-ID: <20050131221346.GA25180@kroah.com>

On Mon, Jan 31, 2005 at 10:56:44PM +0100, Arnd Bergmann wrote:
> PS: I got a permanent fatal error from , does
> that list actually exist?

No, that is not the email address for the linux-pci mailing list. I
don't know who put that in this thread, but next time, someone might
want to actually look the address up before blindly guessing...

thanks,

greg k-h

From brking at us.ibm.com Tue Feb 1 09:43:30 2005
From: brking at us.ibm.com (Brian King)
Date: Mon, 31 Jan 2005 16:43:30 -0600
Subject: pci: Arch hook to determine config space size
In-Reply-To: <200501312256.44692.arnd@arndb.de>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050131192955.GJ31145@parcelfarce.linux.theplanet.co.uk> <41FEA4AA.1080407@us.ibm.com> <200501312256.44692.arnd@arndb.de>
Message-ID: <41FEB492.2020002@us.ibm.com>

Arnd Bergmann wrote:
> On Monday 31 January 2005 22:35, Brian King wrote:
>
>>Matthew Wilcox wrote:
>>
>>>Basically, ppc64's config ops are broken and need to check the offset
>>>being read. Here's i386:
>>>
>>>static int pci_conf1_write (int seg, int bus, int devfn, int reg, int len, u32 value)
>>>{
>>>	unsigned long flags;
>>>
>>>	if ((bus > 255) || (devfn > 255) || (reg > 255))
>>>		return -EINVAL;
>>
>>Here is a pure ppc64 implementation that does this.
>
>
> Actually, it doesn't:
>
>
>>+static int config_access_valid(struct device_node *dn, int where)
>>+{
>>+	struct device_node *hose_dn = dn->phb->arch_data;
>>+
>>+	if (where < 256 || hose_dn->pci_ext_config_space)
>>+		return 1;
>
>
> This needs a check for (where < 4096) in case of PCIe or PCI-X.

Done.

>>@@ -62,6 +72,8 @@ static int rtas_read_config(struct devic
>> 		return PCIBIOS_DEVICE_NOT_FOUND;
>> 	if (where & (size - 1))
>> 		return PCIBIOS_BAD_REGISTER_NUMBER;
>>+	if (!config_access_valid(dn, where))
>>+		return PCIBIOS_BAD_REGISTER_NUMBER;
>>
>> 	addr = (dn->busno << 16) | (dn->devfn << 8) | where;
>
>
> addr is still wrong, see my previous mail.

Fixed.

>>@@ -285,6 +309,7 @@ static int __devinit setup_phb(struct de
>> 	phb->arch_data = dev;
>> 	phb->ops = &rtas_pci_ops;
>> 	phb->buid = get_phb_buid(dev);
>>+	get_phb_config_space_type(dev);
>>
>> 	return 0;
>> }
>
>
> Isn't the config space size a property of the PCI device instead of the
> host bridge? For a PCI device behind a PCIe host bridge, this could
> still lead to incorrect config space accesses.

It is a property of both. Accessing config space beyond the first 256
bytes will only work if both the PCI device and the host bridge support
it. The problem I ran into was generic pci code issuing a config read to
offset 256 after checking that the device supports it when the host
bridge did not support it.

> PS: I got a permanent fatal error from , does
> that list actually exist?

Sorry about that... Should be fixed on this thread now. I checked the
archives and saw a thread related to adding another L: line to the
MAINTAINERS file for the linux-pci list. Greg - was some flavor of that
patch going in?

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ppc64_pcix_mode2_cfg.patch
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050131/0ff2e3ca/attachment.txt

From benh at kernel.crashing.org Tue Feb 1 10:15:33 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 01 Feb 2005 10:15:33 +1100
Subject: [PATCH] ppc64: Implement a vDSO and use it for signal trampoline
In-Reply-To: <20050131192713.GA16268@mars.ravnborg.org>
References: <1107151447.5712.81.camel@gaston> <20050131192713.GA16268@mars.ravnborg.org>
Message-ID: <1107213333.5905.21.camel@gaston>

On Mon, 2005-01-31 at 20:27 +0100, Sam Ravnborg wrote:
> > Index: linux-work/arch/ppc64/kernel/vdso32/Makefile
> > ===================================================================
> > --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> > +++ linux-work/arch/ppc64/kernel/vdso32/Makefile 2005-01-31 16:25:56.000000000 +1100
> > @@ -0,0 +1,50 @@
> > +# Choose compiler
> > +#
> > +# XXX FIXME: We probably want to enforce using a biarch compiler by default
> > +#            and thus use (CC) with -m64, while letting the user pass a
> > +#            CROSS32_COMPILE prefix if wanted. Same goes for the zImage
> > +#            wrappers
> > +#
> > +
> > +CROSS32_COMPILE ?=
> > +
> > +CROSS32CC := $(CROSS32_COMPILE)gcc
> > +CROSS32AS := $(CROSS32_COMPILE)as
> This needs to go into arch/ppc64/Makefile

Yes, we need to consolidate that with the CROSS32_COMPILE stuff used by
the boot wrapper (arch/ppc64/boot). I haven't yet completely decided what
to do there, I'll probably assume a biarch compiler by default instead of
using the local gcc for 32 bits unless CROSS32_COMPILE is specified.

> > +
> > +# List of files in the vdso, has to be asm only for now
> > +
> > +src-vdso32 = sigtramp.S gettimeofday.S datapage.S cacheflush.S
>
> It is normal kbuild practice to list .o files.
> So it would be:
>
> obj-vdso32 := sigtramp.o gettimeofday.o datapage.o cacheflush.o
> targets := $(obj-vdso32)
> obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
>
> One line saved compared to below (not counting the src-vdso32 assignment
> that is unused).
> Also notice that ':=' is used all over. No need to use late evaluation when
> no dynamic references are used ($< $@ etc.).
>
> > +# Build rules
> > +
> > +obj-vdso32 := $(addsuffix .o, $(basename $(src-vdso32)))
> > +targets := $(obj-vdso32) vdso32.so
> > +obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
> > +src-vdso32 := $(addprefix $(src)/, $(src-vdso32))
> >
>
> Same comments to the vdso64/Makefile

Hrm... I remember back then flip/flop'ing between using .S and .o in the
file list and I had a reason to stick to .S but I can't remember why
now :) It may be something I fixed in the meantime tho, I'll have a look.

I'm not sure about the "late evaluation" thing, I'm no make expert (just
learning as I write those makefiles), I'll have to dig in the doc here.

Ben.

From arndb at onlinehome.de Tue Feb 1 10:22:02 2005
From: arndb at onlinehome.de (arndb at onlinehome.de)
Date: Tue, 1 Feb 2005 00:22:02 +0100
Subject: pci: Arch hook to determine config space size
Message-ID: <26879984$110721275641feb9d4b0ac20.24786725@config18.schlund.de>

Brian King wrote on 31.01.2005, 23:43:30:
> > Isn't the config space size a property of the PCI device instead of the
> > host bridge? For a PCI device behind a PCIe host bridge, this could
> > still lead to an incorrect config space accesses.
>
> It is a property of both. Accessing config space beyond the first 256
> bytes will only work if both the PCI device and the host bridge support
> it. The problem I ran into was generic pci code issuing a config read to
> offset 256 after checking that the device supports it when the host
> bridge did not support it.

If I interpret the spec correctly, the firmware should always store the
value we need in the property for every device node, which means that
you should look at the host bridge config-space-type attribute only
when you want to look at the bridge itself. If the device claims to
support a PCIe config space and the bridge doesn't, that sounds to me
like a firmware bug.

Arnd <><

From benh at kernel.crashing.org Tue Feb 1 11:38:02 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 01 Feb 2005 11:38:02 +1100
Subject: [PATCH] ppc64: Implement a vDSO and use it for signal trampoline
In-Reply-To: <20050131192713.GA16268@mars.ravnborg.org>
References: <1107151447.5712.81.camel@gaston> <20050131192713.GA16268@mars.ravnborg.org>
Message-ID: <1107218282.5906.33.camel@gaston>

> Also notice that ':=' is used all over. No need to use late evaluation when
> no dynamic references are used ($< $@ etc.).

Hrm... Rusty tells me that you got it backward ;) Anyway, I'll stick to
:= for now, it's not really an issue.

Ben.

From benh at kernel.crashing.org Tue Feb 1 12:49:44 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 01 Feb 2005 12:49:44 +1100
Subject: [PATCH] ppc64: Implement a vDSO and use it for signal trampoline #2
Message-ID: <1107222584.5906.43.camel@gaston>

Here's an update of the patch addressing Sam's comments. I moved the
definition of the 32 bits tools to the main Makefile, updated the boot
wrapper code to use that as well, and made the makefile use your target
compiler with -m32 when it is detected to be biarch instead of your local
gcc (when CROSS32_COMPILE isn't specified).

---

This is a rather large patch. See notes below for possible backward
compatibility issues. (Note: It depends on "ppc64: Move systemcfg out of
head.S" being applied)

This patch adds to the ppc64 kernel a virtual .so (vDSO) that is mapped
into every process space, similar to the x86 vsyscall page.
However, the implementation is very different (and doesn't use the gate
area mechanism). Actually, it contains two implementations, a 32 bits and
a 64 bits one.

These vDSO's are currently mapped at 0x100000 (+1Mb) when possible (when
a process load section isn't already there). In the future, we can
randomize that address, or even imagine having a special phdr entry
letting apps that want finer control over their address space put it
elsewhere (or not at all).

The implementation adds a hook to binfmt_elf to let the architecture add
a real VMA to the process space instead of using the gate area mechanism.
This mechanism wasn't very suitable for ppc, we couldn't just "shove" PTE
entries mapping kernel addresses into userland without expensive changes
to our hash table management. Instead, I made the vDSO be a normal VMA
which, additionally, means it supports copy-on-write semantics if made
writable via ptrace/mprotect, thus allowing breakpoints in the vDSO code.

The current implementation of the vDSOs contains the signal trampolines
with appropriate DWARF information, which enables us to use
non-executable stacks (patches to come later) along with a few more
functions that we hope glibc will soon make good use of (this is the
"hard" part now :)

Note that the symbols exposed by the vDSO aren't "normal" function
symbols, apps can't be expected to link against them directly, the vDSO's
are both seen as if they were linked at 0 and the symbols just contain
offsets to the various functions. This is done on purpose to avoid a
relocation step (ppc64 functions normally have descriptors with abs
addresses in them). When glibc uses those functions, it's expected to use
its own trampolines that know how to reach them.

In some cases, the vDSO contains several versions of a given function
(for various CPUs), the kernel will "patch" the symbol table at boot to
make it point to the appropriate one transparently.
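
As a rough illustration of the two points above (symbols being plain
offsets from the mapping base, and the boot-time redirection of a generic
symbol to a CPU specific variant), here is a minimal userland sketch. It
is not the code from the patch: the `sketch_*` names and the flat symbol
array are hypothetical stand-ins, only the mask/value/gen_name/fix_name
layout mirrors the patch table in vdso.c, and the real code copies the
other ELF symbol fields as well (size, info, other, shndx).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a vDSO dynamic symbol: a name plus an offset
 * from the start of the image (the vDSO is seen as linked at 0). */
struct sketch_sym {
	const char *name;
	unsigned long offset;
};

/* Same shape as the patch table in the patch: a PVR mask/value pair,
 * the generic symbol name and the CPU specific replacement. */
struct sketch_patch {
	uint32_t pvr_mask, pvr_value;
	const char *gen_name;
	const char *fix_name;
};

static struct sketch_sym *sketch_find(struct sketch_sym *syms, size_t n,
				      const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(syms[i].name, name) == 0)
			return &syms[i];
	return NULL;
}

/* Redirect every generic symbol whose table entry matches this CPU.
 * Returns the number of symbols patched, or -1 if a name is missing. */
static int sketch_apply_patches(struct sketch_sym *syms, size_t nsyms,
				const struct sketch_patch *patches,
				size_t npatches, uint32_t pvr)
{
	int done = 0;
	size_t i;

	for (i = 0; i < npatches; i++) {
		struct sketch_sym *gen, *fix;

		if ((pvr & patches[i].pvr_mask) != patches[i].pvr_value)
			continue;
		gen = sketch_find(syms, nsyms, patches[i].gen_name);
		fix = sketch_find(syms, nsyms, patches[i].fix_name);
		if (gen == NULL || fix == NULL)
			return -1;
		gen->offset = fix->offset;
		done++;
	}
	return done;
}

/* What a userland trampoline has to compute: mapping base plus offset. */
static unsigned long sketch_resolve(unsigned long vdso_base,
				    const struct sketch_sym *sym)
{
	return vdso_base + sym->offset;
}
```

With the image mapped at 0x100000, a symbol at offset 0x400 resolves to
0x100400; once a matching table entry has been applied, the same generic
name resolves to wherever the CPU specific variant lives, and callers
don't need to know which one they got.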

What is currently implemented is:

 - int __kernel_gettimeofday(struct timeval *tv, struct timezone *tz);

   This is a fully userland implementation of gettimeofday, with no
   barriers and no locks, and providing 100% equivalent results to the
   syscall version

 - void __kernel_sync_dicache(unsigned long start, unsigned long end)

   This function syncs the data and instruction caches (for making data
   executable), it is expected that userland loaders use this instead of
   doing it themselves, as the kernel will provide optimized versions for
   the current CPU. Currently, the vDSO provides a full one for all CPUs
   prior to POWER5 and a nop one for POWER5 which implements hardware
   snooping at the L1 level. In the future, an intermediate
   implementation may be done for the POWER4 and 970 which don't need the
   "dcbst" loop (the L1D cache is write-through on those).

 - void *__kernel_get_syscall_map(unsigned int *syscall_count) ;

   Returns a pointer to a map of implemented syscalls on the currently
   running kernel. The map is agnostic to the size of "long": unlike
   kernel bitops, it stores bits from top to bottom so that memory
   actually contains a linear bitmap. To check for syscall N, test bit
   (0x80000000 >> (N & 0x1f)) of the 32 bits int at index N >> 5.

Note about backward compatibility issues: A bug in the ppc64 libgcc
unwinder makes it unable to unwind stacks properly across signals if the
signal trampoline isn't on the stack. This has been fixed in CVS for gcc
4.0 and will be soon on the stable branch, but the problem exists with
all currently used versions. That means that until glibc gets the patch
to enable its use of the vDSO symbols for the DWARF unwinder (rather
trivial patch that will be pushed to glibc CVS soon hopefully), unwinding
from a signal handler will not work for 64 bits applications. I consider
this as a non-issue though as a patch is about to be produced, which can
easily get pushed to "live" distros like debian, gentoo, fedora, etc...
soon enough (it breaks compatibility with kernels below 2.4.20
unfortunately as our signal stack layout changed, crap crap crap), as
there are few 64 bits applications out there (except gentoo), as it's
only really an issue with C++ code relying on throwing exceptions out of
signal handlers (extremely rare it seems), and as "release" distros like
SLES or RHEL will probably have the vDSO enabled glibc _and_ the unwinder
fix by the time they release a version with a 2.6.11 or 2.6.12 kernel
anyway :) So far, I have yet to see an app failing because of that...

Finally, many many many thanks to Alan Modra for writing the DWARF
information of the signal handlers and debugging the libgcc issues !

Signed-off-by: Benjamin Herrenschmidt

Index: linux-work/arch/ppc64/Makefile
===================================================================
--- linux-work.orig/arch/ppc64/Makefile 2005-01-31 14:18:14.000000000 +1100
+++ linux-work/arch/ppc64/Makefile 2005-02-01 12:23:40.000000000 +1100
@@ -15,17 +15,38 @@
 
 KERNELLOAD := 0xc000000000000000
 
+# Set default 32 bits cross compilers for vdso and boot wrapper
+CROSS32_COMPILE ?=
+
+CROSS32CC := $(CROSS32_COMPILE)gcc
+CROSS32AS := $(CROSS32_COMPILE)as
+CROSS32LD := $(CROSS32_COMPILE)ld
+CROSS32OBJCOPY := $(CROSS32_COMPILE)objcopy
+
+# If we have a biarch compiler, use it for 32 bits cross compile if
+# CROSS32_COMPILE wasn't explicitly defined, and add proper explicit
+# target type to target compilers
+
 HAS_BIARCH := $(call cc-option-yn, -m64)
 ifeq ($(HAS_BIARCH),y)
+ifeq ($(CROSS32_COMPILE),)
+CROSS32CC := $(CC) -m32
+CROSS32AS := $(AS) -a32
+CROSS32LD := $(LD) -m elf32ppc
+CROSS32OBJCOPY := $(OBJCOPY)
+endif
 AS := $(AS) -a64
 LD := $(LD) -m elf64ppc
 CC := $(CC) -m64
 endif
 
+export CROSS32CC CROSS32AS CROSS32LD CROSS32OBJCOPY
+
 new_nm := $(shell if $(NM) --help 2>&1 | grep -- '--synthetic' > /dev/null; then echo y; else echo n; fi)
 
 ifeq ($(new_nm),y)
 NM := $(NM) --synthetic
+
 endif
 
 CHECKFLAGS += -m64 -D__powerpc__
@@ -53,6 +74,8 @@
libs-y += arch/ppc64/lib/ core-y += arch/ppc64/kernel/ +core-y += arch/ppc64/kernel/vdso32/ +core-y += arch/ppc64/kernel/vdso64/ core-y += arch/ppc64/mm/ core-$(CONFIG_XMON) += arch/ppc64/xmon/ drivers-$(CONFIG_OPROFILE) += arch/ppc64/oprofile/ Index: linux-work/arch/ppc64/kernel/asm-offsets.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/asm-offsets.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/asm-offsets.c 2005-01-31 16:25:56.000000000 +1100 @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -35,6 +36,8 @@ #include #include #include +#include +#include #define DEFINE(sym, val) \ asm volatile("\n->" #sym " %0 " #val : : "i" (val)) @@ -167,5 +170,24 @@ DEFINE(CPU_SPEC_FEATURES, offsetof(struct cpu_spec, cpu_features)); DEFINE(CPU_SPEC_SETUP, offsetof(struct cpu_spec, cpu_setup)); + /* systemcfg offsets for use by vdso */ + DEFINE(CFG_TB_ORIG_STAMP, offsetof(struct systemcfg, tb_orig_stamp)); + DEFINE(CFG_TB_TICKS_PER_SEC, offsetof(struct systemcfg, tb_ticks_per_sec)); + DEFINE(CFG_TB_TO_XS, offsetof(struct systemcfg, tb_to_xs)); + DEFINE(CFG_STAMP_XSEC, offsetof(struct systemcfg, stamp_xsec)); + DEFINE(CFG_TB_UPDATE_COUNT, offsetof(struct systemcfg, tb_update_count)); + DEFINE(CFG_TZ_MINUTEWEST, offsetof(struct systemcfg, tz_minuteswest)); + DEFINE(CFG_TZ_DSTTIME, offsetof(struct systemcfg, tz_dsttime)); + DEFINE(CFG_SYSCALL_MAP32, offsetof(struct systemcfg, syscall_map_32)); + DEFINE(CFG_SYSCALL_MAP64, offsetof(struct systemcfg, syscall_map_64)); + + /* timeval/timezone offsets for use by vdso */ + DEFINE(TVAL64_TV_SEC, offsetof(struct timeval, tv_sec)); + DEFINE(TVAL64_TV_USEC, offsetof(struct timeval, tv_usec)); + DEFINE(TVAL32_TV_SEC, offsetof(struct compat_timeval, tv_sec)); + DEFINE(TVAL32_TV_USEC, offsetof(struct compat_timeval, tv_usec)); + DEFINE(TZONE_TZ_MINWEST, offsetof(struct timezone, tz_minuteswest)); + DEFINE(TZONE_TZ_DSTTIME, 
offsetof(struct timezone, tz_dsttime)); + return 0; } Index: linux-work/arch/ppc64/kernel/Makefile =================================================================== --- linux-work.orig/arch/ppc64/kernel/Makefile 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/Makefile 2005-01-31 16:25:56.000000000 +1100 @@ -11,7 +11,7 @@ udbg.o binfmt_elf32.o sys_ppc32.o ioctl32.o \ ptrace32.o signal32.o rtc.o init_task.o \ lmb.o cputable.o cpu_setup_power4.o idle_power4.o \ - iommu.o sysfs.o + iommu.o sysfs.o vdso.o obj-$(CONFIG_PPC_OF) += of_device.o Index: linux-work/arch/ppc64/kernel/signal32.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/signal32.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/signal32.c 2005-01-31 16:25:56.000000000 +1100 @@ -31,6 +31,7 @@ #include #include #include +#include #define DEBUG_SIG 0 @@ -656,18 +657,24 @@ /* Save user registers on the stack */ frame = &rt_sf->uc.uc_mcontext; - if (save_user_regs(regs, frame, __NR_rt_sigreturn)) - goto badframe; - if (put_user(regs->gpr[1], (unsigned long __user *)newsp)) goto badframe; + + if (vdso32_rt_sigtramp && current->thread.vdso_base) { + if (save_user_regs(regs, frame, 0)) + goto badframe; + regs->link = current->thread.vdso_base + vdso32_rt_sigtramp; + } else { + if (save_user_regs(regs, frame, __NR_rt_sigreturn)) + goto badframe; + regs->link = (unsigned long) frame->tramp; + } regs->gpr[1] = (unsigned long) newsp; regs->gpr[3] = sig; regs->gpr[4] = (unsigned long) &rt_sf->info; regs->gpr[5] = (unsigned long) &rt_sf->uc; regs->gpr[6] = (unsigned long) rt_sf; regs->nip = (unsigned long) ka->sa.sa_handler; - regs->link = (unsigned long) frame->tramp; regs->trap = 0; regs->result = 0; @@ -825,8 +832,15 @@ || __put_user(sig, &sc->signal)) goto badframe; - if (save_user_regs(regs, &frame->mctx, __NR_sigreturn)) - goto badframe; + if (vdso32_sigtramp && current->thread.vdso_base) { + if 
(save_user_regs(regs, &frame->mctx, 0)) + goto badframe; + regs->link = current->thread.vdso_base + vdso32_sigtramp; + } else { + if (save_user_regs(regs, &frame->mctx, __NR_sigreturn)) + goto badframe; + regs->link = (unsigned long) frame->mctx.tramp; + } if (put_user(regs->gpr[1], (unsigned long __user *)newsp)) goto badframe; @@ -834,7 +848,6 @@ regs->gpr[3] = sig; regs->gpr[4] = (unsigned long) sc; regs->nip = (unsigned long) ka->sa.sa_handler; - regs->link = (unsigned long) frame->mctx.tramp; regs->trap = 0; regs->result = 0; Index: linux-work/arch/ppc64/kernel/setup.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/setup.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/setup.c 2005-01-31 16:25:56.000000000 +1100 @@ -990,6 +990,34 @@ } /* + * Called from setup_arch to initialize the bitmap of available + * syscalls in the systemcfg page + */ +void __init setup_syscall_map(void) +{ + unsigned int i, count64 = 0, count32 = 0; + extern unsigned long *sys_call_table; + extern unsigned long *sys_call_table32; + extern unsigned long sys_ni_syscall; + + + for (i = 0; i < __NR_syscalls; i++) { + if (sys_call_table[i] == sys_ni_syscall) + continue; + count64++; + systemcfg->syscall_map_64[i >> 5] |= 0x80000000UL >> (i & 0x1f); + } + for (i = 0; i < __NR_syscalls; i++) { + if (sys_call_table32[i] == sys_ni_syscall) + continue; + count32++; + systemcfg->syscall_map_32[i >> 5] |= 0x80000000UL >> (i & 0x1f); + } + printk(KERN_INFO "Syscall map setup, %d 32 bits and %d 64 bits syscalls\n", + count32, count64); +} + +/* * Called into from start_kernel, after lock_kernel has been called. * Initializes bootmem, which is unsed to manage page allocation until * mem_init is called. 
@@ -1027,6 +1055,9 @@ /* set up the bootmem stuff with available memory */ do_init_bootmem(); + /* initialize the syscall map in systemcfg */ + setup_syscall_map(); + ppc_md.setup_arch(); /* Select the correct idle loop for the platform. */ Index: linux-work/arch/ppc64/kernel/signal.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/signal.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/signal.c 2005-01-31 16:25:56.000000000 +1100 @@ -34,6 +34,7 @@ #include #include #include +#include #define DEBUG_SIG 0 @@ -426,10 +427,14 @@ goto badframe; /* Set up to return from userspace. */ - err |= setup_trampoline(__NR_rt_sigreturn, &frame->tramp[0]); - if (err) - goto badframe; - + if (vdso64_rt_sigtramp && current->thread.vdso_base) { + regs->link = current->thread.vdso_base + vdso64_rt_sigtramp; + } else { + err |= setup_trampoline(__NR_rt_sigreturn, &frame->tramp[0]); + if (err) + goto badframe; + regs->link = (unsigned long) &frame->tramp[0]; + } funct_desc_ptr = (func_descr_t __user *) ka->sa.sa_handler; /* Allocate a dummy caller frame for the signal handler. */ @@ -438,7 +443,6 @@ /* Set up "regs" so we "return" to the signal handler. 
*/ err |= get_user(regs->nip, &funct_desc_ptr->entry); - regs->link = (unsigned long) &frame->tramp[0]; regs->gpr[1] = newsp; err |= get_user(regs->gpr[2], &funct_desc_ptr->toc); regs->gpr[3] = signr; Index: linux-work/arch/ppc64/kernel/smp.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/smp.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/smp.c 2005-01-31 16:25:56.000000000 +1100 @@ -383,7 +383,7 @@ * For now we leave it which means the time can be some * number of msecs off until someone does a settimeofday() */ - do_gtod.tb_orig_stamp = tb_last_stamp; + do_gtod.varp->tb_orig_stamp = tb_last_stamp; systemcfg->tb_orig_stamp = tb_last_stamp; #endif Index: linux-work/arch/ppc64/kernel/time.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/time.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/time.c 2005-01-31 16:25:56.000000000 +1100 @@ -86,8 +86,6 @@ unsigned long tb_ticks_per_jiffy; unsigned long tb_ticks_per_usec = 100; /* sane default */ unsigned long tb_ticks_per_sec; -unsigned long next_xtime_sync_tb; -unsigned long xtime_sync_interval; unsigned long tb_to_xs; unsigned tb_to_us; unsigned long processor_freq; @@ -158,8 +156,8 @@ * The conversion to microseconds at the end is done * without a divide (and in fact, without a multiply) */ - tb_ticks = tb_val - do_gtod.tb_orig_stamp; temp_varp = do_gtod.varp; + tb_ticks = tb_val - temp_varp->tb_orig_stamp; temp_tb_to_xs = temp_varp->tb_to_xs; temp_stamp_xsec = temp_varp->stamp_xsec; tb_xsec = mulhdu( tb_ticks, temp_tb_to_xs ); @@ -185,17 +183,55 @@ { struct timeval my_tv; - if (cur_tb > next_xtime_sync_tb) { - next_xtime_sync_tb = cur_tb + xtime_sync_interval; - __do_gettimeofday(&my_tv, cur_tb); - - if (xtime.tv_sec <= my_tv.tv_sec) { - xtime.tv_sec = my_tv.tv_sec; - xtime.tv_nsec = my_tv.tv_usec * 1000; - } + __do_gettimeofday(&my_tv, cur_tb); + + if 
(xtime.tv_sec <= my_tv.tv_sec) { + xtime.tv_sec = my_tv.tv_sec; + xtime.tv_nsec = my_tv.tv_usec * 1000; } } +/* + * When the timebase - tb_orig_stamp gets too big, we do a manipulation + * between tb_orig_stamp and stamp_xsec. The goal here is to keep the + * difference tb - tb_orig_stamp small enough to always fit inside a + * 32 bits number. This is a requirement of our fast 32 bits userland + * implementation in the vdso. If we "miss" a call to this function + * (interrupt latency, CPU locked in a spinlock, ...) and we end up + * with a too big difference, then the vdso will fallback to calling + * the syscall + */ +static __inline__ void timer_recalc_offset(unsigned long cur_tb) +{ + struct gettimeofday_vars * temp_varp; + unsigned temp_idx; + unsigned long offset, new_stamp_xsec, new_tb_orig_stamp; + + if (((cur_tb - do_gtod.varp->tb_orig_stamp) & 0x80000000u) == 0) + return; + + temp_idx = (do_gtod.var_idx == 0); + temp_varp = &do_gtod.vars[temp_idx]; + + new_tb_orig_stamp = cur_tb; + offset = new_tb_orig_stamp - do_gtod.varp->tb_orig_stamp; + new_stamp_xsec = do_gtod.varp->stamp_xsec + mulhdu(offset, do_gtod.varp->tb_to_xs); + + temp_varp->tb_to_xs = do_gtod.varp->tb_to_xs; + temp_varp->tb_orig_stamp = new_tb_orig_stamp; + temp_varp->stamp_xsec = new_stamp_xsec; + mb(); + do_gtod.varp = temp_varp; + do_gtod.var_idx = temp_idx; + + ++(systemcfg->tb_update_count); + wmb(); + systemcfg->tb_orig_stamp = new_tb_orig_stamp; + systemcfg->stamp_xsec = new_stamp_xsec; + wmb(); + ++(systemcfg->tb_update_count); +} + #ifdef CONFIG_SMP unsigned long profile_pc(struct pt_regs *regs) { @@ -311,6 +347,7 @@ if (cpu == boot_cpuid) { write_seqlock(&xtime_lock); tb_last_stamp = lpaca->next_jiffy_update_tb; + timer_recalc_offset(lpaca->next_jiffy_update_tb); do_timer(regs); timer_sync_xtime(lpaca->next_jiffy_update_tb); timer_check_rtc(); @@ -398,7 +435,9 @@ time_maxerror = NTP_PHASE_LIMIT; time_esterror = NTP_PHASE_LIMIT; - delta_xsec = mulhdu( 
(tb_last_stamp-do_gtod.tb_orig_stamp), do_gtod.varp->tb_to_xs ); + delta_xsec = mulhdu( (tb_last_stamp-do_gtod.varp->tb_orig_stamp), + do_gtod.varp->tb_to_xs ); + new_xsec = (new_nsec * XSEC_PER_SEC) / NSEC_PER_SEC; new_xsec += new_sec * XSEC_PER_SEC; if ( new_xsec > delta_xsec ) { @@ -411,7 +450,7 @@ * before 1970 ... eg. we booted ten days ago, and we are setting * the time to Jan 5, 1970 */ do_gtod.varp->stamp_xsec = new_xsec; - do_gtod.tb_orig_stamp = tb_last_stamp; + do_gtod.varp->tb_orig_stamp = tb_last_stamp; systemcfg->stamp_xsec = new_xsec; systemcfg->tb_orig_stamp = tb_last_stamp; } @@ -464,9 +503,9 @@ xtime.tv_sec = mktime(tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday, tm.tm_hour, tm.tm_min, tm.tm_sec); tb_last_stamp = get_tb(); - do_gtod.tb_orig_stamp = tb_last_stamp; do_gtod.varp = &do_gtod.vars[0]; do_gtod.var_idx = 0; + do_gtod.varp->tb_orig_stamp = tb_last_stamp; do_gtod.varp->stamp_xsec = xtime.tv_sec * XSEC_PER_SEC; do_gtod.tb_ticks_per_sec = tb_ticks_per_sec; do_gtod.varp->tb_to_xs = tb_to_xs; @@ -477,9 +516,6 @@ systemcfg->stamp_xsec = xtime.tv_sec * XSEC_PER_SEC; systemcfg->tb_to_xs = tb_to_xs; - xtime_sync_interval = tb_ticks_per_sec - (tb_ticks_per_sec/8); - next_xtime_sync_tb = tb_last_stamp + xtime_sync_interval; - time_freq = 0; xtime.tv_nsec = 0; @@ -584,12 +620,12 @@ stamp_xsec which is the time (in 1/2^20 second units) corresponding to tb_orig_stamp. 
This new value of stamp_xsec compensates for the change in frequency (implied by the new tb_to_xs) which guarantees that the current time remains the same */ - tb_ticks = get_tb() - do_gtod.tb_orig_stamp; + write_seqlock_irqsave( &xtime_lock, flags ); + tb_ticks = get_tb() - do_gtod.varp->tb_orig_stamp; div128_by_32( 1024*1024, 0, new_tb_ticks_per_sec, &divres ); new_tb_to_xs = divres.result_low; new_xsec = mulhdu( tb_ticks, new_tb_to_xs ); - write_seqlock_irqsave( &xtime_lock, flags ); old_xsec = mulhdu( tb_ticks, do_gtod.varp->tb_to_xs ); new_stamp_xsec = do_gtod.varp->stamp_xsec + old_xsec - new_xsec; @@ -597,16 +633,12 @@ values in do_gettimeofday. We alternate the copies and as long as a reasonable time elapses between changes, there will never be inconsistent values. ntpd has a minimum of one minute between updates */ - if (do_gtod.var_idx == 0) { - temp_varp = &do_gtod.vars[1]; - temp_idx = 1; - } - else { - temp_varp = &do_gtod.vars[0]; - temp_idx = 0; - } + temp_idx = (do_gtod.var_idx == 0); + temp_varp = &do_gtod.vars[temp_idx]; + temp_varp->tb_to_xs = new_tb_to_xs; temp_varp->stamp_xsec = new_stamp_xsec; + temp_varp->tb_orig_stamp = do_gtod.varp->tb_orig_stamp; mb(); do_gtod.varp = temp_varp; do_gtod.var_idx = temp_idx; Index: linux-work/arch/ppc64/kernel/vdso.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso.c 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,614 @@ +/* + * linux/arch/ppc64/kernel/vdso.c + * + * Copyright (C) 2004 Benjamin Herrenschmidt, IBM Corp. + * + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#undef DEBUG + +#ifdef DEBUG +#define DBG(fmt...) printk(fmt) +#else +#define DBG(fmt...) +#endif + + +/* + * The vDSOs themselves are here + */ +extern char vdso64_start, vdso64_end; +extern char vdso32_start, vdso32_end; + +static void *vdso64_kbase = &vdso64_start; +static void *vdso32_kbase = &vdso32_start; + +unsigned int vdso64_pages; +unsigned int vdso32_pages; + +/* Signal trampolines user addresses */ + +unsigned long vdso64_rt_sigtramp; +unsigned long vdso32_sigtramp; +unsigned long vdso32_rt_sigtramp; + +/* Format of the patch table */ +struct vdso_patch_def +{ + u32 pvr_mask, pvr_value; + const char *gen_name; + const char *fix_name; +}; + +/* Table of functions to patch based on the CPU type/revision + * + * TODO: Improve by adding whole lists for each entry + */ +static struct vdso_patch_def vdso_patches[] = { + { + 0xffff0000, 0x003a0000, /* POWER5 */ + "__kernel_sync_dicache", "__kernel_sync_dicache_p5" + }, + { + 0xffff0000, 0x003b0000, /* POWER5 */ + "__kernel_sync_dicache", "__kernel_sync_dicache_p5" + }, +}; + +/* + * Some infos carried around for each of them during parsing at + * boot time. 
+ */
+struct lib32_elfinfo
+{
+	Elf32_Ehdr *hdr; /* ptr to ELF */
+	Elf32_Sym *dynsym; /* ptr to .dynsym section */
+	unsigned long dynsymsize; /* size of .dynsym section */
+	char *dynstr; /* ptr to .dynstr section */
+	unsigned long text; /* offset of .text section in .so */
+};
+
+struct lib64_elfinfo
+{
+	Elf64_Ehdr *hdr;
+	Elf64_Sym *dynsym;
+	unsigned long dynsymsize;
+	char *dynstr;
+	unsigned long text;
+};
+
+
+#ifdef __DEBUG
+static void dump_one_vdso_page(struct page *pg, struct page *upg)
+{
+	printk("kpg: %p (c:%d,f:%08lx)", __va(page_to_pfn(pg) << PAGE_SHIFT),
+	       page_count(pg),
+	       pg->flags);
+	if (upg/* && pg != upg*/) {
+		printk(" upg: %p (c:%d,f:%08lx)", __va(page_to_pfn(upg) << PAGE_SHIFT),
+		       page_count(upg),
+		       upg->flags);
+	}
+	printk("\n");
+}
+
+static void dump_vdso_pages(struct vm_area_struct * vma)
+{
+	int i;
+
+	if (!vma || test_thread_flag(TIF_32BIT)) {
+		printk("vDSO32 @ %016lx:\n", (unsigned long)vdso32_kbase);
+		for (i=0; i<vdso32_pages; i++) {
+			struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE);
+			struct page *upg = (vma && vma->vm_mm) ?
+				follow_page(vma->vm_mm, vma->vm_start + i*PAGE_SIZE, 0)
+				: NULL;
+			dump_one_vdso_page(pg, upg);
+		}
+	}
+	if (!vma || !test_thread_flag(TIF_32BIT)) {
+		printk("vDSO64 @ %016lx:\n", (unsigned long)vdso64_kbase);
+		for (i=0; i<vdso64_pages; i++) {
+			struct page *pg = virt_to_page(vdso64_kbase + i*PAGE_SIZE);
+			struct page *upg = (vma && vma->vm_mm) ?
+				follow_page(vma->vm_mm, vma->vm_start + i*PAGE_SIZE, 0)
+				: NULL;
+			dump_one_vdso_page(pg, upg);
+		}
+	}
+}
+#endif /* DEBUG */
+
+/*
+ * Keep a dummy vma_close for now, it will prevent VMA merging.
+ */
+static void vdso_vma_close(struct vm_area_struct * vma)
+{
+}
+
+/*
+ * Our nopage() function, maps in the actual vDSO kernel pages, they will
+ * be mapped read-only by do_no_page(), and eventually COW'ed, either
+ * right away for an initial write access, or by do_wp_page().
+ */
+static struct page * vdso_vma_nopage(struct vm_area_struct * vma,
+				     unsigned long address, int *type)
+{
+	unsigned long offset = address - vma->vm_start;
+	struct page *pg;
+	void *vbase = test_thread_flag(TIF_32BIT) ?
vdso32_kbase : vdso64_kbase;
+
+	DBG("vdso_vma_nopage(current: %s, address: %016lx, off: %lx)\n",
+	    current->comm, address, offset);
+
+	if (address < vma->vm_start || address > vma->vm_end)
+		return NOPAGE_SIGBUS;
+
+	/*
+	 * Last page is systemcfg, special handling here, no get_page() as
+	 * this is a reserved page
+	 */
+	if ((vma->vm_end - address) <= PAGE_SIZE)
+		return virt_to_page(systemcfg);
+
+	pg = virt_to_page(vbase + offset);
+	get_page(pg);
+	DBG(" ->page count: %d\n", page_count(pg));
+
+	return pg;
+}
+
+static struct vm_operations_struct vdso_vmops = {
+	.close = vdso_vma_close,
+	.nopage = vdso_vma_nopage,
+};
+
+/*
+ * This is called from binfmt_elf, we create the special vma for the
+ * vDSO and insert it into the mm struct tree
+ */
+int arch_setup_additional_pages(struct linux_binprm *bprm, int executable_stack)
+{
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+	unsigned long vdso_pages;
+	unsigned long vdso_base;
+
+	if (test_thread_flag(TIF_32BIT)) {
+		vdso_pages = vdso32_pages;
+		vdso_base = VDSO32_MBASE;
+	} else {
+		vdso_pages = vdso64_pages;
+		vdso_base = VDSO64_MBASE;
+	}
+
+	/* vDSO has a problem and was disabled, just don't "enable" it for the
+	 * process
+	 */
+	if (vdso_pages == 0) {
+		current->thread.vdso_base = 0;
+		return 0;
+	}
+	vma = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
+	if (vma == NULL)
+		return -ENOMEM;
+	if (security_vm_enough_memory(vdso_pages)) {
+		kmem_cache_free(vm_area_cachep, vma);
+		return -ENOMEM;
+	}
+	memset(vma, 0, sizeof(*vma));
+
+	/*
+	 * pick a base address for the vDSO in process space. We have a default
+	 * base of 1Mb on which we add a random offset up to 1Mb.
+ * XXX: Add possibility for a program header to specify that location + */ + current->thread.vdso_base = vdso_base; + /* + ((unsigned long)vma & 0x000ff000); */ + + vma->vm_mm = mm; + vma->vm_start = current->thread.vdso_base; + + /* + * the VMA size is one page more than the vDSO since systemcfg + * is mapped in the last one + */ + vma->vm_end = vma->vm_start + ((vdso_pages + 1) << PAGE_SHIFT); + + /* + * our vma flags don't have VM_WRITE so by default, the process isn't allowed + * to write those pages. + * gdb can break that with ptrace interface, and thus trigger COW on those + * pages but it's then your responsibility to never do that on the "data" page + * of the vDSO or you'll stop getting kernel updates and your nice userland + * gettimeofday will be totally dead. It's fine to use that for setting + * breakpoints in the vDSO code pages though + */ + vma->vm_flags = VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; + vma->vm_flags |= mm->def_flags; + vma->vm_page_prot = protection_map[vma->vm_flags & 0x7]; + vma->vm_ops = &vdso_vmops; + + down_write(&mm->mmap_sem); + insert_vm_struct(mm, vma); + mm->total_vm += (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; + up_write(&mm->mmap_sem); + + return 0; +} + +static void * __init find_section32(Elf32_Ehdr *ehdr, const char *secname, + unsigned long *size) +{ + Elf32_Shdr *sechdrs; + unsigned int i; + char *secnames; + + /* Grab section headers and strings so we can tell who is who */ + sechdrs = (void *)ehdr + ehdr->e_shoff; + secnames = (void *)ehdr + sechdrs[ehdr->e_shstrndx].sh_offset; + + /* Find the section they want */ + for (i = 1; i < ehdr->e_shnum; i++) { + if (strcmp(secnames+sechdrs[i].sh_name, secname) == 0) { + if (size) + *size = sechdrs[i].sh_size; + return (void *)ehdr + sechdrs[i].sh_offset; + } + } + *size = 0; + return NULL; +} + +static void * __init find_section64(Elf64_Ehdr *ehdr, const char *secname, + unsigned long *size) +{ + Elf64_Shdr *sechdrs; + unsigned int i; + char *secnames; 
+ + /* Grab section headers and strings so we can tell who is who */ + sechdrs = (void *)ehdr + ehdr->e_shoff; + secnames = (void *)ehdr + sechdrs[ehdr->e_shstrndx].sh_offset; + + /* Find the section they want */ + for (i = 1; i < ehdr->e_shnum; i++) { + if (strcmp(secnames+sechdrs[i].sh_name, secname) == 0) { + if (size) + *size = sechdrs[i].sh_size; + return (void *)ehdr + sechdrs[i].sh_offset; + } + } + if (size) + *size = 0; + return NULL; +} + +static Elf32_Sym * __init find_symbol32(struct lib32_elfinfo *lib, const char *symname) +{ + unsigned int i; + char name[32], *c; + + for (i = 0; i < (lib->dynsymsize / sizeof(Elf32_Sym)); i++) { + if (lib->dynsym[i].st_name == 0) + continue; + strlcpy(name, lib->dynstr + lib->dynsym[i].st_name, 32); + c = strchr(name, '@'); + if (c) + *c = 0; + if (strcmp(symname, name) == 0) + return &lib->dynsym[i]; + } + return NULL; +} + +static Elf64_Sym * __init find_symbol64(struct lib64_elfinfo *lib, const char *symname) +{ + unsigned int i; + char name[32], *c; + + for (i = 0; i < (lib->dynsymsize / sizeof(Elf64_Sym)); i++) { + if (lib->dynsym[i].st_name == 0) + continue; + strlcpy(name, lib->dynstr + lib->dynsym[i].st_name, 32); + c = strchr(name, '@'); + if (c) + *c = 0; + if (strcmp(symname, name) == 0) + return &lib->dynsym[i]; + } + return NULL; +} + +/* Note that we assume the section is .text and the symbol is relative to + * the library base + */ +static unsigned long __init find_function32(struct lib32_elfinfo *lib, const char *symname) +{ + Elf32_Sym *sym = find_symbol32(lib, symname); + + if (sym == NULL) { + printk(KERN_WARNING "vDSO32: function %s not found !\n", symname); + return 0; + } + return sym->st_value - VDSO32_LBASE; +} + +/* Note that we assume the section is .text and the symbol is relative to + * the library base + */ +static unsigned long __init find_function64(struct lib64_elfinfo *lib, const char *symname) +{ + Elf64_Sym *sym = find_symbol64(lib, symname); + + if (sym == NULL) { + 
printk(KERN_WARNING "vDSO64: function %s not found !\n", symname); + return 0; + } +#ifdef VDS64_HAS_DESCRIPTORS + return *((u64 *)(vdso64_kbase + sym->st_value - VDSO64_LBASE)) - VDSO64_LBASE; +#else + return sym->st_value - VDSO64_LBASE; +#endif +} + + +static __init int vdso_do_find_sections(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64) +{ + void *sect; + + /* + * Locate symbol tables & text section + */ + + v32->dynsym = find_section32(v32->hdr, ".dynsym", &v32->dynsymsize); + v32->dynstr = find_section32(v32->hdr, ".dynstr", NULL); + if (v32->dynsym == NULL || v32->dynstr == NULL) { + printk(KERN_ERR "vDSO32: a required symbol section was not found\n"); + return -1; + } + sect = find_section32(v32->hdr, ".text", NULL); + if (sect == NULL) { + printk(KERN_ERR "vDSO32: the .text section was not found\n"); + return -1; + } + v32->text = sect - vdso32_kbase; + + v64->dynsym = find_section64(v64->hdr, ".dynsym", &v64->dynsymsize); + v64->dynstr = find_section64(v64->hdr, ".dynstr", NULL); + if (v64->dynsym == NULL || v64->dynstr == NULL) { + printk(KERN_ERR "vDSO64: a required symbol section was not found\n"); + return -1; + } + sect = find_section64(v64->hdr, ".text", NULL); + if (sect == NULL) { + printk(KERN_ERR "vDSO64: the .text section was not found\n"); + return -1; + } + v64->text = sect - vdso64_kbase; + + return 0; +} + +static __init void vdso_setup_trampolines(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64) +{ + /* + * Find signal trampolines + */ + + vdso64_rt_sigtramp = find_function64(v64, "__kernel_sigtramp_rt64"); + vdso32_sigtramp = find_function32(v32, "__kernel_sigtramp32"); + vdso32_rt_sigtramp = find_function32(v32, "__kernel_sigtramp_rt32"); +} + +static __init int vdso_fixup_datapage(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64) +{ + Elf32_Sym *sym32; + Elf64_Sym *sym64; + + sym32 = find_symbol32(v32, "__kernel_datapage_offset"); + if (sym32 == NULL) { + printk(KERN_ERR "vDSO32: Can't find symbol 
__kernel_datapage_offset !\n"); + return -1; + } + *((int *)(vdso32_kbase + (sym32->st_value - VDSO32_LBASE))) = + (vdso32_pages << PAGE_SHIFT) - (sym32->st_value - VDSO32_LBASE); + + sym64 = find_symbol64(v64, "__kernel_datapage_offset"); + if (sym64 == NULL) { + printk(KERN_ERR "vDSO64: Can't find symbol __kernel_datapage_offset !\n"); + return -1; + } + *((int *)(vdso64_kbase + sym64->st_value - VDSO64_LBASE)) = + (vdso64_pages << PAGE_SHIFT) - (sym64->st_value - VDSO64_LBASE); + + return 0; +} + +static int vdso_do_func_patch32(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64, + const char *orig, const char *fix) +{ + Elf32_Sym *sym32_gen, *sym32_fix; + + sym32_gen = find_symbol32(v32, orig); + if (sym32_gen == NULL) { + printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", orig); + return -1; + } + sym32_fix = find_symbol32(v32, fix); + if (sym32_fix == NULL) { + printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", fix); + return -1; + } + sym32_gen->st_value = sym32_fix->st_value; + sym32_gen->st_size = sym32_fix->st_size; + sym32_gen->st_info = sym32_fix->st_info; + sym32_gen->st_other = sym32_fix->st_other; + sym32_gen->st_shndx = sym32_fix->st_shndx; + + return 0; +} + +static int vdso_do_func_patch64(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64, + const char *orig, const char *fix) +{ + Elf64_Sym *sym64_gen, *sym64_fix; + + sym64_gen = find_symbol64(v64, orig); + if (sym64_gen == NULL) { + printk(KERN_ERR "vDSO64: Can't find symbol %s !\n", orig); + return -1; + } + sym64_fix = find_symbol64(v64, fix); + if (sym64_fix == NULL) { + printk(KERN_ERR "vDSO64: Can't find symbol %s !\n", fix); + return -1; + } + sym64_gen->st_value = sym64_fix->st_value; + sym64_gen->st_size = sym64_fix->st_size; + sym64_gen->st_info = sym64_fix->st_info; + sym64_gen->st_other = sym64_fix->st_other; + sym64_gen->st_shndx = sym64_fix->st_shndx; + + return 0; +} + +static __init int vdso_fixup_alt_funcs(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64) +{ + u32 
pvr; + int i; + + pvr = mfspr(SPRN_PVR); + for (i = 0; i < ARRAY_SIZE(vdso_patches); i++) { + struct vdso_patch_def *patch = &vdso_patches[i]; + int match = (pvr & patch->pvr_mask) == patch->pvr_value; + + DBG("patch %d (mask: %x, pvr: %x) : %s\n", + i, patch->pvr_mask, patch->pvr_value, match ? "match" : "skip"); + + if (!match) + continue; + + DBG("replacing %s with %s...\n", patch->gen_name, patch->fix_name); + + /* + * Patch the 32 bits and 64 bits symbols. Note that we do not patch + * the "." symbol on 64 bits. It would be easy to do, but doesn't + * seem to be necessary, patching the OPD symbol is enough. + */ + vdso_do_func_patch32(v32, v64, patch->gen_name, patch->fix_name); + vdso_do_func_patch64(v32, v64, patch->gen_name, patch->fix_name); + } + + return 0; +} + + +static __init int vdso_setup(void) +{ + struct lib32_elfinfo v32; + struct lib64_elfinfo v64; + + v32.hdr = vdso32_kbase; + v64.hdr = vdso64_kbase; + + if (vdso_do_find_sections(&v32, &v64)) + return -1; + + if (vdso_fixup_datapage(&v32, &v64)) + return -1; + + if (vdso_fixup_alt_funcs(&v32, &v64)) + return -1; + + vdso_setup_trampolines(&v32, &v64); + + return 0; +} + +void __init vdso_init(void) +{ + int i; + + vdso64_pages = (&vdso64_end - &vdso64_start) >> PAGE_SHIFT; + vdso32_pages = (&vdso32_end - &vdso32_start) >> PAGE_SHIFT; + + DBG("vdso64_kbase: %p, 0x%x pages, vdso32_kbase: %p, 0x%x pages\n", + vdso64_kbase, vdso64_pages, vdso32_kbase, vdso32_pages); + + /* + * Initialize the vDSO images in memory, that is do necessary + * fixups of vDSO symbols, locate trampolines, etc... + */ + if (vdso_setup()) { + printk(KERN_ERR "vDSO setup failure, not enabled !\n"); + /* XXX should free pages here ? 
*/ + vdso64_pages = vdso32_pages = 0; + return; + } + + /* Make sure pages are in the correct state */ + for (i = 0; i < vdso64_pages; i++) { + struct page *pg = virt_to_page(vdso64_kbase + i*PAGE_SIZE); + ClearPageReserved(pg); + get_page(pg); + } + for (i = 0; i < vdso32_pages; i++) { + struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE); + ClearPageReserved(pg); + get_page(pg); + } +} + +int in_gate_area_no_task(unsigned long addr) +{ + return 0; +} + +int in_gate_area(struct task_struct *task, unsigned long addr) +{ + return 0; +} + +struct vm_area_struct *get_gate_vma(struct task_struct *tsk) +{ + return NULL; +} + Index: linux-work/include/asm-ppc64/processor.h =================================================================== --- linux-work.orig/include/asm-ppc64/processor.h 2005-01-31 14:18:44.000000000 +1100 +++ linux-work/include/asm-ppc64/processor.h 2005-01-31 16:25:56.000000000 +1100 @@ -544,8 +544,8 @@ /* This decides where the kernel will search for a free chunk of vm * space during mmap's. */ -#define TASK_UNMAPPED_BASE_USER32 (PAGE_ALIGN(STACK_TOP_USER32 / 4)) -#define TASK_UNMAPPED_BASE_USER64 (PAGE_ALIGN(STACK_TOP_USER64 / 4)) +#define TASK_UNMAPPED_BASE_USER32 (PAGE_ALIGN(TASK_SIZE_USER32 / 4)) +#define TASK_UNMAPPED_BASE_USER64 (PAGE_ALIGN(TASK_SIZE_USER64 / 4)) #define TASK_UNMAPPED_BASE ((test_thread_flag(TIF_32BIT)||(ppcdebugset(PPCDBG_BINFMT_32ADDR))) ? 
\ TASK_UNMAPPED_BASE_USER32 : TASK_UNMAPPED_BASE_USER64 ) @@ -562,7 +562,8 @@ double fpr[32]; /* Complete floating point set */ unsigned long fpscr; /* Floating point status (plus pad) */ unsigned long fpexc_mode; /* Floating-point exception mode */ - unsigned long pad[3]; /* was saved_msr, saved_softe */ + unsigned long pad[2]; /* was saved_msr, saved_softe */ + unsigned long vdso_base; /* base of the vDSO library */ #ifdef CONFIG_ALTIVEC /* Complete AltiVec register set */ vector128 vr[32] __attribute((aligned(16))); Index: linux-work/include/asm-ppc64/systemcfg.h =================================================================== --- linux-work.orig/include/asm-ppc64/systemcfg.h 2005-01-31 15:56:55.000000000 +1100 +++ linux-work/include/asm-ppc64/systemcfg.h 2005-01-31 16:25:56.000000000 +1100 @@ -20,10 +20,14 @@ * Minor version changes are a hint. */ #define SYSTEMCFG_MAJOR 1 -#define SYSTEMCFG_MINOR 0 +#define SYSTEMCFG_MINOR 1 #ifndef __ASSEMBLY__ +#include + +#define SYSCALL_MAP_SIZE ((__NR_syscalls + 31) / 32) + struct systemcfg { __u8 eye_catcher[16]; /* Eyecatcher: SYSTEMCFG:PPC64 0x00 */ struct { /* Systemcfg version numbers */ @@ -47,6 +51,8 @@ __u32 dcache_line_size; /* L1 d-cache line size 0x64 */ __u32 icache_size; /* L1 i-cache size 0x68 */ __u32 icache_line_size; /* L1 i-cache line size 0x6C */ + __u32 syscall_map_64[SYSCALL_MAP_SIZE]; /* map of available syscalls 0x70 */ + __u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of available syscalls */ }; #ifdef __KERNEL__ Index: linux-work/include/asm-ppc64/a.out.h =================================================================== --- linux-work.orig/include/asm-ppc64/a.out.h 2005-01-31 14:18:44.000000000 +1100 +++ linux-work/include/asm-ppc64/a.out.h 2005-01-31 16:25:56.000000000 +1100 @@ -30,14 +30,11 @@ #ifdef __KERNEL__ -#define STACK_TOP_USER64 (TASK_SIZE_USER64) +#define STACK_TOP_USER64 TASK_SIZE_USER64 +#define STACK_TOP_USER32 TASK_SIZE_USER32 -/* Give 32-bit user space a full 4G address space 
to live in. */ -#define STACK_TOP_USER32 (TASK_SIZE_USER32) - -#define STACK_TOP ((test_thread_flag(TIF_32BIT) || \ - (ppcdebugset(PPCDBG_BINFMT_32ADDR))) ? \ - STACK_TOP_USER32 : STACK_TOP_USER64) +#define STACK_TOP (test_thread_flag(TIF_32BIT) ? \ + STACK_TOP_USER32 : STACK_TOP_USER64) #endif /* __KERNEL__ */ Index: linux-work/include/asm-ppc64/elf.h =================================================================== --- linux-work.orig/include/asm-ppc64/elf.h 2005-01-31 14:18:44.000000000 +1100 +++ linux-work/include/asm-ppc64/elf.h 2005-01-31 16:25:56.000000000 +1100 @@ -238,10 +238,20 @@ /* A special ignored type value for PPC, for glibc compatibility. */ #define AT_IGNOREPPC 22 +/* The vDSO location. We have to use the same value as x86 for glibc's + * sake :-) + */ +#define AT_SYSINFO_EHDR 33 + extern int dcache_bsize; extern int icache_bsize; extern int ucache_bsize; +/* We do have an arch_setup_additional_pages for vDSO matters */ +#define ARCH_HAS_SETUP_ADDITIONAL_PAGES +struct linux_binprm; +extern int arch_setup_additional_pages(struct linux_binprm *bprm, int executable_stack); + /* * The requirements here are: * - keep the final alignment of sp (sp & 0xf) @@ -260,6 +270,8 @@ NEW_AUX_ENT(AT_DCACHEBSIZE, dcache_bsize); \ NEW_AUX_ENT(AT_ICACHEBSIZE, icache_bsize); \ NEW_AUX_ENT(AT_UCACHEBSIZE, ucache_bsize); \ + /* vDSO base */ \ + NEW_AUX_ENT(AT_SYSINFO_EHDR, current->thread.vdso_base); \ } while (0) /* PowerPC64 relocations defined by the ABIs */ Index: linux-work/include/asm-ppc64/time.h =================================================================== --- linux-work.orig/include/asm-ppc64/time.h 2005-01-31 14:18:44.000000000 +1100 +++ linux-work/include/asm-ppc64/time.h 2005-01-31 16:25:56.000000000 +1100 @@ -43,10 +43,10 @@ struct gettimeofday_vars { unsigned long tb_to_xs; unsigned long stamp_xsec; + unsigned long tb_orig_stamp; }; struct gettimeofday_struct { - unsigned long tb_orig_stamp; unsigned long tb_ticks_per_sec; struct gettimeofday_vars 
vars[2]; struct gettimeofday_vars * volatile varp; Index: linux-work/fs/binfmt_elf.c =================================================================== --- linux-work.orig/fs/binfmt_elf.c 2005-01-31 14:18:24.000000000 +1100 +++ linux-work/fs/binfmt_elf.c 2005-01-31 16:25:56.000000000 +1100 @@ -772,6 +772,14 @@ goto out_free_dentry; } +#ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES + retval = arch_setup_additional_pages(bprm, executable_stack); + if (retval < 0) { + send_sig(SIGKILL, current, 0); + goto out_free_dentry; + } +#endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */ + current->mm->start_stack = bprm->p; /* Now we do a little grungy work by mmaping the ELF image into Index: linux-work/include/asm-ppc64/page.h =================================================================== --- linux-work.orig/include/asm-ppc64/page.h 2005-01-31 14:18:44.000000000 +1100 +++ linux-work/include/asm-ppc64/page.h 2005-01-31 16:25:56.000000000 +1100 @@ -185,6 +185,9 @@ extern u64 ppc64_pft_size; /* Log 2 of page table size */ +/* We do define AT_SYSINFO_EHDR but don't use the gate mechanism */ +#define __HAVE_ARCH_GATE_AREA 1 + #endif /* __ASSEMBLY__ */ #ifdef MODULE Index: linux-work/include/asm-ppc64/vdso.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/include/asm-ppc64/vdso.h 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,83 @@ +#ifndef __PPC64_VDSO_H__ +#define __PPC64_VDSO_H__ + +#ifdef __KERNEL__ + +/* Default link addresses for the vDSOs */ +#define VDSO32_LBASE 0 +#define VDSO64_LBASE 0 + +/* Default map addresses */ +#define VDSO32_MBASE 0x100000 +#define VDSO64_MBASE 0x100000 + +#define VDSO_VERSION_STRING LINUX_2.6.11 + +/* Define if 64 bits VDSO has procedure descriptors */ +#undef VDS64_HAS_DESCRIPTORS + +#ifndef __ASSEMBLY__ + +extern unsigned int vdso64_pages; +extern unsigned int vdso32_pages; + +/* Offsets relative to thread->vdso_base */ +extern unsigned long vdso64_rt_sigtramp; +extern 
unsigned long vdso32_sigtramp; +extern unsigned long vdso32_rt_sigtramp; + +extern void vdso_init(void); + +#else /* __ASSEMBLY__ */ + +#ifdef __VDSO64__ +#ifdef VDS64_HAS_DESCRIPTORS +#define V_FUNCTION_BEGIN(name) \ + .globl name; \ + .section ".opd","a"; \ + .align 3; \ + name: \ + .quad .name,.TOC.@tocbase,0; \ + .previous; \ + .globl .name; \ + .type .name,@function; \ + .name: \ + +#define V_FUNCTION_END(name) \ + .size .name,.-.name; + +#define V_LOCAL_FUNC(name) (.name) + +#else /* VDS64_HAS_DESCRIPTORS */ + +#define V_FUNCTION_BEGIN(name) \ + .globl name; \ + name: \ + +#define V_FUNCTION_END(name) \ + .size name,.-name; + +#define V_LOCAL_FUNC(name) (name) + +#endif /* VDS64_HAS_DESCRIPTORS */ +#endif /* __VDSO64__ */ + +#ifdef __VDSO32__ + +#define V_FUNCTION_BEGIN(name) \ + .globl name; \ + .type name,@function; \ + name: \ + +#define V_FUNCTION_END(name) \ + .size name,.-name; + +#define V_LOCAL_FUNC(name) (name) + +#endif /* __VDSO32__ */ + +#endif /* __ASSEMBLY__ */ + +#endif /* __KERNEL__ */ + +#endif /* __PPC64_VDSO_H__ */ Index: linux-work/arch/ppc64/mm/init.c =================================================================== --- linux-work.orig/arch/ppc64/mm/init.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/mm/init.c 2005-01-31 16:25:56.000000000 +1100 @@ -62,6 +62,7 @@ #include #include #include +#include int mem_init_done; unsigned long ioremap_bot = IMALLOC_BASE; @@ -743,6 +744,8 @@ #ifdef CONFIG_PPC_ISERIES iommu_vio_init(); #endif + /* Initialize the vDSO */ + vdso_init(); } /* Index: linux-work/arch/ppc64/kernel/vdso32/gettimeofday.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/gettimeofday.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,139 @@ +/* + * Userland implementation of gettimeofday() for 32 bits processes in a + * ppc64 kernel for use in the vDSO + * + * Copyright (C) 2004 Benjamin 
Herrenschmidt (benh at kernel.crashing.org), IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include +#include +#include +#include + + .text +/* + * Exact prototype of gettimeofday + * + * int __kernel_gettimeofday(struct timeval *tv, struct timezone *tz); + * + */ +V_FUNCTION_BEGIN(__kernel_gettimeofday) + .cfi_startproc + mflr r12 + .cfi_register lr,r12 + + mr r10,r3 /* r10 saves tv */ + mr r11,r4 /* r11 saves tz */ + bl __get_datapage@local /* get data page */ + mr r9, r3 /* datapage ptr in r9 */ + bl __do_get_xsec@local /* get xsec from tb & kernel */ + bne- 2f /* out of line -> do syscall */ + + /* seconds are xsec >> 20 */ + rlwinm r5,r4,12,20,31 + rlwimi r5,r3,12,0,19 + stw r5,TVAL32_TV_SEC(r10) + + /* get remaining xsec and convert to usec. we scale + * up remaining xsec by 12 bits and get the top 32 bits + * of the multiplication + */ + rlwinm r5,r4,12,0,19 + lis r6,1000000@h + ori r6,r6,1000000@l + mulhwu r5,r5,r6 + stw r5,TVAL32_TV_USEC(r10) + + cmpli cr0,r11,0 /* check if tz is NULL */ + beq 1f + lwz r4,CFG_TZ_MINUTEWEST(r9)/* fill tz */ + lwz r5,CFG_TZ_DSTTIME(r9) + stw r4,TZONE_TZ_MINWEST(r11) + stw r5,TZONE_TZ_DSTTIME(r11) + +1: mtlr r12 + blr + +2: mr r3,r10 + mr r4,r11 + li r0,__NR_gettimeofday + sc + b 1b + .cfi_endproc +V_FUNCTION_END(__kernel_gettimeofday) + +/* + * This is the core of gettimeofday(), it returns the xsec + * value in r3 & r4 and expects the datapage ptr (non clobbered) + * in r9. clobbers r0,r4,r5,r6,r7,r8 +*/ +__do_get_xsec: + .cfi_startproc + /* Check for update count & load values. We use the low + * order 32 bits of the update count + */ +1: lwz r8,(CFG_TB_UPDATE_COUNT+4)(r9) + andi. r0,r8,1 /* pending update ? 
loop */ + bne- 1b + xor r0,r8,r8 /* create dependency */ + add r9,r9,r0 + + /* Load orig stamp (offset to TB) */ + lwz r5,CFG_TB_ORIG_STAMP(r9) + lwz r6,(CFG_TB_ORIG_STAMP+4)(r9) + + /* Get a stable TB value */ +2: mftbu r3 + mftbl r4 + mftbu r0 + cmpl cr0,r3,r0 + bne- 2b + + /* Subtract tb orig stamp. If the high part is non-zero, we jump to the + * slow path which calls the syscall. If it's ok, then we have our 32 bits + * tb_ticks value in r7 + */ + subfc r7,r6,r4 + subfe. r0,r5,r3 + bne- 3f + + /* Load scale factor & do multiplication */ + lwz r5,CFG_TB_TO_XS(r9) /* load values */ + lwz r6,(CFG_TB_TO_XS+4)(r9) + mulhwu r4,r7,r5 + mulhwu r6,r7,r6 + mullw r0,r7,r5 + addc r6,r6,r0 + + /* At this point, we have the scaled xsec value in r4 + XER:CA + * we load & add the stamp since epoch + */ + lwz r5,CFG_STAMP_XSEC(r9) + lwz r6,(CFG_STAMP_XSEC+4)(r9) + adde r4,r4,r6 + addze r3,r5 + + /* We now have our result in r3,r4. We create a fake dependency + * on that result and re-check the counter + */ + xor r0,r4,r4 + add r9,r9,r0 + lwz r0,(CFG_TB_UPDATE_COUNT+4)(r9) + cmpl cr0,r8,r0 /* check if updated */ + bne- 1b + + /* Warning ! The caller expects CR:EQ to be set to indicate a + * successful calculation (so it won't fall back to the syscall + * method). We have overridden that CR bit in the counter check, + * but fortunately, the loop exit condition _is_ CR:EQ set, so + * we can exit safely here. If you change this code, be careful + * of that side effect. + */ +3: blr + .cfi_endproc Index: linux-work/arch/ppc64/kernel/vdso32/sigtramp.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/sigtramp.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,300 @@ +/* + * Signal trampolines for 32 bits processes in a ppc64 kernel for + * use in the vDSO + * + * Copyright (C) 2004 Benjamin Herrenschmidt (benh at kernel.crashing.org), IBM Corp. 
+ * Copyright (C) 2004 Alan Modra (amodra at au.ibm.com), IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include +#include +#include + + .text + +/* The nop here is a hack. The dwarf2 unwind routines subtract 1 from + the return address to get an address in the middle of the presumed + call instruction. Since we don't have a call here, we artificially + extend the range covered by the unwind info by adding a nop before + the real start. */ + nop +V_FUNCTION_BEGIN(__kernel_sigtramp32) +.Lsig_start = . - 4 + li r0,__NR_sigreturn + sc +.Lsig_end: +V_FUNCTION_END(__kernel_sigtramp32) + +.Lsigrt_start: + nop +V_FUNCTION_BEGIN(__kernel_sigtramp_rt32) + li r0,__NR_rt_sigreturn + sc +.Lsigrt_end: +V_FUNCTION_END(__kernel_sigtramp_rt32) + + .section .eh_frame,"a",@progbits + +/* Register r1 can be found at offset 4 of a pt_regs structure. + A pointer to the pt_regs is stored in memory at the old sp plus PTREGS. */ +#define cfa_save \ + .byte 0x0f; /* DW_CFA_def_cfa_expression */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x23; .uleb128 RSIZE; /* DW_OP_plus_uconst */ \ + .byte 0x06; /* DW_OP_deref */ \ +9: + +/* Register REGNO can be found at offset OFS of a pt_regs structure. + A pointer to the pt_regs is stored in memory at the old sp plus PTREGS. 
*/ +#define rsave(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .ifne ofs; \ + .byte 0x23; .uleb128 ofs; /* DW_OP_plus_uconst */ \ + .endif; \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset REGNO*16 + of the VMX reg struct. The VMX reg struct is at offset VREGS of + the pt_regs struct. This macro is for REGNO == 0, and contains + 'subroutines' that the other macros jump to. */ +#define vsave_msr0(regno) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x30 + regno; /* DW_OP_lit0 */ \ +2: \ + .byte 0x40; /* DW_OP_lit16 */ \ + .byte 0x1e; /* DW_OP_mul */ \ +3: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x12; /* DW_OP_dup */ \ + .byte 0x23; /* DW_OP_plus_uconst */ \ + .uleb128 33*RSIZE; /* msr offset */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x0c; .long 1 << 25; /* DW_OP_const4u */ \ + .byte 0x1a; /* DW_OP_and */ \ + .byte 0x12; /* DW_OP_dup, ret 0 if bra taken */ \ + .byte 0x30; /* DW_OP_lit0 */ \ + .byte 0x29; /* DW_OP_eq */ \ + .byte 0x28; .short 0x7fff; /* DW_OP_bra to end */ \ + .byte 0x13; /* DW_OP_drop, pop the 0 */ \ + .byte 0x23; .uleb128 VREGS; /* DW_OP_plus_uconst */ \ + .byte 0x22; /* DW_OP_plus */ \ + .byte 0x2f; .short 0x7fff; /* DW_OP_skip to end */ \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset REGNO*16 + of the VMX reg struct. REGNO is 1 thru 31. */ +#define vsave_msr1(regno) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x30 + regno; /* DW_OP_lit n */ \ + .byte 0x2f; .short 2b - 9f; /* DW_OP_skip */ \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset OFS of + the VMX save block. 
*/ +#define vsave_msr2(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x0a; .short ofs; /* DW_OP_const2u */ \ + .byte 0x2f; .short 3b - 9f; /* DW_OP_skip */ \ +9: + +/* VMX register REGNO is at offset OFS of the VMX save area. */ +#define vsave(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x23; .uleb128 VREGS; /* DW_OP_plus_uconst */ \ + .byte 0x23; .uleb128 ofs; /* DW_OP_plus_uconst */ \ +9: + +/* This is where the pt_regs pointer can be found on the stack. */ +#define PTREGS 64+28 + +/* Size of regs. */ +#define RSIZE 4 + +/* This is the offset of the VMX regs. */ +#define VREGS 48*RSIZE+34*8 + +/* Describe where general purpose regs are saved. */ +#define EH_FRAME_GEN \ + cfa_save; \ + rsave ( 0, 0*RSIZE); \ + rsave ( 2, 2*RSIZE); \ + rsave ( 3, 3*RSIZE); \ + rsave ( 4, 4*RSIZE); \ + rsave ( 5, 5*RSIZE); \ + rsave ( 6, 6*RSIZE); \ + rsave ( 7, 7*RSIZE); \ + rsave ( 8, 8*RSIZE); \ + rsave ( 9, 9*RSIZE); \ + rsave (10, 10*RSIZE); \ + rsave (11, 11*RSIZE); \ + rsave (12, 12*RSIZE); \ + rsave (13, 13*RSIZE); \ + rsave (14, 14*RSIZE); \ + rsave (15, 15*RSIZE); \ + rsave (16, 16*RSIZE); \ + rsave (17, 17*RSIZE); \ + rsave (18, 18*RSIZE); \ + rsave (19, 19*RSIZE); \ + rsave (20, 20*RSIZE); \ + rsave (21, 21*RSIZE); \ + rsave (22, 22*RSIZE); \ + rsave (23, 23*RSIZE); \ + rsave (24, 24*RSIZE); \ + rsave (25, 25*RSIZE); \ + rsave (26, 26*RSIZE); \ + rsave (27, 27*RSIZE); \ + rsave (28, 28*RSIZE); \ + rsave (29, 29*RSIZE); \ + rsave (30, 30*RSIZE); \ + rsave (31, 31*RSIZE); \ + rsave (67, 32*RSIZE); /* ap, used as temp for nip */ \ + rsave (65, 36*RSIZE); /* lr */ \ + rsave (70, 38*RSIZE) /* cr */ + +/* Describe where the FP regs are saved. 
*/ +#define EH_FRAME_FP \ + rsave (32, 48*RSIZE + 0*8); \ + rsave (33, 48*RSIZE + 1*8); \ + rsave (34, 48*RSIZE + 2*8); \ + rsave (35, 48*RSIZE + 3*8); \ + rsave (36, 48*RSIZE + 4*8); \ + rsave (37, 48*RSIZE + 5*8); \ + rsave (38, 48*RSIZE + 6*8); \ + rsave (39, 48*RSIZE + 7*8); \ + rsave (40, 48*RSIZE + 8*8); \ + rsave (41, 48*RSIZE + 9*8); \ + rsave (42, 48*RSIZE + 10*8); \ + rsave (43, 48*RSIZE + 11*8); \ + rsave (44, 48*RSIZE + 12*8); \ + rsave (45, 48*RSIZE + 13*8); \ + rsave (46, 48*RSIZE + 14*8); \ + rsave (47, 48*RSIZE + 15*8); \ + rsave (48, 48*RSIZE + 16*8); \ + rsave (49, 48*RSIZE + 17*8); \ + rsave (50, 48*RSIZE + 18*8); \ + rsave (51, 48*RSIZE + 19*8); \ + rsave (52, 48*RSIZE + 20*8); \ + rsave (53, 48*RSIZE + 21*8); \ + rsave (54, 48*RSIZE + 22*8); \ + rsave (55, 48*RSIZE + 23*8); \ + rsave (56, 48*RSIZE + 24*8); \ + rsave (57, 48*RSIZE + 25*8); \ + rsave (58, 48*RSIZE + 26*8); \ + rsave (59, 48*RSIZE + 27*8); \ + rsave (60, 48*RSIZE + 28*8); \ + rsave (61, 48*RSIZE + 29*8); \ + rsave (62, 48*RSIZE + 30*8); \ + rsave (63, 48*RSIZE + 31*8) + +/* Describe where the VMX regs are saved. 
*/ +#ifdef CONFIG_ALTIVEC +#define EH_FRAME_VMX \ + vsave_msr0 ( 0); \ + vsave_msr1 ( 1); \ + vsave_msr1 ( 2); \ + vsave_msr1 ( 3); \ + vsave_msr1 ( 4); \ + vsave_msr1 ( 5); \ + vsave_msr1 ( 6); \ + vsave_msr1 ( 7); \ + vsave_msr1 ( 8); \ + vsave_msr1 ( 9); \ + vsave_msr1 (10); \ + vsave_msr1 (11); \ + vsave_msr1 (12); \ + vsave_msr1 (13); \ + vsave_msr1 (14); \ + vsave_msr1 (15); \ + vsave_msr1 (16); \ + vsave_msr1 (17); \ + vsave_msr1 (18); \ + vsave_msr1 (19); \ + vsave_msr1 (20); \ + vsave_msr1 (21); \ + vsave_msr1 (22); \ + vsave_msr1 (23); \ + vsave_msr1 (24); \ + vsave_msr1 (25); \ + vsave_msr1 (26); \ + vsave_msr1 (27); \ + vsave_msr1 (28); \ + vsave_msr1 (29); \ + vsave_msr1 (30); \ + vsave_msr1 (31); \ + vsave_msr2 (33, 32*16+12); \ + vsave (32, 32*16) +#else +#define EH_FRAME_VMX +#endif + +.Lcie: + .long .Lcie_end - .Lcie_start +.Lcie_start: + .long 0 /* CIE ID */ + .byte 1 /* Version number */ + .string "zR" /* NUL-terminated augmentation string */ + .uleb128 4 /* Code alignment factor */ + .sleb128 -4 /* Data alignment factor */ + .byte 67 /* Return address register column, ap */ + .uleb128 1 /* Augmentation value length */ + .byte 0x1b /* DW_EH_PE_pcrel | DW_EH_PE_sdata4. */ + .byte 0x0c,1,0 /* DW_CFA_def_cfa: r1 ofs 0 */ + .balign 4 +.Lcie_end: + + .long .Lfde0_end - .Lfde0_start +.Lfde0_start: + .long .Lfde0_start - .Lcie /* CIE pointer. */ + .long .Lsig_start - . /* PC start, length */ + .long .Lsig_end - .Lsig_start + .uleb128 0 /* Augmentation */ + EH_FRAME_GEN + EH_FRAME_FP + EH_FRAME_VMX + .balign 4 +.Lfde0_end: + +/* We have a different stack layout for rt_sigreturn. */ +#undef PTREGS +#define PTREGS 64+16+128+20+28 + + .long .Lfde1_end - .Lfde1_start +.Lfde1_start: + .long .Lfde1_start - .Lcie /* CIE pointer. */ + .long .Lsigrt_start - . 
/* PC start, length */ + .long .Lsigrt_end - .Lsigrt_start + .uleb128 0 /* Augmentation */ + EH_FRAME_GEN + EH_FRAME_FP + EH_FRAME_VMX + .balign 4 +.Lfde1_end: Index: linux-work/arch/ppc64/kernel/vdso32/vdso32_wrapper.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/vdso32_wrapper.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,12 @@ +#include + + .section ".data" + + .globl vdso32_start, vdso32_end + .balign 4096 +vdso32_start: + .incbin "arch/ppc64/kernel/vdso32/vdso32.so" + .balign 4096 +vdso32_end: + + .previous Index: linux-work/arch/ppc64/kernel/vdso64/vdso64.lds.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/vdso64.lds.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,110 @@ +/* + * This is the infamous ld script for the 64 bits vdso + * library + */ +#include + +OUTPUT_FORMAT("elf64-powerpc", "elf64-powerpc", "elf64-powerpc") +OUTPUT_ARCH(powerpc:common64) +ENTRY(_start) + +SECTIONS +{ + . = VDSO64_LBASE + SIZEOF_HEADERS; + .hash : { *(.hash) } :text + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + + . 
= ALIGN (16); + .text : + { + *(.text .stub .text.* .gnu.linkonce.t.*) + *(.sfpr .glink) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + + /* Other stuff is appended to the text segment: */ + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) } :text :eh_frame_hdr + .eh_frame : { KEEP (*(.eh_frame)) } :text + .gcc_except_table : { *(.gcc_except_table) } + + .opd ALIGN(8) : { KEEP (*(.opd)) } + .got ALIGN(8) : { *(.got .toc) } + .rela.dyn ALIGN(8) : { *(.rela.dyn) } + + .dynamic : { *(.dynamic) } :text :dynamic + + _end = .; + PROVIDE (end = .); + + /* Stabs debugging sections are here too + */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + + /DISCARD/ : { *(.note.GNU-stack) } + /DISCARD/ : { *(.branch_lt) } + /DISCARD/ : { *(.data .data.* .gnu.linkonce.d.*) } + /DISCARD/ : { *(.bss .sbss .dynbss .dynsbss) } +} + +PHDRS +{ + text PT_LOAD FILEHDR PHDRS FLAGS(5); /* PF_R|PF_X */ + dynamic PT_DYNAMIC FLAGS(4); /* PF_R */ + eh_frame_hdr 0x6474e550; /* PT_GNU_EH_FRAME, but ld doesn't match the name */ +} + +/* + * This controls what symbols we export from the DSO. 
+ */ +VERSION +{ + VDSO_VERSION_STRING { + global: + __kernel_datapage_offset; /* Has to be there for the kernel to find it */ + __kernel_get_syscall_map; + __kernel_gettimeofday; + __kernel_sync_dicache; + __kernel_sync_dicache_p5; + __kernel_sigtramp_rt64; + local: *; + }; +} Index: linux-work/arch/ppc64/kernel/vdso64/vdso64_wrapper.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/vdso64_wrapper.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,12 @@ +#include + + .section ".data" + + .globl vdso64_start, vdso64_end + .balign 4096 +vdso64_start: + .incbin "arch/ppc64/kernel/vdso64/vdso64.so" + .balign 4096 +vdso64_end: + + .previous Index: linux-work/arch/ppc64/kernel/vdso32/datapage.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/datapage.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,68 @@ +/* + * Access to the shared data page by the vDSO & syscall map + * + * Copyright (C) 2004 Benjamin Herrenschmidt (benh at kernel.crashing.org), IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#include +#include +#include +#include +#include +#include + + .text +V_FUNCTION_BEGIN(__get_datapage) + .cfi_startproc + /* We don't want that exposed or overridable as we want other objects + * to be able to bl directly to here + */ + .protected __get_datapage + .hidden __get_datapage + + mflr r0 + .cfi_register lr,r0 + + bcl 20,31,1f + .global __kernel_datapage_offset; +__kernel_datapage_offset: + .long 0 +1: + mflr r3 + mtlr r0 + lwz r0,0(r3) + add r3,r0,r3 + blr + .cfi_endproc +V_FUNCTION_END(__get_datapage) + +/* + * void *__kernel_get_syscall_map(unsigned int *syscall_count) ; + * + * returns a pointer to the syscall map. the map is agnostic to the + * size of "long", unlike kernel bitops, it stores bits from top to + * bottom so that memory actually contains a linear bitmap + * check for syscall N by testing bit (0x80000000 >> (N & 0x1f)) of + * 32 bits int at N >> 5. + */ +V_FUNCTION_BEGIN(__kernel_get_syscall_map) + .cfi_startproc + mflr r12 + .cfi_register lr,r12 + + mr r4,r3 + bl __get_datapage@local + mtlr r12 + addi r3,r3,CFG_SYSCALL_MAP32 + cmpli cr0,r4,0 + beqlr + li r0,__NR_syscalls + stw r0,0(r4) + blr + .cfi_endproc +V_FUNCTION_END(__kernel_get_syscall_map) Index: linux-work/arch/ppc64/kernel/vdso32/Makefile =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/Makefile 2005-02-01 12:04:11.000000000 +1100 @@ -0,0 +1,36 @@ + +# List of files in the vdso, has to be asm only for now + +obj-vdso32 = sigtramp.o gettimeofday.o datapage.o cacheflush.o + +# Build rules + +targets := $(obj-vdso32) vdso32.so +obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32)) + + +EXTRA_CFLAGS := -shared -s -fno-common -fno-builtin +EXTRA_CFLAGS += -nostdlib -Wl,-soname=linux-vdso32.so.1 +EXTRA_AFLAGS := -D__VDSO32__ -s + +obj-y += vdso32_wrapper.o +extra-y += vdso32.lds +CPPFLAGS_vdso32.lds += -P -C -U$(ARCH) + +# Force dependency (incbin is bad) 
+$(obj)/vdso32_wrapper.o : $(obj)/vdso32.so + +# link rule for the .so file, .lds has to be first +$(obj)/vdso32.so: $(src)/vdso32.lds $(obj-vdso32) + $(call if_changed,vdso32ld) + +# assembly rules for the .S files +$(obj-vdso32): %.o: %.S + $(call if_changed_dep,vdso32as) + +# actual build commands +quiet_cmd_vdso32ld = VDSO32L $@ + cmd_vdso32ld = $(CROSS32CC) $(c_flags) -Wl,-T $^ -o $@ +quiet_cmd_vdso32as = VDSO32A $@ + cmd_vdso32as = $(CROSS32CC) $(a_flags) -c -o $@ $< + Index: linux-work/arch/ppc64/kernel/vdso64/gettimeofday.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/gettimeofday.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,91 @@ +/* + * Userland implementation of gettimeofday() for 64 bits processes in a + * ppc64 kernel for use in the vDSO + * + * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org), + * IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ +#include +#include +#include +#include +#include + + .text +/* + * Exact prototype of gettimeofday + * + * int __kernel_gettimeofday(struct timeval *tv, struct timezone *tz); + * + */ +V_FUNCTION_BEGIN(__kernel_gettimeofday) + .cfi_startproc + mflr r12 + .cfi_register lr,r12 + + mr r11,r3 /* r11 holds tv */ + mr r10,r4 /* r10 holds tz */ + bl V_LOCAL_FUNC(__get_datapage) /* get data page */ + bl V_LOCAL_FUNC(__do_get_xsec) /* get xsec from tb & kernel */ + lis r7,15 /* r7 = 1000000 = USEC_PER_SEC */ + ori r7,r7,16960 + rldicl r5,r4,44,20 /* r5 = sec = xsec / XSEC_PER_SEC */ + rldicr r6,r5,20,43 /* r6 = sec * XSEC_PER_SEC */ + std r5,TVAL64_TV_SEC(r11) /* store sec in tv */ + subf r0,r6,r4 /* r0 = xsec = (xsec - r6) */ + mulld r0,r0,r7 /* usec = (xsec * USEC_PER_SEC) / XSEC_PER_SEC */ + rldicl r0,r0,44,20 + cmpldi cr0,r10,0 /* check if tz is NULL */ + std r0,TVAL64_TV_USEC(r11) /* store usec in tv */ + beq 1f + lwz r4,CFG_TZ_MINUTEWEST(r3)/* fill tz */ + lwz r5,CFG_TZ_DSTTIME(r3) + stw r4,TZONE_TZ_MINWEST(r10) + stw r5,TZONE_TZ_DSTTIME(r10) +1: mtlr r12 + li r3,0 /* always success */ + blr + .cfi_endproc +V_FUNCTION_END(__kernel_gettimeofday) + + +/* + * This is the core of gettimeofday(), it returns the xsec + * value in r4 and expects the datapage ptr (non clobbered) + * in r3. clobbers r0,r4,r5,r6,r7,r8 +*/ +V_FUNCTION_BEGIN(__do_get_xsec) + .cfi_startproc + /* check for update count & load values */ +1: ld r7,CFG_TB_UPDATE_COUNT(r3) + andi. r0,r4,1 /* pending update ? 
loop */ + bne- 1b + xor r0,r4,r4 /* create dependency */ + add r3,r3,r0 + + /* Get TB & offset it */ + mftb r8 + ld r9,CFG_TB_ORIG_STAMP(r3) + subf r8,r9,r8 + + /* Scale result */ + ld r5,CFG_TB_TO_XS(r3) + mulhdu r8,r8,r5 + + /* Add stamp since epoch */ + ld r6,CFG_STAMP_XSEC(r3) + add r4,r6,r8 + + xor r0,r4,r4 + add r3,r3,r0 + ld r0,CFG_TB_UPDATE_COUNT(r3) + cmpld cr0,r0,r7 /* check if updated */ + bne- 1b + blr + .cfi_endproc +V_FUNCTION_END(__do_get_xsec) Index: linux-work/arch/ppc64/kernel/vdso64/datapage.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/datapage.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,68 @@ +/* + * Access to the shared data page by the vDSO & syscall map + * + * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org), IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include +#include +#include +#include +#include +#include + + .text +V_FUNCTION_BEGIN(__get_datapage) + .cfi_startproc + /* We don't want that exposed or overridable as we want other objects + * to be able to bl directly to here + */ + .protected __get_datapage + .hidden __get_datapage + + mflr r0 + .cfi_register lr,r0 + + bcl 20,31,1f + .global __kernel_datapage_offset; +__kernel_datapage_offset: + .long 0 +1: + mflr r3 + mtlr r0 + lwz r0,0(r3) + add r3,r0,r3 + blr + .cfi_endproc +V_FUNCTION_END(__get_datapage) + +/* + * void *__kernel_get_syscall_map(unsigned int *syscall_count) ; + * + * returns a pointer to the syscall map. 
the map is agnostic to the + * size of "long", unlike kernel bitops, it stores bits from top to + * bottom so that memory actually contains a linear bitmap + * check for syscall N by testing bit (0x80000000 >> (N & 0x1f)) of + * 32 bits int at N >> 5. + */ +V_FUNCTION_BEGIN(__kernel_get_syscall_map) + .cfi_startproc + mflr r12 + .cfi_register lr,r12 + + mr r4,r3 + bl V_LOCAL_FUNC(__get_datapage) + mtlr r12 + addi r3,r3,CFG_SYSCALL_MAP64 + cmpli cr0,r4,0 + beqlr + li r0,__NR_syscalls + stw r0,0(r4) + blr + .cfi_endproc +V_FUNCTION_END(__kernel_get_syscall_map) Index: linux-work/arch/ppc64/kernel/vdso64/sigtramp.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/sigtramp.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,294 @@ +/* + * Signal trampoline for 64 bits processes in a ppc64 kernel for + * use in the vDSO + * + * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org), IBM Corp. + * Copyright (C) 2004 Alan Modra (amodra at au.ibm.com)), IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include +#include +#include + + .text + +/* The nop here is a hack. The dwarf2 unwind routines subtract 1 from + the return address to get an address in the middle of the presumed + call instruction. Since we don't have a call here, we artifically + extend the range covered by the unwind info by padding before the + real start. */ + nop + .balign 8 +V_FUNCTION_BEGIN(__kernel_sigtramp_rt64) +.Lsigrt_start = . 
- 4 + addi r1, r1, __SIGNAL_FRAMESIZE + li r0,__NR_rt_sigreturn + sc +.Lsigrt_end: +V_FUNCTION_END(__kernel_sigtramp_rt64) +/* The ".balign 8" above and the following zeros mimic the old stack + trampoline layout. The last magic value is the ucontext pointer, + chosen in such a way that older libgcc unwind code returns a zero + for a sigcontext pointer. */ + .long 0,0,0 + .quad 0,-21*8 + +/* Register r1 can be found at offset 8 of a pt_regs structure. + A pointer to the pt_regs is stored in memory at the old sp plus PTREGS. */ +#define cfa_save \ + .byte 0x0f; /* DW_CFA_def_cfa_expression */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x23; .uleb128 RSIZE; /* DW_OP_plus_uconst */ \ + .byte 0x06; /* DW_OP_deref */ \ +9: + +/* Register REGNO can be found at offset OFS of a pt_regs structure. + A pointer to the pt_regs is stored in memory at the old sp plus PTREGS. */ +#define rsave(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .ifne ofs; \ + .byte 0x23; .uleb128 ofs; /* DW_OP_plus_uconst */ \ + .endif; \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset REGNO*16 + of the VMX reg struct. A pointer to the VMX reg struct is at VREGS in + the pt_regs struct. This macro is for REGNO == 0, and contains + 'subroutines' that the other macros jump to. 
*/ +#define vsave_msr0(regno) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x30 + regno; /* DW_OP_lit0 */ \ +2: \ + .byte 0x40; /* DW_OP_lit16 */ \ + .byte 0x1e; /* DW_OP_mul */ \ +3: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x12; /* DW_OP_dup */ \ + .byte 0x23; /* DW_OP_plus_uconst */ \ + .uleb128 33*RSIZE; /* msr offset */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x0c; .long 1 << 25; /* DW_OP_const4u */ \ + .byte 0x1a; /* DW_OP_and */ \ + .byte 0x12; /* DW_OP_dup, ret 0 if bra taken */ \ + .byte 0x30; /* DW_OP_lit0 */ \ + .byte 0x29; /* DW_OP_eq */ \ + .byte 0x28; .short 0x7fff; /* DW_OP_bra to end */ \ + .byte 0x13; /* DW_OP_drop, pop the 0 */ \ + .byte 0x23; .uleb128 VREGS; /* DW_OP_plus_uconst */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x22; /* DW_OP_plus */ \ + .byte 0x2f; .short 0x7fff; /* DW_OP_skip to end */ \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset REGNO*16 + of the VMX reg struct. REGNO is 1 thru 31. */ +#define vsave_msr1(regno) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x30 + regno; /* DW_OP_lit n */ \ + .byte 0x2f; .short 2b - 9f; /* DW_OP_skip */ \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset OFS of + the VMX save block. */ +#define vsave_msr2(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x0a; .short ofs; /* DW_OP_const2u */ \ + .byte 0x2f; .short 3b - 9f; /* DW_OP_skip */ \ +9: + +/* VMX register REGNO is at offset OFS of the VMX save area. 
*/ +#define vsave(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x23; .uleb128 VREGS; /* DW_OP_plus_uconst */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x23; .uleb128 ofs; /* DW_OP_plus_uconst */ \ +9: + +/* This is where the pt_regs pointer can be found on the stack. */ +#define PTREGS 128+168+56 + +/* Size of regs. */ +#define RSIZE 8 + +/* This is the offset of the VMX reg pointer. */ +#define VREGS 48*RSIZE+33*8 + +/* Describe where general purpose regs are saved. */ +#define EH_FRAME_GEN \ + cfa_save; \ + rsave ( 0, 0*RSIZE); \ + rsave ( 2, 2*RSIZE); \ + rsave ( 3, 3*RSIZE); \ + rsave ( 4, 4*RSIZE); \ + rsave ( 5, 5*RSIZE); \ + rsave ( 6, 6*RSIZE); \ + rsave ( 7, 7*RSIZE); \ + rsave ( 8, 8*RSIZE); \ + rsave ( 9, 9*RSIZE); \ + rsave (10, 10*RSIZE); \ + rsave (11, 11*RSIZE); \ + rsave (12, 12*RSIZE); \ + rsave (13, 13*RSIZE); \ + rsave (14, 14*RSIZE); \ + rsave (15, 15*RSIZE); \ + rsave (16, 16*RSIZE); \ + rsave (17, 17*RSIZE); \ + rsave (18, 18*RSIZE); \ + rsave (19, 19*RSIZE); \ + rsave (20, 20*RSIZE); \ + rsave (21, 21*RSIZE); \ + rsave (22, 22*RSIZE); \ + rsave (23, 23*RSIZE); \ + rsave (24, 24*RSIZE); \ + rsave (25, 25*RSIZE); \ + rsave (26, 26*RSIZE); \ + rsave (27, 27*RSIZE); \ + rsave (28, 28*RSIZE); \ + rsave (29, 29*RSIZE); \ + rsave (30, 30*RSIZE); \ + rsave (31, 31*RSIZE); \ + rsave (67, 32*RSIZE); /* ap, used as temp for nip */ \ + rsave (65, 36*RSIZE); /* lr */ \ + rsave (70, 38*RSIZE) /* cr */ + +/* Describe where the FP regs are saved. 
*/ +#define EH_FRAME_FP \ + rsave (32, 48*RSIZE + 0*8); \ + rsave (33, 48*RSIZE + 1*8); \ + rsave (34, 48*RSIZE + 2*8); \ + rsave (35, 48*RSIZE + 3*8); \ + rsave (36, 48*RSIZE + 4*8); \ + rsave (37, 48*RSIZE + 5*8); \ + rsave (38, 48*RSIZE + 6*8); \ + rsave (39, 48*RSIZE + 7*8); \ + rsave (40, 48*RSIZE + 8*8); \ + rsave (41, 48*RSIZE + 9*8); \ + rsave (42, 48*RSIZE + 10*8); \ + rsave (43, 48*RSIZE + 11*8); \ + rsave (44, 48*RSIZE + 12*8); \ + rsave (45, 48*RSIZE + 13*8); \ + rsave (46, 48*RSIZE + 14*8); \ + rsave (47, 48*RSIZE + 15*8); \ + rsave (48, 48*RSIZE + 16*8); \ + rsave (49, 48*RSIZE + 17*8); \ + rsave (50, 48*RSIZE + 18*8); \ + rsave (51, 48*RSIZE + 19*8); \ + rsave (52, 48*RSIZE + 20*8); \ + rsave (53, 48*RSIZE + 21*8); \ + rsave (54, 48*RSIZE + 22*8); \ + rsave (55, 48*RSIZE + 23*8); \ + rsave (56, 48*RSIZE + 24*8); \ + rsave (57, 48*RSIZE + 25*8); \ + rsave (58, 48*RSIZE + 26*8); \ + rsave (59, 48*RSIZE + 27*8); \ + rsave (60, 48*RSIZE + 28*8); \ + rsave (61, 48*RSIZE + 29*8); \ + rsave (62, 48*RSIZE + 30*8); \ + rsave (63, 48*RSIZE + 31*8) + +/* Describe where the VMX regs are saved. 
*/ +#ifdef CONFIG_ALTIVEC +#define EH_FRAME_VMX \ + vsave_msr0 ( 0); \ + vsave_msr1 ( 1); \ + vsave_msr1 ( 2); \ + vsave_msr1 ( 3); \ + vsave_msr1 ( 4); \ + vsave_msr1 ( 5); \ + vsave_msr1 ( 6); \ + vsave_msr1 ( 7); \ + vsave_msr1 ( 8); \ + vsave_msr1 ( 9); \ + vsave_msr1 (10); \ + vsave_msr1 (11); \ + vsave_msr1 (12); \ + vsave_msr1 (13); \ + vsave_msr1 (14); \ + vsave_msr1 (15); \ + vsave_msr1 (16); \ + vsave_msr1 (17); \ + vsave_msr1 (18); \ + vsave_msr1 (19); \ + vsave_msr1 (20); \ + vsave_msr1 (21); \ + vsave_msr1 (22); \ + vsave_msr1 (23); \ + vsave_msr1 (24); \ + vsave_msr1 (25); \ + vsave_msr1 (26); \ + vsave_msr1 (27); \ + vsave_msr1 (28); \ + vsave_msr1 (29); \ + vsave_msr1 (30); \ + vsave_msr1 (31); \ + vsave_msr2 (33, 32*16+12); \ + vsave (32, 33*16) +#else +#define EH_FRAME_VMX +#endif + + .section .eh_frame,"a",@progbits +.Lcie: + .long .Lcie_end - .Lcie_start +.Lcie_start: + .long 0 /* CIE ID */ + .byte 1 /* Version number */ + .string "zR" /* NUL-terminated augmentation string */ + .uleb128 4 /* Code alignment factor */ + .sleb128 -8 /* Data alignment factor */ + .byte 67 /* Return address register column, ap */ + .uleb128 1 /* Augmentation value length */ + .byte 0x14 /* DW_EH_PE_pcrel | DW_EH_PE_udata8. */ + .byte 0x0c,1,0 /* DW_CFA_def_cfa: r1 ofs 0 */ + .balign 8 +.Lcie_end: + + .long .Lfde0_end - .Lfde0_start +.Lfde0_start: + .long .Lfde0_start - .Lcie /* CIE pointer. */ + .quad .Lsigrt_start - . /* PC start, length */ + .quad .Lsigrt_end - .Lsigrt_start + .uleb128 0 /* Augmentation */ + EH_FRAME_GEN + EH_FRAME_FP + EH_FRAME_VMX +# Do we really need to describe the frame at this point? ie. will +# we ever have some call chain that returns somewhere past the addi? +# I don't think so, since gcc doesn't support async signals. 
+# .byte 0x41 /* DW_CFA_advance_loc 1*4 */ +#undef PTREGS +#define PTREGS 168+56 +# EH_FRAME_GEN +# EH_FRAME_FP +# EH_FRAME_VMX + .balign 8 +.Lfde0_end: Index: linux-work/arch/ppc64/kernel/vdso64/Makefile =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/Makefile 2005-02-01 11:51:25.000000000 +1100 @@ -0,0 +1,35 @@ +# List of files in the vdso, has to be asm only for now + +obj-vdso64 = sigtramp.o gettimeofday.o datapage.o cacheflush.o + +# Build rules + +targets := $(obj-vdso64) vdso64.so +obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64)) + +EXTRA_CFLAGS := -shared -s -fno-common -fno-builtin +EXTRA_CFLAGS += -nostdlib -Wl,-soname=linux-vdso64.so.1 +EXTRA_AFLAGS := -D__VDSO64__ -s + +obj-y += vdso64_wrapper.o +extra-y += vdso64.lds +CPPFLAGS_vdso64.lds += -P -C -U$(ARCH) + +# Force dependency (incbin is bad) +$(obj)/vdso64_wrapper.o : $(obj)/vdso64.so + +# link rule for the .so file, .lds has to be first +$(obj)/vdso64.so: $(src)/vdso64.lds $(obj-vdso64) + $(call if_changed,vdso64ld) + +# assembly rules for the .S files +$(obj-vdso64): %.o: %.S + $(call if_changed_dep,vdso64as) + +# actual build commands +quiet_cmd_vdso64ld = VDSO64L $@ + cmd_vdso64ld = $(CC) $(c_flags) -Wl,-T $^ -o $@ +quiet_cmd_vdso64as = VDSO64A $@ + cmd_vdso64as = $(CC) $(a_flags) -c -o $@ $< + + Index: linux-work/arch/ppc64/kernel/vdso32/vdso32.lds.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/vdso32.lds.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,111 @@ + +/* + * This is the infamous ld script for the 32 bits vdso + * library + */ +#include + +/* Default link addresses for the vDSOs */ +OUTPUT_FORMAT("elf32-powerpc", "elf32-powerpc", "elf32-powerpc") +OUTPUT_ARCH(powerpc:common) +ENTRY(_start) + +SECTIONS +{ + . 
= VDSO32_LBASE + SIZEOF_HEADERS; + .hash : { *(.hash) } :text + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + + . = ALIGN (16); + .text : + { + *(.text .stub .text.* .gnu.linkonce.t.*) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + + /* Other stuff is appended to the text segment: */ + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + + .eh_frame_hdr : { *(.eh_frame_hdr) } :text :eh_frame_hdr + .eh_frame : { KEEP (*(.eh_frame)) } :text + .gcc_except_table : { *(.gcc_except_table) } + .fixup : { *(.fixup) } + + .got ALIGN(4) : { *(.got.plt) *(.got) } + + .dynamic : { *(.dynamic) } :text :dynamic + + _end = .; + __end = .; + PROVIDE (end = .); + + + /* Stabs debugging sections are here too + */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + + /DISCARD/ : { *(.note.GNU-stack) } + /DISCARD/ : { *(.data .data.* .gnu.linkonce.d.* .sdata*) } + /DISCARD/ : { *(.bss .sbss .dynbss .dynsbss) } +} + + 
+PHDRS +{ + text PT_LOAD FILEHDR PHDRS FLAGS(5); /* PF_R|PF_X */ + dynamic PT_DYNAMIC FLAGS(4); /* PF_R */ + eh_frame_hdr 0x6474e550; /* PT_GNU_EH_FRAME, but ld doesn't match the name */ +} + + +/* + * This controls what symbols we export from the DSO. + */ +VERSION +{ + VDSO_VERSION_STRING { + global: + __kernel_datapage_offset; /* Has to be there for the kernel to find it */ + __kernel_get_syscall_map; + __kernel_gettimeofday; + __kernel_sync_dicache; + __kernel_sync_dicache_p5; + __kernel_sigtramp32; + __kernel_sigtramp_rt32; + local: *; + }; +} Index: linux-work/arch/ppc64/kernel/vdso32/cacheflush.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/cacheflush.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,65 @@ +/* + * vDSO provided cache flush routines + * + * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org), + * IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include +#include +#include + + .text + +/* + * Default "generic" version of __kernel_sync_dicache. + * + * void __kernel_sync_dicache(unsigned long start, unsigned long end) + * + * Flushes the data cache & invalidate the instruction cache for the + * provided range [start, end[ + * + * Note: all CPUs supported by this kernel have a 128 bytes cache + * line size so we don't have to peek that info from the datapage + */ +V_FUNCTION_BEGIN(__kernel_sync_dicache) + .cfi_startproc + li r5,127 + andc r6,r3,r5 /* round low to line bdy */ + subf r8,r6,r4 /* compute length */ + add r8,r8,r5 /* ensure we get enough */ + srwi. r8,r8,7 /* compute line count */ + beqlr /* nothing to do? 
*/ + mtctr r8 + mr r3,r6 +1: dcbst 0,r3 + addi r3,r3,128 + bdnz 1b + sync + mtctr r8 +1: icbi 0,r6 + addi r6,r6,128 + bdnz 1b + isync + blr + .cfi_endproc +V_FUNCTION_END(__kernel_sync_dicache) + + +/* + * POWER5 version of __kernel_sync_dicache + */ +V_FUNCTION_BEGIN(__kernel_sync_dicache_p5) + .cfi_startproc + sync + isync + blr + .cfi_endproc +V_FUNCTION_END(__kernel_sync_dicache_p5) + Index: linux-work/arch/ppc64/kernel/vdso64/cacheflush.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/cacheflush.S 2005-01-31 16:25:56.000000000 +1100 @@ -0,0 +1,64 @@ +/* + * vDSO provided cache flush routines + * + * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org), + * IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include +#include +#include + + .text + +/* + * Default "generic" version of __kernel_sync_dicache. + * + * void __kernel_sync_dicache(unsigned long start, unsigned long end) + * + * Flushes the data cache & invalidate the instruction cache for the + * provided range [start, end[ + * + * Note: all CPUs supported by this kernel have a 128 bytes cache + * line size so we don't have to peek that info from the datapage + */ +V_FUNCTION_BEGIN(__kernel_sync_dicache) + .cfi_startproc + li r5,127 + andc r6,r3,r5 /* round low to line bdy */ + subf r8,r6,r4 /* compute length */ + add r8,r8,r5 /* ensure we get enough */ + srwi. r8,r8,7 /* compute line count */ + beqlr /* nothing to do? 
*/ + mtctr r8 + mr r3,r6 +1: dcbst 0,r3 + addi r3,r3,128 + bdnz 1b + sync + mtctr r8 +1: icbi 0,r6 + addi r6,r6,128 + bdnz 1b + isync + blr + .cfi_endproc +V_FUNCTION_END(__kernel_sync_dicache) + + +/* + * POWER5 version of __kernel_sync_dicache + */ +V_FUNCTION_BEGIN(__kernel_sync_dicache_p5) + .cfi_startproc + sync + isync + blr + .cfi_endproc +V_FUNCTION_END(__kernel_sync_dicache_p5) Index: linux-work/arch/ppc64/kernel/head.S =================================================================== --- linux-work.orig/arch/ppc64/kernel/head.S 2005-01-31 16:19:44.000000000 +1100 +++ linux-work/arch/ppc64/kernel/head.S 2005-01-31 16:25:56.000000000 +1100 @@ -54,7 +54,6 @@ * 0x0100 - 0x2fff : pSeries Interrupt prologs * 0x3000 - 0x3fff : Interrupt support * 0x4000 - 0x4fff : NACA - * 0x5000 - 0x5fff : SystemCfg * 0x6000 : iSeries and common interrupt prologs * 0x9000 - 0x9fff : Initial segment table */ Index: linux-work/arch/ppc64/boot/Makefile =================================================================== --- linux-work.orig/arch/ppc64/boot/Makefile 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/boot/Makefile 2005-02-01 11:50:33.000000000 +1100 @@ -20,17 +20,11 @@ # CROSS32_COMPILE is setup as a prefix just like CROSS_COMPILE # in the toplevel makefile. 
-CROSS32_COMPILE ?= -#CROSS32_COMPILE = /usr/local/ppc/bin/powerpc-linux- -BOOTCC := $(CROSS32_COMPILE)gcc HOSTCC := gcc BOOTCFLAGS := $(HOSTCFLAGS) $(LINUXINCLUDE) -fno-builtin -BOOTAS := $(CROSS32_COMPILE)as BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -traditional -BOOTLD := $(CROSS32_COMPILE)ld BOOTLFLAGS := -Ttext 0x00400000 -e _start -T $(srctree)/$(src)/zImage.lds -BOOTOBJCOPY := $(CROSS32_COMPILE)objcopy OBJCOPYFLAGS := contents,alloc,load,readonly,data src-boot := crt0.S string.S prom.c main.c zlib.c imagesize.c div64.S @@ -38,10 +32,10 @@ obj-boot := $(addsuffix .o, $(basename $(src-boot))) quiet_cmd_bootcc = BOOTCC $@ - cmd_bootcc = $(BOOTCC) -Wp,-MD,$(depfile) $(BOOTCFLAGS) -c -o $@ $< + cmd_bootcc = $(CROSS32CC) -Wp,-MD,$(depfile) $(BOOTCFLAGS) -c -o $@ $< quiet_cmd_bootas = BOOTAS $@ - cmd_bootas = $(BOOTCC) -Wp,-MD,$(depfile) $(BOOTAFLAGS) -c -o $@ $< + cmd_bootas = $(CROSS32CC) -Wp,-MD,$(depfile) $(BOOTAFLAGS) -c -o $@ $< $(patsubst %.c,%.o, $(filter %.c, $(src-boot))): %.o: %.c $(call if_changed_dep,bootcc) @@ -77,15 +71,15 @@ $(obj)/vmlinux.initrd: vmlinux.strip $(obj)/addRamDisk $(obj)/ramdisk.image.gz FORCE $(call if_changed,ramdisk) -addsection = $(BOOTOBJCOPY) $(1) \ +addsection = $(CROSS32OBJCOPY) $(1) \ --add-section=.kernel:$(strip $(patsubst $(obj)/kernel-%.o,%, $(1)))=$(patsubst %.o,%.gz, $(1)) \ --set-section-flags=.kernel:$(strip $(patsubst $(obj)/kernel-%.o,%, $(1)))=$(OBJCOPYFLAGS) quiet_cmd_addnote = ADDNOTE $@ - cmd_addnote = $(BOOTLD) $(BOOTLFLAGS) -o $@ $(obj-boot) && $(obj)/addnote $@ + cmd_addnote = $(CROSS32LD) $(BOOTLFLAGS) -o $@ $(obj-boot) && $(obj)/addnote $@ quiet_cmd_piggy = PIGGY $@ - cmd_piggy = $(obj)/piggyback $(@:.o=) < $< | $(BOOTAS) -o $@ + cmd_piggy = $(obj)/piggyback $(@:.o=) < $< | $(CROSS32AS) -o $@ $(call gz-sec, $(required)): $(obj)/kernel-%.gz: % FORCE $(call if_changed,gzip) From benh at kernel.crashing.org Tue Feb 1 14:15:27 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 01 Feb 
2005 14:15:27 +1100 Subject: pci: Arch hook to determine config space size In-Reply-To: <41FEB492.2020002@us.ibm.com> References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050131192955.GJ31145@parcelfarce.linux.theplanet.co.uk> <41FEA4AA.1080407@us.ibm.com> <200501312256.44692.arnd@arndb.de> <41FEB492.2020002@us.ibm.com> Message-ID: <1107227727.5963.46.camel@gaston> On Mon, 2005-01-31 at 16:43 -0600, Brian King wrote: > diff -puN include/asm-ppc64/prom.h~ppc64_pcix_mode2_cfg include/asm-ppc64/prom.h > --- linux-2.6.11-rc2-bk9/include/asm-ppc64/prom.h~ppc64_pcix_mode2_cfg 2005-01-31 14:32:01.000000000 -0600 > +++ linux-2.6.11-rc2-bk9-bjking1/include/asm-ppc64/prom.h 2005-01-31 14:32:01.000000000 -0600 > @@ -137,6 +137,7 @@ struct device_node { > int devfn; /* for pci devices */ > int eeh_mode; /* See eeh.h for possible EEH_MODEs */ > int eeh_config_addr; > + int pci_ext_config_space; /* for phb's or bridges */ > struct pci_controller *phb; /* for pci devices */ > struct iommu_table *iommu_table; /* for phb's or bridges */ Grrr... more crap added to the device-node, I don't like that ... This is a PHB only field, can't it be in struct pci_controller instead ? Ben. 
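
Ben's suggestion above amounts to storing the flag once per host bridge and reaching it through the node's existing `phb` pointer. Below is a minimal sketch of that layout, assuming heavily simplified stand-ins for the kernel structures. Only `pci_ext_config_space`, `devfn` and `phb` come from the quoted patch; the helper name `node_has_ext_config_space` is hypothetical and not part of any posted patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct pci_controller (assumption). */
struct pci_controller {
	int pci_ext_config_space;	/* nonzero if the PHB supports extended config space */
};

/* Simplified stand-in for struct device_node: no per-node copy of the flag. */
struct device_node {
	int devfn;			/* for pci devices */
	struct pci_controller *phb;	/* for pci devices */
};

/* Hypothetical helper: look up the PHB-wide property through the node. */
int node_has_ext_config_space(const struct device_node *dn)
{
	assert(dn != NULL);
	if (dn->phb == NULL)		/* node is not behind a PHB */
		return 0;
	return dn->phb->pci_ext_config_space != 0;
}
```

With this arrangement every device under the same PHB shares one copy of the property, and `struct device_node` does not grow.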
From brking at us.ibm.com Tue Feb 1 15:52:29 2005 From: brking at us.ibm.com (Brian King) Date: Mon, 31 Jan 2005 22:52:29 -0600 Subject: pci: Arch hook to determine config space size In-Reply-To: <1107227727.5963.46.camel@gaston> References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050131192955.GJ31145@parcelfarce.linux.theplanet.co.uk> <41FEA4AA.1080407@us.ibm.com> <200501312256.44692.arnd@arndb.de> <41FEB492.2020002@us.ibm.com> <1107227727.5963.46.camel@gaston> Message-ID: <41FF0B0D.8020003@us.ibm.com> Benjamin Herrenschmidt wrote: > On Mon, 2005-01-31 at 16:43 -0600, Brian King wrote: > > >>diff -puN include/asm-ppc64/prom.h~ppc64_pcix_mode2_cfg include/asm-ppc64/prom.h >>--- linux-2.6.11-rc2-bk9/include/asm-ppc64/prom.h~ppc64_pcix_mode2_cfg 2005-01-31 14:32:01.000000000 -0600 >>+++ linux-2.6.11-rc2-bk9-bjking1/include/asm-ppc64/prom.h 2005-01-31 14:32:01.000000000 -0600 >>@@ -137,6 +137,7 @@ struct device_node { >> int devfn; /* for pci devices */ >> int eeh_mode; /* See eeh.h for possible EEH_MODEs */ >> int eeh_config_addr; >>+ int pci_ext_config_space; /* for phb's or bridges */ >> struct pci_controller *phb; /* for pci devices */ >> struct iommu_table *iommu_table; /* for phb's or bridges */ > > > Grrr... more crap added to the device-node, I don't like that ... > > This is a PHB only field, can't it be in struct pci_controller instead ? Assuming I am reading the spec correctly, this is only a property of the PHB, so I could move it into the pci_controller struct instead. -- Brian King eServer Storage I/O IBM Linux Technology Center -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: ppc64_pcix_mode2_cfg.patch Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050131/738fdf76/attachment.txt From sfr at canb.auug.org.au Tue Feb 1 15:59:01 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 1 Feb 2005 15:59:01 +1100 Subject: [PATCH] ppc64 iseries: can't remove viocd module when no cdroms Message-ID: <20050201155901.62d7c14d.sfr@canb.auug.org.au> Hi Andrew, This patch fixes a bug where attempting to remove the viocd module when no virtual cdroms were actually present would cause an oops. The driver was not completing its initialisation in this case. Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk/drivers/cdrom/viocd.c linus-bk.viocd.1/drivers/cdrom/viocd.c --- linus-bk/drivers/cdrom/viocd.c 2004-11-16 16:05:11.000000000 +1100 +++ linus-bk.viocd.1/drivers/cdrom/viocd.c 2005-02-01 15:52:03.000000000 +1100 @@ -765,8 +765,6 @@ vio_setHandler(viomajorsubtype_cdio, vio_handle_cd_event); get_viocd_info(); - if (viocd_numdev == 0) - goto out_undo_vio; spin_lock_init(&viocd_reqlock); @@ -786,7 +784,6 @@ dma_free_coherent(iSeries_vio_dev, sizeof(*viocd_unitinfo) * VIOCD_MAX_CD, viocd_unitinfo, unitinfo_dmaaddr); -out_undo_vio: vio_clearHandler(viomajorsubtype_cdio); viopath_close(viopath_hostLp, viomajorsubtype_cdio, MAX_CD_REQ + 2); out_unregister: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050201/2b951bc5/attachment.pgp From benh at kernel.crashing.org Tue Feb 1 15:57:44 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 01 Feb 2005 15:57:44 +1100 Subject: pci: Arch hook to determine config space size In-Reply-To: <41FF0B0D.8020003@us.ibm.com> References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050131192955.GJ31145@parcelfarce.linux.theplanet.co.uk> <41FEA4AA.1080407@us.ibm.com> <200501312256.44692.arnd@arndb.de> <41FEB492.2020002@us.ibm.com> <1107227727.5963.46.camel@gaston> <41FF0B0D.8020003@us.ibm.com> Message-ID: <1107233864.5963.65.camel@gaston> On Mon, 2005-01-31 at 22:52 -0600, Brian King wrote: > Assuming I am reading the spec correctly, this is only a property of the > PHB, so I could move it into the pci_controller struct instead. Note that Arnd seems to imply the opposite ... BTW. I'm thinking about moving all those PCI/VIO related fields out of struct device_node to their own structure and keep only a pointer to that structure in device_node. That way, we avoid the bloat for every single non-pci node in the system, and we can have different structures for different bus types (along with proper iommu function pointers and that sort-of-thing). So if you think you really need a per-device info here, feel free to add it to device_node for now, and I'll move it to the new structure along with the rest of the stuff once I find time to do this patch. Ben. 
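
The restructuring Ben describes could look roughly like the sketch below: the PCI-only fields live in their own structure and `device_node` keeps only a pointer to it, which stays NULL for non-PCI nodes. The field names are taken from the `device_node` hunk quoted earlier in the thread. The names `pci_node_data`, `bus_data` and `node_pci_data` are hypothetical, and the whole layout is purely illustrative, not a description of any merged code.

```c
#include <assert.h>
#include <stddef.h>

struct pci_controller;			/* opaque in this sketch */
struct iommu_table;			/* opaque in this sketch */

/* Hypothetical container for the PCI-only fields (the name is an assumption). */
struct pci_node_data {
	int devfn;			/* for pci devices */
	int eeh_mode;			/* See eeh.h for possible EEH_MODEs */
	int eeh_config_addr;
	struct pci_controller *phb;	/* for pci devices */
	struct iommu_table *iommu_table;	/* for phb's or bridges */
};

/* Simplified device_node: one pointer replaces all of the fields above. */
struct device_node {
	const char *name;
	struct pci_node_data *bus_data;	/* NULL for non-PCI nodes */
};

/* Hypothetical accessor: returns NULL when the node carries no PCI data. */
struct pci_node_data *node_pci_data(const struct device_node *dn)
{
	assert(dn != NULL);
	return dn->bus_data;
}
```

Non-PCI nodes then pay for a single pointer instead of every PCI-specific field.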
From sam at ravnborg.org Tue Feb 1 16:41:49 2005
From: sam at ravnborg.org (Sam Ravnborg)
Date: Tue, 1 Feb 2005 06:41:49 +0100
Subject: [PATCH] ppc64: Implement a vDSO and use it for signal trampoline
In-Reply-To: <1107218282.5906.33.camel@gaston>
References: <1107151447.5712.81.camel@gaston> <20050131192713.GA16268@mars.ravnborg.org> <1107218282.5906.33.camel@gaston>
Message-ID: <20050201054149.GA8136@mars.ravnborg.org>

On Tue, Feb 01, 2005 at 11:38:02AM +1100, Benjamin Herrenschmidt wrote:
>
> > Also notice that ':=' uses all over. No need to use late evaluation when
> > no dynamic references are used ($ $@ etc.).
>
> Hrm... Rusty tells me that you got it backward ;) Anyway, I'll stick
> to := for now, it's not really an issue.

:=
Right hand side is evaluated when encountered.
Often what you want.
So for example
CC := cc

here CC is assigned the value cc when seen.

=
Right hand side is evaluated only when left hand side is used.
Also very useful. Example just mocked up:
cmd_vdso32_cc = $(CC) -T $^ -o $@

Doing late evaluation will cause correct replacement of $^ and $@ when
used. When cmd_vdso32_cc is defined, make does not know the desired values
for $^ and $@ - this is only known when cmd_vdso32_cc is actually used.

Hope this clarifies it.

	Sam

From sam at ravnborg.org Tue Feb 1 16:45:47 2005
From: sam at ravnborg.org (Sam Ravnborg)
Date: Tue, 1 Feb 2005 06:45:47 +0100
Subject: [PATCH] ppc64: Implement a vDSO and use it for signal trampoline #2
In-Reply-To: <1107222584.5906.43.camel@gaston>
References: <1107222584.5906.43.camel@gaston>
Message-ID: <20050201054547.GB8136@mars.ravnborg.org>

On Tue, Feb 01, 2005 at 12:49:44PM +1100, Benjamin Herrenschmidt wrote:
>  core-y += arch/ppc64/kernel/
> +core-y += arch/ppc64/kernel/vdso32/
> +core-y += arch/ppc64/kernel/vdso64/

Please include your previous change to reflect this in
arch/ppc64/kernel/Makefile

It is much more obvious to look up this in the Makefile like we do for
the rest of the kernel.
	Sam

From benh at kernel.crashing.org Tue Feb 1 16:55:01 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 01 Feb 2005 16:55:01 +1100
Subject: [PATCH] ppc64: Implement a vDSO and use it for signal trampoline
In-Reply-To: <20050201054149.GA8136@mars.ravnborg.org>
References: <1107151447.5712.81.camel@gaston> <20050131192713.GA16268@mars.ravnborg.org> <1107218282.5906.33.camel@gaston> <20050201054149.GA8136@mars.ravnborg.org>
Message-ID: <1107237301.5963.67.camel@gaston>

> Right hand side is evaluated only when left hand side is used.
> Also very useful. Example just mocked up:
> cmd_vdso32_cc = $(CC) -T $^ -o $@
>
> Doing late evaluation will cause correct replacement of $^ and $@ when
> used. When cmd_vdso32_cc is defined, make does not know the desired values
> for $^ and $@ - this is only known when cmd_vdso32_cc is actually used.
>
> Hope this clarifies it.

Definitely, thanks.

Ben.

From grundler at parisc-linux.org Tue Feb 1 18:46:57 2005
From: grundler at parisc-linux.org (Grant Grundler)
Date: Tue, 1 Feb 2005 00:46:57 -0700
Subject: pci: Arch hook to determine config space size
In-Reply-To: <41FE8994.4040802@us.ibm.com>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050128185234.GB21760@infradead.org> <20050129040647.GA6261@kroah.com> <41FE82B6.9060407@us.ibm.com> <41FE8994.4040802@us.ibm.com>
Message-ID: <20050201074657.GA548@colo.lackof.org>

On Mon, Jan 31, 2005 at 01:40:04PM -0600, Brian King wrote:
> CC'ing the linux-pci mailing list...

thanks...

> > This patch adds an arch hook so
> > that individual archs can indicate if the underlying system supports
> > expanded config space accesses or not.

> >@@ -653,6 +653,8 @@ static int pci_cfg_space_size(struct pci
> > 		goto fail;
> > 	}
> >
> >+	if (!pcibios_exp_cfg_space(dev))
> >+		goto fail;
> > 	if (pci_read_config_dword(dev, 256, &status) != PCIBIOS_SUCCESSFUL)
> > 		goto fail;

pci_read_config_dword lands in arch specific code.
See drivers/pci/access.c:PCI_OP_READ() macro.

I'm missing what pcibios_exp_cfg_space() does that can't be handled by
the bus_ops supplied by pci_scan_bus().

I would expect the pci_read_config_dword to fail for being out of bounds.
Is that wrong?
Or is bus_ops not feasible in this case because pcibios needs access
to pci_dev?

If it's feasible, maybe the right place to add this hook is to
pci_read_config_dword which is also handed the pci_dev.
And add another function pointer to bus_ops (which could be NULL)
to check chipset support for Expanded Config space before calling
pci_bus_read_config_dword. That's cleaner than adding a hook
before each use of pci_read_config_dword.

hth,
grant

From matthew at wil.cx Tue Feb 1 23:32:49 2005
From: matthew at wil.cx (Matthew Wilcox)
Date: Tue, 1 Feb 2005 12:32:49 +0000
Subject: pci: Arch hook to determine config space size
In-Reply-To: <41FF0B0D.8020003@us.ibm.com>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050131192955.GJ31145@parcelfarce.linux.theplanet.co.uk> <41FEA4AA.1080407@us.ibm.com> <200501312256.44692.arnd@arndb.de> <41FEB492.2020002@us.ibm.com> <1107227727.5963.46.camel@gaston> <41FF0B0D.8020003@us.ibm.com>
Message-ID: <20050201123249.GA10088@parcelfarce.linux.theplanet.co.uk>

On Mon, Jan 31, 2005 at 10:52:29PM -0600, Brian King wrote:
> @@ -62,8 +72,11 @@ static int rtas_read_config(struct devic
> 		return PCIBIOS_DEVICE_NOT_FOUND;
> 	if (where & (size - 1))
> 		return PCIBIOS_BAD_REGISTER_NUMBER;

You should probably delete this redundant test at the same time ...

> +	if (!config_access_valid(dn, where))
> +		return PCIBIOS_BAD_REGISTER_NUMBER;
>
> -	addr = (dn->busno << 16) | (dn->devfn << 8) | where;
> +	addr = ((where & 0xf00) << 20) | (dn->busno << 16) |
> +		(dn->devfn << 8) | (where & 0xff);
> 	buid = dn->phb->buid;
> 	if (buid) {
> 		ret = rtas_call(ibm_read_pci_config, 4, 2, &returnval,

--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain

From brking at us.ibm.com Wed Feb 2 02:23:36 2005
From: brking at us.ibm.com (Brian King)
Date: Tue, 01 Feb 2005 09:23:36 -0600
Subject: pci: Arch hook to determine config space size
In-Reply-To: <20050201074657.GA548@colo.lackof.org>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050128185234.GB21760@infradead.org> <20050129040647.GA6261@kroah.com> <41FE82B6.9060407@us.ibm.com> <41FE8994.4040802@us.ibm.com> <20050201074657.GA548@colo.lackof.org>
Message-ID: <41FF9EF8.2000101@us.ibm.com>

Grant Grundler wrote:
> On Mon, Jan 31, 2005 at 01:40:04PM -0600, Brian King wrote:
>
>>CC'ing the linux-pci mailing list...
>
>
> thanks...
>
>
>>>This patch adds an arch hook so
>>>that individual archs can indicate if the underlying system supports
>>>expanded config space accesses or not.
>
>
>>>@@ -653,6 +653,8 @@ static int pci_cfg_space_size(struct pci
>>>		goto fail;
>>>	}
>>>
>>>+	if (!pcibios_exp_cfg_space(dev))
>>>+		goto fail;
>>>	if (pci_read_config_dword(dev, 256, &status) != PCIBIOS_SUCCESSFUL)
>>>		goto fail;
>
>
> pci_read_config_dword lands in arch specific code.
> See drivers/pci/access.c:PCI_OP_READ() macro.
>
> I'm missing what pcibios_exp_cfg_space() does that can't be handled by
> the bus_ops supplied by pci_scan_bus().
>
> I would expect the pci_read_config_dword to fail for being out of bounds.
> Is that wrong?
> Or is bus_ops not feasible in this case because pcibios needs access
> to pci_dev?

The current patch for this has become essentially that. It is now a
PPC64 specific patch that adds bounds checking in the PPC64 PCI config
access functions.

-Brian

--
Brian King
eServer Storage I/O
IBM Linux Technology Center

From zwane at arm.linux.org.uk Wed Feb 2 05:25:14 2005
From: zwane at arm.linux.org.uk (Zwane Mwaikambo)
Date: Tue, 1 Feb 2005 11:25:14 -0700 (MST)
Subject: [PATCH] PPC64: Generic hotplug cpu support
Message-ID:

Patch provides a generic hotplug cpu implementation, with the only current
user being pmac. This doesn't replace real hotplug code as is currently
used by LPAR systems. Ben, I can add the additional pmac specific code to
put the processor into a sleeping state separately. Thanks to Nathan for
testing.

 arch/ppc64/Kconfig                |    2
 arch/ppc64/kernel/idle.c          |    4 +
 arch/ppc64/kernel/irq.c           |   29 +++++++++++++
 arch/ppc64/kernel/pSeries_setup.c |    5 +-
 arch/ppc64/kernel/pmac_setup.c    |    3 +
 arch/ppc64/kernel/pmac_smp.c      |    5 ++
 arch/ppc64/kernel/setup.c         |    3 -
 arch/ppc64/kernel/smp.c           |   80 ++++++++++++++++++++++++++++++++++++++
 arch/ppc64/kernel/sysfs.c         |    6 --
 include/asm-ppc64/machdep.h       |    1
 include/asm-ppc64/smp.h           |    9 +++-
 11 files changed, 136 insertions(+), 11 deletions(-)

Signed-off-by: Zwane Mwaikambo

Index: linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/Kconfig
===================================================================
RCS file: /home/cvsroot/linux-2.6.11-rc2-mm2/arch/ppc64/Kconfig,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 Kconfig
--- linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/Kconfig	29 Jan 2005 21:29:21 -0000	1.1.1.1
+++ linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/Kconfig	1 Feb 2005 05:01:10 -0000
@@ -313,7 +313,7 @@ source "drivers/pci/Kconfig"
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
-	depends on SMP && EXPERIMENTAL && PPC_PSERIES
+	depends on SMP && EXPERIMENTAL && (PPC_PSERIES || PPC_PMAC)
 	select HOTPLUG
 	---help---
 	  Say Y here to be able to turn CPUs off and on.

Index: linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/idle.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.11-rc2-mm2/arch/ppc64/kernel/idle.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 idle.c
--- linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/idle.c	29 Jan 2005 21:29:21 -0000	1.1.1.1
+++ linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/idle.c	1 Feb 2005 06:32:09 -0000
@@ -293,6 +293,10 @@ static int native_idle(void)
 		power4_idle();
 		if (need_resched())
 			schedule();
+
+		if (cpu_is_offline(smp_processor_id()) &&
+		    system_state == SYSTEM_RUNNING)
+			cpu_die();
 	}
 	return 0;
 }
Index: linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/irq.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.11-rc2-mm2/arch/ppc64/kernel/irq.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 irq.c
--- linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/irq.c	29 Jan 2005 21:29:21 -0000	1.1.1.1
+++ linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/irq.c	1 Feb 2005 05:18:21 -0000
@@ -115,6 +115,35 @@ skip:
 	return 0;
 }
+#ifdef CONFIG_HOTPLUG_CPU
+void fixup_irqs(cpumask_t map)
+{
+	unsigned int irq;
+	static int warned;
+
+	for_each_irq(irq) {
+		cpumask_t mask;
+
+		if (irq_desc[irq].status & IRQ_PER_CPU)
+			continue;
+
+		cpus_and(mask, irq_affinity[irq], map);
+		if (any_online_cpu(mask) == NR_CPUS) {
+			printk("Breaking affinity for irq %i\n", irq);
+			mask = map;
+		}
+		if (irq_desc[irq].handler->set_affinity)
+			irq_desc[irq].handler->set_affinity(irq, mask);
+		else if (irq_desc[irq].action && !(warned++))
+			printk("Cannot set affinity for irq %i\n", irq);
+	}
+
+	local_irq_enable();
+	mdelay(1);
+	local_irq_disable();
+}
+#endif
+
 extern int noirqdebug;
 /*
Index: linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/pSeries_setup.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.11-rc2-mm2/arch/ppc64/kernel/pSeries_setup.c,v
retrieving revision 1.1.1.1
diff
-u -p -B -r1.1.1.1 pSeries_setup.c
--- linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/pSeries_setup.c	29 Jan 2005 21:29:21 -0000	1.1.1.1
+++ linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/pSeries_setup.c	1 Feb 2005 05:01:10 -0000
@@ -320,8 +320,9 @@ static void __init pSeries_discover_pic
 	}
 }
-static void pSeries_cpu_die(void)
+static void pSeries_mach_cpu_die(void)
 {
+	idle_task_exit();
 	local_irq_disable();
 	/* Some hardware requires clearing the CPPR, while other hardware does not
 	 * it is safe either way
@@ -599,7 +600,7 @@ struct machdep_calls __initdata pSeries_
 	.power_off = rtas_power_off,
 	.halt = rtas_halt,
 	.panic = rtas_os_term,
-	.cpu_die = pSeries_cpu_die,
+	.cpu_die = pSeries_mach_cpu_die,
 	.get_boot_time = pSeries_get_boot_time,
 	.get_rtc_time = pSeries_get_rtc_time,
 	.set_rtc_time = pSeries_set_rtc_time,
Index: linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/pmac_setup.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.11-rc2-mm2/arch/ppc64/kernel/pmac_setup.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 pmac_setup.c
--- linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/pmac_setup.c	29 Jan 2005 21:29:21 -0000	1.1.1.1
+++ linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/pmac_setup.c	1 Feb 2005 06:49:25 -0000
@@ -439,6 +439,9 @@ static int __init pmac_probe(int platfor
 }
 struct machdep_calls __initdata pmac_md = {
+#ifdef CONFIG_HOTPLUG_CPU
+	.cpu_die = generic_mach_cpu_die,
+#endif
 	.probe = pmac_probe,
 	.setup_arch = pmac_setup_arch,
 	.init_early = pmac_init_early,
Index: linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/pmac_smp.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.11-rc2-mm2/arch/ppc64/kernel/pmac_smp.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 pmac_smp.c
--- linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/pmac_smp.c	29 Jan 2005 21:29:21 -0000	1.1.1.1
+++ linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/pmac_smp.c	1 Feb 2005 06:50:02 -0000
@@
-308,4 +308,9 @@ struct smp_ops_t core99_smp_ops __pmacda
 void __init pmac_setup_smp(void)
 {
 	smp_ops = &core99_smp_ops;
+#ifdef CONFIG_HOTPLUG_CPU
+	smp_ops->cpu_enable = generic_cpu_enable;
+	smp_ops->cpu_disable = generic_cpu_disable;
+	smp_ops->cpu_die = generic_cpu_die;
+#endif
 }
Index: linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/setup.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.11-rc2-mm2/arch/ppc64/kernel/setup.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 setup.c
--- linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/setup.c	29 Jan 2005 21:29:21 -0000	1.1.1.1
+++ linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/setup.c	1 Feb 2005 06:25:29 -0000
@@ -1345,9 +1345,6 @@ early_param("xmon", early_xmon);
 void cpu_die(void)
 {
-	idle_task_exit();
 	if (ppc_md.cpu_die)
 		ppc_md.cpu_die();
-	local_irq_disable();
-	for (;;);
 }
Index: linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/smp.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.11-rc2-mm2/arch/ppc64/kernel/smp.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 smp.c
--- linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/smp.c	29 Jan 2005 21:29:21 -0000	1.1.1.1
+++ linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/smp.c	1 Feb 2005 06:36:42 -0000
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -406,10 +407,89 @@ void __devinit smp_prepare_boot_cpu(void
 	current_set[boot_cpuid] = current->thread_info;
 }
+#ifdef CONFIG_HOTPLUG_CPU
+/* State of each CPU during hotplug phases */
+DEFINE_PER_CPU(int, cpu_state) = { 0 };
+
+int generic_cpu_disable(void)
+{
+	unsigned int cpu = smp_processor_id();
+
+	if (cpu == boot_cpuid)
+		return -EBUSY;
+
+	systemcfg->processorCount--;
+	cpu_clear(cpu, cpu_online_map);
+	fixup_irqs(cpu_online_map);
+	return 0;
+}
+
+int generic_cpu_enable(unsigned int cpu)
+{
+	/* Do the normal bootup if we haven't
+	 * already bootstrapped.
+	 */
+	if (system_state != SYSTEM_RUNNING)
+		return -ENOSYS;
+
+	/* get the target out of it's holding state */
+	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
+	wmb();
+
+	while (!cpu_online(cpu))
+		cpu_relax();
+
+	fixup_irqs(cpu_online_map);
+	/* counter the irq disable in fixup_irqs */
+	local_irq_enable();
+	return 0;
+}
+
+void generic_cpu_die(unsigned int cpu)
+{
+	int i;
+
+	for (i = 0; i < 100; i++) {
+		rmb();
+		if (per_cpu(cpu_state, cpu) == CPU_DEAD)
+			return;
+		msleep(100);
+	}
+	printk(KERN_ERR "CPU%d didn't die...\n", cpu);
+}
+
+void generic_mach_cpu_die(void)
+{
+	unsigned int cpu;
+
+	local_irq_disable();
+	cpu = smp_processor_id();
+	printk(KERN_DEBUG "CPU%d offline\n", cpu);
+	__get_cpu_var(cpu_state) = CPU_DEAD;
+	wmb();
+	while (__get_cpu_var(cpu_state) != CPU_UP_PREPARE)
+		cpu_relax();
+
+	flush_tlb_pending();
+	cpu_set(cpu, cpu_online_map);
+	local_irq_enable();
+}
+#endif
+
+static int __devinit cpu_enable(unsigned int cpu)
+{
+	if (smp_ops->cpu_enable)
+		return smp_ops->cpu_enable(cpu);
+
+	return -ENOSYS;
+}
+
 int __devinit __cpu_up(unsigned int cpu)
 {
 	int c;
+	if (!cpu_enable(cpu))
+		return 0;
+
 	/* At boot, don't bother with non-present cpus -JSCHOPP */
 	if (system_state < SYSTEM_RUNNING && !cpu_present(cpu))
 		return -ENOENT;
Index: linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/sysfs.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.11-rc2-mm2/arch/ppc64/kernel/sysfs.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 sysfs.c
--- linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/sysfs.c	29 Jan 2005 21:29:21 -0000	1.1.1.1
+++ linux-2.6.11-rc2-mm2-ppc64/arch/ppc64/kernel/sysfs.c	1 Feb 2005 05:01:10 -0000
@@ -18,7 +18,7 @@
 #include
 #include
 #include
-
+#include
 static DEFINE_PER_CPU(struct cpu, cpu_devices);
@@ -413,9 +413,7 @@ static int __init topology_init(void)
 		 * CPU. For instance, the boot cpu might never be valid
 		 * for hotplugging.
 		 */
-#ifdef CONFIG_HOTPLUG_CPU
-		if (systemcfg->platform != PLATFORM_PSERIES_LPAR)
-#endif
+		if (!ppc_md.cpu_die)
 			c->no_control = 1;
 		if (cpu_online(cpu) || (c->no_control == 0)) {
Index: linux-2.6.11-rc2-mm2-ppc64/include/asm-ppc64/machdep.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.11-rc2-mm2/include/asm-ppc64/machdep.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 machdep.h
--- linux-2.6.11-rc2-mm2-ppc64/include/asm-ppc64/machdep.h	29 Jan 2005 21:29:28 -0000	1.1.1.1
+++ linux-2.6.11-rc2-mm2-ppc64/include/asm-ppc64/machdep.h	1 Feb 2005 05:56:24 -0000
@@ -30,6 +30,7 @@ struct smp_ops_t {
 	void (*setup_cpu)(int nr);
 	void (*take_timebase)(void);
 	void (*give_timebase)(void);
+	int (*cpu_enable)(unsigned int nr);
 	int (*cpu_disable)(void);
 	void (*cpu_die)(unsigned int nr);
 };
Index: linux-2.6.11-rc2-mm2-ppc64/include/asm-ppc64/smp.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.11-rc2-mm2/include/asm-ppc64/smp.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 smp.h
--- linux-2.6.11-rc2-mm2-ppc64/include/asm-ppc64/smp.h	29 Jan 2005 21:29:28 -0000	1.1.1.1
+++ linux-2.6.11-rc2-mm2-ppc64/include/asm-ppc64/smp.h	1 Feb 2005 06:25:08 -0000
@@ -29,7 +29,7 @@ extern int boot_cpuid;
 extern int boot_cpuid_phys;
-extern void cpu_die(void) __attribute__((noreturn));
+extern void cpu_die(void);
 #ifdef CONFIG_SMP
@@ -37,6 +37,13 @@ extern void smp_send_debugger_break(int
 struct pt_regs;
 extern void smp_message_recv(int, struct pt_regs *);
+#ifdef CONFIG_HOTPLUG_CPU
+extern void fixup_irqs(cpumask_t map);
+int generic_cpu_disable(void);
+int generic_cpu_enable(unsigned int cpu);
+void generic_cpu_die(unsigned int cpu);
+void generic_mach_cpu_die(void);
+#endif
 #define __smp_processor_id() (get_paca()->paca_index)
 #define hard_smp_processor_id() (get_paca()->hw_cpu_id)

From olh at suse.de Wed Feb 2 06:27:35 2005
From: olh at suse.de (Olaf Hering)
Date: Tue, 1
Feb 2005 20:27:35 +0100
Subject: [PATCH] e1000, errata 2{3, 4} - possible EEH or memory corruption when DMA crosses a 64k boundary
Message-ID: <20050201192735.GB7433@suse.de>

We have this patch in SLES9 SP1.
I asked google about 'fix for errata 23, cant cross 64kB boundary', and
it shows such a patch is also part of RH 2.6.9.
It still applies to current Linus tree.
Can you check whether this is still required for the current driver?


References: SUSE48368 LTC12567

Need to check 64k boundary on DMA address as well.

We also need to have 64k boundary checking on the DMA address
that comes back from pci_map_single(). This address is what will
be passed to the adapter on ppc64 for it to DMA into. It's the
address that the adapter sees which will trip erratum 23.

The so patched driver passed a quick netperf run and a weekend
long stress test.

diff -puN drivers/net/e1000-new/e1000_main.c~64k-align-check-dma-suse drivers/net/e1000-new/e1000_main.c
--- linux-2.6.5-7.127/drivers/net/e1000-new/e1000_main.c~64k-align-check-dma-suse	Wed Dec 8 16:55:46 2004
+++ linux-2.6.5-7.127-moilanen/drivers/net/e1000-new/e1000_main.c	Thu Dec 9 15:46:04 2004
@@ -2579,6 +2579,29 @@ e1000_alloc_rx_buffers(struct e1000_adap
 				adapter->rx_buffer_len,
 				PCI_DMA_FROMDEVICE);
+		if(adapter->hw.mac_type == e1000_82545 ||
+		   adapter->hw.mac_type == e1000_82546) {
+			/* fix for errata 23, cant cross 64kB boundary */
+			begin = (unsigned long)buffer_info->dma;
+			end = (unsigned long)(adapter->rx_buffer_len) - 1;
+
+			if(!e1000_check_64k_alignment(adapter, begin, end)) {
+
+				DPRINTK(RX_ERR,ERR,"dma align check failed: "
+					"begin: 0x%lx, end: 0x%lx\n", begin, end);
+
+				dev_kfree_skb(skb);
+				buffer_info->skb = NULL;
+
+				pci_unmap_single(pdev,
+						 buffer_info->dma,
+						 adapter->rx_buffer_len,
+						 PCI_DMA_FROMDEVICE);
+
+				break; /* while !buffer_info->skb */
+			}
+		}
+
 		rx_desc = E1000_RX_DESC(*rx_ring, i);
 		rx_desc->buffer_addr = cpu_to_le64(buffer_info->dma);

From jimix at watson.ibm.com Wed Feb 2 06:23:39 2005
From: jimix
at watson.ibm.com (Jimi Xenidis)
Date: Tue, 1 Feb 2005 14:23:39 -0500
Subject: [PATCH] drivers/char/hvcs.c and devfs
Message-ID: <16895.55099.774986.376938@kitch0.watson.ibm.com>

The hvcs driver does not register a devfs_name resulting in devfs
creating /dev/* entries. The following one line patch remedies the
problem.

Signed-off-by: Jimi Xenidis

--- orig/drivers/char/hvcs.c
+++ mod/drivers/char/hvcs.c
@@ -1363,6 +1363,7 @@
 	hvcs_tty_driver->driver_name = hvcs_driver_name;
 	hvcs_tty_driver->name = hvcs_device_node;
+	hvcs_tty_driver->devfs_name = hvcs_device_node;
 	/*
 	 * We'll let the system assign us a major number, indicated by leaving

From jimix at watson.ibm.com Wed Feb 2 06:34:31 2005
From: jimix at watson.ibm.com (Jimi Xenidis)
Date: Tue, 1 Feb 2005 14:34:31 -0500
Subject: HUPin the ttys in drivers/char/hvcs.c
Message-ID: <16895.55751.310549.454083@kitch0.watson.ibm.com>

In an LPAR environment there is the hvc (client side VTERM) and the
hvcs (server side VTERM).

If the /dev/hvcs is paired/registered with a VTERM that is removed (as
in the case of LPAR death) the H_GET_TERM_CHAR hcall will eventually
return H_Closed.

IMHO, when this event occurs the /dev/hvcs should get HUPed and
ultimately an H_FREE_VTERM should occur on the channel. Otherwise the
administrator would have to clean up after it.

Thoughts?
-JX

From jdmason at us.ibm.com Wed Feb 2 06:33:59 2005
From: jdmason at us.ibm.com (Jon Mason)
Date: Tue, 1 Feb 2005 13:33:59 -0600
Subject: [PATCH] e1000, errata 2{3, 4} - possible EEH or memory corruption when DMA crosses a 64k boundary
In-Reply-To: <20050201192735.GB7433@suse.de>
References: <20050201192735.GB7433@suse.de>
Message-ID: <200502011333.59262.jdmason@us.ibm.com>

On Tuesday 01 February 2005 01:27 pm, Olaf Hering wrote:
>
> We have this patch in SLES9 SP1.
> I asked google about 'fix for errata 23, cant cross 64kB boundary', and
> it shows such a patch is also part of RH 2.6.9.
> It still applies to current Linus tree.
> Can you check whether this is still required for the current driver?

This patch is still lacking from the latest e1000 driver. Intel has the
patch in their queue, so it will be needed until they release their next
version of the e1000 driver.

From ganesh.venkatesan at intel.com Wed Feb 2 06:36:31 2005
From: ganesh.venkatesan at intel.com (Venkatesan, Ganesh)
Date: Tue, 1 Feb 2005 11:36:31 -0800
Subject: [PATCH] e1000, errata 2{3, 4} - possible EEH or memory corruption when DMA crosses a 64k boundary
Message-ID: <468F3FDA28AA87429AD807992E22D07E041A065B@orsmsx408>

The patch attached to your mail does not seem to be complete. Did the
mail application truncate?

In any case, this fix is required in e1000. It is *not* in the latest
driver that is released but *is* in the driver that is lined up for
release in a couple of weeks.

Thanks,
Ganesh.

>-----Original Message-----
>From: Olaf Hering [mailto:olh at suse.de]
>Sent: Tuesday, February 01, 2005 11:28 AM
>To: Venkatesan, Ganesh; netdev at oss.sgi.com
>Cc: linuxppc64-dev at ozlabs.org
>Subject: [PATCH] e1000, errata 2{3,4} - possible EEH or memory corruption
>when DMA crosses a 64k boundary
>
>
>We have this patch in SLES9 SP1.
>I asked google about 'fix for errata 23, cant cross 64kB boundary', and
>it shows such a patch is also part of RH 2.6.9.
>It still applies to current Linus tree.
>Can you check whether this is still required for the current driver?
>
>
>References: SUSE48368 LTC12567
>
>Need to check 64k boundary on DMA address as well.
>
>We also need to have 64k boundary checking on the DMA address
>that comes back from pci_map_single(). This address is what will
>be passed to the adapter on ppc64 for it to DMA into. It's the
>address that the adapter sees which will trip erratum 23.
>
>The so patched driver passed a quick netperf run and a weekend
>long stress test.
>
>diff -puN drivers/net/e1000-new/e1000_main.c~64k-align-check-dma-suse
>drivers/net/e1000-new/e1000_main.c
>--- linux-2.6.5-7.127/drivers/net/e1000-new/e1000_main.c~64k-align-check-
>dma-suse	Wed Dec 8 16:55:46 2004
>+++ linux-2.6.5-7.127-moilanen/drivers/net/e1000-new/e1000_main.c	Thu Dec
>9 15:46:04 2004
>@@ -2579,6 +2579,29 @@ e1000_alloc_rx_buffers(struct e1000_adap
> 				adapter->rx_buffer_len,
> 				PCI_DMA_FROMDEVICE);
>
>+		if(adapter->hw.mac_type == e1000_82545 ||
>+		   adapter->hw.mac_type == e1000_82546) {
>+			/* fix for errata 23, cant cross 64kB boundary */
>+			begin = (unsigned long)buffer_info->dma;
>+			end = (unsigned long)(adapter->rx_buffer_len) - 1;
>+
>+			if(!e1000_check_64k_alignment(adapter, begin, end)) {
>+
>+				DPRINTK(RX_ERR,ERR,"dma align check failed: "
>+					"begin: 0x%lx, end: 0x%lx\n", begin, end);
>+
>+				dev_kfree_skb(skb);
>+				buffer_info->skb = NULL;
>+
>+				pci_unmap_single(pdev,
>+						 buffer_info->dma,
>+						 adapter->rx_buffer_len,
>+						 PCI_DMA_FROMDEVICE);
>+
>+				break; /* while !buffer_info->skb */
>+			}
>+		}
>+
> 		rx_desc = E1000_RX_DESC(*rx_ring, i);
> 		rx_desc->buffer_addr = cpu_to_le64(buffer_info->dma);
>

From olh at suse.de Wed Feb 2 06:40:10 2005
From: olh at suse.de (Olaf Hering)
Date: Tue, 1 Feb 2005 20:40:10 +0100
Subject: [PATCH] e1000, errata 2{3, 4} - possible EEH or memory corruption when DMA crosses a 64k boundary
In-Reply-To: <468F3FDA28AA87429AD807992E22D07E041A065B@orsmsx408>
References: <468F3FDA28AA87429AD807992E22D07E041A065B@orsmsx408>
Message-ID: <20050201194010.GA7892@suse.de>

On Tue, Feb 01, Venkatesan, Ganesh wrote:

> In any case, this fix is required in e1000. It is *not* in the latest
> driver that is released but *is* in the driver that is lined up for
> release in a couple of weeks.

That's good enough, just so that things don't get lost. There are
probably a few separate patches for each problem found. We are still
in the process of sorting out our +2k patch mess.
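
[Editorial illustration] The erratum discussed in this thread concerns a receive buffer whose DMA range crosses a 64 kB boundary. A minimal sketch of such a range check follows. The helper name dma_range_within_64k is hypothetical and is not the driver's e1000_check_64k_alignment(), whose body does not appear in this thread. The sketch also assumes the range to test is the mapped address plus the buffer length, which is an assumption on top of the posted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the e1000 driver): returns true
 * when the DMA range [addr, addr + len - 1] lies entirely inside one
 * 64 kB-aligned window. Address wrap-around is not handled here.
 */
static bool dma_range_within_64k(uint64_t addr, uint32_t len)
{
	uint64_t last;

	assert(len > 0);
	last = addr + len - 1;

	/* Two addresses share a 64 kB window iff they agree above bit 15. */
	return (addr >> 16) == (last >> 16);
}
```

For example, a 2048-byte buffer mapped at 0xf800 ends at 0xffff and stays inside one window, while the same buffer mapped at 0xf801 ends at 0x10000 and crosses into the next one.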

From brking at us.ibm.com Wed Feb 2 07:16:47 2005
From: brking at us.ibm.com (Brian King)
Date: Tue, 01 Feb 2005 14:16:47 -0600
Subject: pci: Arch hook to determine config space size
In-Reply-To: <20050201123249.GA10088@parcelfarce.linux.theplanet.co.uk>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050131192955.GJ31145@parcelfarce.linux.theplanet.co.uk> <41FEA4AA.1080407@us.ibm.com> <200501312256.44692.arnd@arndb.de> <41FEB492.2020002@us.ibm.com> <1107227727.5963.46.camel@gaston> <41FF0B0D.8020003@us.ibm.com> <20050201123249.GA10088@parcelfarce.linux.theplanet.co.uk>
Message-ID: <41FFE3AF.706@us.ibm.com>

Matthew Wilcox wrote:
> On Mon, Jan 31, 2005 at 10:52:29PM -0600, Brian King wrote:
>
>>@@ -62,8 +72,11 @@ static int rtas_read_config(struct devic
>>		return PCIBIOS_DEVICE_NOT_FOUND;
>>	if (where & (size - 1))
>>		return PCIBIOS_BAD_REGISTER_NUMBER;
>
>
> You should probably delete this redundant test at the same time ...

Done. The new patch below also adds some address checking to the iSeries
config accessor functions. Additionally, this patch should address
Arnd's concern, as it now looks for the "ibm,pci-config-space-type"
property on the device itself rather than on the bridge.

--
Brian King
eServer Storage I/O
IBM Linux Technology Center
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ppc64_pcix_mode2_cfg.patch
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050201/f3a6a775/attachment.txt

From sonny at burdell.org Wed Feb 2 06:52:07 2005
From: sonny at burdell.org (Sonny Rao)
Date: Tue, 1 Feb 2005 14:52:07 -0500
Subject: Very Large "zero" slab expected?
Message-ID: <20050201195207.GA5723@kevlar.burdell.org>

Hi, I'm running a fairly heavy database workload using DB/2 on a 4-way
720 on top of ext3 trying to replicate a separate VM performance issue.

During and after the run, I see a massive amount of slab memory being
used by the "zero" cache, which appears to be a ppc64 specific
cache for pmd's and pgd's.

Basically, about 1GB out of my total of 8GB on the system is being
consumed by this. Is this expected ? It seems excessive
considering my mapped memory is only about 2GB during the run.

The DB/2 buffer pool was set to 4GB and none of it used huge pages.

Here's the relevant snippet from slabinfo:
slabinfo - version: 2.0
# name : tunables : slabdata
zero 256424 256431 4096 1 1 : tunables 24 12 8 : slabdata 256424 256431 0

Any ideas as to why there are so many of these ?

Sonny

From kravetz at us.ibm.com Wed Feb 2 09:44:33 2005
From: kravetz at us.ibm.com (Mike Kravetz)
Date: Tue, 1 Feb 2005 14:44:33 -0800
Subject: ppc64 memory hotplug config options
Message-ID: <20050201224433.GA5689@w-mikek2.ibm.com>

I'm working on memory hotplug for ppc64 and wanted to get some
opinions/information on config option dependencies. In the
current hotplug memory development patch set, there are four
interrelated config options.

FLATMEM        - Indicates a 'flat' contiguous memory model.
SPARSEMEM      - Indicates a 'sparse' memory model that may contain holes.
HOTPLUG_MEMORY - Allow hotplug memory operations.
NUMA           - Add support for NUMA architecture.

Some dependencies are obvious. If you want HOTPLUG_MEMORY, you must
have SPARSEMEM (as removing memory will leave holes). However, I
was curious about the relationship between NUMA and SPARSEMEM/FLATMEM
on ppc64.

In the current mainline kernels, NUMA depends on DISCONTIGMEM. However,
I believe this is mainly due to a bunch of code for node specific memory
access being part of DISCONTIGMEM. It seems that DISCONTIGMEM and NUMA
are more intertwined than they need to be.

Is it possible to have a 'flat' (no holes) memory layout on a ppc64 box
with NUMA characteristics?
I would think that the firmware/hypervisor
could present memory to the OS that appears to have no holes even though
different portions of memory have different access characteristics.

Thanks,
--
Mike

From sonny at burdell.org Wed Feb 2 10:02:39 2005
From: sonny at burdell.org (Sonny Rao)
Date: Tue, 1 Feb 2005 18:02:39 -0500
Subject: Very Large "zero" slab expected?
In-Reply-To: <20050201195207.GA5723@kevlar.burdell.org>
References: <20050201195207.GA5723@kevlar.burdell.org>
Message-ID: <20050201230239.GA8589@kevlar.burdell.org>

On Tue, Feb 01, 2005 at 02:52:07PM -0500, Sonny Rao wrote:
> Hi, I'm running a fairly heavy database workload using DB/2 on a 4-way
> 720 on top of ext3 trying to replicate a separate VM performance issue.
>
> During and after the run, I see a massive amount of slab memory being
> used by the "zero" cache, which appears to be a ppc64 specific
> cache for pmd's and pgd's.
>
> Basically, about 1GB out of my total of 8GB on the system is being
> consumed by this. Is this expected ? It seems excessive
> considering my mapped memory is only about 2GB during the run.
>
> The DB/2 buffer pool was set to 4GB and none of it used huge pages.
>
> Here's the relevant snippet from slabinfo:
> slabinfo - version: 2.0
> # name : tunables : slabdata
> zero 256424 256431 4096 1 1 : tunables 24 12 8 : slabdata 256424 256431 0
>
> Any ideas as to why there are so many of these ?
More info:

Here's /proc/meminfo output from a similar run, with 4GB of RAM
instead of 8GB:

portrait:~ # cat /proc/meminfo
MemTotal:        4099072 kB
MemFree:           15008 kB
Buffers:           38580 kB
Cached:          2941320 kB
SwapCached:       118492 kB
Active:          2672776 kB
Inactive:         645284 kB
HighTotal:             0 kB
HighFree:              0 kB
LowTotal:        4099072 kB
LowFree:           15008 kB
SwapTotal:       1050616 kB
SwapFree:         341156 kB
Dirty:               372 kB
Writeback:             4 kB
Mapped:          1827844 kB
Slab:             715384 kB
Committed_AS:    5342648 kB
PageTables:       610864 kB
VmallocTotal: 2147483647 kB
VmallocUsed:       25340 kB
VmallocChunk: 2147458223 kB
HugePages_Total:       0
HugePages_Free:        0
Hugepagesize:      16384 kB

And the top few lines of slabtop:

 Active / Total Objects (% used)    : 728019 / 747302 (97.4%)
 Active / Total Slabs (% used)      : 177511 / 177591 (100.0%)
 Active / Total Caches (% used)     : 79 / 120 (65.8%)
 Active / Total Size (% used)       : 705236.61K / 709101.23K (99.5%)
 Minimum / Average / Maximum Object : 0.02K / 0.95K / 128.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
336360 335400  99%    0.09K   8409       40     33636K buffer_head
157860 156817  99%    0.12K   5262       30     21048K size-128
154036 153988  99%    4.00K 154036        1    616144K zero
 22572  22120  97%    0.16K   1026       22      4104K vm_area_struct
 18571  18074  97%    0.52K   2653        7     10612K radix_tree_node

So, you can see at this point we have around 600MB of these "zero"
pages being used.
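
As a quick sanity check on the slabtop figures above, the size of a
cache is just its object count times its object size. A minimal sketch
in C follows; slab_cache_bytes() and bytes_to_kib() are hypothetical
helper names made up for illustration, not kernel interfaces.

```c
#include <stdint.h>

/*
 * Total bytes held by a slab cache: number of objects times the size
 * of one object. Hypothetical helper for illustration only.
 */
uint64_t slab_cache_bytes(uint64_t num_objs, uint64_t obj_size)
{
	return num_objs * obj_size;
}

/* Convert bytes to whole KiB, the unit slabtop prints with a "K" suffix. */
uint64_t bytes_to_kib(uint64_t bytes)
{
	return bytes / 1024;
}
```

With the numbers from the "zero" row (154036 objects of 4096 bytes
each), this works out to 616144 KiB, or roughly 600MB, which agrees
with the CACHE SIZE column.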

There are about 150 db2 processes running, using lots of shared memory:

portrait:~ # ipcs

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x763ee574 2392064    db2inst1   767        39987136   161
0x763ee561 2424833    db2inst1   701        39370752   159
0x00000000 2457602    db2fenc1   701        164052992  161
0x6803bbdb 2490371    db2inst1   761        50331648   1
0x00000000 2785285    db2inst1   701        1073741824 145
0x00000000 2818054    db2inst1   701        1073741824 145
0x00000000 2850823    db2inst1   701        1073741824 145
0x00000000 2883592    db2inst1   701        117735424  145

Thanks,

Sonny

From paulus at samba.org Wed Feb 2 10:39:02 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 2 Feb 2005 10:39:02 +1100
Subject: Very Large "zero" slab expected?
In-Reply-To: <20050201195207.GA5723@kevlar.burdell.org>
References: <20050201195207.GA5723@kevlar.burdell.org>
Message-ID: <16896.4886.136678.130456@cargo.ozlabs.ibm.com>

Sonny Rao writes:

> Hi, I'm running a fairly heavy database workload using DB/2 on a 4-way
> 720 on top of ext3 trying to replicate a separate VM performance issue.
>
> During and after the run, I see a massive amount of slab memory being
> used by the "zero" cache, which appears to be a ppc64-specific
> cache for pmd's and pgd's.
>
> Basically, about 1GB out of my total of 8GB on the system is being
> consumed by this. Is this expected? It seems excessive
> considering my mapped memory is only about 2GB during the run.
>
> The DB/2 buffer pool was set to 4GB and none of it used huge pages.

How many DB/2 processes had the buffer pool mapped? It was probably
100 or more, I would guess. The zero slab cache is used for pte pages
as well as pmds and pgds, and each process will have a separate page
table tree, which will occupy about 1/500th or so of the amount of
mapped memory, per process. So if you had ~125 processes the page
table trees would occupy about 1GB for a 4GB buffer pool.

The solution is to implement shared page tables.
Some people here in the LTC are looking at possibly doing that, but
getting it accepted into mainline could be tricky.

Paul.

From sonny at burdell.org Wed Feb 2 10:27:24 2005
From: sonny at burdell.org (Sonny Rao)
Date: Tue, 1 Feb 2005 18:27:24 -0500
Subject: Very Large "zero" slab expected?
In-Reply-To: <16896.4886.136678.130456@cargo.ozlabs.ibm.com>
References: <20050201195207.GA5723@kevlar.burdell.org>
	<16896.4886.136678.130456@cargo.ozlabs.ibm.com>
Message-ID: <20050201232724.GA8934@kevlar.burdell.org>

On Wed, Feb 02, 2005 at 10:39:02AM +1100, Paul Mackerras wrote:
> Sonny Rao writes:
>
> > Hi, I'm running a fairly heavy database workload using DB/2 on a 4-way
> > 720 on top of ext3 trying to replicate a separate VM performance issue.
> >
> > During and after the run, I see a massive amount of slab memory being
> > used by the "zero" cache, which appears to be a ppc64-specific
> > cache for pmd's and pgd's.
> >
> > Basically, about 1GB out of my total of 8GB on the system is being
> > consumed by this. Is this expected? It seems excessive
> > considering my mapped memory is only about 2GB during the run.
> >
> > The DB/2 buffer pool was set to 4GB and none of it used huge pages.
>
> How many DB/2 processes had the buffer pool mapped? It was probably
> 100 or more, I would guess. The zero slab cache is used for pte pages
> as well as pmds and pgds, and each process will have a separate page
> table tree, which will occupy about 1/500th or so of the amount of
> mapped memory, per process. So if you had ~125 processes the page
> table trees would occupy about 1GB for a 4GB buffer pool.

Yes, around 150 processes. Now that I actually look at it, this number
corresponds to what is shown under "PageTables" in meminfo.

> The solution is to implement shared page tables. Some people here in
> the LTC are looking at possibly doing that, but getting it accepted
> into mainline could be tricky.

OK, great, I just wanted to make sure this is expected. Thanks.
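
The per-process estimate quoted above can be checked with a little
arithmetic. The sketch below assumes 4KB base pages and 8-byte page
table entries; both are assumptions not stated in the thread, and they
give a ratio of 1/512, close to the "1/500th or so" quoted. It counts
only the lowest-level pte pages and ignores the much smaller pmd and
pgd levels. The function and macro names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL	/* assumed base page size */
#define PTE_SIZE_BYTES  8ULL	/* assumed size of one page table entry */

/*
 * Bytes of lowest-level page tables one process needs in order to map
 * mapped_bytes of memory: one PTE per mapped page.
 */
uint64_t pte_bytes_per_process(uint64_t mapped_bytes)
{
	uint64_t pages = (mapped_bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;

	return pages * PTE_SIZE_BYTES;
}

/*
 * Total for nprocs processes that all map the same region but do not
 * share page tables, so each one carries its own full copy.
 */
uint64_t pte_bytes_total(uint64_t mapped_bytes, unsigned int nprocs)
{
	return pte_bytes_per_process(mapped_bytes) * nprocs;
}
```

For a 4GB buffer pool this gives 8MB of pte pages per process, and
1000MB for 125 processes, in line with the "about 1GB" figure.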

BTW, those people you mentioned work across the hall from my office,
and are causing quite a ruckus :)

Sonny

From sonny at burdell.org Wed Feb 2 10:30:23 2005
From: sonny at burdell.org (Sonny Rao)
Date: Tue, 1 Feb 2005 18:30:23 -0500
Subject: Very Large "zero" slab expected?
In-Reply-To: <16896.4886.136678.130456@cargo.ozlabs.ibm.com>
References: <20050201195207.GA5723@kevlar.burdell.org>
	<16896.4886.136678.130456@cargo.ozlabs.ibm.com>
Message-ID: <20050201233023.GA9063@kevlar.burdell.org>

On Wed, Feb 02, 2005 at 10:39:02AM +1100, Paul Mackerras wrote:
> Sonny Rao writes:
>
> > Hi, I'm running a fairly heavy database workload using DB/2 on a 4-way
> > 720 on top of ext3 trying to replicate a separate VM performance issue.
> >
> > During and after the run, I see a massive amount of slab memory being
> > used by the "zero" cache, which appears to be a ppc64-specific
> > cache for pmd's and pgd's.
> >
> > Basically, about 1GB out of my total of 8GB on the system is being
> > consumed by this. Is this expected? It seems excessive
> > considering my mapped memory is only about 2GB during the run.
> >
> > The DB/2 buffer pool was set to 4GB and none of it used huge pages.
>
> How many DB/2 processes had the buffer pool mapped? It was probably
> 100 or more, I would guess. The zero slab cache is used for pte pages
> as well as pmds and pgds, and each process will have a separate page
> table tree, which will occupy about 1/500th or so of the amount of
> mapped memory, per process. So if you had ~125 processes the page
> table trees would occupy about 1GB for a 4GB buffer pool.
>
> The solution is to implement shared page tables. Some people here in
> the LTC are looking at possibly doing that, but getting it accepted
> into mainline could be tricky.

By the way, the name "zero" slab is not very descriptive; some of the
PPC64 developers here in Austin didn't even know what it was offhand.
You might consider renaming it to avoid confusion.

Thanks,

Sonny

From linas at austin.ibm.com Wed Feb 2 12:15:11 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Tue, 1 Feb 2005 19:15:11 -0600
Subject: [PATCH] PPC64: draft version of EEH code.
Message-ID: <20050202011511.GB9140@austin.ibm.com>

Hi Ben, Paulus,

Attached is my current draft version of the EEH patches. It's lightly
tested. It fixes a few bugs compared to previous versions.

-- I've chopped out the device_node tracking stuff, because device nodes
   no longer seem to disappear out from under me.
-- #define EEH_MAX_FAILS bumped large; the e1000 driver makes thousands
   of attempts before it gives up.
-- The fail count is tracked in a distinct variable now.
-- I still don't know why the PCI config space doesn't get set up,
   so this patch sets it manually.

As before, the patch is in two parts; one for the ppc64 side, and
one for the hotplug side.

--linas
-------------- next part --------------
===== arch/ppc64/kernel/eeh.c 1.41 vs edited =====
--- 1.41/arch/ppc64/kernel/eeh.c 2005-01-06 13:05:42 -06:00
+++ edited/arch/ppc64/kernel/eeh.c 2005-02-01 14:16:42 -06:00
@@ -17,21 +17,19 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
  */
 
-#include
+#include
 #include
 #include
-#include
 #include
 #include
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
 #include
-#include
 #include "pci.h"
 
 #undef DEBUG
@@ -88,8 +86,7 @@ static struct notifier_block *eeh_notifi
  * is broken and panic. This sets the threshold for how many read
  * attempts we allow before panicking.
*/ -#define EEH_MAX_FAILS 1000 -static atomic_t eeh_fail_count; +#define EEH_MAX_FAILS 100000 /* RTAS tokens */ static int ibm_set_eeh_option; @@ -106,6 +103,10 @@ static spinlock_t slot_errbuf_lock = SPI static int eeh_error_buf_size; /* System monitoring statistics */ +static DEFINE_PER_CPU(unsigned long, no_device); +static DEFINE_PER_CPU(unsigned long, no_dn); +static DEFINE_PER_CPU(unsigned long, no_cfg_addr); +static DEFINE_PER_CPU(unsigned long, ignored_check); static DEFINE_PER_CPU(unsigned long, total_mmio_ffs); static DEFINE_PER_CPU(unsigned long, false_positives); static DEFINE_PER_CPU(unsigned long, ignored_failures); @@ -224,9 +225,9 @@ pci_addr_cache_insert(struct pci_dev *de while (*p) { parent = *p; piar = rb_entry(parent, struct pci_io_addr_range, rb_node); - if (alo < piar->addr_lo) { + if (ahi < piar->addr_lo) { p = &parent->rb_left; - } else if (ahi > piar->addr_hi) { + } else if (alo > piar->addr_hi) { p = &parent->rb_right; } else { if (dev != piar->pcidev || @@ -244,6 +245,11 @@ pci_addr_cache_insert(struct pci_dev *de piar->addr_hi = ahi; piar->pcidev = dev; piar->flags = flags; + +#ifdef DEBUG + printk (KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", + alo, ahi, pci_name (dev)); +#endif rb_link_node(&piar->rb_node, parent, p); rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); @@ -368,6 +374,7 @@ void pci_addr_cache_remove_device(struct */ void __init pci_addr_cache_build(void) { + struct device_node *dn; struct pci_dev *dev = NULL; spin_lock_init(&pci_io_addr_cache_root.piar_lock); @@ -378,6 +385,17 @@ void __init pci_addr_cache_build(void) continue; } pci_addr_cache_insert_device(dev); + + /* Save the BAR's; firmware doesn't restore these after EEH reset */ + dn = pci_device_to_OF_node(dev); + if (dn) { + int i; + for (i = 0; i < 16; i++) + pci_read_config_dword(dev, i * 4, &dn->config_space[i]); + + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) + dn->eeh_is_bridge = 1; + } } #ifdef DEBUG @@ -389,6 +407,32 @@ void __init 
pci_addr_cache_build(void) /* --------------------------------------------------------------- */ /* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */ +void eeh_slot_error_detail (struct device_node *dn, int severity) +{ + unsigned long flags; + int rc; + + if (!dn) return; + + /* Log the error with the rtas logger */ + spin_lock_irqsave(&slot_errbuf_lock, flags); + memset(slot_errbuf, 0, eeh_error_buf_size); + + rc = rtas_call(ibm_slot_error_detail, + 8, 1, NULL, dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid), NULL, 0, + virt_to_phys(slot_errbuf), + eeh_error_buf_size, + severity); + + if (rc == 0) + log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); + spin_unlock_irqrestore(&slot_errbuf_lock, flags); +} + +EXPORT_SYMBOL(eeh_slot_error_detail); + /** * eeh_register_notifier - Register to find out about EEH events. * @nb: notifier block to callback on events @@ -424,7 +468,7 @@ static int read_slot_reset_state(struct outputs = 3; } - return rtas_call(token, 3, outputs, rets, dn->eeh_config_addr, + return rtas_call(token, 3, outputs, rets, dn->eeh_config_addr, BUID_HI(dn->phb->buid), BUID_LO(dn->phb->buid)); } @@ -484,11 +528,9 @@ static void eeh_event_handler(void *dumm "%s %s\n", event->reset_state, pci_name(event->dev), pci_pretty_name(event->dev)); - atomic_set(&eeh_fail_count, 0); - notifier_call_chain (&eeh_notifier_chain, - EEH_NOTIFY_FREEZE, event); - __get_cpu_var(slot_resets)++; + notifier_call_chain (&eeh_notifier_chain, + EEH_NOTIFY_FREEZE, event); pci_dev_put(event->dev); kfree(event); @@ -496,8 +538,8 @@ static void eeh_event_handler(void *dumm } /** - * eeh_token_to_phys - convert EEH address token to phys address - * @token i/o token, should be address in the form 0xE.... + * eeh_token_to_phys - convert I/O address to phys address + * @token i/o address, should be address in the form 0xA.... 
*/ static inline unsigned long eeh_token_to_phys(unsigned long token) { @@ -512,6 +554,17 @@ static inline unsigned long eeh_token_to return pa | (token & (PAGE_SIZE-1)); } +static inline struct pci_dev * eeh_get_pci_dev(struct device_node *dn) +{ + struct pci_dev *dev = NULL; + + for_each_pci_dev(dev) { + if (pci_device_to_OF_node(dev) == dn) + return dev; + } + return NULL; +} + /** * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze * @dn device node @@ -532,7 +585,7 @@ int eeh_dn_check_failure(struct device_n int ret; int rets[3]; unsigned long flags; - int rc, reset_state; + int reset_state; struct eeh_event *event; __get_cpu_var(total_mmio_ffs)++; @@ -540,16 +593,20 @@ int eeh_dn_check_failure(struct device_n if (!eeh_subsystem_enabled) return 0; - if (!dn) + if (!dn) { + __get_cpu_var(no_dn)++; return 0; + } /* Access to IO BARs might get this far and still not want checking. */ if (!(dn->eeh_mode & EEH_MODE_SUPPORTED) || dn->eeh_mode & EEH_MODE_NOCHECK) { + __get_cpu_var(ignored_check)++; return 0; } if (!dn->eeh_config_addr) { + __get_cpu_var(no_cfg_addr)++; return 0; } @@ -558,8 +615,11 @@ int eeh_dn_check_failure(struct device_n * slot, we know it's bad already, we don't need to check... 
*/ if (dn->eeh_mode & EEH_MODE_ISOLATED) { - atomic_inc(&eeh_fail_count); - if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) { + dn->eeh_check_count ++; + if (dn->eeh_check_count >= EEH_MAX_FAILS) { + printk (KERN_ERR "EEH: Driver ignored %d bad reads, panicing\n", + dn->eeh_check_count); + dump_stack(); /* re-read the slot reset state */ if (read_slot_reset_state(dn, rets) != 0) rets[0] = -1; /* reset state unknown */ @@ -581,34 +641,25 @@ int eeh_dn_check_failure(struct device_n return 0; } - /* prevent repeated reports of this failure */ + /* Prevent repeated reports of this failure */ dn->eeh_mode |= EEH_MODE_ISOLATED; reset_state = rets[0]; + /* Log the error with the rtas logger */ + if (dn->eeh_freeze_count < EEH_MAX_ALLOWED_FREEZES) { + eeh_slot_error_detail (dn, 1 /* Temporary Error */); + } else { + eeh_slot_error_detail (dn, 2 /* Permanent Error */); + } - spin_lock_irqsave(&slot_errbuf_lock, flags); - memset(slot_errbuf, 0, eeh_error_buf_size); - - rc = rtas_call(ibm_slot_error_detail, - 8, 1, NULL, dn->eeh_config_addr, - BUID_HI(dn->phb->buid), - BUID_LO(dn->phb->buid), NULL, 0, - virt_to_phys(slot_errbuf), - eeh_error_buf_size, - 1 /* Temporary Error */); - - if (rc == 0) - log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); - spin_unlock_irqrestore(&slot_errbuf_lock, flags); - - printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n", - rets[0], dn->name, dn->full_name); event = kmalloc(sizeof(*event), GFP_ATOMIC); if (event == NULL) { - eeh_panic(dev, reset_state); + printk (KERN_ERR "EEH: out of memory, event not handled\n"); return 1; } + if (!dev) + dev = eeh_get_pci_dev (dn); event->dev = dev; event->dn = dn; event->reset_state = reset_state; @@ -634,7 +685,6 @@ EXPORT_SYMBOL(eeh_dn_check_failure); * @token i/o token, should be address in the form 0xA.... * @val value, should be all 1's (XXX why do we need this arg??) * - * Check for an eeh failure at the given token address. * Check for an EEH failure at the given token address. 
Call this * routine if the result of a read was all 0xff's and you want to * find out if this is due to an EEH slot freeze event. This routine @@ -642,6 +692,7 @@ EXPORT_SYMBOL(eeh_dn_check_failure); * * Note this routine is safe to call in an interrupt context. */ + unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val) { unsigned long addr; @@ -651,8 +702,10 @@ unsigned long eeh_check_failure(const vo /* Finding the phys addr + pci device; this is pretty quick. */ addr = eeh_token_to_phys((unsigned long __force) token); dev = pci_get_device_by_addr(addr); - if (!dev) + if (!dev) { + __get_cpu_var(no_device)++; return val; + } dn = pci_device_to_OF_node(dev); eeh_dn_check_failure (dn, dev); @@ -663,6 +716,123 @@ unsigned long eeh_check_failure(const vo EXPORT_SYMBOL(eeh_check_failure); +/* ------------------------------------------------------------- */ +/* The code below deals with error recovery */ + +void +rtas_set_slot_reset(struct device_node *dn) +{ + int token = rtas_token ("ibm,set-slot-reset"); + int rc; + + if (token == RTAS_UNKNOWN_SERVICE) + return; + rc = rtas_call(token,4,1, NULL, + dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid), + 1); + if (rc) { + printk (KERN_WARNING "EEH: Unable to reset the failed slot\n"); + return; + } + + /* The PCI bus requires that the reset be held high for at least + * a 100 milliseconds. We wait a bit longer 'just in case'. 
+ */ + msleep (200); + + rc = rtas_call(token,4,1, NULL, + dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid), + 0); +} + +EXPORT_SYMBOL(rtas_set_slot_reset); + +void +rtas_configure_bridge(struct device_node *dn) +{ + int token = rtas_token ("ibm,configure-bridge"); + int rc; + + if (token == RTAS_UNKNOWN_SERVICE) + return; + rc = rtas_call(token,3,1, NULL, + dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid)); + if (rc) { + printk (KERN_WARNING "EEH: Unable to configure device bridge\n"); + } +} + +EXPORT_SYMBOL(rtas_configure_bridge); + +/* ------------------------------------------------------- */ +/** Save and restore of PCI BARs + * + * Although firmware will set up BARs during boot, it doesn't + * set up device BAR's after a device reset, although it will, + * if requested, set up bridge configuration. Thus, we need to + * configure the PCI devices ourselves. Config-space setup is + * stored in the PCI structures which are normally deleted during + * device removal. Thus, the "save" routine references the + * structures so that they aren't deleted. + */ + +/** + * __restore_bars - Restore the Base Address Registers + * Loads the PCI configuration space base address registers, + * the expansion ROM base address, the latency timer, and etc. + * from the saved values in the device node. 
+ */ +static inline void __restore_bars (struct device_node *dn) +{ + int i; + for (i=4; i<10; i++) { + rtas_write_config(dn, i*4, 4, dn->config_space[i]); + } + + /* 12 == Expansion ROM Address */ + rtas_write_config(dn, 12*4, 4, dn->config_space[12]); + +#define SAVED_BYTE(OFF) (((u8 *)(dn->config_space))[OFF]) + + rtas_write_config (dn, PCI_CACHE_LINE_SIZE, 1, + SAVED_BYTE(PCI_CACHE_LINE_SIZE)); + + rtas_write_config (dn, PCI_LATENCY_TIMER, 1, + SAVED_BYTE(PCI_LATENCY_TIMER)); + + rtas_write_config (dn, PCI_INTERRUPT_LINE, 1, + SAVED_BYTE(PCI_INTERRUPT_LINE)); +} + +/** + * eeh_restore_bars - restore the PCI config space info + */ +void eeh_restore_bars(struct device_node *dn) +{ + if (! dn->eeh_is_bridge) + __restore_bars (dn); + + if (dn->child) + eeh_restore_bars (dn->child); + + if (dn->sibling) + eeh_restore_bars (dn->sibling); +} + +EXPORT_SYMBOL(eeh_restore_bars); + +/* ------------------------------------------------------------- */ +/* The code below deals with enabling EEH for devices during the + * early boot sequence. EEH must be enabled before any PCI probing + * can be done. 
+ */ + struct eeh_early_enable_info { unsigned int buid_hi; unsigned int buid_lo; @@ -742,7 +912,7 @@ static void *early_enable_eeh(struct dev dn->full_name); } - return NULL; + return NULL; } /* @@ -829,7 +999,9 @@ void eeh_add_device_early(struct device_ return; phb = dn->phb; if (NULL == phb || 0 == phb->buid) { - printk(KERN_WARNING "EEH: Expected buid but found none\n"); + printk(KERN_WARNING "EEH: Expected buid but found none for %s\n", + dn->full_name); + dump_stack(); return; } @@ -848,6 +1020,9 @@ EXPORT_SYMBOL(eeh_add_device_early); */ void eeh_add_device_late(struct pci_dev *dev) { + int i; + struct device_node *dn; + if (!dev || !eeh_subsystem_enabled) return; @@ -857,6 +1032,14 @@ void eeh_add_device_late(struct pci_dev #endif pci_addr_cache_insert_device (dev); + + /* Save the BAR's; firmware doesn't restore these after EEH reset */ + dn = pci_device_to_OF_node(dev); + for (i = 0; i < 16; i++) + pci_read_config_dword(dev, i * 4, &dn->config_space[i]); + + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) + dn->eeh_is_bridge = 1; } EXPORT_SYMBOL(eeh_add_device_late); @@ -886,12 +1069,17 @@ static int proc_eeh_show(struct seq_file unsigned int cpu; unsigned long ffs = 0, positives = 0, failures = 0; unsigned long resets = 0; + unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0; for_each_cpu(cpu) { ffs += per_cpu(total_mmio_ffs, cpu); positives += per_cpu(false_positives, cpu); failures += per_cpu(ignored_failures, cpu); resets += per_cpu(slot_resets, cpu); + no_dev += per_cpu(no_device, cpu); + no_dn += per_cpu(no_dn, cpu); + no_cfg += per_cpu(no_cfg_addr, cpu); + no_check += per_cpu(ignored_check, cpu); } if (0 == eeh_subsystem_enabled) { @@ -899,13 +1087,17 @@ static int proc_eeh_show(struct seq_file seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs); } else { seq_printf(m, "EEH Subsystem is enabled\n"); - seq_printf(m, "eeh_total_mmio_ffs=%ld\n" + seq_printf(m, + "no device=%ld\n" + "no device node=%ld\n" + "no config address=%ld\n" + "check not 
wanted=%ld\n" + "eeh_total_mmio_ffs=%ld\n" "eeh_false_positives=%ld\n" "eeh_ignored_failures=%ld\n" - "eeh_slot_resets=%ld\n" - "eeh_fail_count=%d\n", - ffs, positives, failures, resets, - eeh_fail_count.counter); + "eeh_slot_resets=%ld\n", + no_dev, no_dn, no_cfg, no_check, + ffs, positives, failures, resets); } return 0; ===== arch/ppc64/kernel/pSeries_pci.c 1.59 vs edited ===== --- 1.59/arch/ppc64/kernel/pSeries_pci.c 2004-11-15 21:29:10 -06:00 +++ edited/arch/ppc64/kernel/pSeries_pci.c 2005-01-20 17:25:37 -06:00 @@ -102,7 +102,7 @@ static int rtas_pci_read_config(struct p return PCIBIOS_DEVICE_NOT_FOUND; } -static int rtas_write_config(struct device_node *dn, int where, int size, u32 val) +int rtas_write_config(struct device_node *dn, int where, int size, u32 val) { unsigned long buid, addr; int ret; ===== include/asm-ppc64/eeh.h 1.23 vs edited ===== --- 1.23/include/asm-ppc64/eeh.h 2004-10-25 18:17:38 -05:00 +++ edited/include/asm-ppc64/eeh.h 2005-02-01 13:24:13 -06:00 @@ -22,8 +22,8 @@ #include #include -#include #include +#include struct pci_dev; struct device_node; @@ -33,6 +33,10 @@ struct device_node; #define EEH_MODE_NOCHECK (1<<1) #define EEH_MODE_ISOLATED (1<<2) +/* Max number of EEH freezes allowed before we consider the device + * to be permanently disabled. */ +#define EEH_MAX_ALLOWED_FREEZES 5 + #ifdef CONFIG_PPC_PSERIES extern void __init eeh_init(void); unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val); @@ -57,6 +61,34 @@ void eeh_add_device_early(struct device_ void eeh_add_device_late(struct pci_dev *); /** + * eeh_slot_error_detail -- record and EEH error condition to the log + * @severity: 1 if temporary, 2 if permanent failure. + * + * Obtains the the EEH error details from the RTAS subsystem, + * and then logs these details with the RTAS error log system. 
+ */ +void eeh_slot_error_detail (struct device_node *dn, int severity); + +/** + * rtas_set_slot_reset -- unfreeze a frozen slot + * + * Clear the EEH-frozen condition on a slot. This routine + * does this by asserting the PCI #RST line for 1/8th of + * a second; this routine will sleep while the adapter is + * being reset. + */ +void rtas_set_slot_reset (struct device_node *dn); + +/** + * rtas_configure_bridge -- firmware initialization of pci bridge + * + * Ask the firmware to configure any PCI bridge devices + * located behind the indicated node. Required after a + * pci device reset. + */ +void rtas_configure_bridge(struct device_node *dn); + +/** * eeh_remove_device - undo EEH setup for the indicated pci device * @dev: pci device to be removed * @@ -91,6 +123,10 @@ struct eeh_event { /** Register to find out about EEH events. */ int eeh_register_notifier(struct notifier_block *nb); int eeh_unregister_notifier(struct notifier_block *nb); + +/** Restore device configuration info across device resets. + */ +void eeh_restore_bars(struct device_node *); /** * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure. ===== include/asm-ppc64/prom.h 1.24 vs edited ===== --- 1.24/include/asm-ppc64/prom.h 2004-11-25 00:42:42 -06:00 +++ edited/include/asm-ppc64/prom.h 2005-01-31 18:01:01 -06:00 @@ -164,8 +164,12 @@ struct device_node { int status; /* Current device status (non-zero is bad) */ int eeh_mode; /* See eeh.h for possible EEH_MODEs */ int eeh_config_addr; + int eeh_check_count; /* number of times device driver ignored error */ + int eeh_freeze_count; /* number of times this device froze up. 
*/ + int eeh_is_bridge; /* device is pci-to-pci bridge */ struct pci_controller *phb; /* for pci devices */ struct iommu_table *iommu_table; /* for phb's or bridges */ + u32 config_space[16]; /* saved PCI config space */ struct property *properties; struct device_node *parent; ===== include/asm-ppc64/rtas.h 1.25 vs edited ===== --- 1.25/include/asm-ppc64/rtas.h 2004-11-25 00:42:42 -06:00 +++ edited/include/asm-ppc64/rtas.h 2005-01-20 17:25:37 -06:00 @@ -241,4 +241,6 @@ extern void rtas_stop_self(void); /* RMO buffer reserved for user-space RTAS use */ extern unsigned long rtas_rmo_buf; +extern int rtas_write_config(struct device_node *dn, int where, int size, u32 val); + #endif /* _PPC64_RTAS_H */ -------------- next part -------------- ===== drivers/pci/hotplug/rpaphp.h 1.11 vs edited ===== --- 1.11/drivers/pci/hotplug/rpaphp.h 2004-10-06 11:43:44 -05:00 +++ edited/drivers/pci/hotplug/rpaphp.h 2005-01-20 17:25:37 -06:00 @@ -125,7 +125,8 @@ extern int rpaphp_enable_pci_slot(struct extern int register_pci_slot(struct slot *slot); extern int rpaphp_unconfig_pci_adapter(struct slot *slot); extern int rpaphp_get_pci_adapter_status(struct slot *slot, int is_init, u8 * value); -extern struct hotplug_slot *rpaphp_find_hotplug_slot(struct pci_dev *dev); +extern void init_eeh_handler (void); +extern void exit_eeh_handler (void); /* rpaphp_core.c */ extern int rpaphp_add_slot(struct device_node *dn); ===== drivers/pci/hotplug/rpaphp_core.c 1.18 vs edited ===== --- 1.18/drivers/pci/hotplug/rpaphp_core.c 2004-10-06 11:43:44 -05:00 +++ edited/drivers/pci/hotplug/rpaphp_core.c 2005-01-20 17:25:37 -06:00 @@ -443,12 +443,18 @@ static int __init rpaphp_init(void) { info(DRIVER_DESC " version: " DRIVER_VERSION "\n"); + /* Get set to handle EEH events. */ + init_eeh_handler(); + /* read all the PRA info from the system */ return init_rpa(); } static void __exit rpaphp_exit(void) { + /* Let EEH know we are going away. 
+ exit_eeh_handler(); + cleanup_slots(); } ===== drivers/pci/hotplug/rpaphp_pci.c 1.17 vs edited ===== --- 1.17/drivers/pci/hotplug/rpaphp_pci.c 2004-11-18 02:36:18 -06:00 +++ edited/drivers/pci/hotplug/rpaphp_pci.c 2005-02-01 18:57:35 -06:00 @@ -22,8 +22,12 @@ * Send feedback to * */ +#include +#include #include +#include #include +#include #include #include "../pci.h" /* for pci_add_new_bus */ @@ -62,6 +66,7 @@ int rpaphp_claim_resource(struct pci_dev root ? "Address space collision on" : "No parent found for", resource, dtype, pci_name(dev), res->start, res->end); + dump_stack(); } return err; } @@ -184,6 +189,19 @@ rpaphp_fixup_new_pci_devices(struct pci_ static int rpaphp_pci_config_bridge(struct pci_dev *dev); +static void rpaphp_eeh_add_bus_device(struct pci_bus *bus) +{ + struct pci_dev *dev; + list_for_each_entry(dev, &bus->devices, bus_list) { + eeh_add_device_late(dev); + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { + struct pci_bus *subbus = dev->subordinate; + if (subbus) + rpaphp_eeh_add_bus_device (subbus); + } + } +} + /***************************************************************************** rpaphp_pci_config_slot() will configure all devices under the given slot->dn and return the the first pci_dev.
@@ -211,6 +229,8 @@ rpaphp_pci_config_slot(struct device_nod } if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) rpaphp_pci_config_bridge(dev); + + rpaphp_eeh_add_bus_device(bus); } return dev; } @@ -219,7 +239,6 @@ static int rpaphp_pci_config_bridge(stru { u8 sec_busno; struct pci_bus *child_bus; - struct pci_dev *child_dev; dbg("Enter %s: BRIDGE dev=%s\n", __FUNCTION__, pci_name(dev)); @@ -236,11 +255,7 @@ static int rpaphp_pci_config_bridge(stru /* do pci_scan_child_bus */ pci_scan_child_bus(child_bus); - list_for_each_entry(child_dev, &child_bus->devices, bus_list) { - eeh_add_device_late(child_dev); - } - - /* fixup new pci devices without touching bus struct */ + /* Fixup new pci devices without touching bus struct */ rpaphp_fixup_new_pci_devices(child_bus, 0); /* Make the discovered devices available */ @@ -278,7 +293,7 @@ static void print_slot_pci_funcs(struct return; } #else -static void print_slot_pci_funcs(struct slot *slot) +static inline void print_slot_pci_funcs(struct slot *slot) { return; } @@ -360,7 +375,6 @@ static void rpaphp_eeh_remove_bus_device if (pdev) rpaphp_eeh_remove_bus_device(pdev); } - } return; } @@ -562,36 +576,181 @@ exit: return retval; } -struct hotplug_slot *rpaphp_find_hotplug_slot(struct pci_dev *dev) +/** + * rpaphp_search_bus_for_dev - return 1 if device is under this bus, else 0 + * @bus: the bus to search for this device. + * @dev: the pci device we are looking for. 
+ */ +static int rpaphp_search_bus_for_dev (struct pci_bus *bus, struct pci_dev *dev) { - struct list_head *tmp, *n; - struct slot *slot; + struct list_head *ln; + + if (!bus) return 0; + + for (ln = bus->devices.next; ln != &bus->devices; ln = ln->next) { + struct pci_dev *pdev = pci_dev_b(ln); + if (pdev == dev) + return 1; + if (pdev->subordinate) { + int rc; + rc = rpaphp_search_bus_for_dev (pdev->subordinate, dev); + if (rc) + return 1; + } + } + return 0; +} + +/** + * rpaphp_find_slot - find and return the slot holding the device + * @dev: pci device for which we want the slot structure. + */ +static struct slot *rpaphp_find_slot(struct pci_dev *dev) +{ + struct list_head *tmp, *n; + struct slot *slot; list_for_each_safe(tmp, n, &rpaphp_slot_head) { struct pci_bus *bus; - struct list_head *ln; slot = list_entry(tmp, struct slot, rpaphp_slot_list); - if (slot->bridge == NULL) { - if (slot->dev_type == PCI_DEV) { - printk(KERN_WARNING "PCI slot missing bridge %s %s \n", - slot->name, slot->location); - } + + /* PHB's don't have bridges. */ + if (slot->bridge == NULL) continue; - } + + /* The PCI device could be the slot itself. */ + if (slot->bridge == dev) + return slot; bus = slot->bridge->subordinate; if (!bus) { + printk (KERN_WARNING "PCI bridge is missing bus: %s %s\n", + pci_name (slot->bridge), pci_pretty_name (slot->bridge)); continue; /* should never happen? */ } - for (ln = bus->devices.next; ln != &bus->devices; ln = ln->next) { - struct pci_dev *pdev = pci_dev_b(ln); - if (pdev == dev) - return slot->hotplug_slot; - } - } + if (rpaphp_search_bus_for_dev (bus, dev)) + return slot; + } return NULL; } -EXPORT_SYMBOL_GPL(rpaphp_find_hotplug_slot); +/* ------------------------------------------------------- */ +/** + * handle_eeh_events -- reset a PCI device after hard lockup. 
+ * + * pSeries systems will isolate a PCI slot if the PCI-Host + * bridge detects address or data parity errors, DMA's + * occuring to wild addresses (which usually happen due to + * bugs in device drivers or in PCI adapter firmware). + * Slot isolations also occur if #SERR, #PERR or other misc + * PCI-related errors are detected. + * + * Recovery process consists of unplugging the device driver + * (which generated hotplug events to userspace), then issuing + * a PCI #RST to the device, then reconfiguring the PCI config + * space for all bridges & devices under this slot, and then + * finally restarting the device drivers (which cause a second + * set of hotplug events to go out to userspace). + */ +int eeh_reset_device (struct pci_dev *dev, int reconfig) +{ + int freeze_count=0; + struct slot *frozen_slot; + + if (!dev) + return 1; + + frozen_slot = rpaphp_find_slot(dev); + if (!frozen_slot) + { + printk (KERN_ERR "EEH: Cannot find PCI slot for %s %s\n", + pci_name(dev), pci_pretty_name (dev)); + return 1; + } + + if (frozen_slot->dn->child) + freeze_count = frozen_slot->dn->child->eeh_freeze_count; + + if (reconfig) rpaphp_unconfig_pci_adapter (frozen_slot); + + freeze_count ++; + if (freeze_count > EEH_MAX_ALLOWED_FREEZES) { + /* + * About 90% of all real-life EEH failures in the field + * are due to poorly seated PCI cards. Only 10% or so are + * due to actual, failed cards + */ + printk (KERN_ERR + "EEH: device %s:%s has failed %d times \n" + "and has been permanently disabled. Please try reseating\n" + "this device or replacing it.\n", + pci_name (dev), + pci_pretty_name (dev), + freeze_count); + return 1; + } + printk (KERN_WARNING + "EEH: This device has failed %d times since last reboot: %s:%s\n", + freeze_count, + pci_name (dev), + pci_pretty_name (dev)); + + /* Reset the pci controller. (Asserts RST#; resets config space). 
+ * Reconfigure bridges and devices */ + rtas_set_slot_reset (frozen_slot->dn->child); + + rtas_configure_bridge(frozen_slot->dn); + eeh_restore_bars(frozen_slot->dn->child); + + /* Give the system 5 seconds to finish running the user-space + * hotplug scripts, e.g. ifdown for ethernet. Yes, this is a hack, + * but if we don't do this, weird things happen. + */ + if (reconfig) { + ssleep (5); + rpaphp_enable_pci_slot (frozen_slot); + } + + /* Store the freeze count with the pci adapter, and not the slot. + * This way, if the device is replaced, the count is cleared. + */ + if (frozen_slot->dn->child) + frozen_slot->dn->child->eeh_freeze_count = freeze_count; + + return 0; +} + +int handle_eeh_events (struct notifier_block *self, + unsigned long reason, void *ev) +{ + struct eeh_event *event = ev; + + if (!event->dev) + { + if (event->dn) + printk ("EEH: Cannot find the PCI device for dn %s\n", + event->dn->full_name); + else + printk ("EEH: EEH error caught, but no PCI device specified!\n"); + return 1; + } + +if(!strncmp (pci_pretty_name (event->dev), "Mylex Corporation Gemstone", 25)) return 0; + return eeh_reset_device (event->dev, 1); +} + +static struct notifier_block eeh_block; + +void __init init_eeh_handler (void) +{ + eeh_block.notifier_call = handle_eeh_events; + eeh_register_notifier (&eeh_block); +} + +void __exit exit_eeh_handler (void) +{ + eeh_unregister_notifier (&eeh_block); +} + From benh at kernel.crashing.org Wed Feb 2 15:17:48 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 02 Feb 2005 15:17:48 +1100 Subject: [PATCH] PPC64: draft version of EEH code. 
In-Reply-To: <20050202011511.GB9140@austin.ibm.com> References: <20050202011511.GB9140@austin.ibm.com> Message-ID: <1107317868.1665.68.camel@gaston> Ok, built your stuff, applied, set up a partition on an SF2 with an ethernet e100 card in a hotplug slot, got the errinj tools (0.8.5), enabled error injection in the FSP, rebooted the partition, and then did: linux:~ # /opt/ibmras/errinjct eeh -f 0 -p U7311.D20.10488BA-P1-C02-T1 Injecting an ioa-bus-error... cpu 0x0: Vector: 700 (Program Check) at [c0000000332c38b0] pc: 00000000077d9374 lr: 000000000000dafc sp: c0000000332c3b30 msr: 81002 current = 0xc000000034b5b030 paca = 0xc000000000573000 pid = 11206, comm = errinjct cpu 0x0: Vector: 700 (Program Check) at [c0000000332c2e00] pc: 00000000077d9374 lr: 000000000000dafc sp: c0000000332c3080 msr: 81002 current = 0xc000000034b5b030 paca = 0xc000000000573000 pid = 11206, comm = errinjct cpu 0x0: Exception 700 (Program Check) in xmon, returning to main loop xmon: WARNING: bad recursive fault on cpu 0x0 Looks like it died in RTAS or something weird like that ... Ben. From benh at kernel.crashing.org Wed Feb 2 17:11:14 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 02 Feb 2005 17:11:14 +1100 Subject: [PATCH] PPC64: draft version of EEH code. In-Reply-To: <1107317868.1665.68.camel@gaston> References: <20050202011511.GB9140@austin.ibm.com> <1107317868.1665.68.camel@gaston> Message-ID: <1107324674.5625.79.camel@gaston> On Wed, 2005-02-02 at 15:17 +1100, Benjamin Herrenschmidt wrote: > Ok, built your stuff, applied, set up a partition on an SF2 with an > ethernet e100 card in a hotplug slot, got the errinj tools (0.8.5), > enabled error injection in the FSP, rebooted the partition, and then > did: > > .../... Ok, it looks like rtas gets destroyed in memory, apparently by the injection tool itself, which is mmap'ing crap. 
Unfortunately, the .tar.gz that comes with it on the RAS page doesn't contain what it advertises (instead, it contains the source for the rtas error daemon), so I can't debug the tool. Linas, can you fix the source package (and send me the source for errinjct) ? Ben. From arnd at arndb.de Wed Feb 2 21:05:36 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Wed, 2 Feb 2005 11:05:36 +0100 Subject: pci: Arch hook to determine config space size In-Reply-To: <1107233864.5963.65.camel@gaston> References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <41FF0B0D.8020003@us.ibm.com> <1107233864.5963.65.camel@gaston> Message-ID: <200502021105.42249.arnd@arndb.de> On Tuesday 01 February 2005 05:57, Benjamin Herrenschmidt wrote: > BTW. I'm thinking about moving all those PCI/VIO related fields out of > struct device_node to their own structure and keep only a pointer to > that structure in device_node. That way, we avoid the bloat for every > single non-pci node in the system, and we can have different structures > for different bus types (along with proper iommu function pointers and > that sort-of-thing). How about something along the lines of this patch? Instead of adding a pointer to the pci data from the device node, it embeds the node inside a new struct pci_device_node. The patch is not complete and therefore not expected to work as is, but maybe you want to reuse it. The interesting part that is missing is creating and destroying pci_device_nodes in prom.c, maybe you have an idea how to do that. I'm also not sure about eeh. Are the eeh functions known to be called only for device_nodes of PCI devices? If not, eeh_mode and eeh_config_addr might have to stay inside of device_node. 
Arnd <>< Signed-off-by: Arnd Bergmann Index: linux-2.6-64/arch/ppc64/kernel/pci.h =================================================================== --- linux-2.6-64.orig/arch/ppc64/kernel/pci.h 2005-01-24 23:46:37.000000000 +0100 +++ linux-2.6-64/arch/ppc64/kernel/pci.h 2005-02-02 00:11:01.485740824 +0100 @@ -29,8 +29,8 @@ /* PCI device_node operations */ struct device_node; -typedef void *(*traverse_func)(struct device_node *me, void *data); -void *traverse_pci_devices(struct device_node *start, traverse_func pre, +typedef void *(*traverse_func)(struct pci_device_node *me, void *data); +void *traverse_pci_devices(struct pci_device_node *start, traverse_func pre, void *data); void pci_devs_phb_init(void); Index: linux-2.6-64/arch/ppc64/kernel/pSeries_iommu.c =================================================================== --- linux-2.6-64.orig/arch/ppc64/kernel/pSeries_iommu.c 2005-02-01 22:53:00.673332472 +0100 +++ linux-2.6-64/arch/ppc64/kernel/pSeries_iommu.c 2005-02-02 00:11:01.486740672 +0100 @@ -48,7 +48,7 @@ #define DBG(fmt...) -extern int is_python(struct device_node *); +extern int is_python(struct pci_device_node *); static void tce_build_pSeries(struct iommu_table *tbl, long index, long npages, unsigned long uaddr, @@ -289,7 +289,7 @@ * Currently we hard code these sizes (more or less). 
*/ static void iommu_table_setparms_lpar(struct pci_controller *phb, - struct device_node *dn, + struct pci_device_node *dn, struct iommu_table *tbl, unsigned int *dma_window) { @@ -308,7 +308,7 @@ static void iommu_bus_setup_pSeries(struct pci_bus *bus) { - struct device_node *dn, *pdn; + struct pci_device_node *dn, *pdn; DBG("iommu_bus_setup_pSeries, bus %p, bus->self %p\n", bus, bus->self); @@ -331,7 +331,7 @@ DBG("Python root bus %s\n", bus->name); - iohole = (unsigned int *)get_property(dn, "io-hole", 0); + iohole = (unsigned int *)get_property(&dn->node, "io-hole", 0); if (iohole) { /* On first bus we need to leave room for the @@ -349,7 +349,7 @@ tbl = kmalloc(sizeof(struct iommu_table), GFP_KERNEL); - iommu_table_setparms(dn->phb, dn, tbl); + iommu_table_setparms(dn->phb, &dn->node, tbl); dn->iommu_table = iommu_init_table(tbl); } else { /* 256 MB window by default */ @@ -368,7 +368,7 @@ tbl = kmalloc(sizeof(struct iommu_table), GFP_KERNEL); - iommu_table_setparms(dn->phb, dn, tbl); + iommu_table_setparms(dn->phb, &dn->node, tbl); dn->iommu_table = iommu_init_table(tbl); } else { @@ -382,17 +382,19 @@ static void iommu_bus_setup_pSeriesLP(struct pci_bus *bus) { struct iommu_table *tbl; - struct device_node *dn, *pdn; + struct pci_device_node *dn, *pdn; + struct device_node *n; unsigned int *dma_window = NULL; dn = pci_bus_to_OF_node(bus); /* Find nearest ibm,dma-window, walking up the device tree */ - for (pdn = dn; pdn != NULL; pdn = pdn->parent) { - dma_window = (unsigned int *)get_property(pdn, "ibm,dma-window", NULL); + for (n = &dn->node; n; n = n->parent) { + dma_window = (unsigned int *)get_property(n, "ibm,dma-window", NULL); if (dma_window != NULL) break; } + pdn = to_pci_node(n); if (dma_window == NULL) { DBG("iommu_bus_setup_pSeriesLP: bus %s seems to have no ibm,dma-window property\n", dn->full_name); @@ -420,7 +422,7 @@ static void iommu_dev_setup_pSeries(struct pci_dev *dev) { - struct device_node *dn, *mydn; + struct pci_device_node *dn, 
*mydn; DBG("iommu_dev_setup_pSeries, dev %p (%s)\n", dev, dev->pretty_name); /* Now copy the iommu_table ptr from the bus device down to the @@ -430,7 +432,7 @@ mydn = dn = pci_device_to_OF_node(dev); while (dn && dn->iommu_table == NULL) - dn = dn->parent; + dn = to_pci_node(dn->node.parent); if (dn) { mydn->iommu_table = dn->iommu_table; Index: linux-2.6-64/include/asm-ppc64/pci-bridge.h =================================================================== --- linux-2.6-64.orig/include/asm-ppc64/pci-bridge.h 2005-01-24 23:47:11.000000000 +0100 +++ linux-2.6-64/include/asm-ppc64/pci-bridge.h 2005-02-02 00:11:01.486740672 +0100 @@ -53,22 +53,23 @@ /* Get a device_node from a pci_dev. This code must be fast except in the case * where the sysdata is incorrect and needs to be fixed up (hopefully just once) */ -static inline struct device_node *pci_device_to_OF_node(struct pci_dev *dev) +static inline struct pci_device_node *pci_device_to_OF_node(struct pci_dev *dev) { struct device_node *dn = dev->sysdata; + struct pci_device_node *pdn = to_pci_node(dn); - if (dn->devfn == dev->devfn && dn->busno == dev->bus->number) - return dn; /* fast path. sysdata is good */ + if (pdn->devfn == dev->devfn && pdn->busno == dev->bus->number) + return pdn; /* fast path. 
sysdata is good */ else - return fetch_dev_dn(dev); + return to_pci_node(fetch_dev_dn(dev)); } -static inline struct device_node *pci_bus_to_OF_node(struct pci_bus *bus) +static inline struct pci_device_node *pci_bus_to_OF_node(struct pci_bus *bus) { if (bus->self) return pci_device_to_OF_node(bus->self); else - return bus->sysdata; /* Must be root bus (PHB) */ + return to_pci_node(bus->sysdata); /* Must be root bus (PHB) */ } extern void pci_process_bridge_OF_ranges(struct pci_controller *hose, @@ -83,7 +84,7 @@ struct device_node *busdn = bus->sysdata; BUG_ON(busdn == NULL); - return busdn->phb; + return to_pci_node(busdn)->phb; } #endif Index: linux-2.6-64/include/asm-ppc64/eeh.h =================================================================== --- linux-2.6-64.orig/include/asm-ppc64/eeh.h 2005-02-01 22:32:12.361104776 +0100 +++ linux-2.6-64/include/asm-ppc64/eeh.h 2005-02-02 00:11:01.487740520 +0100 @@ -27,7 +27,7 @@ struct pci_dev; struct device_node; -struct device_node; +struct pci_device_node; struct notifier_block; #ifdef CONFIG_EEH @@ -40,7 +40,7 @@ void __init eeh_init(void); unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val); -int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev); +int eeh_dn_check_failure(struct pci_device_node *dn, struct pci_dev *dev); void __init pci_addr_cache_build(void); /** @@ -52,7 +52,7 @@ * device (including config space i/o). Call eeh_add_device_late * to finish the eeh setup for this device. 
*/ -void eeh_add_device_early(struct device_node *); +void eeh_add_device_early(struct pci_device_node *); void eeh_add_device_late(struct pci_dev *); /** Index: linux-2.6-64/arch/ppc64/kernel/sys_ppc32.c =================================================================== --- linux-2.6-64.orig/arch/ppc64/kernel/sys_ppc32.c 2005-02-01 22:53:00.673332472 +0100 +++ linux-2.6-64/arch/ppc64/kernel/sys_ppc32.c 2005-02-02 00:11:01.487740520 +0100 @@ -744,7 +744,7 @@ struct pci_controller* hose; struct list_head *ln; struct pci_bus *bus = NULL; - struct device_node *hose_node; + struct pci_device_node *hose_node; /* Argh ! Please forgive me for that hack, but that's the * simplest way to get existing XFree to not lockup on some @@ -771,7 +771,7 @@ if (bus == NULL || bus->sysdata == NULL) return -ENODEV; - hose_node = (struct device_node *)bus->sysdata; + hose_node = to_pci_node(bus->sysdata); hose = hose_node->phb; switch (which) { Index: linux-2.6-64/arch/ppc64/kernel/iommu.c =================================================================== --- linux-2.6-64.orig/arch/ppc64/kernel/iommu.c 2005-01-24 23:46:37.000000000 +0100 +++ linux-2.6-64/arch/ppc64/kernel/iommu.c 2005-02-02 00:11:01.488740368 +0100 @@ -432,7 +432,7 @@ return tbl; } -void iommu_free_table(struct device_node *dn) +void iommu_free_table(struct pci_device_node *dn) { struct iommu_table *tbl = dn->iommu_table; unsigned long bitmap_sz, i; @@ -440,7 +440,7 @@ if (!tbl || !tbl->it_map) { printk(KERN_ERR "%s: expected TCE map for %s\n", __FUNCTION__, - dn->full_name); + dn->node.full_name); return; } @@ -449,7 +449,7 @@ for (i = 0; i < (tbl->it_size/64); i++) { if (tbl->it_map[i] != 0) { printk(KERN_WARNING "%s: Unexpected TCEs for %s\n", - __FUNCTION__, dn->full_name); + __FUNCTION__, dn->node.full_name); break; } } Index: linux-2.6-64/include/asm-ppc64/iommu.h =================================================================== --- linux-2.6-64.orig/include/asm-ppc64/iommu.h 2005-01-24 23:47:11.000000000 
+0100 +++ linux-2.6-64/include/asm-ppc64/iommu.h 2005-02-02 00:11:01.488740368 +0100 @@ -101,6 +101,7 @@ #endif /* CONFIG_PPC_ISERIES */ struct scatterlist; +struct pci_device_node; #ifdef CONFIG_PPC_MULTIPLATFORM @@ -109,14 +110,14 @@ extern void iommu_setup_u3(void); /* Frees table for an individual device node */ -extern void iommu_free_table(struct device_node *dn); +extern void iommu_free_table(struct pci_device_node *dn); #endif /* CONFIG_PPC_MULTIPLATFORM */ #ifdef CONFIG_PPC_PSERIES /* Creates table for an individual device node */ -extern void iommu_devnode_init_pSeries(struct device_node *dn); +extern void iommu_devnode_init_pSeries(struct pci_device_node *dn); #endif /* CONFIG_PPC_PSERIES */ Index: linux-2.6-64/arch/ppc64/kernel/pci.c =================================================================== --- linux-2.6-64.orig/arch/ppc64/kernel/pci.c 2005-01-24 23:46:36.000000000 +0100 +++ linux-2.6-64/arch/ppc64/kernel/pci.c 2005-02-02 00:11:01.489740216 +0100 @@ -447,13 +447,13 @@ static ssize_t pci_show_devspec(struct device *dev, char *buf) { struct pci_dev *pdev; - struct device_node *np; + struct pci_device_node *np; pdev = to_pci_dev (dev); np = pci_device_to_OF_node(pdev); - if (np == NULL || np->full_name == NULL) + if (np == NULL || np->node.full_name == NULL) return 0; - return sprintf(buf, "%s", np->full_name); + return sprintf(buf, "%s", np->node.full_name); } static DEVICE_ATTR(devspec, S_IRUGO, pci_show_devspec, NULL); #endif /* CONFIG_PPC_MULTIPLATFORM */ @@ -734,7 +734,8 @@ */ int pcibios_scan_all_fns(struct pci_bus *bus, int devfn) { - struct device_node *busdn, *dn; + struct pci_device_node *busdn; + struct device_node *dn; if (bus->self) busdn = pci_device_to_OF_node(bus->self); @@ -749,8 +750,8 @@ * device tree. If they are then we need to scan all the * functions of this slot. 
*/ - for (dn = busdn->child; dn; dn = dn->sibling) - if ((dn->devfn >> 3) == (devfn >> 3)) + for (dn = busdn->node.child; dn; dn = dn->sibling) + if ((to_pci_node(dn)->devfn >> 3) == (devfn >> 3)) return 1; return 0; @@ -851,7 +852,7 @@ int pci_read_irq_line(struct pci_dev *pci_dev) { u8 intpin; - struct device_node *node; + struct pci_device_node *node; pci_read_config_byte(pci_dev, PCI_INTERRUPT_PIN, &intpin); if (intpin == 0) @@ -861,10 +862,10 @@ if (node == NULL) return -1; - if (node->n_intrs == 0) + if (node->node.n_intrs == 0) return -1; - pci_dev->irq = node->intrs[0].line; + pci_dev->irq = node->node.intrs[0].line; pci_write_config_byte(pci_dev, PCI_INTERRUPT_LINE, pci_dev->irq); Index: linux-2.6-64/include/asm-ppc64/prom.h =================================================================== --- linux-2.6-64.orig/include/asm-ppc64/prom.h 2005-02-01 22:32:12.365104168 +0100 +++ linux-2.6-64/include/asm-ppc64/prom.h 2005-02-02 00:11:01.490740064 +0100 @@ -131,15 +131,6 @@ struct interrupt_info *intrs; char *full_name; - /* PCI stuff probably doesn't belong here */ - int busno; /* for pci devices */ - int bussubno; /* for pci devices */ - int devfn; /* for pci devices */ - int eeh_mode; /* See eeh.h for possible EEH_MODEs */ - int eeh_config_addr; - struct pci_controller *phb; /* for pci devices */ - struct iommu_table *iommu_table; /* for phb's or bridges */ - struct property *properties; struct device_node *parent; struct device_node *child; @@ -153,6 +144,22 @@ unsigned long _flags; }; +struct pci_device_node { + int busno; + int bussubno; + int devfn; + int eeh_mode; /* See eeh.h for possible EEH_MODEs */ + int eeh_config_addr; + struct pci_controller *phb; + struct iommu_table *iommu_table; /* for phb's or bridges */ + struct device_node node; +}; + +static inline struct pci_device_node *to_pci_node(struct device_node *n) +{ + return n ? 
container_of(n, struct pci_device_node, node) : NULL; +} + extern struct device_node *of_chosen; /* flag descriptions */ Index: linux-2.6-64/arch/ppc64/kernel/pci_iommu.c =================================================================== --- linux-2.6-64.orig/arch/ppc64/kernel/pci_iommu.c 2005-01-24 23:46:37.000000000 +0100 +++ linux-2.6-64/arch/ppc64/kernel/pci_iommu.c 2005-02-02 00:11:01.490740064 +0100 @@ -48,7 +48,7 @@ * pci_device_to_OF_node since ->sysdata will have been initialised * in the iommu init code for all devices. */ -#define PCI_GET_DN(dev) ((struct device_node *)((dev)->sysdata)) +#define PCI_GET_DN(dev) (to_pci_node(((dev)->sysdata))) static inline struct iommu_table *devnode_table(struct pci_dev *dev) { Index: linux-2.6-64/arch/ppc64/kernel/pci_dn.c =================================================================== --- linux-2.6-64.orig/arch/ppc64/kernel/pci_dn.c 2005-01-24 23:46:37.000000000 +0100 +++ linux-2.6-64/arch/ppc64/kernel/pci_dn.c 2005-02-02 00:11:01.490740064 +0100 @@ -34,13 +34,13 @@ * Traverse_func that inits the PCI fields of the device node. * NOTE: this *must* be done before read/write config to the device. */ -static void * __devinit update_dn_pci_info(struct device_node *dn, void *data) +static void * __devinit update_dn_pci_info(struct pci_device_node *dn, void *data) { struct pci_controller *phb = data; u32 *regs; dn->phb = phb; - regs = (u32 *)get_property(dn, "reg", NULL); + regs = (u32 *)get_property(&dn->node, "reg", NULL); if (regs) { /* First register entry is addr (00BBSS00) */ dn->busno = (regs[0] >> 16) & 0xff; @@ -67,21 +67,21 @@ * one of these nodes we also assume its siblings are non-pci for * performance. 
*/ -void *traverse_pci_devices(struct device_node *start, traverse_func pre, +void *traverse_pci_devices(struct pci_device_node *start, traverse_func pre, void *data) { struct device_node *dn, *nextdn; void *ret; /* We started with a phb, iterate all childs */ - for (dn = start->child; dn; dn = nextdn) { + for (dn = start->node.child; dn; dn = nextdn) { u32 *classp, class; nextdn = NULL; classp = (u32 *)get_property(dn, "class-code", NULL); class = classp ? *classp : 0; - if (pre && ((ret = pre(dn, data)) != NULL)) + if (pre && ((ret = pre(to_pci_node(dn), data)) != NULL)) return ret; /* If we are a PCI bridge, go down */ @@ -96,7 +96,7 @@ /* Walk up to next valid sibling. */ do { dn = dn->parent; - if (dn == start) + if (dn == &start->node) return NULL; } while (dn->sibling == NULL); nextdn = dn->sibling; @@ -107,7 +107,7 @@ void __devinit pci_devs_phb_init_dynamic(struct pci_controller *phb) { - struct device_node * dn = (struct device_node *) phb->arch_data; + struct pci_device_node *dn = to_pci_node(phb->arch_data); /* PHB nodes themselves must not match */ dn->devfn = dn->busno = -1; @@ -121,7 +121,7 @@ * Traversal func that looks for a value. * If found, the device_node is returned (thus terminating the traversal). 
*/ -static void *is_devfn_node(struct device_node *dn, void *data) +static void *is_devfn_node(struct pci_device_node *dn, void *data) { int busno = ((unsigned long)data >> 8) & 0xff; int devfn = ((unsigned long)data) & 0xff; @@ -144,13 +144,13 @@ */ struct device_node *fetch_dev_dn(struct pci_dev *dev) { - struct device_node *orig_dn = dev->sysdata; + struct pci_device_node *orig_dn = to_pci_node(dev->sysdata); struct pci_controller *phb = orig_dn->phb; /* assume same phb as orig_dn */ - struct device_node *phb_dn; + struct pci_device_node *phb_dn; struct device_node *dn; unsigned long searchval = (dev->bus->number << 8) | dev->devfn; - phb_dn = phb->arch_data; + phb_dn = to_pci_node(phb->arch_data); dn = traverse_pci_devices(phb_dn, is_devfn_node, (void *)searchval); if (dn) dev->sysdata = dn; Index: linux-2.6-64/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6-64.orig/arch/ppc64/kernel/eeh.c 2005-02-02 00:10:53.126011696 +0100 +++ linux-2.6-64/arch/ppc64/kernel/eeh.c 2005-02-02 00:11:06.039048616 +0100 @@ -254,7 +254,7 @@ static void __pci_addr_cache_insert_device(struct pci_dev *dev) { - struct device_node *dn; + struct pci_device_node *dn; int i; int inserted = 0; @@ -413,7 +413,7 @@ * @dn: device node to read * @rets: array to return results in */ -static int read_slot_reset_state(struct device_node *dn, int rets[]) +static int read_slot_reset_state(struct pci_device_node *dn, int rets[]) { int token, outputs; @@ -528,7 +528,7 @@ * * It is safe to call this routine in an interrupt context. 
*/ -int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev) +int eeh_dn_check_failure(struct pci_device_node *dn, struct pci_dev *dev) { int ret; int rets[3]; @@ -603,7 +603,7 @@ spin_unlock_irqrestore(&slot_errbuf_lock, flags); printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n", - rets[0], dn->name, dn->full_name); + rets[0], dn->node.name, dn->node.full_name); event = kmalloc(sizeof(*event), GFP_ATOMIC); if (event == NULL) { eeh_panic(dev, reset_state); @@ -611,7 +611,7 @@ } event->dev = dev; - event->dn = dn; + event->dn = &dn->node; event->reset_state = reset_state; /* We may or may not be called in an interrupt context */ @@ -647,7 +647,7 @@ { unsigned long addr; struct pci_dev *dev; - struct device_node *dn; + struct pci_device_node *dn; /* Finding the phys addr + pci device; this is pretty quick. */ addr = eeh_token_to_phys((unsigned long __force) token); @@ -670,8 +670,9 @@ }; /* Enable eeh for the given device node. */ -static void *early_enable_eeh(struct device_node *dn, void *data) +static void *early_enable_eeh(struct pci_device_node *pdn, void *data) { + struct device_node *dn = &pdn->node; struct eeh_early_enable_info *info = data; int ret; char *status = get_property(dn, "status", NULL); @@ -681,7 +682,7 @@ u32 *regs; int enable; - dn->eeh_mode = 0; + pdn->eeh_mode = 0; if (status && strcmp(status, "ok") != 0) return NULL; /* ignore devices with bad status */ @@ -692,7 +693,7 @@ /* There is nothing to check on PCI to ISA bridges */ if (dn->type && !strcmp(dn->type, "isa")) { - dn->eeh_mode |= EEH_MODE_NOCHECK; + pdn->eeh_mode |= EEH_MODE_NOCHECK; return NULL; } @@ -709,7 +710,7 @@ enable = 0; if (!enable) - dn->eeh_mode |= EEH_MODE_NOCHECK; + pdn->eeh_mode |= EEH_MODE_NOCHECK; /* Ok... see if this device supports EEH. Some do, some don't, * and the only way to find out is to check each and every one. 
*/ @@ -722,8 +723,8 @@ EEH_ENABLE); if (ret == 0) { eeh_subsystem_enabled = 1; - dn->eeh_mode |= EEH_MODE_SUPPORTED; - dn->eeh_config_addr = regs[0]; + pdn->eeh_mode |= EEH_MODE_SUPPORTED; + pdn->eeh_config_addr = regs[0]; #ifdef DEBUG printk(KERN_DEBUG "EEH: %s: eeh enabled\n", dn->full_name); #endif @@ -731,10 +732,12 @@ /* This device doesn't support EEH, but it may have an * EEH parent, in which case we mark it as supported. */ - if (dn->parent && (dn->parent->eeh_mode & EEH_MODE_SUPPORTED)) { + if (dn->parent && (to_pci_node(dn->parent)->eeh_mode + & EEH_MODE_SUPPORTED)) { /* Parent supports EEH. */ - dn->eeh_mode |= EEH_MODE_SUPPORTED; - dn->eeh_config_addr = dn->parent->eeh_config_addr; + pdn->eeh_mode |= EEH_MODE_SUPPORTED; + pdn->eeh_config_addr = + to_pci_node(dn->parent)->eeh_config_addr; return NULL; } } @@ -798,7 +801,7 @@ info.buid_lo = BUID_LO(buid); info.buid_hi = BUID_HI(buid); - traverse_pci_devices(phb, early_enable_eeh, &info); + traverse_pci_devices(to_pci_node(phb), early_enable_eeh, &info); } if (eeh_subsystem_enabled) @@ -819,7 +822,7 @@ * on the CEC architecture, type of the device, on earlier boot * command-line arguments & etc. 
*/ -void eeh_add_device_early(struct device_node *dn) +void eeh_add_device_early(struct pci_device_node *dn) { struct pci_controller *phb; struct eeh_early_enable_info info; Index: linux-2.6-64/arch/ppc64/kernel/prom.c =================================================================== --- linux-2.6-64.orig/arch/ppc64/kernel/prom.c 2005-02-02 00:10:53.129011240 +0100 +++ linux-2.6-64/arch/ppc64/kernel/prom.c 2005-02-02 00:11:06.041048312 +0100 @@ -1802,8 +1802,11 @@ */ static void of_cleanup_node(struct device_node *np) { - if (np->iommu_table && get_property(np, "ibm,dma-window", NULL)) - iommu_free_table(np); + if (get_property(np, "ibm,dma-window", NULL)) { + struct pci_device_node *p = to_pci_node(np); + if (p->iommu_table) + iommu_free_table(p); + } } /* Index: linux-2.6-64/arch/ppc64/kernel/pSeries_pci.c =================================================================== --- linux-2.6-64.orig/arch/ppc64/kernel/pSeries_pci.c 2005-02-02 00:10:53.127011544 +0100 +++ linux-2.6-64/arch/ppc64/kernel/pSeries_pci.c 2005-02-02 00:11:06.040048464 +0100 @@ -52,14 +52,12 @@ extern struct mpic *pSeries_mpic; -static int rtas_read_config(struct device_node *dn, int where, int size, u32 *val) +static int rtas_read_config(struct pci_device_node *dn, int where, int size, u32 *val) { int returnval = -1; unsigned long buid, addr; int ret; - if (!dn) - return PCIBIOS_DEVICE_NOT_FOUND; if (where & (size - 1)) return PCIBIOS_BAD_REGISTER_NUMBER; @@ -87,7 +85,8 @@ unsigned int devfn, int where, int size, u32 *val) { - struct device_node *busdn, *dn; + struct pci_device_node *busdn; + struct device_node *dn; if (bus->self) busdn = pci_device_to_OF_node(bus->self); @@ -95,13 +94,15 @@ busdn = bus->sysdata; /* must be a phb */ /* Search only direct children of the bus */ - for (dn = busdn->child; dn; dn = dn->sibling) - if (dn->devfn == devfn) - return rtas_read_config(dn, where, size, val); + for (dn = busdn->node.child; dn; dn = dn->sibling) { + struct pci_device_node *pdn = 
to_pci_node(dn); + if (pdn->devfn == devfn) + return rtas_read_config(pdn, where, size, val); + } return PCIBIOS_DEVICE_NOT_FOUND; } -static int rtas_write_config(struct device_node *dn, int where, int size, u32 val) +static int rtas_write_config(struct pci_device_node *dn, int where, int size, u32 val) { unsigned long buid, addr; int ret; @@ -129,7 +130,8 @@ unsigned int devfn, int where, int size, u32 val) { - struct device_node *busdn, *dn; + struct pci_device_node *busdn; + struct device_node *dn; if (bus->self) busdn = pci_device_to_OF_node(bus->self); @@ -137,9 +139,11 @@ busdn = bus->sysdata; /* must be a phb */ /* Search only direct children of the bus */ - for (dn = busdn->child; dn; dn = dn->sibling) - if (dn->devfn == devfn) - return rtas_write_config(dn, where, size, val); + for (dn = busdn->node.child; dn; dn = dn->sibling) { + struct pci_device_node *pdn = to_pci_node(dn); + if (pdn->devfn == devfn) + return rtas_write_config(pdn, where, size, val); + } return PCIBIOS_DEVICE_NOT_FOUND; } -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: signature Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050202/a095da2d/attachment.pgp From sleddog at us.ibm.com Thu Feb 3 03:50:34 2005 From: sleddog at us.ibm.com (Dave C Boutcher) Date: Wed, 2 Feb 2005 10:50:34 -0600 Subject: HUPin the ttys in drivers/char/hvcs.c In-Reply-To: <16895.55751.310549.454083@kitch0.watson.ibm.com> References: <16895.55751.310549.454083@kitch0.watson.ibm.com> Message-ID: <20050202165034.GA17490@cs.umn.edu> On Tue, Feb 01, 2005 at 02:34:31PM -0500, Jimi Xenidis wrote: > > In an LPAR environment there is the hvc (client side VTERM) and the > hvcs (server side VTERM). If the /dev/hvcs is paired/registered > with a VTERM what is removed (as in the case of LPAR death) the > H_GET_TERM_CHAR hcall will eventually return H_Closed. 
> > IMHO, when this event occurs the /dev/hvcs should get HUPed and > ultimately an H_FREE_VTERM should occur on the channel. > Otherwise the administrator would have to clean up after it. I think a typical case is where the administrator leaves the console active for long periods of time (so as to have a record of things like panics.) I think a reboot of the client partition also causes H_Closed (though I could be wrong about that, and Ryan is off having a baby so I can't ask him.) I wouldn't want a client reboot to cause a HUP. -- Dave Boutcher From nfont at austin.ibm.com Thu Feb 3 04:30:05 2005 From: nfont at austin.ibm.com (Nathan Fontenot) Date: Wed, 02 Feb 2005 11:30:05 -0600 Subject: [PATCH] PPC64: draft version of EEH code. In-Reply-To: <1107324674.5625.79.camel@gaston> References: <20050202011511.GB9140@austin.ibm.com> <1107317868.1665.68.camel@gaston> <1107324674.5625.79.camel@gaston> Message-ID: <1107365406.32699.7.camel@dyn95394167.austin.ibm.com> On Wed, 2005-02-02 at 17:11 +1100, Benjamin Herrenschmidt wrote: > Unfortunately, the > .tar.gz that comes with it on the RAS page doesn't contain what it > advertises (instead, it contains the source for the rtas error daemon), > so I can't debug the tool. > > Linas, can you fix the source package (and send me the source for > errinjct) ? > > Ben. > Oops. The source tarball on the website should contain the correct source code now. -Nathan F. From linas at austin.ibm.com Thu Feb 3 06:08:57 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Wed, 2 Feb 2005 13:08:57 -0600 Subject: [PATCH] PPC64: draft version of EEH code. 
In-Reply-To: <1107317868.1665.68.camel@gaston> References: <20050202011511.GB9140@austin.ibm.com> <1107317868.1665.68.camel@gaston> Message-ID: <20050202190857.GC9140@austin.ibm.com> On Wed, Feb 02, 2005 at 03:17:48PM +1100, Benjamin Herrenschmidt was heard to remark: > > cpu 0x0: Vector: 700 (Program Check) at [c0000000332c38b0] > pc: 00000000077d9374 > lr: 000000000000dafc > sp: c0000000332c3b30 > msr: 81002 > current = 0xc000000034b5b030 > paca = 0xc000000000573000 > pid = 11206, comm = errinjct MSR shows it died in 32-bit real mode: i.e. in the firmware. I've seen this sporadically, I've assumed it was the off-kilter firmware on the box I have access to. But since you're seeing it... John Rose, the maintainer of librtas (the user space lib that does the memory mapping) tells me that even if corrupt values are passed to RTAS, the firmware should not crash. --linas From cfriesen at nortel.com Thu Feb 3 06:01:33 2005 From: cfriesen at nortel.com (Chris Friesen) Date: Wed, 02 Feb 2005 13:01:33 -0600 Subject: buggy mlock() behaviour in 2.6.9 on ppc64? In-Reply-To: <20050202181737.GA29505@austin.ibm.com> References: <420117CA.8090507@nortelnetworks.com> <20050202181737.GA29505@austin.ibm.com> Message-ID: <4201238D.4030700@nortel.com> Olof Johansson wrote: > On Wed, Feb 02, 2005 at 12:11:22PM -0600, Chris Friesen wrote: >>I've got a simple test app that tries to mmap() and mlock() an amount of >>memory specified on the commandline. >>With the ppc64 kernel, however, the system hangs, and the fans speed >>starts increasing. It seems like it doesn't realize that there is >>simply no way that it will ever be able to succeed. >> >>Is this expected behaviour for ppc64? > linuxppc64-dev at ozlabs.org is the ppc64 mailing list, you might want to > Cc: there instead for the ppc64 questions. Ah, right. I'm copying the list if anyone else wants to take a look. > Anyhow, can you send me the testcase? 
> Sounds like you're running on a
> G5, it's easier to debug on a pSeries where we have debugger console.
> Unless you have a stealth card in the machine?

Yes, I am on a G5 and I don't have a stealth card.

Here's the testcase. I've got 2gig of physical memory.

"chewtest 1 1000000" gives an error that it can't allocate memory
"chewtest 1 500000" gets killed by the oom killer
"chewtest 1000 1000" just sits there

Chris

#include
#include
#include
#include
#include
#include
#include
#include
#include

int main(int argc, char **argv)
{
	if (argc != 3) {
		printf("usage: %s \n", *argv);
		return -1;
	}

	int mappings = atoi(argv[1]);
	int ppm = atoi(argv[2]);
	int mapsize = sysconf(_SC_PAGESIZE)*ppm;
	int i;
	int rc;
	void *buf;

	for (i=0;i

References: <16895.55751.310549.454083@kitch0.watson.ibm.com>
Message-ID: <1107373851.16433.8.camel@localhost.localdomain>

On Tue, 2005-02-01 at 14:34 -0500, Jimi Xenidis wrote:
> In an LPAR environment there is the hvc (client side VTERM) and the
> hvcs (server side VTERM). If the /dev/hvcs is paired/registered
> with a VTERM what is removed (as in the case of LPAR death) the
> H_GET_TERM_CHAR hcall will eventually return H_Closed.
>
> IMHO, when this event occurs the /dev/hvcs should get HUPed and
> ultimately an H_FREE_VTERM should occurs on the channel.
> Otherwise the administrator would have to clean up after it.
>
> Thoughts?
> -JX

Hey Jimi,

Great catch, you're absolutely right. At the moment, hvcs uses
/arch/ppc64/kernel/hvconsole.c::hvc_get_chars() to gather character data
from the hypervisor. If the partner vty adapter is removed, as you say,
an H_Closed will be returned to hvc_get_chars(). This function doesn't
do anything with errors atm.

I think the best thing to do would be to update the arch code to return
an errno when errors are encountered. Then hvcs can kick off a hangup
if a partner adapter is removed.

Thanks for catching this Jimi. It may take me a bit to get a fix.
I have to make sure the hvcs scheduling algorithm can properly handle this
scenario.

--
Ryan Arnold
IBM Linux Technology Center

From paulus at samba.org Thu Feb 3 08:10:45 2005
From: paulus at samba.org (Paul Mackerras)
Date: Thu, 3 Feb 2005 08:10:45 +1100
Subject: [PATCH] PPC64: draft version of EEH code.
In-Reply-To: <20050202190857.GC9140@austin.ibm.com>
References: <20050202011511.GB9140@austin.ibm.com> <1107317868.1665.68.camel@gaston> <20050202190857.GC9140@austin.ibm.com>
Message-ID: <16897.16853.103205.119522@cargo.ozlabs.ibm.com>

Linas Vepstas writes:

> On Wed, Feb 02, 2005 at 03:17:48PM +1100, Benjamin Herrenschmidt was heard to remark:
> >
> > cpu 0x0: Vector: 700 (Program Check) at [c0000000332c38b0]
> > pc: 00000000077d9374
> > lr: 000000000000dafc
> > sp: c0000000332c3b30
> > msr: 81002
> > current = 0xc000000034b5b030
> > paca = 0xc000000000573000
> > pid = 11206, comm = errinjct
>
> MSR shows it died in 32-bit real mode: i.e. in the firmware.
>
> I've seen this sporadically, I've assumed it was the off-kilter firmware
> on the box I have access to. But since you're seeing it...
>
> John Rose, the maintainer of librtas (the user space lib that does
> the memory mapping) tells me that even if corrupt values are passed to
> RTAS, the firmware should not crash.

When Ben and I dug into it a little using xmon, it turned out that
77d9374 (the pc value) was the RTAS entry point. We dumped out memory
at that address and it was all zeroes. Hence the 700 exception. Ben
then straced the errinjct program, and it was reading the proc file to
get the rmo buffer base and getting back 77c9000. Then it did an mmap
of /dev/mem at offset 77d9000 for 4096 bytes, which happens to be the
first page of the RTAS private data area. So it looks very much like
a bug in errinjct is causing it to overwrite the first page of RTAS's
data area. Hence the desire to see the source for errinjct.

Paul.
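
As an illustration of the arithmetic in the message above, and not taken from librtas or errinjct: the offset passed to mmap, 0x77d9000, sits 0x10000 bytes above the reported buffer base 0x77c9000, so whether that page belongs to the buffer depends only on the buffer's size. The C sketch below shows the kind of overflow-safe range check an allocator could apply before handing out such a region. The names rmo_range_ok and rmo_range_demo are hypothetical, and the 64 KiB buffer size is an assumption; the message does not state the size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from librtas): returns 1 if the region
 * [offset, offset + len) lies entirely inside the buffer
 * [base, base + size), else 0. The comparison is arranged so that no
 * addition can overflow.
 */
int rmo_range_ok(uint64_t base, uint64_t size, uint64_t offset, uint64_t len)
{
	if (len == 0 || len > size)
		return 0;
	if (offset < base)
		return 0;
	/* offset >= base and len <= size, so both subtractions are safe */
	return (offset - base) <= (size - len);
}

/* Checks the two addresses quoted in the thread against an ASSUMED size. */
void rmo_range_demo(void)
{
	const uint64_t base = 0x77c9000; /* base read back from the proc file */
	const uint64_t size = 0x10000;   /* ASSUMPTION: 64 KiB, not stated in the thread */
	const uint64_t page = 4096;      /* length of the mmap in the thread */

	/* The first page of the buffer is inside the range. */
	assert(rmo_range_ok(base, size, 0x77c9000, page) == 1);
	/* The page that was actually mapped starts exactly at base + size. */
	assert(rmo_range_ok(base, size, 0x77d9000, page) == 0);
}
```

Under that assumed size, the page at 0x77d9000 is the first page past the end of the buffer, which would be consistent with it being the start of a neighbouring data area; with a different buffer size the second check would need to be re-evaluated.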
From johnrose at us.ibm.com Thu Feb 3 08:56:46 2005
From: johnrose at us.ibm.com (John H Rose)
Date: Wed, 2 Feb 2005 15:56:46 -0600
Subject: [PATCH] PPC64: draft version of EEH code.
In-Reply-To: <16897.16853.103205.119522@cargo.ozlabs.ibm.com>
Message-ID:

> When Ben and I dug into it a little using xmon, it turned out that
> 77d9374 (the pc value) was the RTAS entry point. We dumped out memory
> at that address and it was all zeroes. Hence the 700 exception. Ben
> then straced the errinjct program, and it was reading the proc file to
> get the rmo buffer base and getting back 77c9000. Then it did an mmap
> of /dev/mem at offset 77d9000 for 4096 bytes, which happens to be the
> first page of the RTAS private data area. So it looks very much like
> a bug in errinjct is causing it to overwrite the first page of RTAS's
> data area. Hence the desire to see the source for errinjct.

It would appear that librtas isn't enforcing the RMO buffer size
communicated by /proc/ppc64/rmobuf. This does sound like (partly) a
librtas bug, and I'll have it fixed shortly.

Librtas hands out low memory space beginning at the base of the reserved
region. This could only occur in the case of a system that had already
reserved the 19 pages of low memory without releasing any of them. This
seems close to impossible, without some application acting erroneously.

Thanks-
John

From benh at kernel.crashing.org Thu Feb 3 11:23:35 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Thu, 03 Feb 2005 11:23:35 +1100
Subject: pci: Arch hook to determine config space size
In-Reply-To: <200502021105.42249.arnd@arndb.de>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <41FF0B0D.8020003@us.ibm.com> <1107233864.5963.65.camel@gaston> <200502021105.42249.arnd@arndb.de>
Message-ID: <1107390215.30709.88.camel@gaston>

On Wed, 2005-02-02 at 11:05 +0100, Arnd Bergmann wrote:

> How about something along the lines of this patch?
> Instead of adding a
> pointer to the pci data from the device node, it embeds the node inside
> a new struct pci_device_node. The patch is not complete and therefore
> not expected to work as is, but maybe you want to reuse it.
>
> The interesting part that is missing is creating and destroying
> pci_device_nodes in prom.c, maybe you have an idea how to do that.
>
> I'm also not sure about eeh. Are the eeh functions known to be called
> only for device_nodes of PCI devices? If not, eeh_mode and
> eeh_config_addr might have to stay inside of device_node.

I'd rather not go that way for now. There are at least PCI and VIO
devices concerned by this, and maybe more (depending on how I deal with
macio devices for example). We also want, ultimately, to have the DMA
routines be function pointers in this auxiliary structure.

Ben.

From benh at kernel.crashing.org Thu Feb 3 12:17:19 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Thu, 03 Feb 2005 12:17:19 +1100
Subject: [PATCH] PPC64: draft version of EEH code.
In-Reply-To:
References:
Message-ID: <1107393440.1665.89.camel@gaston>

On Wed, 2005-02-02 at 15:56 -0600, John H Rose wrote:

> It would appear that librtas isn't enforcing the RMO buffer size
> communicated by /proc/ppc64/rmobuf. This does sound like (partly) a
> librtas bug, and I'll have it fixed shortly.
>
> Librtas hands out low memory space beginning at the base of the reserved
> region. This could only occur in the case of a system that had already
> reserved the 19 pages of low memory without releasing any of them. This
> seems close to impossible, without some application acting erroneously.

I have a 100% repro case here, booting SLES9 and launching errinjct once.

Ben.

From benh at kernel.crashing.org Thu Feb 3 12:46:12 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Thu, 03 Feb 2005 12:46:12 +1100
Subject: [PATCH] PPC64: draft version of EEH code.
In-Reply-To: <1107365406.32699.7.camel@dyn95394167.austin.ibm.com>
References: <20050202011511.GB9140@austin.ibm.com> <1107317868.1665.68.camel@gaston> <1107324674.5625.79.camel@gaston> <1107365406.32699.7.camel@dyn95394167.austin.ibm.com>
Message-ID: <1107395173.5625.91.camel@gaston>

On Wed, 2005-02-02 at 11:30 -0600, Nathan Fontenot wrote:
> On Wed, 2005-02-02 at 17:11 +1100, Benjamin Herrenschmidt wrote:
> > Unfortunately, the
> > .tar.gz that comes with it on the RAS page doesn't contains what it
> > advertises (instead, it contains the source for the rtas error daemon),
> > so I can't debug the tool.
> >
> > Linas, can you fix the source package (and send me the source for
> > errinjct) ?
> >
> > Ben.
> >
>
> Oops. The source tarball on the website should contain the correct
> source code now.

Nope, the tarball still contains rtas_errd instead of errinjct :)

Also, is the source for librtas available somewhere ?

Ben.

From benh at kernel.crashing.org Thu Feb 3 12:50:07 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Thu, 03 Feb 2005 12:50:07 +1100
Subject: [PATCH] PPC64: draft version of EEH code.
In-Reply-To: <1107395173.5625.91.camel@gaston>
References: <20050202011511.GB9140@austin.ibm.com> <1107317868.1665.68.camel@gaston> <1107324674.5625.79.camel@gaston> <1107365406.32699.7.camel@dyn95394167.austin.ibm.com> <1107395173.5625.91.camel@gaston>
Message-ID: <1107395407.1665.93.camel@gaston>

> Nope, the tarball still contains rtas_errd instead of errinjct :)

Ooops, forget it, mozilla playing tricks with me.

> Also, is the source for librtas available somewhere ?

That one still applies :)

Ben.

From linas at austin.ibm.com Fri Feb 4 11:55:28 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Thu, 3 Feb 2005 18:55:28 -0600
Subject: [PATCH]: Another Revision for EEH
Message-ID: <20050204005528.GH9140@austin.ibm.com>

Hi Ben, Paul,

Not to shower you with endless patches, but another bug report came in today.
Appearently, we were not handling the "slot reset state == 5 (permanent failure)" case at all. FWIW, the patches below now handle the case, but I had to restructure the code a fair amount to do this cleanly. Which is fine, because I like the restructured version better anyway. --linas -------------- next part -------------- ===== arch/ppc64/kernel/eeh.c 1.41 vs edited ===== --- 1.41/arch/ppc64/kernel/eeh.c 2005-01-06 13:05:42 -06:00 +++ edited/arch/ppc64/kernel/eeh.c 2005-02-03 18:30:33 -06:00 @@ -17,21 +17,19 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ -#include +#include #include #include -#include #include #include #include #include #include -#include +#include #include #include #include #include -#include #include "pci.h" #undef DEBUG @@ -88,8 +86,7 @@ static struct notifier_block *eeh_notifi * is broken and panic. This sets the threshold for how many read * attempts we allow before panicking. */ -#define EEH_MAX_FAILS 1000 -static atomic_t eeh_fail_count; +#define EEH_MAX_FAILS 100000 /* RTAS tokens */ static int ibm_set_eeh_option; @@ -106,6 +103,10 @@ static spinlock_t slot_errbuf_lock = SPI static int eeh_error_buf_size; /* System monitoring statistics */ +static DEFINE_PER_CPU(unsigned long, no_device); +static DEFINE_PER_CPU(unsigned long, no_dn); +static DEFINE_PER_CPU(unsigned long, no_cfg_addr); +static DEFINE_PER_CPU(unsigned long, ignored_check); static DEFINE_PER_CPU(unsigned long, total_mmio_ffs); static DEFINE_PER_CPU(unsigned long, false_positives); static DEFINE_PER_CPU(unsigned long, ignored_failures); @@ -224,9 +225,9 @@ pci_addr_cache_insert(struct pci_dev *de while (*p) { parent = *p; piar = rb_entry(parent, struct pci_io_addr_range, rb_node); - if (alo < piar->addr_lo) { + if (ahi < piar->addr_lo) { p = &parent->rb_left; - } else if (ahi > piar->addr_hi) { + } else if (alo > piar->addr_hi) { p = &parent->rb_right; } else { if (dev != piar->pcidev || @@ -244,6 +245,11 @@ pci_addr_cache_insert(struct 
pci_dev *de piar->addr_hi = ahi; piar->pcidev = dev; piar->flags = flags; + +#ifdef DEBUG + printk (KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", + alo, ahi, pci_name (dev)); +#endif rb_link_node(&piar->rb_node, parent, p); rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); @@ -368,6 +374,7 @@ void pci_addr_cache_remove_device(struct */ void __init pci_addr_cache_build(void) { + struct device_node *dn; struct pci_dev *dev = NULL; spin_lock_init(&pci_io_addr_cache_root.piar_lock); @@ -378,6 +385,17 @@ void __init pci_addr_cache_build(void) continue; } pci_addr_cache_insert_device(dev); + + /* Save the BAR's; firmware doesn't restore these after EEH reset */ + dn = pci_device_to_OF_node(dev); + if (dn) { + int i; + for (i = 0; i < 16; i++) + pci_read_config_dword(dev, i * 4, &dn->config_space[i]); + + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) + dn->eeh_is_bridge = 1; + } } #ifdef DEBUG @@ -389,6 +407,32 @@ void __init pci_addr_cache_build(void) /* --------------------------------------------------------------- */ /* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */ +void eeh_slot_error_detail (struct device_node *dn, int severity) +{ + unsigned long flags; + int rc; + + if (!dn) return; + + /* Log the error with the rtas logger */ + spin_lock_irqsave(&slot_errbuf_lock, flags); + memset(slot_errbuf, 0, eeh_error_buf_size); + + rc = rtas_call(ibm_slot_error_detail, + 8, 1, NULL, dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid), NULL, 0, + virt_to_phys(slot_errbuf), + eeh_error_buf_size, + severity); + + if (rc == 0) + log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); + spin_unlock_irqrestore(&slot_errbuf_lock, flags); +} + +EXPORT_SYMBOL(eeh_slot_error_detail); + /** * eeh_register_notifier - Register to find out about EEH events. 
* @nb: notifier block to callback on events @@ -421,10 +465,11 @@ static int read_slot_reset_state(struct outputs = 4; } else { token = ibm_read_slot_reset_state; + rets[2] = 0; /* fake PE Unavailable info */ outputs = 3; } - return rtas_call(token, 3, outputs, rets, dn->eeh_config_addr, + return rtas_call(token, 3, outputs, rets, dn->eeh_config_addr, BUID_HI(dn->phb->buid), BUID_LO(dn->phb->buid)); } @@ -480,15 +525,15 @@ static void eeh_event_handler(void *dumm if (event == NULL) break; - printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device " - "%s %s\n", event->reset_state, - pci_name(event->dev), pci_pretty_name(event->dev)); - - atomic_set(&eeh_fail_count, 0); - notifier_call_chain (&eeh_notifier_chain, - EEH_NOTIFY_FREEZE, event); + if (event->reset_state != 5) { + printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device " + "%s %s\n", event->reset_state, + pci_name(event->dev), pci_pretty_name(event->dev)); + } __get_cpu_var(slot_resets)++; + notifier_call_chain (&eeh_notifier_chain, + EEH_NOTIFY_FREEZE, event); pci_dev_put(event->dev); kfree(event); @@ -496,8 +541,8 @@ static void eeh_event_handler(void *dumm } /** - * eeh_token_to_phys - convert EEH address token to phys address - * @token i/o token, should be address in the form 0xE.... + * eeh_token_to_phys - convert I/O address to phys address + * @token i/o address, should be address in the form 0xA.... 
*/ static inline unsigned long eeh_token_to_phys(unsigned long token) { @@ -512,6 +557,17 @@ static inline unsigned long eeh_token_to return pa | (token & (PAGE_SIZE-1)); } +static inline struct pci_dev * eeh_get_pci_dev(struct device_node *dn) +{ + struct pci_dev *dev = NULL; + + for_each_pci_dev(dev) { + if (pci_device_to_OF_node(dev) == dn) + return dev; + } + return NULL; +} + /** * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze * @dn device node @@ -532,7 +588,6 @@ int eeh_dn_check_failure(struct device_n int ret; int rets[3]; unsigned long flags; - int rc, reset_state; struct eeh_event *event; __get_cpu_var(total_mmio_ffs)++; @@ -540,16 +595,20 @@ int eeh_dn_check_failure(struct device_n if (!eeh_subsystem_enabled) return 0; - if (!dn) + if (!dn) { + __get_cpu_var(no_dn)++; return 0; + } /* Access to IO BARs might get this far and still not want checking. */ if (!(dn->eeh_mode & EEH_MODE_SUPPORTED) || dn->eeh_mode & EEH_MODE_NOCHECK) { + __get_cpu_var(ignored_check)++; return 0; } if (!dn->eeh_config_addr) { + __get_cpu_var(no_cfg_addr)++; return 0; } @@ -558,8 +617,11 @@ int eeh_dn_check_failure(struct device_n * slot, we know it's bad already, we don't need to check... */ if (dn->eeh_mode & EEH_MODE_ISOLATED) { - atomic_inc(&eeh_fail_count); - if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) { + dn->eeh_check_count ++; + if (dn->eeh_check_count >= EEH_MAX_FAILS) { + printk (KERN_ERR "EEH: Driver ignored %d bad reads, panicing\n", + dn->eeh_check_count); + dump_stack(); /* re-read the slot reset state */ if (read_slot_reset_state(dn, rets) != 0) rets[0] = -1; /* reset state unknown */ @@ -576,42 +638,27 @@ int eeh_dn_check_failure(struct device_n * In any case they must share a common PHB. 
*/ ret = read_slot_reset_state(dn, rets); - if (!(ret == 0 && rets[1] == 1 && (rets[0] == 2 || rets[0] == 4))) { + if (!(ret == 0 && ((rets[1] == 1 && (rets[0] == 2 || rets[0] >= 4)) + || (rets[0] == 5)))) { __get_cpu_var(false_positives)++; return 0; } - /* prevent repeated reports of this failure */ + /* Prevent repeated reports of this failure */ dn->eeh_mode |= EEH_MODE_ISOLATED; - reset_state = rets[0]; - - spin_lock_irqsave(&slot_errbuf_lock, flags); - memset(slot_errbuf, 0, eeh_error_buf_size); - - rc = rtas_call(ibm_slot_error_detail, - 8, 1, NULL, dn->eeh_config_addr, - BUID_HI(dn->phb->buid), - BUID_LO(dn->phb->buid), NULL, 0, - virt_to_phys(slot_errbuf), - eeh_error_buf_size, - 1 /* Temporary Error */); - - if (rc == 0) - log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); - spin_unlock_irqrestore(&slot_errbuf_lock, flags); - - printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n", - rets[0], dn->name, dn->full_name); event = kmalloc(sizeof(*event), GFP_ATOMIC); if (event == NULL) { - eeh_panic(dev, reset_state); + printk (KERN_ERR "EEH: out of memory, event not handled\n"); return 1; } + if (!dev) + dev = eeh_get_pci_dev (dn); event->dev = dev; event->dn = dn; - event->reset_state = reset_state; + event->reset_state = rets[0]; + event->time_unavail = rets[2]; /* We may or may not be called in an interrupt context */ spin_lock_irqsave(&eeh_eventlist_lock, flags); @@ -621,7 +668,7 @@ int eeh_dn_check_failure(struct device_n /* Most EEH events are due to device driver bugs. Having * a stack trace will help the device-driver authors figure * out what happened. So print that out. */ - dump_stack(); + if (rets[0] != 5) dump_stack(); schedule_work(&eeh_event_wq); return 0; @@ -634,7 +681,6 @@ EXPORT_SYMBOL(eeh_dn_check_failure); * @token i/o token, should be address in the form 0xA.... * @val value, should be all 1's (XXX why do we need this arg??) * - * Check for an eeh failure at the given token address. 
* Check for an EEH failure at the given token address. Call this * routine if the result of a read was all 0xff's and you want to * find out if this is due to an EEH slot freeze event. This routine @@ -642,6 +688,7 @@ EXPORT_SYMBOL(eeh_dn_check_failure); * * Note this routine is safe to call in an interrupt context. */ + unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val) { unsigned long addr; @@ -651,8 +698,10 @@ unsigned long eeh_check_failure(const vo /* Finding the phys addr + pci device; this is pretty quick. */ addr = eeh_token_to_phys((unsigned long __force) token); dev = pci_get_device_by_addr(addr); - if (!dev) + if (!dev) { + __get_cpu_var(no_device)++; return val; + } dn = pci_device_to_OF_node(dev); eeh_dn_check_failure (dn, dev); @@ -663,6 +712,123 @@ unsigned long eeh_check_failure(const vo EXPORT_SYMBOL(eeh_check_failure); +/* ------------------------------------------------------------- */ +/* The code below deals with error recovery */ + +void +rtas_set_slot_reset(struct device_node *dn) +{ + int token = rtas_token ("ibm,set-slot-reset"); + int rc; + + if (token == RTAS_UNKNOWN_SERVICE) + return; + rc = rtas_call(token,4,1, NULL, + dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid), + 1); + if (rc) { + printk (KERN_WARNING "EEH: Unable to reset the failed slot\n"); + return; + } + + /* The PCI bus requires that the reset be held high for at least + * a 100 milliseconds. We wait a bit longer 'just in case'. 
+ */ + msleep (200); + + rc = rtas_call(token,4,1, NULL, + dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid), + 0); +} + +EXPORT_SYMBOL(rtas_set_slot_reset); + +void +rtas_configure_bridge(struct device_node *dn) +{ + int token = rtas_token ("ibm,configure-bridge"); + int rc; + + if (token == RTAS_UNKNOWN_SERVICE) + return; + rc = rtas_call(token,3,1, NULL, + dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid)); + if (rc) { + printk (KERN_WARNING "EEH: Unable to configure device bridge\n"); + } +} + +EXPORT_SYMBOL(rtas_configure_bridge); + +/* ------------------------------------------------------- */ +/** Save and restore of PCI BARs + * + * Although firmware will set up BARs during boot, it doesn't + * set up device BAR's after a device reset, although it will, + * if requested, set up bridge configuration. Thus, we need to + * configure the PCI devices ourselves. Config-space setup is + * stored in the PCI structures which are normally deleted during + * device removal. Thus, the "save" routine references the + * structures so that they aren't deleted. + */ + +/** + * __restore_bars - Restore the Base Address Registers + * Loads the PCI configuration space base address registers, + * the expansion ROM base address, the latency timer, and etc. + * from the saved values in the device node. 
+ */ +static inline void __restore_bars (struct device_node *dn) +{ + int i; + for (i=4; i<10; i++) { + rtas_write_config(dn, i*4, 4, dn->config_space[i]); + } + + /* 12 == Expansion ROM Address */ + rtas_write_config(dn, 12*4, 4, dn->config_space[12]); + +#define SAVED_BYTE(OFF) (((u8 *)(dn->config_space))[OFF]) + + rtas_write_config (dn, PCI_CACHE_LINE_SIZE, 1, + SAVED_BYTE(PCI_CACHE_LINE_SIZE)); + + rtas_write_config (dn, PCI_LATENCY_TIMER, 1, + SAVED_BYTE(PCI_LATENCY_TIMER)); + + rtas_write_config (dn, PCI_INTERRUPT_LINE, 1, + SAVED_BYTE(PCI_INTERRUPT_LINE)); +} + +/** + * eeh_restore_bars - restore the PCI config space info + */ +void eeh_restore_bars(struct device_node *dn) +{ + if (! dn->eeh_is_bridge) + __restore_bars (dn); + + if (dn->child) + eeh_restore_bars (dn->child); + + if (dn->sibling) + eeh_restore_bars (dn->sibling); +} + +EXPORT_SYMBOL(eeh_restore_bars); + +/* ------------------------------------------------------------- */ +/* The code below deals with enabling EEH for devices during the + * early boot sequence. EEH must be enabled before any PCI probing + * can be done. 
+ */ + struct eeh_early_enable_info { unsigned int buid_hi; unsigned int buid_lo; @@ -742,7 +908,7 @@ static void *early_enable_eeh(struct dev dn->full_name); } - return NULL; + return NULL; } /* @@ -829,7 +995,9 @@ void eeh_add_device_early(struct device_ return; phb = dn->phb; if (NULL == phb || 0 == phb->buid) { - printk(KERN_WARNING "EEH: Expected buid but found none\n"); + printk(KERN_WARNING "EEH: Expected buid but found none for %s\n", + dn->full_name); + dump_stack(); return; } @@ -848,6 +1016,9 @@ EXPORT_SYMBOL(eeh_add_device_early); */ void eeh_add_device_late(struct pci_dev *dev) { + int i; + struct device_node *dn; + if (!dev || !eeh_subsystem_enabled) return; @@ -857,6 +1028,14 @@ void eeh_add_device_late(struct pci_dev #endif pci_addr_cache_insert_device (dev); + + /* Save the BAR's; firmware doesn't restore these after EEH reset */ + dn = pci_device_to_OF_node(dev); + for (i = 0; i < 16; i++) + pci_read_config_dword(dev, i * 4, &dn->config_space[i]); + + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) + dn->eeh_is_bridge = 1; } EXPORT_SYMBOL(eeh_add_device_late); @@ -886,12 +1065,17 @@ static int proc_eeh_show(struct seq_file unsigned int cpu; unsigned long ffs = 0, positives = 0, failures = 0; unsigned long resets = 0; + unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0; for_each_cpu(cpu) { ffs += per_cpu(total_mmio_ffs, cpu); positives += per_cpu(false_positives, cpu); failures += per_cpu(ignored_failures, cpu); resets += per_cpu(slot_resets, cpu); + no_dev += per_cpu(no_device, cpu); + no_dn += per_cpu(no_dn, cpu); + no_cfg += per_cpu(no_cfg_addr, cpu); + no_check += per_cpu(ignored_check, cpu); } if (0 == eeh_subsystem_enabled) { @@ -899,13 +1083,17 @@ static int proc_eeh_show(struct seq_file seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs); } else { seq_printf(m, "EEH Subsystem is enabled\n"); - seq_printf(m, "eeh_total_mmio_ffs=%ld\n" + seq_printf(m, + "no device=%ld\n" + "no device node=%ld\n" + "no config address=%ld\n" + "check not 
wanted=%ld\n" + "eeh_total_mmio_ffs=%ld\n" "eeh_false_positives=%ld\n" "eeh_ignored_failures=%ld\n" - "eeh_slot_resets=%ld\n" - "eeh_fail_count=%d\n", - ffs, positives, failures, resets, - eeh_fail_count.counter); + "eeh_slot_resets=%ld\n", + no_dev, no_dn, no_cfg, no_check, + ffs, positives, failures, resets); } return 0; ===== arch/ppc64/kernel/pSeries_pci.c 1.59 vs edited ===== --- 1.59/arch/ppc64/kernel/pSeries_pci.c 2004-11-15 21:29:10 -06:00 +++ edited/arch/ppc64/kernel/pSeries_pci.c 2005-01-20 17:25:37 -06:00 @@ -102,7 +102,7 @@ static int rtas_pci_read_config(struct p return PCIBIOS_DEVICE_NOT_FOUND; } -static int rtas_write_config(struct device_node *dn, int where, int size, u32 val) +int rtas_write_config(struct device_node *dn, int where, int size, u32 val) { unsigned long buid, addr; int ret; ===== include/asm-ppc64/eeh.h 1.23 vs edited ===== --- 1.23/include/asm-ppc64/eeh.h 2004-10-25 18:17:38 -05:00 +++ edited/include/asm-ppc64/eeh.h 2005-02-03 15:05:52 -06:00 @@ -22,8 +22,8 @@ #include #include -#include #include +#include struct pci_dev; struct device_node; @@ -33,6 +33,10 @@ struct device_node; #define EEH_MODE_NOCHECK (1<<1) #define EEH_MODE_ISOLATED (1<<2) +/* Max number of EEH freezes allowed before we consider the device + * to be permanently disabled. */ +#define EEH_MAX_ALLOWED_FREEZES 5 + #ifdef CONFIG_PPC_PSERIES extern void __init eeh_init(void); unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val); @@ -57,6 +61,34 @@ void eeh_add_device_early(struct device_ void eeh_add_device_late(struct pci_dev *); /** + * eeh_slot_error_detail -- record and EEH error condition to the log + * @severity: 1 if temporary, 2 if permanent failure. + * + * Obtains the the EEH error details from the RTAS subsystem, + * and then logs these details with the RTAS error log system. 
+ */ +void eeh_slot_error_detail (struct device_node *dn, int severity); + +/** + * rtas_set_slot_reset -- unfreeze a frozen slot + * + * Clear the EEH-frozen condition on a slot. This routine + * does this by asserting the PCI #RST line for 1/8th of + * a second; this routine will sleep while the adapter is + * being reset. + */ +void rtas_set_slot_reset (struct device_node *dn); + +/** + * rtas_configure_bridge -- firmware initialization of pci bridge + * + * Ask the firmware to configure any PCI bridge devices + * located behind the indicated node. Required after a + * pci device reset. + */ +void rtas_configure_bridge(struct device_node *dn); + +/** * eeh_remove_device - undo EEH setup for the indicated pci device * @dev: pci device to be removed * @@ -86,11 +118,16 @@ struct eeh_event { struct pci_dev *dev; struct device_node *dn; int reset_state; + int time_unavail; }; /** Register to find out about EEH events. */ int eeh_register_notifier(struct notifier_block *nb); int eeh_unregister_notifier(struct notifier_block *nb); + +/** Restore device configuration info across device resets. + */ +void eeh_restore_bars(struct device_node *); /** * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure. ===== include/asm-ppc64/prom.h 1.24 vs edited ===== --- 1.24/include/asm-ppc64/prom.h 2004-11-25 00:42:42 -06:00 +++ edited/include/asm-ppc64/prom.h 2005-01-31 18:01:01 -06:00 @@ -164,8 +164,12 @@ struct device_node { int status; /* Current device status (non-zero is bad) */ int eeh_mode; /* See eeh.h for possible EEH_MODEs */ int eeh_config_addr; + int eeh_check_count; /* number of times device driver ignored error */ + int eeh_freeze_count; /* number of times this device froze up. 
*/ + int eeh_is_bridge; /* device is pci-to-pci bridge */ struct pci_controller *phb; /* for pci devices */ struct iommu_table *iommu_table; /* for phb's or bridges */ + u32 config_space[16]; /* saved PCI config space */ struct property *properties; struct device_node *parent; ===== include/asm-ppc64/rtas.h 1.25 vs edited ===== --- 1.25/include/asm-ppc64/rtas.h 2004-11-25 00:42:42 -06:00 +++ edited/include/asm-ppc64/rtas.h 2005-01-20 17:25:37 -06:00 @@ -241,4 +241,6 @@ extern void rtas_stop_self(void); /* RMO buffer reserved for user-space RTAS use */ extern unsigned long rtas_rmo_buf; +extern int rtas_write_config(struct device_node *dn, int where, int size, u32 val); + #endif /* _PPC64_RTAS_H */ -------------- next part -------------- ===== drivers/pci/hotplug/rpaphp.h 1.11 vs edited ===== --- 1.11/drivers/pci/hotplug/rpaphp.h 2004-10-06 11:43:44 -05:00 +++ edited/drivers/pci/hotplug/rpaphp.h 2005-01-20 17:25:37 -06:00 @@ -125,7 +125,8 @@ extern int rpaphp_enable_pci_slot(struct extern int register_pci_slot(struct slot *slot); extern int rpaphp_unconfig_pci_adapter(struct slot *slot); extern int rpaphp_get_pci_adapter_status(struct slot *slot, int is_init, u8 * value); -extern struct hotplug_slot *rpaphp_find_hotplug_slot(struct pci_dev *dev); +extern void init_eeh_handler (void); +extern void exit_eeh_handler (void); /* rpaphp_core.c */ extern int rpaphp_add_slot(struct device_node *dn); ===== drivers/pci/hotplug/rpaphp_core.c 1.18 vs edited ===== --- 1.18/drivers/pci/hotplug/rpaphp_core.c 2004-10-06 11:43:44 -05:00 +++ edited/drivers/pci/hotplug/rpaphp_core.c 2005-01-20 17:25:37 -06:00 @@ -443,12 +443,18 @@ static int __init rpaphp_init(void) { info(DRIVER_DESC " version: " DRIVER_VERSION "\n"); + /* Get set to handle EEH events. */ + init_eeh_handler(); + /* read all the PRA info from the system */ return init_rpa(); } static void __exit rpaphp_exit(void) { + /* Let EEH know we are going away. 
*/ + exit_eeh_handler(); + cleanup_slots(); } ===== drivers/pci/hotplug/rpaphp_pci.c 1.17 vs edited ===== --- 1.17/drivers/pci/hotplug/rpaphp_pci.c 2004-11-18 02:36:18 -06:00 +++ edited/drivers/pci/hotplug/rpaphp_pci.c 2005-02-03 18:40:27 -06:00 @@ -22,8 +22,12 @@ * Send feedback to * */ +#include +#include #include +#include #include +#include #include #include "../pci.h" /* for pci_add_new_bus */ @@ -62,6 +66,7 @@ int rpaphp_claim_resource(struct pci_dev root ? "Address space collision on" : "No parent found for", resource, dtype, pci_name(dev), res->start, res->end); + dump_stack(); } return err; } @@ -184,6 +189,19 @@ rpaphp_fixup_new_pci_devices(struct pci_ static int rpaphp_pci_config_bridge(struct pci_dev *dev); +static void rpaphp_eeh_add_bus_device(struct pci_bus *bus) +{ + struct pci_dev *dev; + list_for_each_entry(dev, &bus->devices, bus_list) { + eeh_add_device_late(dev); + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { + struct pci_bus *subbus = dev->subordinate; + if (bus) + rpaphp_eeh_add_bus_device (subbus); + } + } +} + /***************************************************************************** rpaphp_pci_config_slot() will configure all devices under the given slot->dn and return the the first pci_dev. 
@@ -211,6 +229,8 @@ rpaphp_pci_config_slot(struct device_nod } if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) rpaphp_pci_config_bridge(dev); + + rpaphp_eeh_add_bus_device(bus); } return dev; } @@ -219,7 +239,6 @@ static int rpaphp_pci_config_bridge(stru { u8 sec_busno; struct pci_bus *child_bus; - struct pci_dev *child_dev; dbg("Enter %s: BRIDGE dev=%s\n", __FUNCTION__, pci_name(dev)); @@ -236,11 +255,7 @@ static int rpaphp_pci_config_bridge(stru /* do pci_scan_child_bus */ pci_scan_child_bus(child_bus); - list_for_each_entry(child_dev, &child_bus->devices, bus_list) { - eeh_add_device_late(child_dev); - } - - /* fixup new pci devices without touching bus struct */ + /* Fixup new pci devices without touching bus struct */ rpaphp_fixup_new_pci_devices(child_bus, 0); /* Make the discovered devices available */ @@ -278,7 +293,7 @@ static void print_slot_pci_funcs(struct return; } #else -static void print_slot_pci_funcs(struct slot *slot) +static inline void print_slot_pci_funcs(struct slot *slot) { return; } @@ -360,7 +375,6 @@ static void rpaphp_eeh_remove_bus_device if (pdev) rpaphp_eeh_remove_bus_device(pdev); } - } return; } @@ -562,36 +576,229 @@ exit: return retval; } -struct hotplug_slot *rpaphp_find_hotplug_slot(struct pci_dev *dev) +/** + * rpaphp_search_bus_for_dev - return 1 if device is under this bus, else 0 + * @bus: the bus to search for this device. + * @dev: the pci device we are looking for. + */ +static int rpaphp_search_bus_for_dev (struct pci_bus *bus, struct pci_dev *dev) +{ + struct list_head *ln; + + if (!bus) return 0; + + for (ln = bus->devices.next; ln != &bus->devices; ln = ln->next) { + struct pci_dev *pdev = pci_dev_b(ln); + if (pdev == dev) + return 1; + if (pdev->subordinate) { + int rc; + rc = rpaphp_search_bus_for_dev (pdev->subordinate, dev); + if (rc) + return 1; + } + } + return 0; +} + +/** + * rpaphp_find_slot - find and return the slot holding the device + * @dev: pci device for which we want the slot structure. 
+ */ +static struct slot *rpaphp_find_slot(struct pci_dev *dev) { - struct list_head *tmp, *n; - struct slot *slot; + struct list_head *tmp, *n; + struct slot *slot; list_for_each_safe(tmp, n, &rpaphp_slot_head) { struct pci_bus *bus; - struct list_head *ln; slot = list_entry(tmp, struct slot, rpaphp_slot_list); - if (slot->bridge == NULL) { - if (slot->dev_type == PCI_DEV) { - printk(KERN_WARNING "PCI slot missing bridge %s %s \n", - slot->name, slot->location); - } + + /* PHB's don't have bridges. */ + if (slot->bridge == NULL) continue; - } + + /* The PCI device could be the slot itself. */ + if (slot->bridge == dev) + return slot; bus = slot->bridge->subordinate; if (!bus) { + printk (KERN_WARNING "PCI bridge is missing bus: %s %s\n", + pci_name (slot->bridge), pci_pretty_name (slot->bridge)); continue; /* should never happen? */ } - for (ln = bus->devices.next; ln != &bus->devices; ln = ln->next) { - struct pci_dev *pdev = pci_dev_b(ln); - if (pdev == dev) - return slot->hotplug_slot; - } - } + if (rpaphp_search_bus_for_dev (bus, dev)) + return slot; + } return NULL; } -EXPORT_SYMBOL_GPL(rpaphp_find_hotplug_slot); +/* ------------------------------------------------------- */ +/** + * handle_eeh_events -- reset a PCI device after hard lockup. + * + * pSeries systems will isolate a PCI slot if the PCI-Host + * bridge detects address or data parity errors, DMA's + * occuring to wild addresses (which usually happen due to + * bugs in device drivers or in PCI adapter firmware). + * Slot isolations also occur if #SERR, #PERR or other misc + * PCI-related errors are detected. + * + * Recovery process consists of unplugging the device driver + * (which generated hotplug events to userspace), then issuing + * a PCI #RST to the device, then reconfiguring the PCI config + * space for all bridges & devices under this slot, and then + * finally restarting the device drivers (which cause a second + * set of hotplug events to go out to userspace). 
+ */ +int eeh_reset_device (struct pci_dev *dev, int reconfig) +{ + int freeze_count=0; + struct slot *frozen_slot; + + if (!dev) + return 1; + + frozen_slot = rpaphp_find_slot(dev); + if (!frozen_slot) + { + printk (KERN_ERR "EEH: Cannot find PCI slot for %s %s\n", + pci_name(dev), pci_pretty_name (dev)); + return 1; + } + + if (reconfig) rpaphp_unconfig_pci_adapter (frozen_slot); + + /* Reset the pci controller. (Asserts RST#; resets config space). + * Reconfigure bridges and devices */ + rtas_set_slot_reset (frozen_slot->dn->child); + + rtas_configure_bridge(frozen_slot->dn); + eeh_restore_bars(frozen_slot->dn->child); + + /* Give the system 5 seconds to finish running the user-space + * hotplug scripts, e.g. ifdown for ethernet. Yes, this is a hack, + * but if we don't do this, weird things happen. + */ + if (reconfig) { + ssleep (5); + rpaphp_enable_pci_slot (frozen_slot); + } + return 0; +} + +/* The longest amount of time to wait for a pci device + * to come back on line, in seconds. + */ +#define MAX_WAIT_FOR_RECOVERY 15 + +int handle_eeh_events (struct notifier_block *self, + unsigned long reason, void *ev) +{ + int freeze_count=0; + struct slot *frozen_slot; + struct eeh_event *event = ev; + struct pci_dev *dev = event->dev; + int perm_failure = 0; + int rc; + + if (!dev) + { + if (event->dn) + printk ("EEH: Cannot find the PCI device for dn %s\n", + event->dn->full_name); + else + printk ("EEH: EEH error caught, but no PCI device specified!\n"); + return 1; + } + + frozen_slot = rpaphp_find_slot(dev); + if (!frozen_slot) + { + printk (KERN_ERR "EEH: Cannot find PCI slot for %s %s\n", + pci_name(dev), pci_pretty_name (dev)); + return 1; + } + + /* We get "permanent failure" messages on empty slots. + * These are false alarms. Empty slots have no child dn. 
*/ + if ((event->reset_state == 5) && (frozen_slot->dn->child == NULL)) + return 0; + + if (frozen_slot->dn->child) + freeze_count = frozen_slot->dn->child->eeh_freeze_count; + freeze_count ++; + if (freeze_count > EEH_MAX_ALLOWED_FREEZES) + perm_failure = 1; + + /* If the reset state is a '5' and the time to reset is 0 (infinity) + * or is more then 15 seconds, then mark this as a permanent failure. + */ + if ((event->reset_state == 5) && + ((event->time_unavail <= 0) || + (event->time_unavail > MAX_WAIT_FOR_RECOVERY*1000))) + perm_failure = 1; + + /* Log the error with the rtas logger. */ + if (perm_failure) { + /* + * About 90% of all real-life EEH failures in the field + * are due to poorly seated PCI cards. Only 10% or so are + * due to actual, failed cards. + */ + printk (KERN_ERR + "EEH: device %s:%s has failed %d times \n" + "and has been permanently disabled. Please try reseating\n" + "this device or replacing it.\n", + pci_name (dev), + pci_pretty_name (dev), + freeze_count); + + eeh_slot_error_detail (frozen_slot->dn->child, 2 /* Permanent Error */); + + /* Unconfigure the thing and go home. */ + rpaphp_unconfig_pci_adapter (frozen_slot); + return 1; + } else { + eeh_slot_error_detail (frozen_slot->dn->child, 1 /* Temporary Error */); + } + + printk (KERN_WARNING + "EEH: This device has failed %d times since last reboot: %s:%s\n", + freeze_count, + pci_name (dev), + pci_pretty_name (dev)); + + /* OK, firmware told us to wait. So wait */ + if (event->reset_state == 5) + msleep (event->time_unavail); + +if(strncmp (pci_pretty_name (event->dev), "Mylex Corporation Gemstone", 25)) { + rc = eeh_reset_device (event->dev, 1); +} + + /* Store the freeze count with the pci adapter, and not the slot. + * This way, if the device is replaced, the count is cleared. 
+ */ + if (frozen_slot->dn->child) + frozen_slot->dn->child->eeh_freeze_count = freeze_count; + + return rc; +} + +static struct notifier_block eeh_block; + +void __init init_eeh_handler (void) +{ + eeh_block.notifier_call = handle_eeh_events; + eeh_register_notifier (&eeh_block); +} + +void __exit exit_eeh_handler (void) +{ + eeh_unregister_notifier (&eeh_block); +} + From akpm at osdl.org Fri Feb 4 16:32:05 2005 From: akpm at osdl.org (Andrew Morton) Date: Thu, 3 Feb 2005 21:32:05 -0800 Subject: [PATCH] ppc64: Implement a vDSO and use it for signal trampoline #2 In-Reply-To: <1107222584.5906.43.camel@gaston> References: <1107222584.5906.43.camel@gaston> Message-ID: <20050203213205.58b5907d.akpm@osdl.org> Benjamin Herrenschmidt wrote: > > This patch adds to the ppc64 kernel a virtual .so (vDSO) that is mapped into every > process space, similar to the x86 vsyscall page. erp. Do I need a toolchain upgrade? arch/ppc64/kernel/vdso32/gettimeofday.S: Assembler messages: arch/ppc64/kernel/vdso32/gettimeofday.S:27: Error: unknown pseudo-op: `.cfi_startproc' arch/ppc64/kernel/vdso32/gettimeofday.S:29: Error: unknown pseudo-op: `.cfi_register' arch/ppc64/kernel/vdso32/gettimeofday.S:68: Error: unknown pseudo-op: `.cfi_endproc' arch/ppc64/kernel/vdso32/gettimeofday.S:77: Error: unknown pseudo-op: `.cfi_startproc' arch/ppc64/kernel/vdso32/gettimeofday.S:139: Error: unknown pseudo-op: `.cfi_endproc' arch/ppc64/kernel/vdso64/gettimeofday.S: Assembler messages: arch/ppc64/kernel/vdso64/gettimeofday.S:27: Error: unknown pseudo-op: `.cfi_startproc' arch/ppc64/kernel/vdso64/gettimeofday.S:29: Error: unknown pseudo-op: `.cfi_register' arch/ppc64/kernel/vdso64/gettimeofday.S:53: Error: unknown pseudo-op: `.cfi_endproc' From amodra at bigpond.net.au Fri Feb 4 16:43:04 2005 From: amodra at bigpond.net.au (Alan Modra) Date: Fri, 4 Feb 2005 16:13:04 +1030 Subject: [PATCH] ppc64: Implement a vDSO and use it for signal trampoline #2 In-Reply-To: <20050203213205.58b5907d.akpm@osdl.org> 
References: <1107222584.5906.43.camel@gaston> <20050203213205.58b5907d.akpm@osdl.org> Message-ID: <20050204054304.GL24757@bubble.modra.org> On Thu, Feb 03, 2005 at 09:32:05PM -0800, Andrew Morton wrote: > erp. Do I need a toolchain upgrade? Definitely. If your assembler doesn't understand .cfi_startproc it's so old it's mouldy. > arch/ppc64/kernel/vdso32/gettimeofday.S: Assembler messages: > arch/ppc64/kernel/vdso32/gettimeofday.S:27: Error: unknown pseudo-op: `.cfi_startproc' -- Alan Modra IBM OzLabs - Linux Technology Centre From ahuja at austin.ibm.com Fri Feb 4 17:06:12 2005 From: ahuja at austin.ibm.com (Manish Ahuja) Date: Fri, 04 Feb 2005 00:06:12 -0600 Subject: Collect real process and processor utilization values when virtualization is enabled. In-Reply-To: <16868.36168.772082.315933@cargo.ozlabs.ibm.com> References: <41E4787D.90309@austin.ibm.com> <16868.36168.772082.315933@cargo.ozlabs.ibm.com> Message-ID: <420310D4.4000400@austin.ibm.com> Cleaned up version .... Renamed variables... etc ----------------------- There is a requirement to collect real usage values for each partition in an LPAR environment on pSeries as well as iSeries. This patch enables that feature. The current PURR (Processor Utilization of Resources Register) value of each processor is stored in a per_cpu data array. These values are then summed and used to calculate various numbers for managing LPARs. The patch also calculates how much real CPU time each process uses and stores this value in a ppc64-specific struct. The value is needed by CKRM to do further calculations. Signed-off-by: Manish Ahuja -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: patch4.txt Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050204/d0313236/attachment.txt From nathanl at austin.ibm.com Fri Feb 4 17:50:33 2005 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Fri, 04 Feb 2005 00:50:33 -0600 Subject: [PATCH] make cpu hotplug play well with maxcpus and smt-enabled parameters Message-ID: <1107499833.7882.2.camel@biclops> This patch allows you to boot a pSeries system with maxcpus=x or smt-enabled=off (or both) and bring up the offline cpus later from userspace, assuming the kernel was built with CONFIG_HOTPLUG_CPU=y. - Record cpus which were started from OF in a cpu map and use that instead of system_state to decide how to start a cpu in smp_startup_cpu. - Change the smp bootup logic slightly so that the path for bringing up secondary threads is exactly the same as hotplugging a cpu later from userspace. - Add a new function to smp_ops - cpu_bootable. This is implemented only by pSeries to filter out secondary threads during boot with smt-enabled=off. Another way this could be done is to change the kick_cpu member to return int and we can check for this case in smp_pSeries_kick_cpu. - Remove the games we play with cpu_present_map and the hard_smp_processor_id to handle smt-enabled=off, since they're now unnecessary. - Remove find_physical_cpu_to_start; assigning threads to logical slots should be done at bootup and at DLPAR time, not during a cpu online operation. A couple of caveats: - You need up-to-date firmware on Power5 for the maxcpus option to work on systems with more than one cpu device node. Otherwise interrupts get misrouted, typically resulting in hangs or "unable to find root filesystem" problems. - This breaks cpu DLPAR in the sense that we need code such as what I posted last week to handle the addition of new cpu device nodes and update the paca and cpu_present_map. Tested on Power5 with and without CONFIG_HOTPLUG_CPU and with various combinations of the maxcpus= and smt-enabled= parameters. 
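The cpu_bootable filter described above can be sketched in isolation. This is only an illustration under stated assumptions, not the kernel code: every demo_* name is hypothetical and stands in for kernel state (system_state, the CPU_FTR_SMT feature bit, smt_enabled_at_boot, smp_ops), and the odd-numbered-cpu convention for secondary threads is taken from the description in the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for kernel state; none of these names exist in the kernel. */
enum demo_system_state { DEMO_SYSTEM_BOOTING, DEMO_SYSTEM_RUNNING };

static enum demo_system_state demo_state = DEMO_SYSTEM_BOOTING;
static int demo_cpu_has_smt = 1;         /* models the CPU_FTR_SMT feature bit */
static int demo_smt_enabled_at_boot = 0; /* models booting with smt-enabled=off */

struct demo_smp_ops {
	/* Optional hook: returns 1 if the cpu may be brought up now, 0 if not. */
	int (*cpu_bootable)(unsigned int nr);
};

/*
 * During boot, refuse secondary threads when SMT was disabled on the
 * command line. Odd-numbered logical cpus are assumed to be secondary
 * threads, as in the patch description.
 */
static int demo_cpu_bootable(unsigned int nr)
{
	if (demo_state < DEMO_SYSTEM_RUNNING &&
	    demo_cpu_has_smt &&
	    !demo_smt_enabled_at_boot && nr % 2 != 0)
		return 0;

	return 1;
}

static struct demo_smp_ops demo_ops = {
	.cpu_bootable = demo_cpu_bootable,
};

/* Models the check at the top of __cpu_up(): the hook is optional. */
static int demo_cpu_up(unsigned int cpu)
{
	if (demo_ops.cpu_bootable && !demo_ops.cpu_bootable(cpu))
		return -EINVAL;

	return 0; /* the real function would go on to start the cpu */
}
```

Because the filter only applies while the state is still "booting", a thread that was skipped at boot can be brought up later from userspace through the same demo_cpu_up() path, which is the behaviour the patch is after.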
arch/ppc64/kernel/pSeries_smp.c | 131 +++++++++++--------------------- arch/ppc64/kernel/setup.c | 12 -- arch/ppc64/kernel/smp.c | 13 --- include/asm-ppc64/machdep.h | 1 4 files changed, 52 insertions(+), 105 deletions(-) Signed-off-by: Nathan Lynch Index: linux-2.6.11-rc3/arch/ppc64/kernel/pSeries_smp.c =================================================================== --- linux-2.6.11-rc3.orig/arch/ppc64/kernel/pSeries_smp.c 2005-02-04 00:40:22.097318813 -0600 +++ linux-2.6.11-rc3/arch/ppc64/kernel/pSeries_smp.c 2005-02-04 00:40:30.743338605 -0600 @@ -53,8 +53,16 @@ #define DBG(fmt...) #endif +/* + * The primary thread of each non-boot processor is recorded here before + * smp init. + */ +static cpumask_t of_spin_map; + extern void pSeries_secondary_smp_init(unsigned long); +#ifdef CONFIG_HOTPLUG_CPU + /* Get state of physical CPU. * Return codes: * 0 - The processor is in the RTAS stopped state @@ -81,9 +89,6 @@ static int query_cpu_stopped(unsigned in return cpu_status; } - -#ifdef CONFIG_HOTPLUG_CPU - int pSeries_cpu_disable(void) { systemcfg->processorCount--; @@ -121,61 +126,14 @@ void pSeries_cpu_die(unsigned int cpu) */ paca[cpu].cpu_start = 0; } - -/* Search all cpu device nodes for an offline logical cpu. If a - * device node has a "ibm,my-drc-index" property (meaning this is an - * LPAR), paranoid-check whether we own the cpu. For each "thread" - * of a cpu, if it is offline and has the same hw index as before, - * grab that in preference. - */ -static unsigned int find_physical_cpu_to_start(unsigned int old_hwindex) -{ - struct device_node *np = NULL; - unsigned int best = -1U; - - while ((np = of_find_node_by_type(np, "cpu"))) { - int nr_threads, len; - u32 *index = (u32 *)get_property(np, "ibm,my-drc-index", NULL); - u32 *tid = (u32 *) - get_property(np, "ibm,ppc-interrupt-server#s", &len); - - if (!tid) - tid = (u32 *)get_property(np, "reg", &len); - - if (!tid) - continue; - - /* If there is a drc-index, make sure that we own - * the cpu. 
- */ - if (index) { - int state; - int rc = rtas_get_sensor(9003, *index, &state); - if (rc != 0 || state != 1) - continue; - } - - nr_threads = len / sizeof(u32); - - while (nr_threads--) { - if (0 == query_cpu_stopped(tid[nr_threads])) { - best = tid[nr_threads]; - if (best == old_hwindex) - goto out; - } - } - } -out: - of_node_put(np); - return best; -} +#endif /* CONFIG_HOTPLUG_CPU */ /** * smp_startup_cpu() - start the given cpu * - * At boot time, there is nothing to do. At run-time, call RTAS with - * the appropriate start location, if the cpu is in the RTAS stopped - * state. + * At boot time, there is nothing to do for primary threads which were + * started from Open Firmware. For anything else, call RTAS with the + * appropriate start location. * * Returns: * 0 - failure @@ -188,23 +146,15 @@ static inline int __devinit smp_startup_ pSeries_secondary_smp_init)); unsigned int pcpu; - /* At boot time the cpus are already spinning in hold - * loops, so nothing to do. */ - if (system_state < SYSTEM_RUNNING) + if (cpu_isset(lcpu, of_spin_map)) + /* Already started by OF and sitting in spin loop */ return 1; - pcpu = find_physical_cpu_to_start(get_hard_smp_processor_id(lcpu)); - if (pcpu == -1U) { - printk(KERN_INFO "No more cpus available, failing\n"); - return 0; - } + pcpu = get_hard_smp_processor_id(lcpu); /* Fixup atomic count: it exited inside IRQ handler. */ paca[lcpu].__current->thread_info->preempt_count = 0; - /* At boot this is done in prom.c. */ - paca[lcpu].hw_cpu_id = pcpu; - status = rtas_call(rtas_token("start-cpu"), 3, 1, NULL, pcpu, start_here, lcpu); if (status != 0) { @@ -213,12 +163,6 @@ static inline int __devinit smp_startup_ } return 1; } -#else /* ... 
CONFIG_HOTPLUG_CPU */ -static inline int __devinit smp_startup_cpu(unsigned int lcpu) -{ - return 1; -} -#endif /* CONFIG_HOTPLUG_CPU */ static inline void smp_xics_do_message(int cpu, int msg) { @@ -258,6 +202,8 @@ static void __devinit smp_xics_setup_cpu if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) vpa_init(cpu); + cpu_clear(cpu, of_spin_map); + /* * Put the calling processor into the GIQ. This is really only * necessary from a secondary thread as the OF start-cpu interface @@ -307,6 +253,20 @@ static void __devinit smp_pSeries_kick_c paca[nr].cpu_start = 1; } +static int smp_pSeries_cpu_bootable(unsigned int nr) +{ + /* Special case - we inhibit secondary thread startup + * during boot if the user requests it. Odd-numbered + * cpus are assumed to be secondary threads. + */ + if (system_state < SYSTEM_RUNNING && + cur_cpu_spec->cpu_features & CPU_FTR_SMT && + !smt_enabled_at_boot && nr % 2 != 0) + return 0; + + return 1; +} + static struct smp_ops_t pSeries_mpic_smp_ops = { .message_pass = smp_mpic_message_pass, .probe = smp_mpic_probe, @@ -319,12 +279,13 @@ static struct smp_ops_t pSeries_xics_smp .probe = smp_xics_probe, .kick_cpu = smp_pSeries_kick_cpu, .setup_cpu = smp_xics_setup_cpu, + .cpu_bootable = smp_pSeries_cpu_bootable, }; /* This is called very early */ void __init smp_init_pSeries(void) { - int ret, i; + int i; DBG(" -> smp_init_pSeries()\n"); @@ -338,20 +299,20 @@ void __init smp_init_pSeries(void) smp_ops->cpu_die = pSeries_cpu_die; #endif - /* Start secondary threads on SMT systems; primary threads - * are already in the running state. - */ - for_each_present_cpu(i) { - if (query_cpu_stopped(get_hard_smp_processor_id(i)) == 0) { - printk("%16.16x : starting thread\n", i); - DBG("%16.16x : starting thread\n", i); - rtas_call(rtas_token("start-cpu"), 3, 1, &ret, - get_hard_smp_processor_id(i), - __pa((u32)*((unsigned long *) - pSeries_secondary_smp_init)), - i); + /* Mark threads which are still spinning in hold loops. 
*/ + if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + for_each_present_cpu(i) { + if (i % 2 == 0) + /* + * Even-numbered logical cpus correspond to + * primary threads. + */ + cpu_set(i, of_spin_map); } - } + else + of_spin_map = cpu_present_map; + + cpu_clear(boot_cpuid, of_spin_map); /* Non-lpar has additional take/give timebase */ if (rtas_token("freeze-time-base") != RTAS_UNKNOWN_SERVICE) { Index: linux-2.6.11-rc3/include/asm-ppc64/machdep.h =================================================================== --- linux-2.6.11-rc3.orig/include/asm-ppc64/machdep.h 2005-02-04 00:40:22.097318813 -0600 +++ linux-2.6.11-rc3/include/asm-ppc64/machdep.h 2005-02-04 00:40:30.744334576 -0600 @@ -32,6 +32,7 @@ struct smp_ops_t { void (*give_timebase)(void); int (*cpu_disable)(void); void (*cpu_die)(unsigned int nr); + int (*cpu_bootable)(unsigned int nr); }; #endif Index: linux-2.6.11-rc3/arch/ppc64/kernel/smp.c =================================================================== --- linux-2.6.11-rc3.orig/arch/ppc64/kernel/smp.c 2005-02-04 00:40:22.097318813 -0600 +++ linux-2.6.11-rc3/arch/ppc64/kernel/smp.c 2005-02-04 00:40:30.744334576 -0600 @@ -410,9 +410,8 @@ int __devinit __cpu_up(unsigned int cpu) { int c; - /* At boot, don't bother with non-present cpus -JSCHOPP */ - if (system_state < SYSTEM_RUNNING && !cpu_present(cpu)) - return -ENOENT; + if (smp_ops->cpu_bootable && !smp_ops->cpu_bootable(cpu)) + return -EINVAL; paca[cpu].default_decr = tb_ticks_per_jiffy / decr_overclock; @@ -526,14 +525,6 @@ void __init smp_cpus_done(unsigned int m smp_ops->setup_cpu(boot_cpuid); set_cpus_allowed(current, old_mask); - - /* - * We know at boot the maximum number of cpus we can add to - * a partition and set cpu_possible_map accordingly. cpu_present_map - * needs to match for the hotplug code to allow us to hot add - * any offline cpus. 
- */ - cpu_present_map = cpu_possible_map; } #ifdef CONFIG_HOTPLUG_CPU Index: linux-2.6.11-rc3/arch/ppc64/kernel/setup.c =================================================================== --- linux-2.6.11-rc3.orig/arch/ppc64/kernel/setup.c 2005-02-04 00:40:22.097318813 -0600 +++ linux-2.6.11-rc3/arch/ppc64/kernel/setup.c 2005-02-04 00:40:30.745330546 -0600 @@ -268,15 +268,9 @@ static void __init setup_cpu_maps(void) nthreads = len / sizeof(u32); for (j = 0; j < nthreads && cpu < NR_CPUS; j++) { - /* - * Only spin up secondary threads if SMT is enabled. - * We must leave space in the logical map for the - * threads. - */ - if (j == 0 || smt_enabled_at_boot) { - cpu_set(cpu, cpu_present_map); - set_hard_smp_processor_id(cpu, intserv[j]); - } + cpu_set(cpu, cpu_present_map); + set_hard_smp_processor_id(cpu, intserv[j]); + if (intserv[j] == boot_cpuid_phys) swap_cpuid = cpu; cpu_set(cpu, cpu_possible_map); From olof at austin.ibm.com Fri Feb 4 18:22:54 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Fri, 4 Feb 2005 01:22:54 -0600 Subject: [PATCH] PPC/PPC64: Introduce CPU_HAS_FEATURE() macro Message-ID: <20050204072254.GA17565@austin.ibm.com> Hi, It's getting pretty old to have to see and type cur_cpu_spec->cpu_features & CPU_FTR_, when a shorter and less TLA-ridden macro is more readable. This also takes care of the differences between PPC and PPC64 cpu features for the common code; most places in PPC could be replaced with the macro as well.
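A small standalone sketch of the macro may help show one side benefit of the conversion. The macro body below has the same shape as the one added to cputable.h in this patch; the feature bit values, demo_spec, and the two helper functions are assumptions made up for illustration. The open-coded helper mirrors the negated test that sysfs.c used before the conversion, where '!' binds tighter than '&'.

```c
#include <assert.h>

/* Hypothetical bit values for illustration; the real ones live in asm/cputable.h. */
#define CPU_FTR_SLB 0x1UL
#define CPU_FTR_SMT 0x2UL

struct cpu_spec {
	unsigned long cpu_features;
};

/* A made-up cpu that has SLB but no SMT. */
static struct cpu_spec demo_spec = { .cpu_features = CPU_FTR_SLB };
static struct cpu_spec *cur_cpu_spec = &demo_spec;

/* Same shape as the macro the patch introduces. */
#define CPU_HAS_FEATURE(x) (cur_cpu_spec->cpu_features & CPU_FTR_##x)

/*
 * The open-coded form "!cur_cpu_spec->cpu_features & CPU_FTR_SMT" is
 * parsed as shown here, because '!' has higher precedence than '&'.
 * Returns nonzero if the test claims SMT is absent.
 */
static int smt_absent_open_coded(void)
{
	return ((!cur_cpu_spec->cpu_features) & CPU_FTR_SMT) != 0;
}

/* With the macro, the parentheses in its body make the negation apply to the whole test. */
static int smt_absent_macro(void)
{
	return !CPU_HAS_FEATURE(SMT);
}
```

On this made-up cpu the open-coded test never reports SMT as absent, while the macro form does, so the conversion in sysfs.c changes behaviour for the better rather than being purely cosmetic.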
Signed-off-by: Olof Johansson --- linux-2.5-olof/arch/ppc/kernel/ppc_htab.c | 8 +++--- linux-2.5-olof/arch/ppc/kernel/setup.c | 4 +-- linux-2.5-olof/arch/ppc/kernel/temp.c | 2 - linux-2.5-olof/arch/ppc/mm/mmu_decl.h | 2 - linux-2.5-olof/arch/ppc/mm/ppc_mmu.c | 4 +-- linux-2.5-olof/arch/ppc/platforms/pmac_cpufreq.c | 2 - linux-2.5-olof/arch/ppc/platforms/pmac_setup.c | 2 - linux-2.5-olof/arch/ppc/platforms/pmac_smp.c | 4 +-- linux-2.5-olof/arch/ppc/platforms/sandpoint.c | 6 ++--- linux-2.5-olof/arch/ppc64/kernel/align.c | 2 - linux-2.5-olof/arch/ppc64/kernel/iSeries_setup.c | 2 - linux-2.5-olof/arch/ppc64/kernel/pSeries_lpar.c | 2 - linux-2.5-olof/arch/ppc64/kernel/process.c | 4 +-- linux-2.5-olof/arch/ppc64/kernel/setup.c | 6 ++--- linux-2.5-olof/arch/ppc64/kernel/smp.c | 2 - linux-2.5-olof/arch/ppc64/kernel/sysfs.c | 22 +++++++++---------- linux-2.5-olof/arch/ppc64/mm/hash_native.c | 14 ++++++------ linux-2.5-olof/arch/ppc64/mm/hash_utils.c | 2 - linux-2.5-olof/arch/ppc64/mm/hugetlbpage.c | 2 - linux-2.5-olof/arch/ppc64/mm/init.c | 10 ++++---- linux-2.5-olof/arch/ppc64/mm/slb.c | 4 +-- linux-2.5-olof/arch/ppc64/mm/stab.c | 2 - linux-2.5-olof/arch/ppc64/oprofile/op_model_power4.c | 2 - linux-2.5-olof/arch/ppc64/oprofile/op_model_rs64.c | 2 - linux-2.5-olof/arch/ppc64/xmon/xmon.c | 8 +++--- linux-2.5-olof/drivers/macintosh/via-pmu.c | 2 - linux-2.5-olof/drivers/md/raid6altivec.uc | 2 - linux-2.5-olof/include/asm-ppc/cputable.h | 2 + linux-2.5-olof/include/asm-ppc64/cacheflush.h | 2 - linux-2.5-olof/include/asm-ppc64/cputable.h | 2 + linux-2.5-olof/include/asm-ppc64/mmu_context.h | 4 +-- linux-2.5-olof/include/asm-ppc64/page.h | 2 - 32 files changed, 70 insertions(+), 66 deletions(-) diff -puN include/asm-ppc64/cputable.h~cpu-has-feature include/asm-ppc64/cputable.h --- linux-2.5/include/asm-ppc64/cputable.h~cpu-has-feature 2005-02-04 00:33:25.000000000 -0600 +++ linux-2.5-olof/include/asm-ppc64/cputable.h 2005-02-04 00:33:26.000000000 -0600 @@ -66,6 +66,8 @@ struct 
cpu_spec { extern struct cpu_spec cpu_specs[]; extern struct cpu_spec *cur_cpu_spec; +#define CPU_HAS_FEATURE(x) (cur_cpu_spec->cpu_features & CPU_FTR_##x) + /* firmware feature bitmask values */ #define FIRMWARE_MAX_FEATURES 63 diff -puN arch/ppc64/kernel/align.c~cpu-has-feature arch/ppc64/kernel/align.c --- linux-2.5/arch/ppc64/kernel/align.c~cpu-has-feature 2005-02-04 00:33:25.000000000 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/align.c 2005-02-04 00:33:26.000000000 -0600 @@ -238,7 +238,7 @@ fix_alignment(struct pt_regs *regs) dsisr = regs->dsisr; - if (cur_cpu_spec->cpu_features & CPU_FTR_NODSISRALIGN) { + if (CPU_HAS_FEATURE(NODSISRALIGN)) { unsigned int real_instr; if (__get_user(real_instr, (unsigned int __user *)regs->nip)) return 0; diff -puN arch/ppc64/kernel/iSeries_setup.c~cpu-has-feature arch/ppc64/kernel/iSeries_setup.c --- linux-2.5/arch/ppc64/kernel/iSeries_setup.c~cpu-has-feature 2005-02-04 00:33:25.000000000 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/iSeries_setup.c 2005-02-04 00:33:26.000000000 -0600 @@ -267,7 +267,7 @@ unsigned long iSeries_process_mainstore_ unsigned long i; unsigned long mem_blocks = 0; - if (cur_cpu_spec->cpu_features & CPU_FTR_SLB) + if (CPU_HAS_FEATURE(SLB)) mem_blocks = iSeries_process_Regatta_mainstore_vpd(mb_array, max_entries); else diff -puN arch/ppc64/kernel/idle.c~cpu-has-feature arch/ppc64/kernel/idle.c diff -puN arch/ppc64/kernel/process.c~cpu-has-feature arch/ppc64/kernel/process.c --- linux-2.5/arch/ppc64/kernel/process.c~cpu-has-feature 2005-02-04 00:33:26.000000000 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/process.c 2005-02-04 00:33:26.000000000 -0600 @@ -388,12 +388,12 @@ copy_thread(int nr, unsigned long clone_ kregs = (struct pt_regs *) sp; sp -= STACK_FRAME_OVERHEAD; p->thread.ksp = sp; - if (cur_cpu_spec->cpu_features & CPU_FTR_SLB) { + if (CPU_HAS_FEATURE(SLB)) { unsigned long sp_vsid = get_kernel_vsid(sp); sp_vsid <<= SLB_VSID_SHIFT; sp_vsid |= SLB_VSID_KERNEL; - if (cur_cpu_spec->cpu_features & 
CPU_FTR_16M_PAGE) + if (CPU_HAS_FEATURE(16M_PAGE)) sp_vsid |= SLB_VSID_L; p->thread.ksp_vsid = sp_vsid; diff -puN arch/ppc64/kernel/smp.c~cpu-has-feature arch/ppc64/kernel/smp.c --- linux-2.5/arch/ppc64/kernel/smp.c~cpu-has-feature 2005-02-04 00:33:26.000000000 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/smp.c 2005-02-04 00:33:26.000000000 -0600 @@ -416,7 +416,7 @@ int __devinit __cpu_up(unsigned int cpu) paca[cpu].default_decr = tb_ticks_per_jiffy / decr_overclock; - if (!(cur_cpu_spec->cpu_features & CPU_FTR_SLB)) { + if (!CPU_HAS_FEATURE(SLB)) { void *tmp; /* maximum of 48 CPUs on machines with a segment table */ diff -puN arch/ppc64/kernel/sysfs.c~cpu-has-feature arch/ppc64/kernel/sysfs.c --- linux-2.5/arch/ppc64/kernel/sysfs.c~cpu-has-feature 2005-02-04 00:33:26.000000000 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/sysfs.c 2005-02-04 00:33:26.000000000 -0600 @@ -63,7 +63,7 @@ static int __init smt_setup(void) unsigned int *val; unsigned int cpu; - if (!cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (!CPU_HAS_FEATURE(SMT)) return 1; options = find_path_device("/options"); @@ -86,7 +86,7 @@ static int __init setup_smt_snooze_delay unsigned int cpu; int snooze; - if (!cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (!CPU_HAS_FEATURE(SMT)) return 1; smt_snooze_cmdline = 1; @@ -167,7 +167,7 @@ void ppc64_enable_pmcs(void) * On SMT machines we have to set the run latch in the ctrl register * in order to make PMC6 spin. 
*/ - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) { + if (CPU_HAS_FEATURE(SMT)) { ctrl = mfspr(CTRLF); ctrl |= RUNLATCH; mtspr(CTRLT, ctrl); @@ -266,7 +266,7 @@ static void register_cpu_online(unsigned struct sys_device *s = &c->sysdev; #ifndef CONFIG_PPC_ISERIES - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (CPU_HAS_FEATURE(SMT)) sysdev_create_file(s, &attr_smt_snooze_delay); #endif @@ -275,7 +275,7 @@ static void register_cpu_online(unsigned sysdev_create_file(s, &attr_mmcr0); sysdev_create_file(s, &attr_mmcr1); - if (cur_cpu_spec->cpu_features & CPU_FTR_MMCRA) + if (CPU_HAS_FEATURE(MMCRA)) sysdev_create_file(s, &attr_mmcra); sysdev_create_file(s, &attr_pmc1); @@ -285,12 +285,12 @@ static void register_cpu_online(unsigned sysdev_create_file(s, &attr_pmc5); sysdev_create_file(s, &attr_pmc6); - if (cur_cpu_spec->cpu_features & CPU_FTR_PMC8) { + if (CPU_HAS_FEATURE(PMC8)) { sysdev_create_file(s, &attr_pmc7); sysdev_create_file(s, &attr_pmc8); } - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (CPU_HAS_FEATURE(SMT)) sysdev_create_file(s, &attr_purr); } @@ -303,7 +303,7 @@ static void unregister_cpu_online(unsign BUG_ON(c->no_control); #ifndef CONFIG_PPC_ISERIES - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (CPU_HAS_FEATURE(SMT)) sysdev_remove_file(s, &attr_smt_snooze_delay); #endif @@ -312,7 +312,7 @@ static void unregister_cpu_online(unsign sysdev_remove_file(s, &attr_mmcr0); sysdev_remove_file(s, &attr_mmcr1); - if (cur_cpu_spec->cpu_features & CPU_FTR_MMCRA) + if (CPU_HAS_FEATURE(MMCRA)) sysdev_remove_file(s, &attr_mmcra); sysdev_remove_file(s, &attr_pmc1); @@ -322,12 +322,12 @@ static void unregister_cpu_online(unsign sysdev_remove_file(s, &attr_pmc5); sysdev_remove_file(s, &attr_pmc6); - if (cur_cpu_spec->cpu_features & CPU_FTR_PMC8) { + if (CPU_HAS_FEATURE(PMC8)) { sysdev_remove_file(s, &attr_pmc7); sysdev_remove_file(s, &attr_pmc8); } - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (CPU_HAS_FEATURE(SMT)) sysdev_remove_file(s, 
&attr_purr); } #endif /* CONFIG_HOTPLUG_CPU */ diff -puN arch/ppc64/mm/hash_native.c~cpu-has-feature arch/ppc64/mm/hash_native.c --- linux-2.5/arch/ppc64/mm/hash_native.c~cpu-has-feature 2005-02-04 00:33:26.000000000 -0600 +++ linux-2.5-olof/arch/ppc64/mm/hash_native.c 2005-02-04 00:33:26.000000000 -0600 @@ -217,10 +217,10 @@ static long native_hpte_updatepp(unsigne } /* Ensure it is out of the tlb too */ - if ((cur_cpu_spec->cpu_features & CPU_FTR_TLBIEL) && !large && local) { + if (CPU_HAS_FEATURE(TLBIEL) && !large && local) { tlbiel(va); } else { - int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); + int lock_tlbie = !CPU_HAS_FEATURE(LOCKLESS_TLBIE); if (lock_tlbie) spin_lock(&native_tlbie_lock); @@ -245,7 +245,7 @@ static void native_hpte_updateboltedpp(u unsigned long vsid, va, vpn, flags = 0; long slot; HPTE *hptep; - int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); + int lock_tlbie = !CPU_HAS_FEATURE(LOCKLESS_TLBIE); vsid = get_kernel_vsid(ea); va = (vsid << 28) | (ea & 0x0fffffff); @@ -273,7 +273,7 @@ static void native_hpte_invalidate(unsig Hpte_dword0 dw0; unsigned long avpn = va >> 23; unsigned long flags; - int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); + int lock_tlbie = !CPU_HAS_FEATURE(LOCKLESS_TLBIE); if (large) avpn &= ~0x1UL; @@ -292,7 +292,7 @@ static void native_hpte_invalidate(unsig } /* Invalidate the tlb */ - if ((cur_cpu_spec->cpu_features & CPU_FTR_TLBIEL) && !large && local) { + if (CPU_HAS_FEATURE(TLBIEL) && !large && local) { tlbiel(va); } else { if (lock_tlbie) @@ -360,7 +360,7 @@ static void native_flush_hash_range(unsi j++; } - if ((cur_cpu_spec->cpu_features & CPU_FTR_TLBIEL) && !large && local) { + if (CPU_HAS_FEATURE(TLBIEL) && !large && local) { asm volatile("ptesync":::"memory"); for (i = 0; i < j; i++) @@ -368,7 +368,7 @@ static void native_flush_hash_range(unsi asm volatile("ptesync":::"memory"); } else { - int lock_tlbie = !(cur_cpu_spec->cpu_features & 
CPU_FTR_LOCKLESS_TLBIE); + int lock_tlbie = !CPU_HAS_FEATURE(LOCKLESS_TLBIE); if (lock_tlbie) spin_lock(&native_tlbie_lock); diff -puN arch/ppc64/mm/hash_utils.c~cpu-has-feature arch/ppc64/mm/hash_utils.c --- linux-2.5/arch/ppc64/mm/hash_utils.c~cpu-has-feature 2005-02-04 00:33:26.000000000 -0600 +++ linux-2.5-olof/arch/ppc64/mm/hash_utils.c 2005-02-04 00:33:26.000000000 -0600 @@ -190,7 +190,7 @@ void __init htab_initialize(void) * _NOT_ map it to avoid cache paradoxes as it's remapped non * cacheable later on */ - if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) + if (CPU_HAS_FEATURE(16M_PAGE)) use_largepages = 1; /* create bolted the linear mapping in the hash table */ diff -puN arch/ppc64/mm/hugetlbpage.c~cpu-has-feature arch/ppc64/mm/hugetlbpage.c --- linux-2.5/arch/ppc64/mm/hugetlbpage.c~cpu-has-feature 2005-02-04 00:33:26.000000000 -0600 +++ linux-2.5-olof/arch/ppc64/mm/hugetlbpage.c 2005-02-04 00:33:26.000000000 -0600 @@ -705,7 +705,7 @@ unsigned long hugetlb_get_unmapped_area( if (len & ~HPAGE_MASK) return -EINVAL; - if (!(cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE)) + if (!CPU_HAS_FEATURE(16M_PAGE)) return -EINVAL; if (test_thread_flag(TIF_32BIT)) { diff -puN arch/ppc64/mm/init.c~cpu-has-feature arch/ppc64/mm/init.c --- linux-2.5/arch/ppc64/mm/init.c~cpu-has-feature 2005-02-04 00:33:26.000000000 -0600 +++ linux-2.5-olof/arch/ppc64/mm/init.c 2005-02-04 00:33:26.000000000 -0600 @@ -752,7 +752,7 @@ void __init mem_init(void) */ void flush_dcache_page(struct page *page) { - if (cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE) + if (CPU_HAS_FEATURE(COHERENT_ICACHE)) return; /* avoid an atomic op if possible */ if (test_bit(PG_arch_1, &page->flags)) @@ -763,7 +763,7 @@ void clear_user_page(void *page, unsigne { clear_page(page); - if (cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE) + if (CPU_HAS_FEATURE(COHERENT_ICACHE)) return; /* * We shouldnt have to do this, but some versions of glibc @@ -796,7 +796,7 @@ void copy_user_page(void *vto, void 
*vfr return; #endif - if (cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE) + if (CPU_HAS_FEATURE(COHERENT_ICACHE)) return; /* avoid an atomic op if possible */ @@ -832,8 +832,8 @@ void update_mmu_cache(struct vm_area_str unsigned long flags; /* handle i-cache coherency */ - if (!(cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE) && - !(cur_cpu_spec->cpu_features & CPU_FTR_NOEXECUTE)) { + if (!CPU_HAS_FEATURE(COHERENT_ICACHE) && + !CPU_HAS_FEATURE(NOEXECUTE)) { unsigned long pfn = pte_pfn(pte); if (pfn_valid(pfn)) { struct page *page = pfn_to_page(pfn); diff -puN arch/ppc64/mm/slb.c~cpu-has-feature arch/ppc64/mm/slb.c --- linux-2.5/arch/ppc64/mm/slb.c~cpu-has-feature 2005-02-04 00:33:26.000000000 -0600 +++ linux-2.5-olof/arch/ppc64/mm/slb.c 2005-02-04 00:33:26.000000000 -0600 @@ -51,7 +51,7 @@ static void slb_flush_and_rebolt(void) WARN_ON(!irqs_disabled()); - if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) + if (CPU_HAS_FEATURE(16M_PAGE)) ksp_flags |= SLB_VSID_L; ksp_esid_data = mk_esid_data(get_paca()->kstack, 2); @@ -139,7 +139,7 @@ void slb_initialize(void) unsigned long flags = SLB_VSID_KERNEL; /* Invalidate the entire SLB (even slot 0) & all the ERATS */ - if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) + if (CPU_HAS_FEATURE(16M_PAGE)) flags |= SLB_VSID_L; asm volatile("isync":::"memory"); diff -puN arch/ppc64/mm/stab.c~cpu-has-feature arch/ppc64/mm/stab.c --- linux-2.5/arch/ppc64/mm/stab.c~cpu-has-feature 2005-02-04 00:33:26.000000000 -0600 +++ linux-2.5-olof/arch/ppc64/mm/stab.c 2005-02-04 00:33:26.000000000 -0600 @@ -227,7 +227,7 @@ void stab_initialize(unsigned long stab) { unsigned long vsid = get_kernel_vsid(KERNELBASE); - if (cur_cpu_spec->cpu_features & CPU_FTR_SLB) { + if (CPU_HAS_FEATURE(SLB)) { slb_initialize(); } else { asm volatile("isync; slbia; isync":::"memory"); diff -puN arch/ppc64/oprofile/op_model_power4.c~cpu-has-feature arch/ppc64/oprofile/op_model_power4.c --- linux-2.5/arch/ppc64/oprofile/op_model_power4.c~cpu-has-feature 
2005-02-04 00:33:26.000000000 -0600 +++ linux-2.5-olof/arch/ppc64/oprofile/op_model_power4.c 2005-02-04 00:33:26.000000000 -0600 @@ -54,7 +54,7 @@ static void power4_reg_setup(struct op_c * * It has been verified to work on POWER5 so we enable it there. */ - if (cur_cpu_spec->cpu_features & CPU_FTR_MMCRA_SIHV) + if (CPU_HAS_FEATURE(MMCRA_SIHV)) mmcra_has_sihv = 1; /* diff -puN arch/ppc64/oprofile/op_model_rs64.c~cpu-has-feature arch/ppc64/oprofile/op_model_rs64.c --- linux-2.5/arch/ppc64/oprofile/op_model_rs64.c~cpu-has-feature 2005-02-04 00:33:26.000000000 -0600 +++ linux-2.5-olof/arch/ppc64/oprofile/op_model_rs64.c 2005-02-04 00:33:26.000000000 -0600 @@ -114,7 +114,7 @@ static void rs64_cpu_setup(void *unused) /* reset MMCR1, MMCRA */ mtspr(SPRN_MMCR1, 0); - if (cur_cpu_spec->cpu_features & CPU_FTR_MMCRA) + if (CPU_HAS_FEATURE(MMCRA)) mtspr(SPRN_MMCRA, 0); mmcr0 |= MMCR0_FCM1|MMCR0_PMXE|MMCR0_FCECE; diff -puN arch/ppc64/xmon/xmon.c~cpu-has-feature arch/ppc64/xmon/xmon.c --- linux-2.5/arch/ppc64/xmon/xmon.c~cpu-has-feature 2005-02-04 00:33:26.000000000 -0600 +++ linux-2.5-olof/arch/ppc64/xmon/xmon.c 2005-02-04 00:33:26.000000000 -0600 @@ -723,7 +723,7 @@ static void insert_cpu_bpts(void) { if (dabr.enabled) set_controlled_dabr(dabr.address | (dabr.enabled & 7)); - if (iabr && (cur_cpu_spec->cpu_features & CPU_FTR_IABR)) + if (iabr && CPU_HAS_FEATURE(IABR)) set_iabr(iabr->address | (iabr->enabled & (BP_IABR|BP_IABR_TE))); } @@ -751,7 +751,7 @@ static void remove_bpts(void) static void remove_cpu_bpts(void) { set_controlled_dabr(0); - if ((cur_cpu_spec->cpu_features & CPU_FTR_IABR)) + if (CPU_HAS_FEATURE(IABR)) set_iabr(0); } @@ -1098,7 +1098,7 @@ bpt_cmds(void) break; case 'i': /* bi - hardware instr breakpoint */ - if (!(cur_cpu_spec->cpu_features & CPU_FTR_IABR)) { + if (!CPU_HAS_FEATURE(IABR)) { printf("Hardware instruction breakpoint " "not supported on this cpu\n"); break; @@ -2496,7 +2496,7 @@ void xmon_init(void) void dump_segments(void) { - if 
(cur_cpu_spec->cpu_features & CPU_FTR_SLB) + if (CPU_HAS_FEATURE(SLB)) dump_slb(); else dump_stab(); diff -puN include/asm-ppc64/cacheflush.h~cpu-has-feature include/asm-ppc64/cacheflush.h --- linux-2.5/include/asm-ppc64/cacheflush.h~cpu-has-feature 2005-02-04 00:33:26.000000000 -0600 +++ linux-2.5-olof/include/asm-ppc64/cacheflush.h 2005-02-04 00:33:26.000000000 -0600 @@ -40,7 +40,7 @@ extern void __flush_dcache_icache(void * static inline void flush_icache_range(unsigned long start, unsigned long stop) { - if (!(cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE)) + if (!CPU_HAS_FEATURE(COHERENT_ICACHE)) __flush_icache_range(start, stop); } diff -puN include/asm-ppc64/mmu_context.h~cpu-has-feature include/asm-ppc64/mmu_context.h --- linux-2.5/include/asm-ppc64/mmu_context.h~cpu-has-feature 2005-02-04 00:33:26.000000000 -0600 +++ linux-2.5-olof/include/asm-ppc64/mmu_context.h 2005-02-04 00:33:26.000000000 -0600 @@ -59,11 +59,11 @@ static inline void switch_mm(struct mm_s return; #ifdef CONFIG_ALTIVEC - if (cur_cpu_spec->cpu_features & CPU_FTR_ALTIVEC) + if (CPU_HAS_FEATURE(ALTIVEC)) asm volatile ("dssall"); #endif /* CONFIG_ALTIVEC */ - if (cur_cpu_spec->cpu_features & CPU_FTR_SLB) + if (CPU_HAS_FEATURE(SLB)) switch_slb(tsk, next); else switch_stab(tsk, next); diff -puN include/asm-ppc64/page.h~cpu-has-feature include/asm-ppc64/page.h --- linux-2.5/include/asm-ppc64/page.h~cpu-has-feature 2005-02-04 00:33:26.000000000 -0600 +++ linux-2.5-olof/include/asm-ppc64/page.h 2005-02-04 00:33:26.000000000 -0600 @@ -67,7 +67,7 @@ #define HAVE_ARCH_HUGETLB_UNMAPPED_AREA #define in_hugepage_area(context, addr) \ - ((cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) && \ + (CPU_HAS_FEATURE(16M_PAGE) && \ ( (((addr) >= TASK_HPAGE_BASE) && ((addr) < TASK_HPAGE_END)) || \ ( ((addr) < 0x100000000L) && \ ((1 << GET_ESID(addr)) & (context).htlb_segs) ) ) ) diff -puN arch/ppc64/kernel/pSeries_lpar.c~cpu-has-feature arch/ppc64/kernel/pSeries_lpar.c --- 
linux-2.5/arch/ppc64/kernel/pSeries_lpar.c~cpu-has-feature 2005-02-04 00:34:36.000000000 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pSeries_lpar.c 2005-02-04 00:34:52.000000000 -0600 @@ -505,7 +505,7 @@ void pSeries_lpar_flush_hash_range(unsig int i; unsigned long flags = 0; struct ppc64_tlb_batch *batch = &__get_cpu_var(ppc64_tlb_batch); - int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); + int lock_tlbie = !CPU_HAS_FEATURE(LOCKLESS_TLBIE); if (lock_tlbie) spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags); diff -puN arch/ppc64/kernel/setup.c~cpu-has-feature arch/ppc64/kernel/setup.c --- linux-2.5/arch/ppc64/kernel/setup.c~cpu-has-feature 2005-02-04 00:35:01.000000000 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/setup.c 2005-02-04 00:35:41.000000000 -0600 @@ -315,7 +315,7 @@ static void __init setup_cpu_maps(void) maxcpus = ireg[num_addr_cell + num_size_cell]; /* Double maxcpus for processors which have SMT capability */ - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (CPU_HAS_FEATURE(SMT)) maxcpus *= 2; if (maxcpus > NR_CPUS) { @@ -339,7 +339,7 @@ static void __init setup_cpu_maps(void) */ for_each_cpu(cpu) { cpu_set(cpu, cpu_sibling_map[cpu]); - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (CPU_HAS_FEATURE(SMT)) cpu_set(cpu ^ 0x1, cpu_sibling_map[cpu]); } @@ -767,7 +767,7 @@ static int show_cpuinfo(struct seq_file seq_printf(m, "unknown (%08x)", pvr); #ifdef CONFIG_ALTIVEC - if (cur_cpu_spec->cpu_features & CPU_FTR_ALTIVEC) + if (CPU_HAS_FEATURE(ALTIVEC)) seq_printf(m, ", altivec supported"); #endif /* CONFIG_ALTIVEC */ diff -puN drivers/macintosh/via-pmu.c~cpu-has-feature drivers/macintosh/via-pmu.c --- linux-2.5/drivers/macintosh/via-pmu.c~cpu-has-feature 2005-02-04 00:35:56.000000000 -0600 +++ linux-2.5-olof/drivers/macintosh/via-pmu.c 2005-02-04 00:36:28.000000000 -0600 @@ -2389,7 +2389,7 @@ pmac_suspend_devices(void) enable_kernel_fp(); #ifdef CONFIG_ALTIVEC - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_ALTIVEC) + if 
(CPU_HAS_FEATURE(ALTIVEC)) enable_kernel_altivec(); #endif /* CONFIG_ALTIVEC */ diff -puN include/asm-ppc/cputable.h~cpu-has-feature include/asm-ppc/cputable.h --- linux-2.5/include/asm-ppc/cputable.h~cpu-has-feature 2005-02-04 00:37:02.000000000 -0600 +++ linux-2.5-olof/include/asm-ppc/cputable.h 2005-02-04 00:40:29.000000000 -0600 @@ -61,6 +61,8 @@ struct cpu_spec { extern struct cpu_spec cpu_specs[]; extern struct cpu_spec *cur_cpu_spec[]; +#define CPU_HAS_FEATURE(x) (cur_cpu_spec[0]->cpu_features & CPU_FTR_##x) + #endif /* __ASSEMBLY__ */ /* CPU kernel features */ diff -puN arch/ppc/mm/ppc_mmu.c~cpu-has-feature arch/ppc/mm/ppc_mmu.c --- linux-2.5/arch/ppc/mm/ppc_mmu.c~cpu-has-feature 2005-02-04 00:51:34.000000000 -0600 +++ linux-2.5-olof/arch/ppc/mm/ppc_mmu.c 2005-02-04 00:52:27.000000000 -0600 @@ -138,7 +138,7 @@ void __init setbat(int index, unsigned l union ubat *bat = BATS[index]; if (((flags & _PAGE_NO_CACHE) == 0) && - (cur_cpu_spec[0]->cpu_features & CPU_FTR_NEED_COHERENT)) + CPU_HAS_FEATURE(NEED_COHERENT)) flags |= _PAGE_COHERENT; bl = (size >> 17) - 1; @@ -191,7 +191,7 @@ void __init MMU_init_hw(void) extern unsigned int hash_page[]; extern unsigned int flush_hash_patch_A[], flush_hash_patch_B[]; - if ((cur_cpu_spec[0]->cpu_features & CPU_FTR_HPTE_TABLE) == 0) { + if (!CPU_HAS_FEATURE(HPTE_TABLE)) { /* * Put a blr (procedure return) instruction at the * start of hash_page, since we can still get DSI diff -puN arch/ppc/mm/mmu_decl.h~cpu-has-feature arch/ppc/mm/mmu_decl.h --- linux-2.5/arch/ppc/mm/mmu_decl.h~cpu-has-feature 2005-02-04 00:52:45.000000000 -0600 +++ linux-2.5-olof/arch/ppc/mm/mmu_decl.h 2005-02-04 00:53:03.000000000 -0600 @@ -75,7 +75,7 @@ static inline void flush_HPTE(unsigned c unsigned long pdval) { if ((Hash != 0) && - (cur_cpu_spec[0]->cpu_features & CPU_FTR_HPTE_TABLE)) + CPU_HAS_FEATURE(HPTE_TABLE)) flush_hash_pages(0, va, pdval, 1); else _tlbie(va); diff -puN arch/ppc/kernel/setup.c~cpu-has-feature arch/ppc/kernel/setup.c --- 
linux-2.5/arch/ppc/kernel/setup.c~cpu-has-feature 2005-02-04 00:54:27.000000000 -0600 +++ linux-2.5-olof/arch/ppc/kernel/setup.c 2005-02-04 00:55:17.000000000 -0600 @@ -619,7 +619,7 @@ machine_init(unsigned long r3, unsigned /* Checks "l2cr=xxxx" command-line option */ int __init ppc_setup_l2cr(char *str) { - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_L2CR) { + if (CPU_HAS_FEATURE(L2CR)) { unsigned long val = simple_strtoul(str, NULL, 0); printk(KERN_INFO "l2cr set to %lx\n", val); _set_L2CR(0); /* force invalidate by disable cache */ @@ -720,7 +720,7 @@ void __init setup_arch(char **cmdline_p) * Systems with OF can look in the properties on the cpu node(s) * for a possibly more accurate value. */ - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_SPLIT_ID_CACHE) { + if (CPU_HAS_FEATURE(SPLIT_ID_CACHE)) { dcache_bsize = cur_cpu_spec[0]->dcache_bsize; icache_bsize = cur_cpu_spec[0]->icache_bsize; ucache_bsize = 0; diff -puN arch/ppc/kernel/temp.c~cpu-has-feature arch/ppc/kernel/temp.c --- linux-2.5/arch/ppc/kernel/temp.c~cpu-has-feature 2005-02-04 00:55:40.000000000 -0600 +++ linux-2.5-olof/arch/ppc/kernel/temp.c 2005-02-04 00:56:17.000000000 -0600 @@ -223,7 +223,7 @@ int __init TAU_init(void) /* We assume in SMP that if one CPU has TAU support, they * all have it --BenH */ - if (!(cur_cpu_spec[0]->cpu_features & CPU_FTR_TAU)) { + if (!CPU_HAS_FEATURE(TAU)) { printk("Thermal assist unit not available\n"); tau_initialized = 0; return 1; diff -puN arch/ppc/platforms/pmac_cpufreq.c~cpu-has-feature arch/ppc/platforms/pmac_cpufreq.c --- linux-2.5/arch/ppc/platforms/pmac_cpufreq.c~cpu-has-feature 2005-02-04 00:56:40.000000000 -0600 +++ linux-2.5-olof/arch/ppc/platforms/pmac_cpufreq.c 2005-02-04 00:57:21.000000000 -0600 @@ -230,7 +230,7 @@ static int __pmac pmu_set_cpu_speed(int enable_kernel_fp(); #ifdef CONFIG_ALTIVEC - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_ALTIVEC) + if (CPU_HAS_FEATURE(ALTIVEC)) enable_kernel_altivec(); #endif /* CONFIG_ALTIVEC */ diff -puN 
arch/ppc/platforms/pmac_setup.c~cpu-has-feature arch/ppc/platforms/pmac_setup.c --- linux-2.5/arch/ppc/platforms/pmac_setup.c~cpu-has-feature 2005-02-04 00:56:44.000000000 -0600 +++ linux-2.5-olof/arch/ppc/platforms/pmac_setup.c 2005-02-04 00:57:33.000000000 -0600 @@ -274,7 +274,7 @@ pmac_setup_arch(void) pmac_find_bridges(); /* Checks "l2cr-value" property in the registry */ - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_L2CR) { + if (CPU_HAS_FEATURE(L2CR)) { struct device_node *np = find_devices("cpus"); if (np == 0) np = find_type_devices("cpu"); diff -puN arch/ppc/platforms/pmac_smp.c~cpu-has-feature arch/ppc/platforms/pmac_smp.c --- linux-2.5/arch/ppc/platforms/pmac_smp.c~cpu-has-feature 2005-02-04 00:56:46.000000000 -0600 +++ linux-2.5-olof/arch/ppc/platforms/pmac_smp.c 2005-02-04 00:57:55.000000000 -0600 @@ -119,7 +119,7 @@ static volatile int sec_tb_reset = 0; static void __init core99_init_caches(int cpu) { - if (!(cur_cpu_spec[0]->cpu_features & CPU_FTR_L2CR)) + if (!CPU_HAS_FEATURE(L2CR)) return; if (cpu == 0) { @@ -132,7 +132,7 @@ static void __init core99_init_caches(in printk("CPU%d: L2CR set to %lx\n", cpu, core99_l2_cache); } - if (!(cur_cpu_spec[0]->cpu_features & CPU_FTR_L3CR)) + if (!CPU_HAS_FEATURE(L3CR)) return; if (cpu == 0){ diff -puN arch/ppc/platforms/sandpoint.c~cpu-has-feature arch/ppc/platforms/sandpoint.c --- linux-2.5/arch/ppc/platforms/sandpoint.c~cpu-has-feature 2005-02-04 00:56:53.000000000 -0600 +++ linux-2.5-olof/arch/ppc/platforms/sandpoint.c 2005-02-04 00:58:28.000000000 -0600 @@ -319,10 +319,10 @@ sandpoint_setup_arch(void) * We will do this now with good known values. Future versions * of DINK32 are supposed to get this correct. */ - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_SPEC7450) + if (CPU_HAS_FEATURE(SPEC7450)) /* 745x is different. We only want to pass along enable. */ _set_L2CR(L2CR_L2E); - else if (cur_cpu_spec[0]->cpu_features & CPU_FTR_L2CR) + else if (CPU_HAS_FEATURE(L2CR)) /* All modules have 1MB of L2. 
We also assume that an * L2 divisor of 3 will work. */ @@ -330,7 +330,7 @@ sandpoint_setup_arch(void) | L2CR_L2RAM_PIPE | L2CR_L2OH_1_0 | L2CR_L2DF); #if 0 /* Untested right now. */ - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_L3CR) { + if (CPU_HAS_FEATURE(L3CR)) { /* Magic value. */ _set_L3CR(0x8f032000); } diff -puN arch/ppc/kernel/ppc_htab.c~cpu-has-feature arch/ppc/kernel/ppc_htab.c --- linux-2.5/arch/ppc/kernel/ppc_htab.c~cpu-has-feature 2005-02-04 00:59:10.000000000 -0600 +++ linux-2.5-olof/arch/ppc/kernel/ppc_htab.c 2005-02-04 01:00:12.000000000 -0600 @@ -108,7 +108,7 @@ static int ppc_htab_show(struct seq_file PTE *ptr; #endif /* CONFIG_PPC_STD_MMU */ - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_604_PERF_MON) { + if (CPU_HAS_FEATURE(604_PERF_MON)) { mmcr0 = mfspr(SPRN_MMCR0); pmc1 = mfspr(SPRN_PMC1); pmc2 = mfspr(SPRN_PMC2); @@ -209,7 +209,7 @@ static ssize_t ppc_htab_write(struct fil if ( !strncmp( buffer, "reset", 5) ) { - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_604_PERF_MON) { + if (CPU_HAS_FEATURE(604_PERF_MON)) { /* reset PMC1 and PMC2 */ mtspr(SPRN_PMC1, 0); mtspr(SPRN_PMC2, 0); @@ -221,7 +221,7 @@ static ssize_t ppc_htab_write(struct fil } /* Everything below here requires the performance monitor feature. 
 */
-	if ( !cur_cpu_spec[0]->cpu_features & CPU_FTR_604_PERF_MON )
+	if (!CPU_HAS_FEATURE(604_PERF_MON))
 		return count;
 
 	/* turn off performance monitoring */
@@ -339,7 +339,7 @@ int proc_dol2crvec(ctl_table *table, int
 		"0.5", "1.0", "(reserved2)", "(reserved3)"
 	};
 
-	if (!(cur_cpu_spec[0]->cpu_features & CPU_FTR_L2CR))
+	if (!CPU_HAS_FEATURE(L2CR))
 		return -EFAULT;
 
 	if ( /*!table->maxlen ||*/ (*ppos && !write)) {
diff -puN drivers/md/raid6altivec.uc~cpu-has-feature drivers/md/raid6altivec.uc
--- linux-2.5/drivers/md/raid6altivec.uc~cpu-has-feature	2005-02-04 01:08:58.808596448 -0600
+++ linux-2.5-olof/drivers/md/raid6altivec.uc	2005-02-04 01:09:35.001094352 -0600
@@ -108,7 +108,7 @@ int raid6_have_altivec(void);
 int raid6_have_altivec(void)
 {
 	/* This assumes either all CPUs have Altivec or none does */
-	return cur_cpu_spec->cpu_features & CPU_FTR_ALTIVEC;
+	return CPU_HAS_FEATURE(ALTIVEC);
 }
 #endif
 
_

From penberg at gmail.com Fri Feb 4 19:17:48 2005
From: penberg at gmail.com (Pekka Enberg)
Date: Fri, 4 Feb 2005 10:17:48 +0200
Subject: [PATCH] PPC/PPC64: Introduce CPU_HAS_FEATURE() macro
In-Reply-To: <20050204072254.GA17565@austin.ibm.com>
References: <20050204072254.GA17565@austin.ibm.com>
Message-ID: <84144f0205020400172d89eddf@mail.gmail.com>

Hi,

On Fri, 4 Feb 2005 01:22:54 -0600, Olof Johansson wrote:
> +#define CPU_HAS_FEATURE(x) (cur_cpu_spec->cpu_features & CPU_FTR_##x)
> +

Please drop the CPU_FTR_##x macro magic as it makes grepping more
complicated. If the enum names are too long, just do s/CPU_FTR_/CPU_/g
or something similar. Also, could you please make this a static inline
function?

                                  Pekka

From arnd at arndb.de Fri Feb 4 23:36:55 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Fri, 4 Feb 2005 13:36:55 +0100
Subject: [PATCH] PPC/PPC64: Introduce CPU_HAS_FEATURE() macro
In-Reply-To: <20050204072254.GA17565@austin.ibm.com>
References: <20050204072254.GA17565@austin.ibm.com>
Message-ID: <200502041336.59892.arnd@arndb.de>

On Friday 04 February 2005 08:22, Olof Johansson wrote:
> It's getting pretty old to have to see and type cur_cpu_spec->cpu_features
> & CPU_FTR_, when a shorter and less TLA-ridden macro is more
> readable.
>
> This also takes care of the differences between PPC and PPC64 cpu
> features for the common code; most places in PPC could be replaced with
> the macro as well.

I have a somewhat similar patch that does the same to the
systemcfg->platform checks. I'm not sure if we should use the same inline
function for both checks, but I do think that they should be used in a
similar way, e.g. CPU_HAS_FEATURE(x) and PLATFORM_HAS_FEATURE(x).

My implementation of the platform checks tries to be extra clever by turning
runtime checks into compile time checks if possible. This reduces code size
and in some cases execution speed. It can also be used to replace compile
time checks, i.e. it allows us to write

static inline unsigned int readl(const volatile void __iomem *addr)
{
	if (platform_is(PLATFORM_PPC_ISERIES))
		return iSeries_readl(addr);
	if (platform_possible(PLATFORM_PPC_PSERIES))
		return eeh_readl(addr);
	return in_le32(addr);
}

which will always result in the shortest code for any combination of
CONFIG_PPC_ISERIES, CONFIG_PPC_PSERIES and the other platforms.
The required code for this is roughly

enum {
	PPC64_PLATFORM_POSSIBLE =
#ifdef CONFIG_PPC_ISERIES
		PLATFORM_ISERIES |
#endif
#ifdef CONFIG_PPC_PSERIES
		PLATFORM_PSERIES |
#endif
#ifdef CONFIG_PPC_PSERIES
		PLATFORM_PSERIES_LPAR |
#endif
#ifdef CONFIG_PPC_POWERMAC
		PLATFORM_POWERMAC |
#endif
#ifdef CONFIG_PPC_MAPLE
		PLATFORM_MAPLE |
#endif
		0,
	PPC64_PLATFORM_ONLY =
#ifdef CONFIG_PPC_ISERIES
		PLATFORM_ISERIES &
#endif
#ifdef CONFIG_PPC_PSERIES
		PLATFORM_PSERIES &
#endif
#ifdef CONFIG_PPC_POWERMAC
		PLATFORM_POWERMAC &
#endif
#ifdef CONFIG_PPC_MAPLE
		PLATFORM_MAPLE &
#endif
		-1ul,
};

static inline int platform_is(unsigned long platform)
{
	return ((PPC64_PLATFORM_ONLY & platform)
		|| (PPC64_PLATFORM_POSSIBLE & platform & systemcfg->platform));
}

static inline int platform_possible(unsigned long platform)
{
	return !!(PPC64_PLATFORM_POSSIBLE & platform);
}

The same stuff is obviously possible for cur_cpu_spec->cpu_features as well.
Do you think that it will help there?

	Arnd <><

From trini at kernel.crashing.org Sat Feb 5 01:45:32 2005
From: trini at kernel.crashing.org (Tom Rini)
Date: Fri, 4 Feb 2005 07:45:32 -0700
Subject: [PATCH] PPC/PPC64: Introduce CPU_HAS_FEATURE() macro
In-Reply-To: <20050204072254.GA17565@austin.ibm.com>
References: <20050204072254.GA17565@austin.ibm.com>
Message-ID: <20050204144532.GN15359@smtp.west.cox.net>

On Fri, Feb 04, 2005 at 01:22:54AM -0600, Olof Johansson wrote:

> Hi,
>
> It's getting pretty old to have to see and type cur_cpu_spec->cpu_features
> & CPU_FTR_, when a shorter and less TLA-ridden macro is more
> readable.
>
> This also takes care of the differences between PPC and PPC64 cpu
> features for the common code; most places in PPC could be replaced with
> the macro as well.

It'd be nice if someone went and changed ppc32's cpu feature from an
array and matched ppc64, while we're in here...

-- 
Tom Rini
http://gate.crashing.org/~trini/

From olof at austin.ibm.com Sat Feb 5 04:20:41 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Fri, 4 Feb 2005 11:20:41 -0600
Subject: [PATCH] PPC/PPC64: Introduce CPU_HAS_FEATURE() macro
In-Reply-To: <84144f0205020400172d89eddf@mail.gmail.com>
References: <20050204072254.GA17565@austin.ibm.com>
	<84144f0205020400172d89eddf@mail.gmail.com>
Message-ID: <20050204172041.GA17586@austin.ibm.com>

On Fri, Feb 04, 2005 at 10:17:48AM +0200, Pekka Enberg wrote:

> Please drop the CPU_FTR_##x macro magic as it makes grepping more
> complicated. If the enum names are too long, just do s/CPU_FTR_/CPU_/g
> or something similar. Also, could you please make this a static inline
> function?

I considered that for a while, but decided against it because:

* cpu-has-feature(cpu-feature-foo) vs cpu-has-feature(foo): I picked the
  latter for readability.
* Renaming CPU_FTR_ -> CPU_ makes it less obvious that it's actually
  a cpu feature it's describing (i.e. CPU_ALTIVEC vs CPU_FTR_ALTIVEC).
* Renaming would clobber the namespace, CPU_* definitions are used in
  other places in the tree.
* Can't make it an inline and still use the preprocessor concatenation.

That being said, you do have a point about grepability. However,
personally I'd be more likely to look for CPU_HAS_FEATURE than the
feature itself when reading the code, and would find that easily. The
other way around (finding all uses of a feature) is harder, but the
concatenation macro is right below the bit definitions and easy to spot.
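
To make the trade-off above concrete, here is a minimal userspace sketch of the two styles under discussion. The bit values, the cut-down struct cpu_spec and the demo_spec instance are stand-ins invented for illustration; only the CPU_HAS_FEATURE() definition is taken from the patch, and the inline variant is just one way the function form could look.

```c
/* Hypothetical feature bits; the real CPU_FTR_* values live in cputable.h. */
#define CPU_FTR_ALTIVEC	0x1UL
#define CPU_FTR_SLB	0x2UL

/* Cut-down stand-in for the kernel's struct cpu_spec (assumption). */
struct cpu_spec {
	unsigned long cpu_features;
};

/* Pretend we are running on a CPU that has an SLB but no Altivec. */
static struct cpu_spec demo_spec = { CPU_FTR_SLB };
static struct cpu_spec *cur_cpu_spec = &demo_spec;

/*
 * Style 1: token pasting, as in the patch. Callers write
 * CPU_HAS_FEATURE(SLB), so grepping for CPU_FTR_SLB misses them.
 */
#define CPU_HAS_FEATURE(x) (cur_cpu_spec->cpu_features & CPU_FTR_##x)

/*
 * Style 2: an inline function. Callers spell out the full constant,
 * cpu_has_feature(CPU_FTR_SLB), which stays greppable but is longer.
 */
static inline int cpu_has_feature(unsigned long feature)
{
	return (cur_cpu_spec->cpu_features & feature) != 0;
}
```

Both forms answer the same question; the difference is only in what the call site looks like and therefore in what a search for the feature name will find.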

-Olof

From olof at austin.ibm.com Sat Feb 5 05:35:14 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Fri, 4 Feb 2005 12:35:14 -0600
Subject: [PATCH] PPC/PPC64: Introduce CPU_HAS_FEATURE() macro
In-Reply-To: <200502041336.59892.arnd@arndb.de>
References: <20050204072254.GA17565@austin.ibm.com>
	<200502041336.59892.arnd@arndb.de>
Message-ID: <20050204183514.GB17586@austin.ibm.com>

On Fri, Feb 04, 2005 at 01:36:55PM +0100, Arnd Bergmann wrote:
> I have a somewhat similar patch that does the same to the
> systemcfg->platform checks. I'm not sure if we should use the same inline
> function for both checks, but I do think that they should be used in a
> similar way, e.g. CPU_HAS_FEATURE(x) and PLATFORM_HAS_FEATURE(x).

Yep. Firmware features are also on the list. I figured I'd do CPU features
first though since they are the ones that started bugging me.

> The same stuff is obviously possible for cur_cpu_spec->cpu_features as well.
> Do you think that it will help there?

Nice. It won't be quite as easy to do compile-time for cpu features.
pSeries will need all cpus enabled since we have them all on various
machines, etc. I guess Powermac/Maple could benefit from it. In the
end it depends on how hairy the implementation would get vs performance
improvement.

-Olof

From arnd at arndb.de Sat Feb 5 05:57:06 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Fri, 4 Feb 2005 19:57:06 +0100
Subject: [PATCH] PPC/PPC64: Introduce CPU_HAS_FEATURE() macro
In-Reply-To: <20050204183514.GB17586@austin.ibm.com>
References: <20050204072254.GA17565@austin.ibm.com>
	<200502041336.59892.arnd@arndb.de>
	<20050204183514.GB17586@austin.ibm.com>
Message-ID: <200502041957.06979.arnd@arndb.de>

On Friday 04 February 2005 19:35, Olof Johansson wrote:

> pSeries will need all cpus enabled since we have them all on various
> machines, etc. I guess Powermac/Maple could benefit from it.

Even on pSeries, we already have CONFIG_POWER4_ONLY, which could be used
to optimize away some of the checks at compile time. I think it makes sense
to extend this a bit to look more like the CPU selection on i386 or s390,
where you can set the oldest CPU you want to support. This also fits nicely
with the gcc -mcpu= options.

> In the end it depends on how hairy the implementation would get vs
> performance improvement.

Fortunately, that optimization should be easy to do on top of your patch, so
we don't have to decide now.

	Arnd <><

From cfriesen at nortel.com Sat Feb 5 07:14:11 2005
From: cfriesen at nortel.com (Chris Friesen)
Date: Fri, 04 Feb 2005 14:14:11 -0600
Subject: question on symbol exports
In-Reply-To: <20050204203050.GA5889@dmt.cnet>
References: <41FECA18.50609@nortelnetworks.com>
	<1107243398.4208.47.camel@laptopd505.fenrus.org>
	<41FFA21C.8060203@nortelnetworks.com>
	<1107273017.4208.132.camel@laptopd505.fenrus.org>
	<20050204203050.GA5889@dmt.cnet>
Message-ID: <4203D793.1040604@nortel.com>

I've added the ppc64 list to the addressees, in case they are interested.

Marcelo Tosatti wrote:
> On Tue, Feb 01, 2005 at 04:50:16PM +0100, Arjan van de Ven wrote:

>>afaik one doesn't need to do a tlb flush in code that clears the dirty
>>bit, as long as you use the proper vm functions to do so.
>>(if those need a tlb flush, those are supposed to do that for you
>>afaik).

> Yep, and "proper VM function" is include/asm-generic/pgtable.h::ptep_clear_flush_dirty(),
> which on PPC flushes the TLB.

It turns out that to call ptep_clear_flush_dirty() on ppc64 from a
module I needed to export the following symbols:

__flush_tlb_pending
ppc64_tlb_batch
hpte_update

>>Also note that your code isn't dealing with 4 level pagetables....
And
>>pagetable walking in drivers is basically almost always a mistake and a
>>sign that something is wrong.

> Or a sign that the core kernel lacks helper functions :)

Absolutely. It'd be so nice if there was a simple va_to_ptep() helper
function available.

Chris

From benh at kernel.crashing.org Sat Feb 5 10:49:49 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 05 Feb 2005 10:49:49 +1100
Subject: [PATCH] PPC/PPC64: Introduce CPU_HAS_FEATURE() macro
In-Reply-To: <200502041336.59892.arnd@arndb.de>
References: <20050204072254.GA17565@austin.ibm.com>
	<200502041336.59892.arnd@arndb.de>
Message-ID: <1107560989.2189.119.camel@gaston>

On Fri, 2005-02-04 at 13:36 +0100, Arnd Bergmann wrote:
> On Friday 04 February 2005 08:22, Olof Johansson wrote:
> > It's getting pretty old to have to see and type cur_cpu_spec->cpu_features
> > & CPU_FTR_, when a shorter and less TLA-ridden macro is more
> > readable.
> >
> > This also takes care of the differences between PPC and PPC64 cpu
> > features for the common code; most places in PPC could be replaced with
> > the macro as well.
>
> I have a somewhat similar patch that does the same to the
> systemcfg->platform checks. I'm not sure if we should use the same inline
> function for both checks, but I do think that they should be used in a
> similar way, e.g. CPU_HAS_FEATURE(x) and PLATFORM_HAS_FEATURE(x).

Note that I would prefer cpu_has_feature(), it doesn't strictly have to
be a macro and has function semantics anyway.

> My implementation of the platform checks tries to be extra clever by turning
> runtime checks into compile time checks if possible. This reduces code size
> and in some cases execution speed. It can also be used to replace compile
> time checks, i.e.
it allows us to write
>
> static inline unsigned int readl(const volatile void __iomem *addr)
> {
> 	if (platform_is(PLATFORM_PPC_ISERIES))
> 		return iSeries_readl(addr);
> 	if (platform_possible(PLATFORM_PPC_PSERIES))
> 		return eeh_readl(addr);
> 	return in_le32(addr);
> }
>
> which will always result in the shortest code for any combination of
> CONFIG_PPC_ISERIES, CONFIG_PPC_PSERIES and the other platforms.

That's a good idea !

> The required code for this is roughly
>
> enum {
> 	PPC64_PLATFORM_POSSIBLE =
> #ifdef CONFIG_PPC_ISERIES
> 		PLATFORM_ISERIES |
> #endif
> #ifdef CONFIG_PPC_PSERIES
> 		PLATFORM_PSERIES |
> #endif
> #ifdef CONFIG_PPC_PSERIES
> 		PLATFORM_PSERIES_LPAR |
> #endif
> #ifdef CONFIG_PPC_POWERMAC
> 		PLATFORM_POWERMAC |
> #endif
> #ifdef CONFIG_PPC_MAPLE
> 		PLATFORM_MAPLE |
> #endif
> 		0,
> 	PPC64_PLATFORM_ONLY =
> #ifdef CONFIG_PPC_ISERIES
> 		PLATFORM_ISERIES &
> #endif
> #ifdef CONFIG_PPC_PSERIES
> 		PLATFORM_PSERIES &
> #endif
> #ifdef CONFIG_PPC_POWERMAC
> 		PLATFORM_POWERMAC &
> #endif
> #ifdef CONFIG_PPC_MAPLE
> 		PLATFORM_MAPLE &
> #endif
> 		-1ul,
> };
>
> static inline int platform_is(unsigned long platform)
> {
> 	return ((PPC64_PLATFORM_ONLY & platform)
> 		|| (PPC64_PLATFORM_POSSIBLE & platform & systemcfg->platform));
> }
>
> static inline int platform_possible(unsigned long platform)
> {
> 	return !!(PPC64_PLATFORM_POSSIBLE & platform);
> }
>
> The same stuff is obviously possible for cur_cpu_spec->cpu_features as well.
> Do you think that it will help there?
>
> 	Arnd <><

-- 
Benjamin Herrenschmidt

From benh at kernel.crashing.org Sat Feb 5 10:50:44 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 05 Feb 2005 10:50:44 +1100
Subject: [PATCH] PPC/PPC64: Introduce CPU_HAS_FEATURE() macro
In-Reply-To: <20050204183514.GB17586@austin.ibm.com>
References: <20050204072254.GA17565@austin.ibm.com>
	<200502041336.59892.arnd@arndb.de>
	<20050204183514.GB17586@austin.ibm.com>
Message-ID: <1107561044.2189.120.camel@gaston>

On Fri, 2005-02-04 at 12:35 -0600, Olof Johansson wrote:
> On Fri, Feb 04, 2005 at 01:36:55PM +0100, Arnd Bergmann wrote:
> > I have a somewhat similar patch that does the same to the
> > systemcfg->platform checks. I'm not sure if we should use the same inline
> > function for both checks, but I do think that they should be used in a
> > similar way, e.g. CPU_HAS_FEATURE(x) and PLATFORM_HAS_FEATURE(x).
>
> Yep. Firmware features are also on the list. I figured I'd do CPU features
> first though since they are the ones that started bugging me.
>
> > The same stuff is obviously possible for cur_cpu_spec->cpu_features as well.
> > Do you think that it will help there?
>
> Nice. It won't be quite as easy to do compile-time for cpu features.
> pSeries will need all cpus enabled since we have them all on various
> machines, etc. I guess Powermac/Maple could benefit from it. In the
> end it depends on how hairy the implementation would get vs performance
> improvement.

One other thing we did on ppc32 was to have separate ELF sections for
pmac, chrp and prep specific code & get rid of them after boot... It may
be worth bringing this back in...

Ben.

From arnd at arndb.de Sat Feb 5 11:22:21 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Sat, 5 Feb 2005 01:22:21 +0100
Subject: [PATCH] PPC/PPC64: Introduce CPU_HAS_FEATURE() macro
In-Reply-To: <1107560989.2189.119.camel@gaston>
References: <20050204072254.GA17565@austin.ibm.com>
	<200502041336.59892.arnd@arndb.de>
	<1107560989.2189.119.camel@gaston>
Message-ID: <200502050122.27254.arnd@arndb.de>

On Saturday 05 February 2005 00:49, Benjamin Herrenschmidt wrote:
> On Fri, 2005-02-04 at 13:36 +0100, Arnd Bergmann wrote:

> > I have a somewhat similar patch that does the same to the
> > systemcfg->platform checks. I'm not sure if we should use the same inline
> > function for both checks, but I do think that they should be used in a
> > similar way, e.g. CPU_HAS_FEATURE(x) and PLATFORM_HAS_FEATURE(x).
>
> Note that I would prefer cpu_has_feature(), it doesn't strictly have to
> be a macro and has function semantics anyway.
>
> [ ... ]
> > which will always result in the shortest code for any combination of
> > CONFIG_PPC_ISERIES, CONFIG_PPC_PSERIES and the other platforms.
>
> That's a good idea !

This is the patch to evaluate CPU_HAS_FEATURE() at compile time whenever
possible. Testing showed that vmlinux shrinks by around 4000 bytes with
g5_defconfig. I also checked that pSeries code is completely unaltered
semantically when support for all CPU types is enabled, although a few
instructions are emitted in a different order by gcc.

I have made cpu_has_feature() an inline function that expects the full
name of a feature bit while the CPU_HAS_FEATURE() macro still behaves the
same way as in Olof's original patch for now.

I'm not sure if I got the Kconfig dependencies right, maybe you can check
them.
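
As a rough illustration of how the compile-time evaluation described above can work, here is a small userspace sketch. The FTR_* bits, the two per-CPU masks, the CONFIG_CPU1/CONFIG_CPU2 switches and the runtime_features variable are all made up for this example; they stand in for the CPU_FTR_* sets, the Kconfig symbols and cur_cpu_spec->cpu_features respectively.

```c
/* Hypothetical feature bits, for illustration only. */
#define FTR_A	0x1UL	/* present on both CPU types */
#define FTR_B	0x2UL	/* present on CPU type 2 only */
#define FTR_C	0x4UL	/* present on neither CPU type */

/* Feature sets of the two hypothetical CPU types. */
#define FTRS_CPU1	(FTR_A)
#define FTRS_CPU2	(FTR_A | FTR_B)

/* Pretend the kernel configuration enables support for both CPU types. */
#define CONFIG_CPU1 1
#define CONFIG_CPU2 1

enum {
	/* OR of all enabled CPUs: a bit outside this set can never be seen. */
	FTRS_POSSIBLE =
#ifdef CONFIG_CPU1
		FTRS_CPU1 |
#endif
#ifdef CONFIG_CPU2
		FTRS_CPU2 |
#endif
		0,
	/* AND of all enabled CPUs: a bit inside this set is always present. */
	FTRS_ALWAYS =
#ifdef CONFIG_CPU1
		FTRS_CPU1 &
#endif
#ifdef CONFIG_CPU2
		FTRS_CPU2 &
#endif
		FTRS_POSSIBLE,
};

/* Stand-in for cur_cpu_spec->cpu_features; here we "boot" on CPU type 1. */
static unsigned long runtime_features = FTRS_CPU1;

static inline int has_feature(unsigned long feature)
{
	/*
	 * With a constant argument the compiler can fold the first test to
	 * true (feature always present) or the second to false (feature
	 * impossible), so only the ambiguous bits need a runtime load.
	 */
	return (FTRS_ALWAYS & feature) ||
	       (FTRS_POSSIBLE & feature & runtime_features);
}
```

In this configuration has_feature(FTR_A) and has_feature(FTR_C) are decidable at compile time, while has_feature(FTR_B) still depends on which CPU the kernel actually booted on. Undefining CONFIG_CPU1 would make FTR_B a compile-time constant as well.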
Signed-off-by: Arnd Bergmann --- Index: linux-2.6-64/include/asm-ppc64/cputable.h =================================================================== --- linux-2.6-64.orig/include/asm-ppc64/cputable.h 2005-02-05 01:24:58.975674192 +0100 +++ linux-2.6-64/include/asm-ppc64/cputable.h 2005-02-05 01:26:17.328762712 +0100 @@ -66,9 +66,6 @@ extern struct cpu_spec cpu_specs[]; extern struct cpu_spec *cur_cpu_spec; -#define CPU_HAS_FEATURE(x) (cur_cpu_spec->cpu_features & CPU_FTR_##x) - - /* firmware feature bitmask values */ #define FIRMWARE_MAX_FEATURES 63 @@ -154,6 +151,80 @@ #define CPU_FTR_PPCAS_ARCH_V2 (CPU_FTR_PPCAS_ARCH_V2_BASE | CPU_FTR_16M_PAGE) #endif +/* We only set the altivec features if the kernel was compiled with altivec + * support + */ +#ifdef CONFIG_ALTIVEC +#define CPU_FTR_ALTIVEC_COMP CPU_FTR_ALTIVEC +#define PPC_FEATURE_HAS_ALTIVEC_COMP PPC_FEATURE_HAS_ALTIVEC +#else +#define CPU_FTR_ALTIVEC_COMP 0 +#define PPC_FEATURE_HAS_ALTIVEC_COMP 0 +#endif + +enum { + CPU_FTR_POWER3 = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | + CPU_FTR_HPTE_TABLE | CPU_FTR_IABR | CPU_FTR_PMC8, + CPU_FTR_RS64 = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | + CPU_FTR_HPTE_TABLE | CPU_FTR_IABR | CPU_FTR_PMC8 | + CPU_FTR_MMCRA, + CPU_FTR_POWER4 = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | + CPU_FTR_HPTE_TABLE | CPU_FTR_PPCAS_ARCH_V2 | + CPU_FTR_PMC8 | CPU_FTR_MMCRA, + CPU_FTR_PPC970 = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | + CPU_FTR_HPTE_TABLE | CPU_FTR_PPCAS_ARCH_V2 | + CPU_FTR_ALTIVEC_COMP | CPU_FTR_CAN_NAP | + CPU_FTR_PMC8 | CPU_FTR_MMCRA, + CPU_FTR_POWER5 = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | + CPU_FTR_HPTE_TABLE | CPU_FTR_PPCAS_ARCH_V2 | + CPU_FTR_MMCRA | CPU_FTR_SMT | CPU_FTR_COHERENT_ICACHE | + CPU_FTR_LOCKLESS_TLBIE | CPU_FTR_MMCRA_SIHV, + CPU_FTR_COMPATIBLE = CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | + CPU_FTR_HPTE_TABLE | CPU_FTR_PPCAS_ARCH_V2, + CPU_FTR_POSSIBLE = +#ifdef CONFIG_CPU_POWER3 + CPU_FTR_POWER3 | +#endif +#ifdef CONFIG_CPU_RS64 + CPU_FTR_RS64 | +#endif 
+#ifdef CONFIG_CPU_POWER4 + CPU_FTR_POWER4 | +#endif +#ifdef CONFIG_CPU_PPC970 + CPU_FTR_PPC970 | +#endif +#ifdef CONFIG_CPU_POWER5 + CPU_FTR_POWER5 | +#endif + 0, + CPU_FTR_ALWAYS = +#ifdef CONFIG_CPU_POWER3 + CPU_FTR_POWER3 & +#endif +#ifdef CONFIG_CPU_RS64 + CPU_FTR_RS64 & +#endif +#ifdef CONFIG_CPU_POWER4 + CPU_FTR_POWER4 & +#endif +#ifdef CONFIG_CPU_PPC970 + CPU_FTR_PPC970 & +#endif +#ifdef CONFIG_CPU_POWER5 + CPU_FTR_POWER5 & +#endif + CPU_FTR_POSSIBLE, +}; + +static inline int cpu_has_feature(unsigned long feature) +{ + return (CPU_FTR_ALWAYS & feature) || + (CPU_FTR_POSSIBLE & feature & cur_cpu_spec->cpu_features); +} + +#define CPU_HAS_FEATURE(x) cpu_has_feature(CPU_FTR_##x) + #define COMMON_PPC64_FW (0) #endif Index: linux-2.6-64/arch/ppc64/Kconfig =================================================================== --- linux-2.6-64.orig/arch/ppc64/Kconfig 2005-02-05 01:24:31.098912104 +0100 +++ linux-2.6-64/arch/ppc64/Kconfig 2005-02-05 01:25:01.430301032 +0100 @@ -107,6 +107,31 @@ bool default y +config CPU_POWER3 + bool + default y + depends on (PPC_ISERIES || PPC_PSERIES) && !POWER4_ONLY + +config CPU_RS64 + bool + default y + depends on (PPC_ISERIES || PPC_PSERIES) && !POWER4_ONLY + +config CPU_POWER4 + bool + default y + depends on PPC_ISERIES || PPC_PSERIES + +config CPU_PPC970 + bool + default y + depends on PPC_PSERIES || PPC_PMAC || PPC_MAPLE + +config CPU_POWER5 + bool + default y + depends on PPC_PSERIES + # VMX is pSeries only for now until somebody writes the iSeries # exception vectors for it config ALTIVEC Index: linux-2.6-64/arch/ppc64/kernel/cputable.c =================================================================== --- linux-2.6-64.orig/arch/ppc64/kernel/cputable.c 2005-02-05 01:24:31.098912104 +0100 +++ linux-2.6-64/arch/ppc64/kernel/cputable.c 2005-02-05 01:25:01.431300880 +0100 @@ -33,137 +33,94 @@ extern void __setup_cpu_ppc970(unsigned long offset, struct cpu_spec* spec); -/* We only set the altivec features if the kernel was 
compiled with altivec - * support - */ -#ifdef CONFIG_ALTIVEC -#define CPU_FTR_ALTIVEC_COMP CPU_FTR_ALTIVEC -#define PPC_FEATURE_HAS_ALTIVEC_COMP PPC_FEATURE_HAS_ALTIVEC -#else -#define CPU_FTR_ALTIVEC_COMP 0 -#define PPC_FEATURE_HAS_ALTIVEC_COMP 0 -#endif - struct cpu_spec cpu_specs[] = { { /* Power3 */ 0xffff0000, 0x00400000, "POWER3 (630)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_IABR | CPU_FTR_PMC8, - COMMON_USER_PPC64, + CPU_FTR_POWER3, COMMON_USER_PPC64, 128, 128, __setup_cpu_power3, COMMON_PPC64_FW }, { /* Power3+ */ 0xffff0000, 0x00410000, "POWER3 (630+)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_IABR | CPU_FTR_PMC8, - COMMON_USER_PPC64, + CPU_FTR_POWER3, COMMON_USER_PPC64, 128, 128, __setup_cpu_power3, COMMON_PPC64_FW }, { /* Northstar */ 0xffff0000, 0x00330000, "RS64-II (northstar)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_IABR | CPU_FTR_PMC8 | CPU_FTR_MMCRA, - COMMON_USER_PPC64, + CPU_FTR_RS64, COMMON_USER_PPC64, 128, 128, __setup_cpu_power3, COMMON_PPC64_FW }, { /* Pulsar */ 0xffff0000, 0x00340000, "RS64-III (pulsar)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_IABR | CPU_FTR_PMC8 | CPU_FTR_MMCRA, - COMMON_USER_PPC64, + CPU_FTR_RS64, COMMON_USER_PPC64, 128, 128, __setup_cpu_power3, COMMON_PPC64_FW }, { /* I-star */ 0xffff0000, 0x00360000, "RS64-III (icestar)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_IABR | CPU_FTR_PMC8 | CPU_FTR_MMCRA, - COMMON_USER_PPC64, + CPU_FTR_RS64, COMMON_USER_PPC64, 128, 128, __setup_cpu_power3, COMMON_PPC64_FW }, { /* S-star */ 0xffff0000, 0x00370000, "RS64-IV (sstar)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_IABR | CPU_FTR_PMC8 | CPU_FTR_MMCRA, - COMMON_USER_PPC64, + CPU_FTR_RS64, COMMON_USER_PPC64, 128, 128, __setup_cpu_power3, COMMON_PPC64_FW }, { /* Power4 */ 0xffff0000, 0x00350000, "POWER4 (gp)", - CPU_FTR_SPLIT_ID_CACHE 
| CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_PMC8 | CPU_FTR_MMCRA, - COMMON_USER_PPC64, + CPU_FTR_POWER4, COMMON_USER_PPC64, 128, 128, __setup_cpu_power4, COMMON_PPC64_FW }, { /* Power4+ */ 0xffff0000, 0x00380000, "POWER4+ (gq)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_PMC8 | CPU_FTR_MMCRA, - COMMON_USER_PPC64, + CPU_FTR_POWER4, COMMON_USER_PPC64, 128, 128, __setup_cpu_power4, COMMON_PPC64_FW }, { /* PPC970 */ 0xffff0000, 0x00390000, "PPC970", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_ALTIVEC_COMP | - CPU_FTR_CAN_NAP | CPU_FTR_PMC8 | CPU_FTR_MMCRA, - COMMON_USER_PPC64 | PPC_FEATURE_HAS_ALTIVEC_COMP, + CPU_FTR_PPC970, COMMON_USER_PPC64 | PPC_FEATURE_HAS_ALTIVEC_COMP, 128, 128, __setup_cpu_ppc970, COMMON_PPC64_FW }, { /* PPC970FX */ 0xffff0000, 0x003c0000, "PPC970FX", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_ALTIVEC_COMP | - CPU_FTR_CAN_NAP | CPU_FTR_PMC8 | CPU_FTR_MMCRA, - COMMON_USER_PPC64 | PPC_FEATURE_HAS_ALTIVEC_COMP, + CPU_FTR_PPC970, COMMON_USER_PPC64 | PPC_FEATURE_HAS_ALTIVEC_COMP, 128, 128, __setup_cpu_ppc970, COMMON_PPC64_FW }, { /* Power5 */ 0xffff0000, 0x003a0000, "POWER5 (gr)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_MMCRA | CPU_FTR_SMT | - CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | - CPU_FTR_MMCRA_SIHV, - COMMON_USER_PPC64, + CPU_FTR_POWER5, COMMON_USER_PPC64, 128, 128, __setup_cpu_power4, COMMON_PPC64_FW }, { /* Power5 */ 0xffff0000, 0x003b0000, "POWER5 (gs)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_MMCRA | CPU_FTR_SMT | - CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | - CPU_FTR_MMCRA_SIHV, - COMMON_USER_PPC64, + CPU_FTR_POWER5, COMMON_USER_PPC64, 128, 128, __setup_cpu_power4, COMMON_PPC64_FW }, { /* default match */ 
0x00000000, 0x00000000, "POWER4 (compatible)", - CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | - CPU_FTR_PPCAS_ARCH_V2, - COMMON_USER_PPC64, + CPU_FTR_COMPATIBLE, COMMON_USER_PPC64, 128, 128, __setup_cpu_power4, COMMON_PPC64_FW

From benh at kernel.crashing.org Sat Feb 5 12:47:05 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 05 Feb 2005 12:47:05 +1100
Subject: [PATCH] PPC/PPC64: Introduce CPU_HAS_FEATURE() macro
In-Reply-To: <200502050122.27254.arnd@arndb.de>
References: <20050204072254.GA17565@austin.ibm.com> <200502041336.59892.arnd@arndb.de> <1107560989.2189.119.camel@gaston> <200502050122.27254.arnd@arndb.de>
Message-ID: <1107568025.2189.136.camel@gaston>

> This is the patch to evaluate CPU_HAS_FEATURE() at compile time whenever
> possible. Testing showed that vmlinux shrinks around 4000 bytes with
> g5_defconfig. I also checked that pSeries code is completely unaltered
> semantically when support for all CPU types is enabled, although a few
> instructions are emitted in a different order by gcc.
>
> I have made cpu_has_feature() an inline function that expects the full
> name of a feature bit while the CPU_HAS_FEATURE() macro still behaves
> the same way as in Olof's original patch for now.

Note that this doesn't cover the asm part of it, where feature
"sections" are nop'ed out... it may be interesting to get rid of the
nops too here, oh well, that's too complicated for now.

Ben.

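A minimal standalone sketch of the compile-time evaluation discussed in this thread, assuming made-up feature bits and CPU types. FTR_A, FTR_CPU_X, has_feature() and runtime_features are hypothetical names for illustration only, not kernel identifiers; runtime_features stands in for cur_cpu_spec->cpu_features. The idea is that a bit present in the ALWAYS mask makes the check constant-true, a bit absent from the POSSIBLE mask makes it constant-false, and only the remaining bits need a runtime lookup:

```c
#include <assert.h>	/* assert(), for checks that exercise this sketch */

/* Hypothetical feature bits, for illustration only. */
enum {
	FTR_A = 1 << 0,
	FTR_B = 1 << 1,
	FTR_C = 1 << 2,
};

enum {
	/* Feature sets of two hypothetical CPU types. */
	FTR_CPU_X = FTR_A | FTR_B,
	FTR_CPU_Y = FTR_A | FTR_C,
	/* Union: bits that at least one supported CPU type may have. */
	FTR_POSSIBLE = FTR_CPU_X | FTR_CPU_Y,
	/* Intersection: bits that every supported CPU type has. */
	FTR_ALWAYS = FTR_CPU_X & FTR_CPU_Y & FTR_POSSIBLE,
};

/* Stand-in for the feature mask of the CPU detected at boot. */
static unsigned long runtime_features = FTR_CPU_X;

static inline int has_feature(unsigned long feature)
{
	/*
	 * With a constant argument, the first operand is a compile-time
	 * constant, and the second collapses to 0 when the bit is not
	 * in FTR_POSSIBLE, so the compiler can drop the runtime load.
	 */
	return (FTR_ALWAYS & feature) ||
	       (FTR_POSSIBLE & feature & runtime_features);
}
```

With runtime_features set to FTR_CPU_X, has_feature(FTR_A) is true without consulting the runtime mask, has_feature(FTR_B) is true only because the running CPU reports it, and has_feature(FTR_C) is false.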
From anton at samba.org Sat Feb 5 12:34:26 2005
From: anton at samba.org (Anton Blanchard)
Date: Sat, 5 Feb 2005 12:34:26 +1100
Subject: [PATCH] PPC/PPC64: Introduce CPU_HAS_FEATURE() macro
In-Reply-To: <200502050122.27254.arnd@arndb.de>
References: <20050204072254.GA17565@austin.ibm.com> <200502041336.59892.arnd@arndb.de> <1107560989.2189.119.camel@gaston> <200502050122.27254.arnd@arndb.de>
Message-ID: <20050205013426.GC11318@krispykreme.ozlabs.ibm.com>

Hi,

> This is the patch to evaluate CPU_HAS_FEATURE() at compile time whenever
> possible. Testing showed that vmlinux shrinks around 4000 bytes with
> g5_defconfig. I also checked that pSeries code is completely unaltered
> semantically when support for all CPU types is enabled, although a few
> instructions are emitted in a different order by gcc.
>
> I have made cpu_has_feature() an inline function that expects the full
> name of a feature bit while the CPU_HAS_FEATURE() macro still behaves
> the same way as in Olof's original patch for now.

Interesting :) However we already get bug reports with the current
CONFIG_POWER4_ONLY option. I worry about adding more options that users
could get wrong unless there is a noticeable improvement in performance.

Anton

From penberg at cs.helsinki.fi Sat Feb 5 18:48:19 2005
From: penberg at cs.helsinki.fi (Pekka Enberg)
Date: Sat, 05 Feb 2005 09:48:19 +0200
Subject: [PATCH] PPC/PPC64: Introduce CPU_HAS_FEATURE() macro
In-Reply-To: <20050204172041.GA17586@austin.ibm.com>
References: <20050204072254.GA17565@austin.ibm.com> <84144f0205020400172d89eddf@mail.gmail.com> <20050204172041.GA17586@austin.ibm.com>
Message-ID: <1107589699.17616.4.camel@localhost>

On Fri, 2005-02-04 at 11:20 -0600, Olof Johansson wrote:
> * cpu-has-feature(cpu-feature-foo) v cpu-has-feature(foo): I picked the
>   latter for readability.
> * Renaming CPU_FTR_ -> CPU_ makes it less obvious that
>   it's actually a cpu feature it's describing (i.e. CPU_ALTIVEC vs
>   CPU_FTR_ALTIVEC).
> * Renaming would clobber the namespace, CPU_* definitions are used in
>   other places in the tree.
> * Can't make it an inline and still use the preprocessor concatenation.

Seriously, if readability is your argument, macro magic is not the
answer. Ok, we can't clobber the CPU_ definitions, so pick another
prefix. If you want readability, please consider using named enums:

enum cpu_feature {
	CF_ALTIVEC = /* ... */
};

static inline int cpu_has_feature(enum cpu_feature cf) { }

Pekka

From benh at kernel.crashing.org Sat Feb 5 20:08:53 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 05 Feb 2005 20:08:53 +1100
Subject: [PATCH] PPC/PPC64: Introduce CPU_HAS_FEATURE() macro
In-Reply-To: <20050204172041.GA17586@austin.ibm.com>
References: <20050204072254.GA17565@austin.ibm.com> <84144f0205020400172d89eddf@mail.gmail.com> <20050204172041.GA17586@austin.ibm.com>
Message-ID: <1107594534.30270.3.camel@gaston>

On Fri, 2005-02-04 at 11:20 -0600, Olof Johansson wrote:
> On Fri, Feb 04, 2005 at 10:17:48AM +0200, Pekka Enberg wrote:
> > Please drop the CPU_FTR_##x macro magic as it makes grepping more
> > complicated. If the enum names are too long, just do s/CPU_FTR_/CPU_/g
> > or something similar. Also, could you please make this a static inline
> > function?

I tend to agree with Pekka...

> I considered that for a while, but decided against it because:
>
> * cpu-has-feature(cpu-feature-foo) v cpu-has-feature(foo): I picked the
>   latter for readability.

I don't think it really matters compared to the usefulness of grep, and
it is still more readable than the old way...

> * Renaming CPU_FTR_ -> CPU_ makes it less obvious that
>   it's actually a cpu feature it's describing (i.e. CPU_ALTIVEC vs
>   CPU_FTR_ALTIVEC).

Agreed.

> * Renaming would clobber the namespace, CPU_* definitions are used in
>   other places in the tree.
> * Can't make it an inline and still use the preprocessor concatenation.

I'd like to keep the constants as-is and have the stuff inline with no
macro trick as Pekka suggests, since I do use grep on those things quite
often.

> That being said, you do have a point about grepability. However,
> personally I'd be more likely to look for CPU_HAS_FEATURE than the
> feature itself when reading the code, and would find that easily. The
> other way around (finding all uses of a feature) is harder, but the
> concatenation macro is right below the bit definitions and easy to spot.

No, when I grep, I'm looking for the feature itself...

Ben.

From benh at kernel.crashing.org Sat Feb 5 20:19:08 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 05 Feb 2005 20:19:08 +1100
Subject: question on symbol exports
In-Reply-To: <4203D793.1040604@nortel.com>
References: <41FECA18.50609@nortelnetworks.com> <1107243398.4208.47.camel@laptopd505.fenrus.org> <41FFA21C.8060203@nortelnetworks.com> <1107273017.4208.132.camel@laptopd505.fenrus.org> <20050204203050.GA5889@dmt.cnet> <4203D793.1040604@nortel.com>
Message-ID: <1107595148.30302.5.camel@gaston>

> It turns out that to call ptep_clear_flush_dirty() on ppc64 from a
> module I needed to export the following symbols:
>
> __flush_tlb_pending
> ppc64_tlb_batch
> hpte_update

Any reason why you need to call that from a module ? Is the module
GPL'd ?

Ben.

From arnd at arndb.de Sat Feb 5 22:04:34 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Sat, 5 Feb 2005 12:04:34 +0100
Subject: [PATCH] PPC/PPC64: Introduce CPU_HAS_FEATURE() macro
In-Reply-To: <20050205013426.GC11318@krispykreme.ozlabs.ibm.com>
References: <20050204072254.GA17565@austin.ibm.com> <200502050122.27254.arnd@arndb.de> <20050205013426.GC11318@krispykreme.ozlabs.ibm.com>
Message-ID: <200502051204.38965.arnd@arndb.de>

On Saturday 05 February 2005 02:34, Anton Blanchard wrote:
> Interesting :) However we already get bug reports with the current
> CONFIG_POWER4_ONLY option.
> I worry about adding more options that users
> could get wrong unless there is a noticeable improvement in performance.
>

The patch that I posted doesn't add any new user-selectable options, it
only limits the supported CPUs to the ones that are available on the
supported platforms. If you select powermac or maple, the only supported
CPU will be PowerPC970, so the C compiler can optimize away all runtime
checks for CPU features.

I don't expect much of a noticeable performance advantage from the
patch, but it allows some of the source code to be made nicer. E.g. you
can replace every instance of '#ifdef CONFIG_ALTIVEC' with
'if (CPU_FTR_POSSIBLE & CPU_FTR_ALTIVEC)' or an inline function wrapping
that.

Arnd <><

From olof at austin.ibm.com Sun Feb 6 05:46:47 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Sat, 5 Feb 2005 12:46:47 -0600
Subject: [PATCH] PPC/PPC64: Abstract cpu_feature checks.
In-Reply-To: <20050204072254.GA17565@austin.ibm.com>
References: <20050204072254.GA17565@austin.ibm.com>
Message-ID: <20050205184647.GA17417@austin.ibm.com>

Abstract most manual mask checks of cpu_features with cpu_has_feature()

Signed-off-by: Olof Johansson

---

 linux-2.5-olof/arch/ppc/kernel/ppc_htab.c | 8 +++---
 linux-2.5-olof/arch/ppc/kernel/setup.c | 4 +--
 linux-2.5-olof/arch/ppc/kernel/temp.c | 2 -
 linux-2.5-olof/arch/ppc/mm/mmu_decl.h | 2 -
 linux-2.5-olof/arch/ppc/mm/ppc_mmu.c | 4 +--
 linux-2.5-olof/arch/ppc/platforms/pmac_cpufreq.c | 2 -
 linux-2.5-olof/arch/ppc/platforms/pmac_setup.c | 2 -
 linux-2.5-olof/arch/ppc/platforms/pmac_smp.c | 4 +--
 linux-2.5-olof/arch/ppc/platforms/sandpoint.c | 6 ++---
 linux-2.5-olof/arch/ppc64/kernel/align.c | 2 -
 linux-2.5-olof/arch/ppc64/kernel/iSeries_setup.c | 2 -
 linux-2.5-olof/arch/ppc64/kernel/pSeries_lpar.c | 2 -
 linux-2.5-olof/arch/ppc64/kernel/process.c | 4 +--
 linux-2.5-olof/arch/ppc64/kernel/setup.c | 6 ++---
 linux-2.5-olof/arch/ppc64/kernel/smp.c | 2 -
 linux-2.5-olof/arch/ppc64/kernel/sysfs.c | 22 +++++++++----------
 linux-2.5-olof/arch/ppc64/mm/hash_native.c | 14 ++++++------
 linux-2.5-olof/arch/ppc64/mm/hash_utils.c | 2 -
 linux-2.5-olof/arch/ppc64/mm/hugetlbpage.c | 2 -
 linux-2.5-olof/arch/ppc64/mm/init.c | 10 ++++----
 linux-2.5-olof/arch/ppc64/mm/slb.c | 4 +--
 linux-2.5-olof/arch/ppc64/mm/stab.c | 2 -
 linux-2.5-olof/arch/ppc64/oprofile/op_model_power4.c | 2 -
 linux-2.5-olof/arch/ppc64/oprofile/op_model_rs64.c | 2 -
 linux-2.5-olof/arch/ppc64/xmon/xmon.c | 8 +++---
 linux-2.5-olof/drivers/macintosh/via-pmu.c | 2 -
 linux-2.5-olof/drivers/md/raid6altivec.uc | 2 -
 linux-2.5-olof/include/asm-ppc/cputable.h | 5 ++++
 linux-2.5-olof/include/asm-ppc64/cacheflush.h | 2 -
 linux-2.5-olof/include/asm-ppc64/cputable.h | 5 ++++
 linux-2.5-olof/include/asm-ppc64/mmu_context.h | 4 +--
 linux-2.5-olof/include/asm-ppc64/page.h | 2 -
 32 files changed, 76 insertions(+), 66 deletions(-)

diff -puN
include/asm-ppc64/cputable.h~cpu-has-feature include/asm-ppc64/cputable.h --- linux-2.5/include/asm-ppc64/cputable.h~cpu-has-feature 2005-02-05 11:11:03.478617416 -0600 +++ linux-2.5-olof/include/asm-ppc64/cputable.h 2005-02-05 11:22:32.309899144 -0600 @@ -66,6 +66,11 @@ struct cpu_spec { extern struct cpu_spec cpu_specs[]; extern struct cpu_spec *cur_cpu_spec; +static inline unsigned long cpu_has_feature(unsigned long feature) +{ + return cur_cpu_spec->cpu_features & feature; +} + /* firmware feature bitmask values */ #define FIRMWARE_MAX_FEATURES 63 diff -puN arch/ppc64/kernel/align.c~cpu-has-feature arch/ppc64/kernel/align.c --- linux-2.5/arch/ppc64/kernel/align.c~cpu-has-feature 2005-02-05 11:11:03.521610880 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/align.c 2005-02-05 11:11:04.117520288 -0600 @@ -238,7 +238,7 @@ fix_alignment(struct pt_regs *regs) dsisr = regs->dsisr; - if (cur_cpu_spec->cpu_features & CPU_FTR_NODSISRALIGN) { + if (cpu_has_feature(CPU_FTR_NODSISRALIGN)) { unsigned int real_instr; if (__get_user(real_instr, (unsigned int __user *)regs->nip)) return 0; diff -puN arch/ppc64/kernel/iSeries_setup.c~cpu-has-feature arch/ppc64/kernel/iSeries_setup.c --- linux-2.5/arch/ppc64/kernel/iSeries_setup.c~cpu-has-feature 2005-02-05 11:11:03.525610272 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/iSeries_setup.c 2005-02-05 11:11:04.118520136 -0600 @@ -267,7 +267,7 @@ unsigned long iSeries_process_mainstore_ unsigned long i; unsigned long mem_blocks = 0; - if (cur_cpu_spec->cpu_features & CPU_FTR_SLB) + if (cpu_has_feature(CPU_FTR_SLB)) mem_blocks = iSeries_process_Regatta_mainstore_vpd(mb_array, max_entries); else diff -puN arch/ppc64/kernel/idle.c~cpu-has-feature arch/ppc64/kernel/idle.c diff -puN arch/ppc64/kernel/process.c~cpu-has-feature arch/ppc64/kernel/process.c --- linux-2.5/arch/ppc64/kernel/process.c~cpu-has-feature 2005-02-05 11:11:03.600598872 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/process.c 2005-02-05 11:11:04.119519984 -0600 @@ -388,12 +388,12 @@ 
copy_thread(int nr, unsigned long clone_ kregs = (struct pt_regs *) sp; sp -= STACK_FRAME_OVERHEAD; p->thread.ksp = sp; - if (cur_cpu_spec->cpu_features & CPU_FTR_SLB) { + if (cpu_has_feature(CPU_FTR_SLB)) { unsigned long sp_vsid = get_kernel_vsid(sp); sp_vsid <<= SLB_VSID_SHIFT; sp_vsid |= SLB_VSID_KERNEL; - if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) + if (cpu_has_feature(CPU_FTR_16M_PAGE)) sp_vsid |= SLB_VSID_L; p->thread.ksp_vsid = sp_vsid; diff -puN arch/ppc64/kernel/smp.c~cpu-has-feature arch/ppc64/kernel/smp.c --- linux-2.5/arch/ppc64/kernel/smp.c~cpu-has-feature 2005-02-05 11:11:03.606597960 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/smp.c 2005-02-05 11:11:04.120519832 -0600 @@ -416,7 +416,7 @@ int __devinit __cpu_up(unsigned int cpu) paca[cpu].default_decr = tb_ticks_per_jiffy / decr_overclock; - if (!(cur_cpu_spec->cpu_features & CPU_FTR_SLB)) { + if (!cpu_has_feature(CPU_FTR_SLB)) { void *tmp; /* maximum of 48 CPUs on machines with a segment table */ diff -puN arch/ppc64/kernel/sysfs.c~cpu-has-feature arch/ppc64/kernel/sysfs.c --- linux-2.5/arch/ppc64/kernel/sysfs.c~cpu-has-feature 2005-02-05 11:11:03.609597504 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/sysfs.c 2005-02-05 11:11:04.121519680 -0600 @@ -63,7 +63,7 @@ static int __init smt_setup(void) unsigned int *val; unsigned int cpu; - if (!cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (!cpu_has_feature(CPU_FTR_SMT)) return 1; options = find_path_device("/options"); @@ -86,7 +86,7 @@ static int __init setup_smt_snooze_delay unsigned int cpu; int snooze; - if (!cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (!cpu_has_feature(CPU_FTR_SMT)) return 1; smt_snooze_cmdline = 1; @@ -167,7 +167,7 @@ void ppc64_enable_pmcs(void) * On SMT machines we have to set the run latch in the ctrl register * in order to make PMC6 spin. 
*/ - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) { + if (cpu_has_feature(CPU_FTR_SMT)) { ctrl = mfspr(CTRLF); ctrl |= RUNLATCH; mtspr(CTRLT, ctrl); @@ -266,7 +266,7 @@ static void register_cpu_online(unsigned struct sys_device *s = &c->sysdev; #ifndef CONFIG_PPC_ISERIES - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (cpu_has_feature(CPU_FTR_SMT)) sysdev_create_file(s, &attr_smt_snooze_delay); #endif @@ -275,7 +275,7 @@ static void register_cpu_online(unsigned sysdev_create_file(s, &attr_mmcr0); sysdev_create_file(s, &attr_mmcr1); - if (cur_cpu_spec->cpu_features & CPU_FTR_MMCRA) + if (cpu_has_feature(CPU_FTR_MMCRA)) sysdev_create_file(s, &attr_mmcra); sysdev_create_file(s, &attr_pmc1); @@ -285,12 +285,12 @@ static void register_cpu_online(unsigned sysdev_create_file(s, &attr_pmc5); sysdev_create_file(s, &attr_pmc6); - if (cur_cpu_spec->cpu_features & CPU_FTR_PMC8) { + if (cpu_has_feature(CPU_FTR_PMC8)) { sysdev_create_file(s, &attr_pmc7); sysdev_create_file(s, &attr_pmc8); } - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (cpu_has_feature(CPU_FTR_SMT)) sysdev_create_file(s, &attr_purr); } @@ -303,7 +303,7 @@ static void unregister_cpu_online(unsign BUG_ON(c->no_control); #ifndef CONFIG_PPC_ISERIES - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (cpu_has_feature(CPU_FTR_SMT)) sysdev_remove_file(s, &attr_smt_snooze_delay); #endif @@ -312,7 +312,7 @@ static void unregister_cpu_online(unsign sysdev_remove_file(s, &attr_mmcr0); sysdev_remove_file(s, &attr_mmcr1); - if (cur_cpu_spec->cpu_features & CPU_FTR_MMCRA) + if (cpu_has_feature(CPU_FTR_MMCRA)) sysdev_remove_file(s, &attr_mmcra); sysdev_remove_file(s, &attr_pmc1); @@ -322,12 +322,12 @@ static void unregister_cpu_online(unsign sysdev_remove_file(s, &attr_pmc5); sysdev_remove_file(s, &attr_pmc6); - if (cur_cpu_spec->cpu_features & CPU_FTR_PMC8) { + if (cpu_has_feature(CPU_FTR_PMC8)) { sysdev_remove_file(s, &attr_pmc7); sysdev_remove_file(s, &attr_pmc8); } - if (cur_cpu_spec->cpu_features & 
CPU_FTR_SMT) + if (cpu_has_feature(CPU_FTR_SMT)) sysdev_remove_file(s, &attr_purr); } #endif /* CONFIG_HOTPLUG_CPU */ diff -puN arch/ppc64/mm/hash_native.c~cpu-has-feature arch/ppc64/mm/hash_native.c --- linux-2.5/arch/ppc64/mm/hash_native.c~cpu-has-feature 2005-02-05 11:11:03.653590816 -0600 +++ linux-2.5-olof/arch/ppc64/mm/hash_native.c 2005-02-05 11:11:04.122519528 -0600 @@ -217,10 +217,10 @@ static long native_hpte_updatepp(unsigne } /* Ensure it is out of the tlb too */ - if ((cur_cpu_spec->cpu_features & CPU_FTR_TLBIEL) && !large && local) { + if (cpu_has_feature(CPU_FTR_TLBIEL) && !large && local) { tlbiel(va); } else { - int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); + int lock_tlbie = !cpu_has_feature(CPU_FTR_LOCKLESS_TLBIE); if (lock_tlbie) spin_lock(&native_tlbie_lock); @@ -245,7 +245,7 @@ static void native_hpte_updateboltedpp(u unsigned long vsid, va, vpn, flags = 0; long slot; HPTE *hptep; - int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); + int lock_tlbie = !cpu_has_feature(CPU_FTR_LOCKLESS_TLBIE); vsid = get_kernel_vsid(ea); va = (vsid << 28) | (ea & 0x0fffffff); @@ -273,7 +273,7 @@ static void native_hpte_invalidate(unsig Hpte_dword0 dw0; unsigned long avpn = va >> 23; unsigned long flags; - int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); + int lock_tlbie = !cpu_has_feature(CPU_FTR_LOCKLESS_TLBIE); if (large) avpn &= ~0x1UL; @@ -292,7 +292,7 @@ static void native_hpte_invalidate(unsig } /* Invalidate the tlb */ - if ((cur_cpu_spec->cpu_features & CPU_FTR_TLBIEL) && !large && local) { + if (cpu_has_feature(CPU_FTR_TLBIEL) && !large && local) { tlbiel(va); } else { if (lock_tlbie) @@ -360,7 +360,7 @@ static void native_flush_hash_range(unsi j++; } - if ((cur_cpu_spec->cpu_features & CPU_FTR_TLBIEL) && !large && local) { + if (cpu_has_feature(CPU_FTR_TLBIEL) && !large && local) { asm volatile("ptesync":::"memory"); for (i = 0; i < j; i++) @@ -368,7 +368,7 @@ static void 
native_flush_hash_range(unsi asm volatile("ptesync":::"memory"); } else { - int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); + int lock_tlbie = !cpu_has_feature(CPU_FTR_LOCKLESS_TLBIE); if (lock_tlbie) spin_lock(&native_tlbie_lock); diff -puN arch/ppc64/mm/hash_utils.c~cpu-has-feature arch/ppc64/mm/hash_utils.c --- linux-2.5/arch/ppc64/mm/hash_utils.c~cpu-has-feature 2005-02-05 11:11:03.656590360 -0600 +++ linux-2.5-olof/arch/ppc64/mm/hash_utils.c 2005-02-05 11:11:04.123519376 -0600 @@ -190,7 +190,7 @@ void __init htab_initialize(void) * _NOT_ map it to avoid cache paradoxes as it's remapped non * cacheable later on */ - if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) + if (cpu_has_feature(CPU_FTR_16M_PAGE)) use_largepages = 1; /* create bolted the linear mapping in the hash table */ diff -puN arch/ppc64/mm/hugetlbpage.c~cpu-has-feature arch/ppc64/mm/hugetlbpage.c --- linux-2.5/arch/ppc64/mm/hugetlbpage.c~cpu-has-feature 2005-02-05 11:11:03.674587624 -0600 +++ linux-2.5-olof/arch/ppc64/mm/hugetlbpage.c 2005-02-05 11:11:04.123519376 -0600 @@ -705,7 +705,7 @@ unsigned long hugetlb_get_unmapped_area( if (len & ~HPAGE_MASK) return -EINVAL; - if (!(cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE)) + if (!cpu_has_feature(CPU_FTR_16M_PAGE)) return -EINVAL; if (test_thread_flag(TIF_32BIT)) { diff -puN arch/ppc64/mm/init.c~cpu-has-feature arch/ppc64/mm/init.c --- linux-2.5/arch/ppc64/mm/init.c~cpu-has-feature 2005-02-05 11:11:03.680586712 -0600 +++ linux-2.5-olof/arch/ppc64/mm/init.c 2005-02-05 11:11:04.124519224 -0600 @@ -752,7 +752,7 @@ void __init mem_init(void) */ void flush_dcache_page(struct page *page) { - if (cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE) + if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) return; /* avoid an atomic op if possible */ if (test_bit(PG_arch_1, &page->flags)) @@ -763,7 +763,7 @@ void clear_user_page(void *page, unsigne { clear_page(page); - if (cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE) + if 
(cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) return; /* * We shouldnt have to do this, but some versions of glibc @@ -796,7 +796,7 @@ void copy_user_page(void *vto, void *vfr return; #endif - if (cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE) + if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) return; /* avoid an atomic op if possible */ @@ -832,8 +832,8 @@ void update_mmu_cache(struct vm_area_str unsigned long flags; /* handle i-cache coherency */ - if (!(cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE) && - !(cur_cpu_spec->cpu_features & CPU_FTR_NOEXECUTE)) { + if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE) && + !cpu_has_feature(CPU_FTR_NOEXECUTE)) { unsigned long pfn = pte_pfn(pte); if (pfn_valid(pfn)) { struct page *page = pfn_to_page(pfn); diff -puN arch/ppc64/mm/slb.c~cpu-has-feature arch/ppc64/mm/slb.c --- linux-2.5/arch/ppc64/mm/slb.c~cpu-has-feature 2005-02-05 11:11:03.683586256 -0600 +++ linux-2.5-olof/arch/ppc64/mm/slb.c 2005-02-05 11:11:04.125519072 -0600 @@ -51,7 +51,7 @@ static void slb_flush_and_rebolt(void) WARN_ON(!irqs_disabled()); - if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) + if (cpu_has_feature(CPU_FTR_16M_PAGE)) ksp_flags |= SLB_VSID_L; ksp_esid_data = mk_esid_data(get_paca()->kstack, 2); @@ -139,7 +139,7 @@ void slb_initialize(void) unsigned long flags = SLB_VSID_KERNEL; /* Invalidate the entire SLB (even slot 0) & all the ERATS */ - if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) + if (cpu_has_feature(CPU_FTR_16M_PAGE)) flags |= SLB_VSID_L; asm volatile("isync":::"memory"); diff -puN arch/ppc64/mm/stab.c~cpu-has-feature arch/ppc64/mm/stab.c --- linux-2.5/arch/ppc64/mm/stab.c~cpu-has-feature 2005-02-05 11:11:03.704583064 -0600 +++ linux-2.5-olof/arch/ppc64/mm/stab.c 2005-02-05 11:11:04.125519072 -0600 @@ -227,7 +227,7 @@ void stab_initialize(unsigned long stab) { unsigned long vsid = get_kernel_vsid(KERNELBASE); - if (cur_cpu_spec->cpu_features & CPU_FTR_SLB) { + if (cpu_has_feature(CPU_FTR_SLB)) { slb_initialize(); } else { 
asm volatile("isync; slbia; isync":::"memory"); diff -puN arch/ppc64/oprofile/op_model_power4.c~cpu-has-feature arch/ppc64/oprofile/op_model_power4.c --- linux-2.5/arch/ppc64/oprofile/op_model_power4.c~cpu-has-feature 2005-02-05 11:11:03.764573944 -0600 +++ linux-2.5-olof/arch/ppc64/oprofile/op_model_power4.c 2005-02-05 11:11:04.126518920 -0600 @@ -54,7 +54,7 @@ static void power4_reg_setup(struct op_c * * It has been verified to work on POWER5 so we enable it there. */ - if (cur_cpu_spec->cpu_features & CPU_FTR_MMCRA_SIHV) + if (cpu_has_feature(CPU_FTR_MMCRA_SIHV)) mmcra_has_sihv = 1; /* diff -puN arch/ppc64/oprofile/op_model_rs64.c~cpu-has-feature arch/ppc64/oprofile/op_model_rs64.c --- linux-2.5/arch/ppc64/oprofile/op_model_rs64.c~cpu-has-feature 2005-02-05 11:11:03.768573336 -0600 +++ linux-2.5-olof/arch/ppc64/oprofile/op_model_rs64.c 2005-02-05 11:11:04.126518920 -0600 @@ -114,7 +114,7 @@ static void rs64_cpu_setup(void *unused) /* reset MMCR1, MMCRA */ mtspr(SPRN_MMCR1, 0); - if (cur_cpu_spec->cpu_features & CPU_FTR_MMCRA) + if (cpu_has_feature(CPU_FTR_MMCRA)) mtspr(SPRN_MMCRA, 0); mmcr0 |= MMCR0_FCM1|MMCR0_PMXE|MMCR0_FCECE; diff -puN arch/ppc64/xmon/xmon.c~cpu-has-feature arch/ppc64/xmon/xmon.c --- linux-2.5/arch/ppc64/xmon/xmon.c~cpu-has-feature 2005-02-05 11:11:03.814566344 -0600 +++ linux-2.5-olof/arch/ppc64/xmon/xmon.c 2005-02-05 11:11:04.128518616 -0600 @@ -723,7 +723,7 @@ static void insert_cpu_bpts(void) { if (dabr.enabled) set_controlled_dabr(dabr.address | (dabr.enabled & 7)); - if (iabr && (cur_cpu_spec->cpu_features & CPU_FTR_IABR)) + if (iabr && cpu_has_feature(CPU_FTR_IABR)) set_iabr(iabr->address | (iabr->enabled & (BP_IABR|BP_IABR_TE))); } @@ -751,7 +751,7 @@ static void remove_bpts(void) static void remove_cpu_bpts(void) { set_controlled_dabr(0); - if ((cur_cpu_spec->cpu_features & CPU_FTR_IABR)) + if (cpu_has_feature(CPU_FTR_IABR)) set_iabr(0); } @@ -1098,7 +1098,7 @@ bpt_cmds(void) break; case 'i': /* bi - hardware instr breakpoint */ - if 
(!(cur_cpu_spec->cpu_features & CPU_FTR_IABR)) { + if (!cpu_has_feature(CPU_FTR_IABR)) { printf("Hardware instruction breakpoint " "not supported on this cpu\n"); break; @@ -2496,7 +2496,7 @@ void xmon_init(void) void dump_segments(void) { - if (cur_cpu_spec->cpu_features & CPU_FTR_SLB) + if (cpu_has_feature(CPU_FTR_SLB)) dump_slb(); else dump_stab(); diff -puN include/asm-ppc64/cacheflush.h~cpu-has-feature include/asm-ppc64/cacheflush.h --- linux-2.5/include/asm-ppc64/cacheflush.h~cpu-has-feature 2005-02-05 11:11:03.836563000 -0600 +++ linux-2.5-olof/include/asm-ppc64/cacheflush.h 2005-02-05 11:11:04.129518464 -0600 @@ -40,7 +40,7 @@ extern void __flush_dcache_icache(void * static inline void flush_icache_range(unsigned long start, unsigned long stop) { - if (!(cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE)) + if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) __flush_icache_range(start, stop); } diff -puN include/asm-ppc64/mmu_context.h~cpu-has-feature include/asm-ppc64/mmu_context.h --- linux-2.5/include/asm-ppc64/mmu_context.h~cpu-has-feature 2005-02-05 11:11:03.841562240 -0600 +++ linux-2.5-olof/include/asm-ppc64/mmu_context.h 2005-02-05 11:11:04.129518464 -0600 @@ -59,11 +59,11 @@ static inline void switch_mm(struct mm_s return; #ifdef CONFIG_ALTIVEC - if (cur_cpu_spec->cpu_features & CPU_FTR_ALTIVEC) + if (cpu_has_feature(CPU_FTR_ALTIVEC)) asm volatile ("dssall"); #endif /* CONFIG_ALTIVEC */ - if (cur_cpu_spec->cpu_features & CPU_FTR_SLB) + if (cpu_has_feature(CPU_FTR_SLB)) switch_slb(tsk, next); else switch_stab(tsk, next); diff -puN include/asm-ppc64/page.h~cpu-has-feature include/asm-ppc64/page.h --- linux-2.5/include/asm-ppc64/page.h~cpu-has-feature 2005-02-05 11:11:03.845561632 -0600 +++ linux-2.5-olof/include/asm-ppc64/page.h 2005-02-05 11:11:04.130518312 -0600 @@ -67,7 +67,7 @@ #define HAVE_ARCH_HUGETLB_UNMAPPED_AREA #define in_hugepage_area(context, addr) \ - ((cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) && \ + 
(cpu_has_feature(CPU_FTR_16M_PAGE) && \ ( (((addr) >= TASK_HPAGE_BASE) && ((addr) < TASK_HPAGE_END)) || \ ( ((addr) < 0x100000000L) && \ ((1 << GET_ESID(addr)) & (context).htlb_segs) ) ) ) diff -puN arch/ppc64/kernel/pSeries_lpar.c~cpu-has-feature arch/ppc64/kernel/pSeries_lpar.c --- linux-2.5/arch/ppc64/kernel/pSeries_lpar.c~cpu-has-feature 2005-02-05 11:11:03.848561176 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pSeries_lpar.c 2005-02-05 11:11:04.130518312 -0600 @@ -505,7 +505,7 @@ void pSeries_lpar_flush_hash_range(unsig int i; unsigned long flags = 0; struct ppc64_tlb_batch *batch = &__get_cpu_var(ppc64_tlb_batch); - int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); + int lock_tlbie = !cpu_has_feature(CPU_FTR_LOCKLESS_TLBIE); if (lock_tlbie) spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags); diff -puN arch/ppc64/kernel/setup.c~cpu-has-feature arch/ppc64/kernel/setup.c --- linux-2.5/arch/ppc64/kernel/setup.c~cpu-has-feature 2005-02-05 11:11:03.853560416 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/setup.c 2005-02-05 11:11:04.132518008 -0600 @@ -315,7 +315,7 @@ static void __init setup_cpu_maps(void) maxcpus = ireg[num_addr_cell + num_size_cell]; /* Double maxcpus for processors which have SMT capability */ - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (cpu_has_feature(CPU_FTR_SMT)) maxcpus *= 2; if (maxcpus > NR_CPUS) { @@ -339,7 +339,7 @@ static void __init setup_cpu_maps(void) */ for_each_cpu(cpu) { cpu_set(cpu, cpu_sibling_map[cpu]); - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (cpu_has_feature(CPU_FTR_SMT)) cpu_set(cpu ^ 0x1, cpu_sibling_map[cpu]); } @@ -767,7 +767,7 @@ static int show_cpuinfo(struct seq_file seq_printf(m, "unknown (%08x)", pvr); #ifdef CONFIG_ALTIVEC - if (cur_cpu_spec->cpu_features & CPU_FTR_ALTIVEC) + if (cpu_has_feature(CPU_FTR_ALTIVEC)) seq_printf(m, ", altivec supported"); #endif /* CONFIG_ALTIVEC */ diff -puN drivers/macintosh/via-pmu.c~cpu-has-feature drivers/macintosh/via-pmu.c --- 
linux-2.5/drivers/macintosh/via-pmu.c~cpu-has-feature 2005-02-05 11:11:03.895554032 -0600 +++ linux-2.5-olof/drivers/macintosh/via-pmu.c 2005-02-05 11:11:04.134517704 -0600 @@ -2389,7 +2389,7 @@ pmac_suspend_devices(void) enable_kernel_fp(); #ifdef CONFIG_ALTIVEC - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_ALTIVEC) + if (cpu_has_feature(CPU_FTR_ALTIVEC)) enable_kernel_altivec(); #endif /* CONFIG_ALTIVEC */ diff -puN include/asm-ppc/cputable.h~cpu-has-feature include/asm-ppc/cputable.h --- linux-2.5/include/asm-ppc/cputable.h~cpu-has-feature 2005-02-05 11:11:03.919550384 -0600 +++ linux-2.5-olof/include/asm-ppc/cputable.h 2005-02-05 11:22:58.928852448 -0600 @@ -61,6 +61,11 @@ struct cpu_spec { extern struct cpu_spec cpu_specs[]; extern struct cpu_spec *cur_cpu_spec[]; +static inline unsigned int cpu_has_feature(feature) +{ + return cur_cpu_spec[0]->cpu_features & feature; +} + #endif /* __ASSEMBLY__ */ /* CPU kernel features */ diff -puN arch/ppc/mm/ppc_mmu.c~cpu-has-feature arch/ppc/mm/ppc_mmu.c --- linux-2.5/arch/ppc/mm/ppc_mmu.c~cpu-has-feature 2005-02-05 11:11:03.976541720 -0600 +++ linux-2.5-olof/arch/ppc/mm/ppc_mmu.c 2005-02-05 11:11:04.136517400 -0600 @@ -138,7 +138,7 @@ void __init setbat(int index, unsigned l union ubat *bat = BATS[index]; if (((flags & _PAGE_NO_CACHE) == 0) && - (cur_cpu_spec[0]->cpu_features & CPU_FTR_NEED_COHERENT)) + cpu_has_feature(CPU_FTR_NEED_COHERENT)) flags |= _PAGE_COHERENT; bl = (size >> 17) - 1; @@ -191,7 +191,7 @@ void __init MMU_init_hw(void) extern unsigned int hash_page[]; extern unsigned int flush_hash_patch_A[], flush_hash_patch_B[]; - if ((cur_cpu_spec[0]->cpu_features & CPU_FTR_HPTE_TABLE) == 0) { + if (!cpu_has_feature(CPU_FTR_HPTE_TABLE)) { /* * Put a blr (procedure return) instruction at the * start of hash_page, since we can still get DSI diff -puN arch/ppc/mm/mmu_decl.h~cpu-has-feature arch/ppc/mm/mmu_decl.h --- linux-2.5/arch/ppc/mm/mmu_decl.h~cpu-has-feature 2005-02-05 11:11:03.979541264 -0600 +++ 
linux-2.5-olof/arch/ppc/mm/mmu_decl.h 2005-02-05 11:11:04.136517400 -0600 @@ -75,7 +75,7 @@ static inline void flush_HPTE(unsigned c unsigned long pdval) { if ((Hash != 0) && - (cur_cpu_spec[0]->cpu_features & CPU_FTR_HPTE_TABLE)) + cpu_has_feature(CPU_FTR_HPTE_TABLE)) flush_hash_pages(0, va, pdval, 1); else _tlbie(va); diff -puN arch/ppc/kernel/setup.c~cpu-has-feature arch/ppc/kernel/setup.c --- linux-2.5/arch/ppc/kernel/setup.c~cpu-has-feature 2005-02-05 11:11:04.018535336 -0600 +++ linux-2.5-olof/arch/ppc/kernel/setup.c 2005-02-05 11:11:04.137517248 -0600 @@ -619,7 +619,7 @@ machine_init(unsigned long r3, unsigned /* Checks "l2cr=xxxx" command-line option */ int __init ppc_setup_l2cr(char *str) { - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_L2CR) { + if (cpu_has_feature(CPU_FTR_L2CR)) { unsigned long val = simple_strtoul(str, NULL, 0); printk(KERN_INFO "l2cr set to %lx\n", val); _set_L2CR(0); /* force invalidate by disable cache */ @@ -720,7 +720,7 @@ void __init setup_arch(char **cmdline_p) * Systems with OF can look in the properties on the cpu node(s) * for a possibly more accurate value. 
*/ - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_SPLIT_ID_CACHE) { + if (cpu_has_feature(CPU_FTR_SPLIT_ID_CACHE)) { dcache_bsize = cur_cpu_spec[0]->dcache_bsize; icache_bsize = cur_cpu_spec[0]->icache_bsize; ucache_bsize = 0; diff -puN arch/ppc/kernel/temp.c~cpu-has-feature arch/ppc/kernel/temp.c --- linux-2.5/arch/ppc/kernel/temp.c~cpu-has-feature 2005-02-05 11:11:04.024534424 -0600 +++ linux-2.5-olof/arch/ppc/kernel/temp.c 2005-02-05 11:11:04.137517248 -0600 @@ -223,7 +223,7 @@ int __init TAU_init(void) /* We assume in SMP that if one CPU has TAU support, they * all have it --BenH */ - if (!(cur_cpu_spec[0]->cpu_features & CPU_FTR_TAU)) { + if (!cpu_has_feature(CPU_FTR_TAU)) { printk("Thermal assist unit not available\n"); tau_initialized = 0; return 1; diff -puN arch/ppc/platforms/pmac_cpufreq.c~cpu-has-feature arch/ppc/platforms/pmac_cpufreq.c --- linux-2.5/arch/ppc/platforms/pmac_cpufreq.c~cpu-has-feature 2005-02-05 11:11:04.064528344 -0600 +++ linux-2.5-olof/arch/ppc/platforms/pmac_cpufreq.c 2005-02-05 11:11:04.138517096 -0600 @@ -230,7 +230,7 @@ static int __pmac pmu_set_cpu_speed(int enable_kernel_fp(); #ifdef CONFIG_ALTIVEC - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_ALTIVEC) + if (cpu_has_feature(CPU_FTR_ALTIVEC)) enable_kernel_altivec(); #endif /* CONFIG_ALTIVEC */ diff -puN arch/ppc/platforms/pmac_setup.c~cpu-has-feature arch/ppc/platforms/pmac_setup.c --- linux-2.5/arch/ppc/platforms/pmac_setup.c~cpu-has-feature 2005-02-05 11:11:04.068527736 -0600 +++ linux-2.5-olof/arch/ppc/platforms/pmac_setup.c 2005-02-05 11:11:04.139516944 -0600 @@ -274,7 +274,7 @@ pmac_setup_arch(void) pmac_find_bridges(); /* Checks "l2cr-value" property in the registry */ - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_L2CR) { + if (cpu_has_feature(CPU_FTR_L2CR)) { struct device_node *np = find_devices("cpus"); if (np == 0) np = find_type_devices("cpu"); diff -puN arch/ppc/platforms/pmac_smp.c~cpu-has-feature arch/ppc/platforms/pmac_smp.c --- 
linux-2.5/arch/ppc/platforms/pmac_smp.c~cpu-has-feature 2005-02-05 11:11:04.071527280 -0600 +++ linux-2.5-olof/arch/ppc/platforms/pmac_smp.c 2005-02-05 11:11:04.139516944 -0600 @@ -119,7 +119,7 @@ static volatile int sec_tb_reset = 0; static void __init core99_init_caches(int cpu) { - if (!(cur_cpu_spec[0]->cpu_features & CPU_FTR_L2CR)) + if (!cpu_has_feature(CPU_FTR_L2CR)) return; if (cpu == 0) { @@ -132,7 +132,7 @@ static void __init core99_init_caches(in printk("CPU%d: L2CR set to %lx\n", cpu, core99_l2_cache); } - if (!(cur_cpu_spec[0]->cpu_features & CPU_FTR_L3CR)) + if (!cpu_has_feature(CPU_FTR_L3CR)) return; if (cpu == 0){ diff -puN arch/ppc/platforms/sandpoint.c~cpu-has-feature arch/ppc/platforms/sandpoint.c --- linux-2.5/arch/ppc/platforms/sandpoint.c~cpu-has-feature 2005-02-05 11:11:04.074526824 -0600 +++ linux-2.5-olof/arch/ppc/platforms/sandpoint.c 2005-02-05 11:11:04.140516792 -0600 @@ -319,10 +319,10 @@ sandpoint_setup_arch(void) * We will do this now with good known values. Future versions * of DINK32 are supposed to get this correct. */ - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_SPEC7450) + if (cpu_has_feature(CPU_FTR_SPEC7450)) /* 745x is different. We only want to pass along enable. */ _set_L2CR(L2CR_L2E); - else if (cur_cpu_spec[0]->cpu_features & CPU_FTR_L2CR) + else if (cpu_has_feature(CPU_FTR_L2CR)) /* All modules have 1MB of L2. We also assume that an * L2 divisor of 3 will work. */ @@ -330,7 +330,7 @@ sandpoint_setup_arch(void) | L2CR_L2RAM_PIPE | L2CR_L2OH_1_0 | L2CR_L2DF); #if 0 /* Untested right now. */ - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_L3CR) { + if (cpu_has_feature(CPU_FTR_L3CR)) { /* Magic value. 
*/ _set_L3CR(0x8f032000); } diff -puN arch/ppc/kernel/ppc_htab.c~cpu-has-feature arch/ppc/kernel/ppc_htab.c --- linux-2.5/arch/ppc/kernel/ppc_htab.c~cpu-has-feature 2005-02-05 11:11:04.077526368 -0600 +++ linux-2.5-olof/arch/ppc/kernel/ppc_htab.c 2005-02-05 11:11:04.141516640 -0600 @@ -108,7 +108,7 @@ static int ppc_htab_show(struct seq_file PTE *ptr; #endif /* CONFIG_PPC_STD_MMU */ - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_604_PERF_MON) { + if (cpu_has_feature(CPU_FTR_604_PERF_MON)) { mmcr0 = mfspr(SPRN_MMCR0); pmc1 = mfspr(SPRN_PMC1); pmc2 = mfspr(SPRN_PMC2); @@ -209,7 +209,7 @@ static ssize_t ppc_htab_write(struct fil if ( !strncmp( buffer, "reset", 5) ) { - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_604_PERF_MON) { + if (cpu_has_feature(CPU_FTR_604_PERF_MON)) { /* reset PMC1 and PMC2 */ mtspr(SPRN_PMC1, 0); mtspr(SPRN_PMC2, 0); @@ -221,7 +221,7 @@ static ssize_t ppc_htab_write(struct fil } /* Everything below here requires the performance monitor feature. */ - if ( !cur_cpu_spec[0]->cpu_features & CPU_FTR_604_PERF_MON ) + if (!cpu_has_feature(CPU_FTR_604_PERF_MON)) return count; /* turn off performance monitoring */ @@ -339,7 +339,7 @@ int proc_dol2crvec(ctl_table *table, int "0.5", "1.0", "(reserved2)", "(reserved3)" }; - if (!(cur_cpu_spec[0]->cpu_features & CPU_FTR_L2CR)) + if (!cpu_has_feature(CPU_FTR_L2CR)) return -EFAULT; if ( /*!table->maxlen ||*/ (*ppos && !write)) { diff -puN drivers/md/raid6altivec.uc~cpu-has-feature drivers/md/raid6altivec.uc --- linux-2.5/drivers/md/raid6altivec.uc~cpu-has-feature 2005-02-05 11:11:04.081525760 -0600 +++ linux-2.5-olof/drivers/md/raid6altivec.uc 2005-02-05 11:11:05.007385008 -0600 @@ -108,7 +108,7 @@ int raid6_have_altivec(void); int raid6_have_altivec(void) { /* This assumes either all CPUs have Altivec or none does */ - return cur_cpu_spec->cpu_features & CPU_FTR_ALTIVEC; + return cpu_has_feature(CPU_FTR_ALTIVEC): } #endif _ From olof at austin.ibm.com Sun Feb 6 14:26:45 2005 From: olof at austin.ibm.com (Olof 
Johansson)
Date: Sat, 5 Feb 2005 21:26:45 -0600
Subject: [PATCH] PPC/PPC64: Abstract cpu_feature checks.
In-Reply-To: <20050205184647.GA17417@austin.ibm.com>
References: <20050204072254.GA17565@austin.ibm.com> <20050205184647.GA17417@austin.ibm.com>
Message-ID: <20050206032645.GA18845@austin.ibm.com>

Doh, forgot to do a final refpatch after fixing build error. I blame it on lack of morning coffee. Here's a proper version:

Abstract most manual mask checks of cpu_features with cpu_has_feature()

Signed-off-by: Olof Johansson
---

 linux-2.5-olof/arch/ppc/kernel/ppc_htab.c | 8 +++---
 linux-2.5-olof/arch/ppc/kernel/setup.c | 4 +--
 linux-2.5-olof/arch/ppc/kernel/temp.c | 2 -
 linux-2.5-olof/arch/ppc/mm/mmu_decl.h | 2 -
 linux-2.5-olof/arch/ppc/mm/ppc_mmu.c | 4 +--
 linux-2.5-olof/arch/ppc/platforms/pmac_cpufreq.c | 2 -
 linux-2.5-olof/arch/ppc/platforms/pmac_setup.c | 2 -
 linux-2.5-olof/arch/ppc/platforms/pmac_smp.c | 4 +--
 linux-2.5-olof/arch/ppc/platforms/sandpoint.c | 6 ++---
 linux-2.5-olof/arch/ppc64/kernel/align.c | 2 -
 linux-2.5-olof/arch/ppc64/kernel/iSeries_setup.c | 2 -
 linux-2.5-olof/arch/ppc64/kernel/pSeries_lpar.c | 2 -
 linux-2.5-olof/arch/ppc64/kernel/process.c | 4 +--
 linux-2.5-olof/arch/ppc64/kernel/setup.c | 6 ++---
 linux-2.5-olof/arch/ppc64/kernel/smp.c | 2 -
 linux-2.5-olof/arch/ppc64/kernel/sysfs.c | 22 +++++++++----------
 linux-2.5-olof/arch/ppc64/mm/hash_native.c | 14 ++++++------
 linux-2.5-olof/arch/ppc64/mm/hash_utils.c | 2 -
 linux-2.5-olof/arch/ppc64/mm/hugetlbpage.c | 2 -
 linux-2.5-olof/arch/ppc64/mm/init.c | 10 ++++----
 linux-2.5-olof/arch/ppc64/mm/slb.c | 4 +--
 linux-2.5-olof/arch/ppc64/mm/stab.c | 2 -
 linux-2.5-olof/arch/ppc64/oprofile/op_model_power4.c | 2 -
 linux-2.5-olof/arch/ppc64/oprofile/op_model_rs64.c | 2 -
 linux-2.5-olof/arch/ppc64/xmon/xmon.c | 8 +++---
 linux-2.5-olof/drivers/macintosh/via-pmu.c | 2 -
 linux-2.5-olof/drivers/md/raid6altivec.uc | 2 -
 linux-2.5-olof/include/asm-ppc/cputable.h | 5 ++++
 linux-2.5-olof/include/asm-ppc64/cacheflush.h | 2 -
 linux-2.5-olof/include/asm-ppc64/cputable.h | 5 ++++
 linux-2.5-olof/include/asm-ppc64/mmu_context.h | 4 +--
 linux-2.5-olof/include/asm-ppc64/page.h | 2 -
 32 files changed, 76 insertions(+), 66 deletions(-)

diff -puN include/asm-ppc64/cputable.h~cpu-has-feature include/asm-ppc64/cputable.h --- linux-2.5/include/asm-ppc64/cputable.h~cpu-has-feature 2005-02-05 11:11:03.478617416 -0600 +++ linux-2.5-olof/include/asm-ppc64/cputable.h 2005-02-05 21:14:05.873057376 -0600 @@ -66,6 +66,11 @@ struct cpu_spec { extern struct cpu_spec cpu_specs[]; extern struct cpu_spec *cur_cpu_spec; +static inline unsigned long cpu_has_feature(unsigned long feature) +{ + return cur_cpu_spec->cpu_features & feature; +} + /* firmware feature bitmask values */ #define FIRMWARE_MAX_FEATURES 63 diff -puN arch/ppc64/kernel/align.c~cpu-has-feature arch/ppc64/kernel/align.c --- linux-2.5/arch/ppc64/kernel/align.c~cpu-has-feature 2005-02-05 11:11:03.521610880 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/align.c 2005-02-05 11:11:04.117520288 -0600 @@ -238,7 +238,7 @@ fix_alignment(struct pt_regs *regs) dsisr = regs->dsisr; - if (cur_cpu_spec->cpu_features & CPU_FTR_NODSISRALIGN) { + if (cpu_has_feature(CPU_FTR_NODSISRALIGN)) { unsigned int real_instr; if (__get_user(real_instr, (unsigned int __user *)regs->nip)) return 0; diff -puN arch/ppc64/kernel/iSeries_setup.c~cpu-has-feature arch/ppc64/kernel/iSeries_setup.c --- linux-2.5/arch/ppc64/kernel/iSeries_setup.c~cpu-has-feature 2005-02-05 11:11:03.525610272 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/iSeries_setup.c 2005-02-05 11:11:04.118520136 -0600 @@ -267,7 +267,7 @@ unsigned long iSeries_process_mainstore_ unsigned long i; unsigned long mem_blocks = 0; - if (cur_cpu_spec->cpu_features & CPU_FTR_SLB) + if (cpu_has_feature(CPU_FTR_SLB)) mem_blocks = iSeries_process_Regatta_mainstore_vpd(mb_array, max_entries); else diff -puN arch/ppc64/kernel/idle.c~cpu-has-feature arch/ppc64/kernel/idle.c diff -puN
arch/ppc64/kernel/process.c~cpu-has-feature arch/ppc64/kernel/process.c --- linux-2.5/arch/ppc64/kernel/process.c~cpu-has-feature 2005-02-05 11:11:03.600598872 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/process.c 2005-02-05 11:11:04.119519984 -0600 @@ -388,12 +388,12 @@ copy_thread(int nr, unsigned long clone_ kregs = (struct pt_regs *) sp; sp -= STACK_FRAME_OVERHEAD; p->thread.ksp = sp; - if (cur_cpu_spec->cpu_features & CPU_FTR_SLB) { + if (cpu_has_feature(CPU_FTR_SLB)) { unsigned long sp_vsid = get_kernel_vsid(sp); sp_vsid <<= SLB_VSID_SHIFT; sp_vsid |= SLB_VSID_KERNEL; - if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) + if (cpu_has_feature(CPU_FTR_16M_PAGE)) sp_vsid |= SLB_VSID_L; p->thread.ksp_vsid = sp_vsid; diff -puN arch/ppc64/kernel/smp.c~cpu-has-feature arch/ppc64/kernel/smp.c --- linux-2.5/arch/ppc64/kernel/smp.c~cpu-has-feature 2005-02-05 11:11:03.606597960 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/smp.c 2005-02-05 11:11:04.120519832 -0600 @@ -416,7 +416,7 @@ int __devinit __cpu_up(unsigned int cpu) paca[cpu].default_decr = tb_ticks_per_jiffy / decr_overclock; - if (!(cur_cpu_spec->cpu_features & CPU_FTR_SLB)) { + if (!cpu_has_feature(CPU_FTR_SLB)) { void *tmp; /* maximum of 48 CPUs on machines with a segment table */ diff -puN arch/ppc64/kernel/sysfs.c~cpu-has-feature arch/ppc64/kernel/sysfs.c --- linux-2.5/arch/ppc64/kernel/sysfs.c~cpu-has-feature 2005-02-05 11:11:03.609597504 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/sysfs.c 2005-02-05 11:11:04.121519680 -0600 @@ -63,7 +63,7 @@ static int __init smt_setup(void) unsigned int *val; unsigned int cpu; - if (!cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (!cpu_has_feature(CPU_FTR_SMT)) return 1; options = find_path_device("/options"); @@ -86,7 +86,7 @@ static int __init setup_smt_snooze_delay unsigned int cpu; int snooze; - if (!cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (!cpu_has_feature(CPU_FTR_SMT)) return 1; smt_snooze_cmdline = 1; @@ -167,7 +167,7 @@ void ppc64_enable_pmcs(void) * On SMT 
machines we have to set the run latch in the ctrl register * in order to make PMC6 spin. */ - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) { + if (cpu_has_feature(CPU_FTR_SMT)) { ctrl = mfspr(CTRLF); ctrl |= RUNLATCH; mtspr(CTRLT, ctrl); @@ -266,7 +266,7 @@ static void register_cpu_online(unsigned struct sys_device *s = &c->sysdev; #ifndef CONFIG_PPC_ISERIES - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (cpu_has_feature(CPU_FTR_SMT)) sysdev_create_file(s, &attr_smt_snooze_delay); #endif @@ -275,7 +275,7 @@ static void register_cpu_online(unsigned sysdev_create_file(s, &attr_mmcr0); sysdev_create_file(s, &attr_mmcr1); - if (cur_cpu_spec->cpu_features & CPU_FTR_MMCRA) + if (cpu_has_feature(CPU_FTR_MMCRA)) sysdev_create_file(s, &attr_mmcra); sysdev_create_file(s, &attr_pmc1); @@ -285,12 +285,12 @@ static void register_cpu_online(unsigned sysdev_create_file(s, &attr_pmc5); sysdev_create_file(s, &attr_pmc6); - if (cur_cpu_spec->cpu_features & CPU_FTR_PMC8) { + if (cpu_has_feature(CPU_FTR_PMC8)) { sysdev_create_file(s, &attr_pmc7); sysdev_create_file(s, &attr_pmc8); } - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (cpu_has_feature(CPU_FTR_SMT)) sysdev_create_file(s, &attr_purr); } @@ -303,7 +303,7 @@ static void unregister_cpu_online(unsign BUG_ON(c->no_control); #ifndef CONFIG_PPC_ISERIES - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (cpu_has_feature(CPU_FTR_SMT)) sysdev_remove_file(s, &attr_smt_snooze_delay); #endif @@ -312,7 +312,7 @@ static void unregister_cpu_online(unsign sysdev_remove_file(s, &attr_mmcr0); sysdev_remove_file(s, &attr_mmcr1); - if (cur_cpu_spec->cpu_features & CPU_FTR_MMCRA) + if (cpu_has_feature(CPU_FTR_MMCRA)) sysdev_remove_file(s, &attr_mmcra); sysdev_remove_file(s, &attr_pmc1); @@ -322,12 +322,12 @@ static void unregister_cpu_online(unsign sysdev_remove_file(s, &attr_pmc5); sysdev_remove_file(s, &attr_pmc6); - if (cur_cpu_spec->cpu_features & CPU_FTR_PMC8) { + if (cpu_has_feature(CPU_FTR_PMC8)) { sysdev_remove_file(s, 
&attr_pmc7); sysdev_remove_file(s, &attr_pmc8); } - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (cpu_has_feature(CPU_FTR_SMT)) sysdev_remove_file(s, &attr_purr); } #endif /* CONFIG_HOTPLUG_CPU */ diff -puN arch/ppc64/mm/hash_native.c~cpu-has-feature arch/ppc64/mm/hash_native.c --- linux-2.5/arch/ppc64/mm/hash_native.c~cpu-has-feature 2005-02-05 11:11:03.653590816 -0600 +++ linux-2.5-olof/arch/ppc64/mm/hash_native.c 2005-02-05 11:11:04.122519528 -0600 @@ -217,10 +217,10 @@ static long native_hpte_updatepp(unsigne } /* Ensure it is out of the tlb too */ - if ((cur_cpu_spec->cpu_features & CPU_FTR_TLBIEL) && !large && local) { + if (cpu_has_feature(CPU_FTR_TLBIEL) && !large && local) { tlbiel(va); } else { - int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); + int lock_tlbie = !cpu_has_feature(CPU_FTR_LOCKLESS_TLBIE); if (lock_tlbie) spin_lock(&native_tlbie_lock); @@ -245,7 +245,7 @@ static void native_hpte_updateboltedpp(u unsigned long vsid, va, vpn, flags = 0; long slot; HPTE *hptep; - int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); + int lock_tlbie = !cpu_has_feature(CPU_FTR_LOCKLESS_TLBIE); vsid = get_kernel_vsid(ea); va = (vsid << 28) | (ea & 0x0fffffff); @@ -273,7 +273,7 @@ static void native_hpte_invalidate(unsig Hpte_dword0 dw0; unsigned long avpn = va >> 23; unsigned long flags; - int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); + int lock_tlbie = !cpu_has_feature(CPU_FTR_LOCKLESS_TLBIE); if (large) avpn &= ~0x1UL; @@ -292,7 +292,7 @@ static void native_hpte_invalidate(unsig } /* Invalidate the tlb */ - if ((cur_cpu_spec->cpu_features & CPU_FTR_TLBIEL) && !large && local) { + if (cpu_has_feature(CPU_FTR_TLBIEL) && !large && local) { tlbiel(va); } else { if (lock_tlbie) @@ -360,7 +360,7 @@ static void native_flush_hash_range(unsi j++; } - if ((cur_cpu_spec->cpu_features & CPU_FTR_TLBIEL) && !large && local) { + if (cpu_has_feature(CPU_FTR_TLBIEL) && !large && local) { asm 
volatile("ptesync":::"memory"); for (i = 0; i < j; i++) @@ -368,7 +368,7 @@ static void native_flush_hash_range(unsi asm volatile("ptesync":::"memory"); } else { - int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); + int lock_tlbie = !cpu_has_feature(CPU_FTR_LOCKLESS_TLBIE); if (lock_tlbie) spin_lock(&native_tlbie_lock); diff -puN arch/ppc64/mm/hash_utils.c~cpu-has-feature arch/ppc64/mm/hash_utils.c --- linux-2.5/arch/ppc64/mm/hash_utils.c~cpu-has-feature 2005-02-05 11:11:03.656590360 -0600 +++ linux-2.5-olof/arch/ppc64/mm/hash_utils.c 2005-02-05 11:11:04.123519376 -0600 @@ -190,7 +190,7 @@ void __init htab_initialize(void) * _NOT_ map it to avoid cache paradoxes as it's remapped non * cacheable later on */ - if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) + if (cpu_has_feature(CPU_FTR_16M_PAGE)) use_largepages = 1; /* create bolted the linear mapping in the hash table */ diff -puN arch/ppc64/mm/hugetlbpage.c~cpu-has-feature arch/ppc64/mm/hugetlbpage.c --- linux-2.5/arch/ppc64/mm/hugetlbpage.c~cpu-has-feature 2005-02-05 11:11:03.674587624 -0600 +++ linux-2.5-olof/arch/ppc64/mm/hugetlbpage.c 2005-02-05 11:11:04.123519376 -0600 @@ -705,7 +705,7 @@ unsigned long hugetlb_get_unmapped_area( if (len & ~HPAGE_MASK) return -EINVAL; - if (!(cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE)) + if (!cpu_has_feature(CPU_FTR_16M_PAGE)) return -EINVAL; if (test_thread_flag(TIF_32BIT)) { diff -puN arch/ppc64/mm/init.c~cpu-has-feature arch/ppc64/mm/init.c --- linux-2.5/arch/ppc64/mm/init.c~cpu-has-feature 2005-02-05 11:11:03.680586712 -0600 +++ linux-2.5-olof/arch/ppc64/mm/init.c 2005-02-05 11:11:04.124519224 -0600 @@ -752,7 +752,7 @@ void __init mem_init(void) */ void flush_dcache_page(struct page *page) { - if (cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE) + if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) return; /* avoid an atomic op if possible */ if (test_bit(PG_arch_1, &page->flags)) @@ -763,7 +763,7 @@ void clear_user_page(void *page, unsigne { 
clear_page(page); - if (cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE) + if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) return; /* * We shouldnt have to do this, but some versions of glibc @@ -796,7 +796,7 @@ void copy_user_page(void *vto, void *vfr return; #endif - if (cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE) + if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) return; /* avoid an atomic op if possible */ @@ -832,8 +832,8 @@ void update_mmu_cache(struct vm_area_str unsigned long flags; /* handle i-cache coherency */ - if (!(cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE) && - !(cur_cpu_spec->cpu_features & CPU_FTR_NOEXECUTE)) { + if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE) && + !cpu_has_feature(CPU_FTR_NOEXECUTE)) { unsigned long pfn = pte_pfn(pte); if (pfn_valid(pfn)) { struct page *page = pfn_to_page(pfn); diff -puN arch/ppc64/mm/slb.c~cpu-has-feature arch/ppc64/mm/slb.c --- linux-2.5/arch/ppc64/mm/slb.c~cpu-has-feature 2005-02-05 11:11:03.683586256 -0600 +++ linux-2.5-olof/arch/ppc64/mm/slb.c 2005-02-05 11:11:04.125519072 -0600 @@ -51,7 +51,7 @@ static void slb_flush_and_rebolt(void) WARN_ON(!irqs_disabled()); - if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) + if (cpu_has_feature(CPU_FTR_16M_PAGE)) ksp_flags |= SLB_VSID_L; ksp_esid_data = mk_esid_data(get_paca()->kstack, 2); @@ -139,7 +139,7 @@ void slb_initialize(void) unsigned long flags = SLB_VSID_KERNEL; /* Invalidate the entire SLB (even slot 0) & all the ERATS */ - if (cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) + if (cpu_has_feature(CPU_FTR_16M_PAGE)) flags |= SLB_VSID_L; asm volatile("isync":::"memory"); diff -puN arch/ppc64/mm/stab.c~cpu-has-feature arch/ppc64/mm/stab.c --- linux-2.5/arch/ppc64/mm/stab.c~cpu-has-feature 2005-02-05 11:11:03.704583064 -0600 +++ linux-2.5-olof/arch/ppc64/mm/stab.c 2005-02-05 11:11:04.125519072 -0600 @@ -227,7 +227,7 @@ void stab_initialize(unsigned long stab) { unsigned long vsid = get_kernel_vsid(KERNELBASE); - if (cur_cpu_spec->cpu_features 
& CPU_FTR_SLB) { + if (cpu_has_feature(CPU_FTR_SLB)) { slb_initialize(); } else { asm volatile("isync; slbia; isync":::"memory"); diff -puN arch/ppc64/oprofile/op_model_power4.c~cpu-has-feature arch/ppc64/oprofile/op_model_power4.c --- linux-2.5/arch/ppc64/oprofile/op_model_power4.c~cpu-has-feature 2005-02-05 11:11:03.764573944 -0600 +++ linux-2.5-olof/arch/ppc64/oprofile/op_model_power4.c 2005-02-05 11:11:04.126518920 -0600 @@ -54,7 +54,7 @@ static void power4_reg_setup(struct op_c * * It has been verified to work on POWER5 so we enable it there. */ - if (cur_cpu_spec->cpu_features & CPU_FTR_MMCRA_SIHV) + if (cpu_has_feature(CPU_FTR_MMCRA_SIHV)) mmcra_has_sihv = 1; /* diff -puN arch/ppc64/oprofile/op_model_rs64.c~cpu-has-feature arch/ppc64/oprofile/op_model_rs64.c --- linux-2.5/arch/ppc64/oprofile/op_model_rs64.c~cpu-has-feature 2005-02-05 11:11:03.768573336 -0600 +++ linux-2.5-olof/arch/ppc64/oprofile/op_model_rs64.c 2005-02-05 11:11:04.126518920 -0600 @@ -114,7 +114,7 @@ static void rs64_cpu_setup(void *unused) /* reset MMCR1, MMCRA */ mtspr(SPRN_MMCR1, 0); - if (cur_cpu_spec->cpu_features & CPU_FTR_MMCRA) + if (cpu_has_feature(CPU_FTR_MMCRA)) mtspr(SPRN_MMCRA, 0); mmcr0 |= MMCR0_FCM1|MMCR0_PMXE|MMCR0_FCECE; diff -puN arch/ppc64/xmon/xmon.c~cpu-has-feature arch/ppc64/xmon/xmon.c --- linux-2.5/arch/ppc64/xmon/xmon.c~cpu-has-feature 2005-02-05 11:11:03.814566344 -0600 +++ linux-2.5-olof/arch/ppc64/xmon/xmon.c 2005-02-05 11:11:04.128518616 -0600 @@ -723,7 +723,7 @@ static void insert_cpu_bpts(void) { if (dabr.enabled) set_controlled_dabr(dabr.address | (dabr.enabled & 7)); - if (iabr && (cur_cpu_spec->cpu_features & CPU_FTR_IABR)) + if (iabr && cpu_has_feature(CPU_FTR_IABR)) set_iabr(iabr->address | (iabr->enabled & (BP_IABR|BP_IABR_TE))); } @@ -751,7 +751,7 @@ static void remove_bpts(void) static void remove_cpu_bpts(void) { set_controlled_dabr(0); - if ((cur_cpu_spec->cpu_features & CPU_FTR_IABR)) + if (cpu_has_feature(CPU_FTR_IABR)) set_iabr(0); } @@ -1098,7 
+1098,7 @@ bpt_cmds(void) break; case 'i': /* bi - hardware instr breakpoint */ - if (!(cur_cpu_spec->cpu_features & CPU_FTR_IABR)) { + if (!cpu_has_feature(CPU_FTR_IABR)) { printf("Hardware instruction breakpoint " "not supported on this cpu\n"); break; @@ -2496,7 +2496,7 @@ void xmon_init(void) void dump_segments(void) { - if (cur_cpu_spec->cpu_features & CPU_FTR_SLB) + if (cpu_has_feature(CPU_FTR_SLB)) dump_slb(); else dump_stab(); diff -puN include/asm-ppc64/cacheflush.h~cpu-has-feature include/asm-ppc64/cacheflush.h --- linux-2.5/include/asm-ppc64/cacheflush.h~cpu-has-feature 2005-02-05 11:11:03.836563000 -0600 +++ linux-2.5-olof/include/asm-ppc64/cacheflush.h 2005-02-05 11:11:04.129518464 -0600 @@ -40,7 +40,7 @@ extern void __flush_dcache_icache(void * static inline void flush_icache_range(unsigned long start, unsigned long stop) { - if (!(cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE)) + if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) __flush_icache_range(start, stop); } diff -puN include/asm-ppc64/mmu_context.h~cpu-has-feature include/asm-ppc64/mmu_context.h --- linux-2.5/include/asm-ppc64/mmu_context.h~cpu-has-feature 2005-02-05 11:11:03.841562240 -0600 +++ linux-2.5-olof/include/asm-ppc64/mmu_context.h 2005-02-05 11:11:04.129518464 -0600 @@ -59,11 +59,11 @@ static inline void switch_mm(struct mm_s return; #ifdef CONFIG_ALTIVEC - if (cur_cpu_spec->cpu_features & CPU_FTR_ALTIVEC) + if (cpu_has_feature(CPU_FTR_ALTIVEC)) asm volatile ("dssall"); #endif /* CONFIG_ALTIVEC */ - if (cur_cpu_spec->cpu_features & CPU_FTR_SLB) + if (cpu_has_feature(CPU_FTR_SLB)) switch_slb(tsk, next); else switch_stab(tsk, next); diff -puN include/asm-ppc64/page.h~cpu-has-feature include/asm-ppc64/page.h --- linux-2.5/include/asm-ppc64/page.h~cpu-has-feature 2005-02-05 11:11:03.845561632 -0600 +++ linux-2.5-olof/include/asm-ppc64/page.h 2005-02-05 11:11:04.130518312 -0600 @@ -67,7 +67,7 @@ #define HAVE_ARCH_HUGETLB_UNMAPPED_AREA #define in_hugepage_area(context, addr) \ - 
((cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) && \ + (cpu_has_feature(CPU_FTR_16M_PAGE) && \ ( (((addr) >= TASK_HPAGE_BASE) && ((addr) < TASK_HPAGE_END)) || \ ( ((addr) < 0x100000000L) && \ ((1 << GET_ESID(addr)) & (context).htlb_segs) ) ) ) diff -puN arch/ppc64/kernel/pSeries_lpar.c~cpu-has-feature arch/ppc64/kernel/pSeries_lpar.c --- linux-2.5/arch/ppc64/kernel/pSeries_lpar.c~cpu-has-feature 2005-02-05 11:11:03.848561176 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/pSeries_lpar.c 2005-02-05 11:11:04.130518312 -0600 @@ -505,7 +505,7 @@ void pSeries_lpar_flush_hash_range(unsig int i; unsigned long flags = 0; struct ppc64_tlb_batch *batch = &__get_cpu_var(ppc64_tlb_batch); - int lock_tlbie = !(cur_cpu_spec->cpu_features & CPU_FTR_LOCKLESS_TLBIE); + int lock_tlbie = !cpu_has_feature(CPU_FTR_LOCKLESS_TLBIE); if (lock_tlbie) spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags); diff -puN arch/ppc64/kernel/setup.c~cpu-has-feature arch/ppc64/kernel/setup.c --- linux-2.5/arch/ppc64/kernel/setup.c~cpu-has-feature 2005-02-05 11:11:03.853560416 -0600 +++ linux-2.5-olof/arch/ppc64/kernel/setup.c 2005-02-05 11:11:04.132518008 -0600 @@ -315,7 +315,7 @@ static void __init setup_cpu_maps(void) maxcpus = ireg[num_addr_cell + num_size_cell]; /* Double maxcpus for processors which have SMT capability */ - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (cpu_has_feature(CPU_FTR_SMT)) maxcpus *= 2; if (maxcpus > NR_CPUS) { @@ -339,7 +339,7 @@ static void __init setup_cpu_maps(void) */ for_each_cpu(cpu) { cpu_set(cpu, cpu_sibling_map[cpu]); - if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + if (cpu_has_feature(CPU_FTR_SMT)) cpu_set(cpu ^ 0x1, cpu_sibling_map[cpu]); } @@ -767,7 +767,7 @@ static int show_cpuinfo(struct seq_file seq_printf(m, "unknown (%08x)", pvr); #ifdef CONFIG_ALTIVEC - if (cur_cpu_spec->cpu_features & CPU_FTR_ALTIVEC) + if (cpu_has_feature(CPU_FTR_ALTIVEC)) seq_printf(m, ", altivec supported"); #endif /* CONFIG_ALTIVEC */ diff -puN 
drivers/macintosh/via-pmu.c~cpu-has-feature drivers/macintosh/via-pmu.c --- linux-2.5/drivers/macintosh/via-pmu.c~cpu-has-feature 2005-02-05 11:11:03.895554032 -0600 +++ linux-2.5-olof/drivers/macintosh/via-pmu.c 2005-02-05 11:11:04.134517704 -0600 @@ -2389,7 +2389,7 @@ pmac_suspend_devices(void) enable_kernel_fp(); #ifdef CONFIG_ALTIVEC - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_ALTIVEC) + if (cpu_has_feature(CPU_FTR_ALTIVEC)) enable_kernel_altivec(); #endif /* CONFIG_ALTIVEC */ diff -puN include/asm-ppc/cputable.h~cpu-has-feature include/asm-ppc/cputable.h --- linux-2.5/include/asm-ppc/cputable.h~cpu-has-feature 2005-02-05 11:11:03.919550384 -0600 +++ linux-2.5-olof/include/asm-ppc/cputable.h 2005-02-05 21:14:25.443082280 -0600 @@ -61,6 +61,11 @@ struct cpu_spec { extern struct cpu_spec cpu_specs[]; extern struct cpu_spec *cur_cpu_spec[]; +static inline unsigned int cpu_has_feature(unsigned int feature) +{ + return cur_cpu_spec[0]->cpu_features & feature; +} + #endif /* __ASSEMBLY__ */ /* CPU kernel features */ diff -puN arch/ppc/mm/ppc_mmu.c~cpu-has-feature arch/ppc/mm/ppc_mmu.c --- linux-2.5/arch/ppc/mm/ppc_mmu.c~cpu-has-feature 2005-02-05 11:11:03.976541720 -0600 +++ linux-2.5-olof/arch/ppc/mm/ppc_mmu.c 2005-02-05 11:11:04.136517400 -0600 @@ -138,7 +138,7 @@ void __init setbat(int index, unsigned l union ubat *bat = BATS[index]; if (((flags & _PAGE_NO_CACHE) == 0) && - (cur_cpu_spec[0]->cpu_features & CPU_FTR_NEED_COHERENT)) + cpu_has_feature(CPU_FTR_NEED_COHERENT)) flags |= _PAGE_COHERENT; bl = (size >> 17) - 1; @@ -191,7 +191,7 @@ void __init MMU_init_hw(void) extern unsigned int hash_page[]; extern unsigned int flush_hash_patch_A[], flush_hash_patch_B[]; - if ((cur_cpu_spec[0]->cpu_features & CPU_FTR_HPTE_TABLE) == 0) { + if (!cpu_has_feature(CPU_FTR_HPTE_TABLE)) { /* * Put a blr (procedure return) instruction at the * start of hash_page, since we can still get DSI diff -puN arch/ppc/mm/mmu_decl.h~cpu-has-feature arch/ppc/mm/mmu_decl.h --- 
linux-2.5/arch/ppc/mm/mmu_decl.h~cpu-has-feature 2005-02-05 11:11:03.979541264 -0600 +++ linux-2.5-olof/arch/ppc/mm/mmu_decl.h 2005-02-05 11:11:04.136517400 -0600 @@ -75,7 +75,7 @@ static inline void flush_HPTE(unsigned c unsigned long pdval) { if ((Hash != 0) && - (cur_cpu_spec[0]->cpu_features & CPU_FTR_HPTE_TABLE)) + cpu_has_feature(CPU_FTR_HPTE_TABLE)) flush_hash_pages(0, va, pdval, 1); else _tlbie(va); diff -puN arch/ppc/kernel/setup.c~cpu-has-feature arch/ppc/kernel/setup.c --- linux-2.5/arch/ppc/kernel/setup.c~cpu-has-feature 2005-02-05 11:11:04.018535336 -0600 +++ linux-2.5-olof/arch/ppc/kernel/setup.c 2005-02-05 11:11:04.137517248 -0600 @@ -619,7 +619,7 @@ machine_init(unsigned long r3, unsigned /* Checks "l2cr=xxxx" command-line option */ int __init ppc_setup_l2cr(char *str) { - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_L2CR) { + if (cpu_has_feature(CPU_FTR_L2CR)) { unsigned long val = simple_strtoul(str, NULL, 0); printk(KERN_INFO "l2cr set to %lx\n", val); _set_L2CR(0); /* force invalidate by disable cache */ @@ -720,7 +720,7 @@ void __init setup_arch(char **cmdline_p) * Systems with OF can look in the properties on the cpu node(s) * for a possibly more accurate value. 
*/ - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_SPLIT_ID_CACHE) { + if (cpu_has_feature(CPU_FTR_SPLIT_ID_CACHE)) { dcache_bsize = cur_cpu_spec[0]->dcache_bsize; icache_bsize = cur_cpu_spec[0]->icache_bsize; ucache_bsize = 0; diff -puN arch/ppc/kernel/temp.c~cpu-has-feature arch/ppc/kernel/temp.c --- linux-2.5/arch/ppc/kernel/temp.c~cpu-has-feature 2005-02-05 11:11:04.024534424 -0600 +++ linux-2.5-olof/arch/ppc/kernel/temp.c 2005-02-05 11:11:04.137517248 -0600 @@ -223,7 +223,7 @@ int __init TAU_init(void) /* We assume in SMP that if one CPU has TAU support, they * all have it --BenH */ - if (!(cur_cpu_spec[0]->cpu_features & CPU_FTR_TAU)) { + if (!cpu_has_feature(CPU_FTR_TAU)) { printk("Thermal assist unit not available\n"); tau_initialized = 0; return 1; diff -puN arch/ppc/platforms/pmac_cpufreq.c~cpu-has-feature arch/ppc/platforms/pmac_cpufreq.c --- linux-2.5/arch/ppc/platforms/pmac_cpufreq.c~cpu-has-feature 2005-02-05 11:11:04.064528344 -0600 +++ linux-2.5-olof/arch/ppc/platforms/pmac_cpufreq.c 2005-02-05 11:11:04.138517096 -0600 @@ -230,7 +230,7 @@ static int __pmac pmu_set_cpu_speed(int enable_kernel_fp(); #ifdef CONFIG_ALTIVEC - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_ALTIVEC) + if (cpu_has_feature(CPU_FTR_ALTIVEC)) enable_kernel_altivec(); #endif /* CONFIG_ALTIVEC */ diff -puN arch/ppc/platforms/pmac_setup.c~cpu-has-feature arch/ppc/platforms/pmac_setup.c --- linux-2.5/arch/ppc/platforms/pmac_setup.c~cpu-has-feature 2005-02-05 11:11:04.068527736 -0600 +++ linux-2.5-olof/arch/ppc/platforms/pmac_setup.c 2005-02-05 11:11:04.139516944 -0600 @@ -274,7 +274,7 @@ pmac_setup_arch(void) pmac_find_bridges(); /* Checks "l2cr-value" property in the registry */ - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_L2CR) { + if (cpu_has_feature(CPU_FTR_L2CR)) { struct device_node *np = find_devices("cpus"); if (np == 0) np = find_type_devices("cpu"); diff -puN arch/ppc/platforms/pmac_smp.c~cpu-has-feature arch/ppc/platforms/pmac_smp.c --- 
linux-2.5/arch/ppc/platforms/pmac_smp.c~cpu-has-feature 2005-02-05 11:11:04.071527280 -0600 +++ linux-2.5-olof/arch/ppc/platforms/pmac_smp.c 2005-02-05 11:11:04.139516944 -0600 @@ -119,7 +119,7 @@ static volatile int sec_tb_reset = 0; static void __init core99_init_caches(int cpu) { - if (!(cur_cpu_spec[0]->cpu_features & CPU_FTR_L2CR)) + if (!cpu_has_feature(CPU_FTR_L2CR)) return; if (cpu == 0) { @@ -132,7 +132,7 @@ static void __init core99_init_caches(in printk("CPU%d: L2CR set to %lx\n", cpu, core99_l2_cache); } - if (!(cur_cpu_spec[0]->cpu_features & CPU_FTR_L3CR)) + if (!cpu_has_feature(CPU_FTR_L3CR)) return; if (cpu == 0){ diff -puN arch/ppc/platforms/sandpoint.c~cpu-has-feature arch/ppc/platforms/sandpoint.c --- linux-2.5/arch/ppc/platforms/sandpoint.c~cpu-has-feature 2005-02-05 11:11:04.074526824 -0600 +++ linux-2.5-olof/arch/ppc/platforms/sandpoint.c 2005-02-05 11:11:04.140516792 -0600 @@ -319,10 +319,10 @@ sandpoint_setup_arch(void) * We will do this now with good known values. Future versions * of DINK32 are supposed to get this correct. */ - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_SPEC7450) + if (cpu_has_feature(CPU_FTR_SPEC7450)) /* 745x is different. We only want to pass along enable. */ _set_L2CR(L2CR_L2E); - else if (cur_cpu_spec[0]->cpu_features & CPU_FTR_L2CR) + else if (cpu_has_feature(CPU_FTR_L2CR)) /* All modules have 1MB of L2. We also assume that an * L2 divisor of 3 will work. */ @@ -330,7 +330,7 @@ sandpoint_setup_arch(void) | L2CR_L2RAM_PIPE | L2CR_L2OH_1_0 | L2CR_L2DF); #if 0 /* Untested right now. */ - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_L3CR) { + if (cpu_has_feature(CPU_FTR_L3CR)) { /* Magic value. 
*/ _set_L3CR(0x8f032000); } diff -puN arch/ppc/kernel/ppc_htab.c~cpu-has-feature arch/ppc/kernel/ppc_htab.c --- linux-2.5/arch/ppc/kernel/ppc_htab.c~cpu-has-feature 2005-02-05 11:11:04.077526368 -0600 +++ linux-2.5-olof/arch/ppc/kernel/ppc_htab.c 2005-02-05 11:11:04.141516640 -0600 @@ -108,7 +108,7 @@ static int ppc_htab_show(struct seq_file PTE *ptr; #endif /* CONFIG_PPC_STD_MMU */ - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_604_PERF_MON) { + if (cpu_has_feature(CPU_FTR_604_PERF_MON)) { mmcr0 = mfspr(SPRN_MMCR0); pmc1 = mfspr(SPRN_PMC1); pmc2 = mfspr(SPRN_PMC2); @@ -209,7 +209,7 @@ static ssize_t ppc_htab_write(struct fil if ( !strncmp( buffer, "reset", 5) ) { - if (cur_cpu_spec[0]->cpu_features & CPU_FTR_604_PERF_MON) { + if (cpu_has_feature(CPU_FTR_604_PERF_MON)) { /* reset PMC1 and PMC2 */ mtspr(SPRN_PMC1, 0); mtspr(SPRN_PMC2, 0); @@ -221,7 +221,7 @@ static ssize_t ppc_htab_write(struct fil } /* Everything below here requires the performance monitor feature. */ - if ( !cur_cpu_spec[0]->cpu_features & CPU_FTR_604_PERF_MON ) + if (!cpu_has_feature(CPU_FTR_604_PERF_MON)) return count; /* turn off performance monitoring */ @@ -339,7 +339,7 @@ int proc_dol2crvec(ctl_table *table, int "0.5", "1.0", "(reserved2)", "(reserved3)" }; - if (!(cur_cpu_spec[0]->cpu_features & CPU_FTR_L2CR)) + if (!cpu_has_feature(CPU_FTR_L2CR)) return -EFAULT; if ( /*!table->maxlen ||*/ (*ppos && !write)) { diff -puN drivers/md/raid6altivec.uc~cpu-has-feature drivers/md/raid6altivec.uc --- linux-2.5/drivers/md/raid6altivec.uc~cpu-has-feature 2005-02-05 11:11:04.081525760 -0600 +++ linux-2.5-olof/drivers/md/raid6altivec.uc 2005-02-05 11:11:05.007385008 -0600 @@ -108,7 +108,7 @@ int raid6_have_altivec(void); int raid6_have_altivec(void) { /* This assumes either all CPUs have Altivec or none does */ - return cur_cpu_spec->cpu_features & CPU_FTR_ALTIVEC; + return cpu_has_feature(CPU_FTR_ALTIVEC); } #endif _ From miltonm at bga.com Sun Feb 6 19:10:55 2005 From: miltonm at bga.com (Milton
Miller) Date: Sun, 6 Feb 2005 02:10:55 -0600 Subject: [PATCH] ppc64: Implement a vDSO and use it for signal trampoline #2 In-Reply-To: <1107151447.5712.81.camel@gaston> References: <1107151447.5712.81.camel@gaston> Message-ID: <5fcc65f08b15fa6d1d69f75d7c4dc989@bga.com> Benjamin Herrenschmidt wrote: > Index: linux-work/arch/ppc64/kernel/vdso32/vdso32_wrapper.S > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ linux-work/arch/ppc64/kernel/vdso32/vdso32_wrapper.S > 2005-01-31 16:25:56.000000000 +1100 > @@ -0,0 +1,12 @@ > +#include > + > + .section ".data" > + > + .globl vdso32_start, vdso32_end > + .balign 4096 > +vdso32_start: > + .incbin "arch/ppc64/kernel/vdso32/vdso32.so" > + .balign 4096 > +vdso32_end: > + > + .previous > Index: linux-work/arch/ppc64/kernel/vdso64/vdso64_wrapper.S > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ linux-work/arch/ppc64/kernel/vdso64/vdso64_wrapper.S > 2005-01-31 16:25:56.000000000 +1100 > @@ -0,0 +1,12 @@ > +#include > + > + .section ".data" > + > + .globl vdso64_start, vdso64_end > + .balign 4096 > +vdso64_start: > + .incbin "arch/ppc64/kernel/vdso64/vdso64.so" > + .balign 4096 > +vdso64_end: > + > + .previous How about putting these with the other page_aligned data, with .section ".data.page_aligned" Also, I don't see anything from being used, although including to get PAGE_SIZE would eliminate the magic 4096 number.
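To illustrate the PAGE_SIZE point, here is a rough user-space C sketch. It is not taken from the patch. The alignment comes from one named constant rather than a repeated 4096. The names PAGE_SHIFT and PAGE_SIZE mirror the kernel's, but the 4K value, the blob_start/blob_end buffers and both helper functions are assumptions made purely for illustration.

```c
#include <stdint.h>

/* Assumed for illustration: 4K pages. The kernel defines these in its own headers. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Hypothetical stand-ins for the page-aligned start/end markers of an embedded image. */
static _Alignas(PAGE_SIZE) unsigned char blob_start[PAGE_SIZE];
static _Alignas(PAGE_SIZE) unsigned char blob_end[1];

/* Returns 1 if p sits on a page boundary, 0 otherwise. */
static int is_page_aligned(const void *p)
{
	return ((uintptr_t)p & (PAGE_SIZE - 1)) == 0;
}

/* Rounds len up to a whole number of pages. */
static unsigned long page_align_up(unsigned long len)
{
	return (len + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}
```

If the page size ever changed, only the PAGE_SHIFT definition would need touching; the alignment requests and the checks follow from it.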
milton From benh at kernel.crashing.org Sun Feb 6 21:30:06 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 06 Feb 2005 21:30:06 +1100 Subject: [PATCH] ppc64: Implement a vDSO and use it for signal trampoline #2 In-Reply-To: <5fcc65f08b15fa6d1d69f75d7c4dc989@bga.com> References: <1107151447.5712.81.camel@gaston> <5fcc65f08b15fa6d1d69f75d7c4dc989@bga.com> Message-ID: <1107685806.30303.50.camel@gaston> On Sun, 2005-02-06 at 02:10 -0600, Milton Miller wrote: > How about putting these with the other page_aligned data, with > .section ".data.page_aligned" > > Also, I don't see anything from being used, although > including to get PAGE_SIZE would eliminate the magic > 4096 number. Good point. The wrapper is indeed normally linked with the rest of the kernel. This was quick and dirty stuff a bit overlooked, I'll fix it, thanks for noticing. Ben. From arnd at arndb.de Sun Feb 6 22:57:34 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Sun, 6 Feb 2005 12:57:34 +0100 Subject: [PATCH] PPC/PPC64: Abstract cpu_feature checks. In-Reply-To: <20050206032645.GA18845@austin.ibm.com> References: <20050204072254.GA17565@austin.ibm.com> <20050205184647.GA17417@austin.ibm.com> <20050206032645.GA18845@austin.ibm.com> Message-ID: <200502061257.40798.arnd@arndb.de> On Sünndag 06 Februar 2005 04:26, Olof Johansson wrote: > > Abstract most manual mask checks of cpu_features with cpu_has_feature() > Just to get back to the point of consistent naming: In case we do the other proposed changes as well, is everyone happy with the following function names?
cpu_has_feature(CPU_FTR_X)        cur_cpu_spec->cpu_features & CPU_FTR_X
cpu_feature_possible(CPU_FTR_X)   CPU_FTR_POSSIBLE_MASK & CPU_FTR_X
fw_has_feature(FW_FEATURE_X)      cur_cpu_spec->fw_features & FW_FTR_X
platform_is(PLATFORM_X)           systemcfg->platform == PLATFORM_X
platform_possible(PLATFORM_X)     PLATFORM_POSSIBLE_MASK & PLATFORM_X
platform_compatible(PLATFORM_X)   systemcfg->platform & PLATFORM_X
It's not as consistent as I'd like it to be, but it's the best I could come up with. Arnd <>< From olh at suse.de Mon Feb 7 02:26:12 2005 From: olh at suse.de (Olaf Hering) Date: Sun, 6 Feb 2005 16:26:12 +0100 Subject: [PATCH] remove unneeded includes from pSeries_nvram.c Message-ID: <20050206152612.GA13354@suse.de> The pseries nvram driver probably started as a copy of nvram.c. These includes are not needed to build it. Signed-off-by: Olaf Hering diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pSeries_nvram.c ./arch/ppc64/kernel/pSeries_nvram.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pSeries_nvram.c 2005-02-03 02:56:33.000000000 +0100 +++ ./arch/ppc64/kernel/pSeries_nvram.c 2005-02-06 14:57:23.000000000 +0100 @@ -11,14 +11,9 @@ * This perhaps should live in drivers/char */ -#include #include #include -#include -#include -#include -#include #include #include #include From olh at suse.de Mon Feb 7 02:28:11 2005 From: olh at suse.de (Olaf Hering) Date: Sun, 6 Feb 2005 16:28:11 +0100 Subject: [PATCH] hide plpar_hcall_norets call in xmon Message-ID: <20050206152811.GB13354@suse.de> plpar_hcall_norets() is only available if pseries is selected in .config. The maple defconfig doesn't build right now because it has xmon enabled.
Signed-off-by: Olaf Hering diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/xmon/xmon.c ./arch/ppc64/xmon/xmon.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/xmon/xmon.c 2005-02-03 02:56:35.000000000 +0100 +++ ./arch/ppc64/xmon/xmon.c 2005-02-06 15:44:50.643489636 +0100 @@ -629,9 +629,11 @@ int xmon_fault_handler(struct pt_regs *r static void set_controlled_dabr(unsigned long val) { if (systemcfg->platform == PLATFORM_PSERIES_LPAR) { +#ifdef CONFIG_PPC_PSERIES int rc = plpar_hcall_norets(H_SET_DABR, val); if (rc != H_Success) xmon_printf("Warning: setting DABR failed (%d)\n", rc); +#endif } else set_dabr(val); } From anton at samba.org Mon Feb 7 14:44:18 2005 From: anton at samba.org (Anton Blanchard) Date: Mon, 7 Feb 2005 14:44:18 +1100 Subject: Fix pseries hcall functions Message-ID: <20050207034418.GD5567@krispykreme.ozlabs.ibm.com> Hi, I had a look over our hcall functions and they are a bit of a mess. plpar_hcall_4out was corrupting r14 and they all did their own thing instead of following the ABI. How does this look? We no longer create a stack frame and save the required volatiles away in the parameter save area the caller set up. I also consolidated the duplicate HVSC definitions. Anton diff -puN arch/ppc64/kernel/pSeries_hvCall.S~fix_pseries_hcalls arch/ppc64/kernel/pSeries_hvCall.S --- foobar2/arch/ppc64/kernel/pSeries_hvCall.S~fix_pseries_hcalls 2005-01-30 13:51:35.800353154 +1100 +++ foobar2-anton/arch/ppc64/kernel/pSeries_hvCall.S 2005-01-30 14:23:02.535533536 +1100 @@ -1,7 +1,6 @@ /* * arch/ppc64/kernel/pSeries_hvCall.S * - * * This file contains the generic code to perform a call to the * pSeries LPAR hypervisor. * NOTE: this file will go away when we move to inline this work. @@ -11,133 +10,114 @@ * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. 
*/ -#include -#include -#include +#include #include -#include -#include #include -/* - * hcall interface to pSeries LPAR - */ -#define HVSC .long 0x44000022 - -/* long plpar_hcall(unsigned long opcode, R3 - unsigned long arg1, R4 - unsigned long arg2, R5 - unsigned long arg3, R6 - unsigned long arg4, R7 - unsigned long *out1, R8 - unsigned long *out2, R9 - unsigned long *out3); R10 - */ +#define STK_PARM(i) (48 + ((i)-3)*8) .text + +/* long plpar_hcall(unsigned long opcode, R3 + unsigned long arg1, R4 + unsigned long arg2, R5 + unsigned long arg3, R6 + unsigned long arg4, R7 + unsigned long *out1, R8 + unsigned long *out2, R9 + unsigned long *out3); R10 + */ _GLOBAL(plpar_hcall) mfcr r0 - std r0,-8(r1) - stdu r1,-32(r1) - std r8,-8(r1) /* Save out ptrs. */ - std r9,-16(r1) - std r10,-24(r1) - - HVSC /* invoke the hypervisor */ + std r8,STK_PARM(r8)(r1) /* Save out ptrs */ + std r9,STK_PARM(r9)(r1) + std r10,STK_PARM(r10)(r1) + + stw r0,8(r1) + + HVSC /* invoke the hypervisor */ + + lwz r0,8(r1) + + ld r8,STK_PARM(r8)(r1) /* Fetch r4-r6 ret args */ + ld r9,STK_PARM(r9)(r1) + ld r10,STK_PARM(r10)(r1) + std r4,0(r8) + std r5,0(r9) + std r6,0(r10) - ld r10,-8(r1) /* Fetch r4-r7 ret args. 
*/ - std r4,0(r10) - ld r10,-16(r1) - std r5,0(r10) - ld r10,-24(r1) - std r6,0(r10) - - ld r1,0(r1) - ld r0,-8(r1) mtcrf 0xff,r0 - blr /* return r3 = status */ + blr /* return r3 = status */ /* Simple interface with no output values (other than status) */ _GLOBAL(plpar_hcall_norets) mfcr r0 - std r0,-8(r1) - HVSC /* invoke the hypervisor */ - ld r0,-8(r1) - mtcrf 0xff,r0 - blr /* return r3 = status */ + stw r0,8(r1) + HVSC /* invoke the hypervisor */ -/* long plpar_hcall_8arg_2ret(unsigned long opcode, R3 - unsigned long arg1, R4 - unsigned long arg2, R5 - unsigned long arg3, R6 - unsigned long arg4, R7 - unsigned long arg5, R8 - unsigned long arg6, R9 - unsigned long arg7, R10 - unsigned long arg8, 112(R1) - unsigned long *out1); 120(R1) + lwz r0,8(r1) + mtcrf 0xff,r0 + blr /* return r3 = status */ - */ - .text +/* long plpar_hcall_8arg_2ret(unsigned long opcode, R3 + unsigned long arg1, R4 + unsigned long arg2, R5 + unsigned long arg3, R6 + unsigned long arg4, R7 + unsigned long arg5, R8 + unsigned long arg6, R9 + unsigned long arg7, R10 + unsigned long arg8, 112(R1) + unsigned long *out1); 120(R1) + */ _GLOBAL(plpar_hcall_8arg_2ret) mfcr r0 + ld r11,STK_PARM(r11)(r1) /* put arg8 in R11 */ + stw r0,8(r1) - ld r11, 112(r1) /* put arg8 and out1 in R11 and R12 */ - ld r12, 120(r1) - - std r0,-8(r1) - stdu r1,-32(r1) + HVSC /* invoke the hypervisor */ - std r12,-8(r1) /* Save out ptr */ - - HVSC /* invoke the hypervisor */ - - ld r10,-8(r1) /* Fetch r4 ret arg */ - std r4,0(r10) - - ld r1,0(r1) - ld r0,-8(r1) + lwz r0,8(r1) + ld r10,STK_PARM(r12)(r1) /* Fetch r4 ret arg */ + std r4,0(r10) mtcrf 0xff,r0 - blr /* return r3 = status */ + blr /* return r3 = status */ -/* long plpar_hcall_4out(unsigned long opcode, R3 - unsigned long arg1, R4 - unsigned long arg2, R5 - unsigned long arg3, R6 - unsigned long arg4, R7 - unsigned long *out1, (r4) R8 - unsigned long *out2, (r5) R9 - unsigned long *out3, (r6) R10 - unsigned long *out4); (r7) 112(R1). From Parameter save area. 
+/* long plpar_hcall_4out(unsigned long opcode, R3 + unsigned long arg1, R4 + unsigned long arg2, R5 + unsigned long arg3, R6 + unsigned long arg4, R7 + unsigned long *out1, R8 + unsigned long *out2, R9 + unsigned long *out3, R10 + unsigned long *out4); 112(R1) */ _GLOBAL(plpar_hcall_4out) mfcr r0 - std r0,-8(r1) - ld r14,112(r1) - stdu r1,-48(r1) - - std r8,32(r1) /* Save out ptrs. */ - std r9,24(r1) - std r10,16(r1) - std r14,8(r1) - - HVSC /* invoke the hypervisor */ + stw r0,8(r1) - ld r14,32(r1) /* Fetch r4-r7 ret args. */ - std r4,0(r14) - ld r14,24(r1) - std r5,0(r14) - ld r14,16(r1) - std r6,0(r14) - ld r14,8(r1) - std r7,0(r14) + std r8,STK_PARM(r8)(r1) /* Save out ptrs */ + std r9,STK_PARM(r9)(r1) + std r10,STK_PARM(r10)(r1) + + HVSC /* invoke the hypervisor */ + + lwz r0,8(r1) + + ld r8,STK_PARM(r8)(r1) /* Fetch r4-r7 ret args */ + ld r9,STK_PARM(r9)(r1) + ld r10,STK_PARM(r10)(r1) + ld r11,STK_PARM(r11)(r1) + std r4,0(r8) + std r5,0(r9) + std r6,0(r10) + std r7,0(r11) - ld r1,0(r1) - ld r0,-8(r1) mtcrf 0xff,r0 - blr /* return r3 = status */ + blr /* return r3 = status */ diff -puN include/asm-ppc64/hvcall.h~fix_pseries_hcalls include/asm-ppc64/hvcall.h --- foobar2/include/asm-ppc64/hvcall.h~fix_pseries_hcalls 2005-01-30 13:54:53.163581752 +1100 +++ foobar2-anton/include/asm-ppc64/hvcall.h 2005-01-30 13:57:58.784771353 +1100 @@ -1,6 +1,8 @@ #ifndef _PPC64_HVCALL_H #define _PPC64_HVCALL_H +#define HVSC .long 0x44000022 + #define H_Success 0 #define H_Busy 1 /* Hardware busy -- retry later */ #define H_Constrained 4 /* Resource request constrained to max allowed */ @@ -41,7 +43,7 @@ /* Flags */ #define H_LARGE_PAGE (1UL<<(63-16)) -#define H_EXACT (1UL<<(63-24)) /* Use exact PTE or return H_PTEG_FULL */ +#define H_EXACT (1UL<<(63-24)) /* Use exact PTE or return H_PTEG_FULL */ #define H_R_XLATE (1UL<<(63-25)) /* include a valid logical page num in the pte if the valid bit is set */ #define H_READ_4 (1UL<<(63-26)) /* Return 4 PTEs */ #define H_AVPN
(1UL<<(63-32)) /* An avpn is provided as a sanity test */ @@ -54,8 +56,6 @@ #define H_PP1 (1UL<<(63-62)) #define H_PP2 (1UL<<(63-63)) - - /* pSeries hypervisor opcodes */ #define H_REMOVE 0x04 #define H_ENTER 0x08 @@ -108,6 +108,8 @@ #define H_FREE_VTERM 0x158 #define H_POLL_PENDING 0x1D8 +#ifndef __ASSEMBLY__ + /* plpar_hcall() -- Generic call interface using above opcodes * * The actual call interface is a hypervisor call instruction with @@ -125,8 +127,6 @@ long plpar_hcall(unsigned long opcode, unsigned long *out2, unsigned long *out3); -#define HVSC ".long 0x44000022\n" - /* Same as plpar_hcall but for those opcodes that return no values * other than status. Slightly more efficient. */ @@ -147,9 +147,6 @@ long plpar_hcall_8arg_2ret(unsigned long unsigned long arg7, unsigned long arg8, unsigned long *out1); - - - /* plpar_hcall_4out() * @@ -166,4 +163,5 @@ long plpar_hcall_4out(unsigned long opco unsigned long *out3, unsigned long *out4); +#endif /* __ASSEMBLY__ */ #endif /* _PPC64_HVCALL_H */ diff -puN arch/ppc64/kernel/head.S~fix_pseries_hcalls arch/ppc64/kernel/head.S --- foobar2/arch/ppc64/kernel/head.S~fix_pseries_hcalls 2005-01-30 14:24:00.495238354 +1100 +++ foobar2-anton/arch/ppc64/kernel/head.S 2005-01-30 14:24:09.523947006 +1100 @@ -37,6 +37,7 @@ #include #include #include +#include #ifdef CONFIG_PPC_ISERIES #define DO_SOFT_DISABLE @@ -45,7 +46,6 @@ /* * hcall interface to pSeries LPAR */ -#define HVSC .long 0x44000022 #define H_SET_ASR 0x30 /* _ From sfr at canb.auug.org.au Mon Feb 7 18:02:18 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Mon, 7 Feb 2005 18:02:18 +1100 Subject: [PATCH] "invert" dma mapping routines Message-ID: <20050207180218.463809ed.sfr@canb.auug.org.au> Hi Anton, This patch "inverts" our dma mapping routines so that the pci_ and vio_ ... routines are implemented in terms of the dma_ ... routines (the vio_ routines basically disappear anyway as noone uses them directly any more). 
This is in anticipation of having something akin to dma_mapping_ops attached to the struct device. It also makes compiling without PCI much easier. Compiled on iSeries, pSeries and pmac. Booted on iSeries. Diffstat looks like this:
 arch/ppc64/kernel/dma.c              |  100 +++++++++++++--------------
 arch/ppc64/kernel/iommu.c            |    8 +-
 arch/ppc64/kernel/pci.c              |    2
 arch/ppc64/kernel/pci_direct_iommu.c |   34 +++++----
 arch/ppc64/kernel/pci_iommu.c        |   55 ++++++++-------
 arch/ppc64/kernel/vio.c              |   55 +++++++++------
 include/asm-ppc64/dma-mapping.h      |   20 +++++
 include/asm-ppc64/iommu.h            |    6 -
 include/asm-ppc64/pci.h              |  126 +----------------------------------
 include/asm-ppc64/vio.h              |   27 -------
 10 files changed, 166 insertions(+), 267 deletions(-)
Please comment. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk/arch/ppc64/kernel/dma.c linus-bk-dma.4/arch/ppc64/kernel/dma.c --- linus-bk/arch/ppc64/kernel/dma.c 2004-10-26 16:06:41.000000000 +1000 +++ linus-bk-dma.4/arch/ppc64/kernel/dma.c 2005-02-07 17:47:41.000000000 +1100 @@ -13,14 +13,23 @@ #include #include -int dma_supported(struct device *dev, u64 mask) +static struct dma_mapping_ops *get_dma_ops(struct device *dev) { if (dev->bus == &pci_bus_type) - return pci_dma_supported(to_pci_dev(dev), mask); + return &pci_dma_ops; #ifdef CONFIG_IBMVIO if (dev->bus == &vio_bus_type) - return vio_dma_supported(to_vio_dev(dev), mask); -#endif /* CONFIG_IBMVIO */ + return &vio_dma_ops; +#endif + return NULL; +} + +int dma_supported(struct device *dev, u64 mask) +{ + struct dma_mapping_ops *dma_ops = get_dma_ops(dev); + + if (dma_ops) + return dma_ops->dma_supported(dev, mask); BUG(); return 0; } @@ -32,7 +41,7 @@ return pci_set_dma_mask(to_pci_dev(dev), dma_mask); #ifdef CONFIG_IBMVIO if (dev->bus == &vio_bus_type) - return vio_set_dma_mask(to_vio_dev(dev), dma_mask); + return -EIO; #endif /* CONFIG_IBMVIO */ BUG(); return 0; @@ -42,12 +51,10 @@ void *dma_alloc_coherent(struct device *dev, size_t
size, dma_addr_t *dma_handle, int flag) { - if (dev->bus == &pci_bus_type) - return pci_alloc_consistent(to_pci_dev(dev), size, dma_handle); -#ifdef CONFIG_IBMVIO - if (dev->bus == &vio_bus_type) - return vio_alloc_consistent(to_vio_dev(dev), size, dma_handle); -#endif /* CONFIG_IBMVIO */ + struct dma_mapping_ops *dma_ops = get_dma_ops(dev); + + if (dma_ops) + return dma_ops->alloc_coherent(dev, size, dma_handle, flag); BUG(); return NULL; } @@ -56,12 +63,10 @@ void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr_t dma_handle) { - if (dev->bus == &pci_bus_type) - pci_free_consistent(to_pci_dev(dev), size, cpu_addr, dma_handle); -#ifdef CONFIG_IBMVIO - else if (dev->bus == &vio_bus_type) - vio_free_consistent(to_vio_dev(dev), size, cpu_addr, dma_handle); -#endif /* CONFIG_IBMVIO */ + struct dma_mapping_ops *dma_ops = get_dma_ops(dev); + + if (dma_ops) + dma_ops->free_coherent(dev, size, cpu_addr, dma_handle); else BUG(); } @@ -70,12 +75,10 @@ dma_addr_t dma_map_single(struct device *dev, void *cpu_addr, size_t size, enum dma_data_direction direction) { - if (dev->bus == &pci_bus_type) - return pci_map_single(to_pci_dev(dev), cpu_addr, size, (int)direction); -#ifdef CONFIG_IBMVIO - if (dev->bus == &vio_bus_type) - return vio_map_single(to_vio_dev(dev), cpu_addr, size, direction); -#endif /* CONFIG_IBMVIO */ + struct dma_mapping_ops *dma_ops = get_dma_ops(dev); + + if (dma_ops) + return dma_ops->map_single(dev, cpu_addr, size, direction); BUG(); return (dma_addr_t)0; } @@ -84,12 +87,10 @@ void dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size, enum dma_data_direction direction) { - if (dev->bus == &pci_bus_type) - pci_unmap_single(to_pci_dev(dev), dma_addr, size, (int)direction); -#ifdef CONFIG_IBMVIO - else if (dev->bus == &vio_bus_type) - vio_unmap_single(to_vio_dev(dev), dma_addr, size, direction); -#endif /* CONFIG_IBMVIO */ + struct dma_mapping_ops *dma_ops = get_dma_ops(dev); + + if (dma_ops) + 
dma_ops->unmap_single(dev, dma_addr, size, direction); else BUG(); } @@ -99,12 +100,11 @@ unsigned long offset, size_t size, enum dma_data_direction direction) { - if (dev->bus == &pci_bus_type) - return pci_map_page(to_pci_dev(dev), page, offset, size, (int)direction); -#ifdef CONFIG_IBMVIO - if (dev->bus == &vio_bus_type) - return vio_map_page(to_vio_dev(dev), page, offset, size, direction); -#endif /* CONFIG_IBMVIO */ + struct dma_mapping_ops *dma_ops = get_dma_ops(dev); + + if (dma_ops) + return dma_ops->map_single(dev, + (page_address(page) + offset), size, direction); BUG(); return (dma_addr_t)0; } @@ -113,12 +113,10 @@ void dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size, enum dma_data_direction direction) { - if (dev->bus == &pci_bus_type) - pci_unmap_page(to_pci_dev(dev), dma_address, size, (int)direction); -#ifdef CONFIG_IBMVIO - else if (dev->bus == &vio_bus_type) - vio_unmap_page(to_vio_dev(dev), dma_address, size, direction); -#endif /* CONFIG_IBMVIO */ + struct dma_mapping_ops *dma_ops = get_dma_ops(dev); + + if (dma_ops) + dma_ops->unmap_single(dev, dma_address, size, direction); else BUG(); } @@ -127,12 +125,10 @@ int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction direction) { - if (dev->bus == &pci_bus_type) - return pci_map_sg(to_pci_dev(dev), sg, nents, (int)direction); -#ifdef CONFIG_IBMVIO - if (dev->bus == &vio_bus_type) - return vio_map_sg(to_vio_dev(dev), sg, nents, direction); -#endif /* CONFIG_IBMVIO */ + struct dma_mapping_ops *dma_ops = get_dma_ops(dev); + + if (dma_ops) + return dma_ops->map_sg(dev, sg, nents, direction); BUG(); return 0; } @@ -141,12 +137,10 @@ void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nhwentries, enum dma_data_direction direction) { - if (dev->bus == &pci_bus_type) - pci_unmap_sg(to_pci_dev(dev), sg, nhwentries, (int)direction); -#ifdef CONFIG_IBMVIO - else if (dev->bus == &vio_bus_type) - vio_unmap_sg(to_vio_dev(dev), sg, 
nhwentries, direction); -#endif /* CONFIG_IBMVIO */ + struct dma_mapping_ops *dma_ops = get_dma_ops(dev); + + if (dma_ops) + dma_ops->unmap_sg(dev, sg, nhwentries, direction); else BUG(); } diff -ruN linus-bk/arch/ppc64/kernel/iommu.c linus-bk-dma.4/arch/ppc64/kernel/iommu.c --- linus-bk/arch/ppc64/kernel/iommu.c 2005-01-09 10:05:39.000000000 +1100 +++ linus-bk-dma.4/arch/ppc64/kernel/iommu.c 2005-02-07 15:00:06.000000000 +1100 @@ -513,8 +513,8 @@ * Returns the virtual address of the buffer and sets dma_handle * to the dma address (mapping) of the first page. */ -void *iommu_alloc_consistent(struct iommu_table *tbl, size_t size, - dma_addr_t *dma_handle) +void *iommu_alloc_coherent(struct iommu_table *tbl, size_t size, + dma_addr_t *dma_handle, int flag) { void *ret = NULL; dma_addr_t mapping; @@ -538,7 +538,7 @@ return NULL; /* Alloc enough pages (and possibly more) */ - ret = (void *)__get_free_pages(GFP_ATOMIC, order); + ret = (void *)__get_free_pages(flag, order); if (!ret) return NULL; memset(ret, 0, size); @@ -553,7 +553,7 @@ return ret; } -void iommu_free_consistent(struct iommu_table *tbl, size_t size, +void iommu_free_coherent(struct iommu_table *tbl, size_t size, void *vaddr, dma_addr_t dma_handle) { unsigned int npages; diff -ruN linus-bk/arch/ppc64/kernel/pci.c linus-bk-dma.4/arch/ppc64/kernel/pci.c --- linus-bk/arch/ppc64/kernel/pci.c 2005-01-22 06:09:00.000000000 +1100 +++ linus-bk-dma.4/arch/ppc64/kernel/pci.c 2005-02-07 14:45:23.000000000 +1100 @@ -69,7 +69,7 @@ LIST_HEAD(hose_list); -struct pci_dma_ops pci_dma_ops; +struct dma_mapping_ops pci_dma_ops; EXPORT_SYMBOL(pci_dma_ops); int global_phb_number; /* Global phb counter */ diff -ruN linus-bk/arch/ppc64/kernel/pci_direct_iommu.c linus-bk-dma.4/arch/ppc64/kernel/pci_direct_iommu.c --- linus-bk/arch/ppc64/kernel/pci_direct_iommu.c 2005-01-09 10:05:39.000000000 +1100 +++ linus-bk-dma.4/arch/ppc64/kernel/pci_direct_iommu.c 2005-02-07 16:00:47.000000000 +1100 @@ -30,12 +30,12 @@ #include "pci.h" 
-static void *pci_direct_alloc_consistent(struct pci_dev *hwdev, size_t size, - dma_addr_t *dma_handle) +static void *pci_direct_alloc_coherent(struct device *hwdev, size_t size, + dma_addr_t *dma_handle, int flag) { void *ret; - ret = (void *)__get_free_pages(GFP_ATOMIC, get_order(size)); + ret = (void *)__get_free_pages(flag, get_order(size)); if (ret != NULL) { memset(ret, 0, size); *dma_handle = virt_to_abs(ret); @@ -43,24 +43,24 @@ return ret; } -static void pci_direct_free_consistent(struct pci_dev *hwdev, size_t size, +static void pci_direct_free_coherent(struct device *hwdev, size_t size, void *vaddr, dma_addr_t dma_handle) { free_pages((unsigned long)vaddr, get_order(size)); } -static dma_addr_t pci_direct_map_single(struct pci_dev *hwdev, void *ptr, +static dma_addr_t pci_direct_map_single(struct device *hwdev, void *ptr, size_t size, enum dma_data_direction direction) { return virt_to_abs(ptr); } -static void pci_direct_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr, +static void pci_direct_unmap_single(struct device *hwdev, dma_addr_t dma_addr, size_t size, enum dma_data_direction direction) { } -static int pci_direct_map_sg(struct pci_dev *hwdev, struct scatterlist *sg, +static int pci_direct_map_sg(struct device *hwdev, struct scatterlist *sg, int nents, enum dma_data_direction direction) { int i; @@ -73,17 +73,23 @@ return nents; } -static void pci_direct_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg, +static void pci_direct_unmap_sg(struct device *hwdev, struct scatterlist *sg, int nents, enum dma_data_direction direction) { } +static int pci_direct_dma_supported(struct device *dev, u64 mask) +{ + return mask < 0x100000000ull; +} + void __init pci_direct_iommu_init(void) { - pci_dma_ops.pci_alloc_consistent = pci_direct_alloc_consistent; - pci_dma_ops.pci_free_consistent = pci_direct_free_consistent; - pci_dma_ops.pci_map_single = pci_direct_map_single; - pci_dma_ops.pci_unmap_single = pci_direct_unmap_single; - 
pci_dma_ops.pci_map_sg = pci_direct_map_sg; - pci_dma_ops.pci_unmap_sg = pci_direct_unmap_sg; + pci_dma_ops.alloc_coherent = pci_direct_alloc_coherent; + pci_dma_ops.free_coherent = pci_direct_free_coherent; + pci_dma_ops.map_single = pci_direct_map_single; + pci_dma_ops.unmap_single = pci_direct_unmap_single; + pci_dma_ops.map_sg = pci_direct_map_sg; + pci_dma_ops.unmap_sg = pci_direct_unmap_sg; + pci_dma_ops.dma_supported = pci_direct_dma_supported; } diff -ruN linus-bk/arch/ppc64/kernel/pci_iommu.c linus-bk-dma.4/arch/ppc64/kernel/pci_iommu.c --- linus-bk/arch/ppc64/kernel/pci_iommu.c 2004-11-16 16:05:10.000000000 +1100 +++ linus-bk-dma.4/arch/ppc64/kernel/pci_iommu.c 2005-02-07 15:10:05.000000000 +1100 @@ -50,19 +50,23 @@ */ #define PCI_GET_DN(dev) ((struct device_node *)((dev)->sysdata)) -static inline struct iommu_table *devnode_table(struct pci_dev *dev) +static inline struct iommu_table *devnode_table(struct device *dev) { - if (!dev) - dev = ppc64_isabridge_dev; - if (!dev) - return NULL; + struct pci_dev *pdev; + + if (!dev) { + pdev = ppc64_isabridge_dev; + if (!pdev) + return NULL; + } else + pdev = to_pci_dev(dev); #ifdef CONFIG_PPC_ISERIES - return ISERIES_DEVNODE(dev)->iommu_table; + return ISERIES_DEVNODE(pdev)->iommu_table; #endif /* CONFIG_PPC_ISERIES */ #ifdef CONFIG_PPC_MULTIPLATFORM - return PCI_GET_DN(dev)->iommu_table; + return PCI_GET_DN(pdev)->iommu_table; #endif /* CONFIG_PPC_MULTIPLATFORM */ } @@ -71,16 +75,17 @@ * Returns the virtual address of the buffer and sets dma_handle * to the dma address (mapping) of the first page. 
*/ -static void *pci_iommu_alloc_consistent(struct pci_dev *hwdev, size_t size, - dma_addr_t *dma_handle) +static void *pci_iommu_alloc_coherent(struct device *hwdev, size_t size, + dma_addr_t *dma_handle, int flag) { - return iommu_alloc_consistent(devnode_table(hwdev), size, dma_handle); + return iommu_alloc_coherent(devnode_table(hwdev), size, dma_handle, + flag); } -static void pci_iommu_free_consistent(struct pci_dev *hwdev, size_t size, +static void pci_iommu_free_coherent(struct device *hwdev, size_t size, void *vaddr, dma_addr_t dma_handle) { - iommu_free_consistent(devnode_table(hwdev), size, vaddr, dma_handle); + iommu_free_coherent(devnode_table(hwdev), size, vaddr, dma_handle); } /* Creates TCEs for a user provided buffer. The user buffer must be @@ -89,46 +94,46 @@ * need not be page aligned, the dma_addr_t returned will point to the same * byte within the page as vaddr. */ -static dma_addr_t pci_iommu_map_single(struct pci_dev *hwdev, void *vaddr, +static dma_addr_t pci_iommu_map_single(struct device *hwdev, void *vaddr, size_t size, enum dma_data_direction direction) { return iommu_map_single(devnode_table(hwdev), vaddr, size, direction); } -static void pci_iommu_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_handle, +static void pci_iommu_unmap_single(struct device *hwdev, dma_addr_t dma_handle, size_t size, enum dma_data_direction direction) { iommu_unmap_single(devnode_table(hwdev), dma_handle, size, direction); } -static int pci_iommu_map_sg(struct pci_dev *pdev, struct scatterlist *sglist, +static int pci_iommu_map_sg(struct device *pdev, struct scatterlist *sglist, int nelems, enum dma_data_direction direction) { - return iommu_map_sg(&pdev->dev, devnode_table(pdev), sglist, + return iommu_map_sg(pdev, devnode_table(pdev), sglist, nelems, direction); } -static void pci_iommu_unmap_sg(struct pci_dev *pdev, struct scatterlist *sglist, +static void pci_iommu_unmap_sg(struct device *pdev, struct scatterlist *sglist, int nelems, enum 
dma_data_direction direction) { iommu_unmap_sg(devnode_table(pdev), sglist, nelems, direction); } /* We support DMA to/from any memory page via the iommu */ -static int pci_iommu_dma_supported(struct pci_dev *pdev, u64 mask) +static int pci_iommu_dma_supported(struct device *dev, u64 mask) { return 1; } void pci_iommu_init(void) { - pci_dma_ops.pci_alloc_consistent = pci_iommu_alloc_consistent; - pci_dma_ops.pci_free_consistent = pci_iommu_free_consistent; - pci_dma_ops.pci_map_single = pci_iommu_map_single; - pci_dma_ops.pci_unmap_single = pci_iommu_unmap_single; - pci_dma_ops.pci_map_sg = pci_iommu_map_sg; - pci_dma_ops.pci_unmap_sg = pci_iommu_unmap_sg; - pci_dma_ops.pci_dma_supported = pci_iommu_dma_supported; + pci_dma_ops.alloc_coherent = pci_iommu_alloc_coherent; + pci_dma_ops.free_coherent = pci_iommu_free_coherent; + pci_dma_ops.map_single = pci_iommu_map_single; + pci_dma_ops.unmap_single = pci_iommu_unmap_single; + pci_dma_ops.map_sg = pci_iommu_map_sg; + pci_dma_ops.unmap_sg = pci_iommu_unmap_sg; + pci_dma_ops.dma_supported = pci_iommu_dma_supported; } diff -ruN linus-bk/arch/ppc64/kernel/vio.c linus-bk-dma.4/arch/ppc64/kernel/vio.c --- linus-bk/arch/ppc64/kernel/vio.c 2005-01-09 10:05:39.000000000 +1100 +++ linus-bk-dma.4/arch/ppc64/kernel/vio.c 2005-02-07 15:45:00.000000000 +1100 @@ -557,48 +557,61 @@ EXPORT_SYMBOL(vio_disable_interrupts); #endif -dma_addr_t vio_map_single(struct vio_dev *dev, void *vaddr, +static dma_addr_t vio_map_single(struct device *dev, void *vaddr, size_t size, enum dma_data_direction direction) { - return iommu_map_single(dev->iommu_table, vaddr, size, direction); + return iommu_map_single(to_vio_dev(dev)->iommu_table, vaddr, size, + direction); } -EXPORT_SYMBOL(vio_map_single); -void vio_unmap_single(struct vio_dev *dev, dma_addr_t dma_handle, +static void vio_unmap_single(struct device *dev, dma_addr_t dma_handle, size_t size, enum dma_data_direction direction) { - iommu_unmap_single(dev->iommu_table, dma_handle, size, 
direction); + iommu_unmap_single(to_vio_dev(dev)->iommu_table, dma_handle, size, + direction); } -EXPORT_SYMBOL(vio_unmap_single); -int vio_map_sg(struct vio_dev *vdev, struct scatterlist *sglist, int nelems, - enum dma_data_direction direction) +static int vio_map_sg(struct device *dev, struct scatterlist *sglist, + int nelems, enum dma_data_direction direction) { - return iommu_map_sg(&vdev->dev, vdev->iommu_table, sglist, + return iommu_map_sg(dev, to_vio_dev(dev)->iommu_table, sglist, nelems, direction); } -EXPORT_SYMBOL(vio_map_sg); -void vio_unmap_sg(struct vio_dev *vdev, struct scatterlist *sglist, int nelems, - enum dma_data_direction direction) +static void vio_unmap_sg(struct device *dev, struct scatterlist *sglist, + int nelems, enum dma_data_direction direction) { - iommu_unmap_sg(vdev->iommu_table, sglist, nelems, direction); + iommu_unmap_sg(to_vio_dev(dev)->iommu_table, sglist, nelems, direction); } -EXPORT_SYMBOL(vio_unmap_sg); -void *vio_alloc_consistent(struct vio_dev *dev, size_t size, - dma_addr_t *dma_handle) +static void *vio_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *dma_handle, int flag) { - return iommu_alloc_consistent(dev->iommu_table, size, dma_handle); + return iommu_alloc_coherent(to_vio_dev(dev)->iommu_table, size, + dma_handle, flag); } -EXPORT_SYMBOL(vio_alloc_consistent); -void vio_free_consistent(struct vio_dev *dev, size_t size, +static void vio_free_coherent(struct device *dev, size_t size, void *vaddr, dma_addr_t dma_handle) { - iommu_free_consistent(dev->iommu_table, size, vaddr, dma_handle); + iommu_free_coherent(to_vio_dev(dev)->iommu_table, size, vaddr, + dma_handle); } -EXPORT_SYMBOL(vio_free_consistent); + +static int vio_dma_supported(struct device *dev, u64 mask) +{ + return 1; +} + +struct dma_mapping_ops vio_dma_ops = { + .alloc_coherent = vio_alloc_coherent, + .free_coherent = vio_free_coherent, + .map_single = vio_map_single, + .unmap_single = vio_unmap_single, + .map_sg = vio_map_sg, + .unmap_sg 
= vio_unmap_sg,
+	.dma_supported	= vio_dma_supported,
+};
 
 static int vio_bus_match(struct device *dev, struct device_driver *drv)
 {
diff -ruN linus-bk/include/asm-ppc64/dma-mapping.h linus-bk-dma.4/include/asm-ppc64/dma-mapping.h
--- linus-bk/include/asm-ppc64/dma-mapping.h	2004-09-14 21:06:08.000000000 +1000
+++ linus-bk-dma.4/include/asm-ppc64/dma-mapping.h	2005-02-07 14:38:01.000000000 +1100
@@ -113,4 +113,24 @@
 	/* nothing to do */
 }
 
+/*
+ * DMA operations are abstracted for G5 vs. i/pSeries, PCI vs. VIO
+ */
+struct dma_mapping_ops {
+	void *		(*alloc_coherent)(struct device *dev, size_t size,
+				dma_addr_t *dma_handle, int flag);
+	void		(*free_coherent)(struct device *dev, size_t size,
+				void *vaddr, dma_addr_t dma_handle);
+	dma_addr_t	(*map_single)(struct device *dev, void *ptr,
+				size_t size, enum dma_data_direction direction);
+	void		(*unmap_single)(struct device *dev, dma_addr_t dma_addr,
+				size_t size, enum dma_data_direction direction);
+	int		(*map_sg)(struct device *dev, struct scatterlist *sg,
+				int nents, enum dma_data_direction direction);
+	void		(*unmap_sg)(struct device *dev, struct scatterlist *sg,
+				int nents, enum dma_data_direction direction);
+	int		(*dma_supported)(struct device *dev, u64 mask);
+	int		(*dac_dma_supported)(struct device *dev, u64 mask);
+};
+
 #endif /* _ASM_DMA_MAPPING_H */
diff -ruN linus-bk/include/asm-ppc64/iommu.h linus-bk-dma.4/include/asm-ppc64/iommu.h
--- linus-bk/include/asm-ppc64/iommu.h	2005-01-09 10:05:41.000000000 +1100
+++ linus-bk-dma.4/include/asm-ppc64/iommu.h	2005-02-07 15:02:01.000000000 +1100
@@ -145,9 +145,9 @@
 extern void iommu_unmap_sg(struct iommu_table *tbl, struct scatterlist *sglist,
 		int nelems, enum dma_data_direction direction);
 
-extern void *iommu_alloc_consistent(struct iommu_table *tbl, size_t size,
-		dma_addr_t *dma_handle);
-extern void iommu_free_consistent(struct iommu_table *tbl, size_t size,
+extern void *iommu_alloc_coherent(struct iommu_table *tbl, size_t size,
+		dma_addr_t *dma_handle, int
flag); +extern void iommu_free_coherent(struct iommu_table *tbl, size_t size, void *vaddr, dma_addr_t dma_handle); extern dma_addr_t iommu_map_single(struct iommu_table *tbl, void *vaddr, size_t size, enum dma_data_direction direction); diff -ruN linus-bk/include/asm-ppc64/pci.h linus-bk-dma.4/include/asm-ppc64/pci.h --- linus-bk/include/asm-ppc64/pci.h 2004-10-28 06:58:36.000000000 +1000 +++ linus-bk-dma.4/include/asm-ppc64/pci.h 2005-02-07 15:03:50.000000000 +1100 @@ -13,11 +13,14 @@ #include #include #include + #include #include #include #include +#include + #define PCIBIOS_MIN_IO 0x1000 #define PCIBIOS_MIN_MEM 0x10000000 @@ -63,131 +66,18 @@ extern unsigned int pcibios_assign_all_busses(void); -/* - * PCI DMA operations are abstracted for G5 vs. i/pSeries - */ -struct pci_dma_ops { - void * (*pci_alloc_consistent)(struct pci_dev *hwdev, size_t size, - dma_addr_t *dma_handle); - void (*pci_free_consistent)(struct pci_dev *hwdev, size_t size, - void *vaddr, dma_addr_t dma_handle); - - dma_addr_t (*pci_map_single)(struct pci_dev *hwdev, void *ptr, - size_t size, enum dma_data_direction direction); - void (*pci_unmap_single)(struct pci_dev *hwdev, dma_addr_t dma_addr, - size_t size, enum dma_data_direction direction); - int (*pci_map_sg)(struct pci_dev *hwdev, struct scatterlist *sg, - int nents, enum dma_data_direction direction); - void (*pci_unmap_sg)(struct pci_dev *hwdev, struct scatterlist *sg, - int nents, enum dma_data_direction direction); - int (*pci_dma_supported)(struct pci_dev *hwdev, u64 mask); - int (*pci_dac_dma_supported)(struct pci_dev *hwdev, u64 mask); -}; - -extern struct pci_dma_ops pci_dma_ops; - -static inline void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size, - dma_addr_t *dma_handle) -{ - return pci_dma_ops.pci_alloc_consistent(hwdev, size, dma_handle); -} - -static inline void pci_free_consistent(struct pci_dev *hwdev, size_t size, - void *vaddr, dma_addr_t dma_handle) -{ - pci_dma_ops.pci_free_consistent(hwdev, size, vaddr, 
dma_handle); -} - -static inline dma_addr_t pci_map_single(struct pci_dev *hwdev, void *ptr, - size_t size, int direction) -{ - return pci_dma_ops.pci_map_single(hwdev, ptr, size, - (enum dma_data_direction)direction); -} - -static inline void pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr, - size_t size, int direction) -{ - pci_dma_ops.pci_unmap_single(hwdev, dma_addr, size, - (enum dma_data_direction)direction); -} - -static inline int pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg, - int nents, int direction) -{ - return pci_dma_ops.pci_map_sg(hwdev, sg, nents, - (enum dma_data_direction)direction); -} - -static inline void pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg, - int nents, int direction) -{ - pci_dma_ops.pci_unmap_sg(hwdev, sg, nents, - (enum dma_data_direction)direction); -} - -static inline void pci_dma_sync_single_for_cpu(struct pci_dev *hwdev, - dma_addr_t dma_handle, - size_t size, int direction) -{ - BUG_ON(direction == PCI_DMA_NONE); - /* nothing to do */ -} - -static inline void pci_dma_sync_single_for_device(struct pci_dev *hwdev, - dma_addr_t dma_handle, - size_t size, int direction) -{ - BUG_ON(direction == PCI_DMA_NONE); - /* nothing to do */ -} - -static inline void pci_dma_sync_sg_for_cpu(struct pci_dev *hwdev, - struct scatterlist *sg, - int nelems, int direction) -{ - BUG_ON(direction == PCI_DMA_NONE); - /* nothing to do */ -} - -static inline void pci_dma_sync_sg_for_device(struct pci_dev *hwdev, - struct scatterlist *sg, - int nelems, int direction) -{ - BUG_ON(direction == PCI_DMA_NONE); - /* nothing to do */ -} - -/* Return whether the given PCI device DMA address mask can - * be supported properly. For example, if your device can - * only drive the low 24-bits during PCI bus mastering, then - * you would pass 0x00ffffff as the mask to this function. 
- * We default to supporting only 32 bits DMA unless we have - * an explicit override of this function in pci_dma_ops for - * the platform - */ -static inline int pci_dma_supported(struct pci_dev *hwdev, u64 mask) -{ - if (pci_dma_ops.pci_dma_supported) - return pci_dma_ops.pci_dma_supported(hwdev, mask); - return (mask < 0x100000000ull); -} +extern struct dma_mapping_ops pci_dma_ops; /* For DAC DMA, we currently don't support it by default, but * we let the platform override this */ static inline int pci_dac_dma_supported(struct pci_dev *hwdev,u64 mask) { - if (pci_dma_ops.pci_dac_dma_supported) - return pci_dma_ops.pci_dac_dma_supported(hwdev, mask); + if (pci_dma_ops.dac_dma_supported) + return pci_dma_ops.dac_dma_supported(&hwdev->dev, mask); return 0; } -static inline int pci_dma_mapping_error(dma_addr_t dma_addr) -{ - return dma_mapping_error(dma_addr); -} - extern int pci_domain_nr(struct pci_bus *bus); /* Set the name of the bus as it appears in /proc/bus/pci */ @@ -201,10 +91,6 @@ /* Tell drivers/pci/proc.c that we have pci_mmap_page_range() */ #define HAVE_PCI_MMAP 1 -#define pci_map_page(dev, page, off, size, dir) \ - pci_map_single(dev, (page_address(page) + (off)), size, dir) -#define pci_unmap_page(dev,addr,sz,dir) pci_unmap_single(dev,addr,sz,dir) - /* pci_unmap_{single,page} is not a nop, thus... 
*/ #define DECLARE_PCI_UNMAP_ADDR(ADDR_NAME) \ dma_addr_t ADDR_NAME; diff -ruN linus-bk/include/asm-ppc64/vio.h linus-bk-dma.4/include/asm-ppc64/vio.h --- linus-bk/include/asm-ppc64/vio.h 2004-06-30 15:40:04.000000000 +1000 +++ linus-bk-dma.4/include/asm-ppc64/vio.h 2005-02-07 15:42:37.000000000 +1100 @@ -57,32 +57,7 @@ int vio_enable_interrupts(struct vio_dev *dev); int vio_disable_interrupts(struct vio_dev *dev); -dma_addr_t vio_map_single(struct vio_dev *dev, void *vaddr, - size_t size, enum dma_data_direction direction); -void vio_unmap_single(struct vio_dev *dev, dma_addr_t dma_handle, - size_t size, enum dma_data_direction direction); -int vio_map_sg(struct vio_dev *vdev, struct scatterlist *sglist, - int nelems, enum dma_data_direction direction); -void vio_unmap_sg(struct vio_dev *vdev, struct scatterlist *sglist, - int nelems, enum dma_data_direction direction); -void *vio_alloc_consistent(struct vio_dev *dev, size_t size, - dma_addr_t *dma_handle); -void vio_free_consistent(struct vio_dev *dev, size_t size, void *vaddr, - dma_addr_t dma_handle); - -static inline int vio_dma_supported(struct vio_dev *hwdev, u64 mask) -{ - return 1; -} - -#define vio_map_page(dev, page, off, size, dir) \ - vio_map_single(dev, (page_address(page) + (off)), size, dir) -#define vio_unmap_page(dev,addr,sz,dir) vio_unmap_single(dev,addr,sz,dir) - -static inline int vio_set_dma_mask(struct vio_dev *dev, u64 mask) -{ - return -EIO; -} +extern struct dma_mapping_ops vio_dma_ops; extern struct bus_type vio_bus_type; -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050207/9250fc87/attachment.pgp

From olh at suse.de Tue Feb 8 01:40:58 2005
From: olh at suse.de (Olaf Hering)
Date: Mon, 7 Feb 2005 15:40:58 +0100
Subject: [PATCH] disable HMT for RS64 cpus
Message-ID: <20050207144058.GA5516@suse.de>

Hardware multithreading for RS64 cpus is currently broken. Anton sent me
a patch a few weeks ago, but it did not work. So just hide the config
option for the time being.

Signed-off-by: Olaf Hering

diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/Kconfig ./arch/ppc64/Kconfig
--- ../linux-2.6.11-rc3.orig/arch/ppc64/Kconfig	2005-02-03 02:55:53.000000000 +0100
+++ ./arch/ppc64/Kconfig	2005-02-07 15:36:25.079315558 +0100
@@ -194,7 +194,10 @@ config NR_CPUS
 
 config HMT
 	bool "Hardware multithreading"
-	depends on SMP && PPC_PSERIES
+	depends on SMP && PPC_PSERIES && BROKEN
+	help
+	  This option enables hardware multithreading on RS64 cpus.
+	  pSeries systems p620 and p660 have such a cpu type.
 
 config DISCONTIGMEM
 	bool "Discontiguous Memory Support"

From olh at suse.de Tue Feb 8 01:42:28 2005
From: olh at suse.de (Olaf Hering)
Date: Mon, 7 Feb 2005 15:42:28 +0100
Subject: [PATCH] update ppc64 defconfig
Message-ID: <20050207144228.GB5516@suse.de>

This turns the ppc64/defconfig into something useful; it boots on all
systems.

Signed-off-by: Olaf Hering diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/defconfig ./arch/ppc64/defconfig --- ../linux-2.6.11-rc3.orig/arch/ppc64/defconfig 2005-02-03 02:55:23.000000000 +0100 +++ ./arch/ppc64/defconfig 2005-02-07 15:11:06.466124000 +0100 @@ -1,9 +1,12 @@ # # Automatically generated make config: don't edit +# Linux kernel version: 2.6.11-rc3-bk3 +# Mon Feb 7 15:11:06 2005 # CONFIG_64BIT=y CONFIG_MMU=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y +CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_GENERIC_ISA_DMA=y CONFIG_HAVE_DEC_LOCK=y CONFIG_EARLY_PRINTK=y @@ -16,27 +19,36 @@ CONFIG_FORCE_MAX_ZONEORDER=13 # CONFIG_EXPERIMENTAL=y CONFIG_CLEAN_COMPILE=y -CONFIG_STANDALONE=y +CONFIG_LOCK_KERNEL=y # # General setup # +CONFIG_LOCALVERSION="" CONFIG_SWAP=y CONFIG_SYSVIPC=y +CONFIG_POSIX_MQUEUE=y # CONFIG_BSD_PROCESS_ACCT is not set CONFIG_SYSCTL=y +# CONFIG_AUDIT is not set CONFIG_LOG_BUF_SHIFT=17 CONFIG_HOTPLUG=y +CONFIG_KOBJECT_UEVENT=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y # CONFIG_EMBEDDED is not set CONFIG_KALLSYMS=y +# CONFIG_KALLSYMS_ALL is not set +# CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_FUTEX=y CONFIG_EPOLL=y -CONFIG_IOSCHED_NOOP=y -CONFIG_IOSCHED_AS=y -CONFIG_IOSCHED_DEADLINE=y # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set +CONFIG_SHMEM=y +CONFIG_CC_ALIGN_FUNCTIONS=0 +CONFIG_CC_ALIGN_LABELS=0 +CONFIG_CC_ALIGN_LOOPS=0 +CONFIG_CC_ALIGN_JUMPS=0 +# CONFIG_TINY_SHMEM is not set # # Loadable module support @@ -45,28 +57,41 @@ CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y # CONFIG_MODULE_FORCE_UNLOAD is not set CONFIG_OBSOLETE_MODPARM=y -# CONFIG_MODVERSIONS is not set -# CONFIG_KMOD is not set +CONFIG_MODVERSIONS=y +CONFIG_MODULE_SRCVERSION_ALL=y +CONFIG_KMOD=y CONFIG_STOP_MACHINE=y +CONFIG_SYSVIPC_COMPAT=y # # Platform support # # CONFIG_PPC_ISERIES is not set +CONFIG_PPC_MULTIPLATFORM=y CONFIG_PPC_PSERIES=y +CONFIG_PPC_PMAC=y +CONFIG_PPC_MAPLE=y CONFIG_PPC=y CONFIG_PPC64=y CONFIG_PPC_OF=y CONFIG_ALTIVEC=y -# CONFIG_PPC_PMAC is not set -# CONFIG_BOOTX_TEXT is not 
set +CONFIG_PPC_SPLPAR=y +CONFIG_IBMVIO=y +CONFIG_U3_DART=y +CONFIG_MPIC_BROKEN_U3=y +CONFIG_PPC_PMAC64=y +CONFIG_BOOTX_TEXT=y # CONFIG_POWER4_ONLY is not set -# CONFIG_IOMMU_VMERGE is not set +CONFIG_IOMMU_VMERGE=y CONFIG_SMP=y -CONFIG_IRQ_ALL_CPUS=y CONFIG_NR_CPUS=32 # CONFIG_HMT is not set -# CONFIG_DISCONTIGMEM is not set +CONFIG_DISCONTIGMEM=y +# CONFIG_NUMA is not set +# CONFIG_SCHED_SMT is not set +# CONFIG_PREEMPT is not set +CONFIG_EEH=y +CONFIG_GENERIC_HARDIRQS=y CONFIG_PPC_RTAS=y CONFIG_RTAS_FLASH=m CONFIG_SCANLOG=m @@ -78,14 +103,19 @@ CONFIG_LPARCFG=y CONFIG_PCI=y CONFIG_PCI_DOMAINS=y CONFIG_BINFMT_ELF=y -# CONFIG_BINFMT_MISC is not set -CONFIG_PCI_LEGACY_PROC=y -CONFIG_PCI_NAMES=y +CONFIG_BINFMT_MISC=m +# CONFIG_PCI_LEGACY_PROC is not set +# CONFIG_PCI_NAMES is not set +CONFIG_HOTPLUG_CPU=y # -# PCMCIA/CardBus support +# PCCARD (PCMCIA/CardBus) support +# +# CONFIG_PCCARD is not set + +# +# PC-card bridges # -# CONFIG_PCMCIA is not set # # PCI Hotplug Support @@ -93,7 +123,6 @@ CONFIG_PCI_NAMES=y CONFIG_HOTPLUG_PCI=m # CONFIG_HOTPLUG_PCI_FAKE is not set # CONFIG_HOTPLUG_PCI_CPCI is not set -# CONFIG_HOTPLUG_PCI_PCIE is not set # CONFIG_HOTPLUG_PCI_SHPC is not set CONFIG_HOTPLUG_PCI_RPA=m CONFIG_HOTPLUG_PCI_RPA_DLPAR=m @@ -107,7 +136,9 @@ CONFIG_PROC_DEVICETREE=y # # Generic Driver Options # -CONFIG_FW_LOADER=m +CONFIG_STANDALONE=y +CONFIG_PREVENT_FIRMWARE_BUILD=y +CONFIG_FW_LOADER=y # CONFIG_DEBUG_DRIVER is not set # @@ -118,7 +149,14 @@ CONFIG_FW_LOADER=m # # Parallel port support # -# CONFIG_PARPORT is not set +CONFIG_PARPORT=m +CONFIG_PARPORT_PC=m +CONFIG_PARPORT_PC_CML1=m +# CONFIG_PARPORT_SERIAL is not set +# CONFIG_PARPORT_PC_FIFO is not set +# CONFIG_PARPORT_PC_SUPERIO is not set +# CONFIG_PARPORT_OTHER is not set +# CONFIG_PARPORT_1284 is not set # # Plug and Play support @@ -128,17 +166,32 @@ CONFIG_FW_LOADER=m # Block devices # CONFIG_BLK_DEV_FD=y +# CONFIG_PARIDE is not set # CONFIG_BLK_CPQ_DA is not set # CONFIG_BLK_CPQ_CISS_DA is not set 
# CONFIG_BLK_DEV_DAC960 is not set # CONFIG_BLK_DEV_UMEM is not set +# CONFIG_BLK_DEV_COW_COMMON is not set CONFIG_BLK_DEV_LOOP=y # CONFIG_BLK_DEV_CRYPTOLOOP is not set CONFIG_BLK_DEV_NBD=m -# CONFIG_BLK_DEV_CARMEL is not set +# CONFIG_BLK_DEV_SX8 is not set +# CONFIG_BLK_DEV_UB is not set CONFIG_BLK_DEV_RAM=y -CONFIG_BLK_DEV_RAM_SIZE=4096 +CONFIG_BLK_DEV_RAM_COUNT=16 +CONFIG_BLK_DEV_RAM_SIZE=65536 CONFIG_BLK_DEV_INITRD=y +CONFIG_INITRAMFS_SOURCE="" +# CONFIG_CDROM_PKTCDVD is not set + +# +# IO Schedulers +# +CONFIG_IOSCHED_NOOP=y +CONFIG_IOSCHED_AS=y +CONFIG_IOSCHED_DEADLINE=y +CONFIG_IOSCHED_CFQ=y +# CONFIG_ATA_OVER_ETH is not set # # ATA/ATAPI/MFM/RLL support @@ -149,15 +202,14 @@ CONFIG_BLK_DEV_IDE=y # # Please see Documentation/ide.txt for help/info on IDE drives # +# CONFIG_BLK_DEV_IDE_SATA is not set CONFIG_BLK_DEV_IDEDISK=y # CONFIG_IDEDISK_MULTI_MODE is not set -# CONFIG_IDEDISK_STROKE is not set CONFIG_BLK_DEV_IDECD=y # CONFIG_BLK_DEV_IDETAPE is not set # CONFIG_BLK_DEV_IDEFLOPPY is not set # CONFIG_BLK_DEV_IDESCSI is not set # CONFIG_IDE_TASK_IOCTL is not set -# CONFIG_IDE_TASKFILE_IO is not set # # IDE chipset support/bugfixes @@ -173,7 +225,6 @@ CONFIG_BLK_DEV_IDEDMA_PCI=y # CONFIG_BLK_DEV_IDEDMA_FORCED is not set CONFIG_IDEDMA_PCI_AUTO=y # CONFIG_IDEDMA_ONLYDISK is not set -CONFIG_BLK_DEV_ADMA=y # CONFIG_BLK_DEV_AEC62XX is not set # CONFIG_BLK_DEV_ALI15X3 is not set CONFIG_BLK_DEV_AMD74XX=y @@ -194,6 +245,11 @@ CONFIG_BLK_DEV_AMD74XX=y # CONFIG_BLK_DEV_SLC90E66 is not set # CONFIG_BLK_DEV_TRM290 is not set # CONFIG_BLK_DEV_VIA82CXXX is not set +CONFIG_BLK_DEV_IDE_PMAC=y +CONFIG_BLK_DEV_IDE_PMAC_ATA100FIRST=y +CONFIG_BLK_DEV_IDEDMA_PMAC=y +# CONFIG_BLK_DEV_IDE_PMAC_BLINK is not set +# CONFIG_IDE_ARM is not set CONFIG_BLK_DEV_IDEDMA=y # CONFIG_IDEDMA_IVB is not set CONFIG_IDEDMA_AUTO=y @@ -219,7 +275,6 @@ CONFIG_CHR_DEV_SG=y # Some SCSI devices (e.g. 
CD jukebox) support multiple LUNs # CONFIG_SCSI_MULTI_LUN=y -CONFIG_SCSI_REPORT_LUNS=y CONFIG_SCSI_CONSTANTS=y # CONFIG_SCSI_LOGGING is not set @@ -228,33 +283,52 @@ CONFIG_SCSI_CONSTANTS=y # CONFIG_SCSI_SPI_ATTRS=y CONFIG_SCSI_FC_ATTRS=y +CONFIG_SCSI_ISCSI_ATTRS=m # # SCSI low-level drivers # # CONFIG_BLK_DEV_3W_XXXX_RAID is not set +# CONFIG_SCSI_3W_9XXX is not set # CONFIG_SCSI_ACARD is not set # CONFIG_SCSI_AACRAID is not set # CONFIG_SCSI_AIC7XXX is not set # CONFIG_SCSI_AIC7XXX_OLD is not set # CONFIG_SCSI_AIC79XX is not set -# CONFIG_SCSI_ADVANSYS is not set -# CONFIG_SCSI_MEGARAID is not set -# CONFIG_SCSI_SATA is not set +# CONFIG_MEGARAID_NEWGEN is not set +# CONFIG_MEGARAID_LEGACY is not set +CONFIG_SCSI_SATA=y +# CONFIG_SCSI_SATA_AHCI is not set +CONFIG_SCSI_SATA_SVW=y +# CONFIG_SCSI_ATA_PIIX is not set +# CONFIG_SCSI_SATA_NV is not set +# CONFIG_SCSI_SATA_PROMISE is not set +# CONFIG_SCSI_SATA_SX4 is not set +# CONFIG_SCSI_SATA_SIL is not set +# CONFIG_SCSI_SATA_SIS is not set +# CONFIG_SCSI_SATA_ULI is not set +# CONFIG_SCSI_SATA_VIA is not set +# CONFIG_SCSI_SATA_VITESSE is not set # CONFIG_SCSI_BUSLOGIC is not set -# CONFIG_SCSI_CPQFCTS is not set # CONFIG_SCSI_DMX3191D is not set # CONFIG_SCSI_EATA is not set # CONFIG_SCSI_EATA_PIO is not set # CONFIG_SCSI_FUTURE_DOMAIN is not set # CONFIG_SCSI_GDTH is not set # CONFIG_SCSI_IPS is not set +CONFIG_SCSI_IBMVSCSI=y +# CONFIG_SCSI_INITIO is not set # CONFIG_SCSI_INIA100 is not set +# CONFIG_SCSI_PPA is not set +# CONFIG_SCSI_IMM is not set CONFIG_SCSI_SYM53C8XX_2=y CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=0 CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16 CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64 # CONFIG_SCSI_SYM53C8XX_IOMAPPED is not set +CONFIG_SCSI_IPR=y +CONFIG_SCSI_IPR_TRACE=y +CONFIG_SCSI_IPR_DUMP=y # CONFIG_SCSI_QLOGIC_ISP is not set # CONFIG_SCSI_QLOGIC_FC is not set # CONFIG_SCSI_QLOGIC_1280 is not set @@ -264,10 +338,9 @@ CONFIG_SCSI_QLA22XX=m CONFIG_SCSI_QLA2300=m CONFIG_SCSI_QLA2322=m CONFIG_SCSI_QLA6312=m 
-CONFIG_SCSI_QLA6322=m # CONFIG_SCSI_DC395x is not set # CONFIG_SCSI_DC390T is not set -# CONFIG_SCSI_DEBUG is not set +CONFIG_SCSI_DEBUG=m # # Multi-device support (RAID and LVM) @@ -277,11 +350,16 @@ CONFIG_BLK_DEV_MD=y CONFIG_MD_LINEAR=y CONFIG_MD_RAID0=y CONFIG_MD_RAID1=y +CONFIG_MD_RAID10=y CONFIG_MD_RAID5=y -CONFIG_MD_RAID6=y -# CONFIG_MD_MULTIPATH is not set +CONFIG_MD_RAID6=m +CONFIG_MD_MULTIPATH=m +CONFIG_MD_FAULTY=m CONFIG_BLK_DEV_DM=y CONFIG_DM_CRYPT=m +CONFIG_DM_SNAPSHOT=m +CONFIG_DM_MIRROR=m +CONFIG_DM_ZERO=m # # Fusion MPT device support @@ -291,15 +369,48 @@ CONFIG_DM_CRYPT=m # # IEEE 1394 (FireWire) support # -# CONFIG_IEEE1394 is not set +CONFIG_IEEE1394=y + +# +# Subsystem Options +# +# CONFIG_IEEE1394_VERBOSEDEBUG is not set +# CONFIG_IEEE1394_OUI_DB is not set +CONFIG_IEEE1394_EXTRA_CONFIG_ROMS=y +CONFIG_IEEE1394_CONFIG_ROM_IP1394=y + +# +# Device Drivers +# +# CONFIG_IEEE1394_PCILYNX is not set +CONFIG_IEEE1394_OHCI1394=y + +# +# Protocol Drivers +# +CONFIG_IEEE1394_VIDEO1394=m +CONFIG_IEEE1394_SBP2=m +# CONFIG_IEEE1394_SBP2_PHYS_DMA is not set +CONFIG_IEEE1394_ETH1394=m +CONFIG_IEEE1394_DV1394=m +CONFIG_IEEE1394_RAWIO=y +CONFIG_IEEE1394_CMP=m +CONFIG_IEEE1394_AMDTP=m # # I2O device support # +# CONFIG_I2O is not set # # Macintosh device drivers # +CONFIG_ADB=y +CONFIG_ADB_PMU=y +# CONFIG_PMAC_PBOOK is not set +# CONFIG_PMAC_BACKLIGHT is not set +# CONFIG_INPUT_ADBHID is not set +CONFIG_THERM_PM72=y # # Networking support @@ -322,19 +433,19 @@ CONFIG_NET_IPIP=y # CONFIG_NET_IPGRE is not set # CONFIG_IP_MROUTE is not set # CONFIG_ARPD is not set -CONFIG_INET_ECN=y CONFIG_SYN_COOKIES=y CONFIG_INET_AH=m CONFIG_INET_ESP=m CONFIG_INET_IPCOMP=m +CONFIG_INET_TUNNEL=y +# CONFIG_IP_TCPDIAG is not set +# CONFIG_IP_TCPDIAG_IPV6 is not set # # IP: Virtual Server Configuration # # CONFIG_IP_VS is not set # CONFIG_IPV6 is not set -# CONFIG_DECNET is not set -# CONFIG_BRIDGE is not set CONFIG_NETFILTER=y # CONFIG_NETFILTER_DEBUG is not set @@ -342,6 +453,9 @@ 
CONFIG_NETFILTER=y # IP: Netfilter Configuration # CONFIG_IP_NF_CONNTRACK=m +CONFIG_IP_NF_CT_ACCT=y +CONFIG_IP_NF_CONNTRACK_MARK=y +CONFIG_IP_NF_CT_PROTO_SCTP=m CONFIG_IP_NF_FTP=m CONFIG_IP_NF_IRC=m CONFIG_IP_NF_TFTP=m @@ -366,8 +480,17 @@ CONFIG_IP_NF_MATCH_HELPER=m CONFIG_IP_NF_MATCH_STATE=m CONFIG_IP_NF_MATCH_CONNTRACK=m CONFIG_IP_NF_MATCH_OWNER=m +CONFIG_IP_NF_MATCH_ADDRTYPE=m +CONFIG_IP_NF_MATCH_REALM=m +CONFIG_IP_NF_MATCH_SCTP=m +CONFIG_IP_NF_MATCH_COMMENT=m +CONFIG_IP_NF_MATCH_CONNMARK=m +CONFIG_IP_NF_MATCH_HASHLIMIT=m CONFIG_IP_NF_FILTER=m CONFIG_IP_NF_TARGET_REJECT=m +CONFIG_IP_NF_TARGET_LOG=m +CONFIG_IP_NF_TARGET_ULOG=m +CONFIG_IP_NF_TARGET_TCPMSS=m CONFIG_IP_NF_NAT=m CONFIG_IP_NF_NAT_NEEDED=y CONFIG_IP_NF_TARGET_MASQUERADE=m @@ -385,24 +508,24 @@ CONFIG_IP_NF_TARGET_ECN=m CONFIG_IP_NF_TARGET_DSCP=m CONFIG_IP_NF_TARGET_MARK=m CONFIG_IP_NF_TARGET_CLASSIFY=m -CONFIG_IP_NF_TARGET_LOG=m -CONFIG_IP_NF_TARGET_ULOG=m -CONFIG_IP_NF_TARGET_TCPMSS=m +CONFIG_IP_NF_TARGET_CONNMARK=m +CONFIG_IP_NF_TARGET_CLUSTERIP=m +CONFIG_IP_NF_RAW=m +CONFIG_IP_NF_TARGET_NOTRACK=m CONFIG_IP_NF_ARPTABLES=m CONFIG_IP_NF_ARPFILTER=m CONFIG_IP_NF_ARP_MANGLE=m -CONFIG_IP_NF_COMPAT_IPCHAINS=m -CONFIG_IP_NF_COMPAT_IPFWADM=m CONFIG_XFRM=y CONFIG_XFRM_USER=m # # SCTP Configuration (EXPERIMENTAL) # -CONFIG_IPV6_SCTP__=y # CONFIG_IP_SCTP is not set # CONFIG_ATM is not set +# CONFIG_BRIDGE is not set # CONFIG_VLAN_8021Q is not set +# CONFIG_DECNET is not set CONFIG_LLC=y # CONFIG_LLC2 is not set # CONFIG_IPX is not set @@ -412,36 +535,42 @@ CONFIG_LLC=y # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set -# CONFIG_NET_HW_FLOWCONTROL is not set # # QoS and/or fair queueing # # CONFIG_NET_SCHED is not set +CONFIG_NET_CLS_ROUTE=y # # Network testing # # CONFIG_NET_PKTGEN is not set +CONFIG_NETPOLL=y +CONFIG_NETPOLL_RX=y +CONFIG_NETPOLL_TRAP=y +CONFIG_NET_POLL_CONTROLLER=y +# CONFIG_HAMRADIO is not set +# CONFIG_IRDA is not set +# CONFIG_BT is not set 
CONFIG_NETDEVICES=y +CONFIG_DUMMY=m +CONFIG_BONDING=m +# CONFIG_EQUALIZER is not set +CONFIG_TUN=m # # ARCnet devices # # CONFIG_ARCNET is not set -CONFIG_DUMMY=m -CONFIG_BONDING=m -# CONFIG_EQUALIZER is not set -CONFIG_TUN=m # # Ethernet (10 or 100Mbit) # CONFIG_NET_ETHERNET=y CONFIG_MII=y -# CONFIG_OAKNET is not set # CONFIG_HAPPYMEAL is not set -# CONFIG_SUNGEM is not set +CONFIG_SUNGEM=y CONFIG_NET_VENDOR_3COM=y CONFIG_VORTEX=y # CONFIG_TYPHOON is not set @@ -451,6 +580,7 @@ CONFIG_VORTEX=y # # CONFIG_NET_TULIP is not set # CONFIG_HP100 is not set +CONFIG_IBMVETH=m CONFIG_NET_PCI=y CONFIG_PCNET32=y # CONFIG_AMD8111_ETH is not set @@ -483,8 +613,8 @@ CONFIG_E1000=y # CONFIG_HAMACHI is not set # CONFIG_YELLOWFIN is not set # CONFIG_R8169 is not set -# CONFIG_SIS190 is not set # CONFIG_SK98LIN is not set +# CONFIG_VIA_VELOCITY is not set CONFIG_TIGON3=y # @@ -492,59 +622,40 @@ CONFIG_TIGON3=y # CONFIG_IXGB=m # CONFIG_IXGB_NAPI is not set -# CONFIG_FDDI is not set -# CONFIG_HIPPI is not set -CONFIG_IBMVETH=m -CONFIG_PPP=m -# CONFIG_PPP_MULTILINK is not set -# CONFIG_PPP_FILTER is not set -CONFIG_PPP_ASYNC=m -CONFIG_PPP_SYNC_TTY=m -CONFIG_PPP_DEFLATE=m -CONFIG_PPP_BSDCOMP=m -CONFIG_PPPOE=m -# CONFIG_SLIP is not set - -# -# Wireless LAN (non-hamradio) -# -# CONFIG_NET_RADIO is not set +# CONFIG_S2IO is not set # # Token Ring devices # CONFIG_TR=y CONFIG_IBMOL=y -# CONFIG_IBMLS is not set # CONFIG_3C359 is not set # CONFIG_TMS380TR is not set -# CONFIG_NET_FC is not set -# CONFIG_SHAPER is not set -CONFIG_NETCONSOLE=y - -# -# Wan interfaces -# -# CONFIG_WAN is not set - -# -# Amateur Radio support -# -# CONFIG_HAMRADIO is not set # -# IrDA (infrared) support +# Wireless LAN (non-hamradio) # -# CONFIG_IRDA is not set +# CONFIG_NET_RADIO is not set # -# Bluetooth support +# Wan interfaces # -# CONFIG_BT is not set -CONFIG_NETPOLL=y -CONFIG_NETPOLL_RX=y -CONFIG_NETPOLL_TRAP=y -CONFIG_NET_POLL_CONTROLLER=y +# CONFIG_WAN is not set +# CONFIG_FDDI is not set +# CONFIG_HIPPI 
is not set +# CONFIG_PLIP is not set +CONFIG_PPP=m +# CONFIG_PPP_MULTILINK is not set +# CONFIG_PPP_FILTER is not set +CONFIG_PPP_ASYNC=m +CONFIG_PPP_SYNC_TTY=m +CONFIG_PPP_DEFLATE=m +CONFIG_PPP_BSDCOMP=m +CONFIG_PPPOE=m +# CONFIG_SLIP is not set +# CONFIG_NET_FC is not set +# CONFIG_SHAPER is not set +CONFIG_NETCONSOLE=y # # ISDN subsystem @@ -565,12 +676,12 @@ CONFIG_INPUT=y # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y -CONFIG_INPUT_MOUSEDEV_PSAUX=y +# CONFIG_INPUT_MOUSEDEV_PSAUX is not set CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 # CONFIG_INPUT_JOYDEV is not set # CONFIG_INPUT_TSDEV is not set -# CONFIG_INPUT_EVDEV is not set +CONFIG_INPUT_EVDEV=m # CONFIG_INPUT_EVBUG is not set # @@ -582,7 +693,10 @@ CONFIG_SERIO=y CONFIG_SERIO_I8042=y # CONFIG_SERIO_SERPORT is not set # CONFIG_SERIO_CT82C710 is not set +# CONFIG_SERIO_PARKBD is not set # CONFIG_SERIO_PCIPS2 is not set +CONFIG_SERIO_LIBPS2=y +# CONFIG_SERIO_RAW is not set # # Input Device Drivers @@ -590,15 +704,17 @@ CONFIG_SERIO_I8042=y CONFIG_INPUT_KEYBOARD=y CONFIG_KEYBOARD_ATKBD=y # CONFIG_KEYBOARD_SUNKBD is not set +# CONFIG_KEYBOARD_LKKBD is not set # CONFIG_KEYBOARD_XTKBD is not set # CONFIG_KEYBOARD_NEWTON is not set CONFIG_INPUT_MOUSE=y CONFIG_MOUSE_PS2=y # CONFIG_MOUSE_SERIAL is not set +# CONFIG_MOUSE_VSXXXAA is not set # CONFIG_INPUT_JOYSTICK is not set # CONFIG_INPUT_TOUCHSCREEN is not set CONFIG_INPUT_MISC=y -CONFIG_INPUT_PCSPKR=y +CONFIG_INPUT_PCSPKR=m # CONFIG_INPUT_UINPUT is not set # @@ -623,16 +739,16 @@ CONFIG_SERIAL_8250_NR_UARTS=4 CONFIG_SERIAL_CORE=y CONFIG_SERIAL_CORE_CONSOLE=y # CONFIG_SERIAL_PMACZILOG is not set +CONFIG_SERIAL_ICOM=m CONFIG_UNIX98_PTYS=y CONFIG_LEGACY_PTYS=y CONFIG_LEGACY_PTY_COUNT=256 +CONFIG_PRINTER=m +# CONFIG_LP_CONSOLE is not set +# CONFIG_PPDEV is not set +# CONFIG_TIPAR is not set CONFIG_HVC_CONSOLE=y - -# -# Mice -# -# CONFIG_BUSMOUSE is not set -# CONFIG_QIC02_TAPE is not set +CONFIG_HVCS=m # # IPMI @@ -652,7 +768,6 @@ 
CONFIG_HVC_CONSOLE=y # # Ftape, the floppy tape device driver # -# CONFIG_AGP is not set # CONFIG_DRM is not set CONFIG_RAW_DRIVER=y CONFIG_MAX_RAW_DEVS=256 @@ -661,25 +776,30 @@ CONFIG_MAX_RAW_DEVS=256 # I2C support # CONFIG_I2C=y -# CONFIG_I2C_CHARDEV is not set +CONFIG_I2C_CHARDEV=y # # I2C Algorithms # CONFIG_I2C_ALGOBIT=y # CONFIG_I2C_ALGOPCF is not set +# CONFIG_I2C_ALGOPCA is not set # # I2C Hardware Bus support # # CONFIG_I2C_ALI1535 is not set +# CONFIG_I2C_ALI1563 is not set # CONFIG_I2C_ALI15X3 is not set # CONFIG_I2C_AMD756 is not set -# CONFIG_I2C_AMD8111 is not set +CONFIG_I2C_AMD8111=y # CONFIG_I2C_I801 is not set # CONFIG_I2C_I810 is not set # CONFIG_I2C_ISA is not set +CONFIG_I2C_KEYWEST=y +# CONFIG_I2C_MPC is not set # CONFIG_I2C_NFORCE2 is not set +# CONFIG_I2C_PARPORT is not set # CONFIG_I2C_PARPORT_LIGHT is not set # CONFIG_I2C_PROSAVAGE is not set # CONFIG_I2C_SAVAGE4 is not set @@ -687,33 +807,61 @@ CONFIG_I2C_ALGOBIT=y # CONFIG_I2C_SIS5595 is not set # CONFIG_I2C_SIS630 is not set # CONFIG_I2C_SIS96X is not set +# CONFIG_I2C_STUB is not set # CONFIG_I2C_VIA is not set # CONFIG_I2C_VIAPRO is not set # CONFIG_I2C_VOODOO3 is not set +# CONFIG_I2C_PCA_ISA is not set # -# I2C Hardware Sensors Chip support +# Hardware Sensors Chip support # # CONFIG_I2C_SENSOR is not set # CONFIG_SENSORS_ADM1021 is not set +# CONFIG_SENSORS_ADM1025 is not set +# CONFIG_SENSORS_ADM1026 is not set +# CONFIG_SENSORS_ADM1031 is not set # CONFIG_SENSORS_ASB100 is not set -# CONFIG_SENSORS_EEPROM is not set +# CONFIG_SENSORS_DS1621 is not set # CONFIG_SENSORS_FSCHER is not set # CONFIG_SENSORS_GL518SM is not set # CONFIG_SENSORS_IT87 is not set +# CONFIG_SENSORS_LM63 is not set # CONFIG_SENSORS_LM75 is not set +# CONFIG_SENSORS_LM77 is not set # CONFIG_SENSORS_LM78 is not set +# CONFIG_SENSORS_LM80 is not set # CONFIG_SENSORS_LM83 is not set # CONFIG_SENSORS_LM85 is not set +# CONFIG_SENSORS_LM87 is not set # CONFIG_SENSORS_LM90 is not set +# CONFIG_SENSORS_MAX1619 is 
not set +# CONFIG_SENSORS_PC87360 is not set +# CONFIG_SENSORS_SMSC47B397 is not set +# CONFIG_SENSORS_SMSC47M1 is not set # CONFIG_SENSORS_VIA686A is not set # CONFIG_SENSORS_W83781D is not set # CONFIG_SENSORS_W83L785TS is not set +# CONFIG_SENSORS_W83627HF is not set + +# +# Other I2C Chip support +# +# CONFIG_SENSORS_EEPROM is not set +# CONFIG_SENSORS_PCF8574 is not set +# CONFIG_SENSORS_PCF8591 is not set +# CONFIG_SENSORS_RTC8564 is not set # CONFIG_I2C_DEBUG_CORE is not set +# CONFIG_I2C_DEBUG_ALGO is not set # CONFIG_I2C_DEBUG_BUS is not set # CONFIG_I2C_DEBUG_CHIP is not set # +# Dallas's 1-wire bus +# +# CONFIG_W1 is not set + +# # Misc devices # @@ -731,20 +879,28 @@ CONFIG_I2C_ALGOBIT=y # Graphics support # CONFIG_FB=y +CONFIG_FB_MODE_HELPERS=y +CONFIG_FB_TILEBLITTING=y +# CONFIG_FB_CIRRUS is not set # CONFIG_FB_PM2 is not set # CONFIG_FB_CYBER2000 is not set CONFIG_FB_OF=y +# CONFIG_FB_CONTROL is not set +# CONFIG_FB_PLATINUM is not set +# CONFIG_FB_VALKYRIE is not set # CONFIG_FB_CT65550 is not set +# CONFIG_FB_ASILIANT is not set # CONFIG_FB_IMSTT is not set -# CONFIG_FB_S3TRIO is not set # CONFIG_FB_VGA16 is not set -# CONFIG_FB_RIVA is not set +CONFIG_FB_RIVA=y +CONFIG_FB_RIVA_I2C=y +# CONFIG_FB_RIVA_DEBUG is not set CONFIG_FB_MATROX=y CONFIG_FB_MATROX_MILLENIUM=y CONFIG_FB_MATROX_MYSTIQUE=y -CONFIG_FB_MATROX_G450=y -CONFIG_FB_MATROX_G100=y -# CONFIG_FB_MATROX_I2C is not set +CONFIG_FB_MATROX_G=y +CONFIG_FB_MATROX_I2C=m +CONFIG_FB_MATROX_MAVEN=m CONFIG_FB_MATROX_MULTIHEAD=y # CONFIG_FB_RADEON_OLD is not set CONFIG_FB_RADEON=y @@ -752,6 +908,7 @@ CONFIG_FB_RADEON_I2C=y # CONFIG_FB_RADEON_DEBUG is not set # CONFIG_FB_ATY128 is not set # CONFIG_FB_ATY is not set +# CONFIG_FB_SAVAGE is not set # CONFIG_FB_SIS is not set # CONFIG_FB_NEOMAGIC is not set # CONFIG_FB_KYRO is not set @@ -764,10 +921,8 @@ CONFIG_FB_RADEON_I2C=y # Console display driver support # # CONFIG_VGA_CONSOLE is not set -# CONFIG_MDA_CONSOLE is not set CONFIG_DUMMY_CONSOLE=y 
CONFIG_FRAMEBUFFER_CONSOLE=y -CONFIG_PCI_CONSOLE=y # CONFIG_FONTS is not set CONFIG_FONT_8x8=y CONFIG_FONT_8x16=y @@ -779,6 +934,11 @@ CONFIG_LOGO=y CONFIG_LOGO_LINUX_MONO=y CONFIG_LOGO_LINUX_VGA16=y CONFIG_LOGO_LINUX_CLUT224=y +CONFIG_BACKLIGHT_LCD_SUPPORT=y +CONFIG_BACKLIGHT_CLASS_DEVICE=y +CONFIG_BACKLIGHT_DEVICE=y +CONFIG_LCD_CLASS_DEVICE=y +CONFIG_LCD_DEVICE=y # # Sound @@ -797,13 +957,19 @@ CONFIG_USB=y CONFIG_USB_DEVICEFS=y # CONFIG_USB_BANDWIDTH is not set # CONFIG_USB_DYNAMIC_MINORS is not set +# CONFIG_USB_OTG is not set +CONFIG_USB_ARCH_HAS_HCD=y +CONFIG_USB_ARCH_HAS_OHCI=y # # USB Host Controller Drivers # CONFIG_USB_EHCI_HCD=y +# CONFIG_USB_EHCI_SPLIT_ISO is not set +# CONFIG_USB_EHCI_ROOT_HUB_TT is not set CONFIG_USB_OHCI_HCD=y # CONFIG_USB_UHCI_HCD is not set +# CONFIG_USB_SL811_HCD is not set # # USB Device Class drivers @@ -811,8 +977,13 @@ CONFIG_USB_OHCI_HCD=y # CONFIG_USB_BLUETOOTH_TTY is not set # CONFIG_USB_ACM is not set # CONFIG_USB_PRINTER is not set -CONFIG_USB_STORAGE=y + +# +# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' may also be needed; see USB_STORAGE Help for more information +# +CONFIG_USB_STORAGE=m # CONFIG_USB_STORAGE_DEBUG is not set +CONFIG_USB_STORAGE_RW_DETECT=y # CONFIG_USB_STORAGE_DATAFAB is not set # CONFIG_USB_STORAGE_FREECOM is not set # CONFIG_USB_STORAGE_ISD200 is not set @@ -823,7 +994,7 @@ CONFIG_USB_STORAGE=y # CONFIG_USB_STORAGE_JUMPSHOT is not set # -# USB Human Interface Devices (HID) +# USB Input Devices # CONFIG_USB_HID=y CONFIG_USB_HIDINPUT=y @@ -833,14 +1004,16 @@ CONFIG_USB_HIDDEV=y # CONFIG_USB_WACOM is not set # CONFIG_USB_KBTAB is not set # CONFIG_USB_POWERMATE is not set +# CONFIG_USB_MTOUCH is not set +# CONFIG_USB_EGALAX is not set # CONFIG_USB_XPAD is not set +# CONFIG_USB_ATI_REMOTE is not set # # USB Imaging devices # # CONFIG_USB_MDC800 is not set # CONFIG_USB_MICROTEK is not set -# CONFIG_USB_HPUSBSCSI is not set # # USB Multimedia devices @@ -852,17 +1025,18 @@ CONFIG_USB_HIDDEV=y # # 
-# USB Network adaptors +# USB Network Adapters # # CONFIG_USB_CATC is not set # CONFIG_USB_KAWETH is not set -# CONFIG_USB_PEGASUS is not set +CONFIG_USB_PEGASUS=y # CONFIG_USB_RTL8150 is not set # CONFIG_USB_USBNET is not set # # USB port drivers # +# CONFIG_USB_USS720 is not set # # USB Serial Converter support @@ -874,50 +1048,81 @@ CONFIG_USB_HIDDEV=y # # CONFIG_USB_EMI62 is not set # CONFIG_USB_EMI26 is not set -# CONFIG_USB_TIGL is not set # CONFIG_USB_AUERSWALD is not set # CONFIG_USB_RIO500 is not set # CONFIG_USB_LEGOTOWER is not set # CONFIG_USB_LCD is not set # CONFIG_USB_LED is not set +# CONFIG_USB_CYTHERM is not set +# CONFIG_USB_PHIDGETKIT is not set +# CONFIG_USB_PHIDGETSERVO is not set +# CONFIG_USB_IDMOUSE is not set # CONFIG_USB_TEST is not set # +# USB ATM/DSL drivers +# + +# # USB Gadget Support # # CONFIG_USB_GADGET is not set # +# MMC/SD Card support +# +# CONFIG_MMC is not set + +# +# InfiniBand support +# +CONFIG_INFINIBAND=m +CONFIG_INFINIBAND_MTHCA=m +# CONFIG_INFINIBAND_MTHCA_DEBUG is not set +CONFIG_INFINIBAND_IPOIB=m +# CONFIG_INFINIBAND_IPOIB_DEBUG is not set + +# # File systems # CONFIG_EXT2_FS=y CONFIG_EXT2_FS_XATTR=y CONFIG_EXT2_FS_POSIX_ACL=y -# CONFIG_EXT2_FS_SECURITY is not set +CONFIG_EXT2_FS_SECURITY=y CONFIG_EXT3_FS=y CONFIG_EXT3_FS_XATTR=y CONFIG_EXT3_FS_POSIX_ACL=y -# CONFIG_EXT3_FS_SECURITY is not set +CONFIG_EXT3_FS_SECURITY=y CONFIG_JBD=y # CONFIG_JBD_DEBUG is not set CONFIG_FS_MBCACHE=y CONFIG_REISERFS_FS=y # CONFIG_REISERFS_CHECK is not set # CONFIG_REISERFS_PROC_INFO is not set +CONFIG_REISERFS_FS_XATTR=y +CONFIG_REISERFS_FS_POSIX_ACL=y +CONFIG_REISERFS_FS_SECURITY=y CONFIG_JFS_FS=y CONFIG_JFS_POSIX_ACL=y +CONFIG_JFS_SECURITY=y # CONFIG_JFS_DEBUG is not set # CONFIG_JFS_STATISTICS is not set CONFIG_FS_POSIX_ACL=y + +# +# XFS support +# CONFIG_XFS_FS=m +CONFIG_XFS_EXPORT=y # CONFIG_XFS_RT is not set # CONFIG_XFS_QUOTA is not set -# CONFIG_XFS_SECURITY is not set +CONFIG_XFS_SECURITY=y CONFIG_XFS_POSIX_ACL=y # 
CONFIG_MINIX_FS is not set # CONFIG_ROMFS_FS is not set # CONFIG_QUOTA is not set -CONFIG_AUTOFS_FS=m +CONFIG_DNOTIFY=y +CONFIG_AUTOFS_FS=y # CONFIG_AUTOFS4_FS is not set # @@ -927,6 +1132,7 @@ CONFIG_ISO9660_FS=y # CONFIG_JOLIET is not set # CONFIG_ZISOFS is not set CONFIG_UDF_FS=m +CONFIG_UDF_NLS=y # # DOS/FAT/NT Filesystems @@ -934,6 +1140,8 @@ CONFIG_UDF_FS=m CONFIG_FAT_FS=y CONFIG_MSDOS_FS=y CONFIG_VFAT_FS=y +CONFIG_FAT_DEFAULT_CODEPAGE=437 +CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1" # CONFIG_NTFS_FS is not set # @@ -941,10 +1149,13 @@ CONFIG_VFAT_FS=y # CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y +CONFIG_SYSFS=y # CONFIG_DEVFS_FS is not set CONFIG_DEVPTS_FS_XATTR=y -# CONFIG_DEVPTS_FS_SECURITY is not set +CONFIG_DEVPTS_FS_SECURITY=y CONFIG_TMPFS=y +CONFIG_TMPFS_XATTR=y +CONFIG_TMPFS_SECURITY=y CONFIG_HUGETLBFS=y CONFIG_HUGETLB_PAGE=y CONFIG_RAMFS=y @@ -954,8 +1165,8 @@ CONFIG_RAMFS=y # # CONFIG_ADFS_FS is not set # CONFIG_AFFS_FS is not set -# CONFIG_HFS_FS is not set -# CONFIG_HFSPLUS_FS is not set +CONFIG_HFS_FS=m +CONFIG_HFSPLUS_FS=m # CONFIG_BEFS_FS is not set # CONFIG_BFS_FS is not set # CONFIG_EFS_FS is not set @@ -973,18 +1184,23 @@ CONFIG_NFS_FS=y CONFIG_NFS_V3=y CONFIG_NFS_V4=y # CONFIG_NFS_DIRECTIO is not set -CONFIG_NFSD=y +CONFIG_NFSD=m CONFIG_NFSD_V3=y CONFIG_NFSD_V4=y CONFIG_NFSD_TCP=y CONFIG_LOCKD=y CONFIG_LOCKD_V4=y -CONFIG_EXPORTFS=y +CONFIG_EXPORTFS=m CONFIG_SUNRPC=y -CONFIG_SUNRPC_GSS=m -CONFIG_RPCSEC_GSS_KRB5=m +CONFIG_SUNRPC_GSS=y +CONFIG_RPCSEC_GSS_KRB5=y +CONFIG_RPCSEC_GSS_SPKM3=m # CONFIG_SMB_FS is not set CONFIG_CIFS=m +# CONFIG_CIFS_STATS is not set +CONFIG_CIFS_XATTR=y +CONFIG_CIFS_POSIX=y +# CONFIG_CIFS_EXPERIMENTAL is not set # CONFIG_NCP_FS is not set # CONFIG_CODA_FS is not set # CONFIG_AFS_FS is not set @@ -992,51 +1208,66 @@ CONFIG_CIFS=m # # Partition Types # -# CONFIG_PARTITION_ADVANCED is not set +CONFIG_PARTITION_ADVANCED=y +# CONFIG_ACORN_PARTITION is not set +# CONFIG_OSF_PARTITION is not set +# CONFIG_AMIGA_PARTITION is not set 
+# CONFIG_ATARI_PARTITION is not set +CONFIG_MAC_PARTITION=y CONFIG_MSDOS_PARTITION=y +# CONFIG_BSD_DISKLABEL is not set +# CONFIG_MINIX_SUBPARTITION is not set +# CONFIG_SOLARIS_X86_PARTITION is not set +# CONFIG_UNIXWARE_DISKLABEL is not set +# CONFIG_LDM_PARTITION is not set +# CONFIG_SGI_PARTITION is not set +# CONFIG_ULTRIX_PARTITION is not set +# CONFIG_SUN_PARTITION is not set +# CONFIG_EFI_PARTITION is not set # # Native Language Support # CONFIG_NLS=y CONFIG_NLS_DEFAULT="iso8859-1" -# CONFIG_NLS_CODEPAGE_437 is not set -# CONFIG_NLS_CODEPAGE_737 is not set -# CONFIG_NLS_CODEPAGE_775 is not set -# CONFIG_NLS_CODEPAGE_850 is not set -# CONFIG_NLS_CODEPAGE_852 is not set -# CONFIG_NLS_CODEPAGE_855 is not set -# CONFIG_NLS_CODEPAGE_857 is not set -# CONFIG_NLS_CODEPAGE_860 is not set -# CONFIG_NLS_CODEPAGE_861 is not set -# CONFIG_NLS_CODEPAGE_862 is not set -# CONFIG_NLS_CODEPAGE_863 is not set -# CONFIG_NLS_CODEPAGE_864 is not set -# CONFIG_NLS_CODEPAGE_865 is not set -# CONFIG_NLS_CODEPAGE_866 is not set -# CONFIG_NLS_CODEPAGE_869 is not set -# CONFIG_NLS_CODEPAGE_936 is not set -# CONFIG_NLS_CODEPAGE_950 is not set -# CONFIG_NLS_CODEPAGE_932 is not set -# CONFIG_NLS_CODEPAGE_949 is not set -# CONFIG_NLS_CODEPAGE_874 is not set -# CONFIG_NLS_ISO8859_8 is not set -# CONFIG_NLS_CODEPAGE_1250 is not set -# CONFIG_NLS_CODEPAGE_1251 is not set -# CONFIG_NLS_ISO8859_1 is not set -# CONFIG_NLS_ISO8859_2 is not set -# CONFIG_NLS_ISO8859_3 is not set -# CONFIG_NLS_ISO8859_4 is not set -# CONFIG_NLS_ISO8859_5 is not set -# CONFIG_NLS_ISO8859_6 is not set -# CONFIG_NLS_ISO8859_7 is not set -# CONFIG_NLS_ISO8859_9 is not set -# CONFIG_NLS_ISO8859_13 is not set -# CONFIG_NLS_ISO8859_14 is not set -# CONFIG_NLS_ISO8859_15 is not set -# CONFIG_NLS_KOI8_R is not set -# CONFIG_NLS_KOI8_U is not set -# CONFIG_NLS_UTF8 is not set +CONFIG_NLS_CODEPAGE_437=m +CONFIG_NLS_CODEPAGE_737=m +CONFIG_NLS_CODEPAGE_775=m +CONFIG_NLS_CODEPAGE_850=m +CONFIG_NLS_CODEPAGE_852=m 
+CONFIG_NLS_CODEPAGE_855=m +CONFIG_NLS_CODEPAGE_857=m +CONFIG_NLS_CODEPAGE_860=m +CONFIG_NLS_CODEPAGE_861=m +CONFIG_NLS_CODEPAGE_862=m +CONFIG_NLS_CODEPAGE_863=m +CONFIG_NLS_CODEPAGE_864=m +CONFIG_NLS_CODEPAGE_865=m +CONFIG_NLS_CODEPAGE_866=m +CONFIG_NLS_CODEPAGE_869=m +CONFIG_NLS_CODEPAGE_936=m +CONFIG_NLS_CODEPAGE_950=m +CONFIG_NLS_CODEPAGE_932=m +CONFIG_NLS_CODEPAGE_949=m +CONFIG_NLS_CODEPAGE_874=m +CONFIG_NLS_ISO8859_8=m +CONFIG_NLS_CODEPAGE_1250=m +CONFIG_NLS_CODEPAGE_1251=m +CONFIG_NLS_ASCII=m +CONFIG_NLS_ISO8859_1=m +CONFIG_NLS_ISO8859_2=m +CONFIG_NLS_ISO8859_3=m +CONFIG_NLS_ISO8859_4=m +CONFIG_NLS_ISO8859_5=m +CONFIG_NLS_ISO8859_6=m +CONFIG_NLS_ISO8859_7=m +CONFIG_NLS_ISO8859_9=m +CONFIG_NLS_ISO8859_13=m +CONFIG_NLS_ISO8859_14=m +CONFIG_NLS_ISO8859_15=m +CONFIG_NLS_KOI8_R=m +CONFIG_NLS_KOI8_U=m +CONFIG_NLS_UTF8=m # # Profiling support @@ -1048,19 +1279,26 @@ CONFIG_OPROFILE=y # Kernel hacking # CONFIG_DEBUG_KERNEL=y +CONFIG_MAGIC_SYSRQ=y +# CONFIG_SCHEDSTATS is not set +# CONFIG_DEBUG_SLAB is not set +# CONFIG_DEBUG_SPINLOCK_SLEEP is not set +# CONFIG_DEBUG_KOBJECT is not set +# CONFIG_DEBUG_INFO is not set +CONFIG_DEBUG_FS=y CONFIG_DEBUG_STACKOVERFLOW=y +# CONFIG_KPROBES is not set CONFIG_DEBUG_STACK_USAGE=y -# CONFIG_DEBUG_SLAB is not set -CONFIG_MAGIC_SYSRQ=y CONFIG_DEBUGGER=y CONFIG_XMON=y -CONFIG_XMON_DEFAULT=y +# CONFIG_XMON_DEFAULT is not set # CONFIG_PPCDBG is not set -# CONFIG_DEBUG_INFO is not set +CONFIG_IRQSTACKS=y # # Security options # +# CONFIG_KEYS is not set # CONFIG_SECURITY is not set # @@ -1070,24 +1308,36 @@ CONFIG_CRYPTO=y CONFIG_CRYPTO_HMAC=y CONFIG_CRYPTO_NULL=m CONFIG_CRYPTO_MD4=m -CONFIG_CRYPTO_MD5=m +CONFIG_CRYPTO_MD5=y CONFIG_CRYPTO_SHA1=m CONFIG_CRYPTO_SHA256=m CONFIG_CRYPTO_SHA512=m -CONFIG_CRYPTO_DES=m +CONFIG_CRYPTO_WP512=m +CONFIG_CRYPTO_DES=y CONFIG_CRYPTO_BLOWFISH=m CONFIG_CRYPTO_TWOFISH=m CONFIG_CRYPTO_SERPENT=m CONFIG_CRYPTO_AES=m CONFIG_CRYPTO_CAST5=m CONFIG_CRYPTO_CAST6=m +CONFIG_CRYPTO_TEA=m CONFIG_CRYPTO_ARC4=m 
+CONFIG_CRYPTO_KHAZAD=m +CONFIG_CRYPTO_ANUBIS=m CONFIG_CRYPTO_DEFLATE=m +CONFIG_CRYPTO_MICHAEL_MIC=m +CONFIG_CRYPTO_CRC32C=m CONFIG_CRYPTO_TEST=m # +# Hardware crypto devices +# + +# # Library routines # +CONFIG_CRC_CCITT=m CONFIG_CRC32=y +CONFIG_LIBCRC32C=m CONFIG_ZLIB_INFLATE=y CONFIG_ZLIB_DEFLATE=m

From cfriesen at nortel.com Tue Feb 8 01:44:48 2005
From: cfriesen at nortel.com (Chris Friesen)
Date: Mon, 07 Feb 2005 08:44:48 -0600
Subject: question on symbol exports
In-Reply-To: <1107595148.30302.5.camel@gaston>
References: <41FECA18.50609@nortelnetworks.com>
 <1107243398.4208.47.camel@laptopd505.fenrus.org>
 <41FFA21C.8060203@nortelnetworks.com>
 <1107273017.4208.132.camel@laptopd505.fenrus.org>
 <20050204203050.GA5889@dmt.cnet>
 <4203D793.1040604@nortel.com>
 <1107595148.30302.5.camel@gaston>
Message-ID: <42077EE0.2060505@nortel.com>

Benjamin Herrenschmidt wrote:

>>It turns out that to call ptep_clear_flush_dirty() on ppc64 from a
>>module I needed to export the following symbols:
>>
>>__flush_tlb_pending
>>ppc64_tlb_batch
>>hpte_update
>
>
> Any reason why you need to call that from a module ? Is the module
> GPL'd ?

I explained this at the beginning of the thread, but I'll do so again.

The module will be released under the GPL.

The basic idea is that we want to be able to track pages dirtied by a
userspace process. The system has no swap, so we use the dirty bit for
this. On demand we look up the page tables for an address range
specified by the caller, store the addresses of any dirty pages, then
mark them clean so that the next write causes them to get marked dirty
again. It is this act of marking them clean that requires the
additional exports.

I've included the current code below. If there is any way to accomplish
this without the additional exports, I'd love to hear about it.

Chris

Note: this code is run while holding &mm->mmap_sem and
&mm->page_table_lock.

for(addr=start&PAGE_MASK; addr<=end; addr+=PAGE_SIZE) {
    pte_t *ptep=0;

    ptep = va_to_ptep_map(mm, addr);
    if (!ptep)
        goto unmap_continue;

    if (!pte_dirty(*ptep))
        goto unmap_continue;

    /* We have a user readable dirty page. Count it.*/
    dirty_count++;

    if (dirty_count <= entries) {
        __put_user(addr, buf);
        buf++;

        ptep_clear_flush_dirty(find_vma(mm, addr), addr, ptep);

        /* Handle option to stop early. */
        if ((dirty_count == entries) && (options & STOP_WHEN_BUF_FULL))
            addr=end+1;
    }

unmap_continue:
    if (ptep)
        pte_unmap(ptep);
}

From olh at suse.de Tue Feb 8 01:45:01 2005
From: olh at suse.de (Olaf Hering)
Date: Mon, 7 Feb 2005 15:45:01 +0100
Subject: [PATCH] update ppc64 g5_defconfig
In-Reply-To: <20050207144228.GB5516@suse.de>
References: <20050207144228.GB5516@suse.de>
Message-ID: <20050207144501.GC5516@suse.de>

This updates the G5 defconfig and disables some options for hardware
that is not used on such toys.

Signed-off-by: Olaf Hering

diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/configs/g5_defconfig ./arch/ppc64/configs/g5_defconfig --- ../linux-2.6.11-rc3.orig/arch/ppc64/configs/g5_defconfig 2005-02-03 02:55:06.000000000 +0100 +++ ./arch/ppc64/configs/g5_defconfig 2005-02-07 14:58:27.000000000 +0100 @@ -1,11 +1,12 @@ # # Automatically generated make config: don't edit -# Linux kernel version: 2.6.9-rc3 -# Thu Oct 7 15:18:38 2004 +# Linux kernel version: 2.6.11-rc3-bk3 +# Mon Feb 7 14:54:48 2005 # CONFIG_64BIT=y CONFIG_MMU=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y +CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_GENERIC_ISA_DMA=y CONFIG_HAVE_DEC_LOCK=y CONFIG_EARLY_PRINTK=y @@ -18,6 +19,7 @@ CONFIG_FORCE_MAX_ZONEORDER=13 # CONFIG_EXPERIMENTAL=y CONFIG_CLEAN_COMPILE=y +CONFIG_LOCK_KERNEL=y # # General setup @@ -25,25 +27,28 @@ CONFIG_CLEAN_COMPILE=y CONFIG_LOCALVERSION="" CONFIG_SWAP=y CONFIG_SYSVIPC=y -# CONFIG_POSIX_MQUEUE is not set +CONFIG_POSIX_MQUEUE=y # CONFIG_BSD_PROCESS_ACCT is not set CONFIG_SYSCTL=y -# CONFIG_AUDIT is not set +CONFIG_AUDIT=y +CONFIG_AUDITSYSCALL=y
CONFIG_LOG_BUF_SHIFT=17 CONFIG_HOTPLUG=y -# CONFIG_IKCONFIG is not set +CONFIG_KOBJECT_UEVENT=y +CONFIG_IKCONFIG=y +CONFIG_IKCONFIG_PROC=y # CONFIG_EMBEDDED is not set CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_ALL is not set # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_FUTEX=y CONFIG_EPOLL=y -CONFIG_IOSCHED_NOOP=y -CONFIG_IOSCHED_AS=y -CONFIG_IOSCHED_DEADLINE=y -CONFIG_IOSCHED_CFQ=y # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_SHMEM=y +CONFIG_CC_ALIGN_FUNCTIONS=0 +CONFIG_CC_ALIGN_LABELS=0 +CONFIG_CC_ALIGN_LOOPS=0 +CONFIG_CC_ALIGN_JUMPS=0 # CONFIG_TINY_SHMEM is not set # @@ -51,10 +56,11 @@ CONFIG_SHMEM=y # CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y -CONFIG_MODULE_FORCE_UNLOAD=y +# CONFIG_MODULE_FORCE_UNLOAD is not set CONFIG_OBSOLETE_MODPARM=y -# CONFIG_MODVERSIONS is not set -# CONFIG_KMOD is not set +CONFIG_MODVERSIONS=y +CONFIG_MODULE_SRCVERSION_ALL=y +CONFIG_KMOD=y CONFIG_STOP_MACHINE=y CONFIG_SYSVIPC_COMPAT=y @@ -65,6 +71,7 @@ CONFIG_SYSVIPC_COMPAT=y CONFIG_PPC_MULTIPLATFORM=y # CONFIG_PPC_PSERIES is not set CONFIG_PPC_PMAC=y +# CONFIG_PPC_MAPLE is not set CONFIG_PPC=y CONFIG_PPC64=y CONFIG_PPC_OF=y @@ -75,12 +82,10 @@ CONFIG_BOOTX_TEXT=y CONFIG_POWER4_ONLY=y CONFIG_IOMMU_VMERGE=y CONFIG_SMP=y -CONFIG_IRQ_ALL_CPUS=y CONFIG_NR_CPUS=2 # CONFIG_SCHED_SMT is not set # CONFIG_PREEMPT is not set -# CONFIG_PPC_RTAS is not set -# CONFIG_LPARCFG is not set +CONFIG_GENERIC_HARDIRQS=y # # General setup @@ -91,12 +96,15 @@ CONFIG_BINFMT_ELF=y # CONFIG_BINFMT_MISC is not set CONFIG_PCI_LEGACY_PROC=y CONFIG_PCI_NAMES=y -# CONFIG_HOTPLUG_CPU is not set # -# PCMCIA/CardBus support +# PCCARD (PCMCIA/CardBus) support +# +# CONFIG_PCCARD is not set + +# +# PC-card bridges # -# CONFIG_PCMCIA is not set # # PCI Hotplug Support @@ -139,14 +147,29 @@ CONFIG_FW_LOADER=y # CONFIG_BLK_CPQ_CISS_DA is not set # CONFIG_BLK_DEV_DAC960 is not set # CONFIG_BLK_DEV_UMEM is not set +# CONFIG_BLK_DEV_COW_COMMON is not set CONFIG_BLK_DEV_LOOP=y # CONFIG_BLK_DEV_CRYPTOLOOP is not set 
CONFIG_BLK_DEV_NBD=m # CONFIG_BLK_DEV_SX8 is not set # CONFIG_BLK_DEV_UB is not set CONFIG_BLK_DEV_RAM=y -CONFIG_BLK_DEV_RAM_SIZE=8192 +CONFIG_BLK_DEV_RAM_COUNT=16 +CONFIG_BLK_DEV_RAM_SIZE=65536 CONFIG_BLK_DEV_INITRD=y +CONFIG_INITRAMFS_SOURCE="" +CONFIG_CDROM_PKTCDVD=m +CONFIG_CDROM_PKTCDVD_BUFFERS=8 +# CONFIG_CDROM_PKTCDVD_WCACHE is not set + +# +# IO Schedulers +# +CONFIG_IOSCHED_NOOP=y +CONFIG_IOSCHED_AS=y +CONFIG_IOSCHED_DEADLINE=y +CONFIG_IOSCHED_CFQ=y +# CONFIG_ATA_OVER_ETH is not set # # ATA/ATAPI/MFM/RLL support @@ -161,11 +184,10 @@ CONFIG_BLK_DEV_IDE=y CONFIG_BLK_DEV_IDEDISK=y # CONFIG_IDEDISK_MULTI_MODE is not set CONFIG_BLK_DEV_IDECD=y -CONFIG_BLK_DEV_IDETAPE=y -CONFIG_BLK_DEV_IDEFLOPPY=y +# CONFIG_BLK_DEV_IDETAPE is not set +# CONFIG_BLK_DEV_IDEFLOPPY is not set # CONFIG_BLK_DEV_IDESCSI is not set # CONFIG_IDE_TASK_IOCTL is not set -# CONFIG_IDE_TASKFILE_IO is not set # # IDE chipset support/bugfixes @@ -205,7 +227,6 @@ CONFIG_BLK_DEV_IDE_PMAC=y CONFIG_BLK_DEV_IDE_PMAC_ATA100FIRST=y CONFIG_BLK_DEV_IDEDMA_PMAC=y # CONFIG_BLK_DEV_IDE_PMAC_BLINK is not set -CONFIG_BLK_DEV_IDEDMA_PMAC_AUTO=y # CONFIG_IDE_ARM is not set CONFIG_BLK_DEV_IDEDMA=y # CONFIG_IDEDMA_IVB is not set @@ -240,6 +261,7 @@ CONFIG_SCSI_CONSTANTS=y # CONFIG_SCSI_SPI_ATTRS=y # CONFIG_SCSI_FC_ATTRS is not set +# CONFIG_SCSI_ISCSI_ATTRS is not set # # SCSI low-level drivers @@ -254,6 +276,7 @@ CONFIG_SCSI_SPI_ATTRS=y # CONFIG_MEGARAID_NEWGEN is not set # CONFIG_MEGARAID_LEGACY is not set CONFIG_SCSI_SATA=y +# CONFIG_SCSI_SATA_AHCI is not set CONFIG_SCSI_SATA_SVW=y # CONFIG_SCSI_ATA_PIIX is not set # CONFIG_SCSI_SATA_NV is not set @@ -261,6 +284,7 @@ CONFIG_SCSI_SATA_SVW=y # CONFIG_SCSI_SATA_SX4 is not set # CONFIG_SCSI_SATA_SIL is not set # CONFIG_SCSI_SATA_SIS is not set +# CONFIG_SCSI_SATA_ULI is not set # CONFIG_SCSI_SATA_VIA is not set # CONFIG_SCSI_SATA_VITESSE is not set # CONFIG_SCSI_BUSLOGIC is not set @@ -270,12 +294,9 @@ CONFIG_SCSI_SATA_SVW=y # CONFIG_SCSI_FUTURE_DOMAIN is not 
set # CONFIG_SCSI_GDTH is not set # CONFIG_SCSI_IPS is not set +# CONFIG_SCSI_INITIO is not set # CONFIG_SCSI_INIA100 is not set -CONFIG_SCSI_SYM53C8XX_2=y -CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=0 -CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16 -CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64 -# CONFIG_SCSI_SYM53C8XX_IOMAPPED is not set +# CONFIG_SCSI_SYM53C8XX_2 is not set # CONFIG_SCSI_IPR is not set # CONFIG_SCSI_QLOGIC_ISP is not set # CONFIG_SCSI_QLOGIC_FC is not set @@ -286,11 +307,9 @@ CONFIG_SCSI_QLA2XXX=y # CONFIG_SCSI_QLA2300 is not set # CONFIG_SCSI_QLA2322 is not set # CONFIG_SCSI_QLA6312 is not set -# CONFIG_SCSI_QLA6322 is not set # CONFIG_SCSI_DC395x is not set # CONFIG_SCSI_DC390T is not set # CONFIG_SCSI_DEBUG is not set -# CONFIG_SCSI_MAC53C94 is not set # # Multi-device support (RAID and LVM) @@ -300,15 +319,16 @@ CONFIG_BLK_DEV_MD=y CONFIG_MD_LINEAR=y CONFIG_MD_RAID0=y CONFIG_MD_RAID1=y -# CONFIG_MD_RAID10 is not set +CONFIG_MD_RAID10=m CONFIG_MD_RAID5=y -# CONFIG_MD_RAID6 is not set -# CONFIG_MD_MULTIPATH is not set +CONFIG_MD_RAID6=m +CONFIG_MD_MULTIPATH=m +CONFIG_MD_FAULTY=m CONFIG_BLK_DEV_DM=y -# CONFIG_DM_CRYPT is not set -# CONFIG_DM_SNAPSHOT is not set -# CONFIG_DM_MIRROR is not set -# CONFIG_DM_ZERO is not set +CONFIG_DM_CRYPT=m +CONFIG_DM_SNAPSHOT=m +CONFIG_DM_MIRROR=m +CONFIG_DM_ZERO=m # # Fusion MPT device support @@ -386,6 +406,8 @@ CONFIG_INET_AH=m CONFIG_INET_ESP=m CONFIG_INET_IPCOMP=m CONFIG_INET_TUNNEL=y +CONFIG_IP_TCPDIAG=m +# CONFIG_IP_TCPDIAG_IPV6 is not set # # IP: Virtual Server Configuration @@ -398,61 +420,71 @@ CONFIG_NETFILTER=y # # IP: Netfilter Configuration # -CONFIG_IP_NF_CONNTRACK=y -# CONFIG_IP_NF_CT_ACCT is not set -# CONFIG_IP_NF_CT_PROTO_SCTP is not set -# CONFIG_IP_NF_FTP is not set -# CONFIG_IP_NF_IRC is not set -# CONFIG_IP_NF_TFTP is not set -# CONFIG_IP_NF_AMANDA is not set -CONFIG_IP_NF_QUEUE=y -CONFIG_IP_NF_IPTABLES=y -CONFIG_IP_NF_MATCH_LIMIT=y -CONFIG_IP_NF_MATCH_IPRANGE=y -CONFIG_IP_NF_MATCH_MAC=y 
-CONFIG_IP_NF_MATCH_PKTTYPE=y -CONFIG_IP_NF_MATCH_MARK=y -CONFIG_IP_NF_MATCH_MULTIPORT=y -CONFIG_IP_NF_MATCH_TOS=y -CONFIG_IP_NF_MATCH_RECENT=y -CONFIG_IP_NF_MATCH_ECN=y -CONFIG_IP_NF_MATCH_DSCP=y -CONFIG_IP_NF_MATCH_AH_ESP=y -CONFIG_IP_NF_MATCH_LENGTH=y -CONFIG_IP_NF_MATCH_TTL=y -CONFIG_IP_NF_MATCH_TCPMSS=y -CONFIG_IP_NF_MATCH_HELPER=y -CONFIG_IP_NF_MATCH_STATE=y -CONFIG_IP_NF_MATCH_CONNTRACK=y -CONFIG_IP_NF_MATCH_OWNER=y -# CONFIG_IP_NF_MATCH_ADDRTYPE is not set -# CONFIG_IP_NF_MATCH_REALM is not set -# CONFIG_IP_NF_MATCH_SCTP is not set -# CONFIG_IP_NF_MATCH_COMMENT is not set -CONFIG_IP_NF_FILTER=y -CONFIG_IP_NF_TARGET_REJECT=y -CONFIG_IP_NF_TARGET_LOG=y -CONFIG_IP_NF_TARGET_ULOG=y -CONFIG_IP_NF_TARGET_TCPMSS=y -CONFIG_IP_NF_NAT=y +CONFIG_IP_NF_CONNTRACK=m +CONFIG_IP_NF_CT_ACCT=y +CONFIG_IP_NF_CONNTRACK_MARK=y +CONFIG_IP_NF_CT_PROTO_SCTP=m +CONFIG_IP_NF_FTP=m +CONFIG_IP_NF_IRC=m +CONFIG_IP_NF_TFTP=m +CONFIG_IP_NF_AMANDA=m +CONFIG_IP_NF_QUEUE=m +CONFIG_IP_NF_IPTABLES=m +CONFIG_IP_NF_MATCH_LIMIT=m +CONFIG_IP_NF_MATCH_IPRANGE=m +CONFIG_IP_NF_MATCH_MAC=m +CONFIG_IP_NF_MATCH_PKTTYPE=m +CONFIG_IP_NF_MATCH_MARK=m +CONFIG_IP_NF_MATCH_MULTIPORT=m +CONFIG_IP_NF_MATCH_TOS=m +CONFIG_IP_NF_MATCH_RECENT=m +CONFIG_IP_NF_MATCH_ECN=m +CONFIG_IP_NF_MATCH_DSCP=m +CONFIG_IP_NF_MATCH_AH_ESP=m +CONFIG_IP_NF_MATCH_LENGTH=m +CONFIG_IP_NF_MATCH_TTL=m +CONFIG_IP_NF_MATCH_TCPMSS=m +CONFIG_IP_NF_MATCH_HELPER=m +CONFIG_IP_NF_MATCH_STATE=m +CONFIG_IP_NF_MATCH_CONNTRACK=m +CONFIG_IP_NF_MATCH_OWNER=m +CONFIG_IP_NF_MATCH_ADDRTYPE=m +CONFIG_IP_NF_MATCH_REALM=m +CONFIG_IP_NF_MATCH_SCTP=m +CONFIG_IP_NF_MATCH_COMMENT=m +CONFIG_IP_NF_MATCH_CONNMARK=m +CONFIG_IP_NF_MATCH_HASHLIMIT=m +CONFIG_IP_NF_FILTER=m +CONFIG_IP_NF_TARGET_REJECT=m +CONFIG_IP_NF_TARGET_LOG=m +CONFIG_IP_NF_TARGET_ULOG=m +CONFIG_IP_NF_TARGET_TCPMSS=m +CONFIG_IP_NF_NAT=m CONFIG_IP_NF_NAT_NEEDED=y -CONFIG_IP_NF_TARGET_MASQUERADE=y -CONFIG_IP_NF_TARGET_REDIRECT=y -CONFIG_IP_NF_TARGET_NETMAP=y -CONFIG_IP_NF_TARGET_SAME=y -# 
CONFIG_IP_NF_NAT_SNMP_BASIC is not set -CONFIG_IP_NF_MANGLE=y -CONFIG_IP_NF_TARGET_TOS=y -CONFIG_IP_NF_TARGET_ECN=y -CONFIG_IP_NF_TARGET_DSCP=y -CONFIG_IP_NF_TARGET_MARK=y -CONFIG_IP_NF_TARGET_CLASSIFY=y -# CONFIG_IP_NF_RAW is not set -CONFIG_IP_NF_ARPTABLES=y -CONFIG_IP_NF_ARPFILTER=y -CONFIG_IP_NF_ARP_MANGLE=y +CONFIG_IP_NF_TARGET_MASQUERADE=m +CONFIG_IP_NF_TARGET_REDIRECT=m +CONFIG_IP_NF_TARGET_NETMAP=m +CONFIG_IP_NF_TARGET_SAME=m +CONFIG_IP_NF_NAT_SNMP_BASIC=m +CONFIG_IP_NF_NAT_IRC=m +CONFIG_IP_NF_NAT_FTP=m +CONFIG_IP_NF_NAT_TFTP=m +CONFIG_IP_NF_NAT_AMANDA=m +CONFIG_IP_NF_MANGLE=m +CONFIG_IP_NF_TARGET_TOS=m +CONFIG_IP_NF_TARGET_ECN=m +CONFIG_IP_NF_TARGET_DSCP=m +CONFIG_IP_NF_TARGET_MARK=m +CONFIG_IP_NF_TARGET_CLASSIFY=m +CONFIG_IP_NF_TARGET_CONNMARK=m +CONFIG_IP_NF_TARGET_CLUSTERIP=m +CONFIG_IP_NF_RAW=m +CONFIG_IP_NF_TARGET_NOTRACK=m +CONFIG_IP_NF_ARPTABLES=m +CONFIG_IP_NF_ARPFILTER=m +CONFIG_IP_NF_ARP_MANGLE=m CONFIG_XFRM=y -# CONFIG_XFRM_USER is not set +CONFIG_XFRM_USER=m # # SCTP Configuration (EXPERIMENTAL) @@ -471,13 +503,12 @@ CONFIG_LLC=y # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set -# CONFIG_NET_HW_FLOWCONTROL is not set # # QoS and/or fair queueing # # CONFIG_NET_SCHED is not set -# CONFIG_NET_CLS_ROUTE is not set +CONFIG_NET_CLS_ROUTE=y # # Network testing @@ -587,7 +618,7 @@ CONFIG_INPUT=y # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y -CONFIG_INPUT_MOUSEDEV_PSAUX=y +# CONFIG_INPUT_MOUSEDEV_PSAUX is not set CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 CONFIG_INPUT_JOYDEV=m @@ -663,7 +694,6 @@ CONFIG_LEGACY_PTY_COUNT=256 # # Ftape, the floppy tape device driver # -# CONFIG_AGP is not set # CONFIG_DRM is not set CONFIG_RAW_DRIVER=y CONFIG_MAX_RAW_DEVS=256 @@ -693,6 +723,7 @@ CONFIG_I2C_ALGOBIT=y # CONFIG_I2C_I810 is not set # CONFIG_I2C_ISA is not set CONFIG_I2C_KEYWEST=y +# CONFIG_I2C_MPC is not set # CONFIG_I2C_NFORCE2 is not set # CONFIG_I2C_PARPORT_LIGHT is not set # 
CONFIG_I2C_PROSAVAGE is not set @@ -701,6 +732,7 @@ CONFIG_I2C_KEYWEST=y # CONFIG_I2C_SIS5595 is not set # CONFIG_I2C_SIS630 is not set # CONFIG_I2C_SIS96X is not set +# CONFIG_I2C_STUB is not set # CONFIG_I2C_VIA is not set # CONFIG_I2C_VIAPRO is not set # CONFIG_I2C_VOODOO3 is not set @@ -712,20 +744,25 @@ CONFIG_I2C_KEYWEST=y # CONFIG_I2C_SENSOR is not set # CONFIG_SENSORS_ADM1021 is not set # CONFIG_SENSORS_ADM1025 is not set +# CONFIG_SENSORS_ADM1026 is not set # CONFIG_SENSORS_ADM1031 is not set # CONFIG_SENSORS_ASB100 is not set # CONFIG_SENSORS_DS1621 is not set # CONFIG_SENSORS_FSCHER is not set # CONFIG_SENSORS_GL518SM is not set # CONFIG_SENSORS_IT87 is not set +# CONFIG_SENSORS_LM63 is not set # CONFIG_SENSORS_LM75 is not set # CONFIG_SENSORS_LM77 is not set # CONFIG_SENSORS_LM78 is not set # CONFIG_SENSORS_LM80 is not set # CONFIG_SENSORS_LM83 is not set # CONFIG_SENSORS_LM85 is not set +# CONFIG_SENSORS_LM87 is not set # CONFIG_SENSORS_LM90 is not set # CONFIG_SENSORS_MAX1619 is not set +# CONFIG_SENSORS_PC87360 is not set +# CONFIG_SENSORS_SMSC47B397 is not set # CONFIG_SENSORS_SMSC47M1 is not set # CONFIG_SENSORS_VIA686A is not set # CONFIG_SENSORS_W83781D is not set @@ -768,6 +805,7 @@ CONFIG_I2C_KEYWEST=y # CONFIG_FB=y CONFIG_FB_MODE_HELPERS=y +CONFIG_FB_TILEBLITTING=y # CONFIG_FB_CIRRUS is not set # CONFIG_FB_PM2 is not set # CONFIG_FB_CYBER2000 is not set @@ -789,6 +827,7 @@ CONFIG_FB_RADEON_I2C=y # CONFIG_FB_RADEON_DEBUG is not set # CONFIG_FB_ATY128 is not set # CONFIG_FB_ATY is not set +# CONFIG_FB_SAVAGE is not set # CONFIG_FB_SIS is not set # CONFIG_FB_NEOMAGIC is not set # CONFIG_FB_KYRO is not set @@ -814,6 +853,11 @@ CONFIG_LOGO=y CONFIG_LOGO_LINUX_MONO=y CONFIG_LOGO_LINUX_VGA16=y CONFIG_LOGO_LINUX_CLUT224=y +CONFIG_BACKLIGHT_LCD_SUPPORT=y +CONFIG_BACKLIGHT_CLASS_DEVICE=m +CONFIG_BACKLIGHT_DEVICE=y +CONFIG_LCD_CLASS_DEVICE=m +CONFIG_LCD_DEVICE=y # # Sound @@ -833,6 +877,8 @@ CONFIG_USB_DEVICEFS=y # CONFIG_USB_BANDWIDTH is not set # 
CONFIG_USB_DYNAMIC_MINORS is not set # CONFIG_USB_OTG is not set +CONFIG_USB_ARCH_HAS_HCD=y +CONFIG_USB_ARCH_HAS_OHCI=y # # USB Host Controller Drivers @@ -842,6 +888,7 @@ CONFIG_USB_EHCI_HCD=y # CONFIG_USB_EHCI_ROOT_HUB_TT is not set CONFIG_USB_OHCI_HCD=y # CONFIG_USB_UHCI_HCD is not set +# CONFIG_USB_SL811_HCD is not set # # USB Device Class drivers @@ -849,9 +896,13 @@ CONFIG_USB_OHCI_HCD=y # CONFIG_USB_BLUETOOTH_TTY is not set CONFIG_USB_ACM=m CONFIG_USB_PRINTER=y + +# +# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' may also be needed; see USB_STORAGE Help for more information +# CONFIG_USB_STORAGE=y # CONFIG_USB_STORAGE_DEBUG is not set -# CONFIG_USB_STORAGE_RW_DETECT is not set +CONFIG_USB_STORAGE_RW_DETECT=y CONFIG_USB_STORAGE_DATAFAB=y CONFIG_USB_STORAGE_FREECOM=y CONFIG_USB_STORAGE_ISD200=y @@ -862,7 +913,7 @@ CONFIG_USB_STORAGE_SDDR55=y CONFIG_USB_STORAGE_JUMPSHOT=y # -# USB Human Interface Devices (HID) +# USB Input Devices # CONFIG_USB_HID=y CONFIG_USB_HIDINPUT=y @@ -885,7 +936,6 @@ CONFIG_USB_HIDDEV=y # # CONFIG_USB_MDC800 is not set # CONFIG_USB_MICROTEK is not set -# CONFIG_USB_HPUSBSCSI is not set # # USB Multimedia devices @@ -897,12 +947,12 @@ CONFIG_USB_HIDDEV=y # # -# USB Network adaptors +# USB Network Adapters # -# CONFIG_USB_CATC is not set -# CONFIG_USB_KAWETH is not set -# CONFIG_USB_PEGASUS is not set -# CONFIG_USB_RTL8150 is not set +CONFIG_USB_CATC=m +CONFIG_USB_KAWETH=m +CONFIG_USB_PEGASUS=m +CONFIG_USB_RTL8150=m CONFIG_USB_USBNET=m # @@ -914,6 +964,7 @@ CONFIG_USB_BELKIN=y CONFIG_USB_GENESYS=y CONFIG_USB_NET1080=y CONFIG_USB_PL2301=y +CONFIG_USB_KC2190=y # # Intelligent USB Devices/Gadgets @@ -939,6 +990,7 @@ CONFIG_USB_SERIAL=m CONFIG_USB_SERIAL_GENERIC=y CONFIG_USB_SERIAL_BELKIN=m CONFIG_USB_SERIAL_DIGI_ACCELEPORT=m +CONFIG_USB_SERIAL_CYPRESS_M8=m CONFIG_USB_SERIAL_EMPEG=m CONFIG_USB_SERIAL_FTDI_SIO=m CONFIG_USB_SERIAL_VISOR=m @@ -946,6 +998,8 @@ CONFIG_USB_SERIAL_IPAQ=m CONFIG_USB_SERIAL_IR=m CONFIG_USB_SERIAL_EDGEPORT=m 
CONFIG_USB_SERIAL_EDGEPORT_TI=m +CONFIG_USB_SERIAL_GARMIN=m +CONFIG_USB_SERIAL_IPW=m CONFIG_USB_SERIAL_KEYSPAN_PDA=m CONFIG_USB_SERIAL_KEYSPAN=m CONFIG_USB_SERIAL_KEYSPAN_MPR=y @@ -966,6 +1020,7 @@ CONFIG_USB_SERIAL_MCT_U232=m CONFIG_USB_SERIAL_PL2303=m CONFIG_USB_SERIAL_SAFE=m CONFIG_USB_SERIAL_SAFE_PADDED=y +CONFIG_USB_SERIAL_TI=m CONFIG_USB_SERIAL_CYBERJACK=m CONFIG_USB_SERIAL_XIRCOM=m CONFIG_USB_SERIAL_OMNINET=m @@ -976,42 +1031,72 @@ CONFIG_USB_EZUSB=y # # CONFIG_USB_EMI62 is not set # CONFIG_USB_EMI26 is not set -# CONFIG_USB_TIGL is not set # CONFIG_USB_AUERSWALD is not set # CONFIG_USB_RIO500 is not set # CONFIG_USB_LEGOTOWER is not set # CONFIG_USB_LCD is not set # CONFIG_USB_LED is not set # CONFIG_USB_CYTHERM is not set +# CONFIG_USB_PHIDGETKIT is not set # CONFIG_USB_PHIDGETSERVO is not set +# CONFIG_USB_IDMOUSE is not set # CONFIG_USB_TEST is not set # +# USB ATM/DSL drivers +# + +# # USB Gadget Support # # CONFIG_USB_GADGET is not set # +# MMC/SD Card support +# +# CONFIG_MMC is not set + +# +# InfiniBand support +# +# CONFIG_INFINIBAND is not set + +# # File systems # CONFIG_EXT2_FS=y CONFIG_EXT2_FS_XATTR=y CONFIG_EXT2_FS_POSIX_ACL=y -# CONFIG_EXT2_FS_SECURITY is not set +CONFIG_EXT2_FS_SECURITY=y CONFIG_EXT3_FS=y CONFIG_EXT3_FS_XATTR=y CONFIG_EXT3_FS_POSIX_ACL=y -# CONFIG_EXT3_FS_SECURITY is not set +CONFIG_EXT3_FS_SECURITY=y CONFIG_JBD=y # CONFIG_JBD_DEBUG is not set CONFIG_FS_MBCACHE=y -# CONFIG_REISERFS_FS is not set +CONFIG_REISERFS_FS=y +# CONFIG_REISERFS_CHECK is not set +# CONFIG_REISERFS_PROC_INFO is not set +CONFIG_REISERFS_FS_XATTR=y +CONFIG_REISERFS_FS_POSIX_ACL=y +CONFIG_REISERFS_FS_SECURITY=y # CONFIG_JFS_FS is not set CONFIG_FS_POSIX_ACL=y -# CONFIG_XFS_FS is not set + +# +# XFS support +# +CONFIG_XFS_FS=m +CONFIG_XFS_EXPORT=y +# CONFIG_XFS_RT is not set +# CONFIG_XFS_QUOTA is not set +CONFIG_XFS_SECURITY=y +CONFIG_XFS_POSIX_ACL=y # CONFIG_MINIX_FS is not set # CONFIG_ROMFS_FS is not set # CONFIG_QUOTA is not set +CONFIG_DNOTIFY=y 
CONFIG_AUTOFS_FS=m # CONFIG_AUTOFS4_FS is not set @@ -1019,8 +1104,9 @@ CONFIG_AUTOFS_FS=m # CD-ROM/DVD Filesystems # CONFIG_ISO9660_FS=y -# CONFIG_JOLIET is not set -# CONFIG_ZISOFS is not set +CONFIG_JOLIET=y +CONFIG_ZISOFS=y +CONFIG_ZISOFS_FS=y CONFIG_UDF_FS=m CONFIG_UDF_NLS=y @@ -1044,6 +1130,8 @@ CONFIG_SYSFS=y CONFIG_DEVPTS_FS_XATTR=y # CONFIG_DEVPTS_FS_SECURITY is not set CONFIG_TMPFS=y +CONFIG_TMPFS_XATTR=y +CONFIG_TMPFS_SECURITY=y CONFIG_HUGETLBFS=y CONFIG_HUGETLB_PAGE=y CONFIG_RAMFS=y @@ -1053,8 +1141,8 @@ CONFIG_RAMFS=y # # CONFIG_ADFS_FS is not set # CONFIG_AFFS_FS is not set -# CONFIG_HFS_FS is not set -# CONFIG_HFSPLUS_FS is not set +CONFIG_HFS_FS=m +CONFIG_HFSPLUS_FS=m # CONFIG_BEFS_FS is not set # CONFIG_BFS_FS is not set # CONFIG_EFS_FS is not set @@ -1087,7 +1175,7 @@ CONFIG_RPCSEC_GSS_KRB5=y CONFIG_CIFS=m # CONFIG_CIFS_STATS is not set # CONFIG_CIFS_XATTR is not set -# CONFIG_CIFS_POSIX is not set +# CONFIG_CIFS_EXPERIMENTAL is not set # CONFIG_NCP_FS is not set # CONFIG_CODA_FS is not set # CONFIG_AFS_FS is not set @@ -1117,7 +1205,7 @@ CONFIG_MSDOS_PARTITION=y # CONFIG_NLS=y CONFIG_NLS_DEFAULT="iso8859-1" -# CONFIG_NLS_CODEPAGE_437 is not set +CONFIG_NLS_CODEPAGE_437=y # CONFIG_NLS_CODEPAGE_737 is not set # CONFIG_NLS_CODEPAGE_775 is not set # CONFIG_NLS_CODEPAGE_850 is not set @@ -1138,10 +1226,10 @@ CONFIG_NLS_DEFAULT="iso8859-1" # CONFIG_NLS_CODEPAGE_949 is not set # CONFIG_NLS_CODEPAGE_874 is not set # CONFIG_NLS_ISO8859_8 is not set -# CONFIG_NLS_CODEPAGE_1250 is not set -# CONFIG_NLS_CODEPAGE_1251 is not set -# CONFIG_NLS_ASCII is not set -# CONFIG_NLS_ISO8859_1 is not set +CONFIG_NLS_CODEPAGE_1250=y +CONFIG_NLS_CODEPAGE_1251=y +CONFIG_NLS_ASCII=y +CONFIG_NLS_ISO8859_1=y # CONFIG_NLS_ISO8859_2 is not set # CONFIG_NLS_ISO8859_3 is not set # CONFIG_NLS_ISO8859_4 is not set @@ -1151,10 +1239,10 @@ CONFIG_NLS_DEFAULT="iso8859-1" # CONFIG_NLS_ISO8859_9 is not set # CONFIG_NLS_ISO8859_13 is not set # CONFIG_NLS_ISO8859_14 is not set -# 
CONFIG_NLS_ISO8859_15 is not set
+CONFIG_NLS_ISO8859_15=y
 # CONFIG_NLS_KOI8_R is not set
 # CONFIG_NLS_KOI8_U is not set
-# CONFIG_NLS_UTF8 is not set
+CONFIG_NLS_UTF8=y

 #
 # Profiling support
@@ -1167,19 +1255,23 @@ CONFIG_OPROFILE=y
 #
 CONFIG_DEBUG_KERNEL=y
 CONFIG_MAGIC_SYSRQ=y
+# CONFIG_SCHEDSTATS is not set
 # CONFIG_DEBUG_SLAB is not set
 # CONFIG_DEBUG_SPINLOCK_SLEEP is not set
+# CONFIG_DEBUG_KOBJECT is not set
 # CONFIG_DEBUG_INFO is not set
+CONFIG_DEBUG_FS=y
 # CONFIG_DEBUG_STACKOVERFLOW is not set
+# CONFIG_KPROBES is not set
 # CONFIG_DEBUG_STACK_USAGE is not set
 # CONFIG_DEBUGGER is not set
 # CONFIG_PPCDBG is not set
 CONFIG_IRQSTACKS=y
-# CONFIG_SCHEDSTATS is not set

 #
 # Security options
 #
+# CONFIG_KEYS is not set
 # CONFIG_SECURITY is not set

 #
@@ -1193,7 +1285,7 @@ CONFIG_CRYPTO_MD5=y
 CONFIG_CRYPTO_SHA1=m
 CONFIG_CRYPTO_SHA256=m
 CONFIG_CRYPTO_SHA512=m
-# CONFIG_CRYPTO_WP512 is not set
+CONFIG_CRYPTO_WP512=m
 CONFIG_CRYPTO_DES=y
 CONFIG_CRYPTO_BLOWFISH=m
 CONFIG_CRYPTO_TWOFISH=m
@@ -1201,19 +1293,24 @@ CONFIG_CRYPTO_SERPENT=m
 CONFIG_CRYPTO_AES=m
 CONFIG_CRYPTO_CAST5=m
 CONFIG_CRYPTO_CAST6=m
-# CONFIG_CRYPTO_TEA is not set
+CONFIG_CRYPTO_TEA=m
 CONFIG_CRYPTO_ARC4=m
-# CONFIG_CRYPTO_KHAZAD is not set
+CONFIG_CRYPTO_KHAZAD=m
+CONFIG_CRYPTO_ANUBIS=m
 CONFIG_CRYPTO_DEFLATE=m
 CONFIG_CRYPTO_MICHAEL_MIC=m
-# CONFIG_CRYPTO_CRC32C is not set
+CONFIG_CRYPTO_CRC32C=m
 CONFIG_CRYPTO_TEST=m

 #
+# Hardware crypto devices
+#
+
+#
 # Library routines
 #
 CONFIG_CRC_CCITT=m
 CONFIG_CRC32=y
-# CONFIG_LIBCRC32C is not set
+CONFIG_LIBCRC32C=m
 CONFIG_ZLIB_INFLATE=y
 CONFIG_ZLIB_DEFLATE=m

From olh at suse.de Tue Feb 8 01:48:33 2005
From: olh at suse.de (Olaf Hering)
Date: Mon, 7 Feb 2005 15:48:33 +0100
Subject: [PATCH] update ppc64 iSeries_defconfig
In-Reply-To: <20050207144501.GC5516@suse.de>
References: <20050207144228.GB5516@suse.de> <20050207144501.GC5516@suse.de>
Message-ID: <20050207144833.GD5516@suse.de>

update iSeries defconfig, compile tested.
Signed-off-by: Olaf Hering

diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/configs/iSeries_defconfig ./arch/ppc64/configs/iSeries_defconfig
--- ../linux-2.6.11-rc3.orig/arch/ppc64/configs/iSeries_defconfig 2005-02-03 02:57:04.000000000 +0100
+++ ./arch/ppc64/configs/iSeries_defconfig 2005-02-07 14:15:15.000000000 +0100
@@ -1,9 +1,12 @@
 #
 # Automatically generated make config: don't edit
+# Linux kernel version: 2.6.11-rc3-bk3
+# Mon Feb 7 14:15:15 2005
 #
 CONFIG_64BIT=y
 CONFIG_MMU=y
 CONFIG_RWSEM_XCHGADD_ALGORITHM=y
+CONFIG_GENERIC_CALIBRATE_DELAY=y
 CONFIG_GENERIC_ISA_DMA=y
 CONFIG_HAVE_DEC_LOCK=y
 CONFIG_EARLY_PRINTK=y
@@ -16,10 +19,12 @@ CONFIG_FORCE_MAX_ZONEORDER=13
 #
 CONFIG_EXPERIMENTAL=y
 CONFIG_CLEAN_COMPILE=y
+CONFIG_LOCK_KERNEL=y

 #
 # General setup
 #
+CONFIG_LOCALVERSION=""
 CONFIG_SWAP=y
 CONFIG_SYSVIPC=y
 CONFIG_POSIX_MQUEUE=y
@@ -29,6 +34,7 @@ CONFIG_AUDIT=y
 CONFIG_AUDITSYSCALL=y
 CONFIG_LOG_BUF_SHIFT=17
 CONFIG_HOTPLUG=y
+CONFIG_KOBJECT_UEVENT=y
 CONFIG_IKCONFIG=y
 CONFIG_IKCONFIG_PROC=y
 # CONFIG_EMBEDDED is not set
@@ -37,12 +43,12 @@ CONFIG_KALLSYMS=y
 # CONFIG_KALLSYMS_EXTRA_PASS is not set
 CONFIG_FUTEX=y
 CONFIG_EPOLL=y
-CONFIG_IOSCHED_NOOP=y
-CONFIG_IOSCHED_AS=y
-CONFIG_IOSCHED_DEADLINE=y
-CONFIG_IOSCHED_CFQ=y
 # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
 CONFIG_SHMEM=y
+CONFIG_CC_ALIGN_FUNCTIONS=0
+CONFIG_CC_ALIGN_LABELS=0
+CONFIG_CC_ALIGN_LOOPS=0
+CONFIG_CC_ALIGN_JUMPS=0
 # CONFIG_TINY_SHMEM is not set

 #
@@ -52,8 +58,9 @@ CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
 # CONFIG_MODULE_FORCE_UNLOAD is not set
 CONFIG_OBSOLETE_MODPARM=y
-# CONFIG_MODVERSIONS is not set
-# CONFIG_KMOD is not set
+CONFIG_MODVERSIONS=y
+CONFIG_MODULE_SRCVERSION_ALL=y
+CONFIG_KMOD=y
 CONFIG_STOP_MACHINE=y
 CONFIG_SYSVIPC_COMPAT=y

@@ -61,15 +68,17 @@ CONFIG_SYSVIPC_COMPAT=y
 # Platform support
 #
 CONFIG_PPC_ISERIES=y
-# CONFIG_PPC_PSERIES is not set
+# CONFIG_PPC_MULTIPLATFORM is not set
 CONFIG_PPC=y
 CONFIG_PPC64=y
+CONFIG_IBMVIO=y
 # CONFIG_POWER4_ONLY is not set
 CONFIG_IOMMU_VMERGE=y
 CONFIG_SMP=y
CONFIG_NR_CPUS=32 # CONFIG_SCHED_SMT is not set # CONFIG_PREEMPT is not set +CONFIG_GENERIC_HARDIRQS=y CONFIG_MSCHUNKS=y CONFIG_LPARCFG=y @@ -82,12 +91,15 @@ CONFIG_BINFMT_ELF=y # CONFIG_BINFMT_MISC is not set CONFIG_PCI_LEGACY_PROC=y CONFIG_PCI_NAMES=y -# CONFIG_HOTPLUG_CPU is not set # -# PCMCIA/CardBus support +# PCCARD (PCMCIA/CardBus) support +# +# CONFIG_PCCARD is not set + +# +# PC-card bridges # -# CONFIG_PCMCIA is not set # # PCI Hotplug Support @@ -128,13 +140,26 @@ CONFIG_FW_LOADER=m # CONFIG_BLK_CPQ_CISS_DA is not set # CONFIG_BLK_DEV_DAC960 is not set # CONFIG_BLK_DEV_UMEM is not set +# CONFIG_BLK_DEV_COW_COMMON is not set CONFIG_BLK_DEV_LOOP=y # CONFIG_BLK_DEV_CRYPTOLOOP is not set CONFIG_BLK_DEV_NBD=m # CONFIG_BLK_DEV_SX8 is not set CONFIG_BLK_DEV_RAM=y -CONFIG_BLK_DEV_RAM_SIZE=4096 +CONFIG_BLK_DEV_RAM_COUNT=16 +CONFIG_BLK_DEV_RAM_SIZE=65536 CONFIG_BLK_DEV_INITRD=y +CONFIG_INITRAMFS_SOURCE="" +# CONFIG_CDROM_PKTCDVD is not set + +# +# IO Schedulers +# +CONFIG_IOSCHED_NOOP=y +CONFIG_IOSCHED_AS=y +CONFIG_IOSCHED_DEADLINE=y +CONFIG_IOSCHED_CFQ=y +# CONFIG_ATA_OVER_ETH is not set # # ATA/ATAPI/MFM/RLL support @@ -169,6 +194,7 @@ CONFIG_SCSI_CONSTANTS=y # CONFIG_SCSI_SPI_ATTRS=y CONFIG_SCSI_FC_ATTRS=y +# CONFIG_SCSI_ISCSI_ATTRS is not set # # SCSI low-level drivers @@ -191,6 +217,7 @@ CONFIG_SCSI_FC_ATTRS=y # CONFIG_SCSI_GDTH is not set # CONFIG_SCSI_IPS is not set CONFIG_SCSI_IBMVSCSI=m +# CONFIG_SCSI_INITIO is not set # CONFIG_SCSI_INIA100 is not set # CONFIG_SCSI_SYM53C8XX_2 is not set # CONFIG_SCSI_IPR is not set @@ -203,7 +230,6 @@ CONFIG_SCSI_QLA2XXX=y # CONFIG_SCSI_QLA2300 is not set # CONFIG_SCSI_QLA2322 is not set # CONFIG_SCSI_QLA6312 is not set -# CONFIG_SCSI_QLA6322 is not set # CONFIG_SCSI_DC395x is not set # CONFIG_SCSI_DC390T is not set # CONFIG_SCSI_DEBUG is not set @@ -220,6 +246,7 @@ CONFIG_MD_RAID10=m CONFIG_MD_RAID5=y CONFIG_MD_RAID6=m CONFIG_MD_MULTIPATH=m +CONFIG_MD_FAULTY=m CONFIG_BLK_DEV_DM=y CONFIG_DM_CRYPT=m CONFIG_DM_SNAPSHOT=m 
@@ -271,6 +298,8 @@ CONFIG_INET_AH=m CONFIG_INET_ESP=m CONFIG_INET_IPCOMP=m CONFIG_INET_TUNNEL=y +CONFIG_IP_TCPDIAG=m +# CONFIG_IP_TCPDIAG_IPV6 is not set # # IP: Virtual Server Configuration @@ -284,6 +313,9 @@ CONFIG_NETFILTER=y # IP: Netfilter Configuration # CONFIG_IP_NF_CONNTRACK=m +CONFIG_IP_NF_CT_ACCT=y +CONFIG_IP_NF_CONNTRACK_MARK=y +CONFIG_IP_NF_CT_PROTO_SCTP=m CONFIG_IP_NF_FTP=m CONFIG_IP_NF_IRC=m CONFIG_IP_NF_TFTP=m @@ -308,8 +340,17 @@ CONFIG_IP_NF_MATCH_HELPER=m CONFIG_IP_NF_MATCH_STATE=m CONFIG_IP_NF_MATCH_CONNTRACK=m CONFIG_IP_NF_MATCH_OWNER=m +CONFIG_IP_NF_MATCH_ADDRTYPE=m +CONFIG_IP_NF_MATCH_REALM=m +CONFIG_IP_NF_MATCH_SCTP=m +CONFIG_IP_NF_MATCH_COMMENT=m +CONFIG_IP_NF_MATCH_CONNMARK=m +CONFIG_IP_NF_MATCH_HASHLIMIT=m CONFIG_IP_NF_FILTER=m CONFIG_IP_NF_TARGET_REJECT=m +CONFIG_IP_NF_TARGET_LOG=m +CONFIG_IP_NF_TARGET_ULOG=m +CONFIG_IP_NF_TARGET_TCPMSS=m CONFIG_IP_NF_NAT=m CONFIG_IP_NF_NAT_NEEDED=y CONFIG_IP_NF_TARGET_MASQUERADE=m @@ -327,21 +368,13 @@ CONFIG_IP_NF_TARGET_ECN=m CONFIG_IP_NF_TARGET_DSCP=m CONFIG_IP_NF_TARGET_MARK=m CONFIG_IP_NF_TARGET_CLASSIFY=m -CONFIG_IP_NF_TARGET_LOG=m -CONFIG_IP_NF_TARGET_ULOG=m -CONFIG_IP_NF_TARGET_TCPMSS=m +CONFIG_IP_NF_TARGET_CONNMARK=m +CONFIG_IP_NF_TARGET_CLUSTERIP=m +CONFIG_IP_NF_RAW=m +CONFIG_IP_NF_TARGET_NOTRACK=m CONFIG_IP_NF_ARPTABLES=m CONFIG_IP_NF_ARPFILTER=m CONFIG_IP_NF_ARP_MANGLE=m -CONFIG_IP_NF_COMPAT_IPCHAINS=m -CONFIG_IP_NF_COMPAT_IPFWADM=m -CONFIG_IP_NF_TARGET_NOTRACK=m -CONFIG_IP_NF_RAW=m -CONFIG_IP_NF_MATCH_ADDRTYPE=m -CONFIG_IP_NF_MATCH_REALM=m -# CONFIG_IP_NF_CT_ACCT is not set -CONFIG_IP_NF_MATCH_SCTP=m -CONFIG_IP_NF_CT_PROTO_SCTP=m CONFIG_XFRM=y CONFIG_XFRM_USER=m @@ -362,7 +395,6 @@ CONFIG_LLC=y # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set -# CONFIG_NET_HW_FLOWCONTROL is not set # # QoS and/or fair queueing @@ -425,19 +457,21 @@ CONFIG_E100=y # CONFIG_EPIC100 is not set # CONFIG_SUNDANCE is not set # CONFIG_VIA_RHINE is not set -# 
CONFIG_VIA_VELOCITY is not set # # Ethernet (1000 Mbit) # -# CONFIG_ACENIC is not set +CONFIG_ACENIC=m +# CONFIG_ACENIC_OMIT_TIGON_I is not set # CONFIG_DL2K is not set -# CONFIG_E1000 is not set +CONFIG_E1000=m +# CONFIG_E1000_NAPI is not set # CONFIG_NS83820 is not set # CONFIG_HAMACHI is not set # CONFIG_YELLOWFIN is not set # CONFIG_R8169 is not set # CONFIG_SK98LIN is not set +# CONFIG_VIA_VELOCITY is not set # CONFIG_TIGON3 is not set # @@ -498,7 +532,7 @@ CONFIG_INPUT=y # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y -CONFIG_INPUT_MOUSEDEV_PSAUX=y +# CONFIG_INPUT_MOUSEDEV_PSAUX is not set CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 # CONFIG_INPUT_JOYDEV is not set @@ -511,11 +545,8 @@ CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 # # CONFIG_GAMEPORT is not set CONFIG_SOUND_GAMEPORT=y -CONFIG_SERIO=y +# CONFIG_SERIO is not set # CONFIG_SERIO_I8042 is not set -CONFIG_SERIO_SERPORT=y -# CONFIG_SERIO_CT82C710 is not set -# CONFIG_SERIO_PCIPS2 is not set # # Input Device Drivers @@ -563,7 +594,6 @@ CONFIG_LEGACY_PTY_COUNT=256 # # Ftape, the floppy tape device driver # -# CONFIG_AGP is not set # CONFIG_DRM is not set CONFIG_RAW_DRIVER=y CONFIG_MAX_RAW_DEVS=256 @@ -606,6 +636,12 @@ CONFIG_MAX_RAW_DEVS=256 # USB support # # CONFIG_USB is not set +CONFIG_USB_ARCH_HAS_HCD=y +CONFIG_USB_ARCH_HAS_OHCI=y + +# +# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' may also be needed; see USB_STORAGE Help for more information +# # # USB Gadget Support @@ -613,16 +649,26 @@ CONFIG_MAX_RAW_DEVS=256 # CONFIG_USB_GADGET is not set # +# MMC/SD Card support +# +# CONFIG_MMC is not set + +# +# InfiniBand support +# +# CONFIG_INFINIBAND is not set + +# # File systems # CONFIG_EXT2_FS=y CONFIG_EXT2_FS_XATTR=y CONFIG_EXT2_FS_POSIX_ACL=y -# CONFIG_EXT2_FS_SECURITY is not set +CONFIG_EXT2_FS_SECURITY=y CONFIG_EXT3_FS=y CONFIG_EXT3_FS_XATTR=y CONFIG_EXT3_FS_POSIX_ACL=y -# CONFIG_EXT3_FS_SECURITY is not set +CONFIG_EXT3_FS_SECURITY=y CONFIG_JBD=y # CONFIG_JBD_DEBUG 
is not set CONFIG_FS_MBCACHE=y @@ -631,20 +677,27 @@ CONFIG_REISERFS_FS=y # CONFIG_REISERFS_PROC_INFO is not set CONFIG_REISERFS_FS_XATTR=y CONFIG_REISERFS_FS_POSIX_ACL=y -# CONFIG_REISERFS_FS_SECURITY is not set +CONFIG_REISERFS_FS_SECURITY=y CONFIG_JFS_FS=m CONFIG_JFS_POSIX_ACL=y +CONFIG_JFS_SECURITY=y # CONFIG_JFS_DEBUG is not set # CONFIG_JFS_STATISTICS is not set CONFIG_FS_POSIX_ACL=y + +# +# XFS support +# CONFIG_XFS_FS=m +CONFIG_XFS_EXPORT=y # CONFIG_XFS_RT is not set # CONFIG_XFS_QUOTA is not set -# CONFIG_XFS_SECURITY is not set +CONFIG_XFS_SECURITY=y CONFIG_XFS_POSIX_ACL=y # CONFIG_MINIX_FS is not set # CONFIG_ROMFS_FS is not set # CONFIG_QUOTA is not set +CONFIG_DNOTIFY=y CONFIG_AUTOFS_FS=m # CONFIG_AUTOFS4_FS is not set @@ -652,8 +705,9 @@ CONFIG_AUTOFS_FS=m # CD-ROM/DVD Filesystems # CONFIG_ISO9660_FS=y -# CONFIG_JOLIET is not set -# CONFIG_ZISOFS is not set +CONFIG_JOLIET=y +CONFIG_ZISOFS=y +CONFIG_ZISOFS_FS=y CONFIG_UDF_FS=m CONFIG_UDF_NLS=y @@ -675,8 +729,10 @@ CONFIG_PROC_KCORE=y CONFIG_SYSFS=y # CONFIG_DEVFS_FS is not set CONFIG_DEVPTS_FS_XATTR=y -# CONFIG_DEVPTS_FS_SECURITY is not set +CONFIG_DEVPTS_FS_SECURITY=y CONFIG_TMPFS=y +CONFIG_TMPFS_XATTR=y +CONFIG_TMPFS_SECURITY=y # CONFIG_HUGETLBFS is not set # CONFIG_HUGETLB_PAGE is not set CONFIG_RAMFS=y @@ -721,6 +777,7 @@ CONFIG_CIFS=m # CONFIG_CIFS_STATS is not set CONFIG_CIFS_XATTR=y CONFIG_CIFS_POSIX=y +# CONFIG_CIFS_EXPERIMENTAL is not set # CONFIG_NCP_FS is not set # CONFIG_CODA_FS is not set # CONFIG_AFS_FS is not set @@ -795,19 +852,23 @@ CONFIG_OPROFILE=y # CONFIG_DEBUG_KERNEL=y CONFIG_MAGIC_SYSRQ=y +# CONFIG_SCHEDSTATS is not set # CONFIG_DEBUG_SLAB is not set # CONFIG_DEBUG_SPINLOCK_SLEEP is not set +# CONFIG_DEBUG_KOBJECT is not set # CONFIG_DEBUG_INFO is not set +CONFIG_DEBUG_FS=y CONFIG_DEBUG_STACKOVERFLOW=y +# CONFIG_KPROBES is not set CONFIG_DEBUG_STACK_USAGE=y # CONFIG_DEBUGGER is not set # CONFIG_PPCDBG is not set -# CONFIG_IRQSTACKS is not set -# CONFIG_SCHEDSTATS is not set 
+CONFIG_IRQSTACKS=y

 #
 # Security options
 #
+# CONFIG_KEYS is not set
 # CONFIG_SECURITY is not set

 #
@@ -821,7 +882,7 @@ CONFIG_CRYPTO_MD5=y
 CONFIG_CRYPTO_SHA1=m
 CONFIG_CRYPTO_SHA256=m
 CONFIG_CRYPTO_SHA512=m
-CONFIG_CRYPTO_WHIRLPOOL=m
+CONFIG_CRYPTO_WP512=m
 CONFIG_CRYPTO_DES=y
 CONFIG_CRYPTO_BLOWFISH=m
 CONFIG_CRYPTO_TWOFISH=m
@@ -832,12 +893,17 @@ CONFIG_CRYPTO_CAST6=m
 CONFIG_CRYPTO_TEA=m
 CONFIG_CRYPTO_ARC4=m
 CONFIG_CRYPTO_KHAZAD=m
+CONFIG_CRYPTO_ANUBIS=m
 CONFIG_CRYPTO_DEFLATE=m
 CONFIG_CRYPTO_MICHAEL_MIC=m
 CONFIG_CRYPTO_CRC32C=m
 CONFIG_CRYPTO_TEST=m

 #
+# Hardware crypto devices
+#
+
+#
 # Library routines
 #
 CONFIG_CRC_CCITT=m

From olh at suse.de Tue Feb 8 01:50:00 2005
From: olh at suse.de (Olaf Hering)
Date: Mon, 7 Feb 2005 15:50:00 +0100
Subject: [PATCH] update ppc64 maple_defconfig
In-Reply-To: <20050207144833.GD5516@suse.de>
References: <20050207144228.GB5516@suse.de> <20050207144501.GC5516@suse.de> <20050207144833.GD5516@suse.de>
Message-ID: <20050207145000.GA6591@suse.de>

update Maple defconfig, compile tested.
Signed-off-by: Olaf Hering

diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/configs/maple_defconfig ./arch/ppc64/configs/maple_defconfig
--- ../linux-2.6.11-rc3.orig/arch/ppc64/configs/maple_defconfig 2005-02-03 02:55:15.000000000 +0100
+++ ./arch/ppc64/configs/maple_defconfig 2005-02-07 14:43:12.000000000 +0100
@@ -1,11 +1,12 @@
 #
 # Automatically generated make config: don't edit
-# Linux kernel version: 2.6.9
-# Wed Oct 20 15:39:14 2004
+# Linux kernel version: 2.6.11-rc3-bk3
+# Mon Feb 7 14:43:12 2005
 #
 CONFIG_64BIT=y
 CONFIG_MMU=y
 CONFIG_RWSEM_XCHGADD_ALGORITHM=y
+CONFIG_GENERIC_CALIBRATE_DELAY=y
 CONFIG_GENERIC_ISA_DMA=y
 CONFIG_HAVE_DEC_LOCK=y
 CONFIG_EARLY_PRINTK=y
@@ -18,6 +19,7 @@ CONFIG_FORCE_MAX_ZONEORDER=13
 #
 CONFIG_EXPERIMENTAL=y
 CONFIG_CLEAN_COMPILE=y
+CONFIG_LOCK_KERNEL=y

 #
 # General setup
@@ -25,12 +27,13 @@ CONFIG_CLEAN_COMPILE=y
 CONFIG_LOCALVERSION=""
 CONFIG_SWAP=y
 CONFIG_SYSVIPC=y
-# CONFIG_POSIX_MQUEUE is not set
+CONFIG_POSIX_MQUEUE=y
 # CONFIG_BSD_PROCESS_ACCT is not set
 CONFIG_SYSCTL=y
 # CONFIG_AUDIT is not set
 CONFIG_LOG_BUF_SHIFT=17
 # CONFIG_HOTPLUG is not set
+CONFIG_KOBJECT_UEVENT=y
 CONFIG_IKCONFIG=y
 CONFIG_IKCONFIG_PROC=y
 # CONFIG_EMBEDDED is not set
@@ -39,12 +42,12 @@ CONFIG_KALLSYMS_ALL=y
 # CONFIG_KALLSYMS_EXTRA_PASS is not set
 CONFIG_FUTEX=y
 CONFIG_EPOLL=y
-CONFIG_IOSCHED_NOOP=y
-CONFIG_IOSCHED_AS=y
-CONFIG_IOSCHED_DEADLINE=y
-CONFIG_IOSCHED_CFQ=y
 # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
 CONFIG_SHMEM=y
+CONFIG_CC_ALIGN_FUNCTIONS=0
+CONFIG_CC_ALIGN_LABELS=0
+CONFIG_CC_ALIGN_LOOPS=0
+CONFIG_CC_ALIGN_JUMPS=0
 # CONFIG_TINY_SHMEM is not set

 #
@@ -52,9 +55,10 @@ CONFIG_SHMEM=y
 #
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
-CONFIG_MODULE_FORCE_UNLOAD=y
+# CONFIG_MODULE_FORCE_UNLOAD is not set
 CONFIG_OBSOLETE_MODPARM=y
-# CONFIG_MODVERSIONS is not set
+CONFIG_MODVERSIONS=y
+CONFIG_MODULE_SRCVERSION_ALL=y
 CONFIG_KMOD=y
 CONFIG_STOP_MACHINE=y
 CONFIG_SYSVIPC_COMPAT=y
@@ -77,10 +81,10 @@ CONFIG_BOOTX_TEXT=y
 CONFIG_POWER4_ONLY=y
 CONFIG_IOMMU_VMERGE=y
CONFIG_SMP=y -# CONFIG_IRQ_ALL_CPUS is not set CONFIG_NR_CPUS=2 # CONFIG_SCHED_SMT is not set # CONFIG_PREEMPT is not set +CONFIG_GENERIC_HARDIRQS=y # # General setup @@ -91,6 +95,20 @@ CONFIG_BINFMT_ELF=y # CONFIG_BINFMT_MISC is not set CONFIG_PCI_LEGACY_PROC=y CONFIG_PCI_NAMES=y + +# +# PCCARD (PCMCIA/CardBus) support +# +# CONFIG_PCCARD is not set + +# +# PC-card bridges +# + +# +# PCI Hotplug Support +# +# CONFIG_HOTPLUG_PCI is not set CONFIG_PROC_DEVICETREE=y # CONFIG_CMDLINE_BOOL is not set @@ -103,6 +121,7 @@ CONFIG_PROC_DEVICETREE=y # CONFIG_STANDALONE=y CONFIG_PREVENT_FIRMWARE_BUILD=y +# CONFIG_FW_LOADER is not set # CONFIG_DEBUG_DRIVER is not set # @@ -127,13 +146,26 @@ CONFIG_PREVENT_FIRMWARE_BUILD=y # CONFIG_BLK_CPQ_CISS_DA is not set # CONFIG_BLK_DEV_DAC960 is not set # CONFIG_BLK_DEV_UMEM is not set +# CONFIG_BLK_DEV_COW_COMMON is not set # CONFIG_BLK_DEV_LOOP is not set # CONFIG_BLK_DEV_NBD is not set # CONFIG_BLK_DEV_SX8 is not set # CONFIG_BLK_DEV_UB is not set CONFIG_BLK_DEV_RAM=y +CONFIG_BLK_DEV_RAM_COUNT=16 CONFIG_BLK_DEV_RAM_SIZE=8192 # CONFIG_BLK_DEV_INITRD is not set +CONFIG_INITRAMFS_SOURCE="" +# CONFIG_CDROM_PKTCDVD is not set + +# +# IO Schedulers +# +CONFIG_IOSCHED_NOOP=y +CONFIG_IOSCHED_AS=y +CONFIG_IOSCHED_DEADLINE=y +CONFIG_IOSCHED_CFQ=y +# CONFIG_ATA_OVER_ETH is not set # # ATA/ATAPI/MFM/RLL support @@ -151,7 +183,6 @@ CONFIG_BLK_DEV_IDECD=y # CONFIG_BLK_DEV_IDETAPE is not set # CONFIG_BLK_DEV_IDEFLOPPY is not set CONFIG_IDE_TASK_IOCTL=y -CONFIG_IDE_TASKFILE_IO=y # # IDE chipset support/bugfixes @@ -250,6 +281,8 @@ CONFIG_IP_PNP_DHCP=y # CONFIG_INET_ESP is not set # CONFIG_INET_IPCOMP is not set # CONFIG_INET_TUNNEL is not set +CONFIG_IP_TCPDIAG=y +# CONFIG_IP_TCPDIAG_IPV6 is not set # CONFIG_IPV6 is not set # CONFIG_NETFILTER is not set @@ -269,7 +302,6 @@ CONFIG_IP_PNP_DHCP=y # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set -# CONFIG_NET_HW_FLOWCONTROL is not set # # QoS and/or fair queueing 
@@ -330,7 +362,6 @@ CONFIG_AMD8111_ETH=y # CONFIG_EPIC100 is not set # CONFIG_SUNDANCE is not set # CONFIG_VIA_RHINE is not set -# CONFIG_VIA_VELOCITY is not set # # Ethernet (1000 Mbit) @@ -344,6 +375,7 @@ CONFIG_E1000=y # CONFIG_YELLOWFIN is not set # CONFIG_R8169 is not set # CONFIG_SK98LIN is not set +# CONFIG_VIA_VELOCITY is not set # CONFIG_TIGON3 is not set # @@ -461,7 +493,6 @@ CONFIG_LEGACY_PTY_COUNT=256 # # Ftape, the floppy tape device driver # -# CONFIG_AGP is not set # CONFIG_DRM is not set # CONFIG_RAW_DRIVER is not set @@ -489,6 +520,7 @@ CONFIG_I2C_AMD8111=y # CONFIG_I2C_I801 is not set # CONFIG_I2C_I810 is not set # CONFIG_I2C_ISA is not set +# CONFIG_I2C_MPC is not set # CONFIG_I2C_NFORCE2 is not set # CONFIG_I2C_PARPORT_LIGHT is not set # CONFIG_I2C_PROSAVAGE is not set @@ -497,6 +529,7 @@ CONFIG_I2C_AMD8111=y # CONFIG_I2C_SIS5595 is not set # CONFIG_I2C_SIS630 is not set # CONFIG_I2C_SIS96X is not set +# CONFIG_I2C_STUB is not set # CONFIG_I2C_VIA is not set # CONFIG_I2C_VIAPRO is not set # CONFIG_I2C_VOODOO3 is not set @@ -508,20 +541,25 @@ CONFIG_I2C_AMD8111=y # CONFIG_I2C_SENSOR is not set # CONFIG_SENSORS_ADM1021 is not set # CONFIG_SENSORS_ADM1025 is not set +# CONFIG_SENSORS_ADM1026 is not set # CONFIG_SENSORS_ADM1031 is not set # CONFIG_SENSORS_ASB100 is not set # CONFIG_SENSORS_DS1621 is not set # CONFIG_SENSORS_FSCHER is not set # CONFIG_SENSORS_GL518SM is not set # CONFIG_SENSORS_IT87 is not set +# CONFIG_SENSORS_LM63 is not set # CONFIG_SENSORS_LM75 is not set # CONFIG_SENSORS_LM77 is not set # CONFIG_SENSORS_LM78 is not set # CONFIG_SENSORS_LM80 is not set # CONFIG_SENSORS_LM83 is not set # CONFIG_SENSORS_LM85 is not set +# CONFIG_SENSORS_LM87 is not set # CONFIG_SENSORS_LM90 is not set # CONFIG_SENSORS_MAX1619 is not set +# CONFIG_SENSORS_PC87360 is not set +# CONFIG_SENSORS_SMSC47B397 is not set # CONFIG_SENSORS_SMSC47M1 is not set # CONFIG_SENSORS_VIA686A is not set # CONFIG_SENSORS_W83781D is not set @@ -588,6 +626,8 @@ 
CONFIG_USB_DEVICEFS=y # CONFIG_USB_BANDWIDTH is not set # CONFIG_USB_DYNAMIC_MINORS is not set # CONFIG_USB_OTG is not set +CONFIG_USB_ARCH_HAS_HCD=y +CONFIG_USB_ARCH_HAS_OHCI=y # # USB Host Controller Drivers @@ -597,6 +637,7 @@ CONFIG_USB_EHCI_SPLIT_ISO=y CONFIG_USB_EHCI_ROOT_HUB_TT=y CONFIG_USB_OHCI_HCD=y CONFIG_USB_UHCI_HCD=y +# CONFIG_USB_SL811_HCD is not set # # USB Device Class drivers @@ -604,10 +645,14 @@ CONFIG_USB_UHCI_HCD=y # CONFIG_USB_BLUETOOTH_TTY is not set # CONFIG_USB_ACM is not set # CONFIG_USB_PRINTER is not set + +# +# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' may also be needed; see USB_STORAGE Help for more information +# # CONFIG_USB_STORAGE is not set # -# USB Human Interface Devices (HID) +# USB Input Devices # CONFIG_USB_HID=y CONFIG_USB_HIDINPUT=y @@ -637,7 +682,7 @@ CONFIG_USB_HIDINPUT=y # # -# USB Network adaptors +# USB Network Adapters # # CONFIG_USB_CATC is not set # CONFIG_USB_KAWETH is not set @@ -657,6 +702,7 @@ CONFIG_USB_SERIAL=y CONFIG_USB_SERIAL_GENERIC=y # CONFIG_USB_SERIAL_BELKIN is not set # CONFIG_USB_SERIAL_DIGI_ACCELEPORT is not set +CONFIG_USB_SERIAL_CYPRESS_M8=m # CONFIG_USB_SERIAL_EMPEG is not set # CONFIG_USB_SERIAL_FTDI_SIO is not set # CONFIG_USB_SERIAL_VISOR is not set @@ -664,6 +710,8 @@ CONFIG_USB_SERIAL_GENERIC=y # CONFIG_USB_SERIAL_IR is not set # CONFIG_USB_SERIAL_EDGEPORT is not set # CONFIG_USB_SERIAL_EDGEPORT_TI is not set +CONFIG_USB_SERIAL_GARMIN=m +CONFIG_USB_SERIAL_IPW=m # CONFIG_USB_SERIAL_KEYSPAN_PDA is not set CONFIG_USB_SERIAL_KEYSPAN=y CONFIG_USB_SERIAL_KEYSPAN_MPR=y @@ -683,6 +731,7 @@ CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y # CONFIG_USB_SERIAL_MCT_U232 is not set # CONFIG_USB_SERIAL_PL2303 is not set # CONFIG_USB_SERIAL_SAFE is not set +CONFIG_USB_SERIAL_TI=m # CONFIG_USB_SERIAL_CYBERJACK is not set # CONFIG_USB_SERIAL_XIRCOM is not set # CONFIG_USB_SERIAL_OMNINET is not set @@ -693,22 +742,37 @@ CONFIG_USB_EZUSB=y # # CONFIG_USB_EMI62 is not set # CONFIG_USB_EMI26 is not set -# 
CONFIG_USB_TIGL is not set # CONFIG_USB_AUERSWALD is not set # CONFIG_USB_RIO500 is not set # CONFIG_USB_LEGOTOWER is not set # CONFIG_USB_LCD is not set # CONFIG_USB_LED is not set # CONFIG_USB_CYTHERM is not set +# CONFIG_USB_PHIDGETKIT is not set # CONFIG_USB_PHIDGETSERVO is not set +# CONFIG_USB_IDMOUSE is not set # CONFIG_USB_TEST is not set # +# USB ATM/DSL drivers +# + +# # USB Gadget Support # # CONFIG_USB_GADGET is not set # +# MMC/SD Card support +# +# CONFIG_MMC is not set + +# +# InfiniBand support +# +# CONFIG_INFINIBAND is not set + +# # File systems # CONFIG_EXT2_FS=y @@ -719,10 +783,15 @@ CONFIG_JBD=y # CONFIG_JBD_DEBUG is not set # CONFIG_REISERFS_FS is not set # CONFIG_JFS_FS is not set + +# +# XFS support +# # CONFIG_XFS_FS is not set # CONFIG_MINIX_FS is not set # CONFIG_ROMFS_FS is not set # CONFIG_QUOTA is not set +CONFIG_DNOTIFY=y # CONFIG_AUTOFS_FS is not set # CONFIG_AUTOFS4_FS is not set @@ -752,8 +821,10 @@ CONFIG_SYSFS=y CONFIG_DEVPTS_FS_XATTR=y # CONFIG_DEVPTS_FS_SECURITY is not set CONFIG_TMPFS=y -# CONFIG_HUGETLBFS is not set -# CONFIG_HUGETLB_PAGE is not set +CONFIG_TMPFS_XATTR=y +CONFIG_TMPFS_SECURITY=y +CONFIG_HUGETLBFS=y +CONFIG_HUGETLB_PAGE=y CONFIG_RAMFS=y # @@ -766,7 +837,7 @@ CONFIG_RAMFS=y # CONFIG_BEFS_FS is not set # CONFIG_BFS_FS is not set # CONFIG_EFS_FS is not set -# CONFIG_CRAMFS is not set +CONFIG_CRAMFS=y # CONFIG_VXFS_FS is not set # CONFIG_HPFS_FS is not set # CONFIG_QNX4FS_FS is not set @@ -784,7 +855,6 @@ CONFIG_NFS_V4=y CONFIG_ROOT_NFS=y CONFIG_LOCKD=y CONFIG_LOCKD_V4=y -# CONFIG_EXPORTFS is not set CONFIG_SUNRPC=y CONFIG_SUNRPC_GSS=y CONFIG_RPCSEC_GSS_KRB5=y @@ -869,21 +939,25 @@ CONFIG_NLS_UTF8=y # CONFIG_DEBUG_KERNEL=y CONFIG_MAGIC_SYSRQ=y +# CONFIG_SCHEDSTATS is not set CONFIG_DEBUG_SLAB=y CONFIG_DEBUG_SPINLOCK_SLEEP=y +# CONFIG_DEBUG_KOBJECT is not set # CONFIG_DEBUG_INFO is not set +CONFIG_DEBUG_FS=y CONFIG_DEBUG_STACKOVERFLOW=y +# CONFIG_KPROBES is not set CONFIG_DEBUG_STACK_USAGE=y CONFIG_DEBUGGER=y 
 CONFIG_XMON=y
 CONFIG_XMON_DEFAULT=y
 # CONFIG_PPCDBG is not set
 # CONFIG_IRQSTACKS is not set
-# CONFIG_SCHEDSTATS is not set

 #
 # Security options
 #
+# CONFIG_KEYS is not set
 # CONFIG_SECURITY is not set

 #
@@ -908,14 +982,20 @@ CONFIG_CRYPTO_DES=y
 # CONFIG_CRYPTO_TEA is not set
 # CONFIG_CRYPTO_ARC4 is not set
 # CONFIG_CRYPTO_KHAZAD is not set
+# CONFIG_CRYPTO_ANUBIS is not set
 # CONFIG_CRYPTO_DEFLATE is not set
 # CONFIG_CRYPTO_MICHAEL_MIC is not set
 # CONFIG_CRYPTO_CRC32C is not set
 # CONFIG_CRYPTO_TEST is not set

 #
+# Hardware crypto devices
+#
+
+#
 # Library routines
 #
 CONFIG_CRC_CCITT=y
 CONFIG_CRC32=y
 # CONFIG_LIBCRC32C is not set
+CONFIG_ZLIB_INFLATE=y

From olh at suse.de Tue Feb 8 01:52:07 2005
From: olh at suse.de (Olaf Hering)
Date: Mon, 7 Feb 2005 15:52:07 +0100
Subject: [PATCH] update ppc64 pSeries_defconfig
In-Reply-To: <20050207145000.GA6591@suse.de>
References: <20050207144228.GB5516@suse.de> <20050207144501.GC5516@suse.de> <20050207144833.GD5516@suse.de> <20050207145000.GA6591@suse.de>
Message-ID: <20050207145207.GA6596@suse.de>

update pSeries defconfig, compile tested.
Signed-off-by: Olaf Hering

diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/configs/pSeries_defconfig ./arch/ppc64/configs/pSeries_defconfig
--- ../linux-2.6.11-rc3.orig/arch/ppc64/configs/pSeries_defconfig 2005-02-03 02:57:17.000000000 +0100
+++ ./arch/ppc64/configs/pSeries_defconfig 2005-02-07 14:08:06.000000000 +0100
@@ -1,11 +1,12 @@
 #
 # Automatically generated make config: don't edit
-# Linux kernel version: 2.6.9-rc2
-# Thu Sep 23 16:45:05 2004
+# Linux kernel version: 2.6.11-rc3-bk3
+# Mon Feb 7 14:08:06 2005
 #
 CONFIG_64BIT=y
 CONFIG_MMU=y
 CONFIG_RWSEM_XCHGADD_ALGORITHM=y
+CONFIG_GENERIC_CALIBRATE_DELAY=y
 CONFIG_GENERIC_ISA_DMA=y
 CONFIG_HAVE_DEC_LOCK=y
 CONFIG_EARLY_PRINTK=y
@@ -18,6 +19,7 @@ CONFIG_FORCE_MAX_ZONEORDER=13
 #
 CONFIG_EXPERIMENTAL=y
 CONFIG_CLEAN_COMPILE=y
+CONFIG_LOCK_KERNEL=y

 #
 # General setup
@@ -32,6 +34,7 @@ CONFIG_AUDIT=y
 CONFIG_AUDITSYSCALL=y
 CONFIG_LOG_BUF_SHIFT=17
 CONFIG_HOTPLUG=y
+CONFIG_KOBJECT_UEVENT=y
 CONFIG_IKCONFIG=y
 CONFIG_IKCONFIG_PROC=y
 # CONFIG_EMBEDDED is not set
@@ -40,12 +43,12 @@ CONFIG_KALLSYMS_ALL=y
 # CONFIG_KALLSYMS_EXTRA_PASS is not set
 CONFIG_FUTEX=y
 CONFIG_EPOLL=y
-CONFIG_IOSCHED_NOOP=y
-CONFIG_IOSCHED_AS=y
-CONFIG_IOSCHED_DEADLINE=y
-CONFIG_IOSCHED_CFQ=y
 # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
 CONFIG_SHMEM=y
+CONFIG_CC_ALIGN_FUNCTIONS=0
+CONFIG_CC_ALIGN_LABELS=0
+CONFIG_CC_ALIGN_LOOPS=0
+CONFIG_CC_ALIGN_JUMPS=0
 # CONFIG_TINY_SHMEM is not set

 #
@@ -55,8 +58,9 @@ CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
 # CONFIG_MODULE_FORCE_UNLOAD is not set
 CONFIG_OBSOLETE_MODPARM=y
-# CONFIG_MODVERSIONS is not set
-# CONFIG_KMOD is not set
+CONFIG_MODVERSIONS=y
+CONFIG_MODULE_SRCVERSION_ALL=y
+CONFIG_KMOD=y
 CONFIG_STOP_MACHINE=y
 CONFIG_SYSVIPC_COMPAT=y

@@ -67,22 +71,26 @@ CONFIG_SYSVIPC_COMPAT=y
 CONFIG_PPC_MULTIPLATFORM=y
 CONFIG_PPC_PSERIES=y
 # CONFIG_PPC_PMAC is not set
+# CONFIG_PPC_MAPLE is not set
 CONFIG_PPC=y
 CONFIG_PPC64=y
 CONFIG_PPC_OF=y
 CONFIG_ALTIVEC=y
 CONFIG_PPC_SPLPAR=y
+CONFIG_IBMVIO=y
+# CONFIG_U3_DART is not set
 #
CONFIG_BOOTX_TEXT is not set # CONFIG_POWER4_ONLY is not set CONFIG_IOMMU_VMERGE=y CONFIG_SMP=y -CONFIG_IRQ_ALL_CPUS=y CONFIG_NR_CPUS=128 # CONFIG_HMT is not set CONFIG_DISCONTIGMEM=y CONFIG_NUMA=y CONFIG_SCHED_SMT=y # CONFIG_PREEMPT is not set +CONFIG_EEH=y +CONFIG_GENERIC_HARDIRQS=y CONFIG_PPC_RTAS=y CONFIG_RTAS_FLASH=m CONFIG_SCANLOG=m @@ -100,9 +108,13 @@ CONFIG_PCI_NAMES=y CONFIG_HOTPLUG_CPU=y # -# PCMCIA/CardBus support +# PCCARD (PCMCIA/CardBus) support +# +# CONFIG_PCCARD is not set + +# +# PC-card bridges # -# CONFIG_PCMCIA is not set # # PCI Hotplug Support @@ -110,7 +122,6 @@ CONFIG_HOTPLUG_CPU=y CONFIG_HOTPLUG_PCI=m # CONFIG_HOTPLUG_PCI_FAKE is not set # CONFIG_HOTPLUG_PCI_CPCI is not set -# CONFIG_HOTPLUG_PCI_PCIE is not set # CONFIG_HOTPLUG_PCI_SHPC is not set CONFIG_HOTPLUG_PCI_RPA=m CONFIG_HOTPLUG_PCI_RPA_DLPAR=m @@ -137,7 +148,14 @@ CONFIG_FW_LOADER=y # # Parallel port support # -# CONFIG_PARPORT is not set +CONFIG_PARPORT=m +CONFIG_PARPORT_PC=m +CONFIG_PARPORT_PC_CML1=m +# CONFIG_PARPORT_SERIAL is not set +# CONFIG_PARPORT_PC_FIFO is not set +# CONFIG_PARPORT_PC_SUPERIO is not set +# CONFIG_PARPORT_OTHER is not set +# CONFIG_PARPORT_1284 is not set # # Plug and Play support @@ -147,18 +165,32 @@ CONFIG_FW_LOADER=y # Block devices # CONFIG_BLK_DEV_FD=m +# CONFIG_PARIDE is not set # CONFIG_BLK_CPQ_DA is not set # CONFIG_BLK_CPQ_CISS_DA is not set # CONFIG_BLK_DEV_DAC960 is not set # CONFIG_BLK_DEV_UMEM is not set +# CONFIG_BLK_DEV_COW_COMMON is not set CONFIG_BLK_DEV_LOOP=y # CONFIG_BLK_DEV_CRYPTOLOOP is not set CONFIG_BLK_DEV_NBD=m # CONFIG_BLK_DEV_SX8 is not set # CONFIG_BLK_DEV_UB is not set CONFIG_BLK_DEV_RAM=y -CONFIG_BLK_DEV_RAM_SIZE=4096 +CONFIG_BLK_DEV_RAM_COUNT=16 +CONFIG_BLK_DEV_RAM_SIZE=65536 CONFIG_BLK_DEV_INITRD=y +CONFIG_INITRAMFS_SOURCE="" +# CONFIG_CDROM_PKTCDVD is not set + +# +# IO Schedulers +# +CONFIG_IOSCHED_NOOP=y +CONFIG_IOSCHED_AS=y +CONFIG_IOSCHED_DEADLINE=y +CONFIG_IOSCHED_CFQ=y +# CONFIG_ATA_OVER_ETH is not set # # 
ATA/ATAPI/MFM/RLL support @@ -177,7 +209,6 @@ CONFIG_BLK_DEV_IDECD=y # CONFIG_BLK_DEV_IDEFLOPPY is not set # CONFIG_BLK_DEV_IDESCSI is not set # CONFIG_IDE_TASK_IOCTL is not set -# CONFIG_IDE_TASKFILE_IO is not set # # IDE chipset support/bugfixes @@ -247,6 +278,7 @@ CONFIG_SCSI_CONSTANTS=y # CONFIG_SCSI_SPI_ATTRS=y CONFIG_SCSI_FC_ATTRS=y +CONFIG_SCSI_ISCSI_ATTRS=m # # SCSI low-level drivers @@ -269,15 +301,18 @@ CONFIG_SCSI_FC_ATTRS=y # CONFIG_SCSI_GDTH is not set # CONFIG_SCSI_IPS is not set CONFIG_SCSI_IBMVSCSI=y +# CONFIG_SCSI_INITIO is not set # CONFIG_SCSI_INIA100 is not set +# CONFIG_SCSI_PPA is not set +# CONFIG_SCSI_IMM is not set CONFIG_SCSI_SYM53C8XX_2=y CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=0 CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16 CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64 # CONFIG_SCSI_SYM53C8XX_IOMAPPED is not set CONFIG_SCSI_IPR=y -# CONFIG_SCSI_IPR_TRACE is not set -# CONFIG_SCSI_IPR_DUMP is not set +CONFIG_SCSI_IPR_TRACE=y +CONFIG_SCSI_IPR_DUMP=y # CONFIG_SCSI_QLOGIC_ISP is not set # CONFIG_SCSI_QLOGIC_FC is not set # CONFIG_SCSI_QLOGIC_1280 is not set @@ -287,7 +322,6 @@ CONFIG_SCSI_QLA22XX=m CONFIG_SCSI_QLA2300=m CONFIG_SCSI_QLA2322=m CONFIG_SCSI_QLA6312=m -CONFIG_SCSI_QLA6322=m # CONFIG_SCSI_DC395x is not set # CONFIG_SCSI_DC390T is not set # CONFIG_SCSI_DEBUG is not set @@ -304,6 +338,7 @@ CONFIG_MD_RAID10=m CONFIG_MD_RAID5=y CONFIG_MD_RAID6=m CONFIG_MD_MULTIPATH=m +CONFIG_MD_FAULTY=m CONFIG_BLK_DEV_DM=y CONFIG_DM_CRYPT=m CONFIG_DM_SNAPSHOT=m @@ -355,6 +390,8 @@ CONFIG_INET_AH=m CONFIG_INET_ESP=m CONFIG_INET_IPCOMP=m CONFIG_INET_TUNNEL=y +CONFIG_IP_TCPDIAG=m +# CONFIG_IP_TCPDIAG_IPV6 is not set # # IP: Virtual Server Configuration @@ -368,7 +405,8 @@ CONFIG_NETFILTER=y # IP: Netfilter Configuration # CONFIG_IP_NF_CONNTRACK=m -# CONFIG_IP_NF_CT_ACCT is not set +CONFIG_IP_NF_CT_ACCT=y +CONFIG_IP_NF_CONNTRACK_MARK=y CONFIG_IP_NF_CT_PROTO_SCTP=m CONFIG_IP_NF_FTP=m CONFIG_IP_NF_IRC=m @@ -397,6 +435,9 @@ CONFIG_IP_NF_MATCH_OWNER=m 
CONFIG_IP_NF_MATCH_ADDRTYPE=m CONFIG_IP_NF_MATCH_REALM=m CONFIG_IP_NF_MATCH_SCTP=m +CONFIG_IP_NF_MATCH_COMMENT=m +CONFIG_IP_NF_MATCH_CONNMARK=m +CONFIG_IP_NF_MATCH_HASHLIMIT=m CONFIG_IP_NF_FILTER=m CONFIG_IP_NF_TARGET_REJECT=m CONFIG_IP_NF_TARGET_LOG=m @@ -419,13 +460,13 @@ CONFIG_IP_NF_TARGET_ECN=m CONFIG_IP_NF_TARGET_DSCP=m CONFIG_IP_NF_TARGET_MARK=m CONFIG_IP_NF_TARGET_CLASSIFY=m +CONFIG_IP_NF_TARGET_CONNMARK=m +CONFIG_IP_NF_TARGET_CLUSTERIP=m CONFIG_IP_NF_RAW=m CONFIG_IP_NF_TARGET_NOTRACK=m CONFIG_IP_NF_ARPTABLES=m CONFIG_IP_NF_ARPFILTER=m CONFIG_IP_NF_ARP_MANGLE=m -CONFIG_IP_NF_COMPAT_IPCHAINS=m -CONFIG_IP_NF_COMPAT_IPFWADM=m CONFIG_XFRM=y CONFIG_XFRM_USER=m @@ -446,7 +487,6 @@ CONFIG_LLC=y # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set -# CONFIG_NET_HW_FLOWCONTROL is not set # # QoS and/or fair queueing @@ -512,7 +552,6 @@ CONFIG_E100=y # CONFIG_EPIC100 is not set # CONFIG_SUNDANCE is not set # CONFIG_VIA_RHINE is not set -# CONFIG_VIA_VELOCITY is not set # # Ethernet (1000 Mbit) @@ -527,6 +566,7 @@ CONFIG_E1000=y # CONFIG_YELLOWFIN is not set # CONFIG_R8169 is not set # CONFIG_SK98LIN is not set +# CONFIG_VIA_VELOCITY is not set CONFIG_TIGON3=y # @@ -536,6 +576,7 @@ CONFIG_IXGB=m # CONFIG_IXGB_NAPI is not set CONFIG_S2IO=m # CONFIG_S2IO_NAPI is not set +# CONFIG_2BUFF_MODE is not set # # Token Ring devices @@ -556,6 +597,7 @@ CONFIG_IBMOL=y # CONFIG_WAN is not set # CONFIG_FDDI is not set # CONFIG_HIPPI is not set +# CONFIG_PLIP is not set CONFIG_PPP=m # CONFIG_PPP_MULTILINK is not set # CONFIG_PPP_FILTER is not set @@ -588,7 +630,7 @@ CONFIG_INPUT=y # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y -CONFIG_INPUT_MOUSEDEV_PSAUX=y +# CONFIG_INPUT_MOUSEDEV_PSAUX is not set CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 # CONFIG_INPUT_JOYDEV is not set @@ -605,7 +647,9 @@ CONFIG_SERIO=y CONFIG_SERIO_I8042=y # CONFIG_SERIO_SERPORT is not set # CONFIG_SERIO_CT82C710 is not set +# CONFIG_SERIO_PARKBD 
is not set # CONFIG_SERIO_PCIPS2 is not set +CONFIG_SERIO_LIBPS2=y # CONFIG_SERIO_RAW is not set # @@ -624,7 +668,7 @@ CONFIG_MOUSE_PS2=y # CONFIG_INPUT_JOYSTICK is not set # CONFIG_INPUT_TOUCHSCREEN is not set CONFIG_INPUT_MISC=y -# CONFIG_INPUT_PCSPKR is not set +CONFIG_INPUT_PCSPKR=m # CONFIG_INPUT_UINPUT is not set # @@ -653,6 +697,9 @@ CONFIG_SERIAL_ICOM=m CONFIG_UNIX98_PTYS=y CONFIG_LEGACY_PTYS=y CONFIG_LEGACY_PTY_COUNT=256 +# CONFIG_PRINTER is not set +# CONFIG_PPDEV is not set +# CONFIG_TIPAR is not set CONFIG_HVC_CONSOLE=y CONFIG_HVCS=m @@ -674,7 +721,6 @@ CONFIG_HVCS=m # # Ftape, the floppy tape device driver # -# CONFIG_AGP is not set # CONFIG_DRM is not set CONFIG_RAW_DRIVER=y CONFIG_MAX_RAW_DEVS=1024 @@ -703,7 +749,9 @@ CONFIG_I2C_ALGOBIT=y # CONFIG_I2C_I801 is not set # CONFIG_I2C_I810 is not set # CONFIG_I2C_ISA is not set +# CONFIG_I2C_MPC is not set # CONFIG_I2C_NFORCE2 is not set +# CONFIG_I2C_PARPORT is not set # CONFIG_I2C_PARPORT_LIGHT is not set # CONFIG_I2C_PROSAVAGE is not set # CONFIG_I2C_SAVAGE4 is not set @@ -711,6 +759,7 @@ CONFIG_I2C_ALGOBIT=y # CONFIG_I2C_SIS5595 is not set # CONFIG_I2C_SIS630 is not set # CONFIG_I2C_SIS96X is not set +# CONFIG_I2C_STUB is not set # CONFIG_I2C_VIA is not set # CONFIG_I2C_VIAPRO is not set # CONFIG_I2C_VOODOO3 is not set @@ -722,20 +771,25 @@ CONFIG_I2C_ALGOBIT=y # CONFIG_I2C_SENSOR is not set # CONFIG_SENSORS_ADM1021 is not set # CONFIG_SENSORS_ADM1025 is not set +# CONFIG_SENSORS_ADM1026 is not set # CONFIG_SENSORS_ADM1031 is not set # CONFIG_SENSORS_ASB100 is not set # CONFIG_SENSORS_DS1621 is not set # CONFIG_SENSORS_FSCHER is not set # CONFIG_SENSORS_GL518SM is not set # CONFIG_SENSORS_IT87 is not set +# CONFIG_SENSORS_LM63 is not set # CONFIG_SENSORS_LM75 is not set # CONFIG_SENSORS_LM77 is not set # CONFIG_SENSORS_LM78 is not set # CONFIG_SENSORS_LM80 is not set # CONFIG_SENSORS_LM83 is not set # CONFIG_SENSORS_LM85 is not set +# CONFIG_SENSORS_LM87 is not set # CONFIG_SENSORS_LM90 is not set # 
CONFIG_SENSORS_MAX1619 is not set +# CONFIG_SENSORS_PC87360 is not set +# CONFIG_SENSORS_SMSC47B397 is not set # CONFIG_SENSORS_SMSC47M1 is not set # CONFIG_SENSORS_VIA686A is not set # CONFIG_SENSORS_W83781D is not set @@ -778,6 +832,7 @@ CONFIG_I2C_ALGOBIT=y # CONFIG_FB=y CONFIG_FB_MODE_HELPERS=y +# CONFIG_FB_TILEBLITTING is not set # CONFIG_FB_CIRRUS is not set # CONFIG_FB_PM2 is not set # CONFIG_FB_CYBER2000 is not set @@ -790,8 +845,7 @@ CONFIG_FB_OF=y CONFIG_FB_MATROX=y CONFIG_FB_MATROX_MILLENIUM=y CONFIG_FB_MATROX_MYSTIQUE=y -CONFIG_FB_MATROX_G450=y -CONFIG_FB_MATROX_G100=y +CONFIG_FB_MATROX_G=y # CONFIG_FB_MATROX_I2C is not set CONFIG_FB_MATROX_MULTIHEAD=y # CONFIG_FB_RADEON_OLD is not set @@ -800,6 +854,7 @@ CONFIG_FB_RADEON_I2C=y # CONFIG_FB_RADEON_DEBUG is not set # CONFIG_FB_ATY128 is not set # CONFIG_FB_ATY is not set +# CONFIG_FB_SAVAGE is not set # CONFIG_FB_SIS is not set # CONFIG_FB_NEOMAGIC is not set # CONFIG_FB_KYRO is not set @@ -825,6 +880,11 @@ CONFIG_LOGO=y CONFIG_LOGO_LINUX_MONO=y CONFIG_LOGO_LINUX_VGA16=y CONFIG_LOGO_LINUX_CLUT224=y +CONFIG_BACKLIGHT_LCD_SUPPORT=y +CONFIG_BACKLIGHT_CLASS_DEVICE=m +CONFIG_BACKLIGHT_DEVICE=y +CONFIG_LCD_CLASS_DEVICE=m +CONFIG_LCD_DEVICE=y # # Sound @@ -844,6 +904,8 @@ CONFIG_USB_DEVICEFS=y # CONFIG_USB_BANDWIDTH is not set # CONFIG_USB_DYNAMIC_MINORS is not set # CONFIG_USB_OTG is not set +CONFIG_USB_ARCH_HAS_HCD=y +CONFIG_USB_ARCH_HAS_OHCI=y # # USB Host Controller Drivers @@ -853,6 +915,7 @@ CONFIG_USB_EHCI_HCD=y # CONFIG_USB_EHCI_ROOT_HUB_TT is not set CONFIG_USB_OHCI_HCD=y # CONFIG_USB_UHCI_HCD is not set +# CONFIG_USB_SL811_HCD is not set # # USB Device Class drivers @@ -860,6 +923,10 @@ CONFIG_USB_OHCI_HCD=y # CONFIG_USB_BLUETOOTH_TTY is not set # CONFIG_USB_ACM is not set # CONFIG_USB_PRINTER is not set + +# +# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' may also be needed; see USB_STORAGE Help for more information +# CONFIG_USB_STORAGE=y # CONFIG_USB_STORAGE_DEBUG is not set # 
CONFIG_USB_STORAGE_RW_DETECT is not set @@ -873,7 +940,7 @@ CONFIG_USB_STORAGE=y # CONFIG_USB_STORAGE_JUMPSHOT is not set # -# USB Human Interface Devices (HID) +# USB Input Devices # CONFIG_USB_HID=y CONFIG_USB_HIDINPUT=y @@ -893,7 +960,6 @@ CONFIG_USB_HIDDEV=y # # CONFIG_USB_MDC800 is not set # CONFIG_USB_MICROTEK is not set -# CONFIG_USB_HPUSBSCSI is not set # # USB Multimedia devices @@ -905,7 +971,7 @@ CONFIG_USB_HIDDEV=y # # -# USB Network adaptors +# USB Network Adapters # # CONFIG_USB_CATC is not set # CONFIG_USB_KAWETH is not set @@ -916,6 +982,7 @@ CONFIG_USB_HIDDEV=y # # USB port drivers # +# CONFIG_USB_USS720 is not set # # USB Serial Converter support @@ -927,32 +994,51 @@ CONFIG_USB_HIDDEV=y # # CONFIG_USB_EMI62 is not set # CONFIG_USB_EMI26 is not set -# CONFIG_USB_TIGL is not set # CONFIG_USB_AUERSWALD is not set # CONFIG_USB_RIO500 is not set # CONFIG_USB_LEGOTOWER is not set # CONFIG_USB_LCD is not set # CONFIG_USB_LED is not set # CONFIG_USB_CYTHERM is not set +# CONFIG_USB_PHIDGETKIT is not set # CONFIG_USB_PHIDGETSERVO is not set +# CONFIG_USB_IDMOUSE is not set # CONFIG_USB_TEST is not set # +# USB ATM/DSL drivers +# + +# # USB Gadget Support # # CONFIG_USB_GADGET is not set # +# MMC/SD Card support +# +# CONFIG_MMC is not set + +# +# InfiniBand support +# +CONFIG_INFINIBAND=m +CONFIG_INFINIBAND_MTHCA=m +# CONFIG_INFINIBAND_MTHCA_DEBUG is not set +CONFIG_INFINIBAND_IPOIB=m +# CONFIG_INFINIBAND_IPOIB_DEBUG is not set + +# # File systems # CONFIG_EXT2_FS=y CONFIG_EXT2_FS_XATTR=y CONFIG_EXT2_FS_POSIX_ACL=y -# CONFIG_EXT2_FS_SECURITY is not set +CONFIG_EXT2_FS_SECURITY=y CONFIG_EXT3_FS=y CONFIG_EXT3_FS_XATTR=y CONFIG_EXT3_FS_POSIX_ACL=y -# CONFIG_EXT3_FS_SECURITY is not set +CONFIG_EXT3_FS_SECURITY=y CONFIG_JBD=y # CONFIG_JBD_DEBUG is not set CONFIG_FS_MBCACHE=y @@ -961,20 +1047,27 @@ CONFIG_REISERFS_FS=y # CONFIG_REISERFS_PROC_INFO is not set CONFIG_REISERFS_FS_XATTR=y CONFIG_REISERFS_FS_POSIX_ACL=y -# CONFIG_REISERFS_FS_SECURITY is not set 
+CONFIG_REISERFS_FS_SECURITY=y CONFIG_JFS_FS=m CONFIG_JFS_POSIX_ACL=y +CONFIG_JFS_SECURITY=y # CONFIG_JFS_DEBUG is not set # CONFIG_JFS_STATISTICS is not set CONFIG_FS_POSIX_ACL=y + +# +# XFS support +# CONFIG_XFS_FS=m +CONFIG_XFS_EXPORT=y # CONFIG_XFS_RT is not set # CONFIG_XFS_QUOTA is not set -# CONFIG_XFS_SECURITY is not set +CONFIG_XFS_SECURITY=y CONFIG_XFS_POSIX_ACL=y # CONFIG_MINIX_FS is not set # CONFIG_ROMFS_FS is not set # CONFIG_QUOTA is not set +CONFIG_DNOTIFY=y CONFIG_AUTOFS_FS=m # CONFIG_AUTOFS4_FS is not set @@ -982,8 +1075,9 @@ CONFIG_AUTOFS_FS=m # CD-ROM/DVD Filesystems # CONFIG_ISO9660_FS=y -# CONFIG_JOLIET is not set -# CONFIG_ZISOFS is not set +CONFIG_JOLIET=y +CONFIG_ZISOFS=y +CONFIG_ZISOFS_FS=y CONFIG_UDF_FS=m CONFIG_UDF_NLS=y @@ -1005,8 +1099,10 @@ CONFIG_PROC_KCORE=y CONFIG_SYSFS=y # CONFIG_DEVFS_FS is not set CONFIG_DEVPTS_FS_XATTR=y -# CONFIG_DEVPTS_FS_SECURITY is not set +CONFIG_DEVPTS_FS_SECURITY=y CONFIG_TMPFS=y +CONFIG_TMPFS_XATTR=y +CONFIG_TMPFS_SECURITY=y CONFIG_HUGETLBFS=y CONFIG_HUGETLB_PAGE=y CONFIG_RAMFS=y @@ -1035,13 +1131,13 @@ CONFIG_NFS_FS=y CONFIG_NFS_V3=y CONFIG_NFS_V4=y # CONFIG_NFS_DIRECTIO is not set -CONFIG_NFSD=m +CONFIG_NFSD=y CONFIG_NFSD_V3=y CONFIG_NFSD_V4=y CONFIG_NFSD_TCP=y CONFIG_LOCKD=y CONFIG_LOCKD_V4=y -CONFIG_EXPORTFS=m +CONFIG_EXPORTFS=y CONFIG_SUNRPC=y CONFIG_SUNRPC_GSS=y CONFIG_RPCSEC_GSS_KRB5=y @@ -1051,6 +1147,7 @@ CONFIG_CIFS=m # CONFIG_CIFS_STATS is not set CONFIG_CIFS_XATTR=y CONFIG_CIFS_POSIX=y +# CONFIG_CIFS_EXPERIMENTAL is not set # CONFIG_NCP_FS is not set # CONFIG_CODA_FS is not set # CONFIG_AFS_FS is not set @@ -1116,21 +1213,25 @@ CONFIG_OPROFILE=y # CONFIG_DEBUG_KERNEL=y CONFIG_MAGIC_SYSRQ=y +# CONFIG_SCHEDSTATS is not set # CONFIG_DEBUG_SLAB is not set # CONFIG_DEBUG_SPINLOCK_SLEEP is not set +# CONFIG_DEBUG_KOBJECT is not set # CONFIG_DEBUG_INFO is not set +CONFIG_DEBUG_FS=y CONFIG_DEBUG_STACKOVERFLOW=y +# CONFIG_KPROBES is not set CONFIG_DEBUG_STACK_USAGE=y CONFIG_DEBUGGER=y CONFIG_XMON=y 
CONFIG_XMON_DEFAULT=y # CONFIG_PPCDBG is not set CONFIG_IRQSTACKS=y -# CONFIG_SCHEDSTATS is not set # # Security options # +# CONFIG_KEYS is not set # CONFIG_SECURITY is not set # @@ -1144,7 +1245,7 @@ CONFIG_CRYPTO_MD5=y CONFIG_CRYPTO_SHA1=m CONFIG_CRYPTO_SHA256=m CONFIG_CRYPTO_SHA512=m -CONFIG_CRYPTO_WHIRLPOOL=m +CONFIG_CRYPTO_WP512=m CONFIG_CRYPTO_DES=y CONFIG_CRYPTO_BLOWFISH=m CONFIG_CRYPTO_TWOFISH=m @@ -1155,12 +1256,17 @@ CONFIG_CRYPTO_CAST6=m CONFIG_CRYPTO_TEA=m CONFIG_CRYPTO_ARC4=m CONFIG_CRYPTO_KHAZAD=m +CONFIG_CRYPTO_ANUBIS=m CONFIG_CRYPTO_DEFLATE=m CONFIG_CRYPTO_MICHAEL_MIC=m CONFIG_CRYPTO_CRC32C=m CONFIG_CRYPTO_TEST=m # +# Hardware crypto devices +# + +# # Library routines # CONFIG_CRC_CCITT=m From olh at suse.de Tue Feb 8 02:12:22 2005 From: olh at suse.de (Olaf Hering) Date: Mon, 7 Feb 2005 16:12:22 +0100 Subject: [PATCH] use vmlinux during make install on ppc64 Message-ID: <20050207151222.GA7219@suse.de> make install passes the zImage to the installkernel script. When an initrd is used, this script has to pull the vmlinux out of the zImage because yaboot cannot boot a zImage+initrd combo. It can only handle vmlinux+initrd or zImage.initrd. It is simpler to just pass the plain vmlinux instead. 
Signed-off-by: Olaf Hering diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/Makefile ./arch/ppc64/Makefile --- ../linux-2.6.11-rc3.orig/arch/ppc64/Makefile 2005-02-03 02:55:14.000000000 +0100 +++ ./arch/ppc64/Makefile 2005-02-07 16:03:59.708074617 +0100 @@ -65,9 +65,7 @@ boottarget-$(CONFIG_PPC_ISERIES) := vmli $(boottarget-y): vmlinux $(Q)$(MAKE) $(build)=$(boot) $(boot)/$@ -bootimage-$(CONFIG_PPC_PSERIES) := zImage -bootimage-$(CONFIG_PPC_MAPLE) := zImage -bootimage-$(CONFIG_PPC_ISERIES) := vmlinux +bootimage-y := vmlinux BOOTIMAGE := $(bootimage-y) install: vmlinux $(Q)$(MAKE) $(build)=$(boot) BOOTIMAGE=$(BOOTIMAGE) $@ diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/boot/Makefile ./arch/ppc64/boot/Makefile --- ../linux-2.6.11-rc3.orig/arch/ppc64/boot/Makefile 2005-02-03 02:56:36.000000000 +0100 +++ ./arch/ppc64/boot/Makefile 2005-02-07 16:05:45.966639316 +0100 @@ -117,7 +117,7 @@ $(obj)/imagesize.c: vmlinux.strip awk '{printf "unsigned long vmlinux_memsize = 0x%s;\n", substr($$1,8)}' \ >> $(obj)/imagesize.c -install: $(CONFIGURE) $(obj)/$(BOOTIMAGE) - sh -x $(srctree)/$(src)/install.sh "$(KERNELRELEASE)" "$(obj)/$(BOOTIMAGE)" "$(INSTALL_PATH)" +install: $(CONFIGURE) $(BOOTIMAGE) + sh -x $(srctree)/$(src)/install.sh "$(KERNELRELEASE)" "$(BOOTIMAGE)" "$(INSTALL_PATH)" clean-files := $(addprefix $(objtree)/, $(obj-boot) vmlinux.strip) From benh at kernel.crashing.org Tue Feb 8 08:35:01 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 08 Feb 2005 08:35:01 +1100 Subject: question on symbol exports In-Reply-To: <42077EE0.2060505@nortel.com> References: <41FECA18.50609@nortelnetworks.com> <1107243398.4208.47.camel@laptopd505.fenrus.org> <41FFA21C.8060203@nortelnetworks.com> <1107273017.4208.132.camel@laptopd505.fenrus.org> <20050204203050.GA5889@dmt.cnet> <4203D793.1040604@nortel.com> <1107595148.30302.5.camel@gaston> <42077EE0.2060505@nortel.com> Message-ID: <1107812101.7734.42.camel@gaston> On Mon, 2005-02-07 at 08:44 -0600, Chris 
Friesen wrote: > Benjamin Herrenschmidt wrote: > >>It turns out that to call ptep_clear_flush_dirty() on ppc64 from a > >>module I needed to export the following symbols: > >> > >>__flush_tlb_pending > >>ppc64_tlb_batch > >>hpte_update > > > > > > Any reason why you need to call that from a module ? Is the module > > GPL'd ? > > I explained this at the beginning of the thread, but I'll do so again. > The module will be released under the GPL. > > The basic idea is that we want to be able to track pages dirtied by a > userspace process. The system has no swap, so we use the dirty bit for > this. On demand we look up the page tables for an address range > specified by the caller, store the addresses of any dirty pages, then > mark them clean so that the next write causes them to get marked dirty > again. It is this act of marking them clean that requires the > additional exports. > > I've included the current code below. If there is any way to accomplish > this without the additional exports, I'd love to hear about it. Interesting... more than no swap, you must also make sure you have no r/w mmap'ed file (which are technically equivalent to swap). I'm not too fan about exporting those symbols, but I'll talk to paulus, it should be possible at least to EXPORT_SYMBOL_GPL them... Ben. From cfriesen at nortel.com Tue Feb 8 10:02:46 2005 From: cfriesen at nortel.com (Chris Friesen) Date: Mon, 07 Feb 2005 17:02:46 -0600 Subject: question on symbol exports In-Reply-To: <1107812101.7734.42.camel@gaston> References: <41FECA18.50609@nortelnetworks.com> <1107243398.4208.47.camel@laptopd505.fenrus.org> <41FFA21C.8060203@nortelnetworks.com> <1107273017.4208.132.camel@laptopd505.fenrus.org> <20050204203050.GA5889@dmt.cnet> <4203D793.1040604@nortel.com> <1107595148.30302.5.camel@gaston> <42077EE0.2060505@nortel.com> <1107812101.7734.42.camel@gaston> Message-ID: <4207F396.7080408@nortel.com> Benjamin Herrenschmidt wrote: > Interesting... 
more than no swap, you must also make sure you have no > r/w mmap'ed file (which are technically equivalent to swap). Ah...thanks for the warning. We want to eventually make it work with swap as well, but that's substantially more complicated. > I'm not too fan about exporting those symbols, but I'll talk to paulus, > it should be possible at least to EXPORT_SYMBOL_GPL them... I understand the reluctance. I'm perfectly willing to export it GPL in my private branch as long as you guys don't consider it evil--the module is going to be GPL anyways. The alternative would be for me to build my code directly in to the kernel...just makes it harder for me to debug. Chris From dan at embeddededge.com Tue Feb 8 10:42:24 2005 From: dan at embeddededge.com (Dan Malek) Date: Mon, 7 Feb 2005 18:42:24 -0500 Subject: question on symbol exports In-Reply-To: <1107812101.7734.42.camel@gaston> References: <41FECA18.50609@nortelnetworks.com> <1107243398.4208.47.camel@laptopd505.fenrus.org> <41FFA21C.8060203@nortelnetworks.com> <1107273017.4208.132.camel@laptopd505.fenrus.org> <20050204203050.GA5889@dmt.cnet> <4203D793.1040604@nortel.com> <1107595148.30302.5.camel@gaston> <42077EE0.2060505@nortel.com> <1107812101.7734.42.camel@gaston> Message-ID: On Feb 7, 2005, at 4:35 PM, Benjamin Herrenschmidt wrote: > Interesting... more than no swap, you must also make sure you have no > r/w mmap'ed file (which are technically equivalent to swap). Yeah, I kinda had a similar thought. Just because you aren't swapping doesn't mean the VM subsystem isn't looking at dirty bits, too. It could potentially steal a page that it thinks can be replaced from either a zero-fill or reading again from persistent storage. -- Dan From lishun at cn.ibm.com Tue Feb 8 14:10:49 2005 From: lishun at cn.ibm.com (Shun Li) Date: Tue, 8 Feb 2005 11:10:49 +0800 Subject: Shun Li is out of the office. Message-ID: I will be out of the office starting 2005-02-08 and will not return until 2005-02-15. 
I am out of the office from 2/8-2/15 for vacation and Chinese New Year. I will not be processing mail from 2/9-2/12; please contact me at +86-13522042228 for urgent cases. Happy Chinese New Year! From benh at kernel.crashing.org Tue Feb 8 15:04:15 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 08 Feb 2005 15:04:15 +1100 Subject: [PATCH] update ppc64 g5_defconfig In-Reply-To: <20050207144501.GC5516@suse.de> References: <20050207144228.GB5516@suse.de> <20050207144501.GC5516@suse.de> Message-ID: <1107835456.7687.72.camel@gaston> On Mon, 2005-02-07 at 15:45 +0100, Olaf Hering wrote: > This updates the G5 defconfig, disables some option for hardware that is > not used on such toys. Hi ! Good idea, a few comments tho... > +CONFIG_AUDIT=y > +CONFIG_AUDITSYSCALL=y Do we want these ? > -CONFIG_IP_NF_*=y .../... > +CONFIG_IP_NF_*=y I suspect the above was done by Linus (that is, putting them all built-in :) You may have to argue with him on these or do a linus_defconfig :) > +CONFIG_BACKLIGHT_LCD_SUPPORT=y > +CONFIG_BACKLIGHT_CLASS_DEVICE=m > +CONFIG_BACKLIGHT_DEVICE=y > +CONFIG_LCD_CLASS_DEVICE=m > +CONFIG_LCD_DEVICE=y Do we have any use for the above on ppc yet ? > -# CONFIG_REISERFS_FS is not set > +CONFIG_REISERFS_FS=y Ahem ... no comment :) > -# CONFIG_NLS_CODEPAGE_437 is not set > +CONFIG_NLS_CODEPAGE_437=y Good idea :) > +CONFIG_DEBUG_FS=y Hrm... Ben. 
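When reviewing a defconfig update like the one commented on above, it can help to list only the options whose values actually changed. What follows is a minimal sketch in Python, assuming the old and new configs are available as plain text. The names parse_config and diff_configs are made up for this illustration and are not part of any kernel tooling.

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines.
_SET_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET_RE = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_config(text):
    """Return {option: value}; options marked "is not set" map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET_RE.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_configs(old_text, new_text):
    """Return sorted (option, old_value, new_value) tuples for changed options.

    An option that is absent on one side is reported with None on that side.
    """
    old = parse_config(old_text)
    new = parse_config(new_text)
    changes = []
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes.append((name, before, after))
    return changes


if __name__ == "__main__":
    # Hypothetical before/after fragments using options named in the review.
    old_cfg = (
        "# CONFIG_REISERFS_FS is not set\n"
        "# CONFIG_NLS_CODEPAGE_437 is not set\n"
        "CONFIG_SMP=y\n"
    )
    new_cfg = (
        "CONFIG_REISERFS_FS=y\n"
        "CONFIG_NLS_CODEPAGE_437=y\n"
        "CONFIG_SMP=y\n"
        "CONFIG_DEBUG_FS=y\n"
    )
    for name, before, after in diff_configs(old_cfg, new_cfg):
        print(f"{name}: {before} -> {after}")
```

Run on the two hypothetical fragments, this reports CONFIG_DEBUG_FS, CONFIG_NLS_CODEPAGE_437 and CONFIG_REISERFS_FS as changed and leaves CONFIG_SMP out. Treating "is not set" as 'n' while reporting a missing option as None is a choice made for this sketch, so that a newly introduced option can be told apart from one that was explicitly disabled.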
From olh at suse.de Tue Feb 8 16:38:42 2005 From: olh at suse.de (Olaf Hering) Date: Tue, 8 Feb 2005 06:38:42 +0100 Subject: [PATCH] update ppc64 g5_defconfig In-Reply-To: <1107835456.7687.72.camel@gaston> References: <20050207144228.GB5516@suse.de> <20050207144501.GC5516@suse.de> <1107835456.7687.72.camel@gaston> Message-ID: <20050208053842.GA19495@suse.de> On Tue, Feb 08, Benjamin Herrenschmidt wrote: > > -# CONFIG_REISERFS_FS is not set > > +CONFIG_REISERFS_FS=y > > Ahem ... no comment :) I get a panic without that option. No idea why. From benh at kernel.crashing.org Tue Feb 8 17:11:38 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 08 Feb 2005 17:11:38 +1100 Subject: [PATCH] update ppc64 g5_defconfig In-Reply-To: <20050208053842.GA19495@suse.de> References: <20050207144228.GB5516@suse.de> <20050207144501.GC5516@suse.de> <1107835456.7687.72.camel@gaston> <20050208053842.GA19495@suse.de> Message-ID: <1107843098.7734.89.camel@gaston> On Tue, 2005-02-08 at 06:38 +0100, Olaf Hering wrote: > On Tue, Feb 08, Benjamin Herrenschmidt wrote: > > > > -# CONFIG_REISERFS_FS is not set > > > +CONFIG_REISERFS_FS=y > > > > Ahem ... no comment :) > > I get a panic without that option. No idea why. Use a real filesystem :) Ben. From olh at suse.de Tue Feb 8 17:17:43 2005 From: olh at suse.de (Olaf Hering) Date: Tue, 8 Feb 2005 07:17:43 +0100 Subject: [PATCH] update ppc64 g5_defconfig In-Reply-To: <1107843098.7734.89.camel@gaston> References: <20050207144228.GB5516@suse.de> <20050207144501.GC5516@suse.de> <1107835456.7687.72.camel@gaston> <20050208053842.GA19495@suse.de> <1107843098.7734.89.camel@gaston> Message-ID: <20050208061743.GA20744@suse.de> On Tue, Feb 08, Benjamin Herrenschmidt wrote: > On Tue, 2005-02-08 at 06:38 +0100, Olaf Hering wrote: > > On Tue, Feb 08, Benjamin Herrenschmidt wrote: > > > > > > -# CONFIG_REISERFS_FS is not set > > > > +CONFIG_REISERFS_FS=y > > > > > > Ahem ... no comment :) > > > > I get a panic without that option. 
No idea why. > > Use a real filesystem :) XFS was also disabled, and reiser4 is not yet available ;) I will check the audit stuff today. From hch at lst.de Tue Feb 8 20:47:46 2005 From: hch at lst.de (Christoph Hellwig) Date: Tue, 8 Feb 2005 10:47:46 +0100 Subject: [PATCH] update ppc64 g5_defconfig In-Reply-To: <1107835456.7687.72.camel@gaston> References: <20050207144228.GB5516@suse.de> <20050207144501.GC5516@suse.de> <1107835456.7687.72.camel@gaston> Message-ID: <20050208094746.GA24100@lst.de> On Tue, Feb 08, 2005 at 03:04:15PM +1100, Benjamin Herrenschmidt wrote: > On Mon, 2005-02-07 at 15:45 +0100, Olaf Hering wrote: > > This updates the G5 defconfig, disables some option for hardware that is > > not used on such toys. > > Hi ! Good idea, a few comments tho... > > > +CONFIG_AUDIT=y > > +CONFIG_AUDITSYSCALL=y > > Do we want these ? CONFIG_AUDIT is necessary for SELinux to work. If you have SELinux enabled it's necessary, otherwise it's pointless. CONFIG_AUDITSYSCALL is the broken partial syscall auditing support from RH and should be disabled. From greg.weeks at timesys.com Tue Feb 8 23:33:03 2005 From: greg.weeks at timesys.com (Greg Weeks) Date: Tue, 08 Feb 2005 07:33:03 -0500 Subject: KGDB Message-ID: <4208B17F.4010000@timesys.com> Is anyone currently working on KGDB for ppc64? When I checked with Amit Kale he didn't know of any work going on, but I wanted to be sure. 
Greg Weeks From cfriesen at nortel.com Wed Feb 9 02:36:20 2005 From: cfriesen at nortel.com (Chris Friesen) Date: Tue, 08 Feb 2005 09:36:20 -0600 Subject: question on symbol exports In-Reply-To: References: <41FECA18.50609@nortelnetworks.com> <1107243398.4208.47.camel@laptopd505.fenrus.org> <41FFA21C.8060203@nortelnetworks.com> <1107273017.4208.132.camel@laptopd505.fenrus.org> <20050204203050.GA5889@dmt.cnet> <4203D793.1040604@nortel.com> <1107595148.30302.5.camel@gaston> <42077EE0.2060505@nortel.com> <1107812101.7734.42.camel@gaston> Message-ID: <4208DC74.4000107@nortel.com> Dan Malek wrote: > > On Feb 7, 2005, at 4:35 PM, Benjamin Herrenschmidt wrote: > >> Interesting... more than no swap, you must also make sure you have no >> r/w mmap'ed file (which are technically equivalent to swap). > > > Yeah, I kinda had a similar thought. Just because you aren't > swapping doesn't mean the VM subsystem isn't looking at dirty bits, > too. It could potentially steal a page that it thinks can be replaced > from either a zero-fill or reading again from persistent storage. In our existing case, the app also mlock()s the pages in question. This should get around these two possible sources of inaccuracy. Chris From sfr at canb.auug.org.au Wed Feb 9 18:34:37 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 9 Feb 2005 18:34:37 +1100 Subject: [PATCH] build without PCI or VIO Message-ID: <20050209183437.30302d44.sfr@canb.auug.org.au> Hi Anton, all, This patch (on top of my previous dma fix up patch) allows you to build pSeries without CONFIG_PCI or CONFIG_VIO or both and iSeries without PCI. Don't look too closely at the include/asm-ppc64/floppy.h patch :-). Built on pSeries without PCI, without VIO, and without both. Built and booted on iSeries without PCI. Please comment. 
-- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-dma.4/arch/ppc64/Kconfig linus-bk-dma.5/arch/ppc64/Kconfig --- linus-bk-dma.4/arch/ppc64/Kconfig 2005-01-29 06:05:47.000000000 +1100 +++ linus-bk-dma.5/arch/ppc64/Kconfig 2005-02-09 18:11:46.000000000 +1100 @@ -126,6 +126,11 @@ config IBMVIO depends on PPC_PSERIES || PPC_ISERIES + bool "Support for virtual I/O" if (EMBEDDED && PPC_PSERIES) + default y + +config IBMIOMMU + depends on IBMVIO || PCI bool default y @@ -236,7 +241,7 @@ config EEH bool "PCI Extended Error Handling (EEH)" if EMBEDDED - depends on PPC_PSERIES + depends on PPC_PSERIES && PCI default y if !EMBEDDED # @@ -295,7 +300,7 @@ bool config PCI - bool + bool "support for PCI devices" if (EMBEDDED && (PPC_PSERIES || PPC_ISERIES)) default y help Find out whether your system includes a PCI bus. PCI is the name of diff -ruN linus-bk-dma.4/arch/ppc64/kernel/Makefile linus-bk-dma.5/arch/ppc64/kernel/Makefile --- linus-bk-dma.4/arch/ppc64/kernel/Makefile 2005-01-29 06:05:47.000000000 +1100 +++ linus-bk-dma.5/arch/ppc64/kernel/Makefile 2005-02-09 18:12:41.000000000 +1100 @@ -11,27 +11,32 @@ udbg.o binfmt_elf32.o sys_ppc32.o ioctl32.o \ ptrace32.o signal32.o rtc.o init_task.o \ lmb.o cputable.o cpu_setup_power4.o idle_power4.o \ - iommu.o sysfs.o + sysfs.o obj-$(CONFIG_PPC_OF) += of_device.o -pci-obj-$(CONFIG_PPC_ISERIES) += iSeries_pci.o iSeries_pci_reset.o +pci-obj-$(CONFIG_PPC_PSERIES) += pSeries_pci.o +pci-obj-$(CONFIG_PPC_ISERIES) += iSeries_pci.o iSeries_pci_reset.o \ + XmPciLpEvent.o pci-obj-$(CONFIG_PPC_MULTIPLATFORM) += pci_dn.o pci_direct_iommu.o obj-$(CONFIG_PCI) += pci.o pci_iommu.o iomap.o $(pci-obj-y) -obj-$(CONFIG_PPC_ISERIES) += iSeries_irq.o \ - iSeries_VpdInfo.o XmPciLpEvent.o \ +iommu-obj-$(CONFIG_PPC_PSERIES) += pSeries_iommu.o +iommu-obj-$(CONFIG_PPC_ISERIES) += iSeries_iommu.o + +obj-$(CONFIG_IBMIOMMU) += iommu.o $(iommu-obj-y) + +obj-$(CONFIG_PPC_ISERIES) += iSeries_irq.o 
iSeries_VpdInfo.o \ HvCall.o HvLpConfig.o LparData.o \ iSeries_setup.o ItLpQueue.o hvCall.o \ - mf.o HvLpEvent.o iSeries_proc.o iSeries_htab.o \ - iSeries_iommu.o + mf.o HvLpEvent.o iSeries_proc.o iSeries_htab.o obj-$(CONFIG_PPC_MULTIPLATFORM) += nvram.o i8259.o prom_init.o prom.o mpic.o -obj-$(CONFIG_PPC_PSERIES) += pSeries_pci.o pSeries_lpar.o pSeries_hvCall.o \ +obj-$(CONFIG_PPC_PSERIES) += pSeries_lpar.o pSeries_hvCall.o \ pSeries_nvram.o rtasd.o ras.o \ - xics.o rtas.o pSeries_setup.o pSeries_iommu.o + xics.o rtas.o pSeries_setup.o obj-$(CONFIG_EEH) += eeh.o obj-$(CONFIG_PROC_FS) += proc_ppc64.o diff -ruN linus-bk-dma.4/arch/ppc64/kernel/dma.c linus-bk-dma.5/arch/ppc64/kernel/dma.c --- linus-bk-dma.4/arch/ppc64/kernel/dma.c 2005-02-07 17:47:41.000000000 +1100 +++ linus-bk-dma.5/arch/ppc64/kernel/dma.c 2005-02-08 17:10:00.000000000 +1100 @@ -15,8 +15,10 @@ static struct dma_mapping_ops *get_dma_ops(struct device *dev) { +#ifdef CONFIG_PCI if (dev->bus == &pci_bus_type) return &pci_dma_ops; +#endif #ifdef CONFIG_IBMVIO if (dev->bus == &vio_bus_type) return &vio_dma_ops; @@ -37,8 +39,10 @@ int dma_set_mask(struct device *dev, u64 dma_mask) { +#ifdef CONFIG_PCI if (dev->bus == &pci_bus_type) return pci_set_dma_mask(to_pci_dev(dev), dma_mask); +#endif #ifdef CONFIG_IBMVIO if (dev->bus == &vio_bus_type) return -EIO; diff -ruN linus-bk-dma.4/arch/ppc64/kernel/iSeries_iommu.c linus-bk-dma.5/arch/ppc64/kernel/iSeries_iommu.c --- linus-bk-dma.4/arch/ppc64/kernel/iSeries_iommu.c 2005-01-09 10:05:39.000000000 +1100 +++ linus-bk-dma.5/arch/ppc64/kernel/iSeries_iommu.c 2005-02-09 18:16:16.000000000 +1100 @@ -34,7 +34,9 @@ #include #include +#ifdef CONFIG_PCI extern struct list_head iSeries_Global_Device_List; +#endif static void tce_build_iSeries(struct iommu_table *tbl, long index, long npages, @@ -84,6 +86,7 @@ } +#ifdef CONFIG_PCI /* * This function compares the known tables to find an iommu_table * that has already been built for hardware TCEs. 
@@ -162,14 +165,17 @@ static void iommu_dev_setup_iSeries(struct pci_dev *dev) { } static void iommu_bus_setup_iSeries(struct pci_bus *bus) { } +#endif void iommu_init_early_iSeries(void) { ppc_md.tce_build = tce_build_iSeries; ppc_md.tce_free = tce_free_iSeries; +#ifdef CONFIG_PCI ppc_md.iommu_dev_setup = iommu_dev_setup_iSeries; ppc_md.iommu_bus_setup = iommu_bus_setup_iSeries; pci_iommu_init(); +#endif } diff -ruN linus-bk-dma.4/arch/ppc64/kernel/iSeries_irq.c linus-bk-dma.5/arch/ppc64/kernel/iSeries_irq.c --- linus-bk-dma.4/arch/ppc64/kernel/iSeries_irq.c 2004-10-30 08:33:22.000000000 +1000 +++ linus-bk-dma.5/arch/ppc64/kernel/iSeries_irq.c 2005-02-08 17:36:06.000000000 +1100 @@ -65,8 +65,10 @@ /* This is called by init_IRQ. set in ppc_md.init_IRQ by iSeries_setup.c */ void __init iSeries_init_IRQ(void) { +#ifdef CONFIG_PCI /* Register PCI event handler and open an event path */ XmPciLpEvent_init(); +#endif } /* diff -ruN linus-bk-dma.4/arch/ppc64/kernel/iSeries_setup.c linus-bk-dma.5/arch/ppc64/kernel/iSeries_setup.c --- linus-bk-dma.4/arch/ppc64/kernel/iSeries_setup.c 2005-01-09 10:05:39.000000000 +1100 +++ linus-bk-dma.5/arch/ppc64/kernel/iSeries_setup.c 2005-02-09 17:22:15.000000000 +1100 @@ -45,7 +45,7 @@ #include #include #include -#include +#include #include #include #include @@ -844,7 +844,9 @@ ppc_md.get_irq = iSeries_get_irq; ppc_md.init_early = iSeries_init_early, +#ifdef CONFIG_PCI ppc_md.pcibios_fixup = iSeries_pci_final_fixup; +#endif ppc_md.restart = iSeries_restart; ppc_md.power_off = iSeries_power_off; diff -ruN linus-bk-dma.4/arch/ppc64/kernel/pSeries_iommu.c linus-bk-dma.5/arch/ppc64/kernel/pSeries_iommu.c --- linus-bk-dma.4/arch/ppc64/kernel/pSeries_iommu.c 2005-02-04 04:10:36.000000000 +1100 +++ linus-bk-dma.5/arch/ppc64/kernel/pSeries_iommu.c 2005-02-09 13:29:02.000000000 +1100 @@ -236,6 +236,7 @@ } } +#ifdef CONFIG_PCI static void iommu_table_setparms(struct pci_controller *phb, struct device_node *dn, struct iommu_table *tbl) @@ -457,10 
+458,12 @@ static void iommu_bus_setup_null(struct pci_bus *b) { } static void iommu_dev_setup_null(struct pci_dev *d) { } +#endif /* These are called very early. */ void iommu_init_early_pSeries(void) { +#ifdef CONFIG_PCI if (of_chosen && get_property(of_chosen, "linux,iommu-off", NULL)) { /* Direct I/O, IOMMU off */ ppc_md.iommu_dev_setup = iommu_dev_setup_null; @@ -469,6 +472,7 @@ return; } +#endif if (systemcfg->platform & PLATFORM_LPAR) { if (cur_cpu_spec->firmware_features & FW_FEATURE_MULTITCE) { @@ -478,15 +482,21 @@ ppc_md.tce_build = tce_build_pSeriesLP; ppc_md.tce_free = tce_free_pSeriesLP; } +#ifdef CONFIG_PCI ppc_md.iommu_bus_setup = iommu_bus_setup_pSeriesLP; +#endif } else { ppc_md.tce_build = tce_build_pSeries; ppc_md.tce_free = tce_free_pSeries; +#ifdef CONFIG_PCI ppc_md.iommu_bus_setup = iommu_bus_setup_pSeries; +#endif } +#ifdef CONFIG_PCI ppc_md.iommu_dev_setup = iommu_dev_setup_pSeries; pci_iommu_init(); +#endif } diff -ruN linus-bk-dma.4/arch/ppc64/kernel/pSeries_setup.c linus-bk-dma.5/arch/ppc64/kernel/pSeries_setup.c --- linus-bk-dma.4/arch/ppc64/kernel/pSeries_setup.c 2005-01-29 06:05:47.000000000 +1100 +++ linus-bk-dma.5/arch/ppc64/kernel/pSeries_setup.c 2005-02-09 13:32:29.000000000 +1100 @@ -222,10 +222,12 @@ fwnmi_init(); +#ifdef CONFIG_PCI /* Find and initialize PCI host bridges */ init_pci_config_tokens(); eeh_init(); find_and_init_phbs(); +#endif #ifdef CONFIG_DUMMY_CONSOLE conswitchp = &dummy_con; @@ -594,7 +596,9 @@ .init_early = pSeries_init_early, .get_cpuinfo = pSeries_get_cpuinfo, .log_error = pSeries_log_error, +#ifdef CONFIG_PCI .pcibios_fixup = pSeries_final_fixup, +#endif .restart = rtas_restart, .power_off = rtas_power_off, .halt = rtas_halt, diff -ruN linus-bk-dma.4/arch/ppc64/kernel/pci.c linus-bk-dma.5/arch/ppc64/kernel/pci.c --- linus-bk-dma.4/arch/ppc64/kernel/pci.c 2005-02-07 14:45:23.000000000 +1100 +++ linus-bk-dma.5/arch/ppc64/kernel/pci.c 2005-02-09 16:19:48.000000000 +1100 @@ -63,7 +63,9 @@ * page is mapped and 
isa_io_limit prevents access to it. */ unsigned long isa_io_base; /* NULL if no ISA bus */ +EXPORT_SYMBOL(isa_io_base); unsigned long pci_io_base; +EXPORT_SYMBOL(pci_io_base); void iSeries_pcibios_init(void); diff -ruN linus-bk-dma.4/arch/ppc64/kernel/ppc_ksyms.c linus-bk-dma.5/arch/ppc64/kernel/ppc_ksyms.c --- linus-bk-dma.4/arch/ppc64/kernel/ppc_ksyms.c 2005-01-12 16:05:22.000000000 +1100 +++ linus-bk-dma.5/arch/ppc64/kernel/ppc_ksyms.c 2005-02-09 16:20:07.000000000 +1100 @@ -49,9 +49,6 @@ EXPORT_SYMBOL(do_signal); -EXPORT_SYMBOL(isa_io_base); -EXPORT_SYMBOL(pci_io_base); - EXPORT_SYMBOL(strcpy); EXPORT_SYMBOL(strncpy); EXPORT_SYMBOL(strcat); diff -ruN linus-bk-dma.4/arch/ppc64/kernel/prom.c linus-bk-dma.5/arch/ppc64/kernel/prom.c --- linus-bk-dma.4/arch/ppc64/kernel/prom.c 2005-01-29 06:05:47.000000000 +1100 +++ linus-bk-dma.5/arch/ppc64/kernel/prom.c 2005-02-09 17:09:52.000000000 +1100 @@ -1802,8 +1802,10 @@ */ static void of_cleanup_node(struct device_node *np) { +#ifdef CONFIG_IBMIOMMU if (np->iommu_table && get_property(np, "ibm,dma-window", NULL)) iommu_free_table(np); +#endif } /* diff -ruN linus-bk-dma.4/arch/ppc64/kernel/sys_ppc32.c linus-bk-dma.5/arch/ppc64/kernel/sys_ppc32.c --- linus-bk-dma.4/arch/ppc64/kernel/sys_ppc32.c 2005-01-29 06:05:47.000000000 +1100 +++ linus-bk-dma.5/arch/ppc64/kernel/sys_ppc32.c 2005-02-08 17:26:43.000000000 +1100 @@ -741,6 +741,7 @@ asmlinkage int sys32_pciconfig_iobase(u32 which, u32 in_bus, u32 in_devfn) { +#ifdef CONFIG_PCI struct pci_controller* hose; struct list_head *ln; struct pci_bus *bus = NULL; @@ -786,7 +787,7 @@ case IOBASE_ISA_MEM: return -EINVAL; } - +#endif return -EOPNOTSUPP; } diff -ruN linus-bk-dma.4/arch/ppc64/lib/Makefile linus-bk-dma.5/arch/ppc64/lib/Makefile --- linus-bk-dma.4/arch/ppc64/lib/Makefile 2005-01-04 17:05:28.000000000 +1100 +++ linus-bk-dma.5/arch/ppc64/lib/Makefile 2005-02-08 17:34:53.000000000 +1100 @@ -12,7 +12,7 @@ # e2a provides EBCDIC to ASCII conversions. 
ifdef CONFIG_PPC_ISERIES -obj-$(CONFIG_PCI) += e2a.o +obj-y += e2a.o endif lib-$(CONFIG_DEBUG_KERNEL) += sstep.o diff -ruN linus-bk-dma.4/drivers/char/Kconfig linus-bk-dma.5/drivers/char/Kconfig --- linus-bk-dma.4/drivers/char/Kconfig 2005-02-04 04:10:36.000000000 +1100 +++ linus-bk-dma.5/drivers/char/Kconfig 2005-02-09 16:33:37.000000000 +1100 @@ -557,7 +557,7 @@ config HVC_CONSOLE bool "pSeries Hypervisor Virtual Console support" - depends on PPC_PSERIES + depends on PPC_PSERIES && IBMVIO help pSeries machines when partitioned support a hypervisor virtual console. This driver allows each pSeries partition to have a console @@ -565,7 +565,7 @@ config HVCS tristate "IBM Hypervisor Virtual Console Server support" - depends on PPC_PSERIES + depends on PPC_PSERIES && IBMVIO help Partitionable IBM Power5 ppc64 machines allow hosting of firmware virtual consoles from one Linux partition by diff -ruN linus-bk-dma.4/drivers/net/Kconfig linus-bk-dma.5/drivers/net/Kconfig --- linus-bk-dma.4/drivers/net/Kconfig 2005-01-20 07:06:57.000000000 +1100 +++ linus-bk-dma.5/drivers/net/Kconfig 2005-02-09 18:26:34.000000000 +1100 @@ -1171,7 +1171,7 @@ config IBMVETH tristate "IBM LAN Virtual Ethernet support" - depends on NETDEVICES && NET_ETHERNET && PPC_PSERIES + depends on NETDEVICES && NET_ETHERNET && PPC_PSERIES && IBMVIO ---help--- This driver supports virtual ethernet adapters on newer IBM iSeries and pSeries systems. 
diff -ruN linus-bk-dma.4/drivers/pci/hotplug/Makefile linus-bk-dma.5/drivers/pci/hotplug/Makefile --- linus-bk-dma.4/drivers/pci/hotplug/Makefile 2004-11-20 12:05:26.000000000 +1100 +++ linus-bk-dma.5/drivers/pci/hotplug/Makefile 2005-02-09 16:46:14.000000000 +1100 @@ -42,8 +42,10 @@ rpaphp-objs := rpaphp_core.o \ rpaphp_pci.o \ - rpaphp_slot.o \ - rpaphp_vio.o + rpaphp_slot.o +ifdef CONFIG_IBMVIO +rpaphp-objs += rpaphp_vio.o +endif rpadlpar_io-objs := rpadlpar_core.o \ rpadlpar_sysfs.o diff -ruN linus-bk-dma.4/drivers/pci/hotplug/rpaphp_core.c linus-bk-dma.5/drivers/pci/hotplug/rpaphp_core.c --- linus-bk-dma.4/drivers/pci/hotplug/rpaphp_core.c 2005-02-04 06:05:19.000000000 +1100 +++ linus-bk-dma.5/drivers/pci/hotplug/rpaphp_core.c 2005-02-09 16:52:25.000000000 +1100 @@ -157,9 +157,11 @@ case PCI_DEV: retval = rpaphp_get_pci_adapter_status(slot, 0, value); break; +#ifdef CONFIG_IBMVIO case VIO_DEV: retval = rpaphp_get_vio_adapter_status(slot, 0, value); break; +#endif default: retval = -EINVAL; } @@ -363,8 +365,10 @@ dbg("Entry %s: dn->full_name=%s\n", __FUNCTION__, dn->full_name); if (dn->parent && is_vdevice_root(dn->parent)) { +#ifdef CONFIG_IBMVIO /* register a VIO device */ retval = register_vio_slot(dn); +#endif goto exit; } @@ -485,9 +489,11 @@ case PCI_DEV: retval = rpaphp_enable_pci_slot(slot); break; +#ifdef CONFIG_IBMVIO case VIO_DEV: retval = rpaphp_enable_vio_slot(slot); break; +#endif default: retval = -EINVAL; } @@ -515,9 +521,11 @@ case PCI_DEV: retval = rpaphp_unconfig_pci_adapter(slot); break; +#ifdef CONFIG_IBMVIO case VIO_DEV: retval = rpaphp_unconfig_vio_adapter(slot); break; +#endif default: retval = -ENODEV; } diff -ruN linus-bk-dma.4/drivers/scsi/Kconfig linus-bk-dma.5/drivers/scsi/Kconfig --- linus-bk-dma.4/drivers/scsi/Kconfig 2005-01-29 06:05:47.000000000 +1100 +++ linus-bk-dma.5/drivers/scsi/Kconfig 2005-02-09 16:31:46.000000000 +1100 @@ -798,7 +798,7 @@ config SCSI_IBMVSCSI tristate "IBM Virtual SCSI support" - depends on PPC_PSERIES || 
PPC_ISERIES + depends on (PPC_PSERIES || PPC_ISERIES) && IBMVIO help This is the IBM POWER Virtual SCSI Client diff -ruN linus-bk-dma.4/drivers/serial/Kconfig linus-bk-dma.5/drivers/serial/Kconfig --- linus-bk-dma.4/drivers/serial/Kconfig 2005-02-04 04:10:37.000000000 +1100 +++ linus-bk-dma.5/drivers/serial/Kconfig 2005-02-08 17:57:30.000000000 +1100 @@ -753,7 +753,7 @@ config SERIAL_ICOM tristate "IBM Multiport Serial Adapter" - depends on PPC_ISERIES || PPC_PSERIES + depends on PCI && (PPC_ISERIES || PPC_PSERIES) select SERIAL_CORE help This driver is for a family of multiport serial adapters diff -ruN linus-bk-dma.4/include/asm-ppc64/floppy.h linus-bk-dma.5/include/asm-ppc64/floppy.h --- linus-bk-dma.4/include/asm-ppc64/floppy.h 2004-10-25 18:18:34.000000000 +1000 +++ linus-bk-dma.5/include/asm-ppc64/floppy.h 2005-02-09 13:56:34.000000000 +1100 @@ -31,8 +31,6 @@ "floppy", NULL) #define fd_free_irq() free_irq(FLOPPY_IRQ, NULL); -#ifdef CONFIG_PCI - #include #define fd_dma_setup(addr,size,mode,io) ppc64_fd_dma_setup(addr,size,mode,io) @@ -40,6 +38,7 @@ static __inline__ int ppc64_fd_dma_setup(char *addr, unsigned long size, int mode, int io) { +#ifdef CONFIG_PCI static unsigned long prev_size; static dma_addr_t bus_addr = 0; static char *prev_addr; @@ -71,11 +70,11 @@ fd_set_dma_count(size); virtual_dma_port = io; fd_enable_dma(); +#endif /* CONFIG_PCI */ return 0; } -#endif /* CONFIG_PCI */ __inline__ void virtual_dma_init(void) { diff -ruN linus-bk-dma.4/include/asm-ppc64/iSeries/iSeries_io.h linus-bk-dma.5/include/asm-ppc64/iSeries/iSeries_io.h --- linus-bk-dma.4/include/asm-ppc64/iSeries/iSeries_io.h 2004-09-14 21:06:08.000000000 +1000 +++ linus-bk-dma.5/include/asm-ppc64/iSeries/iSeries_io.h 2005-02-09 17:57:36.000000000 +1100 @@ -31,6 +31,7 @@ /* Created December 28, 2000 */ /* End Change Activity */ /************************************************************************/ +#ifdef CONFIG_PCI extern u8 iSeries_Read_Byte(const volatile void __iomem * 
IoAddress); extern u16 iSeries_Read_Word(const volatile void __iomem * IoAddress); extern u32 iSeries_Read_Long(const volatile void __iomem * IoAddress); @@ -41,6 +42,15 @@ extern void iSeries_memset_io(volatile void __iomem *dest, char x, size_t n); extern void iSeries_memcpy_toio(volatile void __iomem *dest, void *source, size_t n); extern void iSeries_memcpy_fromio(void *dest, const volatile void __iomem *source, size_t n); +#else /* CONFIG_PCI */ +static inline u8 iSeries_Read_Byte(const volatile void __iomem * IoAddress) +{ + return 0xff; +} +static inline void iSeries_Write_Byte(u8 IoData, volatile void __iomem * IoAddress) +{ +} +#endif /* CONFIG_PCI */ #endif /* CONFIG_PPC_ISERIES */ #endif /* _ISERIES_IO_H */ diff -ruN linus-bk-dma.4/include/asm-ppc64/io.h linus-bk-dma.5/include/asm-ppc64/io.h --- linus-bk-dma.4/include/asm-ppc64/io.h 2005-01-29 06:05:47.000000000 +1100 +++ linus-bk-dma.5/include/asm-ppc64/io.h 2005-02-09 15:53:33.000000000 +1100 @@ -1,4 +1,4 @@ - #ifndef _PPC64_IO_H +#ifndef _PPC64_IO_H #define _PPC64_IO_H /* @@ -31,6 +31,7 @@ #define SLOW_DOWN_IO +#ifdef CONFIG_PCI extern unsigned long isa_io_base; extern unsigned long pci_io_base; extern unsigned long io_page_mask; @@ -39,6 +40,10 @@ #define _IO_IS_VALID(port) ((port) >= MAX_ISA_PORT || (1 << (port>>PAGE_SHIFT)) \ & io_page_mask) +#else +#define pci_io_base 0 +#define _IO_IS_VALID(port) 1 +#endif #ifdef CONFIG_PPC_ISERIES /* __raw_* accessors aren't supported on iSeries */ diff -ruN linus-bk-dma.4/include/asm-ppc64/iommu.h linus-bk-dma.5/include/asm-ppc64/iommu.h --- linus-bk-dma.4/include/asm-ppc64/iommu.h 2005-02-07 15:02:01.000000000 +1100 +++ linus-bk-dma.5/include/asm-ppc64/iommu.h 2005-02-09 18:27:41.000000000 +1100 @@ -154,9 +154,14 @@ extern void iommu_unmap_single(struct iommu_table *tbl, dma_addr_t dma_handle, size_t size, enum dma_data_direction direction); +#ifdef CONFIG_IBMIOMMU extern void iommu_init_early_pSeries(void); extern void iommu_init_early_iSeries(void); extern 
void iommu_init_early_u3(void);
+#else
+static inline void iommu_init_early_pSeries(void) {}
+static inline void iommu_init_early_iSeries(void) {}
+#endif
 
 extern void pci_iommu_init(void);
 extern void pci_direct_iommu_init(void);
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050209/072e7567/attachment.pgp

From anton at samba.org Wed Feb 9 20:04:18 2005
From: anton at samba.org (Anton Blanchard)
Date: Wed, 9 Feb 2005 20:04:18 +1100
Subject: KGDB
In-Reply-To: <4208B17F.4010000@timesys.com>
References: <4208B17F.4010000@timesys.com>
Message-ID: <20050209090418.GI5567@krispykreme.ozlabs.ibm.com>

Hi,

> Is anyone currently working on KGDB for ppc64? When I checked with Amit
> Kale he didn't know of any work going on, but I wanted to be sure.

Not that I know of. With the kprobes notify_die stuff we should be able
to make xmon, kdb and kgdb all use it and clean up a bunch of code.

Anton

From olh at suse.de Wed Feb 9 20:32:43 2005
From: olh at suse.de (Olaf Hering)
Date: Wed, 9 Feb 2005 10:32:43 +0100
Subject: [PATCH] update ppc64 g5_defconfig
In-Reply-To: <20050208094746.GA24100@lst.de>
References: <20050207144228.GB5516@suse.de> <20050207144501.GC5516@suse.de> <1107835456.7687.72.camel@gaston> <20050208094746.GA24100@lst.de>
Message-ID: <20050209093243.GA12068@suse.de>

On Tue, Feb 08, Christoph Hellwig wrote:

> CONFIG_AUDIT is necessary for SELinux to work. If you have SELINUX
> enabled it's necessary, else pointless.

Maybe you should argue on the security lists for a patch like this:

diff -purNx tags ../linux-2.6.11-rc3.orig/init/Kconfig ./init/Kconfig
--- ../linux-2.6.11-rc3.orig/init/Kconfig	2005-02-07 14:14:04.000000000 +0100
+++ ./init/Kconfig	2005-02-09 10:31:38.594363483 +0100
@@ -156,8 +156,8 @@ config SYSCTL
 
 config AUDIT
 	bool "Auditing support"
-	default y if SECURITY_SELINUX
-	default n
+	depends on SECURITY_SELINUX
+	default y
 	help
 	  Enable auditing infrastructure that can be used with another
 	  kernel subsystem, such as SELinux (which requires this for

From paulus at samba.org Wed Feb 9 21:11:04 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 9 Feb 2005 21:11:04 +1100
Subject: [PATCH] update ppc64 pSeries_defconfig
In-Reply-To: <20050207145207.GA6596@suse.de>
References: <20050207144228.GB5516@suse.de> <20050207144501.GC5516@suse.de> <20050207144833.GD5516@suse.de> <20050207145000.GA6591@suse.de> <20050207145207.GA6596@suse.de>
Message-ID: <16905.57784.660308.211553@cargo.ozlabs.ibm.com>

Olaf Hering writes:

> update pSeries defconfig, compile tested.
>
> Signed-off-by: Olaf Hering

Acked-by: Paul Mackerras

From paulus at samba.org Wed Feb 9 20:57:59 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 9 Feb 2005 20:57:59 +1100
Subject: [PATCH] disable HMT for RS64 cpus
In-Reply-To: <20050207144058.GA5516@suse.de>
References: <20050207144058.GA5516@suse.de>
Message-ID: <16905.56999.267517.198102@cargo.ozlabs.ibm.com>

Olaf Hering writes:

> Hardware multithreading for RS64 cpus is currently broken. Anton sent me
> a patch a few weeks ago, but it did not work.
> So just hide the config option for the time being.
>
> Signed-off-by: Olaf Hering

Acked-by: Paul Mackerras

From paulus at samba.org Wed Feb 9 21:09:13 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 9 Feb 2005 21:09:13 +1100
Subject: [PATCH] update ppc64 defconfig
In-Reply-To: <20050207144228.GB5516@suse.de>
References: <20050207144228.GB5516@suse.de>
Message-ID: <16905.57673.786999.502674@cargo.ozlabs.ibm.com>

Olaf Hering writes:

> This turns the ppc64/defconfig into something useful, it boots on all
> systems.
>
> Signed-off-by: Olaf Hering

Acked-by: Paul Mackerras

From paulus at samba.org Wed Feb 9 21:10:31 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 9 Feb 2005 21:10:31 +1100
Subject: [PATCH] update ppc64 iSeries_defconfig
In-Reply-To: <20050207144833.GD5516@suse.de>
References: <20050207144228.GB5516@suse.de> <20050207144501.GC5516@suse.de> <20050207144833.GD5516@suse.de>
Message-ID: <16905.57751.91883.60968@cargo.ozlabs.ibm.com>

Olaf Hering writes:

> update iSeries defconfig, compile tested.
>
> Signed-off-by: Olaf Hering

Acked-by: Paul Mackerras

From paulus at samba.org Wed Feb 9 21:10:47 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 9 Feb 2005 21:10:47 +1100
Subject: [PATCH] update ppc64 maple_defconfig
In-Reply-To: <20050207145000.GA6591@suse.de>
References: <20050207144228.GB5516@suse.de> <20050207144501.GC5516@suse.de> <20050207144833.GD5516@suse.de> <20050207145000.GA6591@suse.de>
Message-ID: <16905.57767.140139.229796@cargo.ozlabs.ibm.com>

Olaf Hering writes:

> update Maple defconfig, compile tested.
>
> Signed-off-by: Olaf Hering

Acked-by: Paul Mackerras

From paulus at samba.org Wed Feb 9 21:20:53 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 9 Feb 2005 21:20:53 +1100
Subject: [PATCH] use vmlinux during make install on ppc64
In-Reply-To: <20050207151222.GA7219@suse.de>
References: <20050207151222.GA7219@suse.de>
Message-ID: <16905.58373.707121.332099@cargo.ozlabs.ibm.com>

Olaf Hering writes:

> make install passes the zImage to the installkernel script.
> When an initrd is used, this script has to pull out the vmlinux from the
> zImage because yaboot cannot boot a zImage+initrd combo.
> It can only handle vmlinux+initrd or zImage.initrd.
> It's simple to just pass the plain vmlinux instead.

As a side-effect you seem to have changed the default target on
pSeries from zImage to vmlinux, which I don't like - I find it useful
and convenient that just plain "make" makes the zImage, which I can
then netboot.

Paul.

From olh at suse.de Wed Feb 9 22:00:38 2005
From: olh at suse.de (Olaf Hering)
Date: Wed, 9 Feb 2005 12:00:38 +0100
Subject: [PATCH] use vmlinux during make install on ppc64
In-Reply-To: <16905.58373.707121.332099@cargo.ozlabs.ibm.com>
References: <20050207151222.GA7219@suse.de> <16905.58373.707121.332099@cargo.ozlabs.ibm.com>
Message-ID: <20050209110038.GA13600@suse.de>

On Wed, Feb 09, Paul Mackerras wrote:

> Olaf Hering writes:
>
> > make install passes the zImage to the installkernel script.
> > When an initrd is used, this script has to pull out the vmlinux from the
> > zImage because yaboot cannot boot a zImage+initrd combo.
> > It can only handle vmlinux+initrd or zImage.initrd.
> > It's simple to just pass the plain vmlinux instead.
>
> As a side-effect you seem to have changed the default target on
> pSeries from zImage to vmlinux, which I don't like - I find it useful
> and convenient that just plain "make" makes the zImage, which I can
> then netboot.

You are right, the arch/ppc64/Makefile part is wrong.
Milton pointed out that zImage may be preferred for netboot. So the
question is, who uses make install for that.

This zImage is now $5

Furthermore, the $3 was incorrect.
iseries did not work either, because there is no arch/ppc64/boot/vmlinux
target.

make[2]: *** No rule to make target `arch/ppc64/boot/vmlinux', needed by `install'. Stop.
make[1]: *** [install] Error 2

Signed-off-by: Olaf Hering

diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/Makefile ./arch/ppc64/Makefile
--- ../linux-2.6.11-rc3.orig/arch/ppc64/Makefile	2005-02-03 02:55:14.000000000 +0100
+++ ./arch/ppc64/Makefile	2005-02-09 11:53:12.724975475 +0100
@@ -65,8 +65,8 @@ boottarget-$(CONFIG_PPC_ISERIES) := vmli
 $(boottarget-y): vmlinux
 	$(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
 
-bootimage-$(CONFIG_PPC_PSERIES) := zImage
-bootimage-$(CONFIG_PPC_MAPLE) := zImage
+bootimage-$(CONFIG_PPC_PSERIES) := $(boot)/zImage
+bootimage-$(CONFIG_PPC_MAPLE) := $(boot)/zImage
 bootimage-$(CONFIG_PPC_ISERIES) := vmlinux
 BOOTIMAGE := $(bootimage-y)
 install: vmlinux
diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/boot/Makefile ./arch/ppc64/boot/Makefile
--- ../linux-2.6.11-rc3.orig/arch/ppc64/boot/Makefile	2005-02-03 02:56:36.000000000 +0100
+++ ./arch/ppc64/boot/Makefile	2005-02-09 11:52:18.210092397 +0100
@@ -117,7 +117,7 @@ $(obj)/imagesize.c: vmlinux.strip
 	awk '{printf "unsigned long vmlinux_memsize = 0x%s;\n", substr($$1,8)}' \
 		>> $(obj)/imagesize.c
 
-install: $(CONFIGURE) $(obj)/$(BOOTIMAGE)
-	sh -x $(srctree)/$(src)/install.sh "$(KERNELRELEASE)" "$(obj)/$(BOOTIMAGE)" "$(INSTALL_PATH)"
+install: $(CONFIGURE) $(BOOTIMAGE)
+	sh -x $(srctree)/$(src)/install.sh "$(KERNELRELEASE)" vmlinux System.map "$(INSTALL_PATH)" "$(BOOTIMAGE)"
 
 clean-files := $(addprefix $(objtree)/, $(obj-boot) vmlinux.strip)
diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/boot/install.sh ./arch/ppc64/boot/install.sh
--- ../linux-2.6.11-rc3.orig/arch/ppc64/boot/install.sh	2005-02-03 02:57:16.000000000 +0100
+++ ./arch/ppc64/boot/install.sh
2005-02-09 11:45:01.528918590 +0100 @@ -17,6 +17,7 @@ # $2 - kernel image file # $3 - kernel map file # $4 - default install path (blank if root directory) +# $5 - kernel boot file, the zImage # # User may have a custom install script @@ -27,7 +28,7 @@ if [ -x /sbin/installkernel ]; then exec # Default install # this should work for both the pSeries zImage and the iSeries vmlinux.sm -image_name=`basename $2` +image_name=`basename $5` if [ -f $4/$image_name ]; then mv $4/$image_name $4/$image_name.old From olh at suse.de Thu Feb 10 02:06:54 2005 From: olh at suse.de (Olaf Hering) Date: Wed, 9 Feb 2005 16:06:54 +0100 Subject: p620 hangs instantiating rtas at 0x00000000deadbeef Message-ID: <20050209150654.GA16640@suse.de> Current Linus tree hangs on p620, xmon does not trigger. rc3 was already broken. And 2.6.10 doesnt work either... BOOTP S = 1 FILE: orange Load Addr=0x4000 Max Size=0xbfc000 FINAL Packet Count = 5793 FINAL File Size = 2965713 bytes. zImage starting: loaded at 0x400000 Allocating 0x94c000 bytes for kernel ... gunzipping (0x2100000 <- 0x407000:0x6c217a)...done 0x7e23d8 bytes 0xe4ac bytes of heap consumed, max in use 0xa2a8 OF stdout device is: /pci at fff7f09000/isa at 10/serial at i3f8 command line: memory layout at init: alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 0000000100000000 rmo_top : 0000000040000000 ram_top : 0000000100000000 Looking for displays found display : /pci at fff7f0a000/pci at b,4/display at 1, opening ... done opening PHB /pci at fff7f09000... done opening PHB /pci at fff7f09000/pci at b... done opening PHB /pci at fff7f09000/pci at b,2... done opening PHB /pci at fff7f09000/pci at b,4... done opening PHB /pci at fff7f09000/pci at b,6... done opening PHB /pci at fff7f0a000... done opening PHB /pci at fff7f0a000/pci at b... done opening PHB /pci at fff7f0a000/pci at b,2... done opening PHB /pci at fff7f0a000/pci at b,4... done opening PHB /pci at fff7f0a000/pci at b,6... 
done opening PHB /pci at fff7f0a000/pci at c... done opening PHB /pci at fff7f0a000/pci at c,2... done opening PHB /pci at fff7f0a000/pci at c,4... done opening PHB /pci at fff7f0a000/pci at c,6... done instantiating rtas at 0x00000000deadbeef... failed 0000000000000000 : boot cpu 0000000000000000 0000000000000001 : starting cpu hw idx 0000000000000002... done 0000000000000002 : starting cpu hw idx 0000000000000004... done 0000000000000003 : starting cpu hw idx 0000000000000006... done copying OF device tree ... Building dt strings... Building dt structure... Device tree strings 0x0000000002a61000 -> 0x0000000002a621df Device tree struct 0x0000000002a63000 -> 0x0000000002a72000 Calling quiesce ... returning from prom_init SLES9 does work. BOOTP S = 1 FILE: orange Load Addr=0x4000 Max Size=0xbfc000 FINAL Packet Count = 10913 FINAL File Size = 5587130 bytes. zImage starting: loaded at 0x400000 initial ramdisk moving 0x3d52000 <- 0x695000 (2add80 bytes) trying: 0x01400000 trying: 0x01500000 trying: 0x01600000 trying: 0x01700000 trying: 0x01800000 trying: 0x01900000 trying: 0x01a00000 trying: 0x01b00000 trying: 0x01c00000 trying: 0x01d00000 trying: 0x01e00000 trying: 0x01f00000 trying: 0x02000000 trying: 0x02100000 gunzipping (0x2100000 <- 0x407000:0x69466d)...done 8955683 bytes 58392 bytes of heap consumed, max in use 42136 ... skipping 0x10000 bytes of ELF header copy built-in cmdline(27) manual=1 quiet start_shell setprop bootargs: 27 kernel: entry addr = 0x2110000 a1 = 0x0, a2 = 0x0, prom = 0xc1e030, bi_recs = 0x2a6b000, Looking for displays OF stdout is : /pci at fff7f09000/isa at 10/serial at i3f8 found display : /pci at fff7f0a000/pci at b,4/display at 1 Opening displays... opening display : /pci at fff7f0a000/pci at b,4/display at 1... done instantiating rtas at 0x000000003ff59000... done 0000000000000000 : booting cpu /cpus/PowerPC,RS64-III at 0 0000000000000001 : starting cpu /cpus/PowerPC,RS64-III at 2... ... 
done
0000000000000002 : starting cpu /cpus/PowerPC,RS64-III at 4... ... done
0000000000000003 : starting cpu /cpus/PowerPC,RS64-III at 6... ... done
opening PHB /pci at fff7f09000... done
opening PHB /pci at fff7f09000/pci at b... done
opening PHB /pci at fff7f09000/pci at b,2... done
opening PHB /pci at fff7f09000/pci at b,4... done
opening PHB /pci at fff7f09000/pci at b,6... done
opening PHB /pci at fff7f0a000... done
opening PHB /pci at fff7f0a000/pci at b... done
opening PHB /pci at fff7f0a000/pci at b,2... done
opening PHB /pci at fff7f0a000/pci at b,4... done
opening PHB /pci at fff7f0a000/pci at b,6... done
opening PHB /pci at fff7f0a000/pci at c... done
opening PHB /pci at fff7f0a000/pci at c,2... done
opening PHB /pci at fff7f0a000/pci at c,4... done
opening PHB /pci at fff7f0a000/pci at c,6... done
Calling quiesce ...
returning from prom_init
firmware_features = 0x0
Starting Linux PPC64 2.6.5-7.139-pseries64
-

From brking at us.ibm.com Thu Feb 10 06:23:47 2005
From: brking at us.ibm.com (Brian King)
Date: Wed, 09 Feb 2005 13:23:47 -0600
Subject: [PATCH] ppc64: Mode 2 PCI-X config space size fix
In-Reply-To: <41FFE3AF.706@us.ibm.com>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050131192955.GJ31145@parcelfarce.linux.theplanet.co.uk> <41FEA4AA.1080407@us.ibm.com> <200501312256.44692.arnd@arndb.de> <41FEB492.2020002@us.ibm.com> <1107227727.5963.46.camel@gaston> <41FF0B0D.8020003@us.ibm.com> <20050201123249.GA10088@parcelfarce.linux.theplanet.co.uk> <41FFE3AF.706@us.ibm.com>
Message-ID: <420A6343.6070307@us.ibm.com>

Trimming the cc list a bit since this has become a PPC64-only patch and
resending...

Paul or Anton - please apply.

Thanks

--
Brian King
eServer Storage I/O
IBM Linux Technology Center
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ppc64_pcix_mode2_cfg.patch
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050209/7a116bcf/attachment.txt

From cfriesen at nortel.com Thu Feb 10 07:05:09 2005
From: cfriesen at nortel.com (Chris Friesen)
Date: Wed, 09 Feb 2005 14:05:09 -0600
Subject: [BUG] in copy_siginfo_to_user32 on ppc64 (and others?) in 2.6.9/2.6.10
Message-ID: <420A6CF5.9040304@nortel.com>

I found a bug which has since been fixed, but I'm hoping to save others
the problems that I had tracking it down.

It was fairly confusing--the information in the siginfo_t struct was
different based on whether I used a signal handler in the regular way,
or blocked the signal and retrieved the information using
sigtimedwait().

After much instrumentation of the kernel, I tracked it down.

Until recently (Jan 5), ppc64 had its own version of
compat_sys_rt_sigtimedwait, which simply called sys_rt_sigtimedwait()
then copied the results to the userspace struct using
copy_siginfo_to_user32().

Unfortunately, sys_rt_sigtimedwait() only copies the lower 16 bits of
si_code, and the ppc64 version of copy_siginfo_to_user32() keyed on the
upper 16 bits to decide what information to copy. Thus, it always ended
up in the default case of the switch statement, and only ever copied
si_pid and si_uid.

Oops.

Chris

From olh at suse.de Thu Feb 10 09:28:01 2005
From: olh at suse.de (Olaf Hering)
Date: Wed, 9 Feb 2005 23:28:01 +0100
Subject: p620 hangs instantiating rtas at 0x00000000deadbeef
In-Reply-To: <20050209150654.GA16640@suse.de>
References: <20050209150654.GA16640@suse.de>
Message-ID: <20050209222801.GA24113@suse.de>

On Wed, Feb 09, Olaf Hering wrote:

>
> Current Linus tree hangs on p620, xmon does not trigger.
> rc3 was already broken.
> And 2.6.10 doesn't work either...

It broke between 2.6.9-rc2 and -rc3

From david at gibson.dropbear.id.au Thu Feb 10 10:53:40 2005
From: david at gibson.dropbear.id.au (David Gibson)
Date: Thu, 10 Feb 2005 10:53:40 +1100
Subject: [PPC64] (resend) Functions to reserve performance monitor hardware
Message-ID: <20050209235340.GA5324@localhost.localdomain>

Andrew, here's a resend of this patch. My earlier version had a few
stupid errors which should be corrected in this one. Please apply.

The PPC64 interrupt code includes a hook to call when an exception
from the performance monitor unit occurs. However, there's no way of
reserving the hook properly, so if more than one bit of code tries to
use it things will get ugly. Currently oprofile is the only user, but
there are likely to be more in future e.g. perfctr, if and when it
reaches a fit state for merging.

This patch creates functions to reserve and release the performance
monitor hardware (including its interrupt), and makes oprofile use
them. It also creates a new arch/ppc64/kernel/pmc.c, in which we can
put any future helper functions for handling the performance monitor
counters.

Signed-off-by: David Gibson

Index: working-2.6/arch/ppc64/kernel/pmc.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ working-2.6/arch/ppc64/kernel/pmc.c	2005-02-10 10:50:16.639578008 +1100
@@ -0,0 +1,64 @@
+/*
+ *  linux/arch/ppc64/kernel/pmc.c
+ *
+ *  Copyright (C) 2004 David Gibson, IBM Corporation.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */ + +#include +#include +#include + +#include +#include + +/* Ensure exceptions are disabled */ +static void dummy_perf(struct pt_regs *regs) +{ + unsigned int mmcr0 = mfspr(SPRN_MMCR0); + + mmcr0 &= ~(MMCR0_PMXE|MMCR0_PMAO); + mtspr(SPRN_MMCR0, mmcr0); +} + +static spinlock_t pmc_owner_lock = SPIN_LOCK_UNLOCKED; +static void *pmc_owner_caller; /* mostly for debugging */ +perf_irq_t perf_irq = dummy_perf; + +int reserve_pmc_hardware(perf_irq_t new_perf_irq) +{ + int err = 0; + + spin_lock(&pmc_owner_lock); + + if (pmc_owner_caller) { + printk(KERN_WARNING "reserve_pmc_hardware: " + "PMC hardware busy (reserved by caller %p)\n", + pmc_owner_caller); + err = -EBUSY; + goto out; + } + + pmc_owner_caller = __builtin_return_address(0); + perf_irq = new_perf_irq ? : dummy_perf; + + out: + spin_unlock(&pmc_owner_lock); + return err; +} + +void release_pmc_hardware(void) +{ + spin_lock(&pmc_owner_lock); + + WARN_ON(! pmc_owner_caller); + + pmc_owner_caller = NULL; + perf_irq = dummy_perf; + + spin_unlock(&pmc_owner_lock); +} Index: working-2.6/arch/ppc64/kernel/traps.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/traps.c 2005-02-10 10:50:14.653478576 +1100 +++ working-2.6/arch/ppc64/kernel/traps.c 2005-02-10 10:50:16.640577856 +1100 @@ -41,6 +41,7 @@ #include #include #include +#include #ifdef CONFIG_DEBUGGER int (*__debugger)(struct pt_regs *regs); @@ -450,18 +451,7 @@ die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT); } -/* Ensure exceptions are disabled */ -static void dummy_perf(struct pt_regs *regs) -{ - unsigned int mmcr0 = mfspr(SPRN_MMCR0); - - mmcr0 &= ~(MMCR0_PMXE|MMCR0_PMAO); - mtspr(SPRN_MMCR0, mmcr0); -} - -void (*perf_irq)(struct pt_regs *) = dummy_perf; - -EXPORT_SYMBOL(perf_irq); +extern perf_irq_t perf_irq; void performance_monitor_exception(struct pt_regs *regs) { Index: working-2.6/include/asm-ppc64/pmc.h 
=================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ working-2.6/include/asm-ppc64/pmc.h 2005-02-10 10:50:16.641577704 +1100 @@ -0,0 +1,29 @@ +/* + * pmc.h + * Copyright (C) 2004 David Gibson, IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ +#ifndef _PPC64_PMC_H +#define _PPC64_PMC_H + +#include + +typedef void (*perf_irq_t)(struct pt_regs *); + +int reserve_pmc_hardware(perf_irq_t new_perf_irq); +void release_pmc_hardware(void); + +#endif /* _PPC64_PMC_H */ Index: working-2.6/arch/ppc64/kernel/Makefile =================================================================== --- working-2.6.orig/arch/ppc64/kernel/Makefile 2005-02-10 10:50:14.653478576 +1100 +++ working-2.6/arch/ppc64/kernel/Makefile 2005-02-10 10:50:16.641577704 +1100 @@ -11,7 +11,7 @@ udbg.o binfmt_elf32.o sys_ppc32.o ioctl32.o \ ptrace32.o signal32.o rtc.o init_task.o \ lmb.o cputable.o cpu_setup_power4.o idle_power4.o \ - iommu.o sysfs.o + iommu.o sysfs.o pmc.o obj-$(CONFIG_PPC_OF) += of_device.o Index: working-2.6/arch/ppc64/oprofile/common.c =================================================================== --- working-2.6.orig/arch/ppc64/oprofile/common.c 2005-02-10 10:50:14.653478576 +1100 +++ working-2.6/arch/ppc64/oprofile/common.c 
2005-02-10 10:50:16.642577552 +1100 @@ -15,6 +15,7 @@ #include #include #include +#include #include "op_impl.h" @@ -22,9 +23,6 @@ extern struct op_ppc64_model op_model_power4; static struct op_ppc64_model *model; -extern void (*perf_irq)(struct pt_regs *); -static void (*save_perf_irq)(struct pt_regs *); - static struct op_counter_config ctr[OP_MAX_COUNTER]; static struct op_system_config sys; @@ -35,11 +33,12 @@ static int op_ppc64_setup(void) { - /* Install our interrupt handler into the existing hook. */ - save_perf_irq = perf_irq; - perf_irq = op_handle_interrupt; + int err; - mb(); + /* Grab the hardware */ + err = reserve_pmc_hardware(op_handle_interrupt); + if (err) + return err; /* Pre-compute the values to stuff in the hardware registers. */ model->reg_setup(ctr, &sys, model->num_counters); @@ -52,10 +51,7 @@ static void op_ppc64_shutdown(void) { - mb(); - - /* Remove our interrupt handler. We may be removing this module. */ - perf_irq = save_perf_irq; + release_pmc_hardware(); } static void op_ppc64_cpu_start(void *dummy) -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson From benh at kernel.crashing.org Thu Feb 10 11:06:44 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 10 Feb 2005 11:06:44 +1100 Subject: p620 hangs instantiating rtas at 0x00000000deadbeef In-Reply-To: <20050209222801.GA24113@suse.de> References: <20050209150654.GA16640@suse.de> <20050209222801.GA24113@suse.de> Message-ID: <1107994004.7687.154.camel@gaston> On Wed, 2005-02-09 at 23:28 +0100, Olaf Hering wrote: > On Wed, Feb 09, Olaf Hering wrote: > > > > > Current Linus tree hangs on p620, xmon does not trigger. > > rc3 was already broken. > > And 2.6.10 doesnt work either... > > It broke between 2.6.9-rc2 and -rc3 Can you enable debug stuff in prom_init.c ? Ben. 
From benh at kernel.crashing.org Thu Feb 10 13:32:53 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Thu, 10 Feb 2005 13:32:53 +1100
Subject: [PATCH] ppc64: Implement a vDSO and use it for signal trampoline #3
Message-ID: <1108002773.7733.196.camel@gaston>

Here's an update of the patch. It uses the PAGE_SIZE constant instead of
a hard-coded 4096 in the wrappers and changes the Makefile according to
Sam's latest comments.

---

This is a rather large patch. See notes below for possible backward
compatibility issues.

(Note: It depends on "ppc64: Move systemcfg out of head.S" being applied)

This patch adds to the ppc64 kernel a virtual .so (vDSO) that is mapped
into every process space, similar to the x86 vsyscall page. However, the
implementation is very different (and doesn't use the gate area
mechanism). Actually, it contains two implementations, a 32-bit and a
64-bit one.

These vDSOs are currently mapped at 0x100000 (+1MB) when possible (when a
process load section isn't already there). In the future, we can
randomize that address, or even imagine having a special phdr entry
letting apps that want finer control over their address space put it
elsewhere (or not at all).

The implementation adds a hook to binfmt_elf to let the architecture add
a real VMA to the process space instead of using the gate area
mechanism. This mechanism wasn't very suitable for ppc: we couldn't just
"shove" PTE entries mapping kernel addresses into userland without
expensive changes to our hash table management. Instead, I made the vDSO
a normal VMA which, additionally, means it supports copy-on-write
semantics if made writable via ptrace/mprotect, thus allowing
breakpoints in the vDSO code.


The current implementation of the vDSOs contains the signal trampolines with appropriate DWARF information, which enables us to use non-executable stacks (patches to come later), along with a few more functions that we hope glibc will soon make good use of (this is the "hard" part now :)

Note that the symbols exposed by the vDSO aren't "normal" function symbols, so apps can't be expected to link against them directly: the vDSOs are both seen as if they were linked at 0 and the symbols just contain offsets to the various functions. This is done on purpose to avoid a relocation step (ppc64 functions normally have descriptors with absolute addresses in them). When glibc uses those functions, it's expected to use its own trampolines that know how to reach them.

In some cases, the vDSO contains several versions of a given function (for various CPUs); the kernel will "patch" the symbol table at boot to make it point to the appropriate one transparently.

What is currently implemented is:

 - int __kernel_gettimeofday(struct timeval *tv, struct timezone *tz);

   This is a fully userland implementation of gettimeofday, with no barriers and no locks, and providing 100% equivalent results to the syscall version.

 - void __kernel_sync_dicache(unsigned long start, unsigned long end)

   This function syncs the data and instruction caches (for making data executable). It is expected that userland loaders use this instead of doing it themselves, as the kernel will provide optimized versions for the current CPU. Currently, the vDSO provides a full one for all CPUs prior to POWER5 and a nop one for POWER5, which implements hardware snooping at the L1 level. In the future, an intermediate implementation may be done for the POWER4 and 970, which don't need the "dcbst" loop (the L1D cache is write-through on those).

 - void *__kernel_get_syscall_map(unsigned int *syscall_count) ;

   Returns a pointer to a map of implemented syscalls on the currently running kernel. The map is agnostic to the size of "long": unlike kernel bitops, it stores bits from top to bottom so that memory actually contains a linear bitmap. Check for syscall N by testing bit (0x80000000 >> (N & 0x1f)) of the 32 bits int at index N >> 5.

Note about backward compatibility issues: A bug in the ppc64 libgcc unwinder makes it unable to unwind stacks properly across signals if the signal trampoline isn't on the stack. This has been fixed in CVS for gcc 4.0 and will be soon on the stable branch, but the problem exists with all currently used versions. That means that until glibc gets the patch to enable its use of the vDSO symbols for the DWARF unwinder (rather trivial patch that will be pushed to glibc CVS soon hopefully), unwinding from a signal handler will not work for 64 bits applications.

I consider this a non-issue though, as a patch is about to be produced, which can easily get pushed to "live" distros like debian, gentoo, fedora, etc... soon enough (it breaks compatibility with kernels below 2.4.20 unfortunately as our signal stack layout changed, crap crap crap), as there are few 64 bits applications out there (except gentoo), as it's only really an issue with C++ code relying on throwing exceptions out of signal handlers (extremely rare it seems), and as "release" distros like SLES or RHEL will probably have the vDSO enabled glibc _and_ the unwinder fix by the time they release a version with a 2.6.11 or 2.6.12 kernel anyway :) So far, I have yet to see an app failing because of that...

Finally, many many many thanks to Alan Modra for writing the DWARF information of the signal handlers and debugging the libgcc issues !

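
For illustration only, the syscall map lookup described for __kernel_get_syscall_map could be sketched in plain C roughly as below. This is a minimal sketch, not code from the patch: the helper names syscall_map_set() and syscall_map_test(), and the demo constants, are hypothetical. Only the bit layout (bit 0x80000000 >> (N & 0x1f) of the 32 bits word at index N >> 5) comes from the description above.

```c
#include <stdint.h>

/* Hypothetical demo sizes, not the kernel's __NR_syscalls. */
#define NR_SYSCALLS_DEMO        64
#define SYSCALL_MAP_WORDS_DEMO  ((NR_SYSCALLS_DEMO + 31) / 32)

/*
 * Mark syscall n as implemented. Bits are stored from the top of each
 * 32 bits word downwards, so the layout in memory does not depend on
 * the size of "long".
 */
static void syscall_map_set(uint32_t *map, unsigned int n)
{
	map[n >> 5] |= 0x80000000u >> (n & 0x1f);
}

/*
 * Return non-zero if syscall n is marked in the map. "count" is the
 * number of syscalls the map covers; anything past it is reported as
 * not implemented.
 */
static int syscall_map_test(const uint32_t *map, unsigned int count,
			    unsigned int n)
{
	if (n >= count)
		return 0;
	return (map[n >> 5] & (0x80000000u >> (n & 0x1f))) != 0;
}
```

A caller would presumably obtain the map pointer and the count from the vDSO function and then call something like syscall_map_test(map, count, N); how glibc ends up reaching the vDSO entry point is outside the scope of this sketch.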
Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/ppc64/Makefile =================================================================== --- linux-work.orig/arch/ppc64/Makefile 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/Makefile 2005-02-10 10:41:42.000000000 +1100 @@ -15,17 +15,38 @@ KERNELLOAD := 0xc000000000000000 +# Set default 32 bits cross compilers for vdso and boot wrapper +CROSS32_COMPILE ?= + +CROSS32CC := $(CROSS32_COMPILE)gcc +CROSS32AS := $(CROSS32_COMPILE)as +CROSS32LD := $(CROSS32_COMPILE)ld +CROSS32OBJCOPY := $(CROSS32_COMPILE)objcopy + +# If we have a biarch compiler, use it for 32 bits cross compile if +# CROSS32_COMPILE wasn't explicitely defined, and add proper explicit +# target type to target compilers + HAS_BIARCH := $(call cc-option-yn, -m64) ifeq ($(HAS_BIARCH),y) +ifeq ($(CROSS32_COMPILE),) +CROSS32CC := $(CC) -m32 +CROSS32AS := $(AS) -a32 +CROSS32LD := $(LD) -m elf32ppc +CROSS32OBJCOPY := $(OBJCOPY) +endif AS := $(AS) -a64 LD := $(LD) -m elf64ppc CC := $(CC) -m64 endif +export CROSS32CC CROSS32AS CROSS32LD CROSS32OBJCOPY + new_nm := $(shell if $(NM) --help 2>&1 | grep -- '--synthetic' > /dev/null; then echo y; else echo n; fi) ifeq ($(new_nm),y) NM := $(NM) --synthetic + endif CHECKFLAGS += -m64 -D__powerpc__ Index: linux-work/arch/ppc64/kernel/asm-offsets.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/asm-offsets.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/asm-offsets.c 2005-02-02 13:28:01.000000000 +1100 @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -35,6 +36,8 @@ #include #include #include +#include +#include #define DEFINE(sym, val) \ asm volatile("\n->" #sym " %0 " #val : : "i" (val)) @@ -167,5 +170,24 @@ DEFINE(CPU_SPEC_FEATURES, offsetof(struct cpu_spec, cpu_features)); DEFINE(CPU_SPEC_SETUP, offsetof(struct cpu_spec, cpu_setup)); + /* systemcfg offsets for use by vdso */ + 
DEFINE(CFG_TB_ORIG_STAMP, offsetof(struct systemcfg, tb_orig_stamp)); + DEFINE(CFG_TB_TICKS_PER_SEC, offsetof(struct systemcfg, tb_ticks_per_sec)); + DEFINE(CFG_TB_TO_XS, offsetof(struct systemcfg, tb_to_xs)); + DEFINE(CFG_STAMP_XSEC, offsetof(struct systemcfg, stamp_xsec)); + DEFINE(CFG_TB_UPDATE_COUNT, offsetof(struct systemcfg, tb_update_count)); + DEFINE(CFG_TZ_MINUTEWEST, offsetof(struct systemcfg, tz_minuteswest)); + DEFINE(CFG_TZ_DSTTIME, offsetof(struct systemcfg, tz_dsttime)); + DEFINE(CFG_SYSCALL_MAP32, offsetof(struct systemcfg, syscall_map_32)); + DEFINE(CFG_SYSCALL_MAP64, offsetof(struct systemcfg, syscall_map_64)); + + /* timeval/timezone offsets for use by vdso */ + DEFINE(TVAL64_TV_SEC, offsetof(struct timeval, tv_sec)); + DEFINE(TVAL64_TV_USEC, offsetof(struct timeval, tv_usec)); + DEFINE(TVAL32_TV_SEC, offsetof(struct compat_timeval, tv_sec)); + DEFINE(TVAL32_TV_USEC, offsetof(struct compat_timeval, tv_usec)); + DEFINE(TZONE_TZ_MINWEST, offsetof(struct timezone, tz_minuteswest)); + DEFINE(TZONE_TZ_DSTTIME, offsetof(struct timezone, tz_dsttime)); + return 0; } Index: linux-work/arch/ppc64/kernel/Makefile =================================================================== --- linux-work.orig/arch/ppc64/kernel/Makefile 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/Makefile 2005-02-10 10:41:24.000000000 +1100 @@ -11,7 +11,8 @@ udbg.o binfmt_elf32.o sys_ppc32.o ioctl32.o \ ptrace32.o signal32.o rtc.o init_task.o \ lmb.o cputable.o cpu_setup_power4.o idle_power4.o \ - iommu.o sysfs.o + iommu.o sysfs.o vdso.o +obj-y += vdso32/ vdso64/ obj-$(CONFIG_PPC_OF) += of_device.o Index: linux-work/arch/ppc64/kernel/signal32.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/signal32.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/signal32.c 2005-02-02 13:28:01.000000000 +1100 @@ -31,6 +31,7 @@ #include #include #include +#include #define DEBUG_SIG 0 @@ -656,18 
+657,24 @@ /* Save user registers on the stack */ frame = &rt_sf->uc.uc_mcontext; - if (save_user_regs(regs, frame, __NR_rt_sigreturn)) - goto badframe; - if (put_user(regs->gpr[1], (unsigned long __user *)newsp)) goto badframe; + + if (vdso32_rt_sigtramp && current->thread.vdso_base) { + if (save_user_regs(regs, frame, 0)) + goto badframe; + regs->link = current->thread.vdso_base + vdso32_rt_sigtramp; + } else { + if (save_user_regs(regs, frame, __NR_rt_sigreturn)) + goto badframe; + regs->link = (unsigned long) frame->tramp; + } regs->gpr[1] = (unsigned long) newsp; regs->gpr[3] = sig; regs->gpr[4] = (unsigned long) &rt_sf->info; regs->gpr[5] = (unsigned long) &rt_sf->uc; regs->gpr[6] = (unsigned long) rt_sf; regs->nip = (unsigned long) ka->sa.sa_handler; - regs->link = (unsigned long) frame->tramp; regs->trap = 0; regs->result = 0; @@ -825,8 +832,15 @@ || __put_user(sig, &sc->signal)) goto badframe; - if (save_user_regs(regs, &frame->mctx, __NR_sigreturn)) - goto badframe; + if (vdso32_sigtramp && current->thread.vdso_base) { + if (save_user_regs(regs, &frame->mctx, 0)) + goto badframe; + regs->link = current->thread.vdso_base + vdso32_sigtramp; + } else { + if (save_user_regs(regs, &frame->mctx, __NR_sigreturn)) + goto badframe; + regs->link = (unsigned long) frame->mctx.tramp; + } if (put_user(regs->gpr[1], (unsigned long __user *)newsp)) goto badframe; @@ -834,7 +848,6 @@ regs->gpr[3] = sig; regs->gpr[4] = (unsigned long) sc; regs->nip = (unsigned long) ka->sa.sa_handler; - regs->link = (unsigned long) frame->mctx.tramp; regs->trap = 0; regs->result = 0; Index: linux-work/arch/ppc64/kernel/setup.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/setup.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/setup.c 2005-02-02 13:28:01.000000000 +1100 @@ -990,6 +990,34 @@ } /* + * Called from setup_arch to initialize the bitmap of available + * syscalls in the systemcfg page + */ +void 
__init setup_syscall_map(void) +{ + unsigned int i, count64 = 0, count32 = 0; + extern unsigned long *sys_call_table; + extern unsigned long *sys_call_table32; + extern unsigned long sys_ni_syscall; + + + for (i = 0; i < __NR_syscalls; i++) { + if (sys_call_table[i] == sys_ni_syscall) + continue; + count64++; + systemcfg->syscall_map_64[i >> 5] |= 0x80000000UL >> (i & 0x1f); + } + for (i = 0; i < __NR_syscalls; i++) { + if (sys_call_table32[i] == sys_ni_syscall) + continue; + count32++; + systemcfg->syscall_map_32[i >> 5] |= 0x80000000UL >> (i & 0x1f); + } + printk(KERN_INFO "Syscall map setup, %d 32 bits and %d 64 bits syscalls\n", + count32, count64); +} + +/* * Called into from start_kernel, after lock_kernel has been called. * Initializes bootmem, which is unsed to manage page allocation until * mem_init is called. @@ -1027,6 +1055,9 @@ /* set up the bootmem stuff with available memory */ do_init_bootmem(); + /* initialize the syscall map in systemcfg */ + setup_syscall_map(); + ppc_md.setup_arch(); /* Select the correct idle loop for the platform. */ Index: linux-work/arch/ppc64/kernel/signal.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/signal.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/signal.c 2005-02-02 13:28:01.000000000 +1100 @@ -34,6 +34,7 @@ #include #include #include +#include #define DEBUG_SIG 0 @@ -426,10 +427,14 @@ goto badframe; /* Set up to return from userspace. */ - err |= setup_trampoline(__NR_rt_sigreturn, &frame->tramp[0]); - if (err) - goto badframe; - + if (vdso64_rt_sigtramp && current->thread.vdso_base) { + regs->link = current->thread.vdso_base + vdso64_rt_sigtramp; + } else { + err |= setup_trampoline(__NR_rt_sigreturn, &frame->tramp[0]); + if (err) + goto badframe; + regs->link = (unsigned long) &frame->tramp[0]; + } funct_desc_ptr = (func_descr_t __user *) ka->sa.sa_handler; /* Allocate a dummy caller frame for the signal handler. 
*/ @@ -438,7 +443,6 @@ /* Set up "regs" so we "return" to the signal handler. */ err |= get_user(regs->nip, &funct_desc_ptr->entry); - regs->link = (unsigned long) &frame->tramp[0]; regs->gpr[1] = newsp; err |= get_user(regs->gpr[2], &funct_desc_ptr->toc); regs->gpr[3] = signr; Index: linux-work/arch/ppc64/kernel/smp.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/smp.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/smp.c 2005-02-02 13:28:01.000000000 +1100 @@ -383,7 +383,7 @@ * For now we leave it which means the time can be some * number of msecs off until someone does a settimeofday() */ - do_gtod.tb_orig_stamp = tb_last_stamp; + do_gtod.varp->tb_orig_stamp = tb_last_stamp; systemcfg->tb_orig_stamp = tb_last_stamp; #endif Index: linux-work/arch/ppc64/kernel/time.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/time.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/time.c 2005-02-02 13:28:01.000000000 +1100 @@ -86,8 +86,6 @@ unsigned long tb_ticks_per_jiffy; unsigned long tb_ticks_per_usec = 100; /* sane default */ unsigned long tb_ticks_per_sec; -unsigned long next_xtime_sync_tb; -unsigned long xtime_sync_interval; unsigned long tb_to_xs; unsigned tb_to_us; unsigned long processor_freq; @@ -158,8 +156,8 @@ * The conversion to microseconds at the end is done * without a divide (and in fact, without a multiply) */ - tb_ticks = tb_val - do_gtod.tb_orig_stamp; temp_varp = do_gtod.varp; + tb_ticks = tb_val - temp_varp->tb_orig_stamp; temp_tb_to_xs = temp_varp->tb_to_xs; temp_stamp_xsec = temp_varp->stamp_xsec; tb_xsec = mulhdu( tb_ticks, temp_tb_to_xs ); @@ -185,17 +183,55 @@ { struct timeval my_tv; - if (cur_tb > next_xtime_sync_tb) { - next_xtime_sync_tb = cur_tb + xtime_sync_interval; - __do_gettimeofday(&my_tv, cur_tb); - - if (xtime.tv_sec <= my_tv.tv_sec) { - xtime.tv_sec = my_tv.tv_sec; - 
xtime.tv_nsec = my_tv.tv_usec * 1000; - } + __do_gettimeofday(&my_tv, cur_tb); + + if (xtime.tv_sec <= my_tv.tv_sec) { + xtime.tv_sec = my_tv.tv_sec; + xtime.tv_nsec = my_tv.tv_usec * 1000; } } +/* + * When the timebase - tb_orig_stamp gets too big, we do a manipulation + * between tb_orig_stamp and stamp_xsec. The goal here is to keep the + * difference tb - tb_orig_stamp small enough to always fit inside a + * 32 bits number. This is a requirement of our fast 32 bits userland + * implementation in the vdso. If we "miss" a call to this function + * (interrupt latency, CPU locked in a spinlock, ...) and we end up + * with a too big difference, then the vdso will fallback to calling + * the syscall + */ +static __inline__ void timer_recalc_offset(unsigned long cur_tb) +{ + struct gettimeofday_vars * temp_varp; + unsigned temp_idx; + unsigned long offset, new_stamp_xsec, new_tb_orig_stamp; + + if (((cur_tb - do_gtod.varp->tb_orig_stamp) & 0x80000000u) == 0) + return; + + temp_idx = (do_gtod.var_idx == 0); + temp_varp = &do_gtod.vars[temp_idx]; + + new_tb_orig_stamp = cur_tb; + offset = new_tb_orig_stamp - do_gtod.varp->tb_orig_stamp; + new_stamp_xsec = do_gtod.varp->stamp_xsec + mulhdu(offset, do_gtod.varp->tb_to_xs); + + temp_varp->tb_to_xs = do_gtod.varp->tb_to_xs; + temp_varp->tb_orig_stamp = new_tb_orig_stamp; + temp_varp->stamp_xsec = new_stamp_xsec; + mb(); + do_gtod.varp = temp_varp; + do_gtod.var_idx = temp_idx; + + ++(systemcfg->tb_update_count); + wmb(); + systemcfg->tb_orig_stamp = new_tb_orig_stamp; + systemcfg->stamp_xsec = new_stamp_xsec; + wmb(); + ++(systemcfg->tb_update_count); +} + #ifdef CONFIG_SMP unsigned long profile_pc(struct pt_regs *regs) { @@ -311,6 +347,7 @@ if (cpu == boot_cpuid) { write_seqlock(&xtime_lock); tb_last_stamp = lpaca->next_jiffy_update_tb; + timer_recalc_offset(lpaca->next_jiffy_update_tb); do_timer(regs); timer_sync_xtime(lpaca->next_jiffy_update_tb); timer_check_rtc(); @@ -398,7 +435,9 @@ time_maxerror = NTP_PHASE_LIMIT; 
time_esterror = NTP_PHASE_LIMIT; - delta_xsec = mulhdu( (tb_last_stamp-do_gtod.tb_orig_stamp), do_gtod.varp->tb_to_xs ); + delta_xsec = mulhdu( (tb_last_stamp-do_gtod.varp->tb_orig_stamp), + do_gtod.varp->tb_to_xs ); + new_xsec = (new_nsec * XSEC_PER_SEC) / NSEC_PER_SEC; new_xsec += new_sec * XSEC_PER_SEC; if ( new_xsec > delta_xsec ) { @@ -411,7 +450,7 @@ * before 1970 ... eg. we booted ten days ago, and we are setting * the time to Jan 5, 1970 */ do_gtod.varp->stamp_xsec = new_xsec; - do_gtod.tb_orig_stamp = tb_last_stamp; + do_gtod.varp->tb_orig_stamp = tb_last_stamp; systemcfg->stamp_xsec = new_xsec; systemcfg->tb_orig_stamp = tb_last_stamp; } @@ -464,9 +503,9 @@ xtime.tv_sec = mktime(tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday, tm.tm_hour, tm.tm_min, tm.tm_sec); tb_last_stamp = get_tb(); - do_gtod.tb_orig_stamp = tb_last_stamp; do_gtod.varp = &do_gtod.vars[0]; do_gtod.var_idx = 0; + do_gtod.varp->tb_orig_stamp = tb_last_stamp; do_gtod.varp->stamp_xsec = xtime.tv_sec * XSEC_PER_SEC; do_gtod.tb_ticks_per_sec = tb_ticks_per_sec; do_gtod.varp->tb_to_xs = tb_to_xs; @@ -477,9 +516,6 @@ systemcfg->stamp_xsec = xtime.tv_sec * XSEC_PER_SEC; systemcfg->tb_to_xs = tb_to_xs; - xtime_sync_interval = tb_ticks_per_sec - (tb_ticks_per_sec/8); - next_xtime_sync_tb = tb_last_stamp + xtime_sync_interval; - time_freq = 0; xtime.tv_nsec = 0; @@ -584,12 +620,12 @@ stamp_xsec which is the time (in 1/2^20 second units) corresponding to tb_orig_stamp. 
This new value of stamp_xsec compensates for the change in frequency (implied by the new tb_to_xs) which guarantees that the current time remains the same */ - tb_ticks = get_tb() - do_gtod.tb_orig_stamp; + write_seqlock_irqsave( &xtime_lock, flags ); + tb_ticks = get_tb() - do_gtod.varp->tb_orig_stamp; div128_by_32( 1024*1024, 0, new_tb_ticks_per_sec, &divres ); new_tb_to_xs = divres.result_low; new_xsec = mulhdu( tb_ticks, new_tb_to_xs ); - write_seqlock_irqsave( &xtime_lock, flags ); old_xsec = mulhdu( tb_ticks, do_gtod.varp->tb_to_xs ); new_stamp_xsec = do_gtod.varp->stamp_xsec + old_xsec - new_xsec; @@ -597,16 +633,12 @@ values in do_gettimeofday. We alternate the copies and as long as a reasonable time elapses between changes, there will never be inconsistent values. ntpd has a minimum of one minute between updates */ - if (do_gtod.var_idx == 0) { - temp_varp = &do_gtod.vars[1]; - temp_idx = 1; - } - else { - temp_varp = &do_gtod.vars[0]; - temp_idx = 0; - } + temp_idx = (do_gtod.var_idx == 0); + temp_varp = &do_gtod.vars[temp_idx]; + temp_varp->tb_to_xs = new_tb_to_xs; temp_varp->stamp_xsec = new_stamp_xsec; + temp_varp->tb_orig_stamp = do_gtod.varp->tb_orig_stamp; mb(); do_gtod.varp = temp_varp; do_gtod.var_idx = temp_idx; Index: linux-work/arch/ppc64/kernel/vdso.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso.c 2005-02-02 13:28:01.000000000 +1100 @@ -0,0 +1,614 @@ +/* + * linux/arch/ppc64/kernel/vdso.c + * + * Copyright (C) 2004 Benjamin Herrenschmidt, IBM Corp. + * + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#undef DEBUG + +#ifdef DEBUG +#define DBG(fmt...) printk(fmt) +#else +#define DBG(fmt...) +#endif + + +/* + * The vDSOs themselves are here + */ +extern char vdso64_start, vdso64_end; +extern char vdso32_start, vdso32_end; + +static void *vdso64_kbase = &vdso64_start; +static void *vdso32_kbase = &vdso32_start; + +unsigned int vdso64_pages; +unsigned int vdso32_pages; + +/* Signal trampolines user addresses */ + +unsigned long vdso64_rt_sigtramp; +unsigned long vdso32_sigtramp; +unsigned long vdso32_rt_sigtramp; + +/* Format of the patch table */ +struct vdso_patch_def +{ + u32 pvr_mask, pvr_value; + const char *gen_name; + const char *fix_name; +}; + +/* Table of functions to patch based on the CPU type/revision + * + * TODO: Improve by adding whole lists for each entry + */ +static struct vdso_patch_def vdso_patches[] = { + { + 0xffff0000, 0x003a0000, /* POWER5 */ + "__kernel_sync_dicache", "__kernel_sync_dicache_p5" + }, + { + 0xffff0000, 0x003b0000, /* POWER5 */ + "__kernel_sync_dicache", "__kernel_sync_dicache_p5" + }, +}; + +/* + * Some infos carried around for each of them during parsing at + * boot time. 
+ */ +struct lib32_elfinfo +{ + Elf32_Ehdr *hdr; /* ptr to ELF */ + Elf32_Sym *dynsym; /* ptr to .dynsym section */ + unsigned long dynsymsize; /* size of .dynsym section */ + char *dynstr; /* ptr to .dynstr section */ + unsigned long text; /* offset of .text section in .so */ +}; + +struct lib64_elfinfo +{ + Elf64_Ehdr *hdr; + Elf64_Sym *dynsym; + unsigned long dynsymsize; + char *dynstr; + unsigned long text; +}; + + +#ifdef __DEBUG +static void dump_one_vdso_page(struct page *pg, struct page *upg) +{ + printk("kpg: %p (c:%d,f:%08lx)", __va(page_to_pfn(pg) << PAGE_SHIFT), + page_count(pg), + pg->flags); + if (upg/* && pg != upg*/) { + printk(" upg: %p (c:%d,f:%08lx)", __va(page_to_pfn(upg) << PAGE_SHIFT), + page_count(upg), + upg->flags); + } + printk("\n"); +} + +static void dump_vdso_pages(struct vm_area_struct * vma) +{ + int i; + + if (!vma || test_thread_flag(TIF_32BIT)) { + printk("vDSO32 @ %016lx:\n", (unsigned long)vdso32_kbase); + for (i=0; ivm_mm) ? + follow_page(vma->vm_mm, vma->vm_start + i*PAGE_SIZE, 0) + : NULL; + dump_one_vdso_page(pg, upg); + } + } + if (!vma || !test_thread_flag(TIF_32BIT)) { + printk("vDSO64 @ %016lx:\n", (unsigned long)vdso64_kbase); + for (i=0; ivm_mm) ? + follow_page(vma->vm_mm, vma->vm_start + i*PAGE_SIZE, 0) + : NULL; + dump_one_vdso_page(pg, upg); + } + } +} +#endif /* DEBUG */ + +/* + * Keep a dummy vma_close for now, it will prevent VMA merging. + */ +static void vdso_vma_close(struct vm_area_struct * vma) +{ +} + +/* + * Our nopage() function, maps in the actual vDSO kernel pages, they will + * be mapped read-only by do_no_page(), and eventually COW'ed, either + * right away for an initial write access, or by do_wp_page(). + */ +static struct page * vdso_vma_nopage(struct vm_area_struct * vma, + unsigned long address, int *type) +{ + unsigned long offset = address - vma->vm_start; + struct page *pg; + void *vbase = test_thread_flag(TIF_32BIT) ? 
vdso32_kbase : vdso64_kbase; + + DBG("vdso_vma_nopage(current: %s, address: %016lx, off: %lx)\n", + current->comm, address, offset); + + if (address < vma->vm_start || address > vma->vm_end) + return NOPAGE_SIGBUS; + + /* + * Last page is systemcfg, special handling here, no get_page() a + * this is a reserved page + */ + if ((vma->vm_end - address) <= PAGE_SIZE) + return virt_to_page(systemcfg); + + pg = virt_to_page(vbase + offset); + get_page(pg); + DBG(" ->page count: %d\n", page_count(pg)); + + return pg; +} + +static struct vm_operations_struct vdso_vmops = { + .close = vdso_vma_close, + .nopage = vdso_vma_nopage, +}; + +/* + * This is called from binfmt_elf, we create the special vma for the + * vDSO and insert it into the mm struct tree + */ +int arch_setup_additional_pages(struct linux_binprm *bprm, int executable_stack) +{ + struct mm_struct *mm = current->mm; + struct vm_area_struct *vma; + unsigned long vdso_pages; + unsigned long vdso_base; + + if (test_thread_flag(TIF_32BIT)) { + vdso_pages = vdso32_pages; + vdso_base = VDSO32_MBASE; + } else { + vdso_pages = vdso64_pages; + vdso_base = VDSO64_MBASE; + } + + /* vDSO has a problem and was disabled, just don't "enable" it for the + * process + */ + if (vdso_pages == 0) { + current->thread.vdso_base = 0; + return 0; + } + vma = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL); + if (vma == NULL) + return -ENOMEM; + if (security_vm_enough_memory(vdso_pages)) { + kmem_cache_free(vm_area_cachep, vma); + return -ENOMEM; + } + memset(vma, 0, sizeof(*vma)); + + /* + * pick a base address for the vDSO in process space. We have a default + * base of 1Mb on which we had a random offset up to 1Mb. 
+ * XXX: Add possibility for a program header to specify that location + */ + current->thread.vdso_base = vdso_base; + /* + ((unsigned long)vma & 0x000ff000); */ + + vma->vm_mm = mm; + vma->vm_start = current->thread.vdso_base; + + /* + * the VMA size is one page more than the vDSO since systemcfg + * is mapped in the last one + */ + vma->vm_end = vma->vm_start + ((vdso_pages + 1) << PAGE_SHIFT); + + /* + * our vma flags don't have VM_WRITE so by default, the process isn't allowed + * to write those pages. + * gdb can break that with ptrace interface, and thus trigger COW on those + * pages but it's then your responsibility to never do that on the "data" page + * of the vDSO or you'll stop getting kernel updates and your nice userland + * gettimeofday will be totally dead. It's fine to use that for setting + * breakpoints in the vDSO code pages though + */ + vma->vm_flags = VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; + vma->vm_flags |= mm->def_flags; + vma->vm_page_prot = protection_map[vma->vm_flags & 0x7]; + vma->vm_ops = &vdso_vmops; + + down_write(&mm->mmap_sem); + insert_vm_struct(mm, vma); + mm->total_vm += (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; + up_write(&mm->mmap_sem); + + return 0; +} + +static void * __init find_section32(Elf32_Ehdr *ehdr, const char *secname, + unsigned long *size) +{ + Elf32_Shdr *sechdrs; + unsigned int i; + char *secnames; + + /* Grab section headers and strings so we can tell who is who */ + sechdrs = (void *)ehdr + ehdr->e_shoff; + secnames = (void *)ehdr + sechdrs[ehdr->e_shstrndx].sh_offset; + + /* Find the section they want */ + for (i = 1; i < ehdr->e_shnum; i++) { + if (strcmp(secnames+sechdrs[i].sh_name, secname) == 0) { + if (size) + *size = sechdrs[i].sh_size; + return (void *)ehdr + sechdrs[i].sh_offset; + } + } + *size = 0; + return NULL; +} + +static void * __init find_section64(Elf64_Ehdr *ehdr, const char *secname, + unsigned long *size) +{ + Elf64_Shdr *sechdrs; + unsigned int i; + char *secnames; 
+ + /* Grab section headers and strings so we can tell who is who */ + sechdrs = (void *)ehdr + ehdr->e_shoff; + secnames = (void *)ehdr + sechdrs[ehdr->e_shstrndx].sh_offset; + + /* Find the section they want */ + for (i = 1; i < ehdr->e_shnum; i++) { + if (strcmp(secnames+sechdrs[i].sh_name, secname) == 0) { + if (size) + *size = sechdrs[i].sh_size; + return (void *)ehdr + sechdrs[i].sh_offset; + } + } + if (size) + *size = 0; + return NULL; +} + +static Elf32_Sym * __init find_symbol32(struct lib32_elfinfo *lib, const char *symname) +{ + unsigned int i; + char name[32], *c; + + for (i = 0; i < (lib->dynsymsize / sizeof(Elf32_Sym)); i++) { + if (lib->dynsym[i].st_name == 0) + continue; + strlcpy(name, lib->dynstr + lib->dynsym[i].st_name, 32); + c = strchr(name, '@'); + if (c) + *c = 0; + if (strcmp(symname, name) == 0) + return &lib->dynsym[i]; + } + return NULL; +} + +static Elf64_Sym * __init find_symbol64(struct lib64_elfinfo *lib, const char *symname) +{ + unsigned int i; + char name[32], *c; + + for (i = 0; i < (lib->dynsymsize / sizeof(Elf64_Sym)); i++) { + if (lib->dynsym[i].st_name == 0) + continue; + strlcpy(name, lib->dynstr + lib->dynsym[i].st_name, 32); + c = strchr(name, '@'); + if (c) + *c = 0; + if (strcmp(symname, name) == 0) + return &lib->dynsym[i]; + } + return NULL; +} + +/* Note that we assume the section is .text and the symbol is relative to + * the library base + */ +static unsigned long __init find_function32(struct lib32_elfinfo *lib, const char *symname) +{ + Elf32_Sym *sym = find_symbol32(lib, symname); + + if (sym == NULL) { + printk(KERN_WARNING "vDSO32: function %s not found !\n", symname); + return 0; + } + return sym->st_value - VDSO32_LBASE; +} + +/* Note that we assume the section is .text and the symbol is relative to + * the library base + */ +static unsigned long __init find_function64(struct lib64_elfinfo *lib, const char *symname) +{ + Elf64_Sym *sym = find_symbol64(lib, symname); + + if (sym == NULL) { + 
printk(KERN_WARNING "vDSO64: function %s not found !\n", symname); + return 0; + } +#ifdef VDS64_HAS_DESCRIPTORS + return *((u64 *)(vdso64_kbase + sym->st_value - VDSO64_LBASE)) - VDSO64_LBASE; +#else + return sym->st_value - VDSO64_LBASE; +#endif +} + + +static __init int vdso_do_find_sections(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64) +{ + void *sect; + + /* + * Locate symbol tables & text section + */ + + v32->dynsym = find_section32(v32->hdr, ".dynsym", &v32->dynsymsize); + v32->dynstr = find_section32(v32->hdr, ".dynstr", NULL); + if (v32->dynsym == NULL || v32->dynstr == NULL) { + printk(KERN_ERR "vDSO32: a required symbol section was not found\n"); + return -1; + } + sect = find_section32(v32->hdr, ".text", NULL); + if (sect == NULL) { + printk(KERN_ERR "vDSO32: the .text section was not found\n"); + return -1; + } + v32->text = sect - vdso32_kbase; + + v64->dynsym = find_section64(v64->hdr, ".dynsym", &v64->dynsymsize); + v64->dynstr = find_section64(v64->hdr, ".dynstr", NULL); + if (v64->dynsym == NULL || v64->dynstr == NULL) { + printk(KERN_ERR "vDSO64: a required symbol section was not found\n"); + return -1; + } + sect = find_section64(v64->hdr, ".text", NULL); + if (sect == NULL) { + printk(KERN_ERR "vDSO64: the .text section was not found\n"); + return -1; + } + v64->text = sect - vdso64_kbase; + + return 0; +} + +static __init void vdso_setup_trampolines(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64) +{ + /* + * Find signal trampolines + */ + + vdso64_rt_sigtramp = find_function64(v64, "__kernel_sigtramp_rt64"); + vdso32_sigtramp = find_function32(v32, "__kernel_sigtramp32"); + vdso32_rt_sigtramp = find_function32(v32, "__kernel_sigtramp_rt32"); +} + +static __init int vdso_fixup_datapage(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64) +{ + Elf32_Sym *sym32; + Elf64_Sym *sym64; + + sym32 = find_symbol32(v32, "__kernel_datapage_offset"); + if (sym32 == NULL) { + printk(KERN_ERR "vDSO32: Can't find symbol 
__kernel_datapage_offset !\n"); + return -1; + } + *((int *)(vdso32_kbase + (sym32->st_value - VDSO32_LBASE))) = + (vdso32_pages << PAGE_SHIFT) - (sym32->st_value - VDSO32_LBASE); + + sym64 = find_symbol64(v64, "__kernel_datapage_offset"); + if (sym64 == NULL) { + printk(KERN_ERR "vDSO64: Can't find symbol __kernel_datapage_offset !\n"); + return -1; + } + *((int *)(vdso64_kbase + sym64->st_value - VDSO64_LBASE)) = + (vdso64_pages << PAGE_SHIFT) - (sym64->st_value - VDSO64_LBASE); + + return 0; +} + +static int vdso_do_func_patch32(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64, + const char *orig, const char *fix) +{ + Elf32_Sym *sym32_gen, *sym32_fix; + + sym32_gen = find_symbol32(v32, orig); + if (sym32_gen == NULL) { + printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", orig); + return -1; + } + sym32_fix = find_symbol32(v32, fix); + if (sym32_fix == NULL) { + printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", fix); + return -1; + } + sym32_gen->st_value = sym32_fix->st_value; + sym32_gen->st_size = sym32_fix->st_size; + sym32_gen->st_info = sym32_fix->st_info; + sym32_gen->st_other = sym32_fix->st_other; + sym32_gen->st_shndx = sym32_fix->st_shndx; + + return 0; +} + +static int vdso_do_func_patch64(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64, + const char *orig, const char *fix) +{ + Elf64_Sym *sym64_gen, *sym64_fix; + + sym64_gen = find_symbol64(v64, orig); + if (sym64_gen == NULL) { + printk(KERN_ERR "vDSO64: Can't find symbol %s !\n", orig); + return -1; + } + sym64_fix = find_symbol64(v64, fix); + if (sym64_fix == NULL) { + printk(KERN_ERR "vDSO64: Can't find symbol %s !\n", fix); + return -1; + } + sym64_gen->st_value = sym64_fix->st_value; + sym64_gen->st_size = sym64_fix->st_size; + sym64_gen->st_info = sym64_fix->st_info; + sym64_gen->st_other = sym64_fix->st_other; + sym64_gen->st_shndx = sym64_fix->st_shndx; + + return 0; +} + +static __init int vdso_fixup_alt_funcs(struct lib32_elfinfo *v32, + struct lib64_elfinfo *v64) +{ + u32 
pvr; + int i; + + pvr = mfspr(SPRN_PVR); + for (i = 0; i < ARRAY_SIZE(vdso_patches); i++) { + struct vdso_patch_def *patch = &vdso_patches[i]; + int match = (pvr & patch->pvr_mask) == patch->pvr_value; + + DBG("patch %d (mask: %x, pvr: %x) : %s\n", + i, patch->pvr_mask, patch->pvr_value, match ? "match" : "skip"); + + if (!match) + continue; + + DBG("replacing %s with %s...\n", patch->gen_name, patch->fix_name); + + /* + * Patch the 32 bits and 64 bits symbols. Note that we do not patch + * the "." symbol on 64 bits. It would be easy to do, but doesn't + * seem to be necessary, patching the OPD symbol is enough. + */ + vdso_do_func_patch32(v32, v64, patch->gen_name, patch->fix_name); + vdso_do_func_patch64(v32, v64, patch->gen_name, patch->fix_name); + } + + return 0; +} + + +static __init int vdso_setup(void) +{ + struct lib32_elfinfo v32; + struct lib64_elfinfo v64; + + v32.hdr = vdso32_kbase; + v64.hdr = vdso64_kbase; + + if (vdso_do_find_sections(&v32, &v64)) + return -1; + + if (vdso_fixup_datapage(&v32, &v64)) + return -1; + + if (vdso_fixup_alt_funcs(&v32, &v64)) + return -1; + + vdso_setup_trampolines(&v32, &v64); + + return 0; +} + +void __init vdso_init(void) +{ + int i; + + vdso64_pages = (&vdso64_end - &vdso64_start) >> PAGE_SHIFT; + vdso32_pages = (&vdso32_end - &vdso32_start) >> PAGE_SHIFT; + + DBG("vdso64_kbase: %p, 0x%x pages, vdso32_kbase: %p, 0x%x pages\n", + vdso64_kbase, vdso64_pages, vdso32_kbase, vdso32_pages); + + /* + * Initialize the vDSO images in memory, that is do necessary + * fixups of vDSO symbols, locate trampolines, etc... + */ + if (vdso_setup()) { + printk(KERN_ERR "vDSO setup failure, not enabled !\n"); + /* XXX should free pages here ? 
*/ + vdso64_pages = vdso32_pages = 0; + return; + } + + /* Make sure pages are in the correct state */ + for (i = 0; i < vdso64_pages; i++) { + struct page *pg = virt_to_page(vdso64_kbase + i*PAGE_SIZE); + ClearPageReserved(pg); + get_page(pg); + } + for (i = 0; i < vdso32_pages; i++) { + struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE); + ClearPageReserved(pg); + get_page(pg); + } +} + +int in_gate_area_no_task(unsigned long addr) +{ + return 0; +} + +int in_gate_area(struct task_struct *task, unsigned long addr) +{ + return 0; +} + +struct vm_area_struct *get_gate_vma(struct task_struct *tsk) +{ + return NULL; +} + Index: linux-work/include/asm-ppc64/processor.h =================================================================== --- linux-work.orig/include/asm-ppc64/processor.h 2005-01-31 14:18:44.000000000 +1100 +++ linux-work/include/asm-ppc64/processor.h 2005-02-02 13:28:01.000000000 +1100 @@ -544,8 +544,8 @@ /* This decides where the kernel will search for a free chunk of vm * space during mmap's. */ -#define TASK_UNMAPPED_BASE_USER32 (PAGE_ALIGN(STACK_TOP_USER32 / 4)) -#define TASK_UNMAPPED_BASE_USER64 (PAGE_ALIGN(STACK_TOP_USER64 / 4)) +#define TASK_UNMAPPED_BASE_USER32 (PAGE_ALIGN(TASK_SIZE_USER32 / 4)) +#define TASK_UNMAPPED_BASE_USER64 (PAGE_ALIGN(TASK_SIZE_USER64 / 4)) #define TASK_UNMAPPED_BASE ((test_thread_flag(TIF_32BIT)||(ppcdebugset(PPCDBG_BINFMT_32ADDR))) ? 
\ TASK_UNMAPPED_BASE_USER32 : TASK_UNMAPPED_BASE_USER64 ) @@ -562,7 +562,8 @@ double fpr[32]; /* Complete floating point set */ unsigned long fpscr; /* Floating point status (plus pad) */ unsigned long fpexc_mode; /* Floating-point exception mode */ - unsigned long pad[3]; /* was saved_msr, saved_softe */ + unsigned long pad[2]; /* was saved_msr, saved_softe */ + unsigned long vdso_base; /* base of the vDSO library */ #ifdef CONFIG_ALTIVEC /* Complete AltiVec register set */ vector128 vr[32] __attribute((aligned(16))); Index: linux-work/include/asm-ppc64/systemcfg.h =================================================================== --- linux-work.orig/include/asm-ppc64/systemcfg.h 2005-02-02 13:27:52.000000000 +1100 +++ linux-work/include/asm-ppc64/systemcfg.h 2005-02-02 13:28:01.000000000 +1100 @@ -20,10 +20,14 @@ * Minor version changes are a hint. */ #define SYSTEMCFG_MAJOR 1 -#define SYSTEMCFG_MINOR 0 +#define SYSTEMCFG_MINOR 1 #ifndef __ASSEMBLY__ +#include + +#define SYSCALL_MAP_SIZE ((__NR_syscalls + 31) / 32) + struct systemcfg { __u8 eye_catcher[16]; /* Eyecatcher: SYSTEMCFG:PPC64 0x00 */ struct { /* Systemcfg version numbers */ @@ -47,6 +51,8 @@ __u32 dcache_line_size; /* L1 d-cache line size 0x64 */ __u32 icache_size; /* L1 i-cache size 0x68 */ __u32 icache_line_size; /* L1 i-cache line size 0x6C */ + __u32 syscall_map_64[SYSCALL_MAP_SIZE]; /* map of available syscalls 0x70 */ + __u32 syscall_map_32[SYSCALL_MAP_SIZE]; /* map of available syscalls */ }; #ifdef __KERNEL__ Index: linux-work/include/asm-ppc64/a.out.h =================================================================== --- linux-work.orig/include/asm-ppc64/a.out.h 2005-01-31 14:18:44.000000000 +1100 +++ linux-work/include/asm-ppc64/a.out.h 2005-02-02 13:28:01.000000000 +1100 @@ -30,14 +30,11 @@ #ifdef __KERNEL__ -#define STACK_TOP_USER64 (TASK_SIZE_USER64) +#define STACK_TOP_USER64 TASK_SIZE_USER64 +#define STACK_TOP_USER32 TASK_SIZE_USER32 -/* Give 32-bit user space a full 4G address space 
to live in. */ -#define STACK_TOP_USER32 (TASK_SIZE_USER32) - -#define STACK_TOP ((test_thread_flag(TIF_32BIT) || \ - (ppcdebugset(PPCDBG_BINFMT_32ADDR))) ? \ - STACK_TOP_USER32 : STACK_TOP_USER64) +#define STACK_TOP (test_thread_flag(TIF_32BIT) ? \ + STACK_TOP_USER32 : STACK_TOP_USER64) #endif /* __KERNEL__ */ Index: linux-work/include/asm-ppc64/elf.h =================================================================== --- linux-work.orig/include/asm-ppc64/elf.h 2005-01-31 14:18:44.000000000 +1100 +++ linux-work/include/asm-ppc64/elf.h 2005-02-02 13:28:01.000000000 +1100 @@ -238,10 +238,20 @@ /* A special ignored type value for PPC, for glibc compatibility. */ #define AT_IGNOREPPC 22 +/* The vDSO location. We have to use the same value as x86 for glibc's + * sake :-) + */ +#define AT_SYSINFO_EHDR 33 + extern int dcache_bsize; extern int icache_bsize; extern int ucache_bsize; +/* We do have an arch_setup_additional_pages for vDSO matters */ +#define ARCH_HAS_SETUP_ADDITIONAL_PAGES +struct linux_binprm; +extern int arch_setup_additional_pages(struct linux_binprm *bprm, int executable_stack); + /* * The requirements here are: * - keep the final alignment of sp (sp & 0xf) @@ -260,6 +270,8 @@ NEW_AUX_ENT(AT_DCACHEBSIZE, dcache_bsize); \ NEW_AUX_ENT(AT_ICACHEBSIZE, icache_bsize); \ NEW_AUX_ENT(AT_UCACHEBSIZE, ucache_bsize); \ + /* vDSO base */ \ + NEW_AUX_ENT(AT_SYSINFO_EHDR, current->thread.vdso_base); \ } while (0) /* PowerPC64 relocations defined by the ABIs */ Index: linux-work/include/asm-ppc64/time.h =================================================================== --- linux-work.orig/include/asm-ppc64/time.h 2005-01-31 14:18:44.000000000 +1100 +++ linux-work/include/asm-ppc64/time.h 2005-02-02 13:28:01.000000000 +1100 @@ -43,10 +43,10 @@ struct gettimeofday_vars { unsigned long tb_to_xs; unsigned long stamp_xsec; + unsigned long tb_orig_stamp; }; struct gettimeofday_struct { - unsigned long tb_orig_stamp; unsigned long tb_ticks_per_sec; struct gettimeofday_vars 
vars[2]; struct gettimeofday_vars * volatile varp; Index: linux-work/fs/binfmt_elf.c =================================================================== --- linux-work.orig/fs/binfmt_elf.c 2005-01-31 14:18:24.000000000 +1100 +++ linux-work/fs/binfmt_elf.c 2005-02-02 13:28:01.000000000 +1100 @@ -772,6 +772,14 @@ goto out_free_dentry; } +#ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES + retval = arch_setup_additional_pages(bprm, executable_stack); + if (retval < 0) { + send_sig(SIGKILL, current, 0); + goto out_free_dentry; + } +#endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */ + current->mm->start_stack = bprm->p; /* Now we do a little grungy work by mmaping the ELF image into Index: linux-work/include/asm-ppc64/page.h =================================================================== --- linux-work.orig/include/asm-ppc64/page.h 2005-01-31 14:18:44.000000000 +1100 +++ linux-work/include/asm-ppc64/page.h 2005-02-02 13:28:01.000000000 +1100 @@ -185,6 +185,9 @@ extern u64 ppc64_pft_size; /* Log 2 of page table size */ +/* We do define AT_SYSINFO_EHDR but don't use the gate mechanism */ +#define __HAVE_ARCH_GATE_AREA 1 + #endif /* __ASSEMBLY__ */ #ifdef MODULE Index: linux-work/include/asm-ppc64/vdso.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/include/asm-ppc64/vdso.h 2005-02-02 13:28:01.000000000 +1100 @@ -0,0 +1,83 @@ +#ifndef __PPC64_VDSO_H__ +#define __PPC64_VDSO_H__ + +#ifdef __KERNEL__ + +/* Default link addresses for the vDSOs */ +#define VDSO32_LBASE 0 +#define VDSO64_LBASE 0 + +/* Default map addresses */ +#define VDSO32_MBASE 0x100000 +#define VDSO64_MBASE 0x100000 + +#define VDSO_VERSION_STRING LINUX_2.6.11 + +/* Define if 64 bits VDSO has procedure descriptors */ +#undef VDS64_HAS_DESCRIPTORS + +#ifndef __ASSEMBLY__ + +extern unsigned int vdso64_pages; +extern unsigned int vdso32_pages; + +/* Offsets relative to thread->vdso_base */ +extern unsigned long vdso64_rt_sigtramp; +extern 
unsigned long vdso32_sigtramp; +extern unsigned long vdso32_rt_sigtramp; + +extern void vdso_init(void); + +#else /* __ASSEMBLY__ */ + +#ifdef __VDSO64__ +#ifdef VDS64_HAS_DESCRIPTORS +#define V_FUNCTION_BEGIN(name) \ + .globl name; \ + .section ".opd","a"; \ + .align 3; \ + name: \ + .quad .name,.TOC.@tocbase,0; \ + .previous; \ + .globl .name; \ + .type .name,@function; \ + .name: \ + +#define V_FUNCTION_END(name) \ + .size .name,.-.name; + +#define V_LOCAL_FUNC(name) (.name) + +#else /* VDS64_HAS_DESCRIPTORS */ + +#define V_FUNCTION_BEGIN(name) \ + .globl name; \ + name: \ + +#define V_FUNCTION_END(name) \ + .size name,.-name; + +#define V_LOCAL_FUNC(name) (name) + +#endif /* VDS64_HAS_DESCRIPTORS */ +#endif /* __VDSO64__ */ + +#ifdef __VDSO32__ + +#define V_FUNCTION_BEGIN(name) \ + .globl name; \ + .type name,@function; \ + name: \ + +#define V_FUNCTION_END(name) \ + .size name,.-name; + +#define V_LOCAL_FUNC(name) (name) + +#endif /* __VDSO32__ */ + +#endif /* __ASSEMBLY__ */ + +#endif /* __KERNEL__ */ + +#endif /* __PPC64_VDSO_H__ */ Index: linux-work/arch/ppc64/mm/init.c =================================================================== --- linux-work.orig/arch/ppc64/mm/init.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/mm/init.c 2005-02-02 13:28:01.000000000 +1100 @@ -62,6 +62,7 @@ #include #include #include +#include int mem_init_done; unsigned long ioremap_bot = IMALLOC_BASE; @@ -743,6 +744,8 @@ #ifdef CONFIG_PPC_ISERIES iommu_vio_init(); #endif + /* Initialize the vDSO */ + vdso_init(); } /* Index: linux-work/arch/ppc64/kernel/vdso32/gettimeofday.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/gettimeofday.S 2005-02-02 13:28:01.000000000 +1100 @@ -0,0 +1,139 @@ +/* + * Userland implementation of gettimeofday() for 32 bits processes in a + * ppc64 kernel for use in the vDSO + * + * Copyright (C) 2004 Benjamin 
Herrenschmuidt (benh at kernel.crashing.org), IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include +#include +#include +#include + + .text +/* + * Exact prototype of gettimeofday + * + * int __kernel_gettimeofday(struct timeval *tv, struct timezone *tz); + * + */ +V_FUNCTION_BEGIN(__kernel_gettimeofday) + .cfi_startproc + mflr r12 + .cfi_register lr,r12 + + mr r10,r3 /* r10 saves tv */ + mr r11,r4 /* r11 saves tz */ + bl __get_datapage@local /* get data page */ + mr r9, r3 /* datapage ptr in r9 */ + bl __do_get_xsec@local /* get xsec from tb & kernel */ + bne- 2f /* out of line -> do syscall */ + + /* seconds are xsec >> 20 */ + rlwinm r5,r4,12,20,31 + rlwimi r5,r3,12,0,19 + stw r5,TVAL32_TV_SEC(r10) + + /* get remaining xsec and convert to usec. we scale + * up remaining xsec by 12 bits and get the top 32 bits + * of the multiplication + */ + rlwinm r5,r4,12,0,19 + lis r6,1000000@h + ori r6,r6,1000000@l + mulhwu r5,r5,r6 + stw r5,TVAL32_TV_USEC(r10) + + cmpli cr0,r11,0 /* check if tz is NULL */ + beq 1f + lwz r4,CFG_TZ_MINUTEWEST(r9)/* fill tz */ + lwz r5,CFG_TZ_DSTTIME(r9) + stw r4,TZONE_TZ_MINWEST(r11) + stw r5,TZONE_TZ_DSTTIME(r11) + +1: mtlr r12 + blr + +2: mr r3,r10 + mr r4,r11 + li r0,__NR_gettimeofday + sc + b 1b + .cfi_endproc +V_FUNCTION_END(__kernel_gettimeofday) + +/* + * This is the core of gettimeofday(), it returns the xsec + * value in r3 & r4 and expects the datapage ptr (non clobbered) + * in r9. clobbers r0,r4,r5,r6,r7,r8 +*/ +__do_get_xsec: + .cfi_startproc + /* Check for update count & load values. We use the low + * order 32 bits of the update count + */ +1: lwz r8,(CFG_TB_UPDATE_COUNT+4)(r9) + andi. r0,r8,1 /* pending update ? 
loop */ + bne- 1b + xor r0,r8,r8 /* create dependency */ + add r9,r9,r0 + + /* Load orig stamp (offset to TB) */ + lwz r5,CFG_TB_ORIG_STAMP(r9) + lwz r6,(CFG_TB_ORIG_STAMP+4)(r9) + + /* Get a stable TB value */ +2: mftbu r3 + mftbl r4 + mftbu r0 + cmpl cr0,r3,r0 + bne- 2b + + /* Subtract tb orig stamp. If the high part is non-zero, we jump to the + * slow path which calls the syscall. If it's ok, then we have our 32 bits + * tb_ticks value in r7 + */ + subfc r7,r6,r4 + subfe. r0,r5,r3 + bne- 3f + + /* Load scale factor & do multiplication */ + lwz r5,CFG_TB_TO_XS(r9) /* load values */ + lwz r6,(CFG_TB_TO_XS+4)(r9) + mulhwu r4,r7,r5 + mulhwu r6,r7,r6 + mullw r0,r7,r5 + addc r6,r6,r0 + + /* At this point, we have the scaled xsec value in r4 + XER:CA + * we load & add the stamp since epoch + */ + lwz r5,CFG_STAMP_XSEC(r9) + lwz r6,(CFG_STAMP_XSEC+4)(r9) + adde r4,r4,r6 + addze r3,r5 + + /* We now have our result in r3,r4. We create a fake dependency + * on that result and re-check the counter + */ + xor r0,r4,r4 + add r9,r9,r0 + lwz r0,(CFG_TB_UPDATE_COUNT+4)(r9) + cmpl cr0,r8,r0 /* check if updated */ + bne- 1b + + /* Warning ! The caller expects CR:EQ to be set to indicate a + * successful calculation (so it won't fall back to the syscall + * method). We have overridden that CR bit in the counter check, + * but fortunately, the loop exit condition _is_ CR:EQ set, so + * we can exit safely here. If you change this code, be careful + * of that side effect. + */ +3: blr + .cfi_endproc Index: linux-work/arch/ppc64/kernel/vdso32/sigtramp.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/sigtramp.S 2005-02-02 13:28:01.000000000 +1100 @@ -0,0 +1,300 @@ +/* + * Signal trampolines for 32 bits processes in a ppc64 kernel for + * use in the vDSO + * + * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org), IBM Corp. 
+ * Copyright (C) 2004 Alan Modra (amodra at au.ibm.com), IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include +#include +#include + + .text + +/* The nop here is a hack. The dwarf2 unwind routines subtract 1 from + the return address to get an address in the middle of the presumed + call instruction. Since we don't have a call here, we artificially + extend the range covered by the unwind info by adding a nop before + the real start. */ + nop +V_FUNCTION_BEGIN(__kernel_sigtramp32) +.Lsig_start = . - 4 + li r0,__NR_sigreturn + sc +.Lsig_end: +V_FUNCTION_END(__kernel_sigtramp32) + +.Lsigrt_start: + nop +V_FUNCTION_BEGIN(__kernel_sigtramp_rt32) + li r0,__NR_rt_sigreturn + sc +.Lsigrt_end: +V_FUNCTION_END(__kernel_sigtramp_rt32) + + .section .eh_frame,"a",@progbits + +/* Register r1 can be found at offset 4 of a pt_regs structure. + A pointer to the pt_regs is stored in memory at the old sp plus PTREGS. */ +#define cfa_save \ + .byte 0x0f; /* DW_CFA_def_cfa_expression */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x23; .uleb128 RSIZE; /* DW_OP_plus_uconst */ \ + .byte 0x06; /* DW_OP_deref */ \ +9: + +/* Register REGNO can be found at offset OFS of a pt_regs structure. + A pointer to the pt_regs is stored in memory at the old sp plus PTREGS. 
*/ +#define rsave(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .ifne ofs; \ + .byte 0x23; .uleb128 ofs; /* DW_OP_plus_uconst */ \ + .endif; \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset REGNO*16 + of the VMX reg struct. The VMX reg struct is at offset VREGS of + the pt_regs struct. This macro is for REGNO == 0, and contains + 'subroutines' that the other macros jump to. */ +#define vsave_msr0(regno) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x30 + regno; /* DW_OP_lit0 */ \ +2: \ + .byte 0x40; /* DW_OP_lit16 */ \ + .byte 0x1e; /* DW_OP_mul */ \ +3: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x12; /* DW_OP_dup */ \ + .byte 0x23; /* DW_OP_plus_uconst */ \ + .uleb128 33*RSIZE; /* msr offset */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x0c; .long 1 << 25; /* DW_OP_const4u */ \ + .byte 0x1a; /* DW_OP_and */ \ + .byte 0x12; /* DW_OP_dup, ret 0 if bra taken */ \ + .byte 0x30; /* DW_OP_lit0 */ \ + .byte 0x29; /* DW_OP_eq */ \ + .byte 0x28; .short 0x7fff; /* DW_OP_bra to end */ \ + .byte 0x13; /* DW_OP_drop, pop the 0 */ \ + .byte 0x23; .uleb128 VREGS; /* DW_OP_plus_uconst */ \ + .byte 0x22; /* DW_OP_plus */ \ + .byte 0x2f; .short 0x7fff; /* DW_OP_skip to end */ \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset REGNO*16 + of the VMX reg struct. REGNO is 1 thru 31. */ +#define vsave_msr1(regno) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x30 + regno; /* DW_OP_lit n */ \ + .byte 0x2f; .short 2b - 9f; /* DW_OP_skip */ \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset OFS of + the VMX save block. 
*/ +#define vsave_msr2(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x0a; .short ofs; /* DW_OP_const2u */ \ + .byte 0x2f; .short 3b - 9f; /* DW_OP_skip */ \ +9: + +/* VMX register REGNO is at offset OFS of the VMX save area. */ +#define vsave(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x23; .uleb128 VREGS; /* DW_OP_plus_uconst */ \ + .byte 0x23; .uleb128 ofs; /* DW_OP_plus_uconst */ \ +9: + +/* This is where the pt_regs pointer can be found on the stack. */ +#define PTREGS 64+28 + +/* Size of regs. */ +#define RSIZE 4 + +/* This is the offset of the VMX regs. */ +#define VREGS 48*RSIZE+34*8 + +/* Describe where general purpose regs are saved. */ +#define EH_FRAME_GEN \ + cfa_save; \ + rsave ( 0, 0*RSIZE); \ + rsave ( 2, 2*RSIZE); \ + rsave ( 3, 3*RSIZE); \ + rsave ( 4, 4*RSIZE); \ + rsave ( 5, 5*RSIZE); \ + rsave ( 6, 6*RSIZE); \ + rsave ( 7, 7*RSIZE); \ + rsave ( 8, 8*RSIZE); \ + rsave ( 9, 9*RSIZE); \ + rsave (10, 10*RSIZE); \ + rsave (11, 11*RSIZE); \ + rsave (12, 12*RSIZE); \ + rsave (13, 13*RSIZE); \ + rsave (14, 14*RSIZE); \ + rsave (15, 15*RSIZE); \ + rsave (16, 16*RSIZE); \ + rsave (17, 17*RSIZE); \ + rsave (18, 18*RSIZE); \ + rsave (19, 19*RSIZE); \ + rsave (20, 20*RSIZE); \ + rsave (21, 21*RSIZE); \ + rsave (22, 22*RSIZE); \ + rsave (23, 23*RSIZE); \ + rsave (24, 24*RSIZE); \ + rsave (25, 25*RSIZE); \ + rsave (26, 26*RSIZE); \ + rsave (27, 27*RSIZE); \ + rsave (28, 28*RSIZE); \ + rsave (29, 29*RSIZE); \ + rsave (30, 30*RSIZE); \ + rsave (31, 31*RSIZE); \ + rsave (67, 32*RSIZE); /* ap, used as temp for nip */ \ + rsave (65, 36*RSIZE); /* lr */ \ + rsave (70, 38*RSIZE) /* cr */ + +/* Describe where the FP regs are saved. 
*/ +#define EH_FRAME_FP \ + rsave (32, 48*RSIZE + 0*8); \ + rsave (33, 48*RSIZE + 1*8); \ + rsave (34, 48*RSIZE + 2*8); \ + rsave (35, 48*RSIZE + 3*8); \ + rsave (36, 48*RSIZE + 4*8); \ + rsave (37, 48*RSIZE + 5*8); \ + rsave (38, 48*RSIZE + 6*8); \ + rsave (39, 48*RSIZE + 7*8); \ + rsave (40, 48*RSIZE + 8*8); \ + rsave (41, 48*RSIZE + 9*8); \ + rsave (42, 48*RSIZE + 10*8); \ + rsave (43, 48*RSIZE + 11*8); \ + rsave (44, 48*RSIZE + 12*8); \ + rsave (45, 48*RSIZE + 13*8); \ + rsave (46, 48*RSIZE + 14*8); \ + rsave (47, 48*RSIZE + 15*8); \ + rsave (48, 48*RSIZE + 16*8); \ + rsave (49, 48*RSIZE + 17*8); \ + rsave (50, 48*RSIZE + 18*8); \ + rsave (51, 48*RSIZE + 19*8); \ + rsave (52, 48*RSIZE + 20*8); \ + rsave (53, 48*RSIZE + 21*8); \ + rsave (54, 48*RSIZE + 22*8); \ + rsave (55, 48*RSIZE + 23*8); \ + rsave (56, 48*RSIZE + 24*8); \ + rsave (57, 48*RSIZE + 25*8); \ + rsave (58, 48*RSIZE + 26*8); \ + rsave (59, 48*RSIZE + 27*8); \ + rsave (60, 48*RSIZE + 28*8); \ + rsave (61, 48*RSIZE + 29*8); \ + rsave (62, 48*RSIZE + 30*8); \ + rsave (63, 48*RSIZE + 31*8) + +/* Describe where the VMX regs are saved. 
*/ +#ifdef CONFIG_ALTIVEC +#define EH_FRAME_VMX \ + vsave_msr0 ( 0); \ + vsave_msr1 ( 1); \ + vsave_msr1 ( 2); \ + vsave_msr1 ( 3); \ + vsave_msr1 ( 4); \ + vsave_msr1 ( 5); \ + vsave_msr1 ( 6); \ + vsave_msr1 ( 7); \ + vsave_msr1 ( 8); \ + vsave_msr1 ( 9); \ + vsave_msr1 (10); \ + vsave_msr1 (11); \ + vsave_msr1 (12); \ + vsave_msr1 (13); \ + vsave_msr1 (14); \ + vsave_msr1 (15); \ + vsave_msr1 (16); \ + vsave_msr1 (17); \ + vsave_msr1 (18); \ + vsave_msr1 (19); \ + vsave_msr1 (20); \ + vsave_msr1 (21); \ + vsave_msr1 (22); \ + vsave_msr1 (23); \ + vsave_msr1 (24); \ + vsave_msr1 (25); \ + vsave_msr1 (26); \ + vsave_msr1 (27); \ + vsave_msr1 (28); \ + vsave_msr1 (29); \ + vsave_msr1 (30); \ + vsave_msr1 (31); \ + vsave_msr2 (33, 32*16+12); \ + vsave (32, 32*16) +#else +#define EH_FRAME_VMX +#endif + +.Lcie: + .long .Lcie_end - .Lcie_start +.Lcie_start: + .long 0 /* CIE ID */ + .byte 1 /* Version number */ + .string "zR" /* NUL-terminated augmentation string */ + .uleb128 4 /* Code alignment factor */ + .sleb128 -4 /* Data alignment factor */ + .byte 67 /* Return address register column, ap */ + .uleb128 1 /* Augmentation value length */ + .byte 0x1b /* DW_EH_PE_pcrel | DW_EH_PE_sdata4. */ + .byte 0x0c,1,0 /* DW_CFA_def_cfa: r1 ofs 0 */ + .balign 4 +.Lcie_end: + + .long .Lfde0_end - .Lfde0_start +.Lfde0_start: + .long .Lfde0_start - .Lcie /* CIE pointer. */ + .long .Lsig_start - . /* PC start, length */ + .long .Lsig_end - .Lsig_start + .uleb128 0 /* Augmentation */ + EH_FRAME_GEN + EH_FRAME_FP + EH_FRAME_VMX + .balign 4 +.Lfde0_end: + +/* We have a different stack layout for rt_sigreturn. */ +#undef PTREGS +#define PTREGS 64+16+128+20+28 + + .long .Lfde1_end - .Lfde1_start +.Lfde1_start: + .long .Lfde1_start - .Lcie /* CIE pointer. */ + .long .Lsigrt_start - . 
/* PC start, length */ + .long .Lsigrt_end - .Lsigrt_start + .uleb128 0 /* Augmentation */ + EH_FRAME_GEN + EH_FRAME_FP + EH_FRAME_VMX + .balign 4 +.Lfde1_end: Index: linux-work/arch/ppc64/kernel/vdso32/vdso32_wrapper.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/vdso32_wrapper.S 2005-02-10 11:07:53.000000000 +1100 @@ -0,0 +1,13 @@ +#include +#include + + .section ".data.page_aligned" + + .globl vdso32_start, vdso32_end + .balign PAGE_SIZE +vdso32_start: + .incbin "arch/ppc64/kernel/vdso32/vdso32.so" + .balign PAGE_SIZE +vdso32_end: + + .previous Index: linux-work/arch/ppc64/kernel/vdso64/vdso64.lds.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/vdso64.lds.S 2005-02-02 13:28:01.000000000 +1100 @@ -0,0 +1,110 @@ +/* + * This is the infamous ld script for the 64 bits vdso + * library + */ +#include + +OUTPUT_FORMAT("elf64-powerpc", "elf64-powerpc", "elf64-powerpc") +OUTPUT_ARCH(powerpc:common64) +ENTRY(_start) + +SECTIONS +{ + . = VDSO64_LBASE + SIZEOF_HEADERS; + .hash : { *(.hash) } :text + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + + . 
= ALIGN (16); + .text : + { + *(.text .stub .text.* .gnu.linkonce.t.*) + *(.sfpr .glink) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + + /* Other stuff is appended to the text segment: */ + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) } :text :eh_frame_hdr + .eh_frame : { KEEP (*(.eh_frame)) } :text + .gcc_except_table : { *(.gcc_except_table) } + + .opd ALIGN(8) : { KEEP (*(.opd)) } + .got ALIGN(8) : { *(.got .toc) } + .rela.dyn ALIGN(8) : { *(.rela.dyn) } + + .dynamic : { *(.dynamic) } :text :dynamic + + _end = .; + PROVIDE (end = .); + + /* Stabs debugging sections are here too + */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + + /DISCARD/ : { *(.note.GNU-stack) } + /DISCARD/ : { *(.branch_lt) } + /DISCARD/ : { *(.data .data.* .gnu.linkonce.d.*) } + /DISCARD/ : { *(.bss .sbss .dynbss .dynsbss) } +} + +PHDRS +{ + text PT_LOAD FILEHDR PHDRS FLAGS(5); /* PF_R|PF_X */ + dynamic PT_DYNAMIC FLAGS(4); /* PF_R */ + eh_frame_hdr 0x6474e550; /* PT_GNU_EH_FRAME, but ld doesn't match the name */ +} + +/* + * This controls what symbols we export from the DSO. 
+ */ +VERSION +{ + VDSO_VERSION_STRING { + global: + __kernel_datapage_offset; /* Has to be there for the kernel to find it */ + __kernel_get_syscall_map; + __kernel_gettimeofday; + __kernel_sync_dicache; + __kernel_sync_dicache_p5; + __kernel_sigtramp_rt64; + local: *; + }; +} Index: linux-work/arch/ppc64/kernel/vdso64/vdso64_wrapper.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/vdso64_wrapper.S 2005-02-10 11:07:55.000000000 +1100 @@ -0,0 +1,13 @@ +#include +#include + + .section ".data.page_aligned" + + .globl vdso64_start, vdso64_end + .balign PAGE_SIZE +vdso64_start: + .incbin "arch/ppc64/kernel/vdso64/vdso64.so" + .balign PAGE_SIZE +vdso64_end: + + .previous Index: linux-work/arch/ppc64/kernel/vdso32/datapage.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/datapage.S 2005-02-02 13:28:01.000000000 +1100 @@ -0,0 +1,68 @@ +/* + * Access to the shared data page by the vDSO & syscall map + * + * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org), IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#include +#include +#include +#include +#include +#include + + .text +V_FUNCTION_BEGIN(__get_datapage) + .cfi_startproc + /* We don't want that exposed or overridable as we want other objects + * to be able to bl directly to here + */ + .protected __get_datapage + .hidden __get_datapage + + mflr r0 + .cfi_register lr,r0 + + bcl 20,31,1f + .global __kernel_datapage_offset; +__kernel_datapage_offset: + .long 0 +1: + mflr r3 + mtlr r0 + lwz r0,0(r3) + add r3,r0,r3 + blr + .cfi_endproc +V_FUNCTION_END(__get_datapage) + +/* + * void *__kernel_get_syscall_map(unsigned int *syscall_count) ; + * + * returns a pointer to the syscall map. the map is agnostic to the + * size of "long", unlike kernel bitops, it stores bits from top to + * bottom so that memory actually contains a linear bitmap + * check for syscall N by testing bit (0x80000000 >> (N & 0x1f)) of + * 32 bits int at N >> 5. + */ +V_FUNCTION_BEGIN(__kernel_get_syscall_map) + .cfi_startproc + mflr r12 + .cfi_register lr,r12 + + mr r4,r3 + bl __get_datapage@local + mtlr r12 + addi r3,r3,CFG_SYSCALL_MAP32 + cmpli cr0,r4,0 + beqlr + li r0,__NR_syscalls + stw r0,0(r4) + blr + .cfi_endproc +V_FUNCTION_END(__kernel_get_syscall_map) Index: linux-work/arch/ppc64/kernel/vdso32/Makefile =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/Makefile 2005-02-02 13:28:01.000000000 +1100 @@ -0,0 +1,36 @@ + +# List of files in the vdso, has to be asm only for now + +obj-vdso32 = sigtramp.o gettimeofday.o datapage.o cacheflush.o + +# Build rules + +targets := $(obj-vdso32) vdso32.so +obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32)) + + +EXTRA_CFLAGS := -shared -s -fno-common -fno-builtin +EXTRA_CFLAGS += -nostdlib -Wl,-soname=linux-vdso32.so.1 +EXTRA_AFLAGS := -D__VDSO32__ -s + +obj-y += vdso32_wrapper.o +extra-y += vdso32.lds +CPPFLAGS_vdso32.lds += -P -C -U$(ARCH) + +# Force dependency (incbin is bad) 
+$(obj)/vdso32_wrapper.o : $(obj)/vdso32.so + +# link rule for the .so file, .lds has to be first +$(obj)/vdso32.so: $(src)/vdso32.lds $(obj-vdso32) + $(call if_changed,vdso32ld) + +# assembly rules for the .S files +$(obj-vdso32): %.o: %.S + $(call if_changed_dep,vdso32as) + +# actual build commands +quiet_cmd_vdso32ld = VDSO32L $@ + cmd_vdso32ld = $(CROSS32CC) $(c_flags) -Wl,-T $^ -o $@ +quiet_cmd_vdso32as = VDSO32A $@ + cmd_vdso32as = $(CROSS32CC) $(a_flags) -c -o $@ $< + Index: linux-work/arch/ppc64/kernel/vdso64/gettimeofday.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/gettimeofday.S 2005-02-02 13:28:01.000000000 +1100 @@ -0,0 +1,91 @@ +/* + * Userland implementation of gettimeofday() for 64 bits processes in a + * ppc64 kernel for use in the vDSO + * + * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org), + * IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ +#include +#include +#include +#include +#include + + .text +/* + * Exact prototype of gettimeofday + * + * int __kernel_gettimeofday(struct timeval *tv, struct timezone *tz); + * + */ +V_FUNCTION_BEGIN(__kernel_gettimeofday) + .cfi_startproc + mflr r12 + .cfi_register lr,r12 + + mr r11,r3 /* r11 holds tv */ + mr r10,r4 /* r10 holds tz */ + bl V_LOCAL_FUNC(__get_datapage) /* get data page */ + bl V_LOCAL_FUNC(__do_get_xsec) /* get xsec from tb & kernel */ + lis r7,15 /* r7 = 1000000 = USEC_PER_SEC */ + ori r7,r7,16960 + rldicl r5,r4,44,20 /* r5 = sec = xsec / XSEC_PER_SEC */ + rldicr r6,r5,20,43 /* r6 = sec * XSEC_PER_SEC */ + std r5,TVAL64_TV_SEC(r11) /* store sec in tv */ + subf r0,r6,r4 /* r0 = xsec = (xsec - r6) */ + mulld r0,r0,r7 /* usec = (xsec * USEC_PER_SEC) / XSEC_PER_SEC */ + rldicl r0,r0,44,20 + cmpldi cr0,r10,0 /* check if tz is NULL */ + std r0,TVAL64_TV_USEC(r11) /* store usec in tv */ + beq 1f + lwz r4,CFG_TZ_MINUTEWEST(r3)/* fill tz */ + lwz r5,CFG_TZ_DSTTIME(r3) + stw r4,TZONE_TZ_MINWEST(r10) + stw r5,TZONE_TZ_DSTTIME(r10) +1: mtlr r12 + li r3,0 /* always success */ + blr + .cfi_endproc +V_FUNCTION_END(__kernel_gettimeofday) + + +/* + * This is the core of gettimeofday(), it returns the xsec + * value in r4 and expects the datapage ptr (non clobbered) + * in r3. clobbers r0,r4,r5,r6,r7,r8 +*/ +V_FUNCTION_BEGIN(__do_get_xsec) + .cfi_startproc + /* check for update count & load values */ +1: ld r7,CFG_TB_UPDATE_COUNT(r3) + andi. r0,r7,1 /* pending update ? 
loop */ + bne- 1b + xor r0,r7,r7 /* create dependency */ + add r3,r3,r0 + + /* Get TB & offset it */ + mftb r8 + ld r9,CFG_TB_ORIG_STAMP(r3) + subf r8,r9,r8 + + /* Scale result */ + ld r5,CFG_TB_TO_XS(r3) + mulhdu r8,r8,r5 + + /* Add stamp since epoch */ + ld r6,CFG_STAMP_XSEC(r3) + add r4,r6,r8 + + xor r0,r4,r4 + add r3,r3,r0 + ld r0,CFG_TB_UPDATE_COUNT(r3) + cmpld cr0,r0,r7 /* check if updated */ + bne- 1b + blr + .cfi_endproc +V_FUNCTION_END(__do_get_xsec) Index: linux-work/arch/ppc64/kernel/vdso64/datapage.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/datapage.S 2005-02-02 13:28:01.000000000 +1100 @@ -0,0 +1,68 @@ +/* + * Access to the shared data page by the vDSO & syscall map + * + * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org), IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include +#include +#include +#include +#include +#include + + .text +V_FUNCTION_BEGIN(__get_datapage) + .cfi_startproc + /* We don't want that exposed or overridable as we want other objects + * to be able to bl directly to here + */ + .protected __get_datapage + .hidden __get_datapage + + mflr r0 + .cfi_register lr,r0 + + bcl 20,31,1f + .global __kernel_datapage_offset; +__kernel_datapage_offset: + .long 0 +1: + mflr r3 + mtlr r0 + lwz r0,0(r3) + add r3,r0,r3 + blr + .cfi_endproc +V_FUNCTION_END(__get_datapage) + +/* + * void *__kernel_get_syscall_map(unsigned int *syscall_count) ; + * + * returns a pointer to the syscall map. 
the map is agnostic to the + * size of "long", unlike kernel bitops, it stores bits from top to + * bottom so that memory actually contains a linear bitmap + * check for syscall N by testing bit (0x80000000 >> (N & 0x1f)) of + * 32 bits int at N >> 5. + */ +V_FUNCTION_BEGIN(__kernel_get_syscall_map) + .cfi_startproc + mflr r12 + .cfi_register lr,r12 + + mr r4,r3 + bl V_LOCAL_FUNC(__get_datapage) + mtlr r12 + addi r3,r3,CFG_SYSCALL_MAP64 + cmpli cr0,r4,0 + beqlr + li r0,__NR_syscalls + stw r0,0(r4) + blr + .cfi_endproc +V_FUNCTION_END(__kernel_get_syscall_map) Index: linux-work/arch/ppc64/kernel/vdso64/sigtramp.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/sigtramp.S 2005-02-02 13:28:01.000000000 +1100 @@ -0,0 +1,294 @@ +/* + * Signal trampoline for 64 bits processes in a ppc64 kernel for + * use in the vDSO + * + * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org), IBM Corp. + * Copyright (C) 2004 Alan Modra (amodra at au.ibm.com)), IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include +#include +#include + + .text + +/* The nop here is a hack. The dwarf2 unwind routines subtract 1 from + the return address to get an address in the middle of the presumed + call instruction. Since we don't have a call here, we artifically + extend the range covered by the unwind info by padding before the + real start. */ + nop + .balign 8 +V_FUNCTION_BEGIN(__kernel_sigtramp_rt64) +.Lsigrt_start = . 
- 4 + addi r1, r1, __SIGNAL_FRAMESIZE + li r0,__NR_rt_sigreturn + sc +.Lsigrt_end: +V_FUNCTION_END(__kernel_sigtramp_rt64) +/* The ".balign 8" above and the following zeros mimic the old stack + trampoline layout. The last magic value is the ucontext pointer, + chosen in such a way that older libgcc unwind code returns a zero + for a sigcontext pointer. */ + .long 0,0,0 + .quad 0,-21*8 + +/* Register r1 can be found at offset 8 of a pt_regs structure. + A pointer to the pt_regs is stored in memory at the old sp plus PTREGS. */ +#define cfa_save \ + .byte 0x0f; /* DW_CFA_def_cfa_expression */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x23; .uleb128 RSIZE; /* DW_OP_plus_uconst */ \ + .byte 0x06; /* DW_OP_deref */ \ +9: + +/* Register REGNO can be found at offset OFS of a pt_regs structure. + A pointer to the pt_regs is stored in memory at the old sp plus PTREGS. */ +#define rsave(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .ifne ofs; \ + .byte 0x23; .uleb128 ofs; /* DW_OP_plus_uconst */ \ + .endif; \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset REGNO*16 + of the VMX reg struct. A pointer to the VMX reg struct is at VREGS in + the pt_regs struct. This macro is for REGNO == 0, and contains + 'subroutines' that the other macros jump to. 
*/ +#define vsave_msr0(regno) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x30 + regno; /* DW_OP_lit0 */ \ +2: \ + .byte 0x40; /* DW_OP_lit16 */ \ + .byte 0x1e; /* DW_OP_mul */ \ +3: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x12; /* DW_OP_dup */ \ + .byte 0x23; /* DW_OP_plus_uconst */ \ + .uleb128 33*RSIZE; /* msr offset */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x0c; .long 1 << 25; /* DW_OP_const4u */ \ + .byte 0x1a; /* DW_OP_and */ \ + .byte 0x12; /* DW_OP_dup, ret 0 if bra taken */ \ + .byte 0x30; /* DW_OP_lit0 */ \ + .byte 0x29; /* DW_OP_eq */ \ + .byte 0x28; .short 0x7fff; /* DW_OP_bra to end */ \ + .byte 0x13; /* DW_OP_drop, pop the 0 */ \ + .byte 0x23; .uleb128 VREGS; /* DW_OP_plus_uconst */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x22; /* DW_OP_plus */ \ + .byte 0x2f; .short 0x7fff; /* DW_OP_skip to end */ \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset REGNO*16 + of the VMX reg struct. REGNO is 1 thru 31. */ +#define vsave_msr1(regno) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x30 + regno; /* DW_OP_lit n */ \ + .byte 0x2f; .short 2b - 9f; /* DW_OP_skip */ \ +9: + +/* If msr bit 1<<25 is set, then VMX register REGNO is at offset OFS of + the VMX save block. */ +#define vsave_msr2(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x0a; .short ofs; /* DW_OP_const2u */ \ + .byte 0x2f; .short 3b - 9f; /* DW_OP_skip */ \ +9: + +/* VMX register REGNO is at offset OFS of the VMX save area. 
*/ +#define vsave(regno, ofs) \ + .byte 0x10; /* DW_CFA_expression */ \ + .uleb128 regno + 77; /* regno */ \ + .uleb128 9f - 1f; /* length */ \ +1: \ + .byte 0x71; .sleb128 PTREGS; /* DW_OP_breg1 */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x23; .uleb128 VREGS; /* DW_OP_plus_uconst */ \ + .byte 0x06; /* DW_OP_deref */ \ + .byte 0x23; .uleb128 ofs; /* DW_OP_plus_uconst */ \ +9: + +/* This is where the pt_regs pointer can be found on the stack. */ +#define PTREGS 128+168+56 + +/* Size of regs. */ +#define RSIZE 8 + +/* This is the offset of the VMX reg pointer. */ +#define VREGS 48*RSIZE+33*8 + +/* Describe where general purpose regs are saved. */ +#define EH_FRAME_GEN \ + cfa_save; \ + rsave ( 0, 0*RSIZE); \ + rsave ( 2, 2*RSIZE); \ + rsave ( 3, 3*RSIZE); \ + rsave ( 4, 4*RSIZE); \ + rsave ( 5, 5*RSIZE); \ + rsave ( 6, 6*RSIZE); \ + rsave ( 7, 7*RSIZE); \ + rsave ( 8, 8*RSIZE); \ + rsave ( 9, 9*RSIZE); \ + rsave (10, 10*RSIZE); \ + rsave (11, 11*RSIZE); \ + rsave (12, 12*RSIZE); \ + rsave (13, 13*RSIZE); \ + rsave (14, 14*RSIZE); \ + rsave (15, 15*RSIZE); \ + rsave (16, 16*RSIZE); \ + rsave (17, 17*RSIZE); \ + rsave (18, 18*RSIZE); \ + rsave (19, 19*RSIZE); \ + rsave (20, 20*RSIZE); \ + rsave (21, 21*RSIZE); \ + rsave (22, 22*RSIZE); \ + rsave (23, 23*RSIZE); \ + rsave (24, 24*RSIZE); \ + rsave (25, 25*RSIZE); \ + rsave (26, 26*RSIZE); \ + rsave (27, 27*RSIZE); \ + rsave (28, 28*RSIZE); \ + rsave (29, 29*RSIZE); \ + rsave (30, 30*RSIZE); \ + rsave (31, 31*RSIZE); \ + rsave (67, 32*RSIZE); /* ap, used as temp for nip */ \ + rsave (65, 36*RSIZE); /* lr */ \ + rsave (70, 38*RSIZE) /* cr */ + +/* Describe where the FP regs are saved. 
*/ +#define EH_FRAME_FP \ + rsave (32, 48*RSIZE + 0*8); \ + rsave (33, 48*RSIZE + 1*8); \ + rsave (34, 48*RSIZE + 2*8); \ + rsave (35, 48*RSIZE + 3*8); \ + rsave (36, 48*RSIZE + 4*8); \ + rsave (37, 48*RSIZE + 5*8); \ + rsave (38, 48*RSIZE + 6*8); \ + rsave (39, 48*RSIZE + 7*8); \ + rsave (40, 48*RSIZE + 8*8); \ + rsave (41, 48*RSIZE + 9*8); \ + rsave (42, 48*RSIZE + 10*8); \ + rsave (43, 48*RSIZE + 11*8); \ + rsave (44, 48*RSIZE + 12*8); \ + rsave (45, 48*RSIZE + 13*8); \ + rsave (46, 48*RSIZE + 14*8); \ + rsave (47, 48*RSIZE + 15*8); \ + rsave (48, 48*RSIZE + 16*8); \ + rsave (49, 48*RSIZE + 17*8); \ + rsave (50, 48*RSIZE + 18*8); \ + rsave (51, 48*RSIZE + 19*8); \ + rsave (52, 48*RSIZE + 20*8); \ + rsave (53, 48*RSIZE + 21*8); \ + rsave (54, 48*RSIZE + 22*8); \ + rsave (55, 48*RSIZE + 23*8); \ + rsave (56, 48*RSIZE + 24*8); \ + rsave (57, 48*RSIZE + 25*8); \ + rsave (58, 48*RSIZE + 26*8); \ + rsave (59, 48*RSIZE + 27*8); \ + rsave (60, 48*RSIZE + 28*8); \ + rsave (61, 48*RSIZE + 29*8); \ + rsave (62, 48*RSIZE + 30*8); \ + rsave (63, 48*RSIZE + 31*8) + +/* Describe where the VMX regs are saved. 
*/ +#ifdef CONFIG_ALTIVEC +#define EH_FRAME_VMX \ + vsave_msr0 ( 0); \ + vsave_msr1 ( 1); \ + vsave_msr1 ( 2); \ + vsave_msr1 ( 3); \ + vsave_msr1 ( 4); \ + vsave_msr1 ( 5); \ + vsave_msr1 ( 6); \ + vsave_msr1 ( 7); \ + vsave_msr1 ( 8); \ + vsave_msr1 ( 9); \ + vsave_msr1 (10); \ + vsave_msr1 (11); \ + vsave_msr1 (12); \ + vsave_msr1 (13); \ + vsave_msr1 (14); \ + vsave_msr1 (15); \ + vsave_msr1 (16); \ + vsave_msr1 (17); \ + vsave_msr1 (18); \ + vsave_msr1 (19); \ + vsave_msr1 (20); \ + vsave_msr1 (21); \ + vsave_msr1 (22); \ + vsave_msr1 (23); \ + vsave_msr1 (24); \ + vsave_msr1 (25); \ + vsave_msr1 (26); \ + vsave_msr1 (27); \ + vsave_msr1 (28); \ + vsave_msr1 (29); \ + vsave_msr1 (30); \ + vsave_msr1 (31); \ + vsave_msr2 (33, 32*16+12); \ + vsave (32, 33*16) +#else +#define EH_FRAME_VMX +#endif + + .section .eh_frame,"a", at progbits +.Lcie: + .long .Lcie_end - .Lcie_start +.Lcie_start: + .long 0 /* CIE ID */ + .byte 1 /* Version number */ + .string "zR" /* NUL-terminated augmentation string */ + .uleb128 4 /* Code alignment factor */ + .sleb128 -8 /* Data alignment factor */ + .byte 67 /* Return address register column, ap */ + .uleb128 1 /* Augmentation value length */ + .byte 0x14 /* DW_EH_PE_pcrel | DW_EH_PE_udata8. */ + .byte 0x0c,1,0 /* DW_CFA_def_cfa: r1 ofs 0 */ + .balign 8 +.Lcie_end: + + .long .Lfde0_end - .Lfde0_start +.Lfde0_start: + .long .Lfde0_start - .Lcie /* CIE pointer. */ + .quad .Lsigrt_start - . /* PC start, length */ + .quad .Lsigrt_end - .Lsigrt_start + .uleb128 0 /* Augmentation */ + EH_FRAME_GEN + EH_FRAME_FP + EH_FRAME_VMX +# Do we really need to describe the frame at this point? ie. will +# we ever have some call chain that returns somewhere past the addi? +# I don't think so, since gcc doesn't support async signals. 
+# .byte 0x41 /* DW_CFA_advance_loc 1*4 */ +#undef PTREGS +#define PTREGS 168+56 +# EH_FRAME_GEN +# EH_FRAME_FP +# EH_FRAME_VMX + .balign 8 +.Lfde0_end: Index: linux-work/arch/ppc64/kernel/vdso64/Makefile =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/Makefile 2005-02-02 13:28:01.000000000 +1100 @@ -0,0 +1,35 @@ +# List of files in the vdso, has to be asm only for now + +obj-vdso64 = sigtramp.o gettimeofday.o datapage.o cacheflush.o + +# Build rules + +targets := $(obj-vdso64) vdso64.so +obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64)) + +EXTRA_CFLAGS := -shared -s -fno-common -fno-builtin +EXTRA_CFLAGS += -nostdlib -Wl,-soname=linux-vdso64.so.1 +EXTRA_AFLAGS := -D__VDSO64__ -s + +obj-y += vdso64_wrapper.o +extra-y += vdso64.lds +CPPFLAGS_vdso64.lds += -P -C -U$(ARCH) + +# Force dependency (incbin is bad) +$(obj)/vdso64_wrapper.o : $(obj)/vdso64.so + +# link rule for the .so file, .lds has to be first +$(obj)/vdso64.so: $(src)/vdso64.lds $(obj-vdso64) + $(call if_changed,vdso64ld) + +# assembly rules for the .S files +$(obj-vdso64): %.o: %.S + $(call if_changed_dep,vdso64as) + +# actual build commands +quiet_cmd_vdso64ld = VDSO64L $@ + cmd_vdso64ld = $(CC) $(c_flags) -Wl,-T $^ -o $@ +quiet_cmd_vdso64as = VDSO64A $@ + cmd_vdso64as = $(CC) $(a_flags) -c -o $@ $< + + Index: linux-work/arch/ppc64/kernel/vdso32/vdso32.lds.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/vdso32.lds.S 2005-02-02 13:28:01.000000000 +1100 @@ -0,0 +1,111 @@ + +/* + * This is the infamous ld script for the 32 bits vdso + * library + */ +#include + +/* Default link addresses for the vDSOs */ +OUTPUT_FORMAT("elf32-powerpc", "elf32-powerpc", "elf32-powerpc") +OUTPUT_ARCH(powerpc:common) +ENTRY(_start) + +SECTIONS +{ + . 
= VDSO32_LBASE + SIZEOF_HEADERS; + .hash : { *(.hash) } :text + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + + . = ALIGN (16); + .text : + { + *(.text .stub .text.* .gnu.linkonce.t.*) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + + /* Other stuff is appended to the text segment: */ + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + + .eh_frame_hdr : { *(.eh_frame_hdr) } :text :eh_frame_hdr + .eh_frame : { KEEP (*(.eh_frame)) } :text + .gcc_except_table : { *(.gcc_except_table) } + .fixup : { *(.fixup) } + + .got ALIGN(4) : { *(.got.plt) *(.got) } + + .dynamic : { *(.dynamic) } :text :dynamic + + _end = .; + __end = .; + PROVIDE (end = .); + + + /* Stabs debugging sections are here too + */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + + /DISCARD/ : { *(.note.GNU-stack) } + /DISCARD/ : { *(.data .data.* .gnu.linkonce.d.* .sdata*) } + /DISCARD/ : { *(.bss .sbss .dynbss .dynsbss) } +} + + 
+PHDRS +{ + text PT_LOAD FILEHDR PHDRS FLAGS(5); /* PF_R|PF_X */ + dynamic PT_DYNAMIC FLAGS(4); /* PF_R */ + eh_frame_hdr 0x6474e550; /* PT_GNU_EH_FRAME, but ld doesn't match the name */ +} + + +/* + * This controls what symbols we export from the DSO. + */ +VERSION +{ + VDSO_VERSION_STRING { + global: + __kernel_datapage_offset; /* Has to be there for the kernel to find it */ + __kernel_get_syscall_map; + __kernel_gettimeofday; + __kernel_sync_dicache; + __kernel_sync_dicache_p5; + __kernel_sigtramp32; + __kernel_sigtramp_rt32; + local: *; + }; +} Index: linux-work/arch/ppc64/kernel/vdso32/cacheflush.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/cacheflush.S 2005-02-02 13:28:01.000000000 +1100 @@ -0,0 +1,65 @@ +/* + * vDSO provided cache flush routines + * + * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org), + * IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include +#include +#include + + .text + +/* + * Default "generic" version of __kernel_sync_dicache. + * + * void __kernel_sync_dicache(unsigned long start, unsigned long end) + * + * Flushes the data cache & invalidate the instruction cache for the + * provided range [start, end[ + * + * Note: all CPUs supported by this kernel have a 128 bytes cache + * line size so we don't have to peek that info from the datapage + */ +V_FUNCTION_BEGIN(__kernel_sync_dicache) + .cfi_startproc + li r5,127 + andc r6,r3,r5 /* round low to line bdy */ + subf r8,r6,r4 /* compute length */ + add r8,r8,r5 /* ensure we get enough */ + srwi. r8,r8,7 /* compute line count */ + beqlr /* nothing to do? 
*/ + mtctr r8 + mr r3,r6 +1: dcbst 0,r3 + addi r3,r3,128 + bdnz 1b + sync + mtctr r8 +1: icbi 0,r6 + addi r6,r6,128 + bdnz 1b + isync + blr + .cfi_endproc +V_FUNCTION_END(__kernel_sync_dicache) + + +/* + * POWER5 version of __kernel_sync_dicache + */ +V_FUNCTION_BEGIN(__kernel_sync_dicache_p5) + .cfi_startproc + sync + isync + blr + .cfi_endproc +V_FUNCTION_END(__kernel_sync_dicache_p5) + Index: linux-work/arch/ppc64/kernel/vdso64/cacheflush.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/cacheflush.S 2005-02-02 13:28:01.000000000 +1100 @@ -0,0 +1,64 @@ +/* + * vDSO provided cache flush routines + * + * Copyright (C) 2004 Benjamin Herrenschmuidt (benh at kernel.crashing.org), + * IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#include +#include +#include +#include +#include + + .text + +/* + * Default "generic" version of __kernel_sync_dicache. + * + * void __kernel_sync_dicache(unsigned long start, unsigned long end) + * + * Flushes the data cache & invalidate the instruction cache for the + * provided range [start, end[ + * + * Note: all CPUs supported by this kernel have a 128 bytes cache + * line size so we don't have to peek that info from the datapage + */ +V_FUNCTION_BEGIN(__kernel_sync_dicache) + .cfi_startproc + li r5,127 + andc r6,r3,r5 /* round low to line bdy */ + subf r8,r6,r4 /* compute length */ + add r8,r8,r5 /* ensure we get enough */ + srwi. r8,r8,7 /* compute line count */ + beqlr /* nothing to do? 
*/ + mtctr r8 + mr r3,r6 +1: dcbst 0,r3 + addi r3,r3,128 + bdnz 1b + sync + mtctr r8 +1: icbi 0,r6 + addi r6,r6,128 + bdnz 1b + isync + blr + .cfi_endproc +V_FUNCTION_END(__kernel_sync_dicache) + + +/* + * POWER5 version of __kernel_sync_dicache + */ +V_FUNCTION_BEGIN(__kernel_sync_dicache_p5) + .cfi_startproc + sync + isync + blr + .cfi_endproc +V_FUNCTION_END(__kernel_sync_dicache_p5) Index: linux-work/arch/ppc64/kernel/head.S =================================================================== --- linux-work.orig/arch/ppc64/kernel/head.S 2005-02-02 13:27:52.000000000 +1100 +++ linux-work/arch/ppc64/kernel/head.S 2005-02-02 13:28:01.000000000 +1100 @@ -54,7 +54,6 @@ * 0x0100 - 0x2fff : pSeries Interrupt prologs * 0x3000 - 0x3fff : Interrupt support * 0x4000 - 0x4fff : NACA - * 0x5000 - 0x5fff : SystemCfg * 0x6000 : iSeries and common interrupt prologs * 0x9000 - 0x9fff : Initial segment table */ Index: linux-work/arch/ppc64/boot/Makefile =================================================================== --- linux-work.orig/arch/ppc64/boot/Makefile 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/boot/Makefile 2005-02-02 13:28:01.000000000 +1100 @@ -20,17 +20,11 @@ # CROSS32_COMPILE is setup as a prefix just like CROSS_COMPILE # in the toplevel makefile. 
-CROSS32_COMPILE ?= -#CROSS32_COMPILE = /usr/local/ppc/bin/powerpc-linux- -BOOTCC := $(CROSS32_COMPILE)gcc HOSTCC := gcc BOOTCFLAGS := $(HOSTCFLAGS) $(LINUXINCLUDE) -fno-builtin -BOOTAS := $(CROSS32_COMPILE)as BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -traditional -BOOTLD := $(CROSS32_COMPILE)ld BOOTLFLAGS := -Ttext 0x00400000 -e _start -T $(srctree)/$(src)/zImage.lds -BOOTOBJCOPY := $(CROSS32_COMPILE)objcopy OBJCOPYFLAGS := contents,alloc,load,readonly,data src-boot := crt0.S string.S prom.c main.c zlib.c imagesize.c div64.S @@ -38,10 +32,10 @@ obj-boot := $(addsuffix .o, $(basename $(src-boot))) quiet_cmd_bootcc = BOOTCC $@ - cmd_bootcc = $(BOOTCC) -Wp,-MD,$(depfile) $(BOOTCFLAGS) -c -o $@ $< + cmd_bootcc = $(CROSS32CC) -Wp,-MD,$(depfile) $(BOOTCFLAGS) -c -o $@ $< quiet_cmd_bootas = BOOTAS $@ - cmd_bootas = $(BOOTCC) -Wp,-MD,$(depfile) $(BOOTAFLAGS) -c -o $@ $< + cmd_bootas = $(CROSS32CC) -Wp,-MD,$(depfile) $(BOOTAFLAGS) -c -o $@ $< $(patsubst %.c,%.o, $(filter %.c, $(src-boot))): %.o: %.c $(call if_changed_dep,bootcc) @@ -77,15 +71,15 @@ $(obj)/vmlinux.initrd: vmlinux.strip $(obj)/addRamDisk $(obj)/ramdisk.image.gz FORCE $(call if_changed,ramdisk) -addsection = $(BOOTOBJCOPY) $(1) \ +addsection = $(CROSS32OBJCOPY) $(1) \ --add-section=.kernel:$(strip $(patsubst $(obj)/kernel-%.o,%, $(1)))=$(patsubst %.o,%.gz, $(1)) \ --set-section-flags=.kernel:$(strip $(patsubst $(obj)/kernel-%.o,%, $(1)))=$(OBJCOPYFLAGS) quiet_cmd_addnote = ADDNOTE $@ - cmd_addnote = $(BOOTLD) $(BOOTLFLAGS) -o $@ $(obj-boot) && $(obj)/addnote $@ + cmd_addnote = $(CROSS32LD) $(BOOTLFLAGS) -o $@ $(obj-boot) && $(obj)/addnote $@ quiet_cmd_piggy = PIGGY $@ - cmd_piggy = $(obj)/piggyback $(@:.o=) < $< | $(BOOTAS) -o $@ + cmd_piggy = $(obj)/piggyback $(@:.o=) < $< | $(CROSS32AS) -o $@ $(call gz-sec, $(required)): $(obj)/kernel-%.gz: % FORCE $(call if_changed,gzip) From paulus at samba.org Thu Feb 10 14:12:01 2005 From: paulus at samba.org (Paul Mackerras) Date: Thu, 10 Feb 2005 14:12:01 +1100 
Subject: [PATCH] ppc64: Mode 2 PCI-X config space size fix
In-Reply-To: <420A6343.6070307@us.ibm.com>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com>
	<20050131192955.GJ31145@parcelfarce.linux.theplanet.co.uk>
	<41FEA4AA.1080407@us.ibm.com>
	<200501312256.44692.arnd@arndb.de>
	<41FEB492.2020002@us.ibm.com>
	<1107227727.5963.46.camel@gaston>
	<41FF0B0D.8020003@us.ibm.com>
	<20050201123249.GA10088@parcelfarce.linux.theplanet.co.uk>
	<41FFE3AF.706@us.ibm.com>
	<420A6343.6070307@us.ibm.com>
Message-ID: <16906.53505.649325.792660@cargo.ozlabs.ibm.com>

Brian King writes:

> Trimming the cc list a bit since this has become a PPC64 only patch and
> resending...

Unless you think this really needs to go in 2.6.11, I'll defer it
until after 2.6.11 is out, since we're supposed to be in
bug-fix/stabilization mode for 2.6.11. Are you OK with that?

Oh, and a minor nit:

> +	if (type && *type == 1)
> +		dn->pci_ext_config_space = 1;
> +	else
> +		dn->pci_ext_config_space = 0;

is more compactly expressed as:

	dn->pci_ext_config_space = (type && *type == 1);

Regards,
Paul.

From sfr at canb.auug.org.au Thu Feb 10 15:48:46 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Thu, 10 Feb 2005 15:48:46 +1100
Subject: [PATCH] distribute ppc64 EXPORT_SYMBOLs
Message-ID: <20050210154846.7f1b54f3.sfr@canb.auug.org.au>

Hi,

This patch just moves as many EXPORT_SYMBOL()s as possible from
arch/ppc64/kernel/ppc_ksyms.c to where the symbols are defined.

This has been compiled on pSeries, iSeries and pmac.

Please apply.
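For anyone reading along, the "export next to the definition" layout can be sketched in plain userspace C. This is only an illustration under stated assumptions: EXPORT_SYMBOL below is a stand-in macro written for this sketch (the real one comes from the kernel's module header and records the symbol in the kernel's own export table), and struct ksym_sketch and lookup_sketch() are hypothetical names, not kernel interfaces. The two variables are the ones the patch touches in time.c and pci.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record type: one entry per exported symbol. */
struct ksym_sketch {
	const char *name;
	const void *addr;
};

/*
 * Stand-in for the kernel's EXPORT_SYMBOL(), for illustration only.
 * It stores the symbol's name and address in a static object; the
 * real macro does its bookkeeping through the linker instead.
 */
#define EXPORT_SYMBOL(sym) \
	static const struct ksym_sketch ksym_sketch_##sym = { #sym, &sym }

/* Definition and export sit together, as in the time.c hunk. */
unsigned long tb_ticks_per_usec = 100;	/* sane default */
EXPORT_SYMBOL(tb_ticks_per_usec);

/* Same pattern, as in the pci.c hunk. */
unsigned long isa_io_base;		/* NULL if no ISA bus */
EXPORT_SYMBOL(isa_io_base);

/*
 * Toy lookup over the records above. The table is listed by hand
 * because this sketch has no linker section to collect entries.
 * Returns NULL when the name was never exported.
 */
const void *lookup_sketch(const char *name)
{
	static const struct ksym_sketch *const table[] = {
		&ksym_sketch_tb_ticks_per_usec,
		&ksym_sketch_isa_io_base,
	};
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(table[i]->name, name) == 0)
			return table[i]->addr;
	return NULL;
}
```

The point of the layout is that removing or renaming a definition makes the matching export fail to compile in the same file, rather than leaving a stale entry in a central list such as ppc_ksyms.c.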
Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk/arch/ppc64/kernel/LparData.c linus-bk.sfr.1/arch/ppc64/kernel/LparData.c --- linus-bk/arch/ppc64/kernel/LparData.c 2005-01-09 10:05:39.000000000 +1100 +++ linus-bk.sfr.1/arch/ppc64/kernel/LparData.c 2005-02-10 14:19:50.000000000 +1100 @@ -224,6 +224,7 @@ }; struct msChunks msChunks; +EXPORT_SYMBOL(msChunks); /* Depending on whether this is called from iSeries or pSeries setup * code, the location of the msChunks struct may or may not have diff -ruN linus-bk/arch/ppc64/kernel/cputable.c linus-bk.sfr.1/arch/ppc64/kernel/cputable.c --- linus-bk/arch/ppc64/kernel/cputable.c 2004-06-28 14:30:54.000000000 +1000 +++ linus-bk.sfr.1/arch/ppc64/kernel/cputable.c 2005-02-10 14:46:05.000000000 +1100 @@ -17,9 +17,12 @@ #include #include #include +#include + #include struct cpu_spec* cur_cpu_spec = NULL; +EXPORT_SYMBOL(cur_cpu_spec); /* NOTE: * Unlike ppc32, ppc64 will only call this once for the boot CPU, it's diff -ruN linus-bk/arch/ppc64/kernel/irq.c linus-bk.sfr.1/arch/ppc64/kernel/irq.c --- linus-bk/arch/ppc64/kernel/irq.c 2005-01-22 06:09:00.000000000 +1100 +++ linus-bk.sfr.1/arch/ppc64/kernel/irq.c 2005-02-10 14:41:29.000000000 +1100 @@ -61,6 +61,7 @@ #endif extern irq_desc_t irq_desc[NR_IRQS]; +EXPORT_SYMBOL(irq_desc); int distribute_irqs = 1; int __irq_offset_value; diff -ruN linus-bk/arch/ppc64/kernel/pacaData.c linus-bk.sfr.1/arch/ppc64/kernel/pacaData.c --- linus-bk/arch/ppc64/kernel/pacaData.c 2005-01-12 16:05:22.000000000 +1100 +++ linus-bk.sfr.1/arch/ppc64/kernel/pacaData.c 2005-02-10 14:44:19.000000000 +1100 @@ -216,3 +216,4 @@ #endif #endif }; +EXPORT_SYMBOL(paca); diff -ruN linus-bk/arch/ppc64/kernel/pci.c linus-bk.sfr.1/arch/ppc64/kernel/pci.c --- linus-bk/arch/ppc64/kernel/pci.c 2005-01-22 06:09:00.000000000 +1100 +++ linus-bk.sfr.1/arch/ppc64/kernel/pci.c 2005-02-10 11:58:17.000000000 +1100 @@ -63,7 +63,9 @@ * page is 
mapped and isa_io_limit prevents access to it. */ unsigned long isa_io_base; /* NULL if no ISA bus */ +EXPORT_SYMBOL(isa_io_base); unsigned long pci_io_base; +EXPORT_SYMBOL(pci_io_base); void iSeries_pcibios_init(void); diff -ruN linus-bk/arch/ppc64/kernel/ppc_ksyms.c linus-bk.sfr.1/arch/ppc64/kernel/ppc_ksyms.c --- linus-bk/arch/ppc64/kernel/ppc_ksyms.c 2005-01-12 16:05:22.000000000 +1100 +++ linus-bk.sfr.1/arch/ppc64/kernel/ppc_ksyms.c 2005-02-10 15:08:06.000000000 +1100 @@ -8,49 +8,18 @@ */ #include #include -#include -#include -#include -#include #include -#include -#include #include -#include -#include -#include -#include -#include -#include +#include -#include -#include #include #include #include -#include -#include -#include -#include #include -#include -#include -#include -#include #include #include #include -#ifdef CONFIG_PPC_ISERIES #include -#include -#endif - -extern int do_signal(sigset_t *, struct pt_regs *); - -EXPORT_SYMBOL(do_signal); - -EXPORT_SYMBOL(isa_io_base); -EXPORT_SYMBOL(pci_io_base); EXPORT_SYMBOL(strcpy); EXPORT_SYMBOL(strncpy); @@ -65,10 +34,6 @@ EXPORT_SYMBOL(strcmp); EXPORT_SYMBOL(strncmp); -EXPORT_SYMBOL(__down_interruptible); -EXPORT_SYMBOL(__up); -EXPORT_SYMBOL(__down); - EXPORT_SYMBOL(csum_partial); EXPORT_SYMBOL(csum_partial_copy_generic); EXPORT_SYMBOL(ip_fast_csum); @@ -79,11 +44,6 @@ EXPORT_SYMBOL(__strncpy_from_user); EXPORT_SYMBOL(__strnlen_user); -EXPORT_SYMBOL(clear_user_page); - -#ifdef CONFIG_MSCHUNKS -EXPORT_SYMBOL(msChunks); -#endif EXPORT_SYMBOL(reloc_offset); #ifdef CONFIG_PPC_ISERIES @@ -107,11 +67,7 @@ EXPORT_SYMBOL(_outsw_ns); EXPORT_SYMBOL(_insl_ns); EXPORT_SYMBOL(_outsl_ns); -EXPORT_SYMBOL(ioremap); -EXPORT_SYMBOL(__ioremap); -EXPORT_SYMBOL(iounmap); -EXPORT_SYMBOL(start_thread); EXPORT_SYMBOL(kernel_thread); EXPORT_SYMBOL(giveup_fpu); @@ -119,8 +75,7 @@ EXPORT_SYMBOL(giveup_altivec); #endif EXPORT_SYMBOL(flush_icache_range); -EXPORT_SYMBOL(flush_icache_user_range); -EXPORT_SYMBOL(flush_dcache_page); + #ifdef 
CONFIG_SMP #ifdef CONFIG_PPC_ISERIES EXPORT_SYMBOL(local_get_flags); @@ -129,19 +84,6 @@ #endif #endif -EXPORT_SYMBOL(ppc_md); - -#ifdef CONFIG_PPC_MULTIPLATFORM -EXPORT_SYMBOL(find_devices); -EXPORT_SYMBOL(find_type_devices); -EXPORT_SYMBOL(find_compatible_devices); -EXPORT_SYMBOL(find_path_device); -EXPORT_SYMBOL(device_is_compatible); -EXPORT_SYMBOL(machine_is_compatible); -EXPORT_SYMBOL(find_all_nodes); -EXPORT_SYMBOL(get_property); -#endif - EXPORT_SYMBOL(memcpy); EXPORT_SYMBOL(memset); EXPORT_SYMBOL(memmove); @@ -150,10 +92,4 @@ EXPORT_SYMBOL(memchr); EXPORT_SYMBOL(timer_interrupt); -EXPORT_SYMBOL(irq_desc); -EXPORT_SYMBOL(get_wchan); EXPORT_SYMBOL(console_drivers); - -EXPORT_SYMBOL(tb_ticks_per_usec); -EXPORT_SYMBOL(paca); -EXPORT_SYMBOL(cur_cpu_spec); diff -ruN linus-bk/arch/ppc64/kernel/process.c linus-bk.sfr.1/arch/ppc64/kernel/process.c --- linus-bk/arch/ppc64/kernel/process.c 2005-01-29 06:05:47.000000000 +1100 +++ linus-bk.sfr.1/arch/ppc64/kernel/process.c 2005-02-10 14:42:24.000000000 +1100 @@ -469,6 +469,7 @@ current->thread.used_vr = 0; #endif /* CONFIG_ALTIVEC */ } +EXPORT_SYMBOL(start_thread); int set_fpexc_mode(struct task_struct *tsk, unsigned int val) { @@ -607,6 +608,7 @@ } while (count++ < 16); return 0; } +EXPORT_SYMBOL(get_wchan); void show_stack(struct task_struct *p, unsigned long *_sp) { diff -ruN linus-bk/arch/ppc64/kernel/prom.c linus-bk.sfr.1/arch/ppc64/kernel/prom.c --- linus-bk/arch/ppc64/kernel/prom.c 2005-01-29 06:05:47.000000000 +1100 +++ linus-bk.sfr.1/arch/ppc64/kernel/prom.c 2005-02-10 14:39:22.000000000 +1100 @@ -32,6 +32,8 @@ #include #include #include +#include + #include #include #include @@ -1138,6 +1140,7 @@ *prevp = NULL; return head; } +EXPORT_SYMBOL(find_devices); /** * Construct and return a list of the device_nodes with a given type. 
@@ -1157,6 +1160,7 @@ *prevp = NULL; return head; } +EXPORT_SYMBOL(find_type_devices); /** * Returns all nodes linked together @@ -1174,6 +1178,7 @@ *prevp = NULL; return head; } +EXPORT_SYMBOL(find_all_nodes); /** Checks if the given "compat" string matches one of the strings in * the device's "compatible" property @@ -1197,6 +1202,7 @@ return 0; } +EXPORT_SYMBOL(device_is_compatible); /** @@ -1216,6 +1222,7 @@ } return rc; } +EXPORT_SYMBOL(machine_is_compatible); /** * Construct and return a list of the device_nodes with a given type @@ -1239,6 +1246,7 @@ *prevp = NULL; return head; } +EXPORT_SYMBOL(find_compatible_devices); /** * Find the device_node with a given full_name. @@ -1253,6 +1261,7 @@ return np; return NULL; } +EXPORT_SYMBOL(find_path_device); /******* * @@ -1872,6 +1881,7 @@ } return NULL; } +EXPORT_SYMBOL(get_property); /* * Add a property to a node diff -ruN linus-bk/arch/ppc64/kernel/prom_init.c linus-bk.sfr.1/arch/ppc64/kernel/prom_init.c --- linus-bk/arch/ppc64/kernel/prom_init.c 2005-01-05 17:06:07.000000000 +1100 +++ linus-bk.sfr.1/arch/ppc64/kernel/prom_init.c 2005-02-10 14:57:18.000000000 +1100 @@ -151,7 +151,6 @@ extern void __start(unsigned long r3, unsigned long r4, unsigned long r5); -extern unsigned long reloc_offset(void); extern void enter_prom(struct prom_args *args, unsigned long entry); extern void copy_and_flush(unsigned long dest, unsigned long src, unsigned long size, unsigned long offset); diff -ruN linus-bk/arch/ppc64/kernel/semaphore.c linus-bk.sfr.1/arch/ppc64/kernel/semaphore.c --- linus-bk/arch/ppc64/kernel/semaphore.c 2004-04-13 09:25:09.000000000 +1000 +++ linus-bk.sfr.1/arch/ppc64/kernel/semaphore.c 2005-02-10 12:07:49.000000000 +1100 @@ -18,6 +18,8 @@ #include #include +#include + #include #include #include @@ -62,6 +64,7 @@ __sem_update_count(sem, 1); wake_up(&sem->wait); } +EXPORT_SYMBOL(__up); /* * Note that when we come in to __down or __down_interruptible, @@ -99,6 +102,7 @@ */ wake_up(&sem->wait); } 
+EXPORT_SYMBOL(__down); int __sched __down_interruptible(struct semaphore * sem) { @@ -129,3 +133,4 @@ wake_up(&sem->wait); return retval; } +EXPORT_SYMBOL(__down_interruptible); diff -ruN linus-bk/arch/ppc64/kernel/setup.c linus-bk.sfr.1/arch/ppc64/kernel/setup.c --- linus-bk/arch/ppc64/kernel/setup.c 2005-01-09 10:05:39.000000000 +1100 +++ linus-bk.sfr.1/arch/ppc64/kernel/setup.c 2005-02-10 14:34:06.000000000 +1100 @@ -129,6 +129,7 @@ /* The main machine-dep calls structure */ struct machdep_calls ppc_md; +EXPORT_SYMBOL(ppc_md); #ifdef CONFIG_MAGIC_SYSRQ unsigned long SYSRQ_KEY; diff -ruN linus-bk/arch/ppc64/kernel/signal.c linus-bk.sfr.1/arch/ppc64/kernel/signal.c --- linus-bk/arch/ppc64/kernel/signal.c 2005-01-23 07:08:01.000000000 +1100 +++ linus-bk.sfr.1/arch/ppc64/kernel/signal.c 2005-02-10 12:04:45.000000000 +1100 @@ -27,6 +27,8 @@ #include #include #include +#include + #include #include #include @@ -566,6 +568,4 @@ return 0; } - - - +EXPORT_SYMBOL(do_signal); diff -ruN linus-bk/arch/ppc64/kernel/time.c linus-bk.sfr.1/arch/ppc64/kernel/time.c --- linus-bk/arch/ppc64/kernel/time.c 2005-01-22 06:09:01.000000000 +1100 +++ linus-bk.sfr.1/arch/ppc64/kernel/time.c 2005-02-10 15:08:14.000000000 +1100 @@ -85,6 +85,7 @@ unsigned long tb_ticks_per_jiffy; unsigned long tb_ticks_per_usec = 100; /* sane default */ +EXPORT_SYMBOL(tb_ticks_per_usec); unsigned long tb_ticks_per_sec; unsigned long next_xtime_sync_tb; unsigned long xtime_sync_interval; diff -ruN linus-bk/arch/ppc64/mm/init.c linus-bk.sfr.1/arch/ppc64/mm/init.c --- linus-bk/arch/ppc64/mm/init.c 2005-01-22 06:09:01.000000000 +1100 +++ linus-bk.sfr.1/arch/ppc64/mm/init.c 2005-02-10 14:30:18.000000000 +1100 @@ -38,6 +38,7 @@ #include #include #include +#include #include #include @@ -441,6 +442,10 @@ #endif +EXPORT_SYMBOL(ioremap); +EXPORT_SYMBOL(__ioremap); +EXPORT_SYMBOL(iounmap); + void free_initmem(void) { unsigned long addr; @@ -758,6 +763,7 @@ if (test_bit(PG_arch_1, &page->flags)) clear_bit(PG_arch_1, 
&page->flags); } +EXPORT_SYMBOL(flush_dcache_page); void clear_user_page(void *page, unsigned long vaddr, struct page *pg) { @@ -775,6 +781,7 @@ if (test_bit(PG_arch_1, &pg->flags)) clear_bit(PG_arch_1, &pg->flags); } +EXPORT_SYMBOL(clear_user_page); void copy_user_page(void *vto, void *vfrom, unsigned long vaddr, struct page *pg) @@ -812,6 +819,7 @@ maddr = (unsigned long)page_address(page) + (addr & ~PAGE_MASK); flush_icache_range(maddr, maddr + len); } +EXPORT_SYMBOL(flush_icache_user_range); /* * This is called at the end of handling a user page fault, when the diff -ruN linus-bk/include/asm-ppc64/lmb.h linus-bk.sfr.1/include/asm-ppc64/lmb.h --- linus-bk/include/asm-ppc64/lmb.h 2004-09-24 15:23:09.000000000 +1000 +++ linus-bk.sfr.1/include/asm-ppc64/lmb.h 2005-02-10 14:58:09.000000000 +1100 @@ -16,8 +16,6 @@ #include #include -extern unsigned long reloc_offset(void); - #define MAX_LMB_REGIONS 128 #define LMB_ALLOC_ANYWHERE 0 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050210/f2a3f9c0/attachment.pgp From kravetz at us.ibm.com Thu Feb 10 16:22:13 2005 From: kravetz at us.ibm.com (Mike Kravetz) Date: Wed, 9 Feb 2005 21:22:13 -0800 Subject: linux-2.6.11-rc2-mm2 Message-ID: <20050210052213.GA5056@w-mikek2.ibm.com> I just got a brand new 720 and attempted to boot linux-2.6.11-rc2-mm2 (which is the current base for hotplug memory work) with the default configuration. Only got as far as ... 
PCI: Probing PCI hardware
IOMMU table initialized, virtual merging disabled
mapping IO 3fe00200000 -> e000000000000000, size: 100000
mapping IO 3fe00700000 -> e000000000100000, size: 100000
PCI: Probing PCI hardware done
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
i/pSeries Real Time Clock Driver v1.1
RTAS daemon started
Total HugeTLB memory allocated, 0
Installing knfsd (copyright (C) 1996 okir at monad.swb.de).
Initializing Cryptographic API
Using unsupported 640x480 MTRX,G450 at c0000000, depth=8, pitch=640
cpu 0x0: Vector: 300 (Data Access) at [c000000002542e10]
    pc: c00000000024d354: .cfb_imageblit+0x474/0x68c
    lr: c000000000242394: .soft_cursor+0x1e0/0x270
    sp: c000000002543090
   msr: 8000000000009032
   dar: e00000008000c800
 dsisr: 42000000
  current = 0xc00000000eb547b0
  paca    = 0xc0000000005f1000
    pid   = 1, comm = swapper
enter ? for help
0:mon> t
[c0000000025431a0] c000000000242394 .soft_cursor+0x1e0/0x270
[c000000002543270] c00000000023b2a4 .bit_cursor+0x3b4/0x618
[c0000000025433d0] c00000000023607c .fbcon_cursor+0x214/0x300
[c000000002543490] c000000000276c54 .hide_cursor+0x5c/0x8c
[c000000002543520] c000000000277148 .redraw_screen+0x25c/0x260
[c0000000025435c0] c000000000234480 .fbcon_prepare_logo+0x304/0x44c
[c0000000025436b0] c0000000002357ec .fbcon_init+0x32c/0x3bc
[c000000002543780] c000000000277338 .visual_init+0x1a0/0x228
[c000000002543820] c00000000027b964 .take_over_console+0x2bc/0x634
[c0000000025438f0] c000000000234110 .fbcon_takeover+0xa4/0x110
[c000000002543980] c00000000023a3f4 .fbcon_fb_registered+0xe8/0xf8
[c000000002543a20] c00000000023a598 .fbcon_event_notify+0xb8/0x110
[c000000002543ab0] c000000000063bc0 .notifier_call_chain+0x60/0xa4
[c000000002543b40] c00000000023dba8 .register_framebuffer+0x144/0x1cc
[c000000002543c30] c000000000567aec .offb_init_fb+0x2f8/0x584
[c000000002543d20] c0000000005677bc .offb_init_nodriver+0x1fc/0x234
[c000000002543de0] c000000000567594 .offb_init+0xf4/0x120
[c000000002543e70] c00000000054ab38 .do_initcalls+0x68/0x134
[c000000002543f00] c00000000000c270 .init+0x7c/0x1d8
[c000000002543f90] c0000000000135c8 .kernel_thread+0x4c/0x6c
0:mon>

I looked through the January and February list archives but couldn't
find anything similar. Anyone know what this might be, before I start
some investigation? No problem booting the linux-2.6.11-rc2 kernel,
but I experience the above situation after applying the -mm2 patch.

Thanks,
--
Mike

From anton at samba.org Thu Feb 10 16:26:35 2005
From: anton at samba.org (Anton Blanchard)
Date: Thu, 10 Feb 2005 16:26:35 +1100
Subject: linux-2.6.11-rc2-mm2
In-Reply-To: <20050210052213.GA5056@w-mikek2.ibm.com>
References: <20050210052213.GA5056@w-mikek2.ibm.com>
Message-ID: <20050210052635.GK5567@krispykreme.ozlabs.ibm.com>

> I just got a brand new 720 and attempted to boot linux-2.6.11-rc2-mm2
> (which is the current base for hotplug memory work) with the default
> configuration. Only got as far as ...

It will probably boot with offb off. We tried to hit a bad IO address
and the hypervisor caught us.

Anton

From olof at austin.ibm.com Thu Feb 10 17:39:42 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Thu, 10 Feb 2005 00:39:42 -0600
Subject: [PATCH] build without PCI or VIO
In-Reply-To: <20050209183437.30302d44.sfr@canb.auug.org.au>
References: <20050209183437.30302d44.sfr@canb.auug.org.au>
Message-ID: <20050210063942.GA11093@austin.ibm.com>

On Wed, Feb 09, 2005 at 06:34:37PM +1100, Stephen Rothwell wrote:
> Hi Anton, all,
>
> This patch (on top of my previous dma fix up patch) allows you to build
> pSeries without CONFIG_PCI or CONFIG_VIO or both and iSeries without PCI.
> Don't look too closely at the include/asm-ppc64/floppy.h patch :-).
>
> Built on pSeries without PCI and VIO (and both). Built and booted in
> iSeries without PCI.
>
> Please comment.

I'm not sure just what to think about this patch.
:-) It's neat to be
able to boot pSeries without PCI configured, but how much does it really
buy us?

Building a pSeries allnoconfig + setting EMBEDDED=y and disabling PCI,
vs. allnoconfig + setting EMBEDDED=y and keeping PCI enabled (but all
else is disabled) gives:

-rwxr-xr-x 1 olof olof 1963982 2005-02-09 23:56 vmlinux.nopci
-rwxr-xr-x 1 olof olof 2099167 2005-02-09 23:58 vmlinux

6.8% difference for a kernel completely without drivers.

-rwxr-xr-x 1 olof olof 2127661 2005-02-10 00:35 vmlinux.vio-and-pci
-rwxr-xr-x 1 olof olof 2011044 2005-02-10 00:33 vmlinux.vio-no-pci

5.7% on vio (with hvc only) vs vio (with hvc only) + pci

Does the amount of added #ifdefs to C code justify the savings? How
common do we think it will be to run completely without PCI, and how
critical will it be to save those bytes of memory? I guess fully virtual
partitions are the most likely user, i.e. with CONFIG_VIO but without
CONFIG_PCI.

Anyway, if people feel it's worth merging, here are some general
comments:

* CONFIG_IBMIOMMU is somewhat misleading. We have IOMMUs on other
  hardware than those with IBM badges, albeit we don't support hotplug
  on them. I guess the IBM part might come from the fact that we call it
  CONFIG_IBMVIO. :)
  --> Is CONFIG_IOMMU too generic to use instead? It's not used anywhere
      else yet.

* I'm worried about the amount of new #ifdefs, for two reasons. First is
  readability, second is risk of breaking non-PCI config with new
  changes.
  --> Maybe defining empty inline stubs but still calling them is more
      appropriate for the symbols that need to be considered? How is
      this handled on other architectures?

* If you want to go the whole way, then there's more to do to save
  space. For example, the iommu_table pointer in struct device_node
  and some of the ppc_md pointers. They'd be removable only if both PCI
  and VIO are disabled which I find highly unlikely in most production
  configs.
:-)

-Olof

From olh at suse.de Thu Feb 10 19:20:43 2005
From: olh at suse.de (Olaf Hering)
Date: Thu, 10 Feb 2005 09:20:43 +0100
Subject: [PATCH] typo in arch/ppc64/kernel/prom_init.c prom_debug
Message-ID: <20050210082043.GA30336@suse.de>

local variable is base, not vbase.

Signed-off-by: Olaf Hering

diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/prom_init.c ./arch/ppc64/kernel/prom_init.c
--- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/prom_init.c	2005-02-03 02:56:48.000000000 +0100
+++ ./arch/ppc64/kernel/prom_init.c	2005-02-10 09:17:42.279770573 +0100
@@ -845,7 +845,7 @@ static void __init prom_initialize_tce_t
 
 		prom_debug("TCE table: %s\n", path);
 		prom_debug("\tnode = 0x%x\n", node);
-		prom_debug("\tbase = 0x%x\n", vbase);
+		prom_debug("\tbase = 0x%x\n", base);
 		prom_debug("\tsize = 0x%x\n", minsize);
 
 		/* Initialize the table to have a one-to-one mapping

From olh at suse.de Thu Feb 10 19:25:29 2005
From: olh at suse.de (Olaf Hering)
Date: Thu, 10 Feb 2005 09:25:29 +0100
Subject: p620 hangs instantiating rtas at 0x00000000deadbeef
In-Reply-To: <1107994004.7687.154.camel@gaston>
References: <20050209150654.GA16640@suse.de> <20050209222801.GA24113@suse.de> <1107994004.7687.154.camel@gaston>
Message-ID: <20050210082529.GB30336@suse.de>

On Thu, Feb 10, Benjamin Herrenschmidt wrote:

> On Wed, 2005-02-09 at 23:28 +0100, Olaf Hering wrote:
> > On Wed, Feb 09, Olaf Hering wrote:
> >
> > >
> > > Current Linus tree hangs on p620, xmon does not trigger.
> > > rc3 was already broken.
> > > And 2.6.10 doesnt work either...
> >
> > It broke between 2.6.9-rc2 and -rc3
>
> Can you enable debug stuff in prom_init.c ?

Doesn't look very verbose:

BOOTP S = 1
FILE: orange
Load Addr=0x4000 Max Size=0xbfc000
FINAL Packet Count = 5793
FINAL File Size = 2965713 bytes.

zImage starting: loaded at 0x400000
Allocating 0x94c000 bytes for kernel ...
gunzipping (0x2100000 <- 0x407000:0x6c217a)...done 0x7e23d8 bytes
0xe4ac bytes of heap consumed, max in use 0xa2a8
OF stdout device is: /pci@fff7f09000/isa@10/serial@i3f8
command line:
memory layout at init:
  alloc_bottom : 0000000002960000
  alloc_top    : 0000000040000000
  alloc_top_hi : 0000000100000000
  rmo_top      : 0000000040000000
  ram_top      : 0000000100000000
Looking for displays
found display : /pci@fff7f0a000/pci@b,4/display@1, opening ... done
opening PHB /pci@fff7f09000... done
opening PHB /pci@fff7f09000/pci@b... done
opening PHB /pci@fff7f09000/pci@b,2... done
opening PHB /pci@fff7f09000/pci@b,4... done
opening PHB /pci@fff7f09000/pci@b,6... done
opening PHB /pci@fff7f0a000... done
opening PHB /pci@fff7f0a000/pci@b... done
opening PHB /pci@fff7f0a000/pci@b,2... done
opening PHB /pci@fff7f0a000/pci@b,4... done
opening PHB /pci@fff7f0a000/pci@b,6... done
opening PHB /pci@fff7f0a000/pci@c... done
opening PHB /pci@fff7f0a000/pci@c,2... done
opening PHB /pci@fff7f0a000/pci@c,4... done
opening PHB /pci@fff7f0a000/pci@c,6... done
instantiating rtas at 0x00000000deadbeef... failed
0000000000000000 : boot cpu 0000000000000000
0000000000000001 : starting cpu hw idx 0000000000000002... done
0000000000000002 : starting cpu hw idx 0000000000000004... done
0000000000000003 : starting cpu hw idx 0000000000000006... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000002a61000 -> 0x0000000002a621df
Device tree struct 0x0000000002a63000 -> 0x0000000002a72000
Calling quiesce ...
returning from prom_init

From amodra at bigpond.net.au Thu Feb 10 21:52:16 2005
From: amodra at bigpond.net.au (Alan Modra)
Date: Thu, 10 Feb 2005 21:22:16 +1030
Subject: Fix pseries hcall functions
In-Reply-To: <20050207034418.GD5567@krispykreme.ozlabs.ibm.com>
References: <20050207034418.GD5567@krispykreme.ozlabs.ibm.com>
Message-ID: <20050210105216.GL22497@bubble.modra.org>

On Mon, Feb 07, 2005 at 02:44:18PM +1100, Anton Blanchard wrote:
>  _GLOBAL(plpar_hcall)
>  	mfcr	r0

Do you really need to save cr? ie. Does the hypervisor call trash
cr2, cr3 or cr4?

>  _GLOBAL(plpar_hcall_4out)
>  	mfcr	r0
> -	std	r0,-8(r1)
> -	ld	r14,112(r1)
> -	stdu	r1,-48(r1)
> -
> -	std	r8,32(r1)	/* Save out ptrs. */
> -	std	r9,24(r1)
> -	std	r10,16(r1)
> -	std	r14,8(r1)
> -
> -	HVSC			/* invoke the hypervisor */
> +	std	r0,8(r1)

stw here.

--
Alan Modra
IBM OzLabs - Linux Technology Centre

From anton at samba.org Thu Feb 10 22:03:48 2005
From: anton at samba.org (Anton Blanchard)
Date: Thu, 10 Feb 2005 22:03:48 +1100
Subject: Fix pseries hcall functions
In-Reply-To: <20050210105216.GL22497@bubble.modra.org>
References: <20050207034418.GD5567@krispykreme.ozlabs.ibm.com> <20050210105216.GL22497@bubble.modra.org>
Message-ID: <20050210110348.GL5567@krispykreme.ozlabs.ibm.com>

Hi Alan,

> Do you really need to save cr? ie. Does the hypervisor call trash
> cr2, cr3 or cr4?

Good question. I went back and read the spec, it only trashes 0, 1, 5-7.

> >  _GLOBAL(plpar_hcall_4out)
> >  	mfcr	r0
> > -	std	r0,-8(r1)
> > -	ld	r14,112(r1)
> > -	stdu	r1,-48(r1)
> > -
> > -	std	r8,32(r1)	/* Save out ptrs. */
> > -	std	r9,24(r1)
> > -	std	r10,16(r1)
> > -	std	r14,8(r1)
> > -
> > -	HVSC			/* invoke the hypervisor */
> > +	std	r0,8(r1)
>
> stw here.

Ouch, good catch.

Anton

From sfr at canb.auug.org.au Fri Feb 11 00:24:07 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Fri, 11 Feb 2005 00:24:07 +1100
Subject: [PATCH] build without PCI or VIO
In-Reply-To: <20050210063942.GA11093@austin.ibm.com>
References: <20050209183437.30302d44.sfr@canb.auug.org.au> <20050210063942.GA11093@austin.ibm.com>
Message-ID: <20050211002407.0f94536f.sfr@canb.auug.org.au>

Hi Olof,

On Thu, 10 Feb 2005 00:39:42 -0600 olof at austin.ibm.com (Olof Johansson) wrote:
>
> I'm not sure just what to think about this patch. :-) It's neat to be
> able to boot pSeries without PCI configured, but how much does it really
> buy us?

We do it because we can? :-)

> * CONFIG_IBMIOMMU is somewhat misleading. We have IOMMUs on other
>   hardware than those with IBM badges, albeit we don't support hotplug
>   on them. I guess the IBM part might come from the fact that we call it
>   CONFIG_IBMVIO. :)
>   --> Is CONFIG_IOMMU too generic to use instead? It's not used anywhere
>       else yet.

I agree entirely and was thinking that as I posted the patch. I have used
CONFIG_PPC_IOMMU in the new version.

> * I'm worried about the amount of new #ifdefs, for two reasons. First is
>   readability, second is risk of breaking non-PCI config with new
>   changes.
>   --> Maybe defining empty inline stubs but still calling them is more
>       appropriate for the symbols that need to be considered? How is
>       this handled on other architectures?

OK, read the new patch, I have left most of the code alone now as you
suggested.

> * If you want to go the whole way, then there's more to do to save
>   space. For example, the iommu_table pointer in struct device_node
>   and some of the ppc_md pointers. They'd be removable only if both PCI
>   and VIO are disabled which I find highly unlikely in most production
>   configs. :-)

Again I agree, but I will look for more stuff to remove anyway.

Below is a new version of the patch.
It has been compiled on pSeries and iSeries with PCI disabled and on pSeries with VIO disabled. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linus-bk-dma.4/arch/ppc64/Kconfig linus-bk-dma.6/arch/ppc64/Kconfig --- linus-bk-dma.4/arch/ppc64/Kconfig 2005-01-29 06:05:47.000000000 +1100 +++ linus-bk-dma.6/arch/ppc64/Kconfig 2005-02-10 18:05:03.000000000 +1100 @@ -126,6 +126,11 @@ config IBMVIO depends on PPC_PSERIES || PPC_ISERIES + bool "Support for virtual I/O" if (EMBEDDED && PPC_PSERIES) + default y + +config PPC_IOMMU + depends on IBMVIO || PCI bool default y @@ -236,7 +241,7 @@ config EEH bool "PCI Extended Error Handling (EEH)" if EMBEDDED - depends on PPC_PSERIES + depends on PPC_PSERIES && PCI default y if !EMBEDDED # @@ -295,7 +300,7 @@ bool config PCI - bool + bool "support for PCI devices" if (EMBEDDED && (PPC_PSERIES || PPC_ISERIES)) default y help Find out whether your system includes a PCI bus. PCI is the name of diff -ruN linus-bk-dma.4/arch/ppc64/kernel/Makefile linus-bk-dma.6/arch/ppc64/kernel/Makefile --- linus-bk-dma.4/arch/ppc64/kernel/Makefile 2005-01-29 06:05:47.000000000 +1100 +++ linus-bk-dma.6/arch/ppc64/kernel/Makefile 2005-02-10 18:04:53.000000000 +1100 @@ -11,27 +11,32 @@ udbg.o binfmt_elf32.o sys_ppc32.o ioctl32.o \ ptrace32.o signal32.o rtc.o init_task.o \ lmb.o cputable.o cpu_setup_power4.o idle_power4.o \ - iommu.o sysfs.o + sysfs.o obj-$(CONFIG_PPC_OF) += of_device.o -pci-obj-$(CONFIG_PPC_ISERIES) += iSeries_pci.o iSeries_pci_reset.o +pci-obj-$(CONFIG_PPC_PSERIES) += pSeries_pci.o +pci-obj-$(CONFIG_PPC_ISERIES) += iSeries_pci.o iSeries_pci_reset.o \ + XmPciLpEvent.o pci-obj-$(CONFIG_PPC_MULTIPLATFORM) += pci_dn.o pci_direct_iommu.o obj-$(CONFIG_PCI) += pci.o pci_iommu.o iomap.o $(pci-obj-y) -obj-$(CONFIG_PPC_ISERIES) += iSeries_irq.o \ - iSeries_VpdInfo.o XmPciLpEvent.o \ +iommu-obj-$(CONFIG_PPC_PSERIES) += pSeries_iommu.o +iommu-obj-$(CONFIG_PPC_ISERIES) += iSeries_iommu.o + 
+obj-$(CONFIG_PPC_IOMMU) += iommu.o $(iommu-obj-y) + +obj-$(CONFIG_PPC_ISERIES) += iSeries_irq.o iSeries_VpdInfo.o \ HvCall.o HvLpConfig.o LparData.o \ iSeries_setup.o ItLpQueue.o hvCall.o \ - mf.o HvLpEvent.o iSeries_proc.o iSeries_htab.o \ - iSeries_iommu.o + mf.o HvLpEvent.o iSeries_proc.o iSeries_htab.o obj-$(CONFIG_PPC_MULTIPLATFORM) += nvram.o i8259.o prom_init.o prom.o mpic.o -obj-$(CONFIG_PPC_PSERIES) += pSeries_pci.o pSeries_lpar.o pSeries_hvCall.o \ +obj-$(CONFIG_PPC_PSERIES) += pSeries_lpar.o pSeries_hvCall.o \ pSeries_nvram.o rtasd.o ras.o \ - xics.o rtas.o pSeries_setup.o pSeries_iommu.o + xics.o rtas.o pSeries_setup.o obj-$(CONFIG_EEH) += eeh.o obj-$(CONFIG_PROC_FS) += proc_ppc64.o diff -ruN linus-bk-dma.4/arch/ppc64/kernel/dma.c linus-bk-dma.6/arch/ppc64/kernel/dma.c --- linus-bk-dma.4/arch/ppc64/kernel/dma.c 2005-02-07 17:47:41.000000000 +1100 +++ linus-bk-dma.6/arch/ppc64/kernel/dma.c 2005-02-08 17:10:00.000000000 +1100 @@ -15,8 +15,10 @@ static struct dma_mapping_ops *get_dma_ops(struct device *dev) { +#ifdef CONFIG_PCI if (dev->bus == &pci_bus_type) return &pci_dma_ops; +#endif #ifdef CONFIG_IBMVIO if (dev->bus == &vio_bus_type) return &vio_dma_ops; @@ -37,8 +39,10 @@ int dma_set_mask(struct device *dev, u64 dma_mask) { +#ifdef CONFIG_PCI if (dev->bus == &pci_bus_type) return pci_set_dma_mask(to_pci_dev(dev), dma_mask); +#endif #ifdef CONFIG_IBMVIO if (dev->bus == &vio_bus_type) return -EIO; diff -ruN linus-bk-dma.4/arch/ppc64/kernel/iSeries_iommu.c linus-bk-dma.6/arch/ppc64/kernel/iSeries_iommu.c --- linus-bk-dma.4/arch/ppc64/kernel/iSeries_iommu.c 2005-01-09 10:05:39.000000000 +1100 +++ linus-bk-dma.6/arch/ppc64/kernel/iSeries_iommu.c 2005-02-10 18:23:53.000000000 +1100 @@ -34,9 +34,6 @@ #include #include -extern struct list_head iSeries_Global_Device_List; - - static void tce_build_iSeries(struct iommu_table *tbl, long index, long npages, unsigned long uaddr, enum dma_data_direction direction) { @@ -84,6 +81,7 @@ } +#ifdef CONFIG_PCI /* * This 
function compares the known tables to find an iommu_table * that has already been built for hardware TCEs. @@ -159,6 +157,7 @@ else kfree(tbl); } +#endif static void iommu_dev_setup_iSeries(struct pci_dev *dev) { } static void iommu_bus_setup_iSeries(struct pci_bus *bus) { } diff -ruN linus-bk-dma.4/arch/ppc64/kernel/iSeries_setup.c linus-bk-dma.6/arch/ppc64/kernel/iSeries_setup.c --- linus-bk-dma.4/arch/ppc64/kernel/iSeries_setup.c 2005-01-09 10:05:39.000000000 +1100 +++ linus-bk-dma.6/arch/ppc64/kernel/iSeries_setup.c 2005-02-10 18:55:30.000000000 +1100 @@ -45,7 +45,7 @@ #include #include #include -#include +#include #include #include #include @@ -73,7 +73,11 @@ static void setup_iSeries_cache_sizes(void); static void iSeries_bolt_kernel(unsigned long saddr, unsigned long eaddr); extern void iSeries_setup_arch(void); +#ifdef CONFIG_PCI extern void iSeries_pci_final_fixup(void); +#else +static void iSeries_pci_final_fixup(void) { } +#endif /* Global Variables */ static unsigned long procFreqHz; diff -ruN linus-bk-dma.4/arch/ppc64/kernel/pSeries_iommu.c linus-bk-dma.6/arch/ppc64/kernel/pSeries_iommu.c --- linus-bk-dma.4/arch/ppc64/kernel/pSeries_iommu.c 2005-02-04 04:10:36.000000000 +1100 +++ linus-bk-dma.6/arch/ppc64/kernel/pSeries_iommu.c 2005-02-10 19:07:48.000000000 +1100 @@ -236,6 +236,7 @@ } } +#ifdef CONFIG_PCI static void iommu_table_setparms(struct pci_controller *phb, struct device_node *dn, struct iommu_table *tbl) @@ -454,6 +455,11 @@ DBG("iommu_dev_setup_pSeries, dev %p (%s) has no iommu table\n", dev, dev->pretty_name); } } +#else +#define iommu_bus_setup_pSeries iommu_bus_setup_null +#define iommu_bus_setup_pSeriesLP iommu_bus_setup_null +#define iommu_dev_setup_pSeries iommu_dev_setup_null +#endif static void iommu_bus_setup_null(struct pci_bus *b) { } static void iommu_dev_setup_null(struct pci_dev *d) { } diff -ruN linus-bk-dma.4/arch/ppc64/kernel/pSeries_setup.c linus-bk-dma.6/arch/ppc64/kernel/pSeries_setup.c --- 
linus-bk-dma.4/arch/ppc64/kernel/pSeries_setup.c 2005-01-29 06:05:47.000000000 +1100 +++ linus-bk-dma.6/arch/ppc64/kernel/pSeries_setup.c 2005-02-10 19:04:43.000000000 +1100 @@ -71,7 +71,11 @@ #define DBG(fmt...) #endif +#ifdef CONFIG_PCI extern void pSeries_final_fixup(void); +#else +static void pSeries_final_fixup(void) { } +#endif extern void pSeries_get_boot_time(struct rtc_time *rtc_time); extern void pSeries_get_rtc_time(struct rtc_time *rtc_time); diff -ruN linus-bk-dma.4/arch/ppc64/kernel/pci.c linus-bk-dma.6/arch/ppc64/kernel/pci.c --- linus-bk-dma.4/arch/ppc64/kernel/pci.c 2005-02-07 14:45:23.000000000 +1100 +++ linus-bk-dma.6/arch/ppc64/kernel/pci.c 2005-02-09 16:19:48.000000000 +1100 @@ -63,7 +63,9 @@ * page is mapped and isa_io_limit prevents access to it. */ unsigned long isa_io_base; /* NULL if no ISA bus */ +EXPORT_SYMBOL(isa_io_base); unsigned long pci_io_base; +EXPORT_SYMBOL(pci_io_base); void iSeries_pcibios_init(void); diff -ruN linus-bk-dma.4/arch/ppc64/kernel/pci.h linus-bk-dma.6/arch/ppc64/kernel/pci.h --- linus-bk-dma.4/arch/ppc64/kernel/pci.h 2005-01-12 16:05:22.000000000 +1100 +++ linus-bk-dma.6/arch/ppc64/kernel/pci.h 2005-02-10 19:05:03.000000000 +1100 @@ -9,6 +9,7 @@ #ifndef __PPC_KERNEL_PCI_H__ #define __PPC_KERNEL_PCI_H__ +#include #include #include @@ -23,7 +24,11 @@ extern struct list_head hose_list; extern int global_phb_number; +#ifdef CONFIG_PCI extern unsigned long find_and_init_phbs(void); +#else +static inline unsigned long find_and_init_phbs(void) { return 0; } +#endif extern struct pci_dev *ppc64_isabridge_dev; /* may be NULL if no ISA bus */ @@ -42,7 +47,11 @@ void pci_addr_cache_remove_device(struct pci_dev *dev); /* From pSeries_pci.h */ -void init_pci_config_tokens (void); +#ifdef CONFIG_PCI +extern void init_pci_config_tokens (void); +#else +static inline void init_pci_config_tokens (void) { } +#endif unsigned long get_phb_buid (struct device_node *); extern unsigned long pci_probe_only; diff -ruN 
linus-bk-dma.4/arch/ppc64/kernel/ppc_ksyms.c linus-bk-dma.6/arch/ppc64/kernel/ppc_ksyms.c --- linus-bk-dma.4/arch/ppc64/kernel/ppc_ksyms.c 2005-01-12 16:05:22.000000000 +1100 +++ linus-bk-dma.6/arch/ppc64/kernel/ppc_ksyms.c 2005-02-09 16:20:07.000000000 +1100 @@ -49,9 +49,6 @@ EXPORT_SYMBOL(do_signal); -EXPORT_SYMBOL(isa_io_base); -EXPORT_SYMBOL(pci_io_base); - EXPORT_SYMBOL(strcpy); EXPORT_SYMBOL(strncpy); EXPORT_SYMBOL(strcat); diff -ruN linus-bk-dma.4/arch/ppc64/kernel/sys_ppc32.c linus-bk-dma.6/arch/ppc64/kernel/sys_ppc32.c --- linus-bk-dma.4/arch/ppc64/kernel/sys_ppc32.c 2005-01-29 06:05:47.000000000 +1100 +++ linus-bk-dma.6/arch/ppc64/kernel/sys_ppc32.c 2005-02-08 17:26:43.000000000 +1100 @@ -741,6 +741,7 @@ asmlinkage int sys32_pciconfig_iobase(u32 which, u32 in_bus, u32 in_devfn) { +#ifdef CONFIG_PCI struct pci_controller* hose; struct list_head *ln; struct pci_bus *bus = NULL; @@ -786,7 +787,7 @@ case IOBASE_ISA_MEM: return -EINVAL; } - +#endif return -EOPNOTSUPP; } diff -ruN linus-bk-dma.4/arch/ppc64/lib/Makefile linus-bk-dma.6/arch/ppc64/lib/Makefile --- linus-bk-dma.4/arch/ppc64/lib/Makefile 2005-01-04 17:05:28.000000000 +1100 +++ linus-bk-dma.6/arch/ppc64/lib/Makefile 2005-02-08 17:34:53.000000000 +1100 @@ -12,7 +12,7 @@ # e2a provides EBCDIC to ASCII conversions. ifdef CONFIG_PPC_ISERIES -obj-$(CONFIG_PCI) += e2a.o +obj-y += e2a.o endif lib-$(CONFIG_DEBUG_KERNEL) += sstep.o diff -ruN linus-bk-dma.4/drivers/char/Kconfig linus-bk-dma.6/drivers/char/Kconfig --- linus-bk-dma.4/drivers/char/Kconfig 2005-02-04 04:10:36.000000000 +1100 +++ linus-bk-dma.6/drivers/char/Kconfig 2005-02-09 16:33:37.000000000 +1100 @@ -557,7 +557,7 @@ config HVC_CONSOLE bool "pSeries Hypervisor Virtual Console support" - depends on PPC_PSERIES + depends on PPC_PSERIES && IBMVIO help pSeries machines when partitioned support a hypervisor virtual console. 
This driver allows each pSeries partition to have a console @@ -565,7 +565,7 @@ config HVCS tristate "IBM Hypervisor Virtual Console Server support" - depends on PPC_PSERIES + depends on PPC_PSERIES && IBMVIO help Partitionable IBM Power5 ppc64 machines allow hosting of firmware virtual consoles from one Linux partition by diff -ruN linus-bk-dma.4/drivers/net/Kconfig linus-bk-dma.6/drivers/net/Kconfig --- linus-bk-dma.4/drivers/net/Kconfig 2005-01-20 07:06:57.000000000 +1100 +++ linus-bk-dma.6/drivers/net/Kconfig 2005-02-09 18:26:34.000000000 +1100 @@ -1171,7 +1171,7 @@ config IBMVETH tristate "IBM LAN Virtual Ethernet support" - depends on NETDEVICES && NET_ETHERNET && PPC_PSERIES + depends on NETDEVICES && NET_ETHERNET && PPC_PSERIES && IBMVIO ---help--- This driver supports virtual ethernet adapters on newer IBM iSeries and pSeries systems. diff -ruN linus-bk-dma.4/drivers/pci/hotplug/Makefile linus-bk-dma.6/drivers/pci/hotplug/Makefile --- linus-bk-dma.4/drivers/pci/hotplug/Makefile 2004-11-20 12:05:26.000000000 +1100 +++ linus-bk-dma.6/drivers/pci/hotplug/Makefile 2005-02-09 16:46:14.000000000 +1100 @@ -42,8 +42,10 @@ rpaphp-objs := rpaphp_core.o \ rpaphp_pci.o \ - rpaphp_slot.o \ - rpaphp_vio.o + rpaphp_slot.o +ifdef CONFIG_IBMVIO +rpaphp-objs += rpaphp_vio.o +endif rpadlpar_io-objs := rpadlpar_core.o \ rpadlpar_sysfs.o diff -ruN linus-bk-dma.4/drivers/pci/hotplug/rpaphp.h linus-bk-dma.6/drivers/pci/hotplug/rpaphp.h --- linus-bk-dma.4/drivers/pci/hotplug/rpaphp.h 2005-02-04 06:05:19.000000000 +1100 +++ linus-bk-dma.6/drivers/pci/hotplug/rpaphp.h 2005-02-10 19:29:02.000000000 +1100 @@ -27,7 +27,9 @@ #ifndef _PPC64PHP_H #define _PPC64PHP_H +#include #include +#include #include "pci_hotplug.h" #define PHB 2 @@ -127,10 +129,16 @@ char **drc_name, char **drc_type, int *drc_power_domain); /* rpaphp_vio.c */ +#ifdef CONFIG_IBMVIO extern int rpaphp_get_vio_adapter_status(struct slot *slot, int is_init, u8 * value); -extern int rpaphp_unconfig_vio_adapter(struct slot 
*slot); extern int register_vio_slot(struct device_node *dn); extern int rpaphp_enable_vio_slot(struct slot *slot); +#else +static inline int rpaphp_get_vio_adapter_status(struct slot *slot, int is_init, u8 * value) { return -EINVAL; } +static inline int rpaphp_unconfig_vio_adapter(struct slot *slot) { return -ENODEV; } +static inline int register_vio_slot(struct device_node *dn) { return 1; } +static inline int rpaphp_enable_vio_slot(struct slot *slot) { return -EINVAL; } +#endif /* rpaphp_slot.c */ extern void dealloc_slot_struct(struct slot *slot); diff -ruN linus-bk-dma.4/drivers/scsi/Kconfig linus-bk-dma.6/drivers/scsi/Kconfig --- linus-bk-dma.4/drivers/scsi/Kconfig 2005-01-29 06:05:47.000000000 +1100 +++ linus-bk-dma.6/drivers/scsi/Kconfig 2005-02-09 16:31:46.000000000 +1100 @@ -798,7 +798,7 @@ config SCSI_IBMVSCSI tristate "IBM Virtual SCSI support" - depends on PPC_PSERIES || PPC_ISERIES + depends on (PPC_PSERIES || PPC_ISERIES) && IBMVIO help This is the IBM POWER Virtual SCSI Client diff -ruN linus-bk-dma.4/drivers/serial/Kconfig linus-bk-dma.6/drivers/serial/Kconfig --- linus-bk-dma.4/drivers/serial/Kconfig 2005-02-04 04:10:37.000000000 +1100 +++ linus-bk-dma.6/drivers/serial/Kconfig 2005-02-08 17:57:30.000000000 +1100 @@ -753,7 +753,7 @@ config SERIAL_ICOM tristate "IBM Multiport Serial Adapter" - depends on PPC_ISERIES || PPC_PSERIES + depends on PCI && (PPC_ISERIES || PPC_PSERIES) select SERIAL_CORE help This driver is for a family of multiport serial adapters diff -ruN linus-bk-dma.4/include/asm-ppc64/floppy.h linus-bk-dma.6/include/asm-ppc64/floppy.h --- linus-bk-dma.4/include/asm-ppc64/floppy.h 2004-10-25 18:18:34.000000000 +1000 +++ linus-bk-dma.6/include/asm-ppc64/floppy.h 2005-02-09 13:56:34.000000000 +1100 @@ -31,8 +31,6 @@ "floppy", NULL) #define fd_free_irq() free_irq(FLOPPY_IRQ, NULL); -#ifdef CONFIG_PCI - #include #define fd_dma_setup(addr,size,mode,io) ppc64_fd_dma_setup(addr,size,mode,io) @@ -40,6 +38,7 @@ static __inline__ int 
ppc64_fd_dma_setup(char *addr, unsigned long size, int mode, int io) { +#ifdef CONFIG_PCI static unsigned long prev_size; static dma_addr_t bus_addr = 0; static char *prev_addr; @@ -71,11 +70,11 @@ fd_set_dma_count(size); virtual_dma_port = io; fd_enable_dma(); +#endif /* CONFIG_PCI */ return 0; } -#endif /* CONFIG_PCI */ __inline__ void virtual_dma_init(void) { diff -ruN linus-bk-dma.4/include/asm-ppc64/iSeries/XmPciLpEvent.h linus-bk-dma.6/include/asm-ppc64/iSeries/XmPciLpEvent.h --- linus-bk-dma.4/include/asm-ppc64/iSeries/XmPciLpEvent.h 2002-02-14 23:14:36.000000000 +1100 +++ linus-bk-dma.6/include/asm-ppc64/iSeries/XmPciLpEvent.h 2005-02-10 18:46:28.000000000 +1100 @@ -1,18 +1,13 @@ - #ifndef __XMPCILPEVENT_H__ #define __XMPCILPEVENT_H__ +#include -#ifdef __cplusplus -extern "C" { +#ifdef CONFIG_PCI +extern int XmPciLpEvent_init(void); +#else +static inline int XmPciLpEvent_init(void) { return 0; } #endif - -int XmPciLpEvent_init(void); void ppc_irq_dispatch_handler(struct pt_regs *regs, int irq); - -#ifdef __cplusplus -} -#endif - #endif /* __XMPCILPEVENT_H__ */ diff -ruN linus-bk-dma.4/include/asm-ppc64/iSeries/iSeries_io.h linus-bk-dma.6/include/asm-ppc64/iSeries/iSeries_io.h --- linus-bk-dma.4/include/asm-ppc64/iSeries/iSeries_io.h 2004-09-14 21:06:08.000000000 +1000 +++ linus-bk-dma.6/include/asm-ppc64/iSeries/iSeries_io.h 2005-02-09 17:57:36.000000000 +1100 @@ -31,6 +31,7 @@ /* Created December 28, 2000 */ /* End Change Activity */ /************************************************************************/ +#ifdef CONFIG_PCI extern u8 iSeries_Read_Byte(const volatile void __iomem * IoAddress); extern u16 iSeries_Read_Word(const volatile void __iomem * IoAddress); extern u32 iSeries_Read_Long(const volatile void __iomem * IoAddress); @@ -41,6 +42,15 @@ extern void iSeries_memset_io(volatile void __iomem *dest, char x, size_t n); extern void iSeries_memcpy_toio(volatile void __iomem *dest, void *source, size_t n); extern void iSeries_memcpy_fromio(void 
*dest, const volatile void __iomem *source, size_t n); +#else /* CONFIG_PCI */ +static inline u8 iSeries_Read_Byte(const volatile void __iomem * IoAddress) +{ + return 0xff; +} +static inline void iSeries_Write_Byte(u8 IoData, volatile void __iomem * IoAddress) +{ +} +#endif /* CONFIG_PCI */ #endif /* CONFIG_PPC_ISERIES */ #endif /* _ISERIES_IO_H */ diff -ruN linus-bk-dma.4/include/asm-ppc64/iSeries/iSeries_irq.h linus-bk-dma.6/include/asm-ppc64/iSeries/iSeries_irq.h --- linus-bk-dma.4/include/asm-ppc64/iSeries/iSeries_irq.h 2004-01-20 08:20:26.000000000 +1100 +++ linus-bk-dma.6/include/asm-ppc64/iSeries/iSeries_irq.h 2005-02-10 18:37:57.000000000 +1100 @@ -1,19 +1,9 @@ #ifndef __ISERIES_IRQ_H__ #define __ISERIES_IRQ_H__ -#ifdef __cplusplus -extern "C" { -#endif - void iSeries_init_IRQ(void); int iSeries_allocate_IRQ(HvBusNumber, HvSubBusNumber, HvAgentId); int iSeries_assign_IRQ(int, HvBusNumber, HvSubBusNumber, HvAgentId); void iSeries_activate_IRQs(void); -int XmPciLpEvent_init(void); - -#ifdef __cplusplus -} -#endif - #endif /* __ISERIES_IRQ_H__ */ diff -ruN linus-bk-dma.4/include/asm-ppc64/iSeries/iSeries_pci.h linus-bk-dma.6/include/asm-ppc64/iSeries/iSeries_pci.h --- linus-bk-dma.4/include/asm-ppc64/iSeries/iSeries_pci.h 2005-01-22 06:09:02.000000000 +1100 +++ linus-bk-dma.6/include/asm-ppc64/iSeries/iSeries_pci.h 2005-02-10 18:54:48.000000000 +1100 @@ -101,6 +101,8 @@ char Location[20]; /* Frame 1, Card C10 */ }; +extern struct list_head iSeries_Global_Device_List; + /************************************************************************/ /* Functions */ /************************************************************************/ diff -ruN linus-bk-dma.4/include/asm-ppc64/io.h linus-bk-dma.6/include/asm-ppc64/io.h --- linus-bk-dma.4/include/asm-ppc64/io.h 2005-01-29 06:05:47.000000000 +1100 +++ linus-bk-dma.6/include/asm-ppc64/io.h 2005-02-09 15:53:33.000000000 +1100 @@ -1,4 +1,4 @@ - #ifndef _PPC64_IO_H +#ifndef _PPC64_IO_H #define _PPC64_IO_H /* @@ -31,6 
+31,7 @@ #define SLOW_DOWN_IO +#ifdef CONFIG_PCI extern unsigned long isa_io_base; extern unsigned long pci_io_base; extern unsigned long io_page_mask; @@ -39,6 +40,10 @@ #define _IO_IS_VALID(port) ((port) >= MAX_ISA_PORT || (1 << (port>>PAGE_SHIFT)) \ & io_page_mask) +#else +#define pci_io_base 0 +#define _IO_IS_VALID(port) 1 +#endif #ifdef CONFIG_PPC_ISERIES /* __raw_* accessors aren't supported on iSeries */ diff -ruN linus-bk-dma.4/include/asm-ppc64/iommu.h linus-bk-dma.6/include/asm-ppc64/iommu.h --- linus-bk-dma.4/include/asm-ppc64/iommu.h 2005-02-07 15:02:01.000000000 +1100 +++ linus-bk-dma.6/include/asm-ppc64/iommu.h 2005-02-10 19:12:15.000000000 +1100 @@ -109,7 +109,11 @@ extern void iommu_setup_u3(void); /* Frees table for an individual device node */ +#ifdef CONFIG_PPC_IOMMU extern void iommu_free_table(struct device_node *dn); +#else +static inline void iommu_free_table(struct device_node *dn) { } +#endif #endif /* CONFIG_PPC_MULTIPLATFORM */ @@ -154,12 +158,22 @@ extern void iommu_unmap_single(struct iommu_table *tbl, dma_addr_t dma_handle, size_t size, enum dma_data_direction direction); +#ifdef CONFIG_PPC_IOMMU extern void iommu_init_early_pSeries(void); extern void iommu_init_early_iSeries(void); extern void iommu_init_early_u3(void); +#else +static inline void iommu_init_early_pSeries(void) {} +static inline void iommu_init_early_iSeries(void) {} +#endif +#ifdef CONFIG_PCI extern void pci_iommu_init(void); extern void pci_direct_iommu_init(void); +#else +static inline void pci_iommu_init(void) {} +static inline void pci_direct_iommu_init(void) {} +#endif extern void alloc_u3_dart_table(void); -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050211/efaa9cfb/attachment.pgp From linas at austin.ibm.com Fri Feb 11 04:52:53 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 10 Feb 2005 11:52:53 -0600 Subject: [PATCH] build without PCI or VIO In-Reply-To: <20050210063942.GA11093@austin.ibm.com> References: <20050209183437.30302d44.sfr@canb.auug.org.au> <20050210063942.GA11093@austin.ibm.com> Message-ID: <20050210175253.GB23424@austin.ibm.com> On Thu, Feb 10, 2005 at 12:39:42AM -0600, Olof Johansson was heard to remark: > On Wed, Feb 09, 2005 at 06:34:37PM +1100, Stephen Rothwell wrote: > > Hi Anton, all, > > > > This patch (on top of my previous dma fix up patch) allows you to build > > pSeries without CONFIG_PCI or CONFIG_VIO or both and iSeries without PCI. > > Don't look to closely at the include/asm-ppc64/floppy.h patch :-). > > > > Please comment. > > 6.8% difference for a kernel completely without drivers. ... A penny here and a penny there and pretty soon it adds up to real money. If someone else reduces things by 5% in a couple of other places, then the original 6.8% savings becomes even larger. > Does the amount of added #ifdefs to C code justify the savings? How The patch didn't strike me as excessively intrusive. Although I don't know of the particular application for this, past experience says that little things like this make the difference between a "joke" and a "serious contender". I once lost a linux-embedded contract because I was unable to make the kernel small enough, even after turning *everything* off. (Back then I couldn't turn tcp/ip off, even though the embedded app had absolutely no need for tcp/ip). 
--linas

From olh at suse.de Fri Feb 11 06:43:58 2005
From: olh at suse.de (Olaf Hering)
Date: Thu, 10 Feb 2005 20:43:58 +0100
Subject: p620 hangs instantiating rtas at 0x00000000deadbeef
In-Reply-To: <20050210193706.GC23424@austin.ibm.com>
References: <20050209150654.GA16640@suse.de> <20050209222801.GA24113@suse.de> <1107994004.7687.154.camel@gaston> <20050210082529.GB30336@suse.de> <20050210193706.GC23424@austin.ibm.com>
Message-ID: <20050210194358.GB22111@suse.de>

On Thu, Feb 10, Linas Vepstas wrote:

> which would print more than what you've posted ...

Yeah, something is going on. But it has to wait for the weekend.

> Maybe I missed this in the original post ... where are you seeing deadbeef?

The hex value printed by "instantiating rtas" is 0xdeadbeef.

From linas at austin.ibm.com Fri Feb 11 06:44:11 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Thu, 10 Feb 2005 13:44:11 -0600
Subject: p620 hangs instantiating rtas at 0x00000000deadbeef
In-Reply-To: <20050210082529.GB30336@suse.de>
References: <20050209150654.GA16640@suse.de> <20050209222801.GA24113@suse.de> <1107994004.7687.154.camel@gaston> <20050210082529.GB30336@suse.de>
Message-ID: <20050210194411.GD23424@austin.ibm.com>

On Thu, Feb 10, 2005 at 09:25:29AM +0100, Olaf Hering was heard to remark:
> On Thu, Feb 10, Benjamin Herrenschmidt wrote:
> > On Wed, 2005-02-09 at 23:28 +0100, Olaf Hering wrote:
> > > On Wed, Feb 09, Olaf Hering wrote:
> > >
> > > > Current Linus tree hangs on p620, xmon does not trigger.
> > > > rc3 was already broken.
> > > > And 2.6.10 doesnt work either...
> > Can you enable debug stuff in prom_init.c ?
...
> instantiating rtas at 0x00000000deadbeef... failed

Dohhhhh I should read more carefully...

Can you also add

    prom_printf("size=0x%x \n", size);

at line 715 of prom_init.c ?
--linas From linas at austin.ibm.com Fri Feb 11 06:37:06 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 10 Feb 2005 13:37:06 -0600 Subject: p620 hangs instantiating rtas at 0x00000000deadbeef In-Reply-To: <20050210082529.GB30336@suse.de> References: <20050209150654.GA16640@suse.de> <20050209222801.GA24113@suse.de> <1107994004.7687.154.camel@gaston> <20050210082529.GB30336@suse.de> Message-ID: <20050210193706.GC23424@austin.ibm.com> On Thu, Feb 10, 2005 at 09:25:29AM +0100, Olaf Hering was heard to remark: > On Thu, Feb 10, Benjamin Herrenschmidt wrote: > > On Wed, 2005-02-09 at 23:28 +0100, Olaf Hering wrote: > > > On Wed, Feb 09, Olaf Hering wrote: > > > > > > > > > > > Current Linus tree hangs on p620, xmon does not trigger. > > > > rc3 was already broken. > > > > And 2.6.10 doesnt work either... > > > > > > It broke between 2.6.9-rc2 and -rc3 > > > > Can you enable debug stuff in prom_init.c ? > > Doesnt look very verbose: ... > Calling quiesce ... > returning from prom_init I don't think you enabled the debugging ... my source code looks like: #define prom_debug(x...) prom_printf(x) prom_printf("returning from prom_init\n"); prom_debug("->dt_header_start=0x%x\n", RELOC(dt_header_start)); prom_debug("->phys=0x%x\n", phys); which would print more than what you've posted ... Maybe I missed this in the original post ... where are you seeing deadbeef? 
--linas

From olh at suse.de Fri Feb 11 07:15:38 2005
From: olh at suse.de (Olaf Hering)
Date: Thu, 10 Feb 2005 21:15:38 +0100
Subject: [PATCH] use vmlinux during make install on ppc64
In-Reply-To: <20050209110038.GA13600@suse.de>
References: <20050207151222.GA7219@suse.de> <16905.58373.707121.332099@cargo.ozlabs.ibm.com> <20050209110038.GA13600@suse.de>
Message-ID: <20050210201538.GA23092@suse.de>

On Wed, Feb 09, Olaf Hering wrote:

> diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/Makefile ./arch/ppc64/Makefile
> --- ../linux-2.6.11-rc3.orig/arch/ppc64/Makefile 2005-02-03 02:55:14.000000000 +0100
> +++ ./arch/ppc64/Makefile 2005-02-09 11:53:12.724975475 +0100
> @@ -65,8 +65,8 @@ boottarget-$(CONFIG_PPC_ISERIES) := vmli
>  $(boottarget-y): vmlinux
>      $(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
>
> -bootimage-$(CONFIG_PPC_PSERIES) := zImage
> -bootimage-$(CONFIG_PPC_MAPLE) := zImage
> +bootimage-$(CONFIG_PPC_PSERIES) := $(boot)/zImage
> +bootimage-$(CONFIG_PPC_MAPLE) := $(boot)/zImage
>  bootimage-$(CONFIG_PPC_ISERIES) := vmlinux
>  BOOTIMAGE := $(bootimage-y)
>  install: vmlinux

That one is fine for make install, but it also loses the zImage target
when I just run make. Somehow I did not notice it...

From olh at suse.de Fri Feb 11 07:18:30 2005
From: olh at suse.de (Olaf Hering)
Date: Thu, 10 Feb 2005 21:18:30 +0100
Subject: p620 hangs instantiating rtas at 0x00000000deadbeef
In-Reply-To: <20050210193706.GC23424@austin.ibm.com>
References: <20050209150654.GA16640@suse.de> <20050209222801.GA24113@suse.de> <1107994004.7687.154.camel@gaston> <20050210082529.GB30336@suse.de> <20050210193706.GC23424@austin.ibm.com>
Message-ID: <20050210201830.GA23150@suse.de>

On Thu, Feb 10, Linas Vepstas wrote:

> I don't think you enabled the debugging ... my source code looks like:
>
> #define prom_debug(x...)
prom_printf(x) Perhaps a broken make dependency: whatever 0xdeadbeef is, perhaps a hint to call prom_exit ;) BOOTP S = 1 FILE: orange Load Addr=0x4000 Max Size=0xbfc000 FINAL Packet Count = 5801 FINAL File Size = 2969809 bytes. zImage starting: loaded at 0x400000 Allocating 0x94c000 bytes for kernel ... gunzipping (0x2100000 <- 0x407000:0x6c3192)...done 0x7e23b8 bytes 0xe60c bytes of heap consumed, max in use 0xa318 OF stdout device is: /pci at fff7f09000/isa at 10/serial at i3f8 klimit=0xc00000000084c000 offset=0xbffffffffdef0000 command line: root_addr_cells: 0000000000000002 root_size_cells: 0000000000000002 scanning memory: node /memory at 0 : 0000000000000000 0000000100000000 memory layout at init: alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 0000000100000000 rmo_top : 0000000040000000 ram_top : 0000000100000000 Booting CPU hw index = 0x0000000000000000 Looking for displays found display : /pci at fff7f0a000/pci at b,4/display at 1, opening ... done starting prom_initialize_tce_table alloc_down(0000000000400000, 0000000000800000, (high)) -> 00000000ff800000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000ff800000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f09000 node = 0x0000000000cc7380 base = 0x00000000ff800000 size = 0x0000000000400000 opening PHB /pci at fff7f09000... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000ff400000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000ff400000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f09000/pci at b node = 0x0000000000cd8560 base = 0x00000000ff400000 size = 0x0000000000400000 opening PHB /pci at fff7f09000/pci at b... 
done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000ff000000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000ff000000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f09000/pci at b,2 node = 0x0000000000cdc5f8 base = 0x00000000ff000000 size = 0x0000000000400000 opening PHB /pci at fff7f09000/pci at b,2... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fec00000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fec00000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f09000/pci at b,4 node = 0x0000000000ce0a88 base = 0x00000000fec00000 size = 0x0000000000400000 opening PHB /pci at fff7f09000/pci at b,4... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fe800000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fe800000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f09000/pci at b,6 node = 0x0000000000ce4f18 base = 0x00000000fe800000 size = 0x0000000000400000 opening PHB /pci at fff7f09000/pci at b,6... done alloc_down(0000000000400000, 0000000000800000, (high)) -> 00000000fe000000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fe000000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000 node = 0x0000000000ce97e0 base = 0x00000000fe000000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fdc00000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fdc00000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at b node = 0x0000000000cec720 base = 0x00000000fdc00000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at b... 
done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fd800000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fd800000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at b,2 node = 0x0000000000cf0b38 base = 0x00000000fd800000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at b,2... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fd400000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fd400000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at b,4 node = 0x0000000000cf4fc8 base = 0x00000000fd400000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at b,4... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fd000000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fd000000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at b,6 node = 0x0000000000cf9458 base = 0x00000000fd000000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at b,6... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fcc00000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fcc00000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at c node = 0x0000000000cfd8e8 base = 0x00000000fcc00000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at c... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fc800000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fc800000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at c,2 node = 0x0000000000d01d88 base = 0x00000000fc800000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at c,2... 
done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fc400000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fc400000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at c,4 node = 0x0000000000d06228 base = 0x00000000fc400000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at c,4... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fc000000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fc000000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at c,6 node = 0x0000000000d0a6c8 base = 0x00000000fc000000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at c,6... done ending prom_initialize_tce_table prom_instantiate_rtas: start... prom_rtas: 0000000000cb5050 size=0x00000000000a7000 size=0x00000000000a7000 size=0x00000000000a7000 alloc_down(00000000000a7000, 0000000000001000, (low)) trying: 0x000000003ff59000 -> 00000000deadbeef alloc_bottom : 0000000002960000 alloc_top : 00000000deadbeef alloc_top_hi : 00000000fc000000 rmo_top : 0000000040000000 ram_top : 0000000100000000 instantiating rtas at 0x00000000deadbeef... failed prom_hold_cpus: start... 1) spinloop = 0x0000000000000008 1) *spinloop = 0x0000000000000000 1) acknowledge = 0x0000000000000010 1) *acknowledge = 0x0000000000000000 1) secondary_hold = 0x0000000000000060 cpuid = 0x0000000000000000 cpu hw idx = 0x0000000000000000 0000000000000000 : boot cpu 0000000000000000 cpuid = 0x0000000000000001 cpu hw idx = 0x0000000000000002 0000000000000001 : starting cpu hw idx 0000000000000002... done cpuid = 0x0000000000000002 cpu hw idx = 0x0000000000000004 0000000000000002 : starting cpu hw idx 0000000000000004... done cpuid = 0x0000000000000003 cpu hw idx = 0x0000000000000006 0000000000000003 : starting cpu hw idx 0000000000000006... done prom_hold_cpus: end... copying OF device tree ... 
starting device tree allocs at 0000000002960000 alloc_up(0000000000100000, 0000000000001000) trying: 0x0000000002960000 trying: 0x0000000002a60000 -> 0000000002a60000 alloc_bottom : 0000000002a60000 alloc_top : 00000000deadbeef alloc_top_hi : 00000000fc000000 rmo_top : 0000000040000000 ram_top : 0000000100000000 Building dt strings... Building dt structure... reserved memory map: 00000000fc000000 - 0000000004000000 0000000002a60000 - 0000000000012000 Device tree strings 0x0000000002a61000 -> 0x0000000002a621df Device tree struct 0x0000000002a63000 -> 0x0000000002a72000 Calling quiesce ... returning from prom_init ->dt_header_start=0x0000000002a60000 ->phys=0x0000000002110000 From olh at suse.de Fri Feb 11 08:07:54 2005 From: olh at suse.de (Olaf Hering) Date: Thu, 10 Feb 2005 22:07:54 +0100 Subject: [PATCH] use vmlinux during make install on ppc64 In-Reply-To: <16905.58373.707121.332099@cargo.ozlabs.ibm.com> References: <20050207151222.GA7219@suse.de> <16905.58373.707121.332099@cargo.ozlabs.ibm.com> Message-ID: <20050210210754.GA23610@suse.de> On Wed, Feb 09, Paul Mackerras wrote: > Olaf Hering writes: > > > make install passes the zImage to the installkernel script. > > When an initrd is used, this script has to pull out the vmlinux from the > > zImage because yaboot can not boot a zImage+initrd combo. > > It can only handle vmlinux+initrd or zImage.initrd. > > Its simple to just pass the plain vmlinux instead.
> > As a side-effect you seem to have changed the default target on > pSeries from zImage to vmlinux, which I don't like - I find it useful > and convenient that just plain "make" makes the zImage, which I can > then netboot. use-vmlinux-during-make-install-on-ppc64.patch is in 2.6.11-rc3-mm2, and it contains another broken version of 'simple make doesnt built zImage'. Here is another try to fix a simple 'make' and also get 'make install' right. pseries needs zImage, iseries and pmac just the plain vmlinux passed to the install.sh script. http://ozlabs.org/pipermail/linuxppc64-dev/2005-February/003215.html Sam, can that be done in a simpler way? I guess not. diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/Makefile ./arch/ppc64/Makefile --- ../linux-2.6.11-rc3.orig/arch/ppc64/Makefile 2005-02-03 02:55:14.000000000 +0100 +++ ./arch/ppc64/Makefile 2005-02-10 21:53:13.653196323 +0100 @@ -65,14 +65,20 @@ boottarget-$(CONFIG_PPC_ISERIES) := vmli $(boottarget-y): vmlinux $(Q)$(MAKE) $(build)=$(boot) $(boot)/$@ -bootimage-$(CONFIG_PPC_PSERIES) := zImage -bootimage-$(CONFIG_PPC_MAPLE) := zImage +bootimage-$(CONFIG_PPC_PSERIES) := $(boot)/zImage +bootimage-$(CONFIG_PPC_PMAC) := vmlinux +bootimage-$(CONFIG_PPC_MAPLE) := $(boot)/zImage bootimage-$(CONFIG_PPC_ISERIES) := vmlinux BOOTIMAGE := $(bootimage-y) install: vmlinux $(Q)$(MAKE) $(build)=$(boot) BOOTIMAGE=$(BOOTIMAGE) $@ -all: $(BOOTIMAGE) +defaultimage-$(CONFIG_PPC_PSERIES) := zImage +defaultimage-$(CONFIG_PPC_PMAC) := vmlinux +defaultimage-$(CONFIG_PPC_MAPLE) := zImage +defaultimage-$(CONFIG_PPC_ISERIES) := vmlinux +DEFAULTIMAGE := $(defaultimage-y) +all: $(DEFAULTIMAGE) archclean: $(Q)$(MAKE) $(clean)=$(boot) diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/boot/Makefile ./arch/ppc64/boot/Makefile --- ../linux-2.6.11-rc3.orig/arch/ppc64/boot/Makefile 2005-02-03 02:56:36.000000000 +0100 +++ ./arch/ppc64/boot/Makefile 2005-02-10 20:55:57.097429193 +0100 @@ -117,7 +117,7 @@ $(obj)/imagesize.c: vmlinux.strip awk 
'{printf "unsigned long vmlinux_memsize = 0x%s;\n", substr($$1,8)}' \ >> $(obj)/imagesize.c -install: $(CONFIGURE) $(obj)/$(BOOTIMAGE) - sh -x $(srctree)/$(src)/install.sh "$(KERNELRELEASE)" "$(obj)/$(BOOTIMAGE)" "$(INSTALL_PATH)" +install: $(CONFIGURE) $(BOOTIMAGE) + sh -x $(srctree)/$(src)/install.sh "$(KERNELRELEASE)" vmlinux System.map "$(INSTALL_PATH)" "$(BOOTIMAGE)" clean-files := $(addprefix $(objtree)/, $(obj-boot) vmlinux.strip) diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/boot/install.sh ./arch/ppc64/boot/install.sh --- ../linux-2.6.11-rc3.orig/arch/ppc64/boot/install.sh 2005-02-03 02:57:16.000000000 +0100 +++ ./arch/ppc64/boot/install.sh 2005-02-10 20:55:57.098429038 +0100 @@ -17,6 +17,7 @@ # $2 - kernel image file # $3 - kernel map file # $4 - default install path (blank if root directory) +# $5 - kernel boot file, the zImage # # User may have a custom install script @@ -27,7 +28,7 @@ if [ -x /sbin/installkernel ]; then exec # Default install # this should work for both the pSeries zImage and the iSeries vmlinux.sm -image_name=`basename $2` +image_name=`basename $5` if [ -f $4/$image_name ]; then mv $4/$image_name $4/$image_name.old From sam at ravnborg.org Fri Feb 11 08:45:05 2005 From: sam at ravnborg.org (Sam Ravnborg) Date: Thu, 10 Feb 2005 22:45:05 +0100 Subject: [PATCH] use vmlinux during make install on ppc64 In-Reply-To: <20050210210754.GA23610@suse.de> References: <20050207151222.GA7219@suse.de> <16905.58373.707121.332099@cargo.ozlabs.ibm.com> <20050210210754.GA23610@suse.de> Message-ID: <20050210214505.GA16566@mars.ravnborg.org> On Thu, Feb 10, 2005 at 10:07:54PM +0100, Olaf Hering wrote: > On Wed, Feb 09, Paul Mackerras wrote: > > > Olaf Hering writes: > > > > > make install passes the zImage to the installkernel script. > > > When an initrd is used, this script has to pull out the vmlinux from the > > > zImage because yaboot can not boot a zImage+initrd combo. > > > It can only handle vmlinux+initrd or zImage.initrd. 
> > > Its simple to just pass the plain vmlinux instead.
> >
> > As a side-effect you seem to have changed the default target on
> > pSeries from zImage to vmlinux, which I don't like - I find it useful
> > and convenient that just plain "make" makes the zImage, which I can
> > then netboot.
>
> use-vmlinux-during-make-install-on-ppc64.patch is in 2.6.11-rc3-mm2, and
> it contains another broken version of 'simple make doesnt built zImage'.
>
> Here is another try to fix a simple 'make' and also get 'make install'
> right. pseries needs zImage, iseries and pmac just the plain vmlinux
> passed to the install.sh script.
>
> http://ozlabs.org/pipermail/linuxppc64-dev/2005-February/003215.html
>
>
> Sam, can that be done in a simpler way? I guess not.

Please take a look at the functionality offered by KBUILD_IMAGE.
KBUILD_IMAGE is supposed to be used for this purpose.

Otherwise your patch looks good.

Sam

From brking at us.ibm.com Fri Feb 11 10:49:37 2005
From: brking at us.ibm.com (Brian King)
Date: Thu, 10 Feb 2005 17:49:37 -0600
Subject: [PATCH] ppc64: Mode 2 PCI-X config space size fix
In-Reply-To: <16906.53505.649325.792660@cargo.ozlabs.ibm.com>
References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050131192955.GJ31145@parcelfarce.linux.theplanet.co.uk> <41FEA4AA.1080407@us.ibm.com> <200501312256.44692.arnd@arndb.de> <41FEB492.2020002@us.ibm.com> <1107227727.5963.46.camel@gaston> <41FF0B0D.8020003@us.ibm.com> <20050201123249.GA10088@parcelfarce.linux.theplanet.co.uk> <41FFE3AF.706@us.ibm.com> <420A6343.6070307@us.ibm.com> <16906.53505.649325.792660@cargo.ozlabs.ibm.com>
Message-ID: <420BF311.6020800@us.ibm.com>

Paul Mackerras wrote:
> Brian King writes:
>
>
>>Trimming the cc list a bit since this has become a PPC64 only patch and
>>resending...
>
>
> Unless you think this really needs to go in 2.6.11, I'll defer it
> until after 2.6.11 is out, since we're supposed to be in
> bug-fix/stabilization mode for 2.6.11. Are you OK with that?

That is fine.
> Oh, and a minor nit: > > >>+ if (type && *type == 1) >>+ dn->pci_ext_config_space = 1; >>+ else >>+ dn->pci_ext_config_space = 0; > > > is more compactly expressed as: > > dn->pci_ext_config_space = (type && *type == 1); I'll fix this and send out an updated patch. Thanks -- Brian King eServer Storage I/O IBM Linux Technology Center From linas at austin.ibm.com Fri Feb 11 11:01:34 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 10 Feb 2005 18:01:34 -0600 Subject: p620 hangs instantiating rtas at 0x00000000deadbeef In-Reply-To: <20050210201830.GA23150@suse.de> References: <20050209150654.GA16640@suse.de> <20050209222801.GA24113@suse.de> <1107994004.7687.154.camel@gaston> <20050210082529.GB30336@suse.de> <20050210193706.GC23424@austin.ibm.com> <20050210201830.GA23150@suse.de> Message-ID: <20050211000134.GE23424@austin.ibm.com> On Thu, Feb 10, 2005 at 09:18:30PM +0100, Olaf Hering was heard to remark: > On Thu, Feb 10, Linas Vepstas wrote: > whatever 0xdeadbeef is, perhaps a hint to call prom_exit ;) Once, a long time ago, it was what a register would hold after the CPU was powered on the very first time ... Now, it seems to be an error return value from prom_claim() ... seems to be getting returned by firmware ... they probably should have returned a -1, those jokers ... Anyway, the firmware seems to be telling us that it cannot honour the very first request to claim memory right below RMO top. I might be totally insane but I notice that rmo_top is set to 1GB, and I thought 256MB was the top ... so try this, for laughs ... in the routine static void __init prom_init_mem(void) around line 675 RELOC(alloc_top) = RELOC(rmo_top) = min(0x40000000ul, RELOC(ram_top change the 4 to a 1 ... That is my wild guess. I notice that someone re-wrote all of that prom code in the last half-year, I don't know who ... probably Ben ... they would be the expert for what's going on in here, not me. I bow out here. 
---linas > BOOTP S = 1 > FILE: orange > Load Addr=0x4000 Max Size=0xbfc000 > FINAL Packet Count = 5801 > FINAL File Size = 2969809 bytes. > zImage starting: loaded at 0x400000 > Allocating 0x94c000 bytes for kernel ... > gunzipping (0x2100000 <- 0x407000:0x6c3192)...done 0x7e23b8 bytes > 0xe60c bytes of heap consumed, max in use 0xa318 > OF stdout device is: /pci at fff7f09000/isa at 10/serial at i3f8 > klimit=0xc00000000084c000 > offset=0xbffffffffdef0000 > command line: > root_addr_cells: 0000000000000002 > root_size_cells: 0000000000000002 > scanning memory: > node /memory at 0 : > 0000000000000000 0000000100000000 > memory layout at init: > alloc_bottom : 0000000002960000 > alloc_top : 0000000040000000 > alloc_top_hi : 0000000100000000 > rmo_top : 0000000040000000 > ram_top : 0000000100000000 > Booting CPU hw index = 0x0000000000000000 > Looking for displays > found display : /pci at fff7f0a000/pci at b,4/display at 1, opening ... done ............. > prom_instantiate_rtas: start... > prom_rtas: 0000000000cb5050 > size=0x00000000000a7000 > size=0x00000000000a7000 > size=0x00000000000a7000 > alloc_down(00000000000a7000, 0000000000001000, (low)) > trying: 0x000000003ff59000 > -> 00000000deadbeef I'm guessing that prom_claim did not like the large value of 1GB ... > alloc_bottom : 0000000002960000 > alloc_top : 00000000deadbeef > alloc_top_hi : 00000000fc000000 > rmo_top : 0000000040000000 > ram_top : 0000000100000000 > instantiating rtas at 0x00000000deadbeef... 
failed From hollis at penguinppc.org Fri Feb 11 14:20:33 2005 From: hollis at penguinppc.org (Hollis Blanchard) Date: Thu, 10 Feb 2005 21:20:33 -0600 Subject: [PATCH] build without PCI or VIO In-Reply-To: <20050210063942.GA11093@austin.ibm.com> References: <20050209183437.30302d44.sfr@canb.auug.org.au> <20050210063942.GA11093@austin.ibm.com> Message-ID: On Feb 10, 2005, at 12:39 AM, Olof Johansson wrote: > On Wed, Feb 09, 2005 at 06:34:37PM +1100, Stephen Rothwell wrote: >> This patch (on top of my previous dma fix up patch) allows you to >> build >> pSeries without CONFIG_PCI or CONFIG_VIO or both and iSeries without >> PCI. >> Don't look to closely at the include/asm-ppc64/floppy.h patch :-). > > I'm not sure just what to think about this patch. :-) It's neat to be > able to boot pSeries without PCI configured, but how much does it > really > buy us? I would add that booting without IO can be extremely useful when playing in many experimental environments, such as early development hardware or a simulator. I think it's worthwhile. -Hollis From olof at austin.ibm.com Fri Feb 11 15:14:05 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Thu, 10 Feb 2005 22:14:05 -0600 Subject: [PATCH] build without PCI or VIO In-Reply-To: <20050211002407.0f94536f.sfr@canb.auug.org.au> References: <20050209183437.30302d44.sfr@canb.auug.org.au> <20050210063942.GA11093@austin.ibm.com> <20050211002407.0f94536f.sfr@canb.auug.org.au> Message-ID: <20050211041405.GA15110@austin.ibm.com> On Fri, Feb 11, 2005 at 12:24:07AM +1100, Stephen Rothwell wrote: > I agree entirely and was thinking that as I posted the patch. I have used > CONFIG_PPC_IOMMU in the new version. Great. :) > > * I'm worried about the amount of new #ifdefs, for two reasons. First is > > readability, second is risk of breaking non-PCI config with new > > changes. > > --> Maybe defining empty inline stubs but still call them is more > > appropriate for the symbols that need to be considered? 
How is
> >     this handled on other architectures?
>
> OK, read the new patch, I have left most of the code alone now as you
> suggested.

I like this patch a lot more. dma.c is still filled with ifdefs, but it
already has them for CONFIG_IBMVIO and it doesn't really contain any
"real" code anyway, just wrappers.

The benefit is the same, but the cost (in lost readability and
maintainability) is considerably less. I'm happy with it.

One minor CodingStyle nitpick:

@@ -42,7 +47,11 @@
 void pci_addr_cache_remove_device(struct pci_dev *dev);

 /* From pSeries_pci.h */
-void init_pci_config_tokens (void);
+#ifdef CONFIG_PCI
+extern void init_pci_config_tokens (void);
+#else
+static inline void init_pci_config_tokens (void) { }
+#endif
 unsigned long get_phb_buid (struct device_node *);

There is whitespace after the function names above; you might as well
fix it on the lines you change...

Besides that, for what it's worth:

Acked-by: Olof Johansson

-Olof

From olof at austin.ibm.com Fri Feb 11 15:19:34 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Thu, 10 Feb 2005 22:19:34 -0600
Subject: [PATCH] build without PCI or VIO
In-Reply-To: 
References: <20050209183437.30302d44.sfr@canb.auug.org.au> <20050210063942.GA11093@austin.ibm.com>
Message-ID: <20050211041933.GB15110@austin.ibm.com>

On Thu, Feb 10, 2005 at 09:20:33PM -0600, Hollis Blanchard wrote:
> On Feb 10, 2005, at 12:39 AM, Olof Johansson wrote:
>
> >I'm not sure just what to think about this patch. :-) It's neat to be
> >able to boot pSeries without PCI configured, but how much does it
> >really
> >buy us?
>
> I would add that booting without IO can be extremely useful when
> playing in many experimental environments, such as early development
> hardware or a simulator. I think it's worthwhile.

I never opposed its usefulness, just the tradeoff vs code impact. For
bringups in new environments you'll likely have a stack of patches
anyway so it could be carried in there separately.

Either way, the second patch is much better w.r.t.
to impact, i.e. less of a tradeoff and certainly worthwhile. -Olof From olh at suse.de Fri Feb 11 18:03:32 2005 From: olh at suse.de (Olaf Hering) Date: Fri, 11 Feb 2005 08:03:32 +0100 Subject: p620 hangs instantiating rtas at 0x00000000deadbeef In-Reply-To: <1107994004.7687.154.camel@gaston> References: <20050209150654.GA16640@suse.de> <20050209222801.GA24113@suse.de> <1107994004.7687.154.camel@gaston> Message-ID: <20050211070332.GA29130@suse.de> On Thu, Feb 10, Benjamin Herrenschmidt wrote: > On Wed, 2005-02-09 at 23:28 +0100, Olaf Hering wrote: > > On Wed, Feb 09, Olaf Hering wrote: > > > > > > > > Current Linus tree hangs on p620, xmon does not trigger. > > > rc3 was already broken. > > > And 2.6.10 doesnt work either... > > > > It broke between 2.6.9-rc2 and -rc3 > > Can you enable debug stuff in prom_init.c ? This seems to fix it, but later it hangs here, maybe a different problem. PID hash table entries: 4096 (order: 12, 131072 bytes) time_init: decrementer frequency = 601.578322 MHz time_init: processor frequency = 601.600000 MHz ... 
diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/prom_init.c ./arch/ppc64/kernel/prom_init.c
--- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/prom_init.c 2005-02-03 02:56:48.000000000 +0100
+++ ./arch/ppc64/kernel/prom_init.c 2005-02-11 07:51:21.280306356 +0100
@@ -671,7 +671,7 @@ static void __init prom_init_mem(void)
     if ( RELOC(of_platform) == PLATFORM_PSERIES_LPAR )
         RELOC(alloc_top) = RELOC(rmo_top);
     else
-        RELOC(alloc_top) = RELOC(rmo_top) = min(0x40000000ul, RELOC(ram_top));
+        RELOC(alloc_top) = RELOC(rmo_top) = min(0x10000000ul, RELOC(ram_top));
     RELOC(alloc_bottom) = PAGE_ALIGN(RELOC(klimit) - offset + 0x4000);
     RELOC(alloc_top_high) = RELOC(ram_top);

From anton at samba.org Fri Feb 11 18:10:44 2005
From: anton at samba.org (Anton Blanchard)
Date: Fri, 11 Feb 2005 18:10:44 +1100
Subject: p620 hangs instantiating rtas at 0x00000000deadbeef
In-Reply-To: <20050211070332.GA29130@suse.de>
References: <20050209150654.GA16640@suse.de> <20050209222801.GA24113@suse.de> <1107994004.7687.154.camel@gaston> <20050211070332.GA29130@suse.de>
Message-ID: <20050211071044.GN5567@krispykreme.ozlabs.ibm.com>

> This seems to fix it, but later it hangs here, maybe a different
> problem.
>
> PID hash table entries: 4096 (order: 12, 131072 bytes)
> time_init: decrementer frequency = 601.578322 MHz
> time_init: processor frequency = 601.600000 MHz
> ...

That's a classic console issue. Are you forcing it on the command line?
Maybe the autodetect is failing.

Anton

From ananth at in.ibm.com Fri Feb 11 17:00:43 2005
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Fri, 11 Feb 2005 11:30:43 +0530
Subject: [PATCH] ppc64: kprobes: handle trap variants while processing probes
Message-ID: <20050211060043.GA5214@in.ibm.com>

Hi,

While processing a kprobe, we were not handling all of the trap variants
available on PowerPC. This led to the breakage of BUG() handling on
ppc64. Here is a patch to fix the issue. Please apply.
Thanks, Ananth diff -Naurp temp/linux-2.6.11-rc3/arch/ppc64/kernel/kprobes.c linux-2.6.11-rc3/arch/ppc64/kernel/kprobes.c --- temp/linux-2.6.11-rc3/arch/ppc64/kernel/kprobes.c 2005-02-03 07:26:53.000000000 +0530 +++ linux-2.6.11-rc3/arch/ppc64/kernel/kprobes.c 2005-02-10 18:08:25.000000000 +0530 @@ -105,8 +105,16 @@ static inline int kprobe_handler(struct p = get_kprobe(addr); if (!p) { unlock_kprobes(); -#if 0 if (*addr != BREAKPOINT_INSTRUCTION) { + /* + * PowerPC has multiple variants of the "trap" + * instruction. If the current instruction is a + * trap variant, it could belong to someone else + */ + kprobe_opcode_t cur_insn = *addr; + if (IS_TW(cur_insn) || IS_TD(cur_insn) || + IS_TWI(cur_insn) || IS_TDI(cur_insn)) + goto no_kprobe; /* * The breakpoint instruction was removed right * after we hit it. Another cpu has removed @@ -116,7 +124,6 @@ static inline int kprobe_handler(struct */ ret = 1; } -#endif /* Not one of ours: let kernel handle it */ goto no_kprobe; } diff -Naurp temp/linux-2.6.11-rc3/include/asm-ppc64/kprobes.h linux-2.6.11-rc3/include/asm-ppc64/kprobes.h --- temp/linux-2.6.11-rc3/include/asm-ppc64/kprobes.h 2005-02-03 07:25:50.000000000 +0530 +++ linux-2.6.11-rc3/include/asm-ppc64/kprobes.h 2005-02-10 18:08:58.000000000 +0530 @@ -35,6 +35,11 @@ typedef unsigned int kprobe_opcode_t; #define BREAKPOINT_INSTRUCTION 0x7fe00008 /* trap */ #define MAX_INSN_SIZE 1 +#define IS_TW(instr) (((instr) & 0xfc0007fe) == 0x7c000008) +#define IS_TD(instr) (((instr) & 0xfc0007fe) == 0x7c000088) +#define IS_TDI(instr) (((instr) & 0xfc000000) == 0x08000000) +#define IS_TWI(instr) (((instr) & 0xfc000000) == 0x0c000000) + #define JPROBE_ENTRY(pentry) (kprobe_opcode_t *)((func_descr_t *)pentry) /* Architecture specific copy of original instruction */ From olh at suse.de Fri Feb 11 21:44:20 2005 From: olh at suse.de (Olaf Hering) Date: Fri, 11 Feb 2005 11:44:20 +0100 Subject: p620 hangs instantiating rtas at 0x00000000deadbeef In-Reply-To: 
<20050211071044.GN5567@krispykreme.ozlabs.ibm.com> References: <20050209150654.GA16640@suse.de> <20050209222801.GA24113@suse.de> <1107994004.7687.154.camel@gaston> <20050211070332.GA29130@suse.de> <20050211071044.GN5567@krispykreme.ozlabs.ibm.com> Message-ID: <20050211104420.GA31683@suse.de> On Fri, Feb 11, Anton Blanchard wrote: > > > This seems to fix it, but later it hangs here, maybe a different > > problem. > > > > PID hash table entries: 4096 (order: 12, 131072 bytes) > > time_init: decrementer frequency = 601.578322 MHz > > time_init: processor frequency = 601.600000 MHz > > ... > > Thats classic console issues. Are you forcing it on the command line? > Maybe the autodetect is failing. It's now stuck elsewhere. CONFIG_POWER4_ONLY is not set. I'm sure 2.6.10 boots ok on power3. ... [boot]0020 XICS Init [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 131072 bytes) time_init: decrementer frequency = 601.580095 MHz time_init: processor frequency = 601.600000 MHz -> set_preferred_console() stdout is /pci at fff7f09000/isa at 10/serial at i3f8 Found serial console at ttyS0 smp_prepare_cpus smp: kicking cpu 1 smp: kicking cpu 2 smp: kicking cpu 3 From olh at suse.de Fri Feb 11 21:54:53 2005 From: olh at suse.de (Olaf Hering) Date: Fri, 11 Feb 2005 11:54:53 +0100 Subject: [PATCH] enable DEBUG via config option Message-ID: <20050211105453.GA31718@suse.de> It's always boring to edit each file and turn the #undef DEBUG into #define DEBUG. This patch makes it a simple config option. Now the question is, how verbose will the boot be when all the printk calls are enabled? It appears to be OK so far on a p620.
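As a rough illustration of the idea, here is a userspace sketch of a debug macro that is compiled in or out depending on whether the build passes -DDEBUG, which is what the Makefile hunk of this patch does through EXTRA_CFLAGS. SKETCH_HZ and sketch_sleep_jiffies() are hypothetical names chosen for the example, and fprintf() stands in for printk().

```c
#include <stdio.h>

/* Messages appear only when the file is built with -DDEBUG. */
#ifdef DEBUG
#define DBG(...) fprintf(stderr, __VA_ARGS__)
#else
#define DBG(...) do { } while (0)
#endif

#define SKETCH_HZ 1000	/* hypothetical tick rate, for illustration only */

/*
 * Hypothetical helper modelled on the expression in the rtasd debug
 * message: half of the interval implied by a per-minute scan rate.
 */
int sketch_sleep_jiffies(int scan_rate)
{
	int jiffies = (SKETCH_HZ * 60 / scan_rate) / 2;

	DBG("will sleep for %d jiffies\n", jiffies);
	return jiffies;
}
```

Built as "cc -DDEBUG -c file.c" the message is emitted; without the flag the DBG() call expands to an empty statement and costs nothing at run time.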
diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/Kconfig.debug ./arch/ppc64/Kconfig.debug --- ../linux-2.6.11-rc3.orig/arch/ppc64/Kconfig.debug 2005-02-03 02:56:48.000000000 +0100 +++ ./arch/ppc64/Kconfig.debug 2005-02-11 11:19:36.473018091 +0100 @@ -47,6 +47,10 @@ config PPCDBG bool "Include PPCDBG realtime debugging" depends on DEBUG_KERNEL +config DEBUG_PPC64 + bool "enable all the #define DEBUG in arch/ppc64/kernel" + depends on DEBUG_KERNEL + config IRQSTACKS bool "Use separate kernel stacks when processing interrupts" help diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/boot/main.c ./arch/ppc64/boot/main.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/boot/main.c 2005-02-03 02:57:04.000000000 +0100 +++ ./arch/ppc64/boot/main.c 2005-02-11 11:21:09.109714397 +0100 @@ -73,7 +73,6 @@ void *stdin; void *stdout; void *stderr; -#undef DEBUG static unsigned long claim_base = PROG_START; diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/Makefile ./arch/ppc64/kernel/Makefile --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/Makefile 2005-02-03 02:56:53.000000000 +0100 +++ ./arch/ppc64/kernel/Makefile 2005-02-11 11:18:23.755287408 +0100 @@ -65,3 +65,7 @@ obj-$(CONFIG_ALTIVEC) += vecemu.o vecto obj-$(CONFIG_KPROBES) += kprobes.o CFLAGS_ioctl32.o += -Ifs/ + +ifeq ($(CONFIG_DEBUG_PPC64),y) +EXTRA_CFLAGS += -DDEBUG +endif diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/eeh.c ./arch/ppc64/kernel/eeh.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/eeh.c 2005-02-03 02:56:11.000000000 +0100 +++ ./arch/ppc64/kernel/eeh.c 2005-02-11 11:21:28.519621514 +0100 @@ -35,7 +35,6 @@ #include #include "pci.h" -#undef DEBUG /** Overview: * EEH, or "Extended Error Handling" is a PCI bridge technology for diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/iSeries_setup.c ./arch/ppc64/kernel/iSeries_setup.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/iSeries_setup.c 2005-02-03 02:56:22.000000000 +0100 +++ ./arch/ppc64/kernel/iSeries_setup.c 2005-02-11 
11:21:45.000000000 +0100 @@ -16,7 +16,6 @@ * 2 of the License, or (at your option) any later version. */ -#undef DEBUG #include #include diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/iSeries_smp.c ./arch/ppc64/kernel/iSeries_smp.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/iSeries_smp.c 2005-02-03 02:56:22.000000000 +0100 +++ ./arch/ppc64/kernel/iSeries_smp.c 2005-02-11 11:21:23.000000000 +0100 @@ -12,7 +12,6 @@ * 2 of the License, or (at your option) any later version. */ -#undef DEBUG #include #include diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/idle_power4.S ./arch/ppc64/kernel/idle_power4.S --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/idle_power4.S 2005-02-03 02:55:52.000000000 +0100 +++ ./arch/ppc64/kernel/idle_power4.S 2005-02-11 11:23:49.392135145 +0100 @@ -22,7 +22,6 @@ #include #include -#undef DEBUG .text diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/lmb.c ./arch/ppc64/kernel/lmb.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/lmb.c 2005-02-03 02:55:53.000000000 +0100 +++ ./arch/ppc64/kernel/lmb.c 2005-02-11 11:21:52.869560042 +0100 @@ -22,7 +22,6 @@ struct lmb lmb; -#undef DEBUG void lmb_dump_all(void) { diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/maple_pci.c ./arch/ppc64/kernel/maple_pci.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/maple_pci.c 2005-02-03 02:56:49.000000000 +0100 +++ ./arch/ppc64/kernel/maple_pci.c 2005-02-11 11:34:14.593874496 +0100 @@ -8,7 +8,6 @@ * 2 of the License, or (at your option) any later version. 
*/ -#define DEBUG #include #include diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/maple_setup.c ./arch/ppc64/kernel/maple_setup.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/maple_setup.c 2005-02-03 02:55:14.000000000 +0100 +++ ./arch/ppc64/kernel/maple_setup.c 2005-02-11 11:34:05.545954836 +0100 @@ -11,7 +11,6 @@ * */ -#define DEBUG #include #include diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/maple_time.c ./arch/ppc64/kernel/maple_time.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/maple_time.c 2005-02-03 02:55:53.000000000 +0100 +++ ./arch/ppc64/kernel/maple_time.c 2005-02-11 11:22:33.000000000 +0100 @@ -11,7 +11,6 @@ * */ -#undef DEBUG #include #include diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/mpic.c ./arch/ppc64/kernel/mpic.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/mpic.c 2005-02-03 02:57:04.000000000 +0100 +++ ./arch/ppc64/kernel/mpic.c 2005-02-11 11:22:39.300391150 +0100 @@ -12,7 +12,6 @@ * for more details. */ -#undef DEBUG #include #include diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pSeries_lpar.c ./arch/ppc64/kernel/pSeries_lpar.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pSeries_lpar.c 2005-02-03 02:57:04.000000000 +0100 +++ ./arch/ppc64/kernel/pSeries_lpar.c 2005-02-11 11:33:34.111057904 +0100 @@ -19,7 +19,6 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ -#define DEBUG #include #include diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pSeries_setup.c ./arch/ppc64/kernel/pSeries_setup.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pSeries_setup.c 2005-02-03 02:55:08.000000000 +0100 +++ ./arch/ppc64/kernel/pSeries_setup.c 2005-02-11 11:21:15.471911174 +0100 @@ -16,7 +16,6 @@ * bootup setup stuff.. 
*/ -#undef DEBUG #include #include diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pSeries_smp.c ./arch/ppc64/kernel/pSeries_smp.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pSeries_smp.c 2005-02-03 02:55:53.000000000 +0100 +++ ./arch/ppc64/kernel/pSeries_smp.c 2005-02-11 11:23:01.739361206 +0100 @@ -12,7 +12,6 @@ * 2 of the License, or (at your option) any later version. */ -#undef DEBUG #include #include diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pci.c ./arch/ppc64/kernel/pci.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pci.c 2005-02-03 02:56:35.000000000 +0100 +++ ./arch/ppc64/kernel/pci.c 2005-02-11 11:21:57.105567991 +0100 @@ -11,7 +11,6 @@ * 2 of the License, or (at your option) any later version. */ -#undef DEBUG #include #include diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pmac_low_i2c.c ./arch/ppc64/kernel/pmac_low_i2c.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pmac_low_i2c.c 2005-02-03 02:55:36.000000000 +0100 +++ ./arch/ppc64/kernel/pmac_low_i2c.c 2005-02-11 11:22:44.241420717 +0100 @@ -16,7 +16,6 @@ * properties parser */ -#undef DEBUG #include #include diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pmac_nvram.c ./arch/ppc64/kernel/pmac_nvram.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pmac_nvram.c 2005-02-03 02:57:17.000000000 +0100 +++ ./arch/ppc64/kernel/pmac_nvram.c 2005-02-11 11:34:19.703878621 +0100 @@ -29,7 +29,6 @@ #include #include -#define DEBUG #ifdef DEBUG #define DBG(x...) printk(x) diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pmac_pci.c ./arch/ppc64/kernel/pmac_pci.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pmac_pci.c 2005-02-03 02:56:22.000000000 +0100 +++ ./arch/ppc64/kernel/pmac_pci.c 2005-02-11 11:32:15.799407889 +0100 @@ -31,7 +31,6 @@ #include "pci.h" #include "pmac.h" -#define DEBUG #ifdef DEBUG #define DBG(x...) 
printk(x) diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pmac_setup.c ./arch/ppc64/kernel/pmac_setup.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pmac_setup.c 2005-02-03 02:55:07.000000000 +0100 +++ ./arch/ppc64/kernel/pmac_setup.c 2005-02-11 11:22:29.300484590 +0100 @@ -23,7 +23,6 @@ * bootup setup stuff.. */ -#undef DEBUG #include #include diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pmac_smp.c ./arch/ppc64/kernel/pmac_smp.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pmac_smp.c 2005-02-03 02:55:07.000000000 +0100 +++ ./arch/ppc64/kernel/pmac_smp.c 2005-02-11 11:22:24.578418622 +0100 @@ -22,7 +22,6 @@ * 2 of the License, or (at your option) any later version. */ -#undef DEBUG #include #include diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pmac_time.c ./arch/ppc64/kernel/pmac_time.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pmac_time.c 2005-02-03 02:56:22.000000000 +0100 +++ ./arch/ppc64/kernel/pmac_time.c 2005-02-11 11:21:34.225666358 +0100 @@ -29,7 +29,6 @@ #include #include -#undef DEBUG #ifdef DEBUG #define DBG(x...) printk(x) diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/prom.c ./arch/ppc64/kernel/prom.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/prom.c 2005-02-03 02:56:22.000000000 +0100 +++ ./arch/ppc64/kernel/prom.c 2005-02-11 11:22:56.927311249 +0100 @@ -15,7 +15,6 @@ * 2 of the License, or (at your option) any later version. */ -#undef DEBUG #include #include diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/prom_init.c ./arch/ppc64/kernel/prom_init.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/prom_init.c 2005-02-03 02:56:48.000000000 +0100 +++ ./arch/ppc64/kernel/prom_init.c 2005-02-11 11:47:12.781010460 +0100 @@ -15,7 +15,6 @@ * 2 of the License, or (at your option) any later version. 
*/ -#undef DEBUG_PROM #include #include @@ -106,7 +105,7 @@ extern const struct linux_logo logo_linu __asm__ __volatile__(".long " BUG_ILLEGAL_INSTR); \ } while (0) -#ifdef DEBUG_PROM +#ifdef DEBUG #define prom_debug(x...) prom_printf(x) #else #define prom_debug(x...) @@ -643,11 +642,11 @@ static void __init prom_init_mem(void) p = RELOC(regbuf); endp = p + (plen / sizeof(cell_t)); -#ifdef DEBUG_PROM +#ifdef DEBUG memset(path, 0, PROM_SCRATCH_SIZE); call_prom("package-to-path", 3, 1, node, path, PROM_SCRATCH_SIZE-1); prom_debug(" node %s :\n", path); -#endif /* DEBUG_PROM */ +#endif /* DEBUG */ while ((endp - p) >= (_prom->root_addr_cells + _prom->root_size_cells)) { unsigned long base, size; @@ -845,7 +844,7 @@ static void __init prom_initialize_tce_t prom_debug("TCE table: %s\n", path); prom_debug("\tnode = 0x%x\n", node); - prom_debug("\tbase = 0x%x\n", vbase); + prom_debug("\tbase = 0x%x\n", base); prom_debug("\tsize = 0x%x\n", minsize); /* Initialize the table to have a one-to-one mapping @@ -1516,7 +1515,7 @@ static void __init flatten_device_tree(v reserve_mem(RELOC(dt_header_start), hdr->totalsize); memcpy(rsvmap, RELOC(mem_reserve_map), sizeof(mem_reserve_map)); -#ifdef DEBUG_PROM +#ifdef DEBUG { int i; prom_printf("reserved memory map:\n"); diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/ras.c ./arch/ppc64/kernel/ras.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/ras.c 2005-02-03 02:55:23.000000000 +0100 +++ ./arch/ppc64/kernel/ras.c 2005-02-11 11:32:07.303399378 +0100 @@ -74,7 +74,6 @@ static irqreturn_t ras_epow_interrupt(in static irqreturn_t ras_error_interrupt(int irq, void *dev_id, struct pt_regs * regs); -/* #define DEBUG */ static void request_ras_irqs(struct device_node *np, char *propname, irqreturn_t (*handler)(int, void *, struct pt_regs *), diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/rtasd.c ./arch/ppc64/kernel/rtasd.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/rtasd.c 2005-02-03 02:55:35.000000000 +0100 +++ 
./arch/ppc64/kernel/rtasd.c 2005-02-11 11:33:14.493184059 +0100 @@ -28,10 +28,10 @@ #include #include -#if 0 -#define DEBUG(A...) printk(KERN_ERR A) +#ifdef DEBUG +#define DBG(A...) printk(KERN_ERR A) #else -#define DEBUG(A...) +#define DBG(A...) #endif static DEFINE_SPINLOCK(rtasd_log_lock); @@ -194,7 +194,7 @@ void pSeries_log_error(char *buf, unsign unsigned long s; int len = 0; - DEBUG("logging event\n"); + DBG("logging event\n"); if (buf == NULL) return; @@ -370,7 +370,7 @@ static int get_eventscan_parms(void) return -1; } rtas_event_scan_rate = *ip; - DEBUG("rtas-event-scan-rate %d\n", rtas_event_scan_rate); + DBG("rtas-event-scan-rate %d\n", rtas_event_scan_rate); /* Make room for the sequence number */ rtas_error_log_max = rtas_get_error_log_max(); @@ -420,7 +420,7 @@ static int rtasd(void *unused) printk(KERN_ERR "RTAS daemon started\n"); - DEBUG("will sleep for %d jiffies\n", (HZ*60/rtas_event_scan_rate) / 2); + DBG("will sleep for %d jiffies\n", (HZ*60/rtas_event_scan_rate) / 2); /* See if we have any error stored in NVRAM */ memset(logdata, 0, rtas_error_log_max); @@ -439,9 +439,9 @@ static int rtasd(void *unused) /* First pass. 
*/ lock_cpu_hotplug(); for_each_online_cpu(cpu) { - DEBUG("scheduling on %d\n", cpu); + DBG("scheduling on %d\n", cpu); set_cpus_allowed(current, cpumask_of_cpu(cpu)); - DEBUG("watchdog scheduled on cpu %d\n", smp_processor_id()); + DBG("watchdog scheduled on cpu %d\n", smp_processor_id()); do_event_scan(event_scan); set_current_state(TASK_INTERRUPTIBLE); @@ -450,9 +450,9 @@ static int rtasd(void *unused) unlock_cpu_hotplug(); if (surveillance_timeout != -1) { - DEBUG("enabling surveillance\n"); + DBG("enabling surveillance\n"); enable_surveillance(surveillance_timeout); - DEBUG("surveillance enabled\n"); + DBG("surveillance enabled\n"); } lock_cpu_hotplug(); diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/scanlog.c ./arch/ppc64/kernel/scanlog.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/scanlog.c 2005-02-03 02:56:48.000000000 +0100 +++ ./arch/ppc64/kernel/scanlog.c 2005-02-11 11:33:56.530027345 +0100 @@ -37,7 +37,7 @@ #define SCANLOG_HWERROR -1 #define SCANLOG_CONTINUE 1 -#define DEBUG(A...) do { if (scanlog_debug) printk(KERN_ERR "scanlog: " A); } while (0) +#define DBG(A...) 
do { if (scanlog_debug) printk(KERN_ERR "scanlog: " A); } while (0) static int scanlog_debug; static unsigned int ibm_scan_log_dump; /* RTAS token */ @@ -85,14 +85,14 @@ static ssize_t scanlog_read(struct file memcpy(data, rtas_data_buf, RTAS_DATA_BUF_SIZE); spin_unlock(&rtas_data_buf_lock); - DEBUG("status=%d, data[0]=%x, data[1]=%x, data[2]=%x\n", + DBG("status=%d, data[0]=%x, data[1]=%x, data[2]=%x\n", status, data[0], data[1], data[2]); switch (status) { case SCANLOG_COMPLETE: - DEBUG("hit eof\n"); + DBG("hit eof\n"); return 0; case SCANLOG_HWERROR: - DEBUG("hardware error reading scan log data\n"); + DBG("hardware error reading scan log data\n"); return -EIO; case SCANLOG_CONTINUE: /* We may or may not have data yet */ @@ -143,9 +143,9 @@ static ssize_t scanlog_write(struct file if (buf) { if (strncmp(stkbuf, "reset", 5) == 0) { - DEBUG("reset scanlog\n"); + DBG("reset scanlog\n"); status = rtas_call(ibm_scan_log_dump, 2, 1, NULL, 0, 0); - DEBUG("rtas returns %d\n", status); + DBG("rtas returns %d\n", status); } else if (strncmp(stkbuf, "debugon", 7) == 0) { printk(KERN_ERR "scanlog: debug on\n"); scanlog_debug = 1; diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/setup.c ./arch/ppc64/kernel/setup.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/setup.c 2005-02-03 02:55:51.000000000 +0100 +++ ./arch/ppc64/kernel/setup.c 2005-02-11 11:22:19.422558571 +0100 @@ -10,7 +10,6 @@ * 2 of the License, or (at your option) any later version. */ -#undef DEBUG #include #include diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/smp.c ./arch/ppc64/kernel/smp.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/smp.c 2005-02-03 02:56:22.000000000 +0100 +++ ./arch/ppc64/kernel/smp.c 2005-02-11 11:22:06.465576018 +0100 @@ -15,7 +15,6 @@ * 2 of the License, or (at your option) any later version. 
*/ -#undef DEBUG #include #include diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/mm/hash_utils.c ./arch/ppc64/mm/hash_utils.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/mm/hash_utils.c 2005-02-03 02:56:10.000000000 +0100 +++ ./arch/ppc64/mm/hash_utils.c 2005-02-11 11:21:04.596749411 +0100 @@ -18,7 +18,6 @@ * 2 of the License, or (at your option) any later version. */ -#undef DEBUG #include #include From olh at suse.de Fri Feb 11 22:04:32 2005 From: olh at suse.de (Olaf Hering) Date: Fri, 11 Feb 2005 12:04:32 +0100 Subject: p620 hangs instantiating rtas at 0x00000000deadbeef In-Reply-To: <20050211071044.GN5567@krispykreme.ozlabs.ibm.com> References: <20050209150654.GA16640@suse.de> <20050209222801.GA24113@suse.de> <1107994004.7687.154.camel@gaston> <20050211070332.GA29130@suse.de> <20050211071044.GN5567@krispykreme.ozlabs.ibm.com> Message-ID: <20050211110432.GA31833@suse.de> On Fri, Feb 11, Anton Blanchard wrote: > > > This seems to fix it, but later it hangs here, maybe a different > > problem. > > > > PID hash table entries: 4096 (order: 12, 131072 bytes) > > time_init: decrementer frequency = 601.578322 MHz > > time_init: processor frequency = 601.600000 MHz > > ... > > Thats classic console issues. Are you forcing it on the command line? > Maybe the autodetect is failing. Here is the full log with the debug patch. BOOTP S = 1 FILE: orange Load Addr=0x4000 Max Size=0xbfc000 FINAL Packet Count = 5801 FINAL File Size = 2969809 bytes. zImage starting: loaded at 0x400000 Allocating 0x94c000 bytes for kernel ... 
gunzipping (0x2100000 <- 0x407000:0x6c3a30)...done 0x7e23b8 bytes 0xe33c bytes of heap consumed, max in use 0xa214 OF stdout device is: /pci at fff7f09000/isa at 10/serial at i3f8 klimit=0xc00000000084c000 offset=0xbffffffffdef0000 command line: root_addr_cells: 0000000000000002 root_size_cells: 0000000000000002 scanning memory: node /memory at 0 : 0000000000000000 0000000100000000 memory layout at init: alloc_bottom : 0000000002960000 alloc_top : 0000000010000000 alloc_top_hi : 0000000100000000 rmo_top : 0000000010000000 ram_top : 0000000100000000 Booting CPU hw index = 0x0000000000000000 Looking for displays found display : /pci at fff7f0a000/pci at b,4/display at 1, opening ... done starting prom_initialize_tce_table alloc_down(0000000000400000, 0000000000800000, (high)) -> 00000000ff800000 alloc_bottom : 0000000002960000 alloc_top : 0000000010000000 alloc_top_hi : 00000000ff800000 rmo_top : 0000000010000000 ram_top : 0000000100000000 TCE table: /pci at fff7f09000 node = 0x0000000000cc7380 base = 0x00000000ff800000 size = 0x0000000000400000 opening PHB /pci at fff7f09000... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000ff400000 alloc_bottom : 0000000002960000 alloc_top : 0000000010000000 alloc_top_hi : 00000000ff400000 rmo_top : 0000000010000000 ram_top : 0000000100000000 TCE table: /pci at fff7f09000/pci at b node = 0x0000000000cd8560 base = 0x00000000ff400000 size = 0x0000000000400000 opening PHB /pci at fff7f09000/pci at b... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000ff000000 alloc_bottom : 0000000002960000 alloc_top : 0000000010000000 alloc_top_hi : 00000000ff000000 rmo_top : 0000000010000000 ram_top : 0000000100000000 TCE table: /pci at fff7f09000/pci at b,2 node = 0x0000000000cdc5f8 base = 0x00000000ff000000 size = 0x0000000000400000 opening PHB /pci at fff7f09000/pci at b,2... 
done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fec00000 alloc_bottom : 0000000002960000 alloc_top : 0000000010000000 alloc_top_hi : 00000000fec00000 rmo_top : 0000000010000000 ram_top : 0000000100000000 TCE table: /pci at fff7f09000/pci at b,4 node = 0x0000000000ce0a88 base = 0x00000000fec00000 size = 0x0000000000400000 opening PHB /pci at fff7f09000/pci at b,4... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fe800000 alloc_bottom : 0000000002960000 alloc_top : 0000000010000000 alloc_top_hi : 00000000fe800000 rmo_top : 0000000010000000 ram_top : 0000000100000000 TCE table: /pci at fff7f09000/pci at b,6 node = 0x0000000000ce4f18 base = 0x00000000fe800000 size = 0x0000000000400000 opening PHB /pci at fff7f09000/pci at b,6... done alloc_down(0000000000400000, 0000000000800000, (high)) -> 00000000fe000000 alloc_bottom : 0000000002960000 alloc_top : 0000000010000000 alloc_top_hi : 00000000fe000000 rmo_top : 0000000010000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000 node = 0x0000000000ce97e0 base = 0x00000000fe000000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fdc00000 alloc_bottom : 0000000002960000 alloc_top : 0000000010000000 alloc_top_hi : 00000000fdc00000 rmo_top : 0000000010000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at b node = 0x0000000000cec720 base = 0x00000000fdc00000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at b... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fd800000 alloc_bottom : 0000000002960000 alloc_top : 0000000010000000 alloc_top_hi : 00000000fd800000 rmo_top : 0000000010000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at b,2 node = 0x0000000000cf0b38 base = 0x00000000fd800000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at b,2... 
done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fd400000 alloc_bottom : 0000000002960000 alloc_top : 0000000010000000 alloc_top_hi : 00000000fd400000 rmo_top : 0000000010000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at b,4 node = 0x0000000000cf4fc8 base = 0x00000000fd400000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at b,4... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fd000000 alloc_bottom : 0000000002960000 alloc_top : 0000000010000000 alloc_top_hi : 00000000fd000000 rmo_top : 0000000010000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at b,6 node = 0x0000000000cf9458 base = 0x00000000fd000000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at b,6... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fcc00000 alloc_bottom : 0000000002960000 alloc_top : 0000000010000000 alloc_top_hi : 00000000fcc00000 rmo_top : 0000000010000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at c node = 0x0000000000cfd8e8 base = 0x00000000fcc00000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at c... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fc800000 alloc_bottom : 0000000002960000 alloc_top : 0000000010000000 alloc_top_hi : 00000000fc800000 rmo_top : 0000000010000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at c,2 node = 0x0000000000d01d88 base = 0x00000000fc800000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at c,2... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fc400000 alloc_bottom : 0000000002960000 alloc_top : 0000000010000000 alloc_top_hi : 00000000fc400000 rmo_top : 0000000010000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at c,4 node = 0x0000000000d06228 base = 0x00000000fc400000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at c,4... 
done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fc000000 alloc_bottom : 0000000002960000 alloc_top : 0000000010000000 alloc_top_hi : 00000000fc000000 rmo_top : 0000000010000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at c,6 node = 0x0000000000d0a6c8 base = 0x00000000fc000000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at c,6... done ending prom_initialize_tce_table prom_instantiate_rtas: start... prom_rtas: 0000000000cb5050 alloc_down(00000000000a7000, 0000000000001000, (low)) trying: 0x000000000ff59000 -> 000000000ff59000 alloc_bottom : 0000000002960000 alloc_top : 000000000ff59000 alloc_top_hi : 00000000fc000000 rmo_top : 0000000010000000 ram_top : 0000000100000000 instantiating rtas at 0x000000000ff59000... done rtas base = 0x000000000ff59000 rtas entry = 0x000000000ff59900 rtas size = 0x00000000000a7000 prom_instantiate_rtas: end... prom_hold_cpus: start... 1) spinloop = 0x0000000000000008 1) *spinloop = 0x0000000000000000 1) acknowledge = 0x0000000000000010 1) *acknowledge = 0x0000000000000000 1) secondary_hold = 0x0000000000000060 cpuid = 0x0000000000000000 cpu hw idx = 0x0000000000000000 0000000000000000 : boot cpu 0000000000000000 cpuid = 0x0000000000000001 cpu hw idx = 0x0000000000000002 0000000000000001 : starting cpu hw idx 0000000000000002... done cpuid = 0x0000000000000002 cpu hw idx = 0x0000000000000004 0000000000000002 : starting cpu hw idx 0000000000000004... done cpuid = 0x0000000000000003 cpu hw idx = 0x0000000000000006 0000000000000003 : starting cpu hw idx 0000000000000006... done prom_hold_cpus: end... copying OF device tree ... starting device tree allocs at 0000000002960000 alloc_up(0000000000100000, 0000000000001000) trying: 0x0000000002960000 trying: 0x0000000002a60000 -> 0000000002a60000 alloc_bottom : 0000000002a60000 alloc_top : 000000000ff59000 alloc_top_hi : 00000000fc000000 rmo_top : 0000000010000000 ram_top : 0000000100000000 Building dt strings... Building dt structure... 
reserved memory map: 00000000fc000000 - 0000000004000000 000000000ff59000 - 00000000000a7000 0000000002a60000 - 0000000000013000 Device tree strings 0x0000000002a61000 -> 0x0000000002a62200 Device tree struct 0x0000000002a63000 -> 0x0000000002a73000 Calling quiesce ... returning from prom_init ->dt_header_start=0x0000000002a60000 ->phys=0x0000000002110000 Hello World ! <- pSeries_init_early() -> finish_device_tree <- finish_device_tree firmware_features = 0x0 <- setup_system() -> smp_init_pSeries() <- smp_init_pSeries() phb0: IO 0x0 -> 0xfffff phb0: MEM 0xfe80000000 -> 0xfebfffffff phb0 io_base_phys 0xfeffe00000 io_base_virt 0xe000000000000000 phb1: IO 0x0 -> 0xfffff phb1: MEM 0xff00000000 -> 0xff3fffffff phb1 io_base_phys 0xfefff00000 io_base_virt 0xe000000000100000 Starting Linux PPC64 2.6.11-rc3-bk7 ----------------------------------------------------- ppc64_pft_size = 0x1a ppc64_debug_switch = 0x0 ppc64_interrupt_controller = 0x2 systemcfg = 0xc000000000005000 systemcfg->platform = 0x100 systemcfg->processorCount = 0x4 systemcfg->physicalMemorySize = 0x100000000 ppc64_caches.dcache_line_size = 0x80 ppc64_caches.icache_line_size = 0x80 htab_address = 0xc0000000f8000000 htab_hash_mask = 0x7ffff ----------------------------------------------------- [boot]0100 MM Init [boot]0100 MM Init Done Linux version 2.6.11-rc3-bk7 (olaf at pomegranate) (gcc version 3.3.3 (SuSE Linux)) #18 SMP Fri Feb 11 11:48:49 CET 2005 [boot]0012 Setup Arch Top of RAM: 0x100000000, Total RAM: 0x100000000 Memory hole size: 0MB No ramdisk, default root is /dev/sda2 EEH: No capable adapters found PPC64 nvram contains 262144 bytes Using default idle loop [boot]0015 Setup Done Built 1 zonelists Kernel command line: [boot]0020 XICS Init [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 131072 bytes) time_init: decrementer frequency = 601.579361 MHz time_init: processor frequency = 601.600000 MHz -> set_preferred_console() stdout is /pci at fff7f09000/isa at 10/serial at i3f8 Found 
serial console at ttyS0 smp_prepare_cpus smp: kicking cpu 1 smp: kicking cpu 2 smp: kicking cpu 3 From olh at suse.de Fri Feb 11 23:34:52 2005 From: olh at suse.de (Olaf Hering) Date: Fri, 11 Feb 2005 13:34:52 +0100 Subject: [PATCH] enable DEBUG via config option In-Reply-To: <20050211105453.GA31718@suse.de> References: <20050211105453.GA31718@suse.de> Message-ID: <20050211123452.GA32465@suse.de> On Fri, Feb 11, Olaf Hering wrote: > > Its always boring to edit each file and turn the #undef DEBUG into > #define DEBUG. This patch makes it a simple config option. > Now the question is, how verbose will the boot be when all the printk > are enabled? appears to be ok so far on a p620. Here is another hunk. diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/nvram.c ./arch/ppc64/kernel/nvram.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/nvram.c 2005-02-03 02:54:59.000000000 +0100 +++ ./arch/ppc64/kernel/nvram.c 2005-02-11 13:17:12.036669795 +0100 @@ -33,7 +33,6 @@ #include #include -#undef DEBUG_NVRAM static int nvram_scan_partitions(void); static int nvram_setup_partition(void); @@ -200,7 +199,7 @@ static struct miscdevice nvram_dev = { }; -#ifdef DEBUG_NVRAM +#ifdef DEBUG static void nvram_print_partitions(char * label) { struct list_head * p; @@ -591,7 +590,7 @@ static int __init nvram_init(void) printk(KERN_WARNING "nvram_init: Could not find nvram partition" " for nvram buffered error logging.\n"); -#ifdef DEBUG_NVRAM +#ifdef DEBUG nvram_print_partitions("NVRAM Partitions"); #endif diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pmac_feature.c ./arch/ppc64/kernel/pmac_feature.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/pmac_feature.c 2005-02-03 02:56:22.000000000 +0100 +++ ./arch/ppc64/kernel/pmac_feature.c 2005-02-11 13:15:26.098060418 +0100 @@ -41,9 +41,8 @@ #include #include -#undef DEBUG_FEATURE -#ifdef DEBUG_FEATURE +#ifdef DEBUG #define DBG(fmt...) printk(KERN_DEBUG fmt) #else #define DBG(fmt...) 
diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/prom.c ./arch/ppc64/kernel/prom.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/prom.c 2005-02-03 02:56:22.000000000 +0100 +++ ./arch/ppc64/kernel/prom.c 2005-02-11 13:16:38.247739189 +0100 @@ -53,6 +52,7 @@ #include #ifdef DEBUG +#define DEBUG_IRQ #define DBG(fmt...) udbg_printf(fmt) #else #define DBG(fmt...) From nathanl at austin.ibm.com Sat Feb 12 04:03:21 2005 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Fri, 11 Feb 2005 11:03:21 -0600 Subject: p620 hangs instantiating rtas at 0x00000000deadbeef In-Reply-To: <20050211110432.GA31833@suse.de> References: <20050209150654.GA16640@suse.de> <20050209222801.GA24113@suse.de> <1107994004.7687.154.camel@gaston> <20050211070332.GA29130@suse.de> <20050211071044.GN5567@krispykreme.ozlabs.ibm.com> <20050211110432.GA31833@suse.de> Message-ID: <1108141401.1243.4.camel@biclops> > smp_prepare_cpus > smp: kicking cpu 1 > smp: kicking cpu 2 > smp: kicking cpu 3 That's odd. We should be seeing "Processor x found" or "Processor x is stuck" after each of those "kicking cpu" messages. Or do udbg and printk not work well together? From dhowells at redhat.com Sat Feb 12 05:52:38 2005 From: dhowells at redhat.com (David Howells) Date: Fri, 11 Feb 2005 18:52:38 +0000 Subject: [PATCH] Fix the mincore() syscall Message-ID: <20686.1108147958@redhat.com> The attached patch fixes the mincore syscall in three ways: (1) It moves as much argument checking outside of the semaphore-holding region as possible. (2) It checks the region parameters against TASK_SIZE so that a 32-bit binary on a 64-bit platform will get the right error when calling this syscall on a region that overlaps the end of the 32-bit address space. (3) It tidies up the VMA checking loop a little. 
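Points (1) and (2) can be sketched outside the kernel as a pure function that validates the request before any lock is taken. This is an illustration under stated assumptions, not the patch itself: SKETCH_PAGE_SIZE and the limit argument are hypothetical stand-ins for PAGE_CACHE_SIZE and TASK_SIZE, the FIRST_USER_PGD_NR check is left out, and the extra length test before rounding is an addition of this sketch.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* hypothetical page size */

/*
 * Validate a mincore()-style (start, len) request against an
 * address-space limit. Returns 0 if the region is acceptable,
 * otherwise a negative errno value.
 */
int sketch_check_range(uint64_t start, uint64_t len, uint64_t limit)
{
	uint64_t max;

	if (start & (SKETCH_PAGE_SIZE - 1))
		return -EINVAL;		/* start is not page aligned */

	if (start >= limit)
		return -ENOMEM;		/* region begins beyond the limit */

	max = limit - start;
	if (len > max)
		return -EINVAL;		/* reject before rounding len up */

	/* Round len up to a whole number of pages, then re-check. */
	len = (len + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
	if (len > max)
		return -EINVAL;		/* region overlaps the limit */

	return 0;
}
```

With limit set to 4GB to mimic a 32-bit task, a request starting at 0xfffff000 for 8192 bytes is rejected with -EINVAL, while a one-byte request at the same address is accepted.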

Signed-Off-By: David Howells
---
warthog>diffstat mincore-2611rc3bk8.diff
 mincore.c | 50 ++++++++++++++++++++++++++++++++------------------
 1 files changed, 32 insertions(+), 18 deletions(-)

diff -uNrp linux-2.6.11-rc3-bk8/mm/mincore.c linux-2.6.11-rc3-bk8-mincore/mm/mincore.c
--- linux-2.6.11-rc3-bk8/mm/mincore.c 2005-01-04 11:13:57.000000000 +0000
+++ linux-2.6.11-rc3-bk8-mincore/mm/mincore.c 2005-02-11 18:44:25.563625998 +0000
@@ -109,39 +109,45 @@ asmlinkage long sys_mincore(unsigned lon
 	unsigned char __user * vec)
 {
 	int index = 0;
-	unsigned long end;
+	unsigned long end, limit;
 	struct vm_area_struct * vma;
+	size_t max;
 	int unmapped_error = 0;
-	long error = -EINVAL;
+	long error;
 
-	down_read(&current->mm->mmap_sem);
+	/* check the arguments */
+	if (start & ~PAGE_CACHE_MASK)
+		goto einval;
+
+	if (start < FIRST_USER_PGD_NR * PGDIR_SIZE)
+		goto enomem;
+
+	limit = TASK_SIZE;
+	if (start >= limit)
+		goto enomem;
+
+	max = limit - start;
+	len = PAGE_CACHE_ALIGN(len);
+	if (len > max)
+		goto einval;
 
-	if (start & ~PAGE_CACHE_MASK)
-		goto out;
-	len = (len + ~PAGE_CACHE_MASK) & PAGE_CACHE_MASK;
 	end = start + len;
-	if (end < start)
-		goto out;
 
+	/* check the output buffer whilst holding the lock */
 	error = -EFAULT;
-	if (!access_ok(VERIFY_WRITE, vec, len >> PAGE_SHIFT))
-		goto out;
+	down_read(&current->mm->mmap_sem);
 
-	error = 0;
-	if (end == start)
+	if (!access_ok(VERIFY_WRITE, vec, len >> PAGE_SHIFT))
 		goto out;
 
 	/*
 	 * If the interval [start,end) covers some unmapped address
 	 * ranges, just ignore them, but return -ENOMEM at the end.
 	 */
-	vma = find_vma(current->mm, start);
-	for (;;) {
-		/* Still start < end. */
-		error = -ENOMEM;
-		if (!vma)
-			goto out;
+	error = 0;
+	vma = find_vma(current->mm, start);
 
+	while (vma) {
 		/* Here start < vma->vm_end. */
 		if (start < vma->vm_start) {
 			unmapped_error = -ENOMEM;
@@ -169,7 +175,15 @@ asmlinkage long sys_mincore(unsigned lon
 		vma = vma->vm_next;
 	}
 
+	/* we found a hole in the area queried if we arrive here */
+	error = -ENOMEM;
+
 out:
 	up_read(&current->mm->mmap_sem);
 	return error;
+
+einval:
+	return -EINVAL;
+enomem:
+	return -ENOMEM;
 }

From hpj at urpla.net Sat Feb 12 05:10:00 2005
From: hpj at urpla.net (Hans-Peter Jansen)
Date: Fri, 11 Feb 2005 19:10:00 +0100
Subject: [PATCH] ppc64: Implement a vDSO and use it for signal trampoline #3
In-Reply-To: <1108002773.7733.196.camel@gaston>
References: <1108002773.7733.196.camel@gaston>
Message-ID: <200502111910.00725.hpj@urpla.net>

Hi Ben,

are you copyrighting under a new pseudonym? E.g.:

On Thursday 10 February 2005 03:32, Benjamin Herrenschmidt wrote:
> ===================================================================
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ linux-work/arch/ppc64/kernel/vdso32/sigtramp.S 2005-02-02
> 13:28:01.000000000 +1100 @@ -0,0 +1,300 @@
> +/*
> + * Signal trampolines for 32 bits processes in a ppc64 kernel for
> + * use in the vDSO
> + *
> + * Copyright (C) 2004 Benjamin Herrenschmuidt
                                            ^
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ linux-work/arch/ppc64/kernel/vdso32/datapage.S 2005-02-02
> 13:28:01.000000000 +1100 @@ -0,0 +1,68 @@
> +/*
> + * Access to the shared data page by the vDSO & syscall map
> + *
> + * Copyright (C) 2004 Benjamin Herrenschmuidt

Who's that guy?

Pete

From benh at kernel.crashing.org Sat Feb 12 09:09:30 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 12 Feb 2005 09:09:30 +1100
Subject: p620 hangs instantiating rtas at 0x00000000deadbeef
In-Reply-To: <20050211070332.GA29130@suse.de>
References: <20050209150654.GA16640@suse.de> <20050209222801.GA24113@suse.de> <1107994004.7687.154.camel@gaston> <20050211070332.GA29130@suse.de>
Message-ID: <1108159770.7733.226.camel@gaston>

>
> diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/prom_init.c ./arch/ppc64/kernel/prom_init.c
> --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/prom_init.c 2005-02-03 02:56:48.000000000 +0100
> +++ ./arch/ppc64/kernel/prom_init.c 2005-02-11 07:51:21.280306356 +0100
> @@ -671,7 +671,7 @@ static void __init prom_init_mem(void)
>  	if ( RELOC(of_platform) == PLATFORM_PSERIES_LPAR )
>  		RELOC(alloc_top) = RELOC(rmo_top);
>  	else
> -		RELOC(alloc_top) = RELOC(rmo_top) = min(0x40000000ul, RELOC(ram_top));
> +		RELOC(alloc_top) = RELOC(rmo_top) = min(0x10000000ul, RELOC(ram_top));
>  	RELOC(alloc_bottom) = PAGE_ALIGN(RELOC(klimit) - offset + 0x4000);
>  	RELOC(alloc_top_high) = RELOC(ram_top);
>

Hrm... that fixes it ? weird... I suspect the TCE allocation is
screwing up. Can you please enable the full debug output and send me
the log ? The above patch is definitely not a solution.

Ben.

From benh at kernel.crashing.org Sat Feb 12 09:15:19 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 12 Feb 2005 09:15:19 +1100
Subject: [PATCH] ppc64: Implement a vDSO and use it for signal trampoline #3
In-Reply-To: <200502111910.00725.hpj@urpla.net>
References: <1108002773.7733.196.camel@gaston> <200502111910.00725.hpj@urpla.net>
Message-ID: <1108160119.7733.230.camel@gaston>

On Fri, 2005-02-11 at 19:10 +0100, Hans-Peter Jansen wrote:
> Hi Ben,
>
> are you copyrighting under a new pseudonym? E.g.:
>
> On Thursday 10 February 2005 03:32, Benjamin Herrenschmidt wrote:
> > ===================================================================
> > --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> > +++ linux-work/arch/ppc64/kernel/vdso32/sigtramp.S 2005-02-02
> > 13:28:01.000000000 +1100 @@ -0,0 +1,300 @@
> > +/*
> > + * Signal trampolines for 32 bits processes in a ppc64 kernel for
> > + * use in the vDSO
> > + *
> > + * Copyright (C) 2004 Benjamin Herrenschmuidt
>                                             ^
> > --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> > +++ linux-work/arch/ppc64/kernel/vdso32/datapage.S 2005-02-02
> > 13:28:01.000000000 +1100 @@ -0,0 +1,68 @@
> > +/*
> > + * Access to the shared data page by the vDSO & syscall map
> > + *
> > + * Copyright (C) 2004 Benjamin Herrenschmuidt
>
> Who's that guy?

Hehe, good catch, I'll fix that :)

Thanks,
Ben.

From benh at kernel.crashing.org Sat Feb 12 09:54:02 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 12 Feb 2005 09:54:02 +1100
Subject: p620 hangs instantiating rtas at 0x00000000deadbeef
In-Reply-To: <20050211000134.GE23424@austin.ibm.com>
References: <20050209150654.GA16640@suse.de> <20050209222801.GA24113@suse.de> <1107994004.7687.154.camel@gaston> <20050210082529.GB30336@suse.de> <20050210193706.GC23424@austin.ibm.com> <20050210201830.GA23150@suse.de> <20050211000134.GE23424@austin.ibm.com>
Message-ID: <1108162442.7733.235.camel@gaston>

On Thu, 2005-02-10 at 18:01 -0600, Linas Vepstas wrote:

> Once, a long time ago, it was what a register would hold after the CPU
> was powered on the very first time ...
>
> Now, it seems to be an error return value from prom_claim() ...
> seems to be getting returned by firmware ... they probably
> should have returned a -1, those jokers ...

Yes, definitely looks like a firmware bug to me. This return value is
just insane.

> Anyway, the firmware seems to be telling us that it cannot
> honour the very first request to claim memory right below
> RMO top.

In a broken way, but yes.

> I might be totally insane but I notice that rmo_top is set to 1GB, and I
> thought 256MB was the top ... so try this, for laughs ... in the routine
>
> static void __init prom_init_mem(void)
> around line 675
> RELOC(alloc_top) = RELOC(rmo_top) = min(0x40000000ul, RELOC(ram_top
>
> change the 4 to a 1 ...
>
> That is my wild guess.

This is not LPAR, right? In this case, RMO doesn't exist as-is, and I
arbitrarily fixed the limit at 1Gb which, according to everybody I
asked by then, was enough (some RTAS didn't support being instantiated
about 2Gb).

> I notice that someone re-wrote all of that prom code in the last half-year,
> I don't know who ... probably Ben ... they would be the expert
> for what's going on in here, not me. I bow out here.

Well, it would be interesting to understand what's up with the
firmware. The alloc_down() routine is designed to retry at a lower
address when the allocation fails, but that mechanism is defeated by
the bogus return value from the firmware.

Ben.

From olh at suse.de Sat Feb 12 10:22:34 2005
From: olh at suse.de (Olaf Hering)
Date: Sat, 12 Feb 2005 00:22:34 +0100
Subject: p620 hangs instantiating rtas at 0x00000000deadbeef
In-Reply-To: <1108162442.7733.235.camel@gaston>
References: <20050209150654.GA16640@suse.de> <20050209222801.GA24113@suse.de> <1107994004.7687.154.camel@gaston> <20050210082529.GB30336@suse.de> <20050210193706.GC23424@austin.ibm.com> <20050210201830.GA23150@suse.de> <20050211000134.GE23424@austin.ibm.com> <1108162442.7733.235.camel@gaston>
Message-ID: <20050211232234.GA17820@suse.de>

On Sat, Feb 12, Benjamin Herrenschmidt wrote:

> On Thu, 2005-02-10 at 18:01 -0600, Linas Vepstas wrote:
>
> > Once, a long time ago, it was what a register would hold after the CPU
> > was powered on the very first time ...
> >
> > Now, it seems to be an error return value from prom_claim() ...
> > seems to be getting returned by firmware ... they probably
> > should have returned a -1, those jokers ...
> > Yes, definitely looks like a firmware bug to me. This return value is > just insane. It still hangs later, have to try a plain 2.6.10. diff -purNx tags ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/prom_init.c ./arch/ppc64/kernel/prom_init.c --- ../linux-2.6.11-rc3.orig/arch/ppc64/kernel/prom_init.c 2005-02-03 02:56:48.000000000 +0100 +++ ./arch/ppc64/kernel/prom_init.c 2005-02-12 00:11:06.045040377 +0100 @@ -530,6 +529,11 @@ static unsigned long __init alloc_down(u for(; base > RELOC(alloc_bottom); base = _ALIGN_DOWN(base - 0x100000, align)) { prom_debug(" trying: 0x%x\n\r", base); addr = (unsigned long)prom_claim(base, size, 0); + if ((u32)addr == 0xdeadbeef) { + prom_debug("prom_claim, addr == 0xdeadbeef"); + addr = 0; + continue; + } if ((int)addr != PROM_ERROR) break; addr = 0; BOOTP S = 1 FILE: orange Load Addr=0x4000 Max Size=0xbfc000 FINAL Packet Count = 5801 FINAL File Size = 2969809 bytes. zImage starting: loaded at 0x400000 Allocating 0x94c000 bytes for kernel ... gunzipping (0x2100000 <- 0x407000:0x6c38b0)...done 0x7e23b8 bytes 0xe3fc bytes of heap consumed, max in use 0xa1f4 OF stdout device is: /pci at fff7f09000/isa at 10/serial at i3f8 klimit=0xc00000000084c000 offset=0xbffffffffdef0000 command line: root_addr_cells: 0000000000000002 root_size_cells: 0000000000000002 scanning memory: node /memory at 0 : 0000000000000000 0000000100000000 memory layout at init: alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 0000000100000000 rmo_top : 0000000040000000 ram_top : 0000000100000000 Booting CPU hw index = 0x0000000000000000 Looking for displays found display : /pci at fff7f0a000/pci at b,4/display at 1, opening ... 
done starting prom_initialize_tce_table alloc_down(0000000000400000, 0000000000800000, (high)) -> 00000000ff800000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000ff800000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f09000 node = 0x0000000000cc7380 base = 0x00000000ff800000 size = 0x0000000000400000 opening PHB /pci at fff7f09000... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000ff400000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000ff400000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f09000/pci at b node = 0x0000000000cd8560 base = 0x00000000ff400000 size = 0x0000000000400000 opening PHB /pci at fff7f09000/pci at b... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000ff000000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000ff000000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f09000/pci at b,2 node = 0x0000000000cdc5f8 base = 0x00000000ff000000 size = 0x0000000000400000 opening PHB /pci at fff7f09000/pci at b,2... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fec00000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fec00000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f09000/pci at b,4 node = 0x0000000000ce0a88 base = 0x00000000fec00000 size = 0x0000000000400000 opening PHB /pci at fff7f09000/pci at b,4... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fe800000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fe800000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f09000/pci at b,6 node = 0x0000000000ce4f18 base = 0x00000000fe800000 size = 0x0000000000400000 opening PHB /pci at fff7f09000/pci at b,6... 
done alloc_down(0000000000400000, 0000000000800000, (high)) -> 00000000fe000000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fe000000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000 node = 0x0000000000ce97e0 base = 0x00000000fe000000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fdc00000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fdc00000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at b node = 0x0000000000cec720 base = 0x00000000fdc00000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at b... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fd800000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fd800000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at b,2 node = 0x0000000000cf0b38 base = 0x00000000fd800000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at b,2... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fd400000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fd400000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at b,4 node = 0x0000000000cf4fc8 base = 0x00000000fd400000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at b,4... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fd000000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fd000000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at b,6 node = 0x0000000000cf9458 base = 0x00000000fd000000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at b,6... 
done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fcc00000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fcc00000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at c node = 0x0000000000cfd8e8 base = 0x00000000fcc00000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at c... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fc800000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fc800000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at c,2 node = 0x0000000000d01d88 base = 0x00000000fc800000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at c,2... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fc400000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fc400000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at c,4 node = 0x0000000000d06228 base = 0x00000000fc400000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at c,4... done alloc_down(0000000000400000, 0000000000400000, (high)) -> 00000000fc000000 alloc_bottom : 0000000002960000 alloc_top : 0000000040000000 alloc_top_hi : 00000000fc000000 rmo_top : 0000000040000000 ram_top : 0000000100000000 TCE table: /pci at fff7f0a000/pci at c,6 node = 0x0000000000d0a6c8 base = 0x00000000fc000000 size = 0x0000000000400000 opening PHB /pci at fff7f0a000/pci at c,6... done ending prom_initialize_tce_table prom_instantiate_rtas: start... 
prom_rtas: 0000000000cb5050 alloc_down(00000000000a7000, 0000000000001000, (low)) trying: 0x000000003ff59000 prom_claim, addr == 0xdeadbeef trying: 0x000000003fe59000 -> 000000003fe59000 alloc_bottom : 0000000002960000 alloc_top : 000000003fe59000 alloc_top_hi : 00000000fc000000 rmo_top : 0000000040000000 ram_top : 0000000100000000 instantiating rtas at 0x000000003fe59000... done rtas base = 0x000000003fe59000 rtas entry = 0x000000003fe59900 rtas size = 0x00000000000a7000 prom_instantiate_rtas: end... prom_hold_cpus: start... 1) spinloop = 0x0000000000000008 1) *spinloop = 0x0000000000000000 1) acknowledge = 0x0000000000000010 1) *acknowledge = 0x0000000000000000 1) secondary_hold = 0x0000000000000060 cpuid = 0x0000000000000000 cpu hw idx = 0x0000000000000000 0000000000000000 : boot cpu 0000000000000000 cpuid = 0x0000000000000001 cpu hw idx = 0x0000000000000002 0000000000000001 : starting cpu hw idx 0000000000000002... done cpuid = 0x0000000000000002 cpu hw idx = 0x0000000000000004 0000000000000002 : starting cpu hw idx 0000000000000004... done cpuid = 0x0000000000000003 cpu hw idx = 0x0000000000000006 0000000000000003 : starting cpu hw idx 0000000000000006... done prom_hold_cpus: end... copying OF device tree ... starting device tree allocs at 0000000002960000 alloc_up(0000000000100000, 0000000000001000) trying: 0x0000000002960000 trying: 0x0000000002a60000 -> 0000000002a60000 alloc_bottom : 0000000002a60000 alloc_top : 000000003fe59000 alloc_top_hi : 00000000fc000000 rmo_top : 0000000040000000 ram_top : 0000000100000000 Building dt strings... Building dt structure... reserved memory map: 00000000fc000000 - 0000000004000000 000000003fe59000 - 00000000000a7000 0000000002a60000 - 0000000000013000 Device tree strings 0x0000000002a61000 -> 0x0000000002a621f9 Device tree struct 0x0000000002a63000 -> 0x0000000002a73000 Calling quiesce ... returning from prom_init ->dt_header_start=0x0000000002a60000 ->phys=0x0000000002110000 Hello World ! 
<- pSeries_init_early() -> finish_device_tree <- finish_device_tree firmware_features = 0x0 <- setup_system() -> smp_init_pSeries() <- smp_init_pSeries() phb0: IO 0x0 -> 0xfffff phb0: MEM 0xfe80000000 -> 0xfebfffffff phb0 io_base_phys 0xfeffe00000 io_base_virt 0xe000000000000000 phb1: IO 0x0 -> 0xfffff phb1: MEM 0xff00000000 -> 0xff3fffffff phb1 io_base_phys 0xfefff00000 io_base_virt 0xe000000000100000 Starting Linux PPC64 2.6.11-rc3-bk7 ----------------------------------------------------- ppc64_pft_size = 0x1a ppc64_debug_switch = 0x0 ppc64_interrupt_controller = 0x2 systemcfg = 0xc000000000005000 systemcfg->platform = 0x100 systemcfg->processorCount = 0x4 systemcfg->physicalMemorySize = 0x100000000 ppc64_caches.dcache_line_size = 0x80 ppc64_caches.icache_line_size = 0x80 htab_address = 0xc0000000f8000000 htab_hash_mask = 0x7ffff ----------------------------------------------------- [boot]0100 MM Init [boot]0100 MM Init Done Linux version 2.6.11-rc3-bk7 (olaf at pomegranate) (gcc version 3.3.3 (SuSE Linux)) #21 SMP Sat Feb 12 00:13:20 CET 2005 [boot]0012 Setup Arch Top of RAM: 0x100000000, Total RAM: 0x100000000 Memory hole size: 0MB No ramdisk, default root is /dev/sda2 EEH: No capable adapters found PPC64 nvram contains 262144 bytes Using default idle loop [boot]0015 Setup Done Built 1 zonelists Kernel command line: [boot]0020 XICS Init [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 131072 bytes) time_init: decrementer frequency = 601.579296 MHz time_init: processor frequency = 601.600000 MHz -> set_preferred_console() stdout is /pci at fff7f09000/isa at 10/serial at i3f8 Found serial console at ttyS0 smp_prepare_cpus smp: kicking cpu 1 smp: kicking cpu 2 smp: kicking cpu 3 From benh at kernel.crashing.org Sat Feb 12 10:24:37 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 12 Feb 2005 10:24:37 +1100 Subject: p620 hangs instantiating rtas at 0x00000000deadbeef In-Reply-To: <20050211232234.GA17820@suse.de> References: 
<20050209150654.GA16640@suse.de> <20050209222801.GA24113@suse.de> <1107994004.7687.154.camel@gaston> <20050210082529.GB30336@suse.de> <20050210193706.GC23424@austin.ibm.com> <20050210201830.GA23150@suse.de> <20050211000134.GE23424@austin.ibm.com> <1108162442.7733.235.camel@gaston> <20050211232234.GA17820@suse.de> Message-ID: <1108164278.7733.237.camel@gaston> > > Yes, definitely looks like a firmware bug to me. This return value is > > just insane. > > It still hangs later, have to try a plain 2.6.10. Well, a different problem then. Note that the fix for the prom_claim problem should be in the prom_claim() wrapper itself, and the zImage wrapper probably wants a similar "workaround". Ben From olh at suse.de Sun Feb 13 22:36:07 2005 From: olh at suse.de (Olaf Hering) Date: Sun, 13 Feb 2005 12:36:07 +0100 Subject: [PATCH] use vmlinux during make install on ppc64 In-Reply-To: <20050210214505.GA16566@mars.ravnborg.org> References: <20050207151222.GA7219@suse.de> <16905.58373.707121.332099@cargo.ozlabs.ibm.com> <20050210210754.GA23610@suse.de> <20050210214505.GA16566@mars.ravnborg.org> Message-ID: <20050213113607.GA25270@suse.de> On Thu, Feb 10, Sam Ravnborg wrote: > Please take a look at the funcionality offered by KBUILD_IMAGE > KBUILD_IMAGE is supposed to be used for this purpose. > > Otherwise your patch looks good. Is that one ok for akpm? There is also a typo: diff -purNx tags ../linux-2.6.11-rc4.orig/Makefile ./Makefile --- ../linux-2.6.11-rc4.orig/Makefile 2005-02-13 04:06:56.000000000 +0100 +++ ./Makefile 2005-02-13 11:13:18.889376754 +0100 @@ -539,7 +539,7 @@ CFLAGS += $(call cc-option,-Wno-pointer- # Default kernel image to build when no specific target is given. 
# KBUILD_IMAGE may be overruled on the commandline or # set in the environment -# Also any assingments in arch/$(ARCH)/Makefiel take precedence over +# Also any assingments in arch/$(ARCH)/Makefile take precedence over # this default value export KBUILD_IMAGE ?= vmlinux diff -purNx tags ../linux-2.6.11-rc4.orig/arch/ppc64/Makefile ./arch/ppc64/Makefile --- ../linux-2.6.11-rc4.orig/arch/ppc64/Makefile 2005-02-13 04:05:28.000000000 +0100 +++ ./arch/ppc64/Makefile 2005-02-13 11:13:45.991280990 +0100 @@ -65,14 +65,20 @@ boottarget-$(CONFIG_PPC_ISERIES) := vmli $(boottarget-y): vmlinux $(Q)$(MAKE) $(build)=$(boot) $(boot)/$@ -bootimage-$(CONFIG_PPC_PSERIES) := zImage -bootimage-$(CONFIG_PPC_MAPLE) := zImage +bootimage-$(CONFIG_PPC_PSERIES) := $(boot)/zImage +bootimage-$(CONFIG_PPC_PMAC) := vmlinux +bootimage-$(CONFIG_PPC_MAPLE) := $(boot)/zImage bootimage-$(CONFIG_PPC_ISERIES) := vmlinux BOOTIMAGE := $(bootimage-y) install: vmlinux $(Q)$(MAKE) $(build)=$(boot) BOOTIMAGE=$(BOOTIMAGE) $@ -all: $(BOOTIMAGE) +defaultimage-$(CONFIG_PPC_PSERIES) := zImage +defaultimage-$(CONFIG_PPC_PMAC) := vmlinux +defaultimage-$(CONFIG_PPC_MAPLE) := zImage +defaultimage-$(CONFIG_PPC_ISERIES) := vmlinux +KBUILD_IMAGE := $(defaultimage-y) +all: $(KBUILD_IMAGE) archclean: $(Q)$(MAKE) $(clean)=$(boot) diff -purNx tags ../linux-2.6.11-rc4.orig/arch/ppc64/boot/Makefile ./arch/ppc64/boot/Makefile --- ../linux-2.6.11-rc4.orig/arch/ppc64/boot/Makefile 2005-02-13 04:07:34.000000000 +0100 +++ ./arch/ppc64/boot/Makefile 2005-02-13 10:42:48.782252046 +0100 @@ -117,7 +117,7 @@ $(obj)/imagesize.c: vmlinux.strip awk '{printf "unsigned long vmlinux_memsize = 0x%s;\n", substr($$1,8)}' \ >> $(obj)/imagesize.c -install: $(CONFIGURE) $(obj)/$(BOOTIMAGE) - sh -x $(srctree)/$(src)/install.sh "$(KERNELRELEASE)" "$(obj)/$(BOOTIMAGE)" "$(INSTALL_PATH)" +install: $(CONFIGURE) $(BOOTIMAGE) + sh -x $(srctree)/$(src)/install.sh "$(KERNELRELEASE)" vmlinux System.map "$(INSTALL_PATH)" "$(BOOTIMAGE)" clean-files := 
$(addprefix $(objtree)/, $(obj-boot) vmlinux.strip)
diff -purNx tags ../linux-2.6.11-rc4.orig/arch/ppc64/boot/install.sh ./arch/ppc64/boot/install.sh
--- ../linux-2.6.11-rc4.orig/arch/ppc64/boot/install.sh 2005-02-13 04:08:05.000000000 +0100
+++ ./arch/ppc64/boot/install.sh 2005-02-13 10:42:48.783251890 +0100
@@ -17,6 +17,7 @@
 # $2 - kernel image file
 # $3 - kernel map file
 # $4 - default install path (blank if root directory)
+# $5 - kernel boot file, the zImage
 #
 
 # User may have a custom install script
@@ -27,7 +28,7 @@ if [ -x /sbin/installkernel ]; then exec
 # Default install
 # this should work for both the pSeries zImage and the iSeries vmlinux.sm
 
-image_name=`basename $2`
+image_name=`basename $5`
 
 if [ -f $4/$image_name ]; then
 	mv $4/$image_name $4/$image_name.old

From arnd at arndb.de Mon Feb 14 00:26:41 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Sun, 13 Feb 2005 14:26:41 +0100
Subject: [PATCH] use vmlinux during make install on ppc64
In-Reply-To: <20050213113607.GA25270@suse.de>
References: <20050207151222.GA7219@suse.de> <20050210214505.GA16566@mars.ravnborg.org> <20050213113607.GA25270@suse.de>
Message-ID: <200502131426.46354.arnd@arndb.de>

On Sünndag 13 Februar 2005 12:36, Olaf Hering wrote:

> There is also a typo:

Two typos, actually ;-)

> -# Also any assingments in arch/$(ARCH)/Makefiel take precedence over
> +# Also any assingments in arch/$(ARCH)/Makefile take precedence over
                ^^^^

Arnd <><
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: signature
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050213/d68b56ee/attachment.pgp

From olh at suse.de Mon Feb 14 04:13:58 2005
From: olh at suse.de (Olaf Hering)
Date: Sun, 13 Feb 2005 18:13:58 +0100
Subject: [PATCH] remove extra whitespace before preprocessor token
Message-ID: <20050213171358.GB24061@suse.de>

unifdef complains about the space before #ifndef.

Signed-off-by: Olaf Hering

diff -purNx tags ../linux-2.6.11-rc4.orig/include/asm-ppc64/io.h ./include/asm-ppc64/io.h
--- ../linux-2.6.11-rc4.orig/include/asm-ppc64/io.h 2005-02-13 04:07:49.000000000 +0100
+++ ./include/asm-ppc64/io.h 2005-02-13 18:06:56.493502484 +0100
@@ -1,4 +1,4 @@
- #ifndef _PPC64_IO_H
+#ifndef _PPC64_IO_H
 #define _PPC64_IO_H
 
 /*

From dhowells at redhat.com Tue Feb 15 01:51:00 2005
From: dhowells at redhat.com (David Howells)
Date: Mon, 14 Feb 2005 14:51:00 +0000
Subject: [PATCH] increase the upper limit of sg_io64.dxfer_len to make "sg_logs -p=0x0f /dev/sgN" happy
In-Reply-To: <20040604023808.87240.qmail@web60007.mail.yahoo.com>
References: <20040604023808.87240.qmail@web60007.mail.yahoo.com>
Message-ID: <16038.1108392660@redhat.com>

Woody Zhou wrote:

> Following is a patch to remedy an "invalid argument" error while
> executing "sg_logs -p=0x0f /dev/sgN". I submit it here for your
> review. At this place /dev/sgN should be mapped to a SCSI device
> which support page 0x0f(Application client), such as IBM
> IC35L036UCDY10-0 disks.
>
> Index: 2.4.21-15.EL/arch/ppc64/kernel/ioctl32.c
> ===================================================================
> --- arch/ppc64/kernel/ioctl32.c.orig 2004-05-25
> 15:18:46.000000000 +0800
> +++ arch/ppc64/kernel/ioctl32.c 2004-05-25 15:20:33.000000000 +0800
> @@ -1301,7 +1301,7 @@
>  			goto out;
>  		}
>  	} else {
> -		if (sg_io64.dxfer_len > 4*PAGE_SIZE) {
> +		if (sg_io64.dxfer_len > 8*PAGE_SIZE) {

I'm not sure that this patch is the best way to do things. The larger
the kmalloc() call made, the harder it is for the kernel to honour it.
The kernel needs to find enough contiguous and properly aligned memory
to be able to do this, and the larger the request, the harder it will
be.

Furthermore, this takes no account of the fact that PAGE_SIZE can be
changed. If it was, for example, changed to 64KB, you'd be allocating
an enormous chunk of memory, and only using less than a page in this
instance.
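
The arithmetic behind that last point can be sketched as follows. This is a minimal illustration, not code from ioctl32.c: dxfer_limit() and pages_needed() are hypothetical helpers, and the 16388-byte request used in the usage note is an assumed figure chosen only because it is slightly more than four 4KB pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the byte limit expressed by "pages * PAGE_SIZE". */
static size_t dxfer_limit(size_t pages, size_t page_size)
{
	return pages * page_size;
}

/* Hypothetical helper: whole pages needed to hold "bytes" bytes. */
static size_t pages_needed(size_t bytes, size_t page_size)
{
	assert(page_size != 0);
	return (bytes + page_size - 1) / page_size;
}
```

With 4KB pages the proposed limit is 8 * 4096 = 32768 bytes, and an assumed 16388-byte request occupies five pages of it. With 64KB pages the same expression yields 8 * 65536 = 524288 bytes, while the same request fits in a single page.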

A better way would be, perhaps, to abuse the way KERNEL_DS and USER_DS
affect pointer checking:

Rather than do this in sg_ioctl_trans():

 (1) pull parameter block into kernel space
 (2) allocate an enormous scratch buffer
 (3) copy the input data into the buffer
 (4) set FS selector to KERNEL_DS
 (5) call sys_ioctl()
 (6) restore FS selector
 (7) copy the output data from the buffer
 (8) free the buffers

You could do this:

 (1) pull parameter block into kernel space
 (2) call verify_area() on the userspace buffers
 (3) point kernel param block buffer pointers at the userspace buffers
 (4) set FS selector to KERNEL_DS
 (5) call sys_ioctl()
 (6) restore FS selector

This ought to be sufficient. Setting KERNEL_DS merely disables the
address bounds checking in access_ok() and verify_area(). These don't
actually limit kernel pointers to kernel space; so as long as the check
is performed _somewhere_ before the driver tries to access the buffers,
it should be okay.

After all, verify_area() and co. don't check VMA lists or anything like
that on ppc64. The driver is reliant on the other members of
asm/uaccess.h for that.

Actually, the driver is very poor in this regard; it doesn't check the
return value of the functions that actually touch userspace. It does in
fact rely on verify_area(), which it shouldn't since that doesn't tell
it whether or not there's actually a mapping there. That is, however, a
different problem entirely.

David

From arnd at arndb.de Tue Feb 15 02:27:43 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Mon, 14 Feb 2005 16:27:43 +0100
Subject: [PATCH] increase the upper limit of sg_io64.dxfer_len to make "sg_logs -p=0x0f /dev/sgN" happy
In-Reply-To: <16038.1108392660@redhat.com>
References: <20040604023808.87240.qmail@web60007.mail.yahoo.com> <16038.1108392660@redhat.com>
Message-ID: <200502141627.44089.arnd@arndb.de>

On Maandag 14 Februar 2005 15:51, David Howells wrote:

> Furthermore, this takes no account of the fact that PAGE_SIZE can be
> changed. If it was, for example, changed to 64KB, you'd be allocating an
> enormous chunk of memory, and only using less than a page in this instance.

Why would anyone want to make such a big change on a three year old
kernel? I'm not even convinced it would be a good idea to support 64k
pages on RHEL4, and that doesn't have this particular problem.

>  (1) pull parameter block into kernel space
>  (2) call verify_area() on the userspace buffers
>  (3) point kernel param block buffer pointers at the userspace buffers
>  (4) set FS selector to KERNEL_DS
>  (5) call sys_ioctl()
>  (6) restore FS selector
>
> This ought to be sufficient. Setting KERNEL_DS merely disables the address
> bounds checking in access_ok() and verify_area(). These don't actually limit
> kernel pointers to kernel space; so as long as the check is performed
> _somewhere_ before the driver tries to access the buffers, it should be okay.

Of course, such code will be highly non-portable, unlike most other
functions in compat_ioctl.c. The right solution IMHO would be to
backport the code from 2.6, which uses compat_alloc_user_space.

Arnd <><
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: signature Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050214/91ff0b63/attachment.pgp From martin at meltin.net Tue Feb 15 09:49:06 2005 From: martin at meltin.net (Martin Schwenke) Date: Tue, 15 Feb 2005 09:49:06 +1100 Subject: [PATCH] increase the upper limit of sg_io64.dxfer_len to make "sg_logs -p=0x0f /dev/sgN" happy In-Reply-To: <14886.1108389537@redhat.com> References: <20040604023808.87240.qmail@web60007.mail.yahoo.com> <14886.1108389537@redhat.com> Message-ID: <16913.10978.625135.82686@martins.ozlabs.org> >>>>> "David" == David Howells writes: David> Woody Zhou wrote: >> Index: 2.4.21-15.EL/arch/ppc64/kernel/ioctl32.c >> =================================================================== >> --- arch/ppc64/kernel/ioctl32.c.orig 2004-05-25 >> 15:18:46.000000000 +0800 >> +++ arch/ppc64/kernel/ioctl32.c 2004-05-25 15:20:33.000000000 +0800 >> @@ -1301,7 +1301,7 @@ >> goto out; >> } >> } else { >> - if (sg_io64.dxfer_len > 4*PAGE_SIZE) { >> + if (sg_io64.dxfer_len > 8*PAGE_SIZE) { David> I'm not sure that this patch is the best way to do David> things. The larger the kmalloc() call made, the harder it David> is for the kernel to honour it. The kernel needs to find David> enough contiguous and properly aligned memory to be able to David> do this, and the larger the request, the harder it will be. You're undoubtedly right, but the goal of this patch isn't to fix the problem you're talking about - that problem has always been there! :-) Some time last year (or the year before?) the above condition was added as an anti-denial-of-service measure. However, the limit in the condition is too small, causing a particular ioctl to fail (by only a few bytes, simply due to the arbitrariness of the limit). All that the above patch does is increase the limit (by another arbitrary amount :-) to make that ioctl work again... 
or work at least as well as it did before the arbitrary limit was introduced. So, the patch that introduced the limit was flawed and we're trying to fix a regression via a minimal patch. In that context, could this patch go in until the code is fixed properly? peace & happiness, martin From sfr at canb.auug.org.au Tue Feb 15 14:01:49 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 15 Feb 2005 14:01:49 +1100 Subject: [PATCH] Consolidate compat_sys_waitid Message-ID: <20050215140149.0b06c96b.sfr@canb.auug.org.au> Hi all, This patch does: - consolidate the three implementations of compat_sys_waitid (some were called sys32_waitid). - adds sys_waitid syscall to ppc - adds sys_waitid and compat_sys_waitid syscalls to ppc64 Parisc seemed to assume the existence of compat_sys_waitid. The MIPS syscall tables have me confused and may need updating. I have arbitrarily chosen the next available syscall number on ppc and ppc64, I hope this is correct. Signed-off-by: Stephen Rothwell Comments? 
-- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruNp linus-bk/arch/ia64/ia32/ia32_entry.S linus-bk-waitid.1/arch/ia64/ia32/ia32_entry.S --- linus-bk/arch/ia64/ia32/ia32_entry.S 2005-01-16 07:07:51.000000000 +1100 +++ linus-bk-waitid.1/arch/ia64/ia32/ia32_entry.S 2005-02-15 12:12:21.000000000 +1100 @@ -494,7 +494,7 @@ ia32_syscall_table: data8 compat_sys_mq_notify data8 compat_sys_mq_getsetattr data8 sys_ni_syscall /* reserved for kexec */ - data8 sys32_waitid + data8 compat_sys_waitid // guard against failures to increase IA32_NR_syscalls .org ia32_syscall_table + 8*IA32_NR_syscalls diff -ruNp linus-bk/arch/ia64/ia32/sys_ia32.c linus-bk-waitid.1/arch/ia64/ia32/sys_ia32.c --- linus-bk/arch/ia64/ia32/sys_ia32.c 2005-02-11 13:05:29.000000000 +1100 +++ linus-bk-waitid.1/arch/ia64/ia32/sys_ia32.c 2005-02-15 12:16:35.000000000 +1100 @@ -2633,32 +2633,6 @@ long sys32_fadvise64_64(int fd, __u32 of advice); } -asmlinkage long sys32_waitid(int which, compat_pid_t pid, - compat_siginfo_t __user *uinfo, int options, - struct compat_rusage __user *uru) -{ - siginfo_t info; - struct rusage ru; - long ret; - mm_segment_t old_fs = get_fs(); - - info.si_signo = 0; - set_fs (KERNEL_DS); - ret = sys_waitid(which, pid, (siginfo_t __user *) &info, options, - uru ? 
(struct rusage __user *) &ru : NULL); - set_fs (old_fs); - - if (ret < 0 || info.si_signo == 0) - return ret; - - if (uru && (ret = put_compat_rusage(&ru, uru))) - return ret; - - BUG_ON(info.si_code & __SI_MASK); - info.si_code |= __SI_CHLD; - return copy_siginfo_to_user32(uinfo, &info); -} - #ifdef NOTYET /* UNTESTED FOR IA64 FROM HERE DOWN */ asmlinkage long sys32_setreuid(compat_uid_t ruid, compat_uid_t euid) diff -ruNp linus-bk/arch/ppc/kernel/misc.S linus-bk-waitid.1/arch/ppc/kernel/misc.S --- linus-bk/arch/ppc/kernel/misc.S 2005-01-04 17:05:28.000000000 +1100 +++ linus-bk-waitid.1/arch/ppc/kernel/misc.S 2005-02-15 13:12:01.000000000 +1100 @@ -1450,3 +1450,4 @@ _GLOBAL(sys_call_table) .long sys_add_key .long sys_request_key /* 270 */ .long sys_keyctl + .long sys_waitid diff -ruNp linus-bk/arch/ppc64/kernel/misc.S linus-bk-waitid.1/arch/ppc64/kernel/misc.S --- linus-bk/arch/ppc64/kernel/misc.S 2005-01-16 07:07:51.000000000 +1100 +++ linus-bk-waitid.1/arch/ppc64/kernel/misc.S 2005-02-15 13:13:51.000000000 +1100 @@ -939,6 +939,7 @@ _GLOBAL(sys_call_table32) .llong .sys32_add_key .llong .sys32_request_key .llong .compat_sys_keyctl + .llong .compat_sys_waitid .balign 8 _GLOBAL(sys_call_table) @@ -1214,3 +1215,4 @@ _GLOBAL(sys_call_table) .llong .sys_add_key .llong .sys_request_key /* 270 */ .llong .sys_keyctl + .llong .sys_waitid diff -ruNp linus-bk/arch/sparc64/kernel/sys_sparc32.c linus-bk-waitid.1/arch/sparc64/kernel/sys_sparc32.c --- linus-bk/arch/sparc64/kernel/sys_sparc32.c 2005-02-11 13:05:29.000000000 +1100 +++ linus-bk-waitid.1/arch/sparc64/kernel/sys_sparc32.c 2005-02-15 12:01:55.000000000 +1100 @@ -1653,34 +1653,3 @@ sys32_timer_create(u32 clock, struct sig return err; } - -asmlinkage long compat_sys_waitid(u32 which, u32 pid, - struct compat_siginfo __user *uinfo, - u32 options, struct compat_rusage __user *uru) -{ - siginfo_t info; - struct rusage ru; - long ret; - mm_segment_t old_fs = get_fs(); - - memset(&info, 0, sizeof(info)); - - set_fs 
(KERNEL_DS); - ret = sys_waitid(which, pid, (siginfo_t __user *) &info, - options, - uru ? (struct rusage __user *) &ru : NULL); - set_fs (old_fs); - - if (ret < 0 || info.si_signo == 0) - return ret; - - if (uru) { - ret = put_compat_rusage(&ru, uru); - if (ret) - return ret; - } - - BUG_ON(info.si_code & __SI_MASK); - info.si_code |= __SI_CHLD; - return copy_siginfo_to_user32(uinfo, &info); -} diff -ruNp linus-bk/arch/x86_64/ia32/ia32entry.S linus-bk-waitid.1/arch/x86_64/ia32/ia32entry.S --- linus-bk/arch/x86_64/ia32/ia32entry.S 2005-01-16 11:05:29.000000000 +1100 +++ linus-bk-waitid.1/arch/x86_64/ia32/ia32entry.S 2005-02-15 12:11:52.000000000 +1100 @@ -590,7 +590,7 @@ ia32_sys_call_table: .quad compat_sys_mq_notify .quad compat_sys_mq_getsetattr .quad quiet_ni_syscall /* reserved for kexec */ - .quad sys32_waitid + .quad compat_sys_waitid .quad quiet_ni_syscall /* sys_altroot */ .quad sys_add_key .quad sys_request_key diff -ruNp linus-bk/arch/x86_64/ia32/sys_ia32.c linus-bk-waitid.1/arch/x86_64/ia32/sys_ia32.c --- linus-bk/arch/x86_64/ia32/sys_ia32.c 2005-02-04 04:10:36.000000000 +1100 +++ linus-bk-waitid.1/arch/x86_64/ia32/sys_ia32.c 2005-02-15 12:17:04.000000000 +1100 @@ -955,32 +955,6 @@ asmlinkage long sys32_clone(unsigned int return do_fork(clone_flags, newsp, regs, 0, parent_tid, child_tid); } -asmlinkage long sys32_waitid(int which, compat_pid_t pid, - compat_siginfo_t __user *uinfo, int options, - struct compat_rusage __user *uru) -{ - siginfo_t info; - struct rusage ru; - long ret; - mm_segment_t old_fs = get_fs(); - - info.si_signo = 0; - set_fs (KERNEL_DS); - ret = sys_waitid(which, pid, (siginfo_t __user *) &info, options, - uru ? 
&ru : NULL); - set_fs (old_fs); - - if (ret < 0 || info.si_signo == 0) - return ret; - - if (uru && (ret = put_compat_rusage(&ru, uru))) - return ret; - - BUG_ON(info.si_code & __SI_MASK); - info.si_code |= __SI_CHLD; - return copy_siginfo_to_user32(uinfo, &info); -} - /* * Some system calls that need sign extended arguments. This could be done by a generic wrapper. */ diff -ruNp linus-bk/include/asm-ppc/unistd.h linus-bk-waitid.1/include/asm-ppc/unistd.h --- linus-bk/include/asm-ppc/unistd.h 2005-01-04 17:05:28.000000000 +1100 +++ linus-bk-waitid.1/include/asm-ppc/unistd.h 2005-02-15 13:08:22.000000000 +1100 @@ -276,8 +276,9 @@ #define __NR_add_key 269 #define __NR_request_key 270 #define __NR_keyctl 271 +#define __NR_waitid 272 -#define __NR_syscalls 272 +#define __NR_syscalls 273 #define __NR(n) #n diff -ruNp linus-bk/include/asm-ppc64/unistd.h linus-bk-waitid.1/include/asm-ppc64/unistd.h --- linus-bk/include/asm-ppc64/unistd.h 2005-01-05 17:06:08.000000000 +1100 +++ linus-bk-waitid.1/include/asm-ppc64/unistd.h 2005-02-15 13:07:33.000000000 +1100 @@ -282,8 +282,9 @@ #define __NR_add_key 269 #define __NR_request_key 270 #define __NR_keyctl 271 +#define __NR_waitid 272 -#define __NR_syscalls 272 +#define __NR_syscalls 273 #ifdef __KERNEL__ #define NR_syscalls __NR_syscalls #endif diff -ruNp linus-bk/include/linux/compat.h linus-bk-waitid.1/include/linux/compat.h --- linus-bk/include/linux/compat.h 2005-01-05 17:06:08.000000000 +1100 +++ linus-bk-waitid.1/include/linux/compat.h 2005-02-15 13:22:13.000000000 +1100 @@ -81,6 +81,12 @@ struct compat_rusage { extern int put_compat_rusage(const struct rusage *, struct compat_rusage __user *); +struct compat_siginfo; + +extern asmlinkage long compat_sys_waitid(u32, u32, + struct compat_siginfo __user *, u32, + struct compat_rusage __user *); + struct compat_dirent { u32 d_ino; compat_off_t d_off; @@ -143,7 +149,6 @@ long compat_get_bitmap(unsigned long *ma unsigned long bitmap_size); long compat_put_bitmap(compat_ulong_t 
__user *umask, unsigned long *mask, unsigned long bitmap_size); -struct compat_siginfo; int copy_siginfo_from_user32(siginfo_t *to, struct compat_siginfo __user *from); int copy_siginfo_to_user32(struct compat_siginfo __user *to, siginfo_t *from); #endif /* CONFIG_COMPAT */ diff -ruNp linus-bk/kernel/compat.c linus-bk-waitid.1/kernel/compat.c --- linus-bk/kernel/compat.c 2005-01-16 07:07:51.000000000 +1100 +++ linus-bk-waitid.1/kernel/compat.c 2005-02-15 12:09:46.000000000 +1100 @@ -23,6 +23,7 @@ #include #include +#include int get_compat_timespec(struct timespec *ts, const struct compat_timespec __user *cts) { @@ -413,6 +414,36 @@ compat_sys_wait4(compat_pid_t pid, compa } } +asmlinkage long compat_sys_waitid(u32 which, u32 pid, + struct compat_siginfo __user *uinfo, u32 options, + struct compat_rusage __user *uru) +{ + siginfo_t info; + struct rusage ru; + long ret; + mm_segment_t old_fs = get_fs(); + + memset(&info, 0, sizeof(info)); + + set_fs(KERNEL_DS); + ret = sys_waitid(which, pid, (siginfo_t __user *)&info, options, + uru ? 
(struct rusage __user *)&ru : NULL); + set_fs(old_fs); + + if ((ret < 0) || (info.si_signo == 0)) + return ret; + + if (uru) { + ret = put_compat_rusage(&ru, uru); + if (ret) + return ret; + } + + BUG_ON(info.si_code & __SI_MASK); + info.si_code |= __SI_CHLD; + return copy_siginfo_to_user32(uinfo, &info); +} + static int compat_get_user_cpu_mask(compat_ulong_t __user *user_mask_ptr, unsigned len, cpumask_t *new_mask) { From brking at us.ibm.com Wed Feb 16 02:29:19 2005 From: brking at us.ibm.com (Brian King) Date: Tue, 15 Feb 2005 09:29:19 -0600 Subject: [PATCH] ppc64: Mode 2 PCI-X config space size fix In-Reply-To: <420BF311.6020800@us.ibm.com> References: <200501281456.j0SEuI12020454@d01av01.pok.ibm.com> <20050131192955.GJ31145@parcelfarce.linux.theplanet.co.uk> <41FEA4AA.1080407@us.ibm.com> <200501312256.44692.arnd@arndb.de> <41FEB492.2020002@us.ibm.com> <1107227727.5963.46.camel@gaston> <41FF0B0D.8020003@us.ibm.com> <20050201123249.GA10088@parcelfarce.linux.theplanet.co.uk> <41FFE3AF.706@us.ibm.com> <420A6343.6070307@us.ibm.com> <16906.53505.649325.792660@cargo.ozlabs.ibm.com> <420BF311.6020800@us.ibm.com> Message-ID: <4212154F.70904@us.ibm.com> Brian King wrote: > Paul Mackerras wrote: > >> Brian King writes: >> >> >>> Trimming the cc list a bit since this has become a PPC64 only patch >>> and resending... >> >> >> >> Unless you think this really needs to go in 2.6.11, I'll defer it >> until after 2.6.11 is out, since we're supposed to be in >> bug-fix/stabilization mode for 2.6.11. Are you OK with that? > > > That is fine. > >> Oh, and a minor nit: >> >> >>> + if (type && *type == 1) >>> + dn->pci_ext_config_space = 1; >>> + else >>> + dn->pci_ext_config_space = 0; >> >> >> >> is more compactly expressed as: >> >> dn->pci_ext_config_space = (type && *type == 1); > > > I'll fix this and send out an updated patch. Here is an updated patch. Please apply once 2.6.11 comes out. 
Thanks -- Brian King eServer Storage I/O IBM Linux Technology Center -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ppc64_pcix_mode2_cfg.patch Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050215/e07970bb/attachment.txt From sfr at canb.auug.org.au Wed Feb 16 11:11:46 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 16 Feb 2005 11:11:46 +1100 Subject: [PATCH] 2.4: PPC64: 32 bit sys_recvmsg corruption Message-ID: <20050216111146.524158ce.sfr@canb.auug.org.au> Hi Marcello, In the presence of threads, there is a possibility of the kernel being fooled by the 32 bit sys_recvmsg control data into copying more than it should into the kernel and corrupting kernel data structures. We call the 64 bit version of sys_recvmsg which writes control messages directly to user memory which we then read back and "fix up" for the differences between 32 and 64 bit structures. If two threads share the buffer that we are writing into (and then reading from) it is possible for the control message headers to be changed from what we expect. One of the header fields is the length we need to copy back into the kernel ... This patch just does some more length checking. This bug was actually being hit by BIND running at a customer site. It is very hard to hit, but (obviously) possible. Signed-off-by: Stephen Rothwell Please consider for inclusion into 2.4.30. A patch similar to this may be required by some of the other 64bit archs. 
-- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruNp 2.4.30-pre1/arch/ppc64/kernel/sys_ppc32.c 2.4.30-pre1-sfr.1/arch/ppc64/kernel/sys_ppc32.c --- 2.4.30-pre1/arch/ppc64/kernel/sys_ppc32.c 2005-02-16 10:57:39.000000000 +1100 +++ 2.4.30-pre1-sfr.1/arch/ppc64/kernel/sys_ppc32.c 2005-02-16 11:01:05.000000000 +1100 @@ -3664,7 +3664,8 @@ static void scm_detach_fds32(struct msgh * IPV6_RTHDR ipv6 routing exthdr 32-bit clean * IPV6_AUTHHDR ipv6 auth exthdr 32-bit clean */ -static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, unsigned long orig_cmsg_uptr) +static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, + unsigned long orig_cmsg_uptr, __kernel_size_t orig_cmsg_len) { unsigned char *workbuf, *wp; unsigned long bufsz, space_avail; @@ -3695,6 +3696,19 @@ static void cmsg32_recvmsg_fixup(struct __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) { + static int count; + + if (count++ < 20) + printk(KERN_WARNING "recvmsg_fixup: " + "bad data length %d, level %d, " + "type %d, process %d (%s)\n", + clen64, kcmsg32->cmsg_level, + kcmsg32->cmsg_type, + current->pid, current->comm); + break; + } copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + @@ -3751,6 +3765,7 @@ asmlinkage long sys32_recvmsg(int fd, st struct sockaddr *uaddr; int *uaddr_len; unsigned long cmsg_ptr; + __kernel_size_t cmsg_len; int err, total_len, len = 0; PPCDBG(PPCDBG_SYS32, "sys32_recvmsg - entered - fd=%x, user_msg@=%p, user_flags=%x \n", fd, user_msg, user_flags); @@ -3768,6 +3783,7 @@ asmlinkage long sys32_recvmsg(int fd, st total_len = err; cmsg_ptr = (unsigned long) kern_msg.msg_control; + cmsg_len = kern_msg.msg_controllen; kern_msg.msg_flags = 0; sock = sockfd_lookup(fd, &err); @@ -3793,7 +3809,8 @@ asmlinkage long sys32_recvmsg(int fd, st * to fix 
it up before we tack on more stuff. */ if((unsigned long) kern_msg.msg_control != cmsg_ptr) - cmsg32_recvmsg_fixup(&kern_msg, cmsg_ptr); + cmsg32_recvmsg_fixup(&kern_msg, + cmsg_ptr, cmsg_len); /* Wheee... */ if(sock->passcred) From ak at suse.de Wed Feb 16 11:28:41 2005 From: ak at suse.de (Andi Kleen) Date: Wed, 16 Feb 2005 01:28:41 +0100 Subject: [PATCH] 2.4: PPC64: 32 bit sys_recvmsg corruption In-Reply-To: <20050216111146.524158ce.sfr@canb.auug.org.au> References: <20050216111146.524158ce.sfr@canb.auug.org.au> Message-ID: <20050216002841.GA8237@wotan.suse.de> On Wed, Feb 16, 2005 at 11:11:46AM +1100, Stephen Rothwell wrote: > Hi Marcello, > > In the presence of threads, there is a possibility of the kernel being > fooled by the 32 bit sys_recvmsg control data into copying more than it > should into the kernel and corrupting kernel data structures. > > We call the 64 bit version of sys_recvmsg which writes control messages > directly to user memory which we then read back and "fix up" for the > differences between 32 and 64 bit structures. If two threads share the > buffer that we are writing into (and then reading from) it is possible for > the control message headers to be changed from what we expect. One of the > header fields is the length we need to copy back into the kernel ... > > This patch just does some more length checking. > > This bug was actually being hit by BIND running at a customer site. It is > very hard to hit, but (obviously) possible. Did you check if other 32bit emulations don't have the same problem? 
-Andi From nathanl at austin.ibm.com Wed Feb 16 12:23:47 2005 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Tue, 15 Feb 2005 19:23:47 -0600 Subject: [PATCH] use _smp_processor_id() in idle loops In-Reply-To: <1107037459.31457.4.camel@biclops> References: <1106864625.8962.11.camel@pants.austin.ibm.com> <1107037459.31457.4.camel@biclops> Message-ID: <1108517027.16440.11.camel@biclops> On Sat, 2005-01-29 at 16:24 -0600, Nathan Lynch wrote: > On Thu, 2005-01-27 at 16:23 -0600, Nathan Lynch wrote: > > With 2.6.11-rc2-mm1 and 2.6-bk kernels with CONFIG_DEBUG_PREEMPT I'm > > seeing lots of smp_processor_id warnings from the idle loops: > > > > BUG: using smp_processor_id() in preemptible [00000001] code: > > swapper/0 > > caller is .dedicated_idle+0x64/0x228 > > Call Trace: > > [c0000000004a3c50] [ffffffffffffffff] 0xffffffffffffffff (unreliable) > > [c0000000004a3cd0] [c0000000001d179c] .smp_processor_id+0x154/0x168 > > [c0000000004a3d90] [c00000000000f990] .dedicated_idle+0x64/0x228 > > [c0000000004a3e80] [c00000000000fce0] .cpu_idle+0x34/0x4c > > [c0000000004a3f00] [c00000000003a908] .start_secondary+0x10c/0x150 > > [c0000000004a3f90] [c00000000000bd28] .enable_64b_mode+0x0/0x28 > > This appears to be fixed in 2.6.11-rc2-mm2, so I guess my patch isn't > necessary now. Ok I'm a moron. This patch is still needed. With latest 2.6-bk I see the warning whenever I online a cpu, and it's incessant in shared processor mode (no hotplugging of cpus required to trigger it). This is because of the second use of smp_processor_id in the shared_idle function. I think I confused myself by switching a test partition between shared and dedicated processor modes and not taking that into account. Sorry for the mixup. 
Nathan From sfr at canb.auug.org.au Wed Feb 16 14:06:28 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 16 Feb 2005 14:06:28 +1100 Subject: [PATCH] 2.4: PPC64: 32 bit sys_recvmsg corruption In-Reply-To: <20050216002841.GA8237@wotan.suse.de> References: <20050216111146.524158ce.sfr@canb.auug.org.au> <20050216002841.GA8237@wotan.suse.de> Message-ID: <20050216140628.70232669.sfr@canb.auug.org.au> On Wed, 16 Feb 2005 01:28:41 +0100 Andi Kleen wrote: > > > Did you check if other 32bit emulations don't have the same problem? No, because I am a lazy bastard! :-) OK, everyone has this bug (I now checked). Sparc64 does some checking, but not enough. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ From sfr at canb.auug.org.au Wed Feb 16 17:22:59 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 16 Feb 2005 17:22:59 +1100 Subject: [PATCH] 2.4: PPC64: 32 bit sys_recvmsg corruption In-Reply-To: <20050216140628.70232669.sfr@canb.auug.org.au> References: <20050216111146.524158ce.sfr@canb.auug.org.au> <20050216002841.GA8237@wotan.suse.de> <20050216140628.70232669.sfr@canb.auug.org.au> Message-ID: <20050216172259.1dee3b39.sfr@canb.auug.org.au> [Take 2] Hi Marcello, In the presence of threads, there is a possibility of the kernel being fooled by the 32 bit sys_recvmsg control data into copying more than it should into the kernel and corrupting kernel data structures. We call the 64 bit version of sys_recvmsg which writes control messages directly to user memory which we then read back and "fix up" for the differences between 32 and 64 bit structures. If two threads share the buffer that we are writing into (and then reading from) it is possible for the control message headers to be changed from what we expect. One of the header fields is the length we need to copy back into the kernel ... This patch just does some more length checking. This bug was actually being hit by BIND running at a customer site. 
It is very hard to hit, but (obviously) possible. Signed-off-by: Stephen Rothwell Please consider for inclusion into 2.4.30. Only the ppc64 part of this patch has been compiled and tested. I have applied the same fix to all the 64 bit archs with 32 bit compatibility. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN 2.4.30-pre1/arch/ia64/ia32/sys_ia32.c 2.4.30-pre1-sfr.2/arch/ia64/ia32/sys_ia32.c --- 2.4.30-pre1/arch/ia64/ia32/sys_ia32.c 2005-02-16 10:57:03.000000000 +1100 +++ 2.4.30-pre1-sfr.2/arch/ia64/ia32/sys_ia32.c 2005-02-16 16:49:51.000000000 +1100 @@ -1649,7 +1649,8 @@ * IPV6_AUTHHDR ipv6 auth exthdr 32-bit clean */ static void -cmsg32_recvmsg_fixup (struct msghdr *kmsg, unsigned long orig_cmsg_uptr) +cmsg32_recvmsg_fixup (struct msghdr *kmsg, unsigned long orig_cmsg_uptr, + __kernel_size_t orig_cmsg_len) { unsigned char *workbuf, *wp; unsigned long bufsz, space_avail; @@ -1683,6 +1684,19 @@ goto fail2; clen64 = kcmsg32->cmsg_len; + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) { + static int count; + + if (count++ < 20) + printk(KERN_WARNING "recvmsg_fixup: " + "bad data length %d, level %d, " + "type %d, process %d (%s)\n", + clen64, kcmsg32->cmsg_level, + kcmsg32->cmsg_type, + current->pid, current->comm); + break; + } copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + @@ -1812,6 +1826,7 @@ struct iovec *iov=iovstack; struct msghdr msg_sys; unsigned long cmsg_ptr; + __kernel_size_t cmsg_len; int err, iov_size, total_len, len; struct scm_cookie scm; @@ -1856,6 +1871,7 @@ total_len=err; cmsg_ptr = (unsigned long)msg_sys.msg_control; + cmsg_len = msg_sys.msg_controllen; msg_sys.msg_flags = 0; if (sock->file->f_flags & O_NONBLOCK) @@ -1882,7 +1898,8 @@ * fix it up before we tack on more stuff. 
*/ if ((unsigned long) msg_sys.msg_control != cmsg_ptr) - cmsg32_recvmsg_fixup(&msg_sys, cmsg_ptr); + cmsg32_recvmsg_fixup(&msg_sys, cmsg_ptr, + cmsg_len); /* Wheee... */ if (sock->passcred) diff -ruN 2.4.30-pre1/arch/mips64/kernel/linux32.c 2.4.30-pre1-sfr.2/arch/mips64/kernel/linux32.c --- 2.4.30-pre1/arch/mips64/kernel/linux32.c 2005-02-16 10:57:39.000000000 +1100 +++ 2.4.30-pre1-sfr.2/arch/mips64/kernel/linux32.c 2005-02-16 16:56:24.000000000 +1100 @@ -2790,7 +2790,8 @@ * IPV6_RTHDR ipv6 routing exthdr 32-bit clean * IPV6_AUTHHDR ipv6 auth exthdr 32-bit clean */ -static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, unsigned long orig_cmsg_uptr) +static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, + unsigned long orig_cmsg_uptr, __kernel_size_t orig_cmsg_len) { unsigned char *workbuf, *wp; unsigned long bufsz, space_avail; @@ -2821,6 +2822,19 @@ __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) { + static int count; + + if (count++ < 20) + printk(KERN_WARNING "recvmsg_fixup: " + "bad data length %d, level %d, " + "type %d, process %d (%s)\n", + clen64, kcmsg32->cmsg_level, + kcmsg32->cmsg_type, + current->pid, current->comm); + break; + } copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + @@ -2906,6 +2920,7 @@ struct sockaddr *uaddr; int *uaddr_len; unsigned long cmsg_ptr; + __kernel_size_t cmsg_len; int err, total_len, len = 0; if(msghdr_from_user32_to_kern(&kern_msg, user_msg)) @@ -2921,6 +2936,7 @@ total_len = err; cmsg_ptr = (unsigned long) kern_msg.msg_control; + cmsg_len = kern_msg.msg_controllen; kern_msg.msg_flags = 0; sock = sockfd_lookup(fd, &err); @@ -2946,7 +2962,8 @@ * to fix it up before we tack on more stuff. 
*/ if((unsigned long) kern_msg.msg_control != cmsg_ptr) - cmsg32_recvmsg_fixup(&kern_msg, cmsg_ptr); + cmsg32_recvmsg_fixup(&kern_msg, + cmsg_ptr, cmsg_len); /* Wheee... */ if(sock->passcred) diff -ruN 2.4.30-pre1/arch/parisc/kernel/sys_parisc32.c 2.4.30-pre1-sfr.2/arch/parisc/kernel/sys_parisc32.c --- 2.4.30-pre1/arch/parisc/kernel/sys_parisc32.c 2005-02-16 10:57:39.000000000 +1100 +++ 2.4.30-pre1-sfr.2/arch/parisc/kernel/sys_parisc32.c 2005-02-16 16:58:48.000000000 +1100 @@ -2106,7 +2106,8 @@ * IPV6_RTHDR ipv6 routing exthdr 32-bit clean * IPV6_AUTHHDR ipv6 auth exthdr 32-bit clean */ -static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, unsigned long orig_cmsg_uptr) +static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, + unsigned long orig_cmsg_uptr, __kernel_size_t orig_cmsg_len) { unsigned char *workbuf, *wp; unsigned long bufsz, space_avail; @@ -2137,6 +2138,19 @@ __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) { + static int count; + + if (count++ < 20) + printk(KERN_WARNING "recvmsg_fixup: " + "bad data length %d, level %d, " + "type %d, process %d (%s)\n", + clen64, kcmsg32->cmsg_level, + kcmsg32->cmsg_type, + current->pid, current->comm); + break; + } copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + @@ -2222,6 +2236,7 @@ struct sockaddr *uaddr; int *uaddr_len; unsigned long cmsg_ptr; + __kernel_size_t cmsg_len; int err, total_len, len = 0; if(msghdr_from_user32_to_kern(&kern_msg, user_msg)) @@ -2237,6 +2252,7 @@ total_len = err; cmsg_ptr = (unsigned long) kern_msg.msg_control; + cmsg_len = kern_msg.msg_controllen; kern_msg.msg_flags = 0; sock = sockfd_lookup(fd, &err); @@ -2262,7 +2278,8 @@ * to fix it up before we tack on more stuff. 
*/ if((unsigned long) kern_msg.msg_control != cmsg_ptr) - cmsg32_recvmsg_fixup(&kern_msg, cmsg_ptr); + cmsg32_recvmsg_fixup(&kern_msg, + cmsg_ptr, cmsg_len); /* Wheee... */ if(sock->passcred) diff -ruN 2.4.30-pre1/arch/ppc64/kernel/sys_ppc32.c 2.4.30-pre1-sfr.2/arch/ppc64/kernel/sys_ppc32.c --- 2.4.30-pre1/arch/ppc64/kernel/sys_ppc32.c 2005-02-16 10:57:39.000000000 +1100 +++ 2.4.30-pre1-sfr.2/arch/ppc64/kernel/sys_ppc32.c 2005-02-16 11:01:05.000000000 +1100 @@ -3664,7 +3664,8 @@ * IPV6_RTHDR ipv6 routing exthdr 32-bit clean * IPV6_AUTHHDR ipv6 auth exthdr 32-bit clean */ -static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, unsigned long orig_cmsg_uptr) +static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, + unsigned long orig_cmsg_uptr, __kernel_size_t orig_cmsg_len) { unsigned char *workbuf, *wp; unsigned long bufsz, space_avail; @@ -3695,6 +3696,19 @@ __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) { + static int count; + + if (count++ < 20) + printk(KERN_WARNING "recvmsg_fixup: " + "bad data length %d, level %d, " + "type %d, process %d (%s)\n", + clen64, kcmsg32->cmsg_level, + kcmsg32->cmsg_type, + current->pid, current->comm); + break; + } copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + @@ -3751,6 +3765,7 @@ struct sockaddr *uaddr; int *uaddr_len; unsigned long cmsg_ptr; + __kernel_size_t cmsg_len; int err, total_len, len = 0; PPCDBG(PPCDBG_SYS32, "sys32_recvmsg - entered - fd=%x, user_msg@=%p, user_flags=%x \n", fd, user_msg, user_flags); @@ -3768,6 +3783,7 @@ total_len = err; cmsg_ptr = (unsigned long) kern_msg.msg_control; + cmsg_len = kern_msg.msg_controllen; kern_msg.msg_flags = 0; sock = sockfd_lookup(fd, &err); @@ -3793,7 +3809,8 @@ * to fix it up before we tack on more stuff. 
*/ if((unsigned long) kern_msg.msg_control != cmsg_ptr) - cmsg32_recvmsg_fixup(&kern_msg, cmsg_ptr); + cmsg32_recvmsg_fixup(&kern_msg, + cmsg_ptr, cmsg_len); /* Wheee... */ if(sock->passcred) diff -ruN 2.4.30-pre1/arch/s390x/kernel/linux32.c 2.4.30-pre1-sfr.2/arch/s390x/kernel/linux32.c --- 2.4.30-pre1/arch/s390x/kernel/linux32.c 2005-02-16 10:57:39.000000000 +1100 +++ 2.4.30-pre1-sfr.2/arch/s390x/kernel/linux32.c 2005-02-16 17:18:07.000000000 +1100 @@ -2597,7 +2597,8 @@ * IPV6_RTHDR ipv6 routing exthdr 32-bit clean * IPV6_AUTHHDR ipv6 auth exthdr 32-bit clean */ -static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, unsigned long orig_cmsg_uptr) +static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, + unsigned long orig_cmsg_uptr, __kernel_size_t orig_cmsg_len) { unsigned char *workbuf, *wp; unsigned long bufsz, space_avail; @@ -2628,6 +2629,19 @@ __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) { + static int count; + + if (count++ < 20) + printk(KERN_WARNING "recvmsg_fixup: " + "bad data length %d, level %d, " + "type %d, process %d (%s)\n", + clen64, kcmsg32->cmsg_level, + kcmsg32->cmsg_type, + current->pid, current->comm); + break; + } copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + @@ -2887,7 +2901,8 @@ static __inline__ void scm_recv32(struct socket *sock, struct msghdr *msg, - struct scm_cookie *scm, int flags, unsigned long cmsg_ptr) + struct scm_cookie *scm, int flags, unsigned long cmsg_ptr, + __kernel_size_t cmsg_len) { if(!msg->msg_control) { @@ -2902,7 +2917,7 @@ * to fix it up before we tack on more stuff. */ if((unsigned long) msg->msg_control != cmsg_ptr) - cmsg32_recvmsg_fixup(msg, cmsg_ptr); + cmsg32_recvmsg_fixup(msg, cmsg_ptr, cmsg_len); /* Wheee... 
*/ if(sock->passcred) put_cmsg32(msg, @@ -2916,14 +2931,14 @@ static int sock_recvmsg32(struct socket *sock, struct msghdr *msg, int size, int flags, - unsigned long cmsg_ptr) + unsigned long cmsg_ptr, __kernel_size_t cmsg_len) { struct scm_cookie scm; memset(&scm, 0, sizeof(scm)); size = sock->ops->recvmsg(sock, msg, size, flags, &scm); if (size >= 0) - scm_recv32(sock, msg, &scm, flags, cmsg_ptr); + scm_recv32(sock, msg, &scm, flags, cmsg_ptr, cmsg_len); return size; } @@ -2940,6 +2955,7 @@ struct iovec *iov=iovstack; struct msghdr msg_sys; unsigned long cmsg_ptr; + __kernel_size_t cmsg_len; int err, iov_size, total_len, len; /* kernel mode address */ @@ -2983,11 +2999,12 @@ total_len=err; cmsg_ptr = (unsigned long)msg_sys.msg_control; + cmsg_len = msg_sys.msg_controllen; msg_sys.msg_flags = 0; if (sock->file->f_flags & O_NONBLOCK) flags |= MSG_DONTWAIT; - err = sock_recvmsg32(sock, &msg_sys, total_len, flags, cmsg_ptr); + err = sock_recvmsg32(sock, &msg_sys, total_len, flags, cmsg_ptr, cmsg_len); if (err < 0) goto out_freeiov; len = err; diff -ruN 2.4.30-pre1/arch/sparc64/kernel/sys_sparc32.c 2.4.30-pre1-sfr.2/arch/sparc64/kernel/sys_sparc32.c --- 2.4.30-pre1/arch/sparc64/kernel/sys_sparc32.c 2005-02-16 10:57:39.000000000 +1100 +++ 2.4.30-pre1-sfr.2/arch/sparc64/kernel/sys_sparc32.c 2005-02-16 17:11:18.000000000 +1100 @@ -2647,7 +2647,8 @@ * IPV6_RTHDR ipv6 routing exthdr 32-bit clean * IPV6_AUTHHDR ipv6 auth exthdr 32-bit clean */ -static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, unsigned long orig_cmsg_uptr) +static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, + unsigned long orig_cmsg_uptr, __kernel_size_t orig_cmsg_len) { unsigned char *workbuf, *wp; unsigned long bufsz, space_avail; @@ -2678,6 +2679,19 @@ __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) { + static int count; + + if (count++ < 20) + printk(KERN_WARNING "recvmsg_fixup: 
" + "bad data length %d, level %d, " + "type %d, process %d (%s)\n", + clen64, kcmsg32->cmsg_level, + kcmsg32->cmsg_type, + current->pid, current->comm); + break; + } if (kcmsg32->cmsg_level == SOL_SOCKET && kcmsg32->cmsg_type == SO_TIMESTAMP) { struct timeval tv; @@ -2781,6 +2795,7 @@ struct sockaddr *uaddr; int *uaddr_len; unsigned long cmsg_ptr; + __kernel_size_t cmsg_len; int err, total_len, len = 0; if(msghdr_from_user32_to_kern(&kern_msg, user_msg)) @@ -2796,6 +2811,7 @@ total_len = err; cmsg_ptr = (unsigned long) kern_msg.msg_control; + cmsg_len = kern_msg.msg_controllen; kern_msg.msg_flags = 0; sock = sockfd_lookup(fd, &err); @@ -2821,7 +2837,8 @@ * to fix it up before we tack on more stuff. */ if((unsigned long) kern_msg.msg_control != cmsg_ptr) - cmsg32_recvmsg_fixup(&kern_msg, cmsg_ptr); + cmsg32_recvmsg_fixup(&kern_msg, + cmsg_ptr, cmsg_len); /* Wheee... */ if(sock->passcred) diff -ruN 2.4.30-pre1/arch/x86_64/ia32/socket32.c 2.4.30-pre1-sfr.2/arch/x86_64/ia32/socket32.c --- 2.4.30-pre1/arch/x86_64/ia32/socket32.c 2005-02-16 10:57:04.000000000 +1100 +++ 2.4.30-pre1-sfr.2/arch/x86_64/ia32/socket32.c 2005-02-16 17:13:31.000000000 +1100 @@ -302,7 +302,8 @@ * IPV6_RTHDR ipv6 routing exthdr 32-bit clean * IPV6_AUTHHDR ipv6 auth exthdr 32-bit clean */ -static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, unsigned long orig_cmsg_uptr) +static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, + unsigned long orig_cmsg_uptr, __kernel_size_t orig_cmsg_len) { unsigned char *workbuf, *wp; unsigned long bufsz, space_avail; @@ -333,6 +334,19 @@ __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) { + static int count; + + if (count++ < 20) + printk(KERN_WARNING "recvmsg_fixup: " + "bad data length %d, level %d, " + "type %d, process %d (%s)\n", + clen64, kcmsg32->cmsg_level, + kcmsg32->cmsg_type, + current->pid, current->comm); + break; + } 
copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + @@ -418,6 +432,7 @@ struct sockaddr *uaddr; int *uaddr_len; unsigned long cmsg_ptr; + __kernel_size_t cmsg_len; int err, total_len, len = 0; if(msghdr_from_user32_to_kern(&kern_msg, user_msg)) @@ -433,6 +448,7 @@ total_len = err; cmsg_ptr = (unsigned long) kern_msg.msg_control; + cmsg_len = kern_msg.msg_controllen; kern_msg.msg_flags = 0; sock = sockfd_lookup(fd, &err); @@ -458,7 +474,8 @@ * to fix it up before we tack on more stuff. */ if((unsigned long) kern_msg.msg_control != cmsg_ptr) - cmsg32_recvmsg_fixup(&kern_msg, cmsg_ptr); + cmsg32_recvmsg_fixup(&kern_msg, + cmsg_ptr, cmsg_len); /* Wheee... */ if(sock->passcred) From anton at samba.org Wed Feb 16 20:01:35 2005 From: anton at samba.org (Anton Blanchard) Date: Wed, 16 Feb 2005 20:01:35 +1100 Subject: gcc 4.0 compiles kernel with altivec Message-ID: <20050216090135.GA8524@krispykreme.ozlabs.ibm.com> Hi, A recent gcc 4.0 snapshot is using altivec instructions in the kernel: c0000000000d6434 <.wait_for_completion>: c0000000000d6434: 7c 00 42 a6 mfvrsave r0 Its probably because we are passing the -mcpu=970 option: ifeq ($(CONFIG_POWER4_ONLY),y) ifeq ($(CONFIG_ALTIVEC),y) CFLAGS += $(call cc-option,-mcpu=970) else Anton From ak at suse.de Wed Feb 16 20:58:19 2005 From: ak at suse.de (Andi Kleen) Date: Wed, 16 Feb 2005 10:58:19 +0100 Subject: [PATCH] 2.4: PPC64: 32 bit sys_recvmsg corruption In-Reply-To: <20050216140628.70232669.sfr@canb.auug.org.au> References: <20050216111146.524158ce.sfr@canb.auug.org.au> <20050216002841.GA8237@wotan.suse.de> <20050216140628.70232669.sfr@canb.auug.org.au> Message-ID: <20050216095818.GA14545@wotan.suse.de> On Wed, Feb 16, 2005 at 02:06:28PM +1100, Stephen Rothwell wrote: > On Wed, 16 Feb 2005 01:28:41 +0100 Andi Kleen wrote: > > > > > > Did you check if other 32bit emulations don't have the same problem? 
> > No, becasue I am a lazy bastard! :-) > > OK, everyone has this bug (I now checked). Sparc64 does some checking, > but not enough. Please fix it at least for x86-64 too. And I guess the other arch maintainers won't mind if you cover theirs too. -Andi From amodra at bigpond.net.au Wed Feb 16 22:00:22 2005 From: amodra at bigpond.net.au (Alan Modra) Date: Wed, 16 Feb 2005 21:30:22 +1030 Subject: gcc 4.0 compiles kernel with altivec In-Reply-To: <20050216090135.GA8524@krispykreme.ozlabs.ibm.com> References: <20050216090135.GA8524@krispykreme.ozlabs.ibm.com> Message-ID: <20050216110022.GP10128@bubble.modra.org> On Wed, Feb 16, 2005 at 08:01:35PM +1100, Anton Blanchard wrote: > A recent gcc 4.0 snapshot is using altivec instructions in the kernel: > > c0000000000d6434 <.wait_for_completion>: > c0000000000d6434: 7c 00 42 a6 mfvrsave r0 gcc-4.0 will use altivec for moving blocks of memory around. > Its probably because we are passing the -mcpu=970 option: > > ifeq ($(CONFIG_POWER4_ONLY),y) > ifeq ($(CONFIG_ALTIVEC),y) > CFLAGS += $(call cc-option,-mcpu=970) > else Yes, -mcpu=970 says altivec is available (and thus use it). In fact, the only difference between -mcpu=power4 and -mcpu=970 is enabling altivec. -- Alan Modra IBM OzLabs - Linux Technology Centre From schwidefsky at de.ibm.com Wed Feb 16 21:19:02 2005 From: schwidefsky at de.ibm.com (Martin Schwidefsky) Date: Wed, 16 Feb 2005 11:19:02 +0100 Subject: [PATCH] 2.4: PPC64: 32 bit sys_recvmsg corruption In-Reply-To: <20050216095818.GA14545@wotan.suse.de> Message-ID: Andi Kleen wrote on 16.02.2005 10:58:19: > On Wed, Feb 16, 2005 at 02:06:28PM +1100, Stephen Rothwell wrote: > > On Wed, 16 Feb 2005 01:28:41 +0100 Andi Kleen wrote: > > > > > > > > > Did you check if other 32bit emulations don't have the same problem? > > > > No, becasue I am a lazy bastard! :-) > > > > OK, everyone has this bug (I now checked). Sparc64 does some checking, > > but not enough. > > Please fix it at least for x86-64 too. 
And I guess the other arch > maintainers won't mind if you cover theirs too. I certainly won't complain ;-) blue skies, Martin Martin Schwidefsky Linux for zSeries Development & Services IBM Deutschland Entwicklung GmbH From anton at samba.org Thu Feb 17 07:50:47 2005 From: anton at samba.org (Anton Blanchard) Date: Thu, 17 Feb 2005 07:50:47 +1100 Subject: gcc 4.0 compiles kernel with altivec In-Reply-To: <20050216110022.GP10128@bubble.modra.org> References: <20050216090135.GA8524@krispykreme.ozlabs.ibm.com> <20050216110022.GP10128@bubble.modra.org> Message-ID: <20050216205047.GB8524@krispykreme.ozlabs.ibm.com> > Yes, -mcpu=970 says altivec is available (and thus use it). In fact, > the only difference between -mcpu=power4 and -mcpu=970 is enabling > altivec. I think it was added to make the raid6 kernel code (which does use altivec) easier. Perhaps we should use the .machine macros instead? Anton From benh at kernel.crashing.org Thu Feb 17 09:19:24 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 17 Feb 2005 09:19:24 +1100 Subject: gcc 4.0 compiles kernel with altivec In-Reply-To: <20050216205047.GB8524@krispykreme.ozlabs.ibm.com> References: <20050216090135.GA8524@krispykreme.ozlabs.ibm.com> <20050216110022.GP10128@bubble.modra.org> <20050216205047.GB8524@krispykreme.ozlabs.ibm.com> Message-ID: <1108592364.5532.8.camel@gaston> On Thu, 2005-02-17 at 07:50 +1100, Anton Blanchard wrote: > > Yes, -mcpu=970 says altivec is available (and thus use it). In fact, > > the only difference between -mcpu=power4 and -mcpu=970 is enabling > > altivec. > > I think it was added to make the raid6 kernel code (which does use > altivec) easier. Perhaps we should use the .machine macros instead? What are those macros ? Ben. 
From anton at samba.org Thu Feb 17 09:27:52 2005 From: anton at samba.org (Anton Blanchard) Date: Thu, 17 Feb 2005 09:27:52 +1100 Subject: gcc 4.0 compiles kernel with altivec In-Reply-To: <1108592364.5532.8.camel@gaston> References: <20050216090135.GA8524@krispykreme.ozlabs.ibm.com> <20050216110022.GP10128@bubble.modra.org> <20050216205047.GB8524@krispykreme.ozlabs.ibm.com> <1108592364.5532.8.camel@gaston> Message-ID: <20050216222752.GC8524@krispykreme.ozlabs.ibm.com> > What are those macros ? Like the stuff we are using in arch/ppc64/kernel/head.S: .machine push .machine "power4" mtcrf 0x80,r9 mtcrf 0x01,r9 /* slb_allocate uses cr0 and cr7 */ .machine pop Anton From benh at kernel.crashing.org Thu Feb 17 09:31:28 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 17 Feb 2005 09:31:28 +1100 Subject: gcc 4.0 compiles kernel with altivec In-Reply-To: <20050216222752.GC8524@krispykreme.ozlabs.ibm.com> References: <20050216090135.GA8524@krispykreme.ozlabs.ibm.com> <20050216110022.GP10128@bubble.modra.org> <20050216205047.GB8524@krispykreme.ozlabs.ibm.com> <1108592364.5532.8.camel@gaston> <20050216222752.GC8524@krispykreme.ozlabs.ibm.com> Message-ID: <1108593088.5532.16.camel@gaston> On Thu, 2005-02-17 at 09:27 +1100, Anton Blanchard wrote: > > What are those macros ? > > Like the stuff we are using in arch/ppc64/kernel/head.S: > > .machine push > .machine "power4" > mtcrf 0x80,r9 > mtcrf 0x01,r9 /* slb_allocate uses cr0 and cr7 */ > .machine pop But that forces us to use assembly instead of C code for those routines... I think gcc should separate options for "allow altivec" and "implicitly use altivec" ... Also, Alan, when using Altivec implicitly, does it properly fill vrsave with bits indicating which registers it uses and does it save & restore it ? I can imagine userland apps causing a severe hit on context switch time bcs they all start using altivec and cause the kernel to have to swap 32x128bits registers... 
I've been thinking about using vrsave to break the save/restore code into 2 or 4 parts and only save the ones that need to be saved. (It's an ABI thing anyway) Ben. From amodra at bigpond.net.au Thu Feb 17 14:27:07 2005 From: amodra at bigpond.net.au (Alan Modra) Date: Thu, 17 Feb 2005 13:57:07 +1030 Subject: gcc 4.0 compiles kernel with altivec In-Reply-To: <1108593088.5532.16.camel@gaston> References: <20050216090135.GA8524@krispykreme.ozlabs.ibm.com> <20050216110022.GP10128@bubble.modra.org> <20050216205047.GB8524@krispykreme.ozlabs.ibm.com> <1108592364.5532.8.camel@gaston> <20050216222752.GC8524@krispykreme.ozlabs.ibm.com> <1108593088.5532.16.camel@gaston> Message-ID: <20050217032706.GV10128@bubble.modra.org> On Thu, Feb 17, 2005 at 09:31:28AM +1100, Benjamin Herrenschmidt wrote: > I think gcc should separate options for "allow altivec" and "implicitly > use altivec" ... Well, it doesn't. :-( Perhaps use -mcpu=970 just for the file where you _want_ altivec. > Also, Alan, when using Altivec implicitly, does it properly fill vrsave > with bits indicating which registers it uses and does it save & restore > it ? Yes, that's part of our ABI. > I can imagine userland apps causing a severe hit on context switch time > bcs they all start using altivec and cause the kernel to have to swap > 32x128bits registers... I've been thinking about using vrsave to break > the save/restore code into 2 or 4 parts and only save the ones that need > to be saved. (It's an ABI thing anyway) Yes, you should definitely do this. 
-- Alan Modra IBM OzLabs - Linux Technology Centre From benh at kernel.crashing.org Thu Feb 17 14:33:14 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 17 Feb 2005 14:33:14 +1100 Subject: gcc 4.0 compiles kernel with altivec In-Reply-To: <20050217032706.GV10128@bubble.modra.org> References: <20050216090135.GA8524@krispykreme.ozlabs.ibm.com> <20050216110022.GP10128@bubble.modra.org> <20050216205047.GB8524@krispykreme.ozlabs.ibm.com> <1108592364.5532.8.camel@gaston> <20050216222752.GC8524@krispykreme.ozlabs.ibm.com> <1108593088.5532.16.camel@gaston> <20050217032706.GV10128@bubble.modra.org> Message-ID: <1108611194.5382.31.camel@gaston> On Thu, 2005-02-17 at 13:57 +1030, Alan Modra wrote: > On Thu, Feb 17, 2005 at 09:31:28AM +1100, Benjamin Herrenschmidt wrote: > > I think gcc should separate options for "allow altivec" and "implicitly > > use altivec" ... > > Well, it doesn't. :-( Perhaps use -mcpu=970 just for the file where > you _want_ altivec. Well, it's more a few functions than a file, but I agree we could split... (we must make sure the "wrapper" function that enables/disables the vector engine doesn't get generated with implicit vector instructions). > > Also, Alan, when using Altivec implicitly, does it properly fill vrsave > > with bits indicating which registers it uses and does it save & restore > > it ? > > Yes, that's part of our ABI. > > > I can imagine userland apps causing a severe hit on context switch time > > bcs they all start using altivec and cause the kernel to have to swap > > 32x128bits registers... I've been thinking about using vrsave to break > > the save/restore code into 2 or 4 parts and only save the ones that need > > to be saved. (It's an ABI thing anyway) > > Yes, you should definitely do this. Yup, will probably do when I have some time, I need to bench the whole stuff though. Ben. 
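[Editorial sketch, not part of the thread.] The idea discussed above, using VRSAVE to save only the parts of the vector register file that are actually live, can be illustrated in plain C. This is a minimal userspace sketch under stated assumptions: the helper names (vr_in_use, vr_groups_to_save, vr_save_count) are hypothetical and do not exist in the kernel, the group size is assumed to divide 32, and VRSAVE is assumed to follow the usual AltiVec convention in which the most significant bit stands for v0. It only counts what a grouped save would write; it is not the context-switch code itself.

```c
#include <stdint.h>

/*
 * VRSAVE uses big-endian bit numbering: the most significant bit
 * stands for v0 and the least significant bit for v31.
 */
static int vr_in_use(uint32_t vrsave, int vr)
{
	return (vrsave >> (31 - vr)) & 1;
}

/*
 * Hypothetical helper: split the 32 vector registers into groups of
 * group_size (which must divide 32) and return a bitmap in which
 * bit g is set when any register in group g is marked live.
 */
static uint32_t vr_groups_to_save(uint32_t vrsave, int group_size)
{
	uint32_t groups = 0;
	int ngroups = 32 / group_size;
	int g, i;

	for (g = 0; g < ngroups; g++) {
		for (i = 0; i < group_size; i++) {
			if (vr_in_use(vrsave, g * group_size + i)) {
				groups |= (uint32_t)1 << g;
				break;
			}
		}
	}
	return groups;
}

/* Number of registers a grouped save would actually write out. */
static int vr_save_count(uint32_t vrsave, int group_size)
{
	uint32_t groups = vr_groups_to_save(vrsave, group_size);
	int n = 0;

	while (groups) {
		n += groups & 1;
		groups >>= 1;
	}
	return n * group_size;
}
```

With an illustrative mask of 0xe800003f (ten live registers, clustered at both ends of the register file), vr_save_count() gives 16 for groups of 4 or 8 and 32 for groups of 16, so the group size is a trade-off between test overhead and registers saved needlessly.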
From segher at kernel.crashing.org Fri Feb 18 01:48:39 2005 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Thu, 17 Feb 2005 15:48:39 +0100 Subject: gcc 4.0 compiles kernel with altivec In-Reply-To: <1108611194.5382.31.camel@gaston> References: <20050216090135.GA8524@krispykreme.ozlabs.ibm.com> <20050216110022.GP10128@bubble.modra.org> <20050216205047.GB8524@krispykreme.ozlabs.ibm.com> <1108592364.5532.8.camel@gaston> <20050216222752.GC8524@krispykreme.ozlabs.ibm.com> <1108593088.5532.16.camel@gaston> <20050217032706.GV10128@bubble.modra.org> <1108611194.5382.31.camel@gaston> Message-ID: >>> I can imagine userland apps causing a severe hit on context switch >>> time >>> bcs they all start using altivec and cause the kernel to have to swap >>> 32x128bits registers... I've been thinking about using vrsave to >>> break >>> the save/restore code into 2 or 4 parts and only save the ones that >>> need >>> to be saved. (It's an ABI thing anyway) >> >> Yes, you should definitely do this.. > > Yup, will probably do when I have some time, I need to bench the whole > stuff though. Last time I benchmarked it (on 970), it was best to test per group of 4 to 8 registers (bigger was worse, smaller was worse). This was an artificial benchmark though, real usage patterns might skew the results a bit. Anyway, between 4 to 8 the graph was quite flat. 
Segher From benh at kernel.crashing.org Fri Feb 18 09:37:04 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 18 Feb 2005 09:37:04 +1100 Subject: gcc 4.0 compiles kernel with altivec In-Reply-To: References: <20050216090135.GA8524@krispykreme.ozlabs.ibm.com> <20050216110022.GP10128@bubble.modra.org> <20050216205047.GB8524@krispykreme.ozlabs.ibm.com> <1108592364.5532.8.camel@gaston> <20050216222752.GC8524@krispykreme.ozlabs.ibm.com> <1108593088.5532.16.camel@gaston> <20050217032706.GV10128@bubble.modra.org> <1108611194.5382.31.camel@gaston> Message-ID: <1108679824.5665.2.camel@gaston> On Thu, 2005-02-17 at 15:48 +0100, Segher Boessenkool wrote: > >>> I can imagine userland apps causing a severe hit on context switch > >>> time > >>> bcs they all start using altivec and cause the kernel to have to swap > >>> 32x128bits registers... I've been thinking about using vrsave to > >>> break > >>> the save/restore code into 2 or 4 parts and only save the ones that > >>> need > >>> to be saved. (It's an ABI thing anyway) > >> > >> Yes, you should definitely do this.. > > > > Yup, will probably do when I have some time, I need to bench the whole > > stuff though. > > Last time I benchmarked it (on 970), it was best to test per group of > 4 to 8 registers (bigger was worse, smaller was worse). This was an > artificial benchmark though, real usage patterns might skew the results > a bit. Anyway, between 4 to 8 the graph was quite flat. > Ah good ! I'm curious how many registers are used by common gcc constructs using altivec implicitly (and possibly altivec-optimized memcpy & co that I'll put in the vdso eventually). That will probably be the key threshold. Ben. 
From segher at kernel.crashing.org Fri Feb 18 23:59:18 2005 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Fri, 18 Feb 2005 13:59:18 +0100 Subject: gcc 4.0 compiles kernel with altivec In-Reply-To: <1108679824.5665.2.camel@gaston> References: <20050216090135.GA8524@krispykreme.ozlabs.ibm.com> <20050216110022.GP10128@bubble.modra.org> <20050216205047.GB8524@krispykreme.ozlabs.ibm.com> <1108592364.5532.8.camel@gaston> <20050216222752.GC8524@krispykreme.ozlabs.ibm.com> <1108593088.5532.16.camel@gaston> <20050217032706.GV10128@bubble.modra.org> <1108611194.5382.31.camel@gaston> <1108679824.5665.2.camel@gaston> Message-ID: >> Last time I benchmarked it (on 970), it was best to test per group of >> 4 to 8 registers (bigger was worse, smaller was worse). This was an >> artificial benchmark though, real usage patterns might skew the >> results >> a bit. Anyway, between 4 to 8 the graph was quite flat. >> > > Ah good ! > > I'm curious how many registers are used by common gcc constructs using > altivec implicitly (and possibly altivec-optimized memcpy & co that > I'll put in the vdso eventually). That will probably be the key > threshold. Yes, real usage is the key. Note though that the way GCC does its register allocation, you tend to end up with a clump of used vector registers at both ends of the register file, i.e., VRSAVE will always be something like 0xe800003f or such, so saving per group of, say, 4 registers, doesn't really make you save many more registers than needed. Segher From arnd at arndb.de Sat Feb 19 02:22:14 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 18 Feb 2005 16:22:14 +0100 Subject: [PATCH] ppc64: xmon needs to include hvcall.h Message-ID: <200502181622.14726.arnd@arndb.de> ppc64 needs this fix to build with xmon enabled and SMP disabled, because hvcall.h is only sometimes included indirectly at the moment. 
Signed-off-by: Arnd Bergmann --- 1.62/arch/ppc64/xmon/xmon.c 2005-01-25 16:50:14 -05:00 +++ edited/arch/ppc64/xmon/xmon.c 2005-02-18 10:30:56 -05:00 @@ -32,6 +32,7 @@ #include #include #include +#include #include "nonstdio.h" #include "privinst.h" From arnd at arndb.de Sat Feb 19 03:02:25 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 18 Feb 2005 17:02:25 +0100 Subject: [PATCH] ppc64: fix prom_init calculation for alloc_bottom Message-ID: <200502181702.25173.arnd@arndb.de> I haven't tested this, but (prom_initrd_start + prom_initrd_end) just sounds wrong. The code prior to the monster cleanup was using initrd_len instead of initrd_end in some places, perhaps this is a leftover of the conversion. Signed-off-by: Arnd Bergmann --- linux-2.6-ppc.orig/arch/ppc64/kernel/prom_init.c 2005-02-18 10:51:02.518955168 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/prom_init.c 2005-02-18 10:54:23.570006016 -0500 @@ -679,8 +679,7 @@ * point to after it */ if (RELOC(prom_initrd_start)) { - if ((RELOC(prom_initrd_start) + RELOC(prom_initrd_end)) - > RELOC(alloc_bottom)) + if (RELOC(prom_initrd_end) > RELOC(alloc_bottom)) RELOC(alloc_bottom) = PAGE_ALIGN(RELOC(prom_initrd_end)); } From ngupta at mvista.com Sat Feb 19 06:23:35 2005 From: ngupta at mvista.com (Nitin Gupta) Date: Fri, 18 Feb 2005 11:23:35 -0800 Subject: ppc64 biarch binutils configure uses different default "--target" than gcc,glibc Message-ID: <421640B7.5030304@mvista.com> Hi, I downloaded scripts to build ppc 970 biarch toolchain from ftp://ftp.linuxppc64.org/pub/people/janis. I noticed that only binutils uses default target (--target) to be powerpc-linux where as gcc,glibc uses powerpc64-linux. Is there any reason for that? BTW. 
binutils has --enable-targets=powerpc64-linux in its configure command Please, let me know, if should put this question at binutils mailing list. Thanks, Nitin -- MontaVista Linux- Value Wins From amodra at bigpond.net.au Sat Feb 19 15:38:03 2005 From: amodra at bigpond.net.au (Alan Modra) Date: Sat, 19 Feb 2005 15:08:03 +1030 Subject: ppc64 biarch binutils configure uses different default "--target" than gcc, glibc In-Reply-To: <421640B7.5030304@mvista.com> References: <421640B7.5030304@mvista.com> Message-ID: <20050219043803.GA3780@bubble.modra.org> On Fri, Feb 18, 2005 at 11:23:35AM -0800, Nitin Gupta wrote: > Hi, > I downloaded scripts to build ppc 970 biarch toolchain from > ftp://ftp.linuxppc64.org/pub/people/janis. I noticed that only binutils > uses default target (--target) to be powerpc-linux where as gcc,glibc > uses powerpc64-linux. Is there any reason for that? BTW. binutils has > --enable-targets=powerpc64-linux in its configure command > > Please, let me know, if should put this question at binutils mailing list. GCC biarch support wasn't designed well. The 64-bit target of a pair, eg. (powerpc64-linux, powerpc-linux), (x86_64-linux, i686-linux) is the one you need to select when configuring, but you often don't want this to be the default target. ie. you want 32-bit output by default. On powerpc-linux, you'd do this by specifying --with-cpu=default32. binutils is a little more straight-forward, with the default target being specified from --target, and other targets by --enable-targets. -- Alan Modra IBM OzLabs - Linux Technology Centre From ngupta at mvista.com Sat Feb 19 18:12:58 2005 From: ngupta at mvista.com (Nitin Kishore Gupta) Date: Fri, 18 Feb 2005 23:12:58 -0800 Subject: ppc64 biarch binutils configure uses different default "--target" than gcc, glibc References: <421640B7.5030304@mvista.com> <20050219043803.GA3780@bubble.modra.org> Message-ID: <002c01c51652$72d295a0$0200a8c0@OLZA> Thanks Alan. 
I dont see that gcc,glibc are using --with-cpu=default32 in the scripts provided. In that case, maybe I can switch the default for binutils to be powerpc64 and enable powerpc. ----- Original Message ----- From: "Alan Modra" To: "Nitin Gupta" Cc: Sent: Friday, February 18, 2005 8:38 PM Subject: Re: ppc64 biarch binutils configure uses different default "--target" than gcc,glibc > On Fri, Feb 18, 2005 at 11:23:35AM -0800, Nitin Gupta wrote: >> Hi, >> I downloaded scripts to build ppc 970 biarch toolchain from >> ftp://ftp.linuxppc64.org/pub/people/janis. I noticed that only binutils >> uses default target (--target) to be powerpc-linux where as gcc,glibc >> uses powerpc64-linux. Is there any reason for that? BTW. binutils has >> --enable-targets=powerpc64-linux in its configure command >> >> Please, let me know, if should put this question at binutils mailing >> list. > > GCC biarch support wasn't designed well. The 64-bit target of a pair, > eg. (powerpc64-linux, powerpc-linux), (x86_64-linux, i686-linux) is the > one you need to select when configuring, but you often don't want this > to be the default target. ie. you want 32-bit output by default. On > powerpc-linux, you'd do this by specifying --with-cpu=default32. > > binutils is a little more straight-forward, with the default target > being specified from --target, and other targets by --enable-targets. 
> > -- > Alan Modra > IBM OzLabs - Linux Technology Centre From amodra at bigpond.net.au Sat Feb 19 19:18:45 2005 From: amodra at bigpond.net.au (Alan Modra) Date: Sat, 19 Feb 2005 18:48:45 +1030 Subject: ppc64 biarch binutils configure uses different default "--target" than gcc, glibc In-Reply-To: <002c01c51652$72d295a0$0200a8c0@OLZA> References: <421640B7.5030304@mvista.com> <20050219043803.GA3780@bubble.modra.org> <002c01c51652$72d295a0$0200a8c0@OLZA> Message-ID: <20050219081845.GC3780@bubble.modra.org> On Fri, Feb 18, 2005 at 11:12:58PM -0800, Nitin Kishore Gupta wrote: > In that case, maybe I can switch the default for binutils to be powerpc64 > and enable powerpc. You can, but be aware that if you are compiling for a native powerpc system you normally want as, ld, etc. to default to 32-bit. It's not so important when building cross toolchains, since you will be installing powerpc64-linux-as, powerpc64-linux-ld etc. Also, building a biarch toolchain, especially from scratch, isn't as simple as it could be. Don't be surprised when things don't work if you deviate from Janis' build scripts. In fact, don't be surprised if things don't work when you follow the script religiously. -- Alan Modra IBM OzLabs - Linux Technology Centre From anton at samba.org Sun Feb 20 07:43:00 2005 From: anton at samba.org (Anton Blanchard) Date: Sun, 20 Feb 2005 07:43:00 +1100 Subject: [PATCH] rtasd cleanup In-Reply-To: <20050219202846.GA32279@otto> References: <20050219202846.GA32279@otto> Message-ID: <20050219204300.GA19279@krispykreme.ozlabs.ibm.com> Hi Nathan, > Make rtasd stop (ab)using lock_cpu_hotplug and the online map. Use a > lazier method: let do_event_scan tell rtasd on which cpu the scan was > done; rtasd then clears that cpu in its allowed map and migrates. > This reduces boot time by one second per cpu when > CONFIG_HOTPLUG_CPU=y. (Sleeping for one second on each cpu while > holding the cpucontrol semaphore is not very polite.) 
We used to migrate to random cpus each iteration but we got watchdog timeouts on a 16 way nighthawk. The theory was that the worst case touch of the watchdog on a cpu could be 2x the scan rate. I didnt confirm this with the firmware guys but changing it to the current method where we always iterate in the same order made the watchdogs go away. Anton From ntl at pobox.com Sun Feb 20 07:28:46 2005 From: ntl at pobox.com (Nathan Lynch) Date: Sat, 19 Feb 2005 14:28:46 -0600 Subject: [PATCH] rtasd cleanup Message-ID: <20050219202846.GA32279@otto> Make rtasd stop (ab)using lock_cpu_hotplug and the online map. Use a lazier method: let do_event_scan tell rtasd on which cpu the scan was done; rtasd then clears that cpu in its allowed map and migrates. This reduces boot time by one second per cpu when CONFIG_HOTPLUG_CPU=y. (Sleeping for one second on each cpu while holding the cpucontrol semaphore is not very polite.) Stop looking up the "event-scan" token everywhere and just stash it in a file-static variable. Move a bunch of startup code from rtasd to rtas_init so it winds up in the init text section. Replace kernel_thread/daemonize with kthread_create. 
rtasd.c | 193 +++++++++++++++++++++++++++++--------------------------- 1 files changed, 101 insertions(+), 92 deletions(-) Signed-off-by: Nathan Lynch Index: linux-2.6.11-rc4-bk4/arch/ppc64/kernel/rtasd.c =================================================================== --- linux-2.6.11-rc4-bk4.orig/arch/ppc64/kernel/rtasd.c 2005-02-16 21:29:36.000000000 +0000 +++ linux-2.6.11-rc4-bk4/arch/ppc64/kernel/rtasd.c 2005-02-17 03:09:30.000000000 +0000 @@ -18,7 +18,8 @@ #include #include #include -#include +#include +#include #include #include @@ -43,6 +44,7 @@ static unsigned long rtas_log_size; static int surveillance_timeout = -1; +static int rtas_event_scan = -1; /* "event-scan" token */ static unsigned int rtas_event_scan_rate; static unsigned int rtas_error_log_max; static unsigned int rtas_error_log_buffer_max; @@ -359,95 +361,100 @@ static int get_eventscan_parms(void) { struct device_node *node; - int *ip; + int *ip, err = -1; node = of_find_node_by_path("/rtas"); + if (!node) + goto out; - ip = (int *)get_property(node, "rtas-event-scan-rate", NULL); - if (ip == NULL) { - printk(KERN_ERR "rtasd: no rtas-event-scan-rate\n"); - of_node_put(node); - return -1; - } + if (!(ip = (int *)get_property(node, "event-scan", NULL))) + goto out; + rtas_event_scan = *ip; + + if (!(ip = (int *)get_property(node, "rtas-event-scan-rate", NULL))) + goto out; rtas_event_scan_rate = *ip; - DEBUG("rtas-event-scan-rate %d\n", rtas_event_scan_rate); /* Make room for the sequence number */ rtas_error_log_max = rtas_get_error_log_max(); rtas_error_log_buffer_max = rtas_error_log_max + sizeof(int); + err = 0; +out: of_node_put(node); - - return 0; + return err; } -static void do_event_scan(int event_scan) +/** + * do_event_scan - execute an event scan and log any errors returned + * + * Returns the cpu on which the event scan was run. This is so rtasd + * can remove that cpu from its allowed map to ensure that it gets to + * run on a different cpu next time. 
Architecture docs recommend that + * event scans be done periodically from every cpu in the system. + */ +static int do_event_scan(void) { - int error; - do { + int cpu = NR_CPUS; + + while (1) { + int error; + memset(logdata, 0, rtas_error_log_max); - error = rtas_call(event_scan, 4, 1, NULL, + + /* Get the cpu only once */ + if (cpu == NR_CPUS) + cpu = get_cpu(); + + error = rtas_call(rtas_event_scan, 4, 1, NULL, RTAS_EVENT_SCAN_ALL_EVENTS, 0, __pa(logdata), rtas_error_log_max); - if (error == -1) { - printk(KERN_ERR "event-scan failed\n"); - break; - } - if (error == 0) + if (likely(error == 1)) + /* No errors found */ + break; + else if (error == 0) { + /* Must log and scan again */ pSeries_log_error(logdata, ERR_TYPE_RTAS_LOG, 0); - - } while(error == 0); + continue; + } else { + /* Hardware error? */ + printk(KERN_ERR "event-scan failed (rc = %d)\n", + error); + break; + } + } + put_cpu(); + return cpu; } -static int rtasd(void *unused) +/** + * do_event_scan_all_cpus - do an RTAS event scan on every cpu + * + * @delay: time to sleep between scans + * + * Run an event scan on every cpu the scheduler will allow; return + * when there aren't any good cpus left. 
+ */ +static void do_event_scan_all_cpus(long delay) { - unsigned int err_type; - int cpu = 0; - int event_scan = rtas_token("event-scan"); - int rc; + cpumask_t allowed = CPU_MASK_ALL; - daemonize("rtasd"); - - if (event_scan == RTAS_UNKNOWN_SERVICE || get_eventscan_parms() == -1) - goto error; - - rtas_log_buf = vmalloc(rtas_error_log_buffer_max*LOG_NUMBER); - if (!rtas_log_buf) { - printk(KERN_ERR "rtasd: no memory\n"); - goto error; + while (0 == set_cpus_allowed(current, allowed)) { + int cpu = do_event_scan(); + DEBUG("rtasd: completed event scan on cpu %i\n", cpu); + cpu_clear(cpu, allowed); + set_current_state(TASK_INTERRUPTIBLE); + schedule_timeout(delay); } +} - printk(KERN_ERR "RTAS daemon started\n"); - - DEBUG("will sleep for %d jiffies\n", (HZ*60/rtas_event_scan_rate) / 2); - - /* See if we have any error stored in NVRAM */ - memset(logdata, 0, rtas_error_log_max); - - rc = nvram_read_error_log(logdata, rtas_error_log_max, &err_type); - - /* We can use rtas_log_buf now */ - no_logging = 0; - - if (!rc) { - if (err_type != ERR_FLAG_ALREADY_LOGGED) { - pSeries_log_error(logdata, err_type | ERR_FLAG_BOOT, 0); - } - } +static int rtasd(void *unused) +{ + printk(KERN_INFO "RTAS daemon started\n"); /* First pass. 
*/ - lock_cpu_hotplug(); - for_each_online_cpu(cpu) { - DEBUG("scheduling on %d\n", cpu); - set_cpus_allowed(current, cpumask_of_cpu(cpu)); - DEBUG("watchdog scheduled on cpu %d\n", smp_processor_id()); - - do_event_scan(event_scan); - set_current_state(TASK_INTERRUPTIBLE); - schedule_timeout(HZ); - } - unlock_cpu_hotplug(); + do_event_scan_all_cpus(HZ); if (surveillance_timeout != -1) { DEBUG("enabling surveillance\n"); @@ -455,51 +462,53 @@ DEBUG("surveillance enabled\n"); } - lock_cpu_hotplug(); - cpu = first_cpu(cpu_online_map); - for (;;) { - set_cpus_allowed(current, cpumask_of_cpu(cpu)); - do_event_scan(event_scan); - set_cpus_allowed(current, CPU_MASK_ALL); - - /* Drop hotplug lock, and sleep for a bit (at least - * one second since some machines have problems if we - * call event-scan too quickly). */ - unlock_cpu_hotplug(); - set_current_state(TASK_INTERRUPTIBLE); - schedule_timeout((HZ*60/rtas_event_scan_rate) / 2); - lock_cpu_hotplug(); - - cpu = next_cpu(cpu, cpu_online_map); - if (cpu == NR_CPUS) - cpu = first_cpu(cpu_online_map); - } - -error: - /* Should delete proc entries */ - return -EINVAL; + while (1) + do_event_scan_all_cpus((HZ*60/rtas_event_scan_rate) / 2); } static int __init rtas_init(void) { struct proc_dir_entry *entry; + struct task_struct *p; + unsigned int err_type; + int rc; /* No RTAS, only warn if we are on a pSeries box */ - if (rtas_token("event-scan") == RTAS_UNKNOWN_SERVICE) { + if (get_eventscan_parms() == -1) { if (systemcfg->platform & PLATFORM_PSERIES) - printk(KERN_ERR "rtasd: no event-scan on system\n"); + printk(KERN_ERR "no RTAS event-scan on system\n"); + return 1; + } + + rtas_log_buf = vmalloc(rtas_error_log_buffer_max*LOG_NUMBER); + if (!rtas_log_buf) { + printk(KERN_ERR "%s: no memory\n", __FUNCTION__); return 1; } + /* See if we have any error stored in NVRAM */ + memset(logdata, 0, rtas_error_log_max); + + rc = nvram_read_error_log(logdata, rtas_error_log_max, &err_type); + + /* We can use rtas_log_buf now */ + 
no_logging = 0; + + if (!rc && err_type != ERR_FLAG_ALREADY_LOGGED) + pSeries_log_error(logdata, err_type | ERR_FLAG_BOOT, 0); + entry = create_proc_entry("ppc64/rtas/error_log", S_IRUSR, NULL); if (entry) entry->proc_fops = &proc_rtas_log_operations; else printk(KERN_ERR "Failed to create error_log proc entry\n"); - if (kernel_thread(rtasd, NULL, CLONE_FS) < 0) - printk(KERN_ERR "Failed to start RTAS daemon\n"); - + p = kthread_create(rtasd, NULL, "rtasd"); + if (IS_ERR(p)) { + printk(KERN_ERR "%s: rtasd creation failed\n", __FUNCTION__); + return 1; + } + wake_up_process(p); return 0; } From ntl at pobox.com Sun Feb 20 14:04:25 2005 From: ntl at pobox.com (Nathan Lynch) Date: Sat, 19 Feb 2005 21:04:25 -0600 Subject: [PATCH] rtasd shouldn't hold cpucontrol while sleeping In-Reply-To: <20050219204300.GA19279@krispykreme.ozlabs.ibm.com> References: <20050219202846.GA32279@otto> <20050219204300.GA19279@krispykreme.ozlabs.ibm.com> Message-ID: <20050220030425.GB32279@otto> On Sun, Feb 20, 2005 at 07:43:00AM +1100, Anton Blanchard wrote: > > We used to migrate to random cpus each iteration but we got watchdog > timeouts on a 16 way nighthawk. The theory was that the worst case touch > of the watchdog on a cpu could be 2x the scan rate. > > I didnt confirm this with the firmware guys but changing it to the > current method where we always iterate in the same order made the > watchdogs go away. OK, here's a more conservative patch which preserves that behavior. My primary goal is to get rid of that delay during boot. The rtasd thread should not hold the cpucontrol semaphore while sleeping between event scans in its first pass; it needlessly delays boot by one second per cpu when CONFIG_HOTPLUG_CPU=y. 
Signed-off-by: Nathan Lynch Index: linux-2.6.11-rc4-bk4/arch/ppc64/kernel/rtasd.c =================================================================== --- linux-2.6.11-rc4-bk4.orig/arch/ppc64/kernel/rtasd.c 2005-02-20 02:08:49.000000000 +0000 +++ linux-2.6.11-rc4-bk4/arch/ppc64/kernel/rtasd.c 2005-02-20 02:28:07.000000000 +0000 @@ -400,10 +400,33 @@ } while(error == 0); } +static void do_event_scan_all_cpus(long delay) +{ + int cpu; + + lock_cpu_hotplug(); + cpu = first_cpu(cpu_online_map); + for (;;) { + set_cpus_allowed(current, cpumask_of_cpu(cpu)); + do_event_scan(rtas_token("event-scan")); + set_cpus_allowed(current, CPU_MASK_ALL); + + /* Drop hotplug lock, and sleep for the specified delay */ + unlock_cpu_hotplug(); + set_current_state(TASK_INTERRUPTIBLE); + schedule_timeout(delay); + lock_cpu_hotplug(); + + cpu = next_cpu(cpu, cpu_online_map); + if (cpu == NR_CPUS) + break; + } + unlock_cpu_hotplug(); +} + static int rtasd(void *unused) { unsigned int err_type; - int cpu = 0; int event_scan = rtas_token("event-scan"); int rc; @@ -437,17 +460,7 @@ } /* First pass. */ - lock_cpu_hotplug(); - for_each_online_cpu(cpu) { - DEBUG("scheduling on %d\n", cpu); - set_cpus_allowed(current, cpumask_of_cpu(cpu)); - DEBUG("watchdog scheduled on cpu %d\n", smp_processor_id()); - - do_event_scan(event_scan); - set_current_state(TASK_INTERRUPTIBLE); - schedule_timeout(HZ); - } - unlock_cpu_hotplug(); + do_event_scan_all_cpus(HZ); if (surveillance_timeout != -1) { DEBUG("enabling surveillance\n"); @@ -455,25 +468,10 @@ DEBUG("surveillance enabled\n"); } - lock_cpu_hotplug(); - cpu = first_cpu(cpu_online_map); - for (;;) { - set_cpus_allowed(current, cpumask_of_cpu(cpu)); - do_event_scan(event_scan); - set_cpus_allowed(current, CPU_MASK_ALL); - - /* Drop hotplug lock, and sleep for a bit (at least - * one second since some machines have problems if we - * call event-scan too quickly). 
*/ - unlock_cpu_hotplug(); - set_current_state(TASK_INTERRUPTIBLE); - schedule_timeout((HZ*60/rtas_event_scan_rate) / 2); - lock_cpu_hotplug(); - - cpu = next_cpu(cpu, cpu_online_map); - if (cpu == NR_CPUS) - cpu = first_cpu(cpu_online_map); - } + /* Delay should be at least one second since some + * machines have problems if we call event-scan too + * quickly. */ + do_event_scan_all_cpus((HZ*60/rtas_event_scan_rate) / 2); error: /* Should delete proc entries */ From ntl at pobox.com Sun Feb 20 14:11:58 2005 From: ntl at pobox.com (Nathan Lynch) Date: Sat, 19 Feb 2005 21:11:58 -0600 Subject: [PATCH] rtasd shouldn't hold cpucontrol while sleeping In-Reply-To: <20050220030425.GB32279@otto> References: <20050219202846.GA32279@otto> <20050219204300.GA19279@krispykreme.ozlabs.ibm.com> <20050220030425.GB32279@otto> Message-ID: <20050220031158.GC32279@otto> On Sat, Feb 19, 2005 at 09:04:25PM -0600, Nathan Lynch wrote: > OK, here's a more conservative patch which preserves that behavior. > My primary goal is to get rid of that delay during boot. > + /* Delay should be at least one second since some > + * machines have problems if we call event-scan too > + * quickly. */ > + do_event_scan_all_cpus((HZ*60/rtas_event_scan_rate) / 2); Whoops, that needs to be called more than just once :) Will send a fixed patch tomorrow... Nathan From anton at samba.org Mon Feb 21 11:16:16 2005 From: anton at samba.org (Anton Blanchard) Date: Mon, 21 Feb 2005 11:16:16 +1100 Subject: [PATCH] ppc64: Fix 32bit largepage issue Message-ID: <20050221001616.GA24587@krispykreme.ozlabs.ibm.com> Hi, The paca holds a shadow of the context struct, used for the real mode SLB handler. When we open up a new segment we have to sync up the paca copy otherwise we will instantiate small page SLB entries until the next context switch (at which point we resync the paca copy). 
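A rough userspace illustration of the shadow-copy problem described above may help. This is only a sketch: `demo_context`, `demo_mm`, `demo_paca` and both functions are hypothetical stand-ins, not the kernel's structures or the real SLB miss path. It assumes the miss handler consults only the per-CPU shadow, so a segment opened in the mm stays invisible to it until the shadow is resynced.

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-ins; not the kernel's definitions. */
struct demo_context {
    unsigned long id;
    unsigned short htlb_segs;   /* bitmask of low segments using large pages */
};

struct demo_mm   { struct demo_context context; };
struct demo_paca { struct demo_context context; };  /* per-CPU shadow copy */

/* What a miss handler that reads only the shadow copy would decide. */
int demo_segment_is_hugepage(const struct demo_paca *paca, unsigned int seg)
{
    assert(seg < 16);   /* only 16 low segments fit in the bitmask */
    return (paca->context.htlb_segs >> seg) & 1;
}

/* Open new large-page segments, then resync the shadow immediately. */
void demo_open_low_hpage_segs(struct demo_mm *mm, struct demo_paca *paca,
                              unsigned short newsegs)
{
    mm->context.htlb_segs |= newsegs;

    /* Without this copy the handler keeps seeing the old mask. */
    paca->context = mm->context;
}
```

If the mm is updated without the final assignment, `demo_segment_is_hugepage()` keeps answering from the stale mask, which mirrors the small-page SLB entries being instantiated until the next context switch.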
Anton Signed-off-by: Anton Blanchard diff -puN arch/ppc64/mm/hugetlbpage.c~fix_32bit_largepage arch/ppc64/mm/hugetlbpage.c --- gr_work/arch/ppc64/mm/hugetlbpage.c~fix_32bit_largepage 2005-02-20 17:32:00.795862566 -0600 +++ gr_work-anton/arch/ppc64/mm/hugetlbpage.c 2005-02-20 17:32:00.806855265 -0600 @@ -264,6 +264,10 @@ static int open_low_hpage_segs(struct mm return -EBUSY; mm->context.htlb_segs |= newsegs; + + /* update the paca copy of the context struct */ + get_paca()->context = mm->context; + /* the context change must make it to memory before the flush, * so that further SLB misses do the right thing. */ mb(); _ From david at gibson.dropbear.id.au Mon Feb 21 11:30:59 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Mon, 21 Feb 2005 11:30:59 +1100 Subject: [PATCH] ppc64: Fix 32bit largepage issue In-Reply-To: <20050221001616.GA24587@krispykreme.ozlabs.ibm.com> References: <20050221001616.GA24587@krispykreme.ozlabs.ibm.com> Message-ID: <20050221003059.GC10688@localhost.localdomain> On Mon, Feb 21, 2005 at 11:16:16AM +1100, Anton Blanchard wrote: > > Hi, > > The paca holds a shadow of the context struct, used for the real mode > SLB handler. When we open up a new segment we have to sync up the paca > copy otherwise we will instantiate small page SLB entries until the > next context switch (at which point we resync the paca copy). Oops, good catch. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! 
http://www.ozlabs.org/people/dgibson From sfr at canb.auug.org.au Mon Feb 21 14:35:55 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Mon, 21 Feb 2005 14:35:55 +1100 Subject: [PATCH] 2.4: PPC64: 32 bit sys_recvmsg corruption In-Reply-To: <20050216172259.1dee3b39.sfr@canb.auug.org.au> References: <20050216111146.524158ce.sfr@canb.auug.org.au> <20050216002841.GA8237@wotan.suse.de> <20050216140628.70232669.sfr@canb.auug.org.au> <20050216172259.1dee3b39.sfr@canb.auug.org.au> Message-ID: <20050221143555.3d969f24.sfr@canb.auug.org.au> Hi Marcelo, On Wed, 16 Feb 2005 17:22:59 +1100 Stephen Rothwell wrote: > > In the presence of threads, there is a possibility of the kernel being > fooled by the 32 bit sys_recvmsg control data into copying more than it > should into the kernel and corrupting kernel data structures. Any chance of this making 2.4.30? If so, what needs to happen? -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ From ak at suse.de Mon Feb 21 22:27:46 2005 From: ak at suse.de (Andi Kleen) Date: Mon, 21 Feb 2005 12:27:46 +0100 Subject: [PATCH] 2.4: PPC64: 32 bit sys_recvmsg corruption In-Reply-To: <20050221143555.3d969f24.sfr@canb.auug.org.au> References: <20050216111146.524158ce.sfr@canb.auug.org.au> <20050216002841.GA8237@wotan.suse.de> <20050216140628.70232669.sfr@canb.auug.org.au> <20050216172259.1dee3b39.sfr@canb.auug.org.au> <20050221143555.3d969f24.sfr@canb.auug.org.au> Message-ID: <20050221112746.GB17667@wotan.suse.de> On Mon, Feb 21, 2005 at 02:35:55PM +1100, Stephen Rothwell wrote: > Hi Marcelo, > > On Wed, 16 Feb 2005 17:22:59 +1100 Stephen Rothwell wrote: > > > > In the presence of threads, there is a possibility of the kernel being > > fooled by the 32 bit sys_recvmsg control data into copying more than it > > should into the kernel and corrupting kernel data structures. > > Any chance of this making 2.4.30? If so, what needs to happen? It would be a good idea to take the printk out first. 
-Andi From ntl at pobox.com Tue Feb 22 09:09:40 2005 From: ntl at pobox.com (Nathan Lynch) Date: Mon, 21 Feb 2005 16:09:40 -0600 Subject: [PATCH] rtasd shouldn't hold cpucontrol while sleeping (2nd try) In-Reply-To: <20050220030425.GB32279@otto> References: <20050219202846.GA32279@otto> <20050219204300.GA19279@krispykreme.ozlabs.ibm.com> <20050220030425.GB32279@otto> Message-ID: <20050221220940.GD32279@otto> Ok, trying again. Fixed up the lack of a 'for (;;)' in the last patch. The rtasd thread should not hold the cpucontrol semaphore while sleeping between event scans in its first pass; it needlessly delays boot by one second per cpu when CONFIG_HOTPLUG_CPU=y. Signed-off-by: Nathan Lynch Index: linux-2.6.11-rc4-bk7/arch/ppc64/kernel/rtasd.c =================================================================== --- linux-2.6.11-rc4-bk7.orig/arch/ppc64/kernel/rtasd.c 2005-02-20 02:08:49.000000000 +0000 +++ linux-2.6.11-rc4-bk7/arch/ppc64/kernel/rtasd.c 2005-02-20 18:13:54.000000000 +0000 @@ -400,10 +400,33 @@ } while(error == 0); } +static void do_event_scan_all_cpus(long delay) +{ + int cpu; + + lock_cpu_hotplug(); + cpu = first_cpu(cpu_online_map); + for (;;) { + set_cpus_allowed(current, cpumask_of_cpu(cpu)); + do_event_scan(rtas_token("event-scan")); + set_cpus_allowed(current, CPU_MASK_ALL); + + /* Drop hotplug lock, and sleep for the specified delay */ + unlock_cpu_hotplug(); + set_current_state(TASK_INTERRUPTIBLE); + schedule_timeout(delay); + lock_cpu_hotplug(); + + cpu = next_cpu(cpu, cpu_online_map); + if (cpu == NR_CPUS) + break; + } + unlock_cpu_hotplug(); +} + static int rtasd(void *unused) { unsigned int err_type; - int cpu = 0; int event_scan = rtas_token("event-scan"); int rc; @@ -437,17 +460,7 @@ } /* First pass. 
*/ - lock_cpu_hotplug(); - for_each_online_cpu(cpu) { - DEBUG("scheduling on %d\n", cpu); - set_cpus_allowed(current, cpumask_of_cpu(cpu)); - DEBUG("watchdog scheduled on cpu %d\n", smp_processor_id()); - - do_event_scan(event_scan); - set_current_state(TASK_INTERRUPTIBLE); - schedule_timeout(HZ); - } - unlock_cpu_hotplug(); + do_event_scan_all_cpus(HZ); if (surveillance_timeout != -1) { DEBUG("enabling surveillance\n"); @@ -455,25 +468,11 @@ DEBUG("surveillance enabled\n"); } - lock_cpu_hotplug(); - cpu = first_cpu(cpu_online_map); - for (;;) { - set_cpus_allowed(current, cpumask_of_cpu(cpu)); - do_event_scan(event_scan); - set_cpus_allowed(current, CPU_MASK_ALL); - - /* Drop hotplug lock, and sleep for a bit (at least - * one second since some machines have problems if we - * call event-scan too quickly). */ - unlock_cpu_hotplug(); - set_current_state(TASK_INTERRUPTIBLE); - schedule_timeout((HZ*60/rtas_event_scan_rate) / 2); - lock_cpu_hotplug(); - - cpu = next_cpu(cpu, cpu_online_map); - if (cpu == NR_CPUS) - cpu = first_cpu(cpu_online_map); - } + /* Delay should be at least one second since some + * machines have problems if we call event-scan too + * quickly. */ + for (;;) + do_event_scan_all_cpus((HZ*60/rtas_event_scan_rate) / 2); error: /* Should delete proc entries */ From benh at kernel.crashing.org Tue Feb 22 10:17:26 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 22 Feb 2005 10:17:26 +1100 Subject: [PATCH] ppc64: Fix 32bit largepage issue In-Reply-To: <20050221003059.GC10688@localhost.localdomain> References: <20050221001616.GA24587@krispykreme.ozlabs.ibm.com> <20050221003059.GC10688@localhost.localdomain> Message-ID: <1109027846.5412.51.camel@gaston> On Mon, 2005-02-21 at 11:30 +1100, David Gibson wrote: > On Mon, Feb 21, 2005 at 11:16:16AM +1100, Anton Blanchard wrote: > > > > Hi, > > > > The paca holds a shadow of the context struct, used for the real mode > > SLB handler. 
When we open up a new segment we have to sync up the paca > > copy otherwise we will instantiate small page SLB entries until the > > next context switch (at which point we resync the paca copy). > > Oops, good catch. Andrew, might be worth sticking in 2.6.11 ... Ben. From sfr at canb.auug.org.au Tue Feb 22 12:16:27 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 22 Feb 2005 12:16:27 +1100 Subject: [BUG][PATCH] 2.4: PPC64: 32 bit sys_recvmsg corruption In-Reply-To: <20050221112746.GB17667@wotan.suse.de> References: <20050216111146.524158ce.sfr@canb.auug.org.au> <20050216002841.GA8237@wotan.suse.de> <20050216140628.70232669.sfr@canb.auug.org.au> <20050216172259.1dee3b39.sfr@canb.auug.org.au> <20050221143555.3d969f24.sfr@canb.auug.org.au> <20050221112746.GB17667@wotan.suse.de> Message-ID: <20050222121627.26374d83.sfr@canb.auug.org.au> Hi Marcelo, [New version with no printk and a bug fixed that no one noticed :-)] In the presence of threads, there is a possibility of the kernel being fooled by the 32 bit sys_recvmsg control data into copying more than it should into the kernel and corrupting kernel data structures. We call the 64 bit version of sys_recvmsg, which writes control messages directly to user memory; we then read them back and "fix up" the differences between 32 and 64 bit structures. If two threads share the buffer that we are writing into (and then reading from), it is possible for the control message headers to be changed from what we expect. One of the header fields is the length we need to copy back into the kernel ... This patch just does some more length checking. This bug was actually being hit by BIND running at a customer site. It is very hard to hit, but (obviously) possible. Signed-off-by: Stephen Rothwell Please consider for inclusion into 2.4.30. Only the ppc64 part of this patch has been compiled and tested. I have applied the same fix to all the 64 bit archs with 32 bit compatibility.
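For readers who want the idea of the length check in isolation, here is a minimal userspace sketch. It is not the kernel code: `demo_cmsg_hdr` and `demo_copy_payload` are hypothetical names, and the upper bound used here (bytes remaining in the buffer) is a simplifying assumption standing in for the `orig_cmsg_len + wp - workbuf` bound in the patch. The point it illustrates is the same: a length field read back from memory that another thread can modify must be range-checked before it is used as a copy size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified header; not the kernel's struct cmsghdr. */
struct demo_cmsg_hdr {
    size_t len;     /* total message length, header included */
    int level;
    int type;
};

/*
 * Copy one message payload out of a buffer that may have been
 * modified behind our back. 'remaining' is the number of bytes left
 * in the buffer starting at 'msg'.
 * Returns the payload size copied, or -1 if the length is implausible.
 */
long demo_copy_payload(void *dst, size_t dst_size,
                       const unsigned char *msg, size_t remaining)
{
    struct demo_cmsg_hdr hdr;
    size_t payload;

    assert(dst != NULL && msg != NULL);

    if (remaining < sizeof(hdr))
        return -1;

    /* Snapshot the header once, so the checks and the copy agree. */
    memcpy(&hdr, msg, sizeof(hdr));

    /* Same shape as the patch: reject lengths that are too small
     * to hold a header or too large for the space we were given. */
    if (hdr.len < sizeof(hdr) || hdr.len > remaining)
        return -1;

    payload = hdr.len - sizeof(hdr);
    if (payload > dst_size)
        return -1;

    memcpy(dst, msg + sizeof(hdr), payload);
    return (long)payload;
}
```

Without the lower bound, the subtraction `hdr.len - sizeof(hdr)` would wrap to a very large unsigned value, which is the same hazard as the unchecked `clen64 - CMSG_ALIGN(sizeof(*ucmsg))` in the copy_from_user() call.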
-- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruNp 2.4.30-pre1/arch/ia64/ia32/sys_ia32.c 2.4.30-pre1-sfr.3/arch/ia64/ia32/sys_ia32.c --- 2.4.30-pre1/arch/ia64/ia32/sys_ia32.c 2005-02-16 10:57:03.000000000 +1100 +++ 2.4.30-pre1-sfr.3/arch/ia64/ia32/sys_ia32.c 2005-02-22 11:58:42.000000000 +1100 @@ -1649,7 +1649,8 @@ scm_detach_fds32 (struct msghdr *kmsg, s * IPV6_AUTHHDR ipv6 auth exthdr 32-bit clean */ static void -cmsg32_recvmsg_fixup (struct msghdr *kmsg, unsigned long orig_cmsg_uptr) +cmsg32_recvmsg_fixup (struct msghdr *kmsg, unsigned long orig_cmsg_uptr, + __kernel_size_t orig_cmsg_len) { unsigned char *workbuf, *wp; unsigned long bufsz, space_avail; @@ -1683,6 +1684,9 @@ cmsg32_recvmsg_fixup (struct msghdr *kms goto fail2; clen64 = kcmsg32->cmsg_len; + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) + break; copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + @@ -1812,6 +1816,7 @@ sys32_recvmsg (int fd, struct msghdr32 * struct iovec *iov=iovstack; struct msghdr msg_sys; unsigned long cmsg_ptr; + __kernel_size_t cmsg_len; int err, iov_size, total_len, len; struct scm_cookie scm; @@ -1856,6 +1861,7 @@ sys32_recvmsg (int fd, struct msghdr32 * total_len=err; cmsg_ptr = (unsigned long)msg_sys.msg_control; + cmsg_len = msg_sys.msg_controllen; msg_sys.msg_flags = 0; if (sock->file->f_flags & O_NONBLOCK) @@ -1882,7 +1888,8 @@ sys32_recvmsg (int fd, struct msghdr32 * * fix it up before we tack on more stuff. */ if ((unsigned long) msg_sys.msg_control != cmsg_ptr) - cmsg32_recvmsg_fixup(&msg_sys, cmsg_ptr); + cmsg32_recvmsg_fixup(&msg_sys, cmsg_ptr, + cmsg_len); /* Wheee... 
*/ if (sock->passcred) diff -ruNp 2.4.30-pre1/arch/mips64/kernel/linux32.c 2.4.30-pre1-sfr.3/arch/mips64/kernel/linux32.c --- 2.4.30-pre1/arch/mips64/kernel/linux32.c 2005-02-16 10:57:39.000000000 +1100 +++ 2.4.30-pre1-sfr.3/arch/mips64/kernel/linux32.c 2005-02-22 11:58:58.000000000 +1100 @@ -2790,7 +2790,8 @@ static void scm_detach_fds32(struct msgh * IPV6_RTHDR ipv6 routing exthdr 32-bit clean * IPV6_AUTHHDR ipv6 auth exthdr 32-bit clean */ -static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, unsigned long orig_cmsg_uptr) +static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, + unsigned long orig_cmsg_uptr, __kernel_size_t orig_cmsg_len) { unsigned char *workbuf, *wp; unsigned long bufsz, space_avail; @@ -2821,6 +2822,9 @@ static void cmsg32_recvmsg_fixup(struct __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) + break; copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + @@ -2906,6 +2910,7 @@ asmlinkage int sys32_recvmsg(int fd, str struct sockaddr *uaddr; int *uaddr_len; unsigned long cmsg_ptr; + __kernel_size_t cmsg_len; int err, total_len, len = 0; if(msghdr_from_user32_to_kern(&kern_msg, user_msg)) @@ -2921,6 +2926,7 @@ asmlinkage int sys32_recvmsg(int fd, str total_len = err; cmsg_ptr = (unsigned long) kern_msg.msg_control; + cmsg_len = kern_msg.msg_controllen; kern_msg.msg_flags = 0; sock = sockfd_lookup(fd, &err); @@ -2946,7 +2952,8 @@ asmlinkage int sys32_recvmsg(int fd, str * to fix it up before we tack on more stuff. */ if((unsigned long) kern_msg.msg_control != cmsg_ptr) - cmsg32_recvmsg_fixup(&kern_msg, cmsg_ptr); + cmsg32_recvmsg_fixup(&kern_msg, + cmsg_ptr, cmsg_len); /* Wheee... 
*/ if(sock->passcred) diff -ruNp 2.4.30-pre1/arch/parisc/kernel/sys_parisc32.c 2.4.30-pre1-sfr.3/arch/parisc/kernel/sys_parisc32.c --- 2.4.30-pre1/arch/parisc/kernel/sys_parisc32.c 2005-02-16 10:57:39.000000000 +1100 +++ 2.4.30-pre1-sfr.3/arch/parisc/kernel/sys_parisc32.c 2005-02-22 11:59:05.000000000 +1100 @@ -2106,7 +2106,8 @@ static void scm_detach_fds32(struct msgh * IPV6_RTHDR ipv6 routing exthdr 32-bit clean * IPV6_AUTHHDR ipv6 auth exthdr 32-bit clean */ -static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, unsigned long orig_cmsg_uptr) +static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, + unsigned long orig_cmsg_uptr, __kernel_size_t orig_cmsg_len) { unsigned char *workbuf, *wp; unsigned long bufsz, space_avail; @@ -2137,6 +2138,9 @@ static void cmsg32_recvmsg_fixup(struct __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) + break; copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + @@ -2222,6 +2226,7 @@ asmlinkage int sys32_recvmsg(int fd, str struct sockaddr *uaddr; int *uaddr_len; unsigned long cmsg_ptr; + __kernel_size_t cmsg_len; int err, total_len, len = 0; if(msghdr_from_user32_to_kern(&kern_msg, user_msg)) @@ -2237,6 +2242,7 @@ asmlinkage int sys32_recvmsg(int fd, str total_len = err; cmsg_ptr = (unsigned long) kern_msg.msg_control; + cmsg_len = kern_msg.msg_controllen; kern_msg.msg_flags = 0; sock = sockfd_lookup(fd, &err); @@ -2262,7 +2268,8 @@ asmlinkage int sys32_recvmsg(int fd, str * to fix it up before we tack on more stuff. */ if((unsigned long) kern_msg.msg_control != cmsg_ptr) - cmsg32_recvmsg_fixup(&kern_msg, cmsg_ptr); + cmsg32_recvmsg_fixup(&kern_msg, + cmsg_ptr, cmsg_len); /* Wheee... 
*/ if(sock->passcred) diff -ruNp 2.4.30-pre1/arch/ppc64/kernel/sys_ppc32.c 2.4.30-pre1-sfr.3/arch/ppc64/kernel/sys_ppc32.c --- 2.4.30-pre1/arch/ppc64/kernel/sys_ppc32.c 2005-02-16 10:57:39.000000000 +1100 +++ 2.4.30-pre1-sfr.3/arch/ppc64/kernel/sys_ppc32.c 2005-02-22 11:59:42.000000000 +1100 @@ -3664,7 +3664,8 @@ static void scm_detach_fds32(struct msgh * IPV6_RTHDR ipv6 routing exthdr 32-bit clean * IPV6_AUTHHDR ipv6 auth exthdr 32-bit clean */ -static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, unsigned long orig_cmsg_uptr) +static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, + unsigned long orig_cmsg_uptr, __kernel_size_t orig_cmsg_len) { unsigned char *workbuf, *wp; unsigned long bufsz, space_avail; @@ -3695,6 +3696,9 @@ static void cmsg32_recvmsg_fixup(struct __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) + break; copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + @@ -3751,6 +3755,7 @@ asmlinkage long sys32_recvmsg(int fd, st struct sockaddr *uaddr; int *uaddr_len; unsigned long cmsg_ptr; + __kernel_size_t cmsg_len; int err, total_len, len = 0; PPCDBG(PPCDBG_SYS32, "sys32_recvmsg - entered - fd=%x, user_msg@=%p, user_flags=%x \n", fd, user_msg, user_flags); @@ -3768,6 +3773,7 @@ asmlinkage long sys32_recvmsg(int fd, st total_len = err; cmsg_ptr = (unsigned long) kern_msg.msg_control; + cmsg_len = kern_msg.msg_controllen; kern_msg.msg_flags = 0; sock = sockfd_lookup(fd, &err); @@ -3793,7 +3799,8 @@ asmlinkage long sys32_recvmsg(int fd, st * to fix it up before we tack on more stuff. */ if((unsigned long) kern_msg.msg_control != cmsg_ptr) - cmsg32_recvmsg_fixup(&kern_msg, cmsg_ptr); + cmsg32_recvmsg_fixup(&kern_msg, + cmsg_ptr, cmsg_len); /* Wheee... 
*/ if(sock->passcred) diff -ruNp 2.4.30-pre1/arch/s390x/kernel/linux32.c 2.4.30-pre1-sfr.3/arch/s390x/kernel/linux32.c --- 2.4.30-pre1/arch/s390x/kernel/linux32.c 2005-02-16 10:57:39.000000000 +1100 +++ 2.4.30-pre1-sfr.3/arch/s390x/kernel/linux32.c 2005-02-22 11:59:50.000000000 +1100 @@ -2597,7 +2597,8 @@ static void scm_detach_fds32(struct msgh * IPV6_RTHDR ipv6 routing exthdr 32-bit clean * IPV6_AUTHHDR ipv6 auth exthdr 32-bit clean */ -static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, unsigned long orig_cmsg_uptr) +static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, + unsigned long orig_cmsg_uptr, __kernel_size_t orig_cmsg_len) { unsigned char *workbuf, *wp; unsigned long bufsz, space_avail; @@ -2628,6 +2629,9 @@ static void cmsg32_recvmsg_fixup(struct __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) + break; copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + @@ -2887,7 +2891,8 @@ out: static __inline__ void scm_recv32(struct socket *sock, struct msghdr *msg, - struct scm_cookie *scm, int flags, unsigned long cmsg_ptr) + struct scm_cookie *scm, int flags, unsigned long cmsg_ptr, + __kernel_size_t cmsg_len) { if(!msg->msg_control) { @@ -2902,7 +2907,7 @@ scm_recv32(struct socket *sock, struct m * to fix it up before we tack on more stuff. */ if((unsigned long) msg->msg_control != cmsg_ptr) - cmsg32_recvmsg_fixup(msg, cmsg_ptr); + cmsg32_recvmsg_fixup(msg, cmsg_ptr, cmsg_len); /* Wheee... 
*/ if(sock->passcred) put_cmsg32(msg, @@ -2916,14 +2921,14 @@ scm_recv32(struct socket *sock, struct m static int sock_recvmsg32(struct socket *sock, struct msghdr *msg, int size, int flags, - unsigned long cmsg_ptr) + unsigned long cmsg_ptr, __kernel_size_t cmsg_len) { struct scm_cookie scm; memset(&scm, 0, sizeof(scm)); size = sock->ops->recvmsg(sock, msg, size, flags, &scm); if (size >= 0) - scm_recv32(sock, msg, &scm, flags, cmsg_ptr); + scm_recv32(sock, msg, &scm, flags, cmsg_ptr, cmsg_len); return size; } @@ -2940,6 +2945,7 @@ sys32_recvmsg (int fd, struct msghdr32 * struct iovec *iov=iovstack; struct msghdr msg_sys; unsigned long cmsg_ptr; + __kernel_size_t cmsg_len; int err, iov_size, total_len, len; /* kernel mode address */ @@ -2983,11 +2989,12 @@ sys32_recvmsg (int fd, struct msghdr32 * total_len=err; cmsg_ptr = (unsigned long)msg_sys.msg_control; + cmsg_len = msg_sys.msg_controllen; msg_sys.msg_flags = 0; if (sock->file->f_flags & O_NONBLOCK) flags |= MSG_DONTWAIT; - err = sock_recvmsg32(sock, &msg_sys, total_len, flags, cmsg_ptr); + err = sock_recvmsg32(sock, &msg_sys, total_len, flags, cmsg_ptr, cmsg_len); if (err < 0) goto out_freeiov; len = err; diff -ruNp 2.4.30-pre1/arch/sparc64/kernel/sys_sparc32.c 2.4.30-pre1-sfr.3/arch/sparc64/kernel/sys_sparc32.c --- 2.4.30-pre1/arch/sparc64/kernel/sys_sparc32.c 2005-02-16 10:57:39.000000000 +1100 +++ 2.4.30-pre1-sfr.3/arch/sparc64/kernel/sys_sparc32.c 2005-02-22 11:59:56.000000000 +1100 @@ -2647,7 +2647,8 @@ static void scm_detach_fds32(struct msgh * IPV6_RTHDR ipv6 routing exthdr 32-bit clean * IPV6_AUTHHDR ipv6 auth exthdr 32-bit clean */ -static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, unsigned long orig_cmsg_uptr) +static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, + unsigned long orig_cmsg_uptr, __kernel_size_t orig_cmsg_len) { unsigned char *workbuf, *wp; unsigned long bufsz, space_avail; @@ -2678,6 +2679,9 @@ static void cmsg32_recvmsg_fixup(struct __get_user(kcmsg32->cmsg_type, 
&ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) + break; if (kcmsg32->cmsg_level == SOL_SOCKET && kcmsg32->cmsg_type == SO_TIMESTAMP) { struct timeval tv; @@ -2781,6 +2785,7 @@ asmlinkage int sys32_recvmsg(int fd, str struct sockaddr *uaddr; int *uaddr_len; unsigned long cmsg_ptr; + __kernel_size_t cmsg_len; int err, total_len, len = 0; if(msghdr_from_user32_to_kern(&kern_msg, user_msg)) @@ -2796,6 +2801,7 @@ asmlinkage int sys32_recvmsg(int fd, str total_len = err; cmsg_ptr = (unsigned long) kern_msg.msg_control; + cmsg_len = kern_msg.msg_controllen; kern_msg.msg_flags = 0; sock = sockfd_lookup(fd, &err); @@ -2821,7 +2827,8 @@ asmlinkage int sys32_recvmsg(int fd, str * to fix it up before we tack on more stuff. */ if((unsigned long) kern_msg.msg_control != cmsg_ptr) - cmsg32_recvmsg_fixup(&kern_msg, cmsg_ptr); + cmsg32_recvmsg_fixup(&kern_msg, + cmsg_ptr, cmsg_len); /* Wheee... */ if(sock->passcred) diff -ruNp 2.4.30-pre1/arch/x86_64/ia32/socket32.c 2.4.30-pre1-sfr.3/arch/x86_64/ia32/socket32.c --- 2.4.30-pre1/arch/x86_64/ia32/socket32.c 2005-02-16 10:57:04.000000000 +1100 +++ 2.4.30-pre1-sfr.3/arch/x86_64/ia32/socket32.c 2005-02-22 12:00:04.000000000 +1100 @@ -302,7 +302,8 @@ static void scm_detach_fds32(struct msgh * IPV6_RTHDR ipv6 routing exthdr 32-bit clean * IPV6_AUTHHDR ipv6 auth exthdr 32-bit clean */ -static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, unsigned long orig_cmsg_uptr) +static void cmsg32_recvmsg_fixup(struct msghdr *kmsg, + unsigned long orig_cmsg_uptr, __kernel_size_t orig_cmsg_len) { unsigned char *workbuf, *wp; unsigned long bufsz, space_avail; @@ -333,6 +334,9 @@ static void cmsg32_recvmsg_fixup(struct __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) + break; copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - 
CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + @@ -418,6 +422,7 @@ asmlinkage long sys32_recvmsg(int fd, st struct sockaddr *uaddr; int *uaddr_len; unsigned long cmsg_ptr; + __kernel_size_t cmsg_len; int err, total_len, len = 0; if(msghdr_from_user32_to_kern(&kern_msg, user_msg)) @@ -433,6 +438,7 @@ asmlinkage long sys32_recvmsg(int fd, st total_len = err; cmsg_ptr = (unsigned long) kern_msg.msg_control; + cmsg_len = kern_msg.msg_controllen; kern_msg.msg_flags = 0; sock = sockfd_lookup(fd, &err); @@ -458,7 +464,8 @@ asmlinkage long sys32_recvmsg(int fd, st * to fix it up before we tack on more stuff. */ if((unsigned long) kern_msg.msg_control != cmsg_ptr) - cmsg32_recvmsg_fixup(&kern_msg, cmsg_ptr); + cmsg32_recvmsg_fixup(&kern_msg, + cmsg_ptr, cmsg_len); /* Wheee... */ if(sock->passcred) From davem at davemloft.net Tue Feb 22 12:54:25 2005 From: davem at davemloft.net (David S. Miller) Date: Mon, 21 Feb 2005 17:54:25 -0800 Subject: [BUG][PATCH] 2.4: PPC64: 32 bit sys_recvmsg corruption In-Reply-To: <20050222121627.26374d83.sfr@canb.auug.org.au> References: <20050216111146.524158ce.sfr@canb.auug.org.au> <20050216002841.GA8237@wotan.suse.de> <20050216140628.70232669.sfr@canb.auug.org.au> <20050216172259.1dee3b39.sfr@canb.auug.org.au> <20050221143555.3d969f24.sfr@canb.auug.org.au> <20050221112746.GB17667@wotan.suse.de> <20050222121627.26374d83.sfr@canb.auug.org.au> Message-ID: <20050221175425.3bdb5c12.davem@davemloft.net> On Tue, 22 Feb 2005 12:16:27 +1100 Stephen Rothwell wrote: > Please consider for inclusion into 2.4.30. Marcelo already put in an earlier version of your patch with the typo in the conditional which broke compilation on every platform.
Please send him a relative patch to fix things up. Thanks a lot Stephen. From sfr at canb.auug.org.au Tue Feb 22 13:29:35 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 22 Feb 2005 13:29:35 +1100 Subject: [BUG][PATCH] 2.4: PPC64: 32 bit sys_recvmsg corruption In-Reply-To: <20050221175425.3bdb5c12.davem@davemloft.net> References: <20050216111146.524158ce.sfr@canb.auug.org.au> <20050216002841.GA8237@wotan.suse.de> <20050216140628.70232669.sfr@canb.auug.org.au> <20050216172259.1dee3b39.sfr@canb.auug.org.au> <20050221143555.3d969f24.sfr@canb.auug.org.au> <20050221112746.GB17667@wotan.suse.de> <20050222121627.26374d83.sfr@canb.auug.org.au> <20050221175425.3bdb5c12.davem@davemloft.net> Message-ID: <20050222132935.7f6194ba.sfr@canb.auug.org.au> Hi Dave, Marcelo, On Mon, 21 Feb 2005 17:54:25 -0800 "David S. Miller" wrote: > > On Tue, 22 Feb 2005 12:16:27 +1100 > Stephen Rothwell wrote: > > > Please consider for inclusion into 2.4.30. > > Marcelo already put in an earlier version of your patch with > the typo in the conditional which broke compilation on every > platform. > > Please send him a relative patch to fix things up. Sorry about that. Here is a relative patch that fixes the missing || and removes the printk as requested by Andi.
-- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN linux-2.4/arch/ia64/ia32/sys_ia32.c linux-2.4-sfr.1/arch/ia64/ia32/sys_ia32.c --- linux-2.4/arch/ia64/ia32/sys_ia32.c 2005-02-22 12:12:35.000000000 +1100 +++ linux-2.4-sfr.1/arch/ia64/ia32/sys_ia32.c 2005-02-22 13:15:22.000000000 +1100 @@ -1684,19 +1684,9 @@ goto fail2; clen64 = kcmsg32->cmsg_len; - if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) - (clen64 > (orig_cmsg_len + wp - workbuf))) { - static int count; - - if (count++ < 20) - printk(KERN_WARNING "recvmsg_fixup: " - "bad data length %d, level %d, " - "type %d, process %d (%s)\n", - clen64, kcmsg32->cmsg_level, - kcmsg32->cmsg_type, - current->pid, current->comm); + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) break; - } copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + diff -ruN linux-2.4/arch/mips64/kernel/linux32.c linux-2.4-sfr.1/arch/mips64/kernel/linux32.c --- linux-2.4/arch/mips64/kernel/linux32.c 2005-02-22 12:12:35.000000000 +1100 +++ linux-2.4-sfr.1/arch/mips64/kernel/linux32.c 2005-02-22 13:15:38.000000000 +1100 @@ -2822,19 +2822,9 @@ __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; - if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) - (clen64 > (orig_cmsg_len + wp - workbuf))) { - static int count; - - if (count++ < 20) - printk(KERN_WARNING "recvmsg_fixup: " - "bad data length %d, level %d, " - "type %d, process %d (%s)\n", - clen64, kcmsg32->cmsg_level, - kcmsg32->cmsg_type, - current->pid, current->comm); + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) break; - } copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + diff -ruN linux-2.4/arch/parisc/kernel/sys_parisc32.c 
linux-2.4-sfr.1/arch/parisc/kernel/sys_parisc32.c --- linux-2.4/arch/parisc/kernel/sys_parisc32.c 2005-02-22 12:12:35.000000000 +1100 +++ linux-2.4-sfr.1/arch/parisc/kernel/sys_parisc32.c 2005-02-22 13:15:54.000000000 +1100 @@ -2138,19 +2138,9 @@ __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; - if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) - (clen64 > (orig_cmsg_len + wp - workbuf))) { - static int count; - - if (count++ < 20) - printk(KERN_WARNING "recvmsg_fixup: " - "bad data length %d, level %d, " - "type %d, process %d (%s)\n", - clen64, kcmsg32->cmsg_level, - kcmsg32->cmsg_type, - current->pid, current->comm); + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) break; - } copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + diff -ruN linux-2.4/arch/ppc64/kernel/sys_ppc32.c linux-2.4-sfr.1/arch/ppc64/kernel/sys_ppc32.c --- linux-2.4/arch/ppc64/kernel/sys_ppc32.c 2005-02-22 12:12:35.000000000 +1100 +++ linux-2.4-sfr.1/arch/ppc64/kernel/sys_ppc32.c 2005-02-22 13:16:12.000000000 +1100 @@ -3696,19 +3696,9 @@ __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; - if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) - (clen64 > (orig_cmsg_len + wp - workbuf))) { - static int count; - - if (count++ < 20) - printk(KERN_WARNING "recvmsg_fixup: " - "bad data length %d, level %d, " - "type %d, process %d (%s)\n", - clen64, kcmsg32->cmsg_level, - kcmsg32->cmsg_type, - current->pid, current->comm); + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) break; - } copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + diff -ruN linux-2.4/arch/s390x/kernel/linux32.c linux-2.4-sfr.1/arch/s390x/kernel/linux32.c --- linux-2.4/arch/s390x/kernel/linux32.c 2005-02-22 12:12:35.000000000 
+1100 +++ linux-2.4-sfr.1/arch/s390x/kernel/linux32.c 2005-02-22 13:16:39.000000000 +1100 @@ -2629,19 +2629,9 @@ __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; - if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) - (clen64 > (orig_cmsg_len + wp - workbuf))) { - static int count; - - if (count++ < 20) - printk(KERN_WARNING "recvmsg_fixup: " - "bad data length %d, level %d, " - "type %d, process %d (%s)\n", - clen64, kcmsg32->cmsg_level, - kcmsg32->cmsg_type, - current->pid, current->comm); + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) break; - } copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) + diff -ruN linux-2.4/arch/sparc64/kernel/sys_sparc32.c linux-2.4-sfr.1/arch/sparc64/kernel/sys_sparc32.c --- linux-2.4/arch/sparc64/kernel/sys_sparc32.c 2005-02-22 12:12:35.000000000 +1100 +++ linux-2.4-sfr.1/arch/sparc64/kernel/sys_sparc32.c 2005-02-22 13:16:55.000000000 +1100 @@ -2679,19 +2679,9 @@ __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); clen64 = kcmsg32->cmsg_len; - if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) - (clen64 > (orig_cmsg_len + wp - workbuf))) { - static int count; - - if (count++ < 20) - printk(KERN_WARNING "recvmsg_fixup: " - "bad data length %d, level %d, " - "type %d, process %d (%s)\n", - clen64, kcmsg32->cmsg_level, - kcmsg32->cmsg_type, - current->pid, current->comm); + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) break; - } if (kcmsg32->cmsg_level == SOL_SOCKET && kcmsg32->cmsg_type == SO_TIMESTAMP) { struct timeval tv; diff -ruN linux-2.4/arch/x86_64/ia32/socket32.c linux-2.4-sfr.1/arch/x86_64/ia32/socket32.c --- linux-2.4/arch/x86_64/ia32/socket32.c 2005-02-22 12:12:35.000000000 +1100 +++ linux-2.4-sfr.1/arch/x86_64/ia32/socket32.c 2005-02-22 13:17:10.000000000 +1100 @@ -334,19 +334,9 @@ __get_user(kcmsg32->cmsg_type, &ucmsg->cmsg_type); 
clen64 = kcmsg32->cmsg_len; - if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) - (clen64 > (orig_cmsg_len + wp - workbuf))) { - static int count; - - if (count++ < 20) - printk(KERN_WARNING "recvmsg_fixup: " - "bad data length %d, level %d, " - "type %d, process %d (%s)\n", - clen64, kcmsg32->cmsg_level, - kcmsg32->cmsg_type, - current->pid, current->comm); + if ((clen64 < CMSG_ALIGN(sizeof(*ucmsg))) || + (clen64 > (orig_cmsg_len + wp - workbuf))) break; - } copy_from_user(CMSG32_DATA(kcmsg32), CMSG_DATA(ucmsg), clen64 - CMSG_ALIGN(sizeof(*ucmsg))); clen32 = ((clen64 - CMSG_ALIGN(sizeof(*ucmsg))) +

From johnrose at austin.ibm.com Tue Feb 22 12:21:34 2005
From: johnrose at austin.ibm.com (John Rose)
Date: Mon, 21 Feb 2005 19:21:34 -0600
Subject: pSeries_request_regions()
Message-ID: <1109035294.7706.1.camel@sinatra.austin.ibm.com>

During init, pSeries_request_regions() blindly requests ioports:
00000000-0000001f : dma1
00000020-0000003f : pic1
00000040-0000005f : timer
00000060-0000006f : reserved (no i8042)
00000080-0000008f : dma page reg
000000a0-000000bf : pic2
000000c0-000000df : dma2

If I understand correctly, this is necessary for legacy ISA support. A
system with purely virtual I/O would not need these reservations. If I
should skip their reservation conditionally, should I check for the
mere existence of physical I/O? Or for the existence of an ISA node?
Thoughts?

Thanks-
John

From anton at samba.org Tue Feb 22 17:14:38 2005
From: anton at samba.org (Anton Blanchard)
Date: Tue, 22 Feb 2005 17:14:38 +1100
Subject: pSeries_request_regions()
In-Reply-To: <1109035294.7706.1.camel@sinatra.austin.ibm.com>
References: <1109035294.7706.1.camel@sinatra.austin.ibm.com>
Message-ID: <20050222061438.GA5618@krispykreme.ozlabs.ibm.com>

Hi John,

> During init, pSeries_request_regions() blindly requests ioports:
> 00000000-0000001f : dma1
> 00000020-0000003f : pic1
> 00000040-0000005f : timer
> 00000060-0000006f : reserved (no i8042)
> 00000080-0000008f : dma page reg
> 000000a0-000000bf : pic2
> 000000c0-000000df : dma2
>
> If I understand correctly, this is necessary for legacy ISA support. A
> system with purely virtual I/O would not need these reservations. If I
> should skip their reservation conditionally, should I check for the
> mere existence of physical I/O? Or for the existence of an ISA node?
> Thoughts?

Yeah, requesting those regions all the time is bogus. We can skip it if
we don't have ISA, perhaps something like this:

 static void __init pSeries_request_regions(void)
 {
+	if (!isa_io_base)
+		return;
+
 	request_region(0x20,0x20,"pic1");
 	request_region(0xa0,0x20,"pic2");
 	request_region(0x00,0x20,"dma1");

Anton

From benh at kernel.crashing.org Tue Feb 22 18:05:23 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 22 Feb 2005 18:05:23 +1100
Subject: pSeries_request_regions()
In-Reply-To: <20050222061438.GA5618@krispykreme.ozlabs.ibm.com>
References: <1109035294.7706.1.camel@sinatra.austin.ibm.com> <20050222061438.GA5618@krispykreme.ozlabs.ibm.com>
Message-ID: <1109055924.5327.102.camel@gaston>

On Tue, 2005-02-22 at 17:14 +1100, Anton Blanchard wrote:
> Hi John,
>
> > During init, pSeries_request_regions() blindly requests ioports:
> > 00000000-0000001f : dma1
> > 00000020-0000003f : pic1
> > 00000040-0000005f : timer
> > 00000060-0000006f : reserved (no i8042)
> > 00000080-0000008f : dma page reg
> > 000000a0-000000bf : pic2
> > 000000c0-000000df : dma2
> >
> > If I understand correctly, this is necessary for legacy ISA support. A
> > system with purely virtual I/O would not need these reservations. If I
> > should skip their reservation conditionally, should I check for the
> > mere existence of physical I/O? Or for the existence of an ISA node?
> > Thoughts?
>
> Yeah, requesting those regions all the time is bogus. We can skip it if
> we don't have ISA, perhaps something like this:

Well... The reason we request them in the first place is to prevent
drivers from mucking around with those bits of hardware, no? (so they
fail their own request_region).

I recently added a platform hook to deal with that in a slightly more
suitable way, though not all legacy drivers have been fixed yet to use
it.

Ben.
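
The trade-off discussed in the thread above can be tried out in a small userspace sketch. This is not kernel code: `toy_request_region()` is a hypothetical stand-in for the kernel's resource tree, and the `isa_present` flag is an assumed substitute for the `isa_io_base` test in Anton's snippet. It only shows that reserving the legacy ranges makes a later driver request fail, and that skipping the reservation removes that guard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the kernel's I/O port resource tree (hypothetical). */
#define TOY_MAX_REGIONS 16

struct toy_region {
	unsigned long start;
	unsigned long len;
	const char *name;
};

static struct toy_region toy_regions[TOY_MAX_REGIONS];
static size_t toy_region_count;

/* Forget all reservations, e.g. between experiments. */
void toy_reset_regions(void)
{
	memset(toy_regions, 0, sizeof(toy_regions));
	toy_region_count = 0;
}

/*
 * Reserve [start, start + len). Returns 0 on success, -1 if the range
 * overlaps an existing reservation or the table is full.
 */
int toy_request_region(unsigned long start, unsigned long len,
		       const char *name)
{
	size_t i;

	assert(len > 0);
	for (i = 0; i < toy_region_count; i++) {
		const struct toy_region *r = &toy_regions[i];

		/* Two half-open ranges overlap if each starts before the other ends. */
		if (start < r->start + r->len && r->start < start + len)
			return -1;
	}
	if (toy_region_count == TOY_MAX_REGIONS)
		return -1;

	toy_regions[toy_region_count].start = start;
	toy_regions[toy_region_count].len = len;
	toy_regions[toy_region_count].name = name;
	toy_region_count++;
	return 0;
}

/*
 * Platform setup modelled on the suggested change: reserve the legacy
 * ranges only when an ISA bus exists. isa_present is an assumed
 * stand-in for the isa_io_base check.
 */
void toy_platform_request_regions(int isa_present)
{
	if (!isa_present)
		return;

	toy_request_region(0x20, 0x20, "pic1");
	toy_request_region(0xa0, 0x20, "pic2");
	toy_request_region(0x00, 0x20, "dma1");
}

/* A legacy driver probing the first PIC; returns 0 if it got the ports. */
int toy_legacy_driver_probe(void)
{
	return toy_request_region(0x20, 0x02, "legacy-pic-driver");
}
```

Calling `toy_platform_request_regions(1)` and then `toy_legacy_driver_probe()` makes the probe fail, which is the protection Ben describes. After `toy_reset_regions()` and `toy_platform_request_regions(0)` the same probe succeeds, so the guard only helps if no legacy driver can be loaded on such a system.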

From michael at ellerman.id.au Tue Feb 22 19:24:23 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Tue, 22 Feb 2005 19:24:23 +1100
Subject: [RFC/PATCH] ppc64: Add mem=X option
Message-ID: <20050222192423.727023f7.michael@ellerman.id.au>

Hi Anton, Ben, and the rest of ya,

Here is my first take at adding support for the mem=X boot option.
Please check it out.

Anton, can you have a look at the NUMA code? It's probably bogus. It
works on vego, but those LPARs only have one NUMA node; if you can try
it on some beefy NUMA thing, that'd be sweet.

I've successfully booted this on iSeries, Power3, Power4/5 LPAR and a G5.

cheers!

---

 arch/ppc64/kernel/iSeries_setup.c |   38 ++++++++----
 arch/ppc64/kernel/lmb.c           |   31 +++++++++
 arch/ppc64/kernel/prom.c          |   15 ++++
 arch/ppc64/kernel/prom_init.c     |  118 +++++++++++++++++++++++++++++++++++---
 arch/ppc64/kernel/setup.c         |   14 +++-
 arch/ppc64/mm/hash_utils.c        |   19 +++++-
 arch/ppc64/mm/numa.c              |   32 +++++++++-
 include/asm-ppc64/lmb.h           |    1
 8 files changed, 238 insertions(+), 30 deletions(-)

Index: latest/arch/ppc64/kernel/setup.c =================================================================== --- latest.orig/arch/ppc64/kernel/setup.c +++ latest/arch/ppc64/kernel/setup.c @@ -641,12 +641,11 @@ void __init setup_system(void) early_console_initialized = 1; register_console(&udbg_console); -#endif /* !CONFIG_PPC_ISERIES */ - /* Save unparsed command line copy for /proc/cmdline */ strlcpy(saved_command_line, cmd_line, COMMAND_LINE_SIZE); parse_early_param(); +#endif /* !CONFIG_PPC_ISERIES */ #if defined(CONFIG_SMP) && !defined(CONFIG_PPC_ISERIES) /* @@ -805,9 +804,16 @@ struct seq_operations cpuinfo_op = { .show = show_cpuinfo, }; -#if 0 /* XXX not currently used */ +/* These three variables are used to save values passed to us by prom_init() + * via the device tree. The TCE variables are needed because with a memory_limit + * in force we may need to explicitly map the TCE area at the top of RAM.
*/ unsigned long memory_limit; +unsigned long tce_alloc_start; +unsigned long tce_alloc_end; +#ifdef CONFIG_PPC_ISERIES +/* On iSeries we just parse the mem=X option from the command line. + * On pSeries it's a bit more complicated, see prom_init_mem() */ static int __init early_parsemem(char *p) { if (!p) @@ -818,7 +824,7 @@ static int __init early_parsemem(char *p return 0; } early_param("mem", early_parsemem); -#endif +#endif /* CONFIG_PPC_ISERIES */ #ifdef CONFIG_PPC_MULTIPLATFORM static int __init set_preferred_console(void) Index: latest/arch/ppc64/kernel/lmb.c =================================================================== --- latest.orig/arch/ppc64/kernel/lmb.c +++ latest/arch/ppc64/kernel/lmb.c @@ -344,3 +344,34 @@ lmb_abs_to_phys(unsigned long aa) return pa; } + +/* Truncate the lmb list to memory_limit if it's set + * You must call lmb_analyze() after this. */ +void __init lmb_apply_memory_limit(void) +{ + extern unsigned long memory_limit; + unsigned long i, total = 0, crop; + struct lmb_region *mem = &(lmb.memory); + + if (likely(!memory_limit)) + return; + + for (i = 0; i < mem->cnt; i++) { + total += mem->region[i].size; + + if (total <= memory_limit) + continue; + + crop = (memory_limit - (total - mem->region[i].size)); +#ifdef DEBUG + udbg_printf("lmb_truncate(): truncating at region %x\n", i); + udbg_printf("lmb_truncate(): total = %x\n", total); + udbg_printf("lmb_truncate(): size = %x\n", mem->region[i].size); + udbg_printf("lmb_truncate(): crop = %x\n", crop); +#endif + + mem->region[i].size = crop; + mem->cnt = i + 1; + break; + } +} Index: latest/include/asm-ppc64/lmb.h =================================================================== --- latest.orig/include/asm-ppc64/lmb.h +++ latest/include/asm-ppc64/lmb.h @@ -53,6 +53,7 @@ extern unsigned long __init lmb_alloc_ba extern unsigned long __init lmb_phys_mem_size(void); extern unsigned long __init lmb_end_of_DRAM(void); extern unsigned long __init lmb_abs_to_phys(unsigned long); +extern 
void __init lmb_apply_memory_limit(void); extern void lmb_dump_all(void); Index: latest/arch/ppc64/kernel/iSeries_setup.c =================================================================== --- latest.orig/arch/ppc64/kernel/iSeries_setup.c +++ latest/arch/ppc64/kernel/iSeries_setup.c @@ -284,7 +284,7 @@ unsigned long iSeries_process_mainstore_ return mem_blocks; } -static void __init iSeries_parse_cmdline(void) +static void __init iSeries_get_cmdline(void) { char *p, *q; @@ -304,6 +304,8 @@ static void __init iSeries_parse_cmdline /*static*/ void __init iSeries_init_early(void) { + extern unsigned long memory_limit; + DBG(" -> iSeries_init_early()\n"); ppcdbg_initialize(); @@ -351,6 +353,29 @@ static void __init iSeries_parse_cmdline */ build_iSeries_Memory_Map(); + iSeries_get_cmdline(); + + /* Save unparsed command line copy for /proc/cmdline */ + strlcpy(saved_command_line, cmd_line, COMMAND_LINE_SIZE); + + /* Parse early parameters, in particular mem=x */ + parse_early_param(); + + if (unlikely(memory_limit)) { + if (memory_limit > systemcfg->physicalMemorySize) + printk("Ignoring 'mem' option, value %lu is too large.\n", memory_limit); + else + systemcfg->physicalMemorySize = memory_limit; + } + + /* Bolt kernel mappings for all of memory */ + iSeries_bolt_kernel(0, systemcfg->physicalMemorySize); + + lmb_init(); + lmb_add(0, systemcfg->physicalMemorySize); + lmb_analyze(); /* ?? 
*/ + lmb_reserve(0, __pa(klimit)); + /* Initialize machine-dependency vectors */ #ifdef CONFIG_SMP smp_init_iSeries(); @@ -376,9 +401,6 @@ static void __init iSeries_parse_cmdline initrd_start = initrd_end = 0; #endif /* CONFIG_BLK_DEV_INITRD */ - - iSeries_parse_cmdline(); - DBG(" <- iSeries_init_early()\n"); } @@ -539,14 +561,6 @@ static void __init build_iSeries_Memory_ * nextPhysChunk */ systemcfg->physicalMemorySize = chunk_to_addr(nextPhysChunk); - - /* Bolt kernel mappings for all of memory */ - iSeries_bolt_kernel(0, systemcfg->physicalMemorySize); - - lmb_init(); - lmb_add(0, systemcfg->physicalMemorySize); - lmb_analyze(); /* ?? */ - lmb_reserve(0, __pa(klimit)); } /* Index: latest/arch/ppc64/kernel/prom.c =================================================================== --- latest.orig/arch/ppc64/kernel/prom.c +++ latest/arch/ppc64/kernel/prom.c @@ -875,6 +875,8 @@ static int __init early_init_dt_scan_cho const char *full_path, void *data) { u32 *prop; + u64 *prop64; + extern unsigned long memory_limit, tce_alloc_start, tce_alloc_end; if (strcmp(full_path, "/chosen") != 0) return 0; @@ -891,6 +893,18 @@ static int __init early_init_dt_scan_cho if (get_flat_dt_prop(node, "linux,iommu-force-on", NULL) != NULL) iommu_force_on = 1; + prop64 = (u64*)get_flat_dt_prop(node, "linux,memory-limit", NULL); + if (prop64) + memory_limit = *prop64; + + prop64 = (u64*)get_flat_dt_prop(node, "linux,tce-alloc-start", NULL); + if (prop64) + tce_alloc_start = *prop64; + + prop64 = (u64*)get_flat_dt_prop(node, "linux,tce-alloc-end", NULL); + if (prop64) + tce_alloc_end = *prop64; + #ifdef CONFIG_PPC_PSERIES /* To help early debugging via the front panel, we retreive a minimal * set of RTAS infos now if available @@ -1030,6 +1044,7 @@ void __init early_init_devtree(void *par lmb_init(); scan_flat_dt(early_init_dt_scan_root, NULL); scan_flat_dt(early_init_dt_scan_memory, NULL); + lmb_apply_memory_limit(); lmb_analyze(); systemcfg->physicalMemorySize = lmb_phys_mem_size(); 
lmb_reserve(0, __pa(klimit)); Index: latest/arch/ppc64/mm/hash_utils.c =================================================================== --- latest.orig/arch/ppc64/mm/hash_utils.c +++ latest/arch/ppc64/mm/hash_utils.c @@ -140,6 +140,8 @@ void __init htab_initialize(void) unsigned long pteg_count; unsigned long mode_rw; int i, use_largepages = 0; + unsigned long base = 0, size = 0; + extern unsigned long memory_limit, tce_alloc_start, tce_alloc_end; DBG(" -> htab_initialize()\n"); @@ -195,8 +197,6 @@ void __init htab_initialize(void) /* create bolted the linear mapping in the hash table */ for (i=0; i < lmb.memory.cnt; i++) { - unsigned long base, size; - base = lmb.memory.region[i].physbase + KERNELBASE; size = lmb.memory.region[i].size; @@ -225,6 +225,21 @@ void __init htab_initialize(void) #endif /* CONFIG_U3_DART */ create_pte_mapping(base, base + size, mode_rw, use_largepages); } + + /* If we have a memory_limit and we've allocated TCEs then we need to + * explicitly map the TCE area at the top of RAM. We also cope with the + * case that the TCEs start below memory_limit. 
*/ + if (unlikely(memory_limit && tce_alloc_start && tce_alloc_end)) { + tce_alloc_start += KERNELBASE; + tce_alloc_end += KERNELBASE; + + if (base + size >= tce_alloc_start) + tce_alloc_start = base + size + 1; + + create_pte_mapping(tce_alloc_start, tce_alloc_end, + mode_rw, use_largepages); + } + DBG(" <- htab_initialize()\n"); } #undef KB Index: latest/arch/ppc64/mm/numa.c =================================================================== --- latest.orig/arch/ppc64/mm/numa.c +++ latest/arch/ppc64/mm/numa.c @@ -270,6 +270,7 @@ static int __init parse_numa_properties( int max_domain = 0; long entries = lmb_end_of_DRAM() >> MEMORY_INCREMENT_SHIFT; unsigned long i; + extern unsigned long memory_limit; if (numa_enabled == 0) { printk(KERN_WARNING "NUMA disabled by user\n"); @@ -378,7 +379,7 @@ new_range: size / PAGE_SIZE; } - for (i = start ; i < (start+size); i += MEMORY_INCREMENT) + for (i = start; i < (start+size) && i < lmb_end_of_DRAM(); i += MEMORY_INCREMENT) numa_memory_lookup_table[i >> MEMORY_INCREMENT_SHIFT] = numa_domain; @@ -387,8 +388,33 @@ new_range: goto new_range; } - for (i = 0; i <= max_domain; i++) - node_set_online(i); + if (unlikely(memory_limit)) { + unsigned long size, total = 0; + + for (i = 0; i <= max_domain; i++) { + size = init_node_data[i].node_spanned_pages * PAGE_SIZE; + total += size; + + if (total <= memory_limit) + continue; + + size = (memory_limit - (total - size)) / PAGE_SIZE; + dbg("NUMA: truncating node %ld to %ld pages\n", i, size); + init_node_data[i].node_spanned_pages = size; + break; + } + + for (i++; i <= max_domain; i++) { + dbg("NUMA: offlining node %ld for memory_limit\n", i); + node_set_offline(i); + init_node_data[i].node_start_pfn = 0; + init_node_data[i].node_spanned_pages = 0; + } + } else { + /* FIXME do we need this? haven't we already done it in the else above? 
*/ + for (i = 0; i <= max_domain; i++) + node_set_online(i); + } return 0; err: Index: latest/arch/ppc64/kernel/prom_init.c =================================================================== --- latest.orig/arch/ppc64/kernel/prom_init.c +++ latest/arch/ppc64/kernel/prom_init.c @@ -178,6 +178,9 @@ static int __initdata of_platform; static char __initdata prom_cmd_line[COMMAND_LINE_SIZE]; +static unsigned long __initdata memory_limit; +static unsigned long __initdata tce_alloc_start; +static unsigned long __initdata tce_alloc_end; static unsigned long __initdata alloc_top; static unsigned long __initdata alloc_top_high; static unsigned long __initdata alloc_bottom; @@ -385,10 +388,64 @@ static int __init prom_setprop(phandle n (u32)(unsigned long) value, (u32) valuelen); } +/* We can't use the standard versions because of RELOC headaches. */ +#define isxdigit(c) (('0' <= (c) && (c) <= '9') \ + || ('a' <= (c) && (c) <= 'f') \ + || ('A' <= (c) && (c) <= 'F')) + +#define isdigit(c) ('0' <= (c) && (c) <= '9') +#define islower(c) ('a' <= (c) && (c) <= 'z') +#define toupper(c) (islower(c) ? ((c) - 'a' + 'A') : (c)) + +unsigned long prom_strtoul(const char *cp, const char **endp) +{ + unsigned long result = 0, base = 10, value; + + if (*cp == '0') { + base = 8; + cp++; + if (toupper(*cp) == 'X') { + cp++; + base = 16; + } + } + + while (isxdigit(*cp) && + (value = isdigit(*cp) ? 
*cp - '0' : toupper(*cp) - 'A' + 10) < base) { + result = result * base + value; + cp++; + } + + if (endp) + *endp = cp; + + return result; +} + +unsigned long prom_memparse(const char *ptr, const char **retptr) +{ + unsigned long ret = prom_strtoul(ptr, retptr); + + switch (**retptr) { + case 'G': + case 'g': + ret <<= 10; + case 'M': + case 'm': + ret <<= 10; + case 'K': + case 'k': + ret <<= 10; + (*retptr)++; + default: + break; + } + return ret; +} /* * Early parsing of the command line passed to the kernel, used for - * the options that affect the iommu + * "mem=x" and the options that affect the iommu */ static void __init early_cmdline_parse(void) { @@ -419,6 +476,14 @@ static void __init early_cmdline_parse(v else if (!strncmp(opt, RELOC("force"), 5)) RELOC(iommu_force_on) = 1; } + + opt = strstr(RELOC(prom_cmd_line), RELOC("mem=")); + if (opt) { + opt += 4; + RELOC(memory_limit) = prom_memparse(opt, (const char **)&opt); + /* Align to 16 MB == size of large page */ + RELOC(memory_limit) = ALIGN(RELOC(memory_limit), 0x1000000); + } } /* @@ -665,15 +730,7 @@ static void __init prom_init_mem(void) } } - /* Setup our top/bottom alloc points, that is top of RMO or top of - * segment 0 when running non-LPAR - */ - if ( RELOC(of_platform) == PLATFORM_PSERIES_LPAR ) - RELOC(alloc_top) = RELOC(rmo_top); - else - RELOC(alloc_top) = RELOC(rmo_top) = min(0x40000000ul, RELOC(ram_top)); RELOC(alloc_bottom) = PAGE_ALIGN(RELOC(klimit) - offset + 0x4000); - RELOC(alloc_top_high) = RELOC(ram_top); /* Check if we have an initrd after the kernel, if we do move our bottom * point to after it @@ -683,8 +740,37 @@ static void __init prom_init_mem(void) > RELOC(alloc_bottom)) RELOC(alloc_bottom) = PAGE_ALIGN(RELOC(prom_initrd_end)); } + + /* If memory_limit is set we reduce the upper limits *except* for + * alloc_top_high. This must be the real top of RAM so we can put + * TCE's up there. 
*/ + + RELOC(alloc_top_high) = RELOC(ram_top); + + if (unlikely(RELOC(memory_limit))) { + if (RELOC(memory_limit) <= RELOC(alloc_bottom)) { + prom_printf("Ignoring mem=%x <= alloc_bottom.\n", + RELOC(memory_limit)); + RELOC(memory_limit) = 0; + } else if (RELOC(memory_limit) >= RELOC(ram_top)) { + prom_printf("Ignoring mem=%x >= ram_top.\n", + RELOC(memory_limit)); + RELOC(memory_limit) = 0; + } else { + RELOC(ram_top) = RELOC(memory_limit); + RELOC(rmo_top) = min(RELOC(rmo_top), RELOC(memory_limit)); + } + } + + /* Setup our top alloc point, that is top of RMO or top of + * segment 0 when running non-LPAR. */ + if ( RELOC(of_platform) == PLATFORM_PSERIES_LPAR ) + RELOC(alloc_top) = RELOC(rmo_top); + else + RELOC(alloc_top) = RELOC(rmo_top) = min(0x40000000ul, RELOC(ram_top)); prom_printf("memory layout at init:\n"); + prom_printf(" memory_limit : %x\n", RELOC(memory_limit)); prom_printf(" alloc_bottom : %x\n", RELOC(alloc_bottom)); prom_printf(" alloc_top : %x\n", RELOC(alloc_top)); prom_printf(" alloc_top_hi : %x\n", RELOC(alloc_top_high)); @@ -873,6 +959,11 @@ static void __init prom_initialize_tce_t reserve_mem(local_alloc_bottom, local_alloc_top - local_alloc_bottom); + if (RELOC(memory_limit)) { + RELOC(tce_alloc_start) = local_alloc_bottom; + RELOC(tce_alloc_end) = local_alloc_top; + } + /* Flag the first invalid entry */ prom_debug("ending prom_initialize_tce_table\n"); } @@ -1688,6 +1779,15 @@ unsigned long __init prom_init(unsigned prom_setprop(_prom->chosen, "linux,iommu-off", NULL, 0); if (RELOC(iommu_force_on)) prom_setprop(_prom->chosen, "linux,iommu-force-on", NULL, 0); + if (RELOC(memory_limit)) + prom_setprop(_prom->chosen, "linux,memory-limit", + PTRRELOC(&memory_limit), sizeof(RELOC(memory_limit))); + if (RELOC(tce_alloc_start)) + prom_setprop(_prom->chosen, "linux,tce-alloc-start", + PTRRELOC(&tce_alloc_start), sizeof(RELOC(tce_alloc_start))); + if (RELOC(tce_alloc_end)) + prom_setprop(_prom->chosen, "linux,tce-alloc-end", + 
PTRRELOC(&tce_alloc_end), sizeof(RELOC(tce_alloc_end))); /* * Now finally create the flattened device-tree

From clmason at gmail.com Tue Feb 22 20:50:28 2005
From: clmason at gmail.com (Chris L. Mason)
Date: Tue, 22 Feb 2005 05:50:28 -0400
Subject: iMac G5
Message-ID: <610e3466050222015016158fb5@mail.gmail.com>

Hi,

Does anyone know if the iMac G5 stuff has been merged into the main
kernel, or if not, if there are plans to do so?

Thanks,
Chris

From johnrose at austin.ibm.com Wed Feb 23 03:24:16 2005
From: johnrose at austin.ibm.com (John Rose)
Date: Tue, 22 Feb 2005 10:24:16 -0600
Subject: pSeries_request_regions()
In-Reply-To: <1109055924.5327.102.camel@gaston>
References: <1109035294.7706.1.camel@sinatra.austin.ibm.com> <20050222061438.GA5618@krispykreme.ozlabs.ibm.com> <1109055924.5327.102.camel@gaston>
Message-ID: <1109089456.21332.6.camel@sinatra.austin.ibm.com>

> The reason we request them in the first place is to prevent drivers from
> mucking around with those bits of hardware, no? (so they fail their own
> request_region).
>
> I recently added a platform hook to deal with that in a slightly more
> suitable way, though not all legacy drivers have been fixed yet to use
> it.

For the case of dynamically adding a PHB to a system that booted with
no physical I/O, I need to pick a starting virtual address for the I/O
region. I could hard-code it to start past these regions, but I'm not
sure it makes sense for these to exist in the first place. Would legacy
drivers even come into play on a purely virtual system?

John

From arnd at arndb.de Wed Feb 23 03:23:51 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Tue, 22 Feb 2005 17:23:51 +0100
Subject: [RFC] splitting out LPAR support from CONFIG_PSERIES
Message-ID: <200502221723.52051.arnd@arndb.de>

I have a private patch set that currently depends on this patch. It
introduces a new compile-time option that makes it possible to disable
support for LPAR or native setups in a pSeries kernel.

Obviously, this is not for generic distribution kernels, but I think it
makes sense to have the option when you're building for just one
machine. It also makes some of my subsequent patches simpler, especially
enabling RTAS on non-pSeries machines without requiring LPAR support.

The current form of the patch introduces lots of new #ifdefs, which can
be reduced if we use the scheme I proposed in the 'Introduce
CPU_HAS_FEATURE() macro' discussion. It would also be cleaner to split
the pSeries_iommu code into LPAR and native files.

I'm not proposing inclusion of this patch at this point, but I'd like to
know whether the idea is OK or whether I should rather leave the pSeries
code alone.

Arnd <><

--- linux-2.6-ppc.orig/arch/ppc64/Kconfig 2005-02-18 16:28:53.305937360 -0500 +++ linux-2.6-ppc/arch/ppc64/Kconfig 2005-02-18 16:28:53.392924136 -0500 @@ -72,9 +72,19 @@ endchoice -config PPC_PSERIES +config PPC_PSERIES_NATIVE depends on PPC_MULTIPLATFORM - bool " IBM pSeries & new iSeries" + bool " IBM pSeries native" + default y + +config PPC_PSERIES_LPAR + depends on PPC_MULTIPLATFORM + bool " IBM pSeries running in LPAR & new iSeries" + default y + +config PPC_PSERIES + depends on PPC_PSERIES_NATIVE || PPC_PSERIES_LPAR + bool default y config PPC_PMAC @@ -115,7 +125,7 @@ default y config PPC_SPLPAR - depends on PPC_PSERIES + depends on PPC_PSERIES_LPAR bool "Support for shared-processor logical partitions" default n help @@ -125,7 +135,7 @@ two or more partitions. config IBMVIO - depends on PPC_PSERIES || PPC_ISERIES + depends on PPC_PSERIES_LPAR || PPC_ISERIES bool default y @@ -271,7 +281,7 @@ config LPARCFG tristate "LPAR Configuration Data" - depends on PPC_PSERIES || PPC_ISERIES + depends on PPC_PSERIES_LPAR || PPC_ISERIES help Provide system capacity information via human readable <key word>=<value> pairs through a /proc/ppc64/lparcfg interface.
Index: linux-2.6-ppc/arch/ppc64/kernel/Makefile =================================================================== --- linux-2.6-ppc.orig/arch/ppc64/kernel/Makefile 2005-02-18 16:28:53.306937208 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/Makefile 2005-02-18 16:28:53.393923984 -0500 @@ -29,8 +29,9 @@ obj-$(CONFIG_PPC_MULTIPLATFORM) += nvram.o i8259.o prom_init.o prom.o mpic.o -obj-$(CONFIG_PPC_PSERIES) += pSeries_pci.o pSeries_lpar.o pSeries_hvCall.o \ - pSeries_nvram.o rtasd.o ras.o \ +obj-$(CONFIG_PPC_PSERIES_LPAR) += pSeries_lpar.o pSeries_hvCall.o + +obj-$(CONFIG_PPC_PSERIES) += pSeries_pci.o pSeries_nvram.o rtasd.o ras.o \ xics.o rtas.o pSeries_setup.o pSeries_iommu.o obj-$(CONFIG_EEH) += eeh.o Index: linux-2.6-ppc/arch/ppc64/kernel/idle.c =================================================================== --- linux-2.6-ppc.orig/arch/ppc64/kernel/idle.c 2005-02-18 16:28:40.528019944 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/idle.c 2005-02-18 16:28:53.394923832 -0500 @@ -154,7 +154,7 @@ return 0; } -#ifdef CONFIG_PPC_PSERIES +#ifdef CONFIG_PPC_PSERIES_LPAR DECLARE_PER_CPU(unsigned long, smt_snooze_delay); @@ -348,7 +348,7 @@ #else idle_loop = default_idle; #endif -#ifdef CONFIG_PPC_PSERIES +#ifdef CONFIG_PPC_PSERIES_LPAR if (systemcfg->platform & PLATFORM_PSERIES) { if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) { if (get_paca()->lppaca.shared_proc) { Index: linux-2.6-ppc/arch/ppc64/kernel/pSeries_iommu.c =================================================================== --- linux-2.6-ppc.orig/arch/ppc64/kernel/pSeries_iommu.c 2005-02-18 16:28:40.530019640 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/pSeries_iommu.c 2005-02-18 16:28:53.395923680 -0500 @@ -1,6 +1,8 @@ /* * arch/ppc64/kernel/pSeries_iommu.c * + * FIXME: split this file into LPAR and native versions + * * Copyright (C) 2001 Mike Corrigan & Dave Engebretsen, IBM Corporation * * Rewrite, cleanup: @@ -50,6 +52,7 @@ extern int is_python(struct device_node *); +#ifdef CONFIG_PPC_PSERIES_NATIVE 
static void tce_build_pSeries(struct iommu_table *tbl, long index, long npages, unsigned long uaddr, enum dma_data_direction direction) @@ -91,8 +94,9 @@ tp++; } } +#endif /* CONFIG_PPC_PSERIES_NATIVE */ - +#ifdef CONFIG_PPC_PSERIES_LPAR static void tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum, long npages, unsigned long uaddr, enum dma_data_direction direction) @@ -235,7 +239,9 @@ show_stack(current, (unsigned long *)__get_SP()); } } +#endif /* CONFIG_PPC_PSERIES_LPAR */ +#ifdef CONFIG_PPC_PSERIES_NATIVE static void iommu_table_setparms(struct pci_controller *phb, struct device_node *dn, struct iommu_table *tbl) @@ -275,7 +281,9 @@ tbl->it_blocksize = 16; tbl->it_type = TCE_PCI; } +#endif /* CONFIG_PPC_PSERIES_NATIVE */ +#ifdef CONFIG_PPC_PSERIES_LPAR /* * iommu_table_setparms_lpar * @@ -305,7 +313,9 @@ tbl->it_blocksize = 16; tbl->it_type = TCE_PCI; } +#endif +#ifdef CONFIG_PPC_PSERIES_NATIVE static void iommu_bus_setup_pSeries(struct pci_bus *bus) { struct device_node *dn, *pdn; @@ -393,8 +403,9 @@ } } } +#endif /* CONFIG_PPC_PSERIES_NATIVE */ - +#ifdef CONFIG_PPC_PSERIES_LPAR static void iommu_bus_setup_pSeriesLP(struct pci_bus *bus) { struct iommu_table *tbl; @@ -432,7 +443,7 @@ if (pdn != dn) dn->iommu_table = pdn->iommu_table; } - +#endif static void iommu_dev_setup_pSeries(struct pci_dev *dev) { @@ -471,6 +482,7 @@ } if (systemcfg->platform & PLATFORM_LPAR) { +#ifdef CONFIG_PPC_PSERIES_LPAR if (cur_cpu_spec->firmware_features & FW_FEATURE_MULTITCE) { ppc_md.tce_build = tce_buildmulti_pSeriesLP; ppc_md.tce_free = tce_freemulti_pSeriesLP; @@ -479,10 +491,13 @@ ppc_md.tce_free = tce_free_pSeriesLP; } ppc_md.iommu_bus_setup = iommu_bus_setup_pSeriesLP; +#endif /* CONFIG_PPC_PSERIES_LPAR */ } else { +#ifdef CONFIG_PPC_PSERIES_NATIVE ppc_md.tce_build = tce_build_pSeries; ppc_md.tce_free = tce_free_pSeries; ppc_md.iommu_bus_setup = iommu_bus_setup_pSeries; +#endif /* CONFIG_PPC_PSERIES_NATIVE */ } ppc_md.iommu_dev_setup = iommu_dev_setup_pSeries; 
Index: linux-2.6-ppc/arch/ppc64/kernel/pSeries_setup.c =================================================================== --- linux-2.6-ppc.orig/arch/ppc64/kernel/pSeries_setup.c 2005-02-18 16:28:40.532019336 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/pSeries_setup.c 2005-02-18 16:28:53.396923528 -0500 @@ -233,8 +233,10 @@ pSeries_nvram_init(); +#ifdef CONFIG_PPC_PSERIES_LPAR if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) vpa_init(boot_cpuid); +#endif } static int __init pSeries_init_panel(void) @@ -326,7 +328,9 @@ /* Some hardware requires clearing the CPPR, while other hardware does not * it is safe either way */ +#ifdef CONFIG_PPC_PSERIES_LPAR pSeriesLP_cppr_info(0, 0); +#endif rtas_stop_self(); /* Should never get here... */ BUG(); @@ -339,8 +343,6 @@ */ static void __init pSeries_init_early(void) { - void *comport; - int iommu_off = 0; unsigned int default_speed; u64 physport; @@ -348,19 +350,28 @@ fw_feature_init(); - if (systemcfg->platform & PLATFORM_LPAR) + if (systemcfg->platform & PLATFORM_LPAR) { +#ifdef CONFIG_PPC_PSERIES_LPAR hpte_init_lpar(); - else { +#endif + } else { +#ifdef CONFIG_PPC_PSERIES_NATIVE + int iommu_off = 0; hpte_init_native(); iommu_off = (of_chosen && get_property(of_chosen, "linux,iommu-off", NULL)); +#endif } generic_find_legacy_serial_ports(&physport, &default_speed); - if (systemcfg->platform & PLATFORM_LPAR) + if (systemcfg->platform & PLATFORM_LPAR) { +#ifdef CONFIG_PPC_PSERIES_LPAR find_udbg_vterm(); - else if (physport) { +#endif + } else if (physport) { +#ifdef CONFIG_PPC_PSERIES_NATIVE + void *comport; /* Map the uart for udbg. 
*/ comport = (void *)__ioremap(physport, 16, _PAGE_NO_CACHE); udbg_init_uart(comport, default_speed); @@ -369,6 +380,7 @@ ppc_md.udbg_getc = udbg_getc; ppc_md.udbg_getc_poll = udbg_getc_poll; DBG("Hello World !\n"); +#endif } Index: linux-2.6-ppc/arch/ppc64/kernel/pSeries_smp.c =================================================================== --- linux-2.6-ppc.orig/arch/ppc64/kernel/pSeries_smp.c 2005-02-18 16:28:40.536018728 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/pSeries_smp.c 2005-02-18 16:28:53.397923376 -0500 @@ -255,8 +255,10 @@ if (cpu != boot_cpuid) xics_setup_cpu(); +#ifdef CONFIG_PPC_PSERIES_LPAR if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) vpa_init(cpu); +#endif /* * Put the calling processor into the GIQ. This is really only Index: linux-2.6-ppc/arch/ppc64/kernel/sysfs.c =================================================================== --- linux-2.6-ppc.orig/arch/ppc64/kernel/sysfs.c 2005-02-18 16:28:40.534019032 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/sysfs.c 2005-02-18 16:28:53.398923224 -0500 @@ -110,11 +110,10 @@ void ppc64_enable_pmcs(void) { unsigned long hid0; -#ifdef CONFIG_PPC_PSERIES +#ifdef CONFIG_PPC_PSERIES_LPAR unsigned long set, reset; int ret; - unsigned int ctrl; -#endif /* CONFIG_PPC_PSERIES */ +#endif /* CONFIG_PPC_PSERIES_LPAR */ /* Only need to enable them once */ if (__get_cpu_var(pmcs_enabled)) @@ -142,7 +141,7 @@ "memory"); break; -#ifdef CONFIG_PPC_PSERIES +#ifdef CONFIG_PPC_PSERIES_LPAR case PLATFORM_PSERIES_LPAR: set = 1UL << 63; reset = 0; @@ -158,16 +157,18 @@ break; } -#ifdef CONFIG_PPC_PSERIES +#ifdef CONFIG_PPC_PSERIES_LPAR /* instruct hypervisor to maintain PMCs */ if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) get_paca()->lppaca.pmcregs_in_use = 1; - +#endif /* CONFIG_PPC_PSERIES_LPAR */ +#ifdef CONFIG_PPC_PSERIES /* * On SMT machines we have to set the run latch in the ctrl register * in order to make PMC6 spin. 
*/ if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) { + unsigned int ctrl; ctrl = mfspr(CTRLF); ctrl |= RUNLATCH; mtspr(CTRLT, ctrl); Index: linux-2.6-ppc/arch/ppc64/kernel/xics.c =================================================================== --- linux-2.6-ppc.orig/arch/ppc64/kernel/xics.c 2005-02-18 16:28:40.539018272 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/xics.c 2005-02-18 16:28:53.399923072 -0500 @@ -84,8 +84,6 @@ } qirr; }; -static struct xics_ipl __iomem *xics_per_cpu[NR_CPUS]; - static int xics_irq_8259_cascade = 0; static int xics_irq_8259_cascade_real = 0; static unsigned int default_server = 0xFF; @@ -112,7 +110,8 @@ } xics_ops; -/* SMP */ +#ifdef CONFIG_PPC_PSERIES_NATIVE +static struct xics_ipl __iomem *xics_per_cpu[NR_CPUS]; static int pSeries_xirr_info_get(int n_cpu) { @@ -140,11 +139,9 @@ pSeries_cppr_info, pSeries_qirr_info }; +#endif -static xics_ops *ops = &pSeries_ops; - - -/* LPAR */ +#ifdef CONFIG_PPC_PSERIES_LPAR static inline long plpar_eoi(unsigned long xirr) { @@ -213,6 +210,9 @@ pSeriesLP_cppr_info, pSeriesLP_qirr_info }; +#endif + +static xics_ops *ops; static unsigned int xics_startup(unsigned int virq) { @@ -535,8 +535,9 @@ = virt_irq_create_mapping(xics_irq_8259_cascade_real); of_node_put(np); } - +#ifdef CONFIG_PPC_PSERIES_NATIVE if (systemcfg->platform == PLATFORM_PSERIES) { + ops = &pSeries_ops; #ifdef CONFIG_SMP for_each_cpu(i) { int hard_id; @@ -552,10 +553,13 @@ #else xics_per_cpu[0] = ioremap(intr_base, intr_size); #endif /* CONFIG_SMP */ - } else if (systemcfg->platform == PLATFORM_PSERIES_LPAR) { + } +#endif /* CONFIG_PPC_PSERIES_NATIVE */ +#ifdef CONFIG_PPC_PSERIES_LPAR + if (systemcfg->platform == PLATFORM_PSERIES_LPAR) { ops = &pSeriesLP_ops; } - +#endif /* CONFIG_PPC_PSERIES_LPAR */ xics_8259_pic.enable = i8259_pic.enable; xics_8259_pic.disable = i8259_pic.disable; for (i = 0; i < 16; ++i) Index: linux-2.6-ppc/arch/ppc64/mm/hash_utils.c =================================================================== --- 
linux-2.6-ppc.orig/arch/ppc64/mm/hash_utils.c 2005-02-18 16:28:40.541017968 -0500 +++ linux-2.6-ppc/arch/ppc64/mm/hash_utils.c 2005-02-18 16:28:53.400922920 -0500 @@ -116,13 +116,13 @@ hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP); -#ifdef CONFIG_PPC_PSERIES +#ifdef CONFIG_PPC_PSERIES_LPAR if (systemcfg->platform & PLATFORM_LPAR) ret = pSeries_lpar_hpte_insert(hpteg, va, virt_to_abs(addr) >> PAGE_SHIFT, 0, mode, 1, large); else -#endif /* CONFIG_PPC_PSERIES */ +#endif /* CONFIG_PPC_PSERIES_LPAR */ ret = native_hpte_insert(hpteg, va, virt_to_abs(addr) >> PAGE_SHIFT, 0, mode, 1, large); Index: linux-2.6-ppc/arch/ppc64/xmon/xmon.c =================================================================== --- linux-2.6-ppc.orig/arch/ppc64/xmon/xmon.c 2005-02-18 16:28:53.353930064 -0500 +++ linux-2.6-ppc/arch/ppc64/xmon/xmon.c 2005-02-18 16:28:53.402922616 -0500 @@ -629,11 +629,13 @@ (data address breakpoint register) directly. */ static void set_controlled_dabr(unsigned long val) { +#ifdef CONFIG_PPC_PSERIES_LPAR if (systemcfg->platform == PLATFORM_PSERIES_LPAR) { int rc = plpar_hcall_norets(H_SET_DABR, val); if (rc != H_Success) xmon_printf("Warning: setting DABR failed (%d)\n", rc); } else +#endif /* CONFIG_PPC_PSERIES_LPAR */ set_dabr(val); } Index: linux-2.6-ppc/drivers/char/Kconfig =================================================================== --- linux-2.6-ppc.orig/drivers/char/Kconfig 2005-02-18 16:28:40.545017360 -0500 +++ linux-2.6-ppc/drivers/char/Kconfig 2005-02-18 16:28:53.404922312 -0500 @@ -557,7 +557,7 @@ config HVC_CONSOLE bool "pSeries Hypervisor Virtual Console support" - depends on PPC_PSERIES + depends on PPC_PSERIES_LPAR help pSeries machines when partitioned support a hypervisor virtual console. 
This driver allows each pSeries partition to have a console
@@ -565,7 +565,7 @@
 
 config HVCS
 	tristate "IBM Hypervisor Virtual Console Server support"
-	depends on PPC_PSERIES
+	depends on PPC_PSERIES_LPAR
 	help
 	  Partitionable IBM Power5 ppc64 machines allow hosting of
 	  firmware virtual consoles from one Linux partition by
Index: linux-2.6-ppc/drivers/net/Kconfig
===================================================================
--- linux-2.6-ppc.orig/drivers/net/Kconfig 2005-02-18 16:28:40.547017056 -0500
+++ linux-2.6-ppc/drivers/net/Kconfig 2005-02-18 16:28:53.407921856 -0500
@@ -1171,7 +1171,7 @@
 
 config IBMVETH
 	tristate "IBM LAN Virtual Ethernet support"
-	depends on NETDEVICES && NET_ETHERNET && PPC_PSERIES
+	depends on NETDEVICES && NET_ETHERNET && PPC_PSERIES_LPAR
 	---help---
 	  This driver supports virtual ethernet adapters on newer IBM iSeries
 	  and pSeries systems.
Index: linux-2.6-ppc/drivers/scsi/Kconfig
===================================================================
--- linux-2.6-ppc.orig/drivers/scsi/Kconfig 2005-02-18 16:28:40.550016600 -0500
+++ linux-2.6-ppc/drivers/scsi/Kconfig 2005-02-18 16:28:53.409921552 -0500
@@ -798,7 +798,7 @@
 
 config SCSI_IBMVSCSI
 	tristate "IBM Virtual SCSI support"
-	depends on PPC_PSERIES || PPC_ISERIES
+	depends on PPC_PSERIES_LPAR || PPC_ISERIES
 	help
 	  This is the IBM POWER Virtual SCSI Client

From benh at kernel.crashing.org Wed Feb 23 07:43:33 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 23 Feb 2005 07:43:33 +1100
Subject: pSeries_request_regions()
In-Reply-To: <1109089456.21332.6.camel@sinatra.austin.ibm.com>
References: <1109035294.7706.1.camel@sinatra.austin.ibm.com>
	<20050222061438.GA5618@krispykreme.ozlabs.ibm.com>
	<1109055924.5327.102.camel@gaston>
	<1109089456.21332.6.camel@sinatra.austin.ibm.com>
Message-ID: <1109105013.5326.118.camel@gaston>

On Tue, 2005-02-22 at 10:24 -0600, John Rose wrote:
> > The reason we request them in the first place is to prevent drivers from
> > mucking around with those bits of hardware no ? (so they fail their own
> > request_region).
> >
> > I recently added a platform hook to deal with that in a slightly more
> > suitable way, though not all legacy drivers have been fixed yet to use
> > it.
>
> For the case of dynamically adding a PHB to a system that booted with no
> physical I/O, I need to pick a starting virtual address for the I/O
> region. I could hard-code to start past these regions, but I'm not sure
> it makes sense for these to exist in the first place. Would legacy
> drivers even come into play on a purely virtual system?

Well, if you are adding a PHB, it's no longer purely virtual... you
might well have something like a legacy serial card or whatever on this
PHB no ? (though most of these can now properly map their BARs above
0x1000 anyway) or a VGA card ...

Ben.
From johnrose at austin.ibm.com Wed Feb 23 07:55:41 2005
From: johnrose at austin.ibm.com (John Rose)
Date: Tue, 22 Feb 2005 14:55:41 -0600
Subject: pSeries_request_regions()
In-Reply-To: <1109105013.5326.118.camel@gaston>
References: <1109035294.7706.1.camel@sinatra.austin.ibm.com>
	<20050222061438.GA5618@krispykreme.ozlabs.ibm.com>
	<1109055924.5327.102.camel@gaston>
	<1109089456.21332.6.camel@sinatra.austin.ibm.com>
	<1109105013.5326.118.camel@gaston>
Message-ID: <1109105741.21332.23.camel@sinatra.austin.ibm.com>

On Tue, 2005-02-22 at 14:43, Benjamin Herrenschmidt wrote:
> Well, if you are adding a PHB, it's no longer purely virtual... you
> might well have something like a legacy serial card or whatever on this
> PHB no ? (though most of these can now properly map their BARs above
> 0x1000 anyway) or a VGA card ...

For the machines that support PHB DLPAR (Power5), ISA is not supported.
So should I hard-code to allow for these I/O ports, or are you against
removing these reservations for the case of no ISA?

Thanks-
John

From benh at kernel.crashing.org Wed Feb 23 08:04:34 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 23 Feb 2005 08:04:34 +1100
Subject: iMac G5
In-Reply-To: <610e3466050222015016158fb5@mail.gmail.com>
References: <610e3466050222015016158fb5@mail.gmail.com>
Message-ID: <1109106275.5411.135.camel@gaston>

On Tue, 2005-02-22 at 05:50 -0400, Chris L. Mason wrote:
> Hi,
>
> Does anyone know if the iMac G5 stuff has been merged into the main
> kernel, or if not,
> if there are plans to do so?

Not yet, and yes, I will merge in after 2.6.11 is released.

Ben.

From clmason at gmail.com Wed Feb 23 08:27:42 2005
From: clmason at gmail.com (Chris L. Mason)
Date: Tue, 22 Feb 2005 17:27:42 -0400
Subject: iMac G5
In-Reply-To: <1109106275.5411.135.camel@gaston>
References: <610e3466050222015016158fb5@mail.gmail.com>
	<1109106275.5411.135.camel@gaston>
Message-ID: <610e3466050222132751ca9095@mail.gmail.com>

On Wed, 23 Feb 2005 08:04:34 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2005-02-22 at 05:50 -0400, Chris L. Mason wrote:
> > Hi,
> >
> > Does anyone know if the iMac G5 stuff has been merged into the main
> > kernel, or if not,
> > if there are plans to do so?
>
> Not yet, and yes, I will merge in after 2.6.11 is released.
>

Great, thanks!

Chris

From linas at austin.ibm.com Wed Feb 23 11:08:10 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Tue, 22 Feb 2005 18:08:10 -0600
Subject: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)
Message-ID: <20050223000810.GA32744@austin.ibm.com>

Hi Ben, Paul, Brian,

The attached prototype code will recover from EEH errors that would
normally take out the root filesystem SCSI volume. The patch adds
some simple hooks into the IPR SCSI device driver to accomplish this.

This code falls back to the old/original design points, the basic
idea being:

-- A device driver can register some callbacks to get notified of
   various points in the EEH recovery procedure. See struct
   eeh_recovery_ops in include/asm-ppc64/eeh.h

-- A "master" recovery routine steps through the EEH recovery
   steps, notifying the device driver of the stages. The reason
   for a "master" routine is to handle multi-function adapters
   (although the prototype doesn't yet handle multi-function).

-- If a device driver has not registered any callbacks, then
   the "master" routine hot-unplugs/replugs the device driver.

The code is "prototype"; there are things that are broken in there,
typically marked with XXX.

Ben, this is as close as I could get to the email you sent me yesterday.
My goal here was to go as generic as possible, so that the general
shape of "struct eeh_recovery_ops" matches what one might expect to get
from a generic PCI-Express recovery design.

Brian, can you review the IPR portion of the patch and provide
comments or fixes?

This applies to a circa-January BK tree.

--linas

-------------- next part --------------
===== arch/ppc64/kernel/eeh.c 1.41 vs edited =====
--- 1.41/arch/ppc64/kernel/eeh.c 2005-01-06 13:05:42 -06:00
+++ edited/arch/ppc64/kernel/eeh.c 2005-02-22 17:27:36 -06:00
@@ -17,21 +17,19 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
  */
 
-#include
+#include
 #include
 #include
-#include
 #include
 #include
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
 #include
-#include
 #include "pci.h"
 
 #undef DEBUG
@@ -88,8 +86,7 @@ static struct notifier_block *eeh_notifi
  * is broken and panic. This sets the threshold for how many read
  * attempts we allow before panicking.
  */
-#define EEH_MAX_FAILS 1000
-static atomic_t eeh_fail_count;
+#define EEH_MAX_FAILS 100000
 
 /* RTAS tokens */
 static int ibm_set_eeh_option;
@@ -106,6 +103,10 @@ static spinlock_t slot_errbuf_lock = SPI
 static int eeh_error_buf_size;
 
 /* System monitoring statistics */
+static DEFINE_PER_CPU(unsigned long, no_device);
+static DEFINE_PER_CPU(unsigned long, no_dn);
+static DEFINE_PER_CPU(unsigned long, no_cfg_addr);
+static DEFINE_PER_CPU(unsigned long, ignored_check);
 static DEFINE_PER_CPU(unsigned long, total_mmio_ffs);
 static DEFINE_PER_CPU(unsigned long, false_positives);
 static DEFINE_PER_CPU(unsigned long, ignored_failures);
@@ -224,9 +225,9 @@ pci_addr_cache_insert(struct pci_dev *de
 	while (*p) {
 		parent = *p;
 		piar = rb_entry(parent, struct pci_io_addr_range, rb_node);
-		if (alo < piar->addr_lo) {
+		if (ahi < piar->addr_lo) {
 			p = &parent->rb_left;
-		} else if (ahi > piar->addr_hi) {
+		} else if (alo > piar->addr_hi) {
 			p = &parent->rb_right;
 		} else {
 			if (dev != piar->pcidev ||
@@ -244,6 +245,11 @@
pci_addr_cache_insert(struct pci_dev *de piar->addr_hi = ahi; piar->pcidev = dev; piar->flags = flags; + +#ifdef DEBUG + printk (KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", + alo, ahi, pci_name (dev)); +#endif rb_link_node(&piar->rb_node, parent, p); rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); @@ -368,6 +374,7 @@ void pci_addr_cache_remove_device(struct */ void __init pci_addr_cache_build(void) { + struct device_node *dn; struct pci_dev *dev = NULL; spin_lock_init(&pci_io_addr_cache_root.piar_lock); @@ -378,6 +385,17 @@ void __init pci_addr_cache_build(void) continue; } pci_addr_cache_insert_device(dev); + + /* Save the BAR's; firmware doesn't restore these after EEH reset */ + dn = pci_device_to_OF_node(dev); + if (dn) { + int i; + for (i = 0; i < 16; i++) + pci_read_config_dword(dev, i * 4, &dn->config_space[i]); + + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) + dn->eeh_is_bridge = 1; + } } #ifdef DEBUG @@ -389,6 +407,32 @@ void __init pci_addr_cache_build(void) /* --------------------------------------------------------------- */ /* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */ +void eeh_slot_error_detail (struct device_node *dn, int severity) +{ + unsigned long flags; + int rc; + + if (!dn) return; + + /* Log the error with the rtas logger */ + spin_lock_irqsave(&slot_errbuf_lock, flags); + memset(slot_errbuf, 0, eeh_error_buf_size); + + rc = rtas_call(ibm_slot_error_detail, + 8, 1, NULL, dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid), NULL, 0, + virt_to_phys(slot_errbuf), + eeh_error_buf_size, + severity); + + if (rc == 0) + log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); + spin_unlock_irqrestore(&slot_errbuf_lock, flags); +} + +EXPORT_SYMBOL(eeh_slot_error_detail); + /** * eeh_register_notifier - Register to find out about EEH events. 
* @nb: notifier block to callback on events @@ -421,10 +465,11 @@ static int read_slot_reset_state(struct outputs = 4; } else { token = ibm_read_slot_reset_state; + rets[2] = 0; /* fake PE Unavailable info */ outputs = 3; } - return rtas_call(token, 3, outputs, rets, dn->eeh_config_addr, + return rtas_call(token, 3, outputs, rets, dn->eeh_config_addr, BUID_HI(dn->phb->buid), BUID_LO(dn->phb->buid)); } @@ -480,15 +525,15 @@ static void eeh_event_handler(void *dumm if (event == NULL) break; - printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device " - "%s %s\n", event->reset_state, - pci_name(event->dev), pci_pretty_name(event->dev)); - - atomic_set(&eeh_fail_count, 0); - notifier_call_chain (&eeh_notifier_chain, - EEH_NOTIFY_FREEZE, event); + if (event->reset_state != 5) { + printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device " + "%s %s\n", event->reset_state, + pci_name(event->dev), pci_pretty_name(event->dev)); + } __get_cpu_var(slot_resets)++; + notifier_call_chain (&eeh_notifier_chain, + EEH_NOTIFY_FREEZE, event); pci_dev_put(event->dev); kfree(event); @@ -496,8 +541,8 @@ static void eeh_event_handler(void *dumm } /** - * eeh_token_to_phys - convert EEH address token to phys address - * @token i/o token, should be address in the form 0xE.... + * eeh_token_to_phys - convert I/O address to phys address + * @token i/o address, should be address in the form 0xA.... */ static inline unsigned long eeh_token_to_phys(unsigned long token) { @@ -532,7 +577,6 @@ int eeh_dn_check_failure(struct device_n int ret; int rets[3]; unsigned long flags; - int rc, reset_state; struct eeh_event *event; __get_cpu_var(total_mmio_ffs)++; @@ -540,16 +584,20 @@ int eeh_dn_check_failure(struct device_n if (!eeh_subsystem_enabled) return 0; - if (!dn) + if (!dn) { + __get_cpu_var(no_dn)++; return 0; + } /* Access to IO BARs might get this far and still not want checking. 
*/ if (!(dn->eeh_mode & EEH_MODE_SUPPORTED) || dn->eeh_mode & EEH_MODE_NOCHECK) { + __get_cpu_var(ignored_check)++; return 0; } if (!dn->eeh_config_addr) { + __get_cpu_var(no_cfg_addr)++; return 0; } @@ -558,8 +606,11 @@ int eeh_dn_check_failure(struct device_n * slot, we know it's bad already, we don't need to check... */ if (dn->eeh_mode & EEH_MODE_ISOLATED) { - atomic_inc(&eeh_fail_count); - if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) { + dn->eeh_check_count ++; + if (dn->eeh_check_count >= EEH_MAX_FAILS) { + printk (KERN_ERR "EEH: Driver ignored %d bad reads, panicing\n", + dn->eeh_check_count); + dump_stack(); /* re-read the slot reset state */ if (read_slot_reset_state(dn, rets) != 0) rets[0] = -1; /* reset state unknown */ @@ -576,42 +627,25 @@ int eeh_dn_check_failure(struct device_n * In any case they must share a common PHB. */ ret = read_slot_reset_state(dn, rets); - if (!(ret == 0 && rets[1] == 1 && (rets[0] == 2 || rets[0] == 4))) { + if (!(ret == 0 && ((rets[1] == 1 && (rets[0] == 2 || rets[0] >= 4)) + || (rets[0] == 5)))) { __get_cpu_var(false_positives)++; return 0; } - /* prevent repeated reports of this failure */ + /* Prevent repeated reports of this failure */ dn->eeh_mode |= EEH_MODE_ISOLATED; - reset_state = rets[0]; - - spin_lock_irqsave(&slot_errbuf_lock, flags); - memset(slot_errbuf, 0, eeh_error_buf_size); - - rc = rtas_call(ibm_slot_error_detail, - 8, 1, NULL, dn->eeh_config_addr, - BUID_HI(dn->phb->buid), - BUID_LO(dn->phb->buid), NULL, 0, - virt_to_phys(slot_errbuf), - eeh_error_buf_size, - 1 /* Temporary Error */); - - if (rc == 0) - log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); - spin_unlock_irqrestore(&slot_errbuf_lock, flags); - - printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n", - rets[0], dn->name, dn->full_name); event = kmalloc(sizeof(*event), GFP_ATOMIC); if (event == NULL) { - eeh_panic(dev, reset_state); + printk (KERN_ERR "EEH: out of memory, event not handled\n"); return 1; } event->dev = dev; event->dn = 
dn; - event->reset_state = reset_state; + event->reset_state = rets[0]; + event->time_unavail = rets[2]; /* We may or may not be called in an interrupt context */ spin_lock_irqsave(&eeh_eventlist_lock, flags); @@ -621,7 +655,7 @@ int eeh_dn_check_failure(struct device_n /* Most EEH events are due to device driver bugs. Having * a stack trace will help the device-driver authors figure * out what happened. So print that out. */ - dump_stack(); + if (rets[0] != 5) dump_stack(); schedule_work(&eeh_event_wq); return 0; @@ -634,7 +668,6 @@ EXPORT_SYMBOL(eeh_dn_check_failure); * @token i/o token, should be address in the form 0xA.... * @val value, should be all 1's (XXX why do we need this arg??) * - * Check for an eeh failure at the given token address. * Check for an EEH failure at the given token address. Call this * routine if the result of a read was all 0xff's and you want to * find out if this is due to an EEH slot freeze event. This routine @@ -642,6 +675,7 @@ EXPORT_SYMBOL(eeh_dn_check_failure); * * Note this routine is safe to call in an interrupt context. */ + unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val) { unsigned long addr; @@ -651,8 +685,10 @@ unsigned long eeh_check_failure(const vo /* Finding the phys addr + pci device; this is pretty quick. 
*/ addr = eeh_token_to_phys((unsigned long __force) token); dev = pci_get_device_by_addr(addr); - if (!dev) + if (!dev) { + __get_cpu_var(no_device)++; return val; + } dn = pci_device_to_OF_node(dev); eeh_dn_check_failure (dn, dev); @@ -663,6 +699,218 @@ unsigned long eeh_check_failure(const vo EXPORT_SYMBOL(eeh_check_failure); +/* ------------------------------------------------------------- */ +/* The code below deals with error recovery */ + +int +eeh_slot_is_isolated(struct pci_dev *dev) +{ + struct device_node *dn; + dn = pci_device_to_OF_node(dev); + return (dn->eeh_mode & EEH_MODE_ISOLATED); +} + +/** rtas_pci_slot_reset raises/lowers the pci #RST line + * state: 1/0 to raise/lower the #RST + */ +void +eeh_pci_slot_reset(struct pci_dev *dev, int state) +{ + struct device_node *dn = pci_device_to_OF_node(dev); + rtas_pci_slot_reset (dn, state); +} + +/* return negative value if a permanent error, else return + * a number of milliseconds to wait until the PCI slot is + * ready to be used. 
+ */ +static int +eeh_slot_availability(struct device_node *dn) +{ + int rc; + int rets[3]; + + rc = read_slot_reset_state(dn, rets); + if (rc) return rc; + + if (rets[1] == 0) return -1; /* EEH is not supported */ + if (rets[0] == 0) return 0; /* Oll Korrect */ + if (rets[0] == 5) { + if (rets[2] == 0) return -1; /* permanently unavailable */ + return rets[2]; /* number of millisecs to wait */ + } + return -1; +} + +int +eeh_pci_slot_availability(struct pci_dev *dev) +{ + struct device_node *dn = pci_device_to_OF_node(dev); + if (!dn) return -1; + return eeh_slot_availability (dn); +} + +void +rtas_pci_slot_reset(struct device_node *dn, int state) +{ + int rc; + + if (!dn) + return; + + dn->eeh_mode |= EEH_MODE_RECOVERING; + rc = rtas_call(ibm_set_slot_reset,4,1, NULL, + dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid), + state); + if (rc) { + printk (KERN_WARNING "EEH: Unable to reset the failed slot, (%d) #RST=%d\n", rc, state); + return; + } + + if (state == 0) + dn->eeh_mode &= ~(EEH_MODE_RECOVERING|EEH_MODE_ISOLATED); +} + +/** rtas_set_slot_reset -- assert the pci #RST line for 1/4 second + * dn -- device node to be reset. + */ + +void +rtas_set_slot_reset(struct device_node *dn) +{ + int i, rc; + + rtas_pci_slot_reset (dn, 1); + + /* The PCI bus requires that the reset be held high for at least + * a 100 milliseconds. We wait a bit longer 'just in case'. */ + +#define PCI_BUS_RST_HOLD_TIME_MSEC 250 + msleep (PCI_BUS_RST_HOLD_TIME_MSEC); + rtas_pci_slot_reset (dn, 0); + + /* After a PCI slot has been reset, the PCI Express spec requires + * a 1.5 second idle time for the bus to stabilize, before starting + * up traffic. */ +#define PCI_BUS_SETTLE_TIME_MSEC 1800 + msleep (PCI_BUS_SETTLE_TIME_MSEC); + + /* Now double check with the firmware to make sure the device is + * ready to be used; if not, wait for recovery. 
*/ + for (i=0; i<10; i++) { + rc = eeh_slot_availability (dn); + if (rc <= 0) return; + + msleep (rc+100); + } +} + +EXPORT_SYMBOL(rtas_set_slot_reset); + +void +rtas_configure_bridge(struct device_node *dn) +{ + int token = rtas_token ("ibm,configure-bridge"); + int rc; + + if (token == RTAS_UNKNOWN_SERVICE) + return; + rc = rtas_call(token,3,1, NULL, + dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid)); + if (rc) { + printk (KERN_WARNING "EEH: Unable to configure device bridge\n"); + } +} + +EXPORT_SYMBOL(rtas_configure_bridge); + +/* ------------------------------------------------------- */ +/* EEH Error Recovery registeration */ + +void eeh_register_recovery_ops (struct pci_dev *dev, + struct eeh_recovery_ops *ops) +{ + struct device_node *dn; + dn = pci_device_to_OF_node(dev); + dn->eeh_ops = ops; +} + +/* ------------------------------------------------------- */ +/** Save and restore of PCI BARs + * + * Although firmware will set up BARs during boot, it doesn't + * set up device BAR's after a device reset, although it will, + * if requested, set up bridge configuration. Thus, we need to + * configure the PCI devices ourselves. Config-space setup is + * stored in the PCI structures which are normally deleted during + * device removal. Thus, the "save" routine references the + * structures so that they aren't deleted. + */ + +/** + * __restore_bars - Restore the Base Address Registers + * Loads the PCI configuration space base address registers, + * the expansion ROM base address, the latency timer, and etc. + * from the saved values in the device node. 
+ */ +static inline void __restore_bars (struct device_node *dn) +{ + int i; + + if (NULL==dn->phb) return; + for (i=4; i<10; i++) { + rtas_write_config(dn, i*4, 4, dn->config_space[i]); + } + + /* 12 == Expansion ROM Address */ + rtas_write_config(dn, 12*4, 4, dn->config_space[12]); + +#define SAVED_BYTE(OFF) (((u8 *)(dn->config_space))[OFF]) + + rtas_write_config (dn, PCI_CACHE_LINE_SIZE, 1, + SAVED_BYTE(PCI_CACHE_LINE_SIZE)); + + rtas_write_config (dn, PCI_LATENCY_TIMER, 1, + SAVED_BYTE(PCI_LATENCY_TIMER)); + + rtas_write_config (dn, PCI_INTERRUPT_LINE, 1, + SAVED_BYTE(PCI_INTERRUPT_LINE)); +} + +/** + * eeh_restore_bars - restore the PCI config space info + */ +void eeh_restore_bars(struct device_node *dn) +{ + if (! dn->eeh_is_bridge) + __restore_bars (dn); + + if (dn->child) + eeh_restore_bars (dn->child); +#if DO_SIBLINGS + if (dn->sibling) + eeh_restore_bars (dn->sibling); +#endif +} + +void eeh_pci_restore_bars(struct pci_dev *dev) +{ + struct device_node *dn = pci_device_to_OF_node(dev); + eeh_restore_bars (dn); +} + +/* ------------------------------------------------------------- */ +/* The code below deals with enabling EEH for devices during the + * early boot sequence. EEH must be enabled before any PCI probing + * can be done. 
+ */ + +#define EEH_ENABLE 1 + struct eeh_early_enable_info { unsigned int buid_hi; unsigned int buid_lo; @@ -742,7 +990,7 @@ static void *early_enable_eeh(struct dev dn->full_name); } - return NULL; + return NULL; } /* @@ -829,7 +1077,9 @@ void eeh_add_device_early(struct device_ return; phb = dn->phb; if (NULL == phb || 0 == phb->buid) { - printk(KERN_WARNING "EEH: Expected buid but found none\n"); + printk(KERN_WARNING "EEH: Expected buid but found none for %s\n", + dn->full_name); + dump_stack(); return; } @@ -848,6 +1098,9 @@ EXPORT_SYMBOL(eeh_add_device_early); */ void eeh_add_device_late(struct pci_dev *dev) { + int i; + struct device_node *dn; + if (!dev || !eeh_subsystem_enabled) return; @@ -857,6 +1110,14 @@ void eeh_add_device_late(struct pci_dev #endif pci_addr_cache_insert_device (dev); + + /* Save the BAR's; firmware doesn't restore these after EEH reset */ + dn = pci_device_to_OF_node(dev); + for (i = 0; i < 16; i++) + pci_read_config_dword(dev, i * 4, &dn->config_space[i]); + + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) + dn->eeh_is_bridge = 1; } EXPORT_SYMBOL(eeh_add_device_late); @@ -886,12 +1147,17 @@ static int proc_eeh_show(struct seq_file unsigned int cpu; unsigned long ffs = 0, positives = 0, failures = 0; unsigned long resets = 0; + unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0; for_each_cpu(cpu) { ffs += per_cpu(total_mmio_ffs, cpu); positives += per_cpu(false_positives, cpu); failures += per_cpu(ignored_failures, cpu); resets += per_cpu(slot_resets, cpu); + no_dev += per_cpu(no_device, cpu); + no_dn += per_cpu(no_dn, cpu); + no_cfg += per_cpu(no_cfg_addr, cpu); + no_check += per_cpu(ignored_check, cpu); } if (0 == eeh_subsystem_enabled) { @@ -899,13 +1165,17 @@ static int proc_eeh_show(struct seq_file seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs); } else { seq_printf(m, "EEH Subsystem is enabled\n"); - seq_printf(m, "eeh_total_mmio_ffs=%ld\n" + seq_printf(m, + "no device=%ld\n" + "no device node=%ld\n" + "no config 
address=%ld\n" + "check not wanted=%ld\n" + "eeh_total_mmio_ffs=%ld\n" "eeh_false_positives=%ld\n" "eeh_ignored_failures=%ld\n" - "eeh_slot_resets=%ld\n" - "eeh_fail_count=%d\n", - ffs, positives, failures, resets, - eeh_fail_count.counter); + "eeh_slot_resets=%ld\n", + no_dev, no_dn, no_cfg, no_check, + ffs, positives, failures, resets); } return 0; ===== arch/ppc64/kernel/pSeries_pci.c 1.59 vs edited ===== --- 1.59/arch/ppc64/kernel/pSeries_pci.c 2004-11-15 21:29:10 -06:00 +++ edited/arch/ppc64/kernel/pSeries_pci.c 2005-01-20 17:25:37 -06:00 @@ -102,7 +102,7 @@ static int rtas_pci_read_config(struct p return PCIBIOS_DEVICE_NOT_FOUND; } -static int rtas_write_config(struct device_node *dn, int where, int size, u32 val) +int rtas_write_config(struct device_node *dn, int where, int size, u32 val) { unsigned long buid, addr; int ret; ===== drivers/pci/hotplug/rpaphp.h 1.11 vs edited ===== --- 1.11/drivers/pci/hotplug/rpaphp.h 2004-10-06 11:43:44 -05:00 +++ edited/drivers/pci/hotplug/rpaphp.h 2005-01-20 17:25:37 -06:00 @@ -125,7 +125,8 @@ extern int rpaphp_enable_pci_slot(struct extern int register_pci_slot(struct slot *slot); extern int rpaphp_unconfig_pci_adapter(struct slot *slot); extern int rpaphp_get_pci_adapter_status(struct slot *slot, int is_init, u8 * value); -extern struct hotplug_slot *rpaphp_find_hotplug_slot(struct pci_dev *dev); +extern void init_eeh_handler (void); +extern void exit_eeh_handler (void); /* rpaphp_core.c */ extern int rpaphp_add_slot(struct device_node *dn); ===== drivers/pci/hotplug/rpaphp_core.c 1.18 vs edited ===== --- 1.18/drivers/pci/hotplug/rpaphp_core.c 2004-10-06 11:43:44 -05:00 +++ edited/drivers/pci/hotplug/rpaphp_core.c 2005-01-20 17:25:37 -06:00 @@ -443,12 +443,18 @@ static int __init rpaphp_init(void) { info(DRIVER_DESC " version: " DRIVER_VERSION "\n"); + /* Get set to handle EEH events. 
*/ + init_eeh_handler(); + /* read all the PRA info from the system */ return init_rpa(); } static void __exit rpaphp_exit(void) { + /* Let EEH know we are going away. */ + exit_eeh_handler(); + cleanup_slots(); } ===== drivers/pci/hotplug/rpaphp_pci.c 1.17 vs edited ===== --- 1.17/drivers/pci/hotplug/rpaphp_pci.c 2004-11-18 02:36:18 -06:00 +++ edited/drivers/pci/hotplug/rpaphp_pci.c 2005-02-22 17:25:07 -06:00 @@ -22,8 +22,12 @@ * Send feedback to * */ +#include +#include #include +#include #include +#include #include #include "../pci.h" /* for pci_add_new_bus */ @@ -62,6 +66,7 @@ int rpaphp_claim_resource(struct pci_dev root ? "Address space collision on" : "No parent found for", resource, dtype, pci_name(dev), res->start, res->end); + dump_stack(); } return err; } @@ -184,6 +189,19 @@ rpaphp_fixup_new_pci_devices(struct pci_ static int rpaphp_pci_config_bridge(struct pci_dev *dev); +static void rpaphp_eeh_add_bus_device(struct pci_bus *bus) +{ + struct pci_dev *dev; + list_for_each_entry(dev, &bus->devices, bus_list) { + eeh_add_device_late(dev); + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { + struct pci_bus *subbus = dev->subordinate; + if (bus) + rpaphp_eeh_add_bus_device (subbus); + } + } +} + /***************************************************************************** rpaphp_pci_config_slot() will configure all devices under the given slot->dn and return the the first pci_dev. 
@@ -211,6 +229,8 @@ rpaphp_pci_config_slot(struct device_nod } if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) rpaphp_pci_config_bridge(dev); + + rpaphp_eeh_add_bus_device(bus); } return dev; } @@ -219,7 +239,6 @@ static int rpaphp_pci_config_bridge(stru { u8 sec_busno; struct pci_bus *child_bus; - struct pci_dev *child_dev; dbg("Enter %s: BRIDGE dev=%s\n", __FUNCTION__, pci_name(dev)); @@ -236,11 +255,7 @@ static int rpaphp_pci_config_bridge(stru /* do pci_scan_child_bus */ pci_scan_child_bus(child_bus); - list_for_each_entry(child_dev, &child_bus->devices, bus_list) { - eeh_add_device_late(child_dev); - } - - /* fixup new pci devices without touching bus struct */ + /* Fixup new pci devices without touching bus struct */ rpaphp_fixup_new_pci_devices(child_bus, 0); /* Make the discovered devices available */ @@ -278,7 +293,7 @@ static void print_slot_pci_funcs(struct return; } #else -static void print_slot_pci_funcs(struct slot *slot) +static inline void print_slot_pci_funcs(struct slot *slot) { return; } @@ -360,7 +375,6 @@ static void rpaphp_eeh_remove_bus_device if (pdev) rpaphp_eeh_remove_bus_device(pdev); } - } return; } @@ -562,36 +576,266 @@ exit: return retval; } -struct hotplug_slot *rpaphp_find_hotplug_slot(struct pci_dev *dev) +/** + * rpaphp_search_bus_for_dev - return 1 if device is under this bus, else 0 + * @bus: the bus to search for this device. + * @dev: the pci device we are looking for. + */ +static int rpaphp_search_bus_for_dev (struct pci_bus *bus, struct pci_dev *dev) +{ + struct list_head *ln; + + if (!bus) return 0; + + for (ln = bus->devices.next; ln != &bus->devices; ln = ln->next) { + struct pci_dev *pdev = pci_dev_b(ln); + if (pdev == dev) + return 1; + if (pdev->subordinate) { + int rc; + rc = rpaphp_search_bus_for_dev (pdev->subordinate, dev); + if (rc) + return 1; + } + } + return 0; +} + +/** + * rpaphp_find_slot - find and return the slot holding the device + * @dev: pci device for which we want the slot structure. 
+ */ +static struct slot *rpaphp_find_slot(struct pci_dev *dev) { - struct list_head *tmp, *n; - struct slot *slot; + struct list_head *tmp, *n; + struct slot *slot; list_for_each_safe(tmp, n, &rpaphp_slot_head) { struct pci_bus *bus; - struct list_head *ln; slot = list_entry(tmp, struct slot, rpaphp_slot_list); - if (slot->bridge == NULL) { - if (slot->dev_type == PCI_DEV) { - printk(KERN_WARNING "PCI slot missing bridge %s %s \n", - slot->name, slot->location); - } + + /* PHB's don't have bridges. */ + if (slot->bridge == NULL) continue; - } + + /* The PCI device could be the slot itself. */ + if (slot->bridge == dev) + return slot; bus = slot->bridge->subordinate; if (!bus) { + printk (KERN_WARNING "PCI bridge is missing bus: %s %s\n", + pci_name (slot->bridge), pci_pretty_name (slot->bridge)); continue; /* should never happen? */ } - for (ln = bus->devices.next; ln != &bus->devices; ln = ln->next) { - struct pci_dev *pdev = pci_dev_b(ln); - if (pdev == dev) - return slot->hotplug_slot; - } + + if (rpaphp_search_bus_for_dev (bus, dev)) + return slot; + } + return NULL; +} + +/* ------------------------------------------------------- */ +/** + * handle_eeh_events -- reset a PCI device after hard lockup. + * + * pSeries systems will isolate a PCI slot if the PCI-Host + * bridge detects address or data parity errors, DMA's + * occuring to wild addresses (which usually happen due to + * bugs in device drivers or in PCI adapter firmware). + * Slot isolations also occur if #SERR, #PERR or other misc + * PCI-related errors are detected. + * + * Recovery process consists of unplugging the device driver + * (which generated hotplug events to userspace), then issuing + * a PCI #RST to the device, then reconfiguring the PCI config + * space for all bridges & devices under this slot, and then + * finally restarting the device drivers (which cause a second + * set of hotplug events to go out to userspace). 
+ */ + +extern void rtas_set_eeh_option(struct device_node *dn, int state); + +int eeh_reset_device (struct pci_dev *dev, int reconfig) +{ + struct slot *frozen_slot; + + if (!dev) + return 1; + + frozen_slot = rpaphp_find_slot(dev); + if (!frozen_slot) + { + printk (KERN_ERR "EEH: Cannot find PCI slot for %s %s\n", + pci_name(dev), pci_pretty_name (dev)); + return 1; } + if (reconfig) rpaphp_unconfig_pci_adapter (frozen_slot); + + /* Reset the pci controller. (Asserts RST#; resets config space). + * Reconfigure bridges and devices */ + rtas_set_slot_reset (frozen_slot->dn->child); + rtas_configure_bridge(frozen_slot->dn); + eeh_restore_bars(frozen_slot->dn->child); + + /* Give the system 5 seconds to finish running the user-space + * hotplug scripts, e.g. ifdown for ethernet. Yes, this is a hack, + * but if we don't do this, weird things happen. + */ + if (reconfig) { + ssleep (5); + rpaphp_enable_pci_slot (frozen_slot); + } + return 0; +} + +static inline struct pci_dev * eeh_get_pci_dev(struct device_node *dn) +{ + struct pci_dev *dev = NULL; + for_each_pci_dev(dev) { + if (pci_device_to_OF_node(dev) == dn) + return dev; + } return NULL; } -EXPORT_SYMBOL_GPL(rpaphp_find_hotplug_slot); + +/* The longest amount of time to wait for a pci device + * to come back on line, in seconds. 
+ */ +#define MAX_WAIT_FOR_RECOVERY 15 + +int handle_eeh_events (struct notifier_block *self, + unsigned long reason, void *ev) +{ + int freeze_count=0; + struct slot *frozen_slot; + struct device_node *frozen_device; + struct eeh_event *event = ev; + struct pci_dev *dev = event->dev; + int perm_failure = 0; + int rc; + + if (!dev) + dev = eeh_get_pci_dev (event->dn); + + if (!dev) + { + if (event->dn) + printk ("EEH: Cannot find the PCI device for dn %s\n", + event->dn->full_name); + else + printk ("EEH: EEH error caught, but no PCI device specified!\n"); + return 1; + } + + frozen_slot = rpaphp_find_slot(dev); + if (!frozen_slot) + { + printk (KERN_ERR "EEH: Cannot find PCI slot for %s %s\n", + pci_name(dev), pci_pretty_name (dev)); + return 1; + } + frozen_device = frozen_slot->dn->child; + + /* We get "permanent failure" messages on empty slots. + * These are false alarms. Empty slots have no child dn. */ + if ((event->reset_state == 5) && (frozen_device == NULL)) + return 0; + + if (frozen_device) + freeze_count = frozen_device->eeh_freeze_count; + freeze_count ++; + if (freeze_count > EEH_MAX_ALLOWED_FREEZES) + perm_failure = 1; + + /* If the reset state is a '5' and the time to reset is 0 (infinity) + * or is more then 15 seconds, then mark this as a permanent failure. + */ + if ((event->reset_state == 5) && + ((event->time_unavail <= 0) || + (event->time_unavail > MAX_WAIT_FOR_RECOVERY*1000))) + perm_failure = 1; + + /* Log the error with the rtas logger. */ + if (perm_failure) { + /* + * About 90% of all real-life EEH failures in the field + * are due to poorly seated PCI cards. Only 10% or so are + * due to actual, failed cards. + */ + printk (KERN_ERR + "EEH: device %s:%s has failed %d times \n" + "and has been permanently disabled. 
Please try reseating\n" + "this device or replacing it.\n", + pci_name (dev), + pci_pretty_name (dev), + freeze_count); + + eeh_slot_error_detail (frozen_device, 2 /* Permanent Error */); + + /* Notify the device that its about to go down. */ + /* XXX this should be a recursive walk to children for + * multi-function devices */ + if (frozen_device->eeh_ops && + frozen_device->eeh_ops->perm_failure) { + frozen_device->eeh_ops->perm_failure (dev, frozen_device->eeh_ops->data); + } + + /* Unconfigure the thing and go home. */ + rpaphp_unconfig_pci_adapter (frozen_slot); + return 1; + } else { + eeh_slot_error_detail (frozen_device, 1 /* Temporary Error */); + } + + printk (KERN_WARNING + "EEH: This device has failed %d times since last reboot: %s:%s\n", + freeze_count, + pci_name (dev), + pci_pretty_name (dev)); + + /* Walk the various device drivers attached to this slot through + * a reset sequence, giving each an opportunity to do what it needs + * to accomplish the reset */ + /* XXX this should be a recursive walk to children for + * multi-function devices; each child should get to report + * status too, if needed ... if any child can't handle the reset, + * then need to hotplug it. */ + if (frozen_device->eeh_ops) { + if (frozen_device->eeh_ops->frozen) { + frozen_device->eeh_ops->frozen (dev, frozen_device->eeh_ops->data); + } + rc = eeh_reset_device (dev, 0); + if (frozen_device->eeh_ops->post_reset) { + frozen_device->eeh_ops->post_reset (dev, frozen_device->eeh_ops->data); + } + + } else { + rc = eeh_reset_device (dev, 1); + } + + /* Store the freeze count with the pci adapter, and not the slot. + * This way, if the device is replaced, the count is cleared. 
+ */ + if (frozen_slot->dn->child) + frozen_slot->dn->child->eeh_freeze_count = freeze_count; + + return rc; +} + +static struct notifier_block eeh_block; + +void __init init_eeh_handler (void) +{ + eeh_block.notifier_call = handle_eeh_events; + eeh_register_notifier (&eeh_block); +} + +void __exit exit_eeh_handler (void) +{ + eeh_unregister_notifier (&eeh_block); +} + ===== drivers/scsi/ipr.c 1.31 vs edited ===== --- 1.31/drivers/scsi/ipr.c 2004-12-14 17:06:35 -06:00 +++ edited/drivers/scsi/ipr.c 2005-02-22 17:37:41 -06:00 @@ -80,6 +80,8 @@ #include #include #include + +#define CONFIG_SCSI_IPR_EEH #include "ipr.h" /* @@ -2917,7 +2919,6 @@ static int ipr_eh_host_reset(struct scsi if (WAIT_FOR_DUMP == ioa_cfg->sdt_state) ioa_cfg->sdt_state = GET_DUMP; - rc = ipr_reset_reload(ioa_cfg, IPR_SHUTDOWN_ABBREV); LEAVE; @@ -5007,6 +5008,67 @@ static int ipr_reset_start_bist(struct i return rc; } +#ifdef CONFIG_SCSI_IPR_EEH + +static int ipr_reset_shutdown_ioa(struct ipr_cmnd *ipr_cmd); + +#define IPR_WAIT_FOR_EEH_RESET (HZ) +static int ipr_reset_poll_eeh_recovery(struct ipr_cmnd *ipr_cmd) +{ + struct ipr_ioa_cfg *ioa_cfg = ipr_cmd->ioa_cfg; + int rc; + + ENTER; + if (ioa_cfg->wait_on_eeh_reset) { + ipr_reset_start_timer(ipr_cmd, IPR_WAIT_FOR_EEH_RESET); + rc = IPR_RC_JOB_RETURN; + } else { + ipr_cmd->job_step = ipr_reset_start_bist; + rc = IPR_RC_JOB_CONTINUE; + } + + LEAVE; + return rc; +} + +static void ipr_eeh_frozen (struct pci_dev *pdev, void * data) +{ + struct ipr_ioa_cfg *ioa_cfg = data; + ioa_cfg->wait_on_eeh_reset = 1; +} + +static void ipr_eeh_post_reset (struct pci_dev *pdev, void * data) +{ + struct ipr_ioa_cfg *ioa_cfg = data; + ioa_cfg->wait_on_eeh_reset = 0; +} + +static void ipr_eeh_perm_failure (struct pci_dev *pdev, void * data) +{ + // struct ipr_ioa_cfg *ioa_cfg = data; + +#if 0 // XXXXXXXXXXXXXXXXXXXXXXX + ipr_cmd->job_step = ipr_reset_shutdown_ioa; + rc = IPR_RC_JOB_CONTINUE; +#endif +} + +static void ipr_register_eeh_handlers (struct ipr_ioa_cfg 
*ioa_cfg) +{ + /* XXX borken memory management; this malloc not managed */ + struct eeh_recovery_ops *eeh_ops; + eeh_ops = kmalloc (sizeof(struct eeh_recovery_ops), GFP_KERNEL); + memset (eeh_ops, 0, sizeof(struct eeh_recovery_ops)); + eeh_ops->frozen = ipr_eeh_frozen; + eeh_ops->post_reset = ipr_eeh_post_reset; + eeh_ops->perm_failure = ipr_eeh_perm_failure; + eeh_ops->data = ioa_cfg; + eeh_register_recovery_ops (ioa_cfg->pdev, eeh_ops); +} + +#endif + + /** * ipr_reset_allowed - Query whether or not IOA can be reset * @ioa_cfg: ioa config struct @@ -5042,6 +5104,7 @@ static int ipr_reset_wait_to_start_bist( struct ipr_ioa_cfg *ioa_cfg = ipr_cmd->ioa_cfg; int rc = IPR_RC_JOB_RETURN; +#ifndef CONFIG_SCSI_IPR_EEH if (!ipr_reset_allowed(ioa_cfg) && ipr_cmd->u.time_left) { ipr_cmd->u.time_left -= IPR_CHECK_FOR_RESET_TIMEOUT; ipr_reset_start_timer(ipr_cmd, IPR_CHECK_FOR_RESET_TIMEOUT); @@ -5049,6 +5112,21 @@ static int ipr_reset_wait_to_start_bist( ipr_cmd->job_step = ipr_reset_start_bist; rc = IPR_RC_JOB_CONTINUE; } +#else + if (!ipr_reset_allowed(ioa_cfg) && ipr_cmd->u.time_left + && !eeh_slot_is_isolated (ioa_cfg->pdev)) { + + ipr_cmd->u.time_left -= IPR_CHECK_FOR_RESET_TIMEOUT; + ipr_reset_start_timer(ipr_cmd, IPR_CHECK_FOR_RESET_TIMEOUT); + } else { + if (eeh_slot_is_isolated (ioa_cfg->pdev)) { + ipr_cmd->job_step = ipr_reset_poll_eeh_recovery; + } else { + ipr_cmd->job_step = ipr_reset_start_bist; + } + rc = IPR_RC_JOB_CONTINUE; + } +#endif return rc; } @@ -5079,7 +5157,16 @@ static int ipr_reset_alert(struct ipr_cm writel(IPR_UPROCI_RESET_ALERT, ioa_cfg->regs.set_uproc_interrupt_reg); ipr_cmd->job_step = ipr_reset_wait_to_start_bist; } else { +#ifndef CONFIG_SCSI_IPR_EEH ipr_cmd->job_step = ipr_reset_start_bist; +#else + if (eeh_slot_is_isolated (ioa_cfg->pdev)) { + ipr_cmd->job_step = ipr_reset_poll_eeh_recovery; + return IPR_RC_JOB_CONTINUE; + } else { + ipr_cmd->job_step = ipr_reset_start_bist; + } +#endif } ipr_cmd->u.time_left = IPR_WAIT_FOR_RESET_TIMEOUT; 
@@ -5759,6 +5846,10 @@ static int __devinit ipr_probe_ioa(struc /* Save away PCI config space for use following IOA reset */ rc = pci_save_state(pdev); + +#ifdef CONFIG_SCSI_IPR_EEH + ipr_register_eeh_handlers (ioa_cfg); +#endif if (rc != PCIBIOS_SUCCESSFUL) { dev_err(&pdev->dev, "Failed to save PCI config space\n"); ===== drivers/scsi/ipr.h 1.21 vs edited ===== --- 1.21/drivers/scsi/ipr.h 2004-12-14 17:09:02 -06:00 +++ edited/drivers/scsi/ipr.h 2005-02-22 15:52:36 -06:00 @@ -833,6 +833,9 @@ struct ipr_ioa_cfg { u8 dump_taken:1; u8 allow_cmds:1; u8 allow_ml_add_del:1; +#ifdef CONFIG_SCSI_IPR_EEH + u8 wait_on_eeh_reset:1; +#endif u16 type; /* CCIN of the card */ @@ -1132,9 +1135,11 @@ struct ipr_ucode_image_header { #define ipr_trace ipr_dbg("%s: %s: Line: %d\n",\ __FILE__, __FUNCTION__, __LINE__) +#undef IPR_DBG_TRACE +#define IPR_DBG_TRACE 1 #if IPR_DBG_TRACE -#define ENTER printk(KERN_INFO IPR_NAME": Entering %s\n", __FUNCTION__) -#define LEAVE printk(KERN_INFO IPR_NAME": Leaving %s\n", __FUNCTION__) +#define ENTER printk(KERN_INFO IPR_NAME": Entering %s jiffies=%lu\n", __FUNCTION__, jiffies) +#define LEAVE printk(KERN_INFO IPR_NAME": Leaving %s jiffies=%lu\n", __FUNCTION__, jiffies) #else #define ENTER #define LEAVE ===== drivers/scsi/sym53c8xx_2/sym_glue.c 1.52 vs edited ===== --- 1.52/drivers/scsi/sym53c8xx_2/sym_glue.c 2004-10-24 11:08:18 -05:00 +++ edited/drivers/scsi/sym53c8xx_2/sym_glue.c 2005-02-22 17:15:38 -06:00 @@ -851,6 +851,7 @@ static int sym_eh_handler(int op, char * sprintf(devname, "%s:%d:%d", sym_name(np), cmd->device->id, cmd->device->lun); printf_warning("%s: %s operation started.\n", devname, opname); +printk("duuuude %s: %s like operation started.\n", devname, opname); #if 0 /* This one should be the result of some race, thus to ignore */ @@ -896,6 +897,17 @@ prepare: sts = 0; break; case SYM_EH_HOST_RESET: +#define CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY +#ifdef CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY +printk("duuuuuude attempting symbios 
recovery\n"); +dump_stack(); +int rc = eeh_slot_is_isolated (np->s.device); + +printk ("duude symbios is isolated ??=%d\n", rc); +if (rc) { + eeh_reset_device (np->s.device, 0); +} +#endif /* CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY */ sym_reset_scsi_bus(np, 0); sym_start_up (np, 1); sts = 0; @@ -1587,6 +1599,21 @@ out_err32: return -1; } +#ifdef CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY +static void sym_register_eeh_handlers (struct sym_device *dev) +{ + /* XXX borken memory management; this malloc not managed */ + struct eeh_recovery_ops *eeh_ops; + eeh_ops = kmalloc (sizeof(struct eeh_recovery_ops), GFP_KERNEL); + memset (eeh_ops, 0, sizeof(struct eeh_recovery_ops)); + eeh_ops->frozen = NULL; + eeh_ops->post_reset = NULL; + eeh_ops->perm_failure = NULL; + eeh_ops->data = dev; + eeh_register_recovery_ops (dev->pdev, eeh_ops); +} +#endif /* CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY */ + /* * Host attach and initialisations. * @@ -1672,6 +1699,9 @@ static struct Scsi_Host * __devinit sym_ strlcpy(np->s.chip_name, dev->chip.name, sizeof(np->s.chip_name)); sprintf(np->s.inst_name, "sym%d", np->s.unit); +#ifdef CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY + sym_register_eeh_handlers (dev); +#endif /* CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY */ /* * Ask/tell the system about DMA addressing. */ ===== include/asm-ppc64/eeh.h 1.23 vs edited ===== --- 1.23/include/asm-ppc64/eeh.h 2004-10-25 18:17:38 -05:00 +++ edited/include/asm-ppc64/eeh.h 2005-02-22 13:21:49 -06:00 @@ -22,8 +22,8 @@ #include #include -#include #include +#include struct pci_dev; struct device_node; @@ -32,6 +32,11 @@ struct device_node; #define EEH_MODE_SUPPORTED (1<<0) #define EEH_MODE_NOCHECK (1<<1) #define EEH_MODE_ISOLATED (1<<2) +#define EEH_MODE_RECOVERING (1<<3) + +/* Max number of EEH freezes allowed before we consider the device + * to be permanently disabled. 
*/ +#define EEH_MAX_ALLOWED_FREEZES 5 #ifdef CONFIG_PPC_PSERIES extern void __init eeh_init(void); @@ -60,16 +65,76 @@ void eeh_add_device_late(struct pci_dev * eeh_remove_device - undo EEH setup for the indicated pci device * @dev: pci device to be removed * - * This routine should be when a device is removed from a running - * system (e.g. by hotplug or dlpar). + * This routine should be called when a device is removed from + * a running system (e.g. by hotplug or dlpar). It unregisters + * the PCI device from the EEH subsystem. I/O errors affecting + * this device will no longer be detected after this call; thus, + * i/o errors affecting this slot may leave this device unusable. */ void eeh_remove_device(struct pci_dev *); -#define EEH_DISABLE 0 -#define EEH_ENABLE 1 -#define EEH_RELEASE_LOADSTORE 2 -#define EEH_RELEASE_DMA 3 -int eeh_set_option(struct pci_dev *dev, int options); +/** + * eeh_slot_is_isolated -- return non-zero value if slot is frozen + */ +int eeh_slot_is_isolated (struct pci_dev *dev); + +/** + * eeh_slot_error_detail -- record and EEH error condition to the log + * @severity: 1 if temporary, 2 if permanent failure. + * + * Obtains the the EEH error details from the RTAS subsystem, + * and then logs these details with the RTAS error log system. + */ +void eeh_slot_error_detail (struct device_node *dn, int severity); + +/** + * rtas_set_slot_reset -- unfreeze a frozen slot + * + * Clear the EEH-frozen condition on a slot. This routine + * does this by asserting the PCI #RST line for 1/8th of + * a second; this routine will sleep while the adapter is + * being reset. + */ +void rtas_set_slot_reset (struct device_node *dn); + +/** rtas_pci_slot_reset raises/lowers the pci #RST line + * state: 1/0 to raise/lower the #RST + * + * Clear the EEH-frozen condition on a slot. This routine + * asserts the PCI #RST line if the 'state' argument is '1', + * and drops the #RST line if 'state is '0'. This routine is + * safe to call in an interrupt context. 
+ * + */ +void rtas_pci_slot_reset(struct device_node *dn, int state); +void eeh_pci_slot_reset(struct pci_dev *dev, int state); + +/** eeh_pci_slot_availability -- Indicates whether a PCI + * slot is ready to be used. After a PCI reset, it may take a while + * for the PCI fabric to fully reset the comminucations path to the + * given PCI card. This routine can be used to determine how long + * to wait before a PCI slot might become usable. + * + * This routine returns how long to wait (in milliseconds) before + * the slot is expected to be usable. A value of zero means the + * slot is immediately usable. A negavitve value means that the + * slot is permanently disabled. + */ +int eeh_pci_slot_availability(struct pci_dev *dev); + +/** Restore device configuration info across device resets. + */ +void eeh_restore_bars(struct device_node *); +void eeh_pci_restore_bars(struct pci_dev *dev); + +/** + * rtas_configure_bridge -- firmware initialization of pci bridge + * + * Ask the firmware to configure any PCI bridge devices + * located behind the indicated node. Required after a + * pci device reset. + */ +void rtas_configure_bridge(struct device_node *dn); /** @@ -86,11 +151,27 @@ struct eeh_event { struct pci_dev *dev; struct device_node *dn; int reset_state; + int time_unavail; }; /** Register to find out about EEH events. */ int eeh_register_notifier(struct notifier_block *nb); int eeh_unregister_notifier(struct notifier_block *nb); + + +/** EEH error recovery callbacks. These will be called on a + * registered device driver during the EEH recovery proceedure. 
+ * Eventually, this should be a part of struct pci_driver + */ +struct eeh_recovery_ops { + void (*frozen) (struct pci_dev *, void *); /* called when dev is first frozen */ + void (*post_reset) (struct pci_dev *, void *); /* called after card is reset */ + void (*perm_failure) (struct pci_dev *, void *); /* called if card is dead */ + void * data; /* pointer to self */ +}; + +/** Register a set of recovery ops for an EEH event */ +void eeh_register_recovery_ops (struct pci_dev *, struct eeh_recovery_ops *); /** * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure. ===== include/asm-ppc64/prom.h 1.24 vs edited ===== --- 1.24/include/asm-ppc64/prom.h 2004-11-25 00:42:42 -06:00 +++ edited/include/asm-ppc64/prom.h 2005-02-22 12:07:09 -06:00 @@ -144,6 +144,7 @@ struct property { */ struct pci_controller; struct iommu_table; +struct eeh_recovery_ops; struct device_node { char *name; @@ -164,8 +165,13 @@ struct device_node { int status; /* Current device status (non-zero is bad) */ int eeh_mode; /* See eeh.h for possible EEH_MODEs */ int eeh_config_addr; + int eeh_check_count; /* number of times device driver ignored error */ + int eeh_freeze_count; /* number of times this device froze up. 
*/
+	int eeh_is_bridge; /* device is pci-to-pci bridge */
 	struct pci_controller *phb; /* for pci devices */
 	struct iommu_table *iommu_table; /* for phb's or bridges */
+	u32 config_space[16]; /* saved PCI config space */
+	struct eeh_recovery_ops *eeh_ops; /* recovery callbacks */
 	struct property *properties;
 	struct device_node *parent;
===== include/asm-ppc64/rtas.h 1.25 vs edited =====
--- 1.25/include/asm-ppc64/rtas.h 2004-11-25 00:42:42 -06:00
+++ edited/include/asm-ppc64/rtas.h 2005-01-20 17:25:37 -06:00
@@ -241,4 +241,6 @@ extern void rtas_stop_self(void);
 /* RMO buffer reserved for user-space RTAS use */
 extern unsigned long rtas_rmo_buf;
+extern int rtas_write_config(struct device_node *dn, int where, int size, u32 val);
+
 #endif /* _PPC64_RTAS_H */

From olof at austin.ibm.com Wed Feb 23 14:40:03 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Tue, 22 Feb 2005 21:40:03 -0600
Subject: [RFC/PATCH] ppc64: Add mem=X option
In-Reply-To: <20050222192423.727023f7.michael@ellerman.id.au>
References: <20050222192423.727023f7.michael@ellerman.id.au>
Message-ID: <20050223034003.GB15427@austin.ibm.com>

On Tue, Feb 22, 2005 at 07:24:23PM +1100, Michael Ellerman wrote:
> Hi Anton, Ben, and the rest of ya,
>
> Here is my first take at adding support for the mem=X boot option.
> Please check it out.

Yes, finally! :-) This has been asked for many a time, but no one has
implemented it before. Good work!

I have some comments below, most of them are nitpicks.

> -#if 0 /* XXX not currently used */
> +/* These three variables are used to save values passed to us by prom_init()
> + * via the device tree. The TCE variables are needed because with a memory_limit
> + * in force we may need to explicitly map the TCE area at the top of RAM. */

I think it's common kernel coding style to put the final */ on a new
line to get all the *'s lined up. There are other comments in the patch
with the same style, I'll only point out this one.
> +unsigned long tce_alloc_start;
> +unsigned long tce_alloc_end;
>
> +#ifdef CONFIG_PPC_ISERIES
> +/* On iSeries we just parse the mem=X option from the command line.
> + * On pSeries it's a bit more complicated, see prom_init_mem() */
>  static int __init early_parsemem(char *p)
>  {
>  	if (!p)
> @@ -818,7 +824,7 @@ static int __init early_parsemem(char *p
>  	return 0;
>  }
>  early_param("mem", early_parsemem);
> -#endif
> +#endif /* CONFIG_PPC_ISERIES */
>
>  #ifdef CONFIG_PPC_MULTIPLATFORM
>  static int __init set_preferred_console(void)
> Index: latest/arch/ppc64/kernel/lmb.c
> ===================================================================
> --- latest.orig/arch/ppc64/kernel/lmb.c
> +++ latest/arch/ppc64/kernel/lmb.c
> @@ -344,3 +344,34 @@ lmb_abs_to_phys(unsigned long aa)
>
>  	return pa;
>  }
> +
> +/* Truncate the lmb list to memory_limit if it's set
> + * You must call lmb_analyze() after this. */
> +void __init lmb_apply_memory_limit(void)
> +{
> +	extern unsigned long memory_limit;
> +	unsigned long i, total = 0, crop;
> +	struct lmb_region *mem = &(lmb.memory);
> +
> +	if (likely(!memory_limit))
> +		return;

No need to worry about likely/unlikely here. It's not a hot call path.

> +
> +	for (i = 0; i < mem->cnt; i++) {
> +		total += mem->region[i].size;
> +
> +		if (total <= memory_limit)
> +			continue;
> +
> +		crop = (memory_limit - (total - mem->region[i].size));
> +#ifdef DEBUG
> +		udbg_printf("lmb_truncate(): truncating at region %x\n", i);
> +		udbg_printf("lmb_truncate(): total = %x\n", total);
> +		udbg_printf("lmb_truncate(): size = %x\n", mem->region[i].size);
> +		udbg_printf("lmb_truncate(): crop = %x\n", crop);
> +#endif
> +
> +		mem->region[i].size = crop;
> +		mem->cnt = i + 1;
> +		break;

With the above tests, there's a chance for the last LMB to be of size 0.
I don't think it matters for things to work, but you might as well skip
that one too.
Changing the test to (total < memory_limit) should take care of it, the
last one will just be cropped by nothing. Also, if you just reduce the
size of memory_limit instead of keeping the rolling total, the crop
calculation will be simpler. It took a bit of thinking to make sure it's
right the way it's written now.

> +	}
> +}
> Index: latest/include/asm-ppc64/lmb.h
> ===================================================================
> --- latest.orig/include/asm-ppc64/lmb.h
> +++ latest/include/asm-ppc64/lmb.h
> @@ -53,6 +53,7 @@ extern unsigned long __init lmb_alloc_ba
>  extern unsigned long __init lmb_phys_mem_size(void);
>  extern unsigned long __init lmb_end_of_DRAM(void);
>  extern unsigned long __init lmb_abs_to_phys(unsigned long);
> +extern void __init lmb_apply_memory_limit(void);
>
>  extern void lmb_dump_all(void);
>
> Index: latest/arch/ppc64/kernel/iSeries_setup.c
> ===================================================================
> --- latest.orig/arch/ppc64/kernel/iSeries_setup.c
> +++ latest/arch/ppc64/kernel/iSeries_setup.c
> @@ -284,7 +284,7 @@ unsigned long iSeries_process_mainstore_
>  	return mem_blocks;
>  }
>
> -static void __init iSeries_parse_cmdline(void)
> +static void __init iSeries_get_cmdline(void)
>  {
>  	char *p, *q;
>
> @@ -304,6 +304,8 @@ static void __init iSeries_parse_cmdline
>
>  /*static*/ void __init iSeries_init_early(void)
>  {
> +	extern unsigned long memory_limit;
> +
>  	DBG(" -> iSeries_init_early()\n");
>
>  	ppcdbg_initialize();
> @@ -351,6 +353,29 @@ static void __init iSeries_parse_cmdline
>  	 */
>  	build_iSeries_Memory_Map();
>
> +	iSeries_get_cmdline();
> +
> +	/* Save unparsed command line copy for /proc/cmdline */
> +	strlcpy(saved_command_line, cmd_line, COMMAND_LINE_SIZE);
> +
> +	/* Parse early parameters, in particular mem=x */
> +	parse_early_param();
> +
> +	if (unlikely(memory_limit)) {
> +		if (memory_limit > systemcfg->physicalMemorySize)
> +			printk("Ignoring 'mem' option, value %lu is too large.\n", memory_limit);
> +		else
> +			systemcfg->physicalMemorySize = memory_limit;
> +	}
> +
> +	/* Bolt kernel mappings for all of memory */
> +	iSeries_bolt_kernel(0, systemcfg->physicalMemorySize);

Do you want to bolt it all, even if there's a mem= limitation?

> +
> +	lmb_init();
> +	lmb_add(0, systemcfg->physicalMemorySize);
> +	lmb_analyze(); /* ?? */
> +	lmb_reserve(0, __pa(klimit));
> +
>  	/* Initialize machine-dependency vectors */
>  #ifdef CONFIG_SMP
>  	smp_init_iSeries();
> @@ -376,9 +401,6 @@ static void __init iSeries_parse_cmdline
>  	initrd_start = initrd_end = 0;
>  #endif /* CONFIG_BLK_DEV_INITRD */
>
> -
> -	iSeries_parse_cmdline();
> -
>  	DBG(" <- iSeries_init_early()\n");
>  }
>
> @@ -539,14 +561,6 @@ static void __init build_iSeries_Memory_
>  	 * nextPhysChunk
>  	 */
>  	systemcfg->physicalMemorySize = chunk_to_addr(nextPhysChunk);
> -
> -	/* Bolt kernel mappings for all of memory */
> -	iSeries_bolt_kernel(0, systemcfg->physicalMemorySize);
> -
> -	lmb_init();
> -	lmb_add(0, systemcfg->physicalMemorySize);
> -	lmb_analyze(); /* ?? */
> -	lmb_reserve(0, __pa(klimit));
>  }
>
>  /*
> Index: latest/arch/ppc64/kernel/prom.c
> ===================================================================
> --- latest.orig/arch/ppc64/kernel/prom.c
> +++ latest/arch/ppc64/kernel/prom.c
> @@ -875,6 +875,8 @@ static int __init early_init_dt_scan_cho
>  		const char *full_path, void *data)
>  {
>  	u32 *prop;
> +	u64 *prop64;
> +	extern unsigned long memory_limit, tce_alloc_start, tce_alloc_end;
>
>  	if (strcmp(full_path, "/chosen") != 0)
>  		return 0;
> @@ -891,6 +893,18 @@ static int __init early_init_dt_scan_cho
>  	if (get_flat_dt_prop(node, "linux,iommu-force-on", NULL) != NULL)
>  		iommu_force_on = 1;
>
> +	prop64 = (u64*)get_flat_dt_prop(node, "linux,memory-limit", NULL);
> +	if (prop64)
> +		memory_limit = *prop64;
> +
> +	prop64 = (u64*)get_flat_dt_prop(node, "linux,tce-alloc-start", NULL);
> +	if (prop64)
> +		tce_alloc_start = *prop64;
> +
> +	prop64 = (u64*)get_flat_dt_prop(node, "linux,tce-alloc-end", NULL);
> +	if (prop64)
> +		tce_alloc_end = *prop64;
> +
>  #ifdef CONFIG_PPC_PSERIES
>  	/* To help early debugging via the front panel, we retreive a minimal
>  	 * set of RTAS infos now if available
> @@ -1030,6 +1044,7 @@ void __init early_init_devtree(void *par
>  	lmb_init();
>  	scan_flat_dt(early_init_dt_scan_root, NULL);
>  	scan_flat_dt(early_init_dt_scan_memory, NULL);
> +	lmb_apply_memory_limit();
>  	lmb_analyze();
>  	systemcfg->physicalMemorySize = lmb_phys_mem_size();
>  	lmb_reserve(0, __pa(klimit));
> Index: latest/arch/ppc64/mm/hash_utils.c
> ===================================================================
> --- latest.orig/arch/ppc64/mm/hash_utils.c
> +++ latest/arch/ppc64/mm/hash_utils.c
> @@ -140,6 +140,8 @@ void __init htab_initialize(void)
>  	unsigned long pteg_count;
>  	unsigned long mode_rw;
>  	int i, use_largepages = 0;
> +	unsigned long base = 0, size = 0;
> +	extern unsigned long memory_limit, tce_alloc_start, tce_alloc_end;
>
>  	DBG(" -> htab_initialize()\n");
>
> @@ -195,8 +197,6 @@ void __init htab_initialize(void)
>
>  	/* create bolted the linear mapping in the hash table */
>  	for (i=0; i < lmb.memory.cnt; i++) {
> -		unsigned long base, size;
> -
>  		base = lmb.memory.region[i].physbase + KERNELBASE;
>  		size = lmb.memory.region[i].size;
>
> @@ -225,6 +225,21 @@ void __init htab_initialize(void)
>  #endif /* CONFIG_U3_DART */
>  		create_pte_mapping(base, base + size, mode_rw, use_largepages);
>  	}
> +
> +	/* If we have a memory_limit and we've allocated TCEs then we need to
> +	 * explicitly map the TCE area at the top of RAM. We also cope with the
> +	 * case that the TCEs start below memory_limit. */
> +	if (unlikely(memory_limit && tce_alloc_start && tce_alloc_end)) {

Same here, not really a hot path, you can take out the unlikely().

Do you need to check both tce_alloc_start and tce_alloc_end? Won't both
be set if one is?

> +		tce_alloc_start += KERNELBASE;
> +		tce_alloc_end += KERNELBASE;
> +
> +		if (base + size >= tce_alloc_start)
> +			tce_alloc_start = base + size + 1;
> +
> +		create_pte_mapping(tce_alloc_start, tce_alloc_end,
> +				mode_rw, use_largepages);

Could tce_alloc_end ever be below memory_limit too?

You might need to check tce_alloc_start and end to make sure you can
use 16MB pages, or if you need 4K because of alignment/size constraints.
Even if you don't have to, a comment related to it could be warranted.
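
For illustration, the alignment question could be checked along these lines. This is only a rough userspace sketch: the helper name sketch_tce_map_page_size() and the SKETCH_* constants are hypothetical, and it assumes a large page is 16MB and that a range may use large pages only if both ends sit on a 16MB boundary. It also returns 0 for an empty range, which covers the case where the TCE area ends up entirely below what is already mapped.

```c
/* Hypothetical page sizes for this sketch; not taken from the patch. */
#define SKETCH_PAGE_4K   0x1000UL
#define SKETCH_PAGE_16M  0x1000000UL

/*
 * Hypothetical helper: pick the page size usable for mapping [start, end).
 * Returns 0 when there is nothing left to map, 16MB when both ends are
 * 16MB aligned, and 4K otherwise.
 */
static unsigned long sketch_tce_map_page_size(unsigned long start,
					      unsigned long end)
{
	if (end <= start)
		return 0;	/* empty range: skip the mapping entirely */

	if ((start % SKETCH_PAGE_16M) == 0 && (end % SKETCH_PAGE_16M) == 0)
		return SKETCH_PAGE_16M;

	return SKETCH_PAGE_4K;
}
```

A caller would presumably turn a 16MB result into use_largepages = 1 and skip create_pte_mapping() altogether on a 0 result; whether the real code needs this depends on how the TCE area is allocated, which the patch does not show here.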
> +	}
> +
>  	DBG(" <- htab_initialize()\n");
>  }
>  #undef KB
> Index: latest/arch/ppc64/mm/numa.c
> ===================================================================
> --- latest.orig/arch/ppc64/mm/numa.c
> +++ latest/arch/ppc64/mm/numa.c
> @@ -270,6 +270,7 @@ static int __init parse_numa_properties(
>  	int max_domain = 0;
>  	long entries = lmb_end_of_DRAM() >> MEMORY_INCREMENT_SHIFT;
>  	unsigned long i;
> +	extern unsigned long memory_limit;
>
>  	if (numa_enabled == 0) {
>  		printk(KERN_WARNING "NUMA disabled by user\n");
> @@ -378,7 +379,7 @@ new_range:
>  			size / PAGE_SIZE;
>  	}
>
> -	for (i = start ; i < (start+size); i += MEMORY_INCREMENT)
> +	for (i = start; i < (start+size) && i < lmb_end_of_DRAM(); i += MEMORY_INCREMENT)
>  		numa_memory_lookup_table[i >> MEMORY_INCREMENT_SHIFT] =
>  			numa_domain;
>
> @@ -387,8 +388,33 @@ new_range:
>  		goto new_range;
>  	}
>
> -	for (i = 0; i <= max_domain; i++)
> -		node_set_online(i);
> +	if (unlikely(memory_limit)) {

#include :-)

> +		unsigned long size, total = 0;
> +
> +		for (i = 0; i <= max_domain; i++) {
> +			size = init_node_data[i].node_spanned_pages * PAGE_SIZE;
> +			total += size;
> +
> +			if (total <= memory_limit)
> +				continue;
> +
> +			size = (memory_limit - (total - size)) / PAGE_SIZE;
> +			dbg("NUMA: truncating node %ld to %ld pages\n", i, size);
> +			init_node_data[i].node_spanned_pages = size;
> +			break;
> +		}
> +
> +		for (i++; i <= max_domain; i++) {

I think I would change this to a regular while(++i < max_domain) loop
instead.

> +			dbg("NUMA: offlining node %ld for memory_limit\n", i);
> +			node_set_offline(i);

Just because the node doesn't get memory allocated we can't set it
offline. I.e. the cpus will still be online.

> +			init_node_data[i].node_start_pfn = 0;
> +			init_node_data[i].node_spanned_pages = 0;
> +		}
> +	} else {
> +		/* FIXME do we need this? haven't we already done it in the else above? */
> +		for (i = 0; i <= max_domain; i++)
> +			node_set_online(i);

Good question, I think it can go.
That code was added by Matt Dobson earlier this year.

> +	}
>
>  	return 0;
>  err:
> Index: latest/arch/ppc64/kernel/prom_init.c
> ===================================================================
> --- latest.orig/arch/ppc64/kernel/prom_init.c
> +++ latest/arch/ppc64/kernel/prom_init.c
> @@ -178,6 +178,9 @@ static int __initdata of_platform;
>
>  static char __initdata prom_cmd_line[COMMAND_LINE_SIZE];
>
> +static unsigned long __initdata memory_limit;
> +static unsigned long __initdata tce_alloc_start;
> +static unsigned long __initdata tce_alloc_end;

A little confusing to have the same variable names here as local
statics. Maybe rename them? Or use the global.

>  static unsigned long __initdata alloc_top;
>  static unsigned long __initdata alloc_top_high;
>  static unsigned long __initdata alloc_bottom;
> @@ -385,10 +388,64 @@ static int __init prom_setprop(phandle n
>  		(u32)(unsigned long) value, (u32) valuelen);
>  }
>
> +/* We can't use the standard versions because of RELOC headaches. */
> +#define isxdigit(c) (('0' <= (c) && (c) <= '9') \
> +		|| ('a' <= (c) && (c) <= 'f') \
> +		|| ('A' <= (c) && (c) <= 'F'))
> +
> +#define isdigit(c) ('0' <= (c) && (c) <= '9')
> +#define islower(c) ('a' <= (c) && (c) <= 'z')
> +#define toupper(c) (islower(c) ? ((c) - 'a' + 'A') : (c))

#define a tonum(c,base) or something like that, and use that below:

> +
> +unsigned long prom_strtoul(const char *cp, const char **endp)
> +{
> +	unsigned long result = 0, base = 10, value;
> +
> +	if (*cp == '0') {
> +		base = 8;
> +		cp++;
> +		if (toupper(*cp) == 'X') {
> +			cp++;
> +			base = 16;
> +		}
> +	}
> +
> +	while (isxdigit(*cp) &&
> +	       (value = isdigit(*cp) ? *cp - '0' : toupper(*cp) - 'A' + 10) < base) {
> +		result = result * base + value;
> +		cp++;
> +	}

Hmm. Would you mind breaking that up? It'll be a few more lines but much
easier to read.
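
One possible way to break the loop up, as a sketch only. The helper prom_tonum() is a hypothetical name standing in for the "tonum(c,base)" idea above, written as a function rather than a macro. It avoids the ctype-style macros altogether, and is meant to accept and reject the same characters as the quoted loop.

```c
/*
 * Hypothetical helper: numeric value of digit 'c' in the given base,
 * or -1 if 'c' is not a valid digit for that base.
 */
static int prom_tonum(char c, unsigned long base)
{
	int value;

	if (c >= '0' && c <= '9')
		value = c - '0';
	else if (c >= 'a' && c <= 'f')
		value = c - 'a' + 10;
	else if (c >= 'A' && c <= 'F')
		value = c - 'A' + 10;
	else
		return -1;

	return ((unsigned long)value < base) ? value : -1;
}

unsigned long prom_strtoul(const char *cp, const char **endp)
{
	unsigned long result = 0, base = 10;
	int value;

	/* A leading "0" selects octal, a leading "0x"/"0X" selects hex. */
	if (*cp == '0') {
		base = 8;
		cp++;
		if (*cp == 'x' || *cp == 'X') {
			cp++;
			base = 16;
		}
	}

	/* Accumulate digits until the first character invalid for the base. */
	while ((value = prom_tonum(*cp, base)) >= 0) {
		result = result * base + value;
		cp++;
	}

	if (endp)
		*endp = cp;

	return result;
}
```

In prom_init.c any string literals or globals would still need the RELOC() treatment; the sketch uses only character constants, so it sidesteps that, but it has not been tried in that environment.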
> + > + if (endp) > + *endp = cp; > + > + return result; > +} > + > +unsigned long prom_memparse(const char *ptr, const char **retptr) > +{ > + unsigned long ret = prom_strtoul(ptr, retptr); > + > + switch (**retptr) { > + case 'G': > + case 'g': > + ret <<= 10; > + case 'M': > + case 'm': > + ret <<= 10; > + case 'K': > + case 'k': > + ret <<= 10; > + (*retptr)++; Do other architectures swallow/tolerate a b/B after the unit? Could be nice. > + default: > + break; > + } > + return ret; > +} > > /* > * Early parsing of the command line passed to the kernel, used for > - * the options that affect the iommu > + * "mem=x" and the options that affect the iommu > */ > static void __init early_cmdline_parse(void) > { > @@ -419,6 +476,14 @@ static void __init early_cmdline_parse(v > else if (!strncmp(opt, RELOC("force"), 5)) > RELOC(iommu_force_on) = 1; > } > + > + opt = strstr(RELOC(prom_cmd_line), RELOC("mem=")); > + if (opt) { > + opt += 4; > + RELOC(memory_limit) = prom_memparse(opt, (const char **)&opt); > + /* Align to 16 MB == size of large page */ > + RELOC(memory_limit) = ALIGN(RELOC(memory_limit), 0x1000000); Maybe a printk to say that it's been rounded up, so we don't surprise the user? 
> + } > } > > /* > @@ -665,15 +730,7 @@ static void __init prom_init_mem(void) > } > } > > - /* Setup our top/bottom alloc points, that is top of RMO or top of > - * segment 0 when running non-LPAR > - */ > - if ( RELOC(of_platform) == PLATFORM_PSERIES_LPAR ) > - RELOC(alloc_top) = RELOC(rmo_top); > - else > - RELOC(alloc_top) = RELOC(rmo_top) = min(0x40000000ul, RELOC(ram_top)); > RELOC(alloc_bottom) = PAGE_ALIGN(RELOC(klimit) - offset + 0x4000); > - RELOC(alloc_top_high) = RELOC(ram_top); > > /* Check if we have an initrd after the kernel, if we do move our bottom > * point to after it > @@ -683,8 +740,37 @@ static void __init prom_init_mem(void) > > RELOC(alloc_bottom)) > RELOC(alloc_bottom) = PAGE_ALIGN(RELOC(prom_initrd_end)); > } > + > + /* If memory_limit is set we reduce the upper limits *except* for > + * alloc_top_high. This must be the real top of RAM so we can put > + * TCE's up there. */ > + > + RELOC(alloc_top_high) = RELOC(ram_top); > + > + if (unlikely(RELOC(memory_limit))) { > + if (RELOC(memory_limit) <= RELOC(alloc_bottom)) { > + prom_printf("Ignoring mem=%x <= alloc_bottom.\n", > + RELOC(memory_limit)); > + RELOC(memory_limit) = 0; ...or should it just be bumped up to include alloc_bottom instead? > + } else if (RELOC(memory_limit) >= RELOC(ram_top)) { > + prom_printf("Ignoring mem=%x >= ram_top.\n", > + RELOC(memory_limit)); > + RELOC(memory_limit) = 0; > + } else { > + RELOC(ram_top) = RELOC(memory_limit); > + RELOC(rmo_top) = min(RELOC(rmo_top), RELOC(memory_limit)); > + } > + } > + > + /* Setup our top alloc point, that is top of RMO or top of > + * segment 0 when running non-LPAR. 
*/ > + if ( RELOC(of_platform) == PLATFORM_PSERIES_LPAR ) > + RELOC(alloc_top) = RELOC(rmo_top); > + else > + RELOC(alloc_top) = RELOC(rmo_top) = min(0x40000000ul, RELOC(ram_top)); > > prom_printf("memory layout at init:\n"); > + prom_printf(" memory_limit : %x\n", RELOC(memory_limit)); > prom_printf(" alloc_bottom : %x\n", RELOC(alloc_bottom)); > prom_printf(" alloc_top : %x\n", RELOC(alloc_top)); > prom_printf(" alloc_top_hi : %x\n", RELOC(alloc_top_high)); > @@ -873,6 +959,11 @@ static void __init prom_initialize_tce_t > > reserve_mem(local_alloc_bottom, local_alloc_top - local_alloc_bottom); > > + if (RELOC(memory_limit)) { > + RELOC(tce_alloc_start) = local_alloc_bottom; > + RELOC(tce_alloc_end) = local_alloc_top; > + } > + > /* Flag the first invalid entry */ > prom_debug("ending prom_initialize_tce_table\n"); > } > @@ -1688,6 +1779,15 @@ unsigned long __init prom_init(unsigned > prom_setprop(_prom->chosen, "linux,iommu-off", NULL, 0); > if (RELOC(iommu_force_on)) > prom_setprop(_prom->chosen, "linux,iommu-force-on", NULL, 0); > + if (RELOC(memory_limit)) > + prom_setprop(_prom->chosen, "linux,memory-limit", > + PTRRELOC(&memory_limit), sizeof(RELOC(memory_limit))); > + if (RELOC(tce_alloc_start)) > + prom_setprop(_prom->chosen, "linux,tce-alloc-start", > + PTRRELOC(&tce_alloc_start), sizeof(RELOC(tce_alloc_start))); > + if (RELOC(tce_alloc_end)) > + prom_setprop(_prom->chosen, "linux,tce-alloc-end", > + PTRRELOC(&tce_alloc_end), sizeof(RELOC(tce_alloc_end))); > > /* > * Now finally create the flattened device-tree > _______________________________________________ > Linuxppc64-dev mailing list > Linuxppc64-dev at ozlabs.org > https://ozlabs.org/cgi-bin/mailman/listinfo/linuxppc64-dev From olof at austin.ibm.com Wed Feb 23 15:49:59 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Tue, 22 Feb 2005 22:49:59 -0600 Subject: [RFC] splitting out LPAR support from CONFIG_PSERIES In-Reply-To: <200502221723.52051.arnd@arndb.de> References: 
<200502221723.52051.arnd@arndb.de> Message-ID: <20050223044959.GA10256@austin.ibm.com> On Tue, Feb 22, 2005 at 05:23:51PM +0100, Arnd Bergmann wrote: > I have a private patch set that currently depends on this patch. > It introduces a new compile time option that makes it possible > to disable support for LPAR or native setups from a pSeries kernel. > > Obviously, this is not for generic distribution kernels, but I think > it makes sense to have the option when you're building for just one > machine. It also makes some of my subsequent patches simpler, especially > enabling RTAS on non-pSeries machines without requiring LPAR support. I don't see how the RTAS-on-other-platforms could be impacted significantly by this? Even if you can't share the code at this time, could you describe what it enables you to do that you can't with the LPAR code there? > The current form of the patch introduces lots of new #ifdefs, which > can be reduced if we use the scheme I proposed in the > 'Introduce CPU_HAS_FEATURE() macro' discussion. It would also be > cleaner to split the pSeries_iommu code into LPAR and native files. I agree that maybe the iommu code should be split, but it's not a huge file as-is so it wasn't a priority during any of the rewrites so far. The added feature macros were technically neat, but I didn't like the implementation as it was presented, it's too easy to get it wrong when using the macros. Maybe they could be done in a cleaner way somehow. Anyway, that's a different discussion. > I'm not proposing inclusion of this patch at this point, but I'd like > to know if the idea is ok or if I should better try not to touch > the pSeries code. I think it's a bad idea, but I don't have any real strong motivations for it: Mainly the fact that right now a pSeries kernel will boot anywhere, keeping some of the rope away from users to hang themselves with by building specialized kernels. 
We've been working in the other direction lately, with PPC_MULTIPLATFORM booting both powermac and pseries/openpower machines. There's also the ambiguity of what machines require LPAR-enabled kernels. Currently any POWER5 machine would do, even HMC-less setups, while a HMC-less POWER4 setup runs a native kernel. JS20 has a thin firmware layer that emulates an LPAR environment too, which might not be obvious to everyone. Of the pSeries machines, only POWER3 and POWER4 SMP could run a non-LPAR kernel. There's just too much room for confusion. Bottom line is, besides for embedded setups where every byte counts, I'm not sure it's worth it. I can't see right now how it'd make the other patch all that much easier to add. If it does, then maybe it's a sign we should clean the code in other ways. :-) -Olof From arnd at arndb.de Thu Feb 24 00:25:41 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Wed, 23 Feb 2005 14:25:41 +0100 Subject: [RFC] splitting out LPAR support from CONFIG_PSERIES In-Reply-To: <20050223044959.GA10256@austin.ibm.com> References: <200502221723.52051.arnd@arndb.de> <20050223044959.GA10256@austin.ibm.com> Message-ID: <200502231425.42041.arnd@arndb.de> On Wednesday 23 February 2005 05:49, Olof Johansson wrote: > > > Obviously, this is not for generic distribution kernels, but I think > > it makes sense to have the option when you're building for just one > > machine. It also makes some of my subsequent patches simpler, especially > > enabling RTAS on non-pSeries machines without requiring LPAR support. > > I don't see how the RTAS-on-other-platforms could be impacted > significantly by this? Even if you can't share the code at this time, > could you describe what it enables you to do that you can't with the > LPAR code there? It's not really a significant impact for my other patches. The deal is that I'd like to rename pSeries_smp.c and pSeries_pci.c to rtas_smp.c/rtas_pci.c to make clear that they are potentially used by other platforms.
To make it work, I need to add some #ifdef around e.g. the call to vpa_init() in pSeries_smp.c. My idea was that it makes more sense to test for whether we want to allow LPAR instead of testing for pSeries. And if I introduce such an option in the first place, I also take advantage of that by allowing a pSeries kernel to be built without any of the LPAR stuff. RTAS on other platforms doesn't really require much change; the patch in this mail adds the basic config option. > I think it's a bad idea, but I don't have any real strong motivations > for it: Mainly the fact that right now a pSeries kernel will boot > anywhere, keeping some of the rope away from users to hang themselves > with by building specialized kernels. Of course, the pSeries kernel won't boot on all the platforms if you disable the device driver for the root disk or any other vital option. I just thought giving this extra option is a nice idea and I had to do some change here anyway. If nobody else likes this patch, I can simply drop it from my series and update the patches on top of it. > There's also the ambiguity of what machines require LPAR-enabled > kernels. Currently any POWER5 machine would do, even HMC-less setups, > while a HMC-less POWER4 setup runs a native kernel. JS20 has a thin > firmware layer that emulates an LPAR environment too, which might not > be obvious to everyone. Of the pSeries machines, only POWER3 and POWER4 > SMP could run a non-LPAR kernel. There's just too much room for > confusion. Thanks for that information, I've been looking for that for some time. Arnd <>< ---- RTAS is not actually pSeries specific, but some PPC64 code that relies on RTAS is currently protected by CONFIG_PPC_PSERIES. This introduces a generic configuration option PPC_RTAS that can be used by other subarchitectures as well. The existing option with the same name is renamed to the more specific RTAS_PROC.
Signed-off-by: Arnd Bergmann Index: linux-2.6-ppc/arch/ppc64/Kconfig =================================================================== --- linux-2.6-ppc.orig/arch/ppc64/Kconfig 2005-01-07 12:47:39.752931064 -0500 +++ linux-2.6-ppc/arch/ppc64/Kconfig 2005-01-07 12:48:05.167929480 -0500 @@ -245,16 +245,21 @@ config PPC_RTAS - bool "Proc interface to RTAS" + bool depends on PPC_PSERIES + default y + +config RTAS_PROC + bool "Proc interface to RTAS" + depends on PPC_RTAS config RTAS_FLASH tristate "Firmware flash interface" - depends on PPC_RTAS + depends on RTAS_PROC config SCANLOG tristate "Scanlog dump interface" - depends on PPC_RTAS + depends on RTAS_PROC && PPC_PSERIES config LPARCFG tristate "LPAR Configuration Data" Index: linux-2.6-ppc/arch/ppc64/kernel/Makefile =================================================================== --- linux-2.6-ppc.orig/arch/ppc64/kernel/Makefile 2005-01-07 12:48:02.907011000 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/Makefile 2005-01-07 12:48:05.174928416 -0500 @@ -37,7 +37,7 @@ obj-$(CONFIG_RTAS_FLASH) += rtas_flash.o obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_MODULES) += module.o ppc_ksyms.o -obj-$(CONFIG_PPC_RTAS) += rtas-proc.o +obj-$(CONFIG_RTAS_PROC) += rtas-proc.o obj-$(CONFIG_SCANLOG) += scanlog.o obj-$(CONFIG_VIOPATH) += viopath.o obj-$(CONFIG_LPARCFG) += lparcfg.o Index: linux-2.6-ppc/arch/ppc64/kernel/entry.S =================================================================== --- linux-2.6-ppc.orig/arch/ppc64/kernel/entry.S 2005-01-07 12:47:39.754930760 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/entry.S 2005-01-07 12:48:05.168929328 -0500 @@ -617,7 +617,7 @@ bl .unrecoverable_exception b unrecov_restore -#ifdef CONFIG_PPC_PSERIES +#ifdef CONFIG_PPC_RTAS /* * On CHRP, the Run-Time Abstraction Services (RTAS) have to be * called with the MMU off. 
@@ -754,7 +754,7 @@ mtlr r0 blr /* return to caller */ -#endif /* CONFIG_PPC_PSERIES */ +#endif /* CONFIG_PPC_RTAS */ #ifdef CONFIG_PPC_MULTIPLATFORM Index: linux-2.6-ppc/arch/ppc64/kernel/misc.S =================================================================== --- linux-2.6-ppc.orig/arch/ppc64/kernel/misc.S 2005-01-07 12:47:39.757930304 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/misc.S 2005-01-07 12:48:05.169929176 -0500 @@ -680,7 +680,7 @@ ld r30,-16(r1) blr -#ifndef CONFIG_PPC_PSERIES /* hack hack hack */ +#ifdef CONFIG_PPC_RTAS /* hack hack hack */ #define ppc_rtas sys_ni_syscall #endif Index: linux-2.6-ppc/arch/ppc64/kernel/prom.c =================================================================== --- linux-2.6-ppc.orig/arch/ppc64/kernel/prom.c 2005-01-07 12:47:39.759930000 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/prom.c 2005-01-07 12:48:05.171928872 -0500 @@ -890,7 +890,7 @@ if (get_flat_dt_prop(node, "linux,iommu-force-on", NULL) != NULL) iommu_force_on = 1; -#ifdef CONFIG_PPC_PSERIES +#ifdef CONFIG_PPC_RTAS /* To help early debugging via the front panel, we retreive a minimal * set of RTAS infos now if available */ @@ -906,7 +906,7 @@ rtas.size = *prop; } } -#endif /* CONFIG_PPC_PSERIES */ +#endif /* CONFIG_PPC_RTAS */ /* break now */ return 1; Index: linux-2.6-ppc/arch/ppc64/kernel/rtc.c =================================================================== --- linux-2.6-ppc.orig/arch/ppc64/kernel/rtc.c 2005-01-07 12:47:39.761929696 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/rtc.c 2005-01-07 12:48:05.172928720 -0500 @@ -337,7 +337,7 @@ } #endif -#ifdef CONFIG_PPC_PSERIES +#ifdef CONFIG_PPC_RTAS #define MAX_RTC_WAIT 5000 /* 5 sec */ #define RTAS_CLOCK_BUSY (-2) void pSeries_get_boot_time(struct rtc_time *rtc_tm) Index: linux-2.6-ppc/arch/ppc64/kernel/setup.c =================================================================== --- linux-2.6-ppc.orig/arch/ppc64/kernel/setup.c 2005-01-07 12:47:39.763929392 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/setup.c 
2005-01-07 12:48:05.173928568 -0500 @@ -603,12 +603,12 @@ */ initialize_cache_info(); -#ifdef CONFIG_PPC_PSERIES +#ifdef CONFIG_PPC_RTAS /* * Initialize RTAS if available */ rtas_initialize(); -#endif /* CONFIG_PPC_PSERIES */ +#endif /* CONFIG_PPC_RTAS */ /* * Check if we have an initrd provided via the device-tree Index: linux-2.6-ppc/arch/ppc64/oprofile/op_model_power4.c =================================================================== --- linux-2.6-ppc.orig/arch/ppc64/oprofile/op_model_power4.c 2005-01-07 12:47:39.765929088 -0500 +++ linux-2.6-ppc/arch/ppc64/oprofile/op_model_power4.c 2005-01-07 12:48:05.174928416 -0500 @@ -224,7 +224,7 @@ if (mmcra & MMCRA_SIPR) return pc; -#ifdef CONFIG_PPC_PSERIES +#ifdef CONFIG_PPC_RTAS /* Were we in RTAS? */ if (pc >= rtas.base && pc < (rtas.base + rtas.size)) /* function descriptor madness */ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: signature Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050223/39620ede/attachment.pgp From marcelo.tosatti at cyclades.com Wed Feb 23 17:15:01 2005 From: marcelo.tosatti at cyclades.com (Marcelo Tosatti) Date: Wed, 23 Feb 2005 03:15:01 -0300 Subject: [BUG][PATCH] 2.4: PPC64: 32 bit sys_recvmsg corruption In-Reply-To: <20050222132935.7f6194ba.sfr@canb.auug.org.au> References: <20050216111146.524158ce.sfr@canb.auug.org.au> <20050216002841.GA8237@wotan.suse.de> <20050216140628.70232669.sfr@canb.auug.org.au> <20050216172259.1dee3b39.sfr@canb.auug.org.au> <20050221143555.3d969f24.sfr@canb.auug.org.au> <20050221112746.GB17667@wotan.suse.de> <20050222121627.26374d83.sfr@canb.auug.org.au> <20050221175425.3bdb5c12.davem@davemloft.net> <20050222132935.7f6194ba.sfr@canb.auug.org.au> Message-ID: <20050223061501.GB5747@dmt.cnet> On Tue, Feb 22, 2005 at 01:29:35PM +1100, Stephen Rothwell wrote: > Hi Dave, Marcleo, > > On Mon, 21 Feb 2005 17:54:25 -0800 "David S. 
Miller" wrote: > > > > On Tue, 22 Feb 2005 12:16:27 +1100 > > Stephen Rothwell wrote: > > > > > Please consider for inclusion into 2.4.30. > > > > Marcelo already put in an earlier version of your patch with > > the typo in the conditional which broke compilation on every > > platform. > > > > Please send him a relative patch to fix things up. > > Sorry about that. Here is a relative patch that fixes the missing || and > removes the printk as requested by Andi. Applied, thanks. From brking at us.ibm.com Thu Feb 24 06:34:56 2005 From: brking at us.ibm.com (Brian King) Date: Wed, 23 Feb 2005 13:34:56 -0600 Subject: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only) In-Reply-To: <20050223000810.GA32744@austin.ibm.com> References: <20050223000810.GA32744@austin.ibm.com> Message-ID: <421CDAE0.8060205@us.ibm.com> Linas Vepstas wrote: > ===== drivers/scsi/ipr.c 1.31 vs edited ===== > --- 1.31/drivers/scsi/ipr.c 2004-12-14 17:06:35 -06:00 > +++ edited/drivers/scsi/ipr.c 2005-02-22 17:37:41 -06:00 > @@ -80,6 +80,8 @@ > #include > #include > #include > + > +#define CONFIG_SCSI_IPR_EEH This will obviously need to get cleaned up. This config option should go away eventually.. I am assuming it is your way of flagging the parts of code you have changed... > #include "ipr.h" > > /* > @@ -2917,7 +2919,6 @@ static int ipr_eh_host_reset(struct scsi > > if (WAIT_FOR_DUMP == ioa_cfg->sdt_state) > ioa_cfg->sdt_state = GET_DUMP; > - > rc = ipr_reset_reload(ioa_cfg, IPR_SHUTDOWN_ABBREV); Don't delete blank lines...
> LEAVE; > @@ -5007,6 +5008,67 @@ static int ipr_reset_start_bist(struct i > return rc; > } > > +#ifdef CONFIG_SCSI_IPR_EEH > + > +static int ipr_reset_shutdown_ioa(struct ipr_cmnd *ipr_cmd); > + > +#define IPR_WAIT_FOR_EEH_RESET (HZ) This should go in ipr.h with all the other timeout literals > +static int ipr_reset_poll_eeh_recovery(struct ipr_cmnd *ipr_cmd) > +{ > + struct ipr_ioa_cfg *ioa_cfg = ipr_cmd->ioa_cfg; > + int rc; > + > + ENTER; > + if (ioa_cfg->wait_on_eeh_reset) { > + ipr_reset_start_timer(ipr_cmd, IPR_WAIT_FOR_EEH_RESET); > + rc = IPR_RC_JOB_RETURN; > + } else { > + ipr_cmd->job_step = ipr_reset_start_bist; > + rc = IPR_RC_JOB_CONTINUE; > + } > + > + LEAVE; > + return rc; > +} I really don't like this polling. > + > +static void ipr_eeh_frozen (struct pci_dev *pdev, void * data) > +{ > + struct ipr_ioa_cfg *ioa_cfg = data; > + ioa_cfg->wait_on_eeh_reset = 1; > +} Probably don't need the second arg - void * data. You can get the ioa_cfg pointer with pci_get_drvdata(pdev) Also, this function should start the ipr reset job. You need to prevent ipr from talking to the device. 
Maybe something like:

static int ipr_reset_frozen(struct ipr_cmnd *ipr_cmd)
{
	struct ipr_ioa_cfg *ioa_cfg = ipr_cmd->ioa_cfg;

	list_add_tail(&ipr_cmd->queue, &ioa_cfg->pending_q);
	ipr_cmd->done = ipr_reset_ioa_job;
	return IPR_RC_JOB_RETURN;
}

static void ipr_eeh_frozen(struct pci_dev *pdev)
{
	unsigned long host_lock_flags = 0;
	struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);

	spin_lock_irqsave(ioa_cfg->host->host_lock, host_lock_flags);
	_ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_frozen, IPR_SHUTDOWN_NONE);
	spin_unlock_irqrestore(ioa_cfg->host->host_lock, host_lock_flags);
}

static void ipr_eeh_post_reset(struct pci_dev *pdev)
{
	unsigned long host_lock_flags = 0;
	struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);

	spin_lock_irqsave(ioa_cfg->host->host_lock, host_lock_flags);
	_ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_restore_cfg_space, IPR_SHUTDOWN_NONE);
	spin_unlock_irqrestore(ioa_cfg->host->host_lock, host_lock_flags);
}

That would let you get rid of all the polling code and should simplify
the code a bit.

> +static void ipr_eeh_post_reset (struct pci_dev *pdev, void * data)
> +{
> +	struct ipr_ioa_cfg *ioa_cfg = data;
> +	ioa_cfg->wait_on_eeh_reset = 0;
> +}

Same comment as above.

> +
> +static void ipr_eeh_perm_failure (struct pci_dev *pdev, void * data)
> +{
> +	// struct ipr_ioa_cfg *ioa_cfg = data;
> +
> +#if 0 // XXXXXXXXXXXXXXXXXXXXXXX
> +	ipr_cmd->job_step = ipr_reset_shutdown_ioa;
> +	rc = IPR_RC_JOB_CONTINUE;
> +#endif
> +}

This needs to "bringdown" the adapter, but not actually touch it. Basically
stuff like unblocking requests so that we fail them instead of hanging, etc.
> + > +static void ipr_register_eeh_handlers (struct ipr_ioa_cfg *ioa_cfg) > +{ > + /* XXX borken memory management; this malloc not managed */ > + struct eeh_recovery_ops *eeh_ops; > + eeh_ops = kmalloc (sizeof(struct eeh_recovery_ops), GFP_KERNEL); > + memset (eeh_ops, 0, sizeof(struct eeh_recovery_ops)); > + eeh_ops->frozen = ipr_eeh_frozen; > + eeh_ops->post_reset = ipr_eeh_post_reset; > + eeh_ops->perm_failure = ipr_eeh_perm_failure; > + eeh_ops->data = ioa_cfg; > + eeh_register_recovery_ops (ioa_cfg->pdev, eeh_ops); > +} If this was generalized a bit more, you could add these function pointers into the pci_driver struct so they could be statically initialized in each driver. > /** > * ipr_reset_allowed - Query whether or not IOA can be reset > * @ioa_cfg: ioa config struct > @@ -5042,6 +5104,7 @@ static int ipr_reset_wait_to_start_bist( > struct ipr_ioa_cfg *ioa_cfg = ipr_cmd->ioa_cfg; > int rc = IPR_RC_JOB_RETURN; > > +#ifndef CONFIG_SCSI_IPR_EEH > if (!ipr_reset_allowed(ioa_cfg) && ipr_cmd->u.time_left) { > ipr_cmd->u.time_left -= IPR_CHECK_FOR_RESET_TIMEOUT; > ipr_reset_start_timer(ipr_cmd, IPR_CHECK_FOR_RESET_TIMEOUT); > @@ -5049,6 +5112,21 @@ static int ipr_reset_wait_to_start_bist( > ipr_cmd->job_step = ipr_reset_start_bist; > rc = IPR_RC_JOB_CONTINUE; > } > +#else > + if (!ipr_reset_allowed(ioa_cfg) && ipr_cmd->u.time_left > + && !eeh_slot_is_isolated (ioa_cfg->pdev)) { > + > + ipr_cmd->u.time_left -= IPR_CHECK_FOR_RESET_TIMEOUT; > + ipr_reset_start_timer(ipr_cmd, IPR_CHECK_FOR_RESET_TIMEOUT); > + } else { > + if (eeh_slot_is_isolated (ioa_cfg->pdev)) { > + ipr_cmd->job_step = ipr_reset_poll_eeh_recovery; > + } else { > + ipr_cmd->job_step = ipr_reset_start_bist; > + } > + rc = IPR_RC_JOB_CONTINUE; > + } > +#endif Not sure what you are trying to accomplish with this bit of code. Does not seem necessary to me. 
> @@ -5079,7 +5157,16 @@ static int ipr_reset_alert(struct ipr_cm > writel(IPR_UPROCI_RESET_ALERT, ioa_cfg->regs.set_uproc_interrupt_reg); > ipr_cmd->job_step = ipr_reset_wait_to_start_bist; > } else { > +#ifndef CONFIG_SCSI_IPR_EEH > ipr_cmd->job_step = ipr_reset_start_bist; > +#else > + if (eeh_slot_is_isolated (ioa_cfg->pdev)) { > + ipr_cmd->job_step = ipr_reset_poll_eeh_recovery; > + return IPR_RC_JOB_CONTINUE; > + } else { > + ipr_cmd->job_step = ipr_reset_start_bist; > + } > +#endif > } This is not needed. > > if (rc != PCIBIOS_SUCCESSFUL) { > dev_err(&pdev->dev, "Failed to save PCI config space\n"); > ===== drivers/scsi/ipr.h 1.21 vs edited ===== > --- 1.21/drivers/scsi/ipr.h 2004-12-14 17:09:02 -06:00 > +++ edited/drivers/scsi/ipr.h 2005-02-22 15:52:36 -06:00 > @@ -833,6 +833,9 @@ struct ipr_ioa_cfg { > u8 dump_taken:1; > u8 allow_cmds:1; > u8 allow_ml_add_del:1; > +#ifdef CONFIG_SCSI_IPR_EEH > + u8 wait_on_eeh_reset:1; > +#endif If you make the changes I suggest above, you should be able to remove this flag altogether. -- Brian King eServer Storage I/O IBM Linux Technology Center From tiwari.amit at gmail.com Thu Feb 24 07:00:08 2005 From: tiwari.amit at gmail.com (Amit K Tiwari) Date: Thu, 24 Feb 2005 01:30:08 +0530 Subject: YHPC and kernel from kernel.org Message-ID: Can we use the kernel from kernel.org with YHPC? What is the additional components which Y-HPC kernel get in over and above the standard kernel? 
From benh at kernel.crashing.org Thu Feb 24 09:31:07 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 24 Feb 2005 09:31:07 +1100 Subject: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only) In-Reply-To: <421CDAE0.8060205@us.ibm.com> References: <20050223000810.GA32744@austin.ibm.com> <421CDAE0.8060205@us.ibm.com> Message-ID: <1109197867.5384.3.camel@gaston> On Wed, 2005-02-23 at 13:34 -0600, Brian King wrote: > Linas Vepstas wrote: > > ===== drivers/scsi/ipr.c 1.31 vs edited ===== > > --- 1.31/drivers/scsi/ipr.c 2004-12-14 17:06:35 -06:00 > > +++ edited/drivers/scsi/ipr.c 2005-02-22 17:37:41 -06:00 > > @@ -80,6 +80,8 @@ > > #include > > #include > > #include > > + > > +#define CONFIG_SCSI_IPR_EEH > > This will obviously need to get cleaned up. This config option should > go away eventually.. I am assuming it is your way of flagging the > parts of code you have changed... Yah, I don't intend to get that merged as-is; I want to make a proper "generic" API in the kernel for archs that can provide some sort of recovery. I asked Linas to post what he did so I could have a better idea of the way he's doing it. From linas at austin.ibm.com Thu Feb 24 12:05:38 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Wed, 23 Feb 2005 19:05:38 -0600 Subject: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only) In-Reply-To: <421CDAE0.8060205@us.ibm.com> References: <20050223000810.GA32744@austin.ibm.com> <421CDAE0.8060205@us.ibm.com> Message-ID: <20050224010538.GD2088@austin.ibm.com> On Wed, Feb 23, 2005 at 01:34:56PM -0600, Brian King was heard to remark: > Linas Vepstas wrote: > > +#define CONFIG_SCSI_IPR_EEH > > This will obviously need to get cleaned up. This config option should > go away eventually.. I am assuming it is your way of flagging the > parts of code you have changed... Yes, it's temporary scaffolding ... > I really don't like this polling. Yes, I didn't either, but I couldn't tell how else to do it in IPR.
> this function should start the ipr reset job. You need to prevent > ipr from talking to the device. Maybe something like: Yes, and that is what I couldn't figure out how to do. And now I do ... I'll try this shortly. > > +static void ipr_eeh_frozen (struct pci_dev *pdev, void * data) > > +{ > > + struct ipr_ioa_cfg *ioa_cfg = data; > > Probably don't need the second arg - void * data. You can get the > ioa_cfg pointer with pci_get_drvdata(pdev) OK, well, that's worth general discussion. For the IPR, no, it may not be needed. For general C-based OO style coding, this is the standard style for passing "user data" aka "pointer to 'self'" aka "pointer to 'this'". I notice that more and more OO style is creeping into the kernel, and so I thought I'd add the standard convention for this here. I note that this is *not* the convention currently used in struct pci_driver; it uses the "pci_get_drvdata(pdev)" style, but that is frowned upon in standard OO-style circles. > > +static void ipr_eeh_perm_failure (struct pci_dev *pdev, void * data) > > +{ > > + ipr_cmd->job_step = ipr_reset_shutdown_ioa; > > This needs to "bringdown" the adapter, but not actually touch it. Basically > stuff like unblocking requests so that we fail them instead of hanging, etc. Yes. Actually, right after this, I unconfig the pci slot, which calls pci_remove_bus_device() .. pci_destroy_dev() .. and eventually pci_driver->remove() and so IPR finds out about this sooner or later anyway. The goal of this function was to provide an alternate and/or earlier warning that the device is going away. Maybe its superfluous. I tend to add callbacks like this because I know that sooner or later someone will want one ... 
> > + > > +static void ipr_register_eeh_handlers (struct ipr_ioa_cfg *ioa_cfg) > > +{ > > + /* XXX borken memory management; this malloc not managed */ > > + struct eeh_recovery_ops *eeh_ops; > > + eeh_ops = kmalloc (sizeof(struct eeh_recovery_ops), GFP_KERNEL); > > + memset (eeh_ops, 0, sizeof(struct eeh_recovery_ops)); > > + eeh_ops->frozen = ipr_eeh_frozen; > > + eeh_ops->post_reset = ipr_eeh_post_reset; > > + eeh_ops->perm_failure = ipr_eeh_perm_failure; > > + eeh_ops->data = ioa_cfg; > > + eeh_register_recovery_ops (ioa_cfg->pdev, eeh_ops); > > +} > > If this was generalized a bit more, you could add these function > pointers into the pci_driver struct so they could be statically > initialized in each driver. Yes, that is the intent. I just stuck it in its own structure "for now". > > +#ifndef CONFIG_SCSI_IPR_EEH .... > > +#else > > + if (!ipr_reset_allowed(ioa_cfg) && ipr_cmd->u.time_left > > + && !eeh_slot_is_isolated (ioa_cfg->pdev)) { > > + > > + ipr_cmd->u.time_left -= IPR_CHECK_FOR_RESET_TIMEOUT; > > + ipr_reset_start_timer(ipr_cmd, IPR_CHECK_FOR_RESET_TIMEOUT); > > + } else { > > + if (eeh_slot_is_isolated (ioa_cfg->pdev)) { > > + ipr_cmd->job_step = ipr_reset_poll_eeh_recovery; > > + } else { > > + ipr_cmd->job_step = ipr_reset_start_bist; > > + } > > + rc = IPR_RC_JOB_CONTINUE; > > + } > > +#endif > > Not sure what you are trying to accomplish with this bit of code. Does > not seem necessary to me. I want to make sure that the IPR driver holds off from starting the BIST until after the PCI slot has been reset. The callback notification given above is async to the detection of the EEH error, and so the IPR driver may have already decided that something is wrong, and started a bist, before the "slot is frozen" callback arrives. So I wanted to prevent this. In other words, the IPR "bist" code should make sure that EEH is not to blame, and, if it is, wait for the EEH error to clear before continuing with the reset. 
--linas From benh at kernel.crashing.org Thu Feb 24 12:08:53 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 24 Feb 2005 12:08:53 +1100 Subject: YHPC and kernel from kernel.org In-Reply-To: References: Message-ID: <1109207333.5385.29.camel@gaston> On Thu, 2005-02-24 at 01:30 +0530, Amit K Tiwari wrote: > Can we use the kernel from kernel.org with YHPC? What is the > additional components which Y-HPC kernel get in over and above the > standard kernel? You should ask the YellowDog folks ... Ben. From linas at austin.ibm.com Thu Feb 24 12:14:09 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Wed, 23 Feb 2005 19:14:09 -0600 Subject: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)] In-Reply-To: <20050223174356.GH13081@kroah.com> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> Message-ID: <20050224011409.GE2088@austin.ibm.com> On Wed, Feb 23, 2005 at 09:43:57AM -0800, Greg KH was heard to remark: > On Tue, Feb 22, 2005 at 06:24:09PM -0600, Linas Vepstas wrote: > > > > The following patch implements PPC64-specific PCI error recovery. > > How about working with Ben to solve the general problem? Yes, I am now talking to Ben on a semi-regular basis. > I don't want > to see this kind of arch specific change go into the kernel tree. It > should be in the pci core, as I think everyone agrees. Ah, well, I hadn't sensed this agreement until recently. So I guess I was still tip-toeing around. > I just haven't seen any patches that put it there :) The next patch will put the recovery routines into struct pci_driver. And we'll go from there... I *really* would like to hear from Seto or anyone else working on this for PCI Express. 
--linas From benh at kernel.crashing.org Thu Feb 24 12:16:11 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 24 Feb 2005 12:16:11 +1100 Subject: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)] In-Reply-To: <20050224011409.GE2088@austin.ibm.com> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <20050224011409.GE2088@austin.ibm.com> Message-ID: <1109207771.5384.35.camel@gaston> > The next patch will put the recovery routines into > struct pci_driver. And we'll go from there... "recovery routines" isn't quite clear... what do you have in mind exactly ? Have you read my previous mails on this issue a couple of weeks ago on this list ? There are various issues there, like properly synchronizing drivers sharing the same bus segment, dealing with drivers that don't support recovery on the same segment, API suitable for other platforms that do things differently, etc... etc... > I *really* would like to hear from Seto or anyone else working > on this for PCI Express. Same. Ben. From linas at austin.ibm.com Thu Feb 24 12:31:37 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Wed, 23 Feb 2005 19:31:37 -0600 Subject: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)] In-Reply-To: <1109207532.5384.32.camel@gaston> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <1109207532.5384.32.camel@gaston> Message-ID: <20050224013137.GF2088@austin.ibm.com> On Thu, Feb 24, 2005 at 12:12:12PM +1100, Benjamin Herrenschmidt was heard to remark: > On Wed, 2005-02-23 at 09:43 -0800, Greg KH wrote: > > On Tue, Feb 22, 2005 at 06:24:09PM -0600, Linas Vepstas wrote: > > > The following patch implements PPC64-specific PCI error recovery. > > > > It should be in the pci core, as I think everyone agrees. > > I just haven't seen any patches that put it there :) > > Yes, I don't want that neither and I'm still trying to figure out what > is the best generic API to provide. 
In fact, I was sort-of waiting for > Seto comments on my last mail and his latest patches, but he's been > silent lately. > > I'll probably start writing code next week. > > Linas work is interesting as it's an example of a working setup. Even if > it's not to be merged as-is, It wasn't meant to be merged "as is", because this set of patches does have crud in it. This was a "post early, post often" code sampler. > it's useful for me to see what exactly he > had to do, and it gives me an IPR driver with proper recovery mechanisms > that I can use for testing my own stuff :) Yes, well, we now have a better recovery mechanism from Brian King; I'll try that tomorrow. I also want to do the symbios driver, and maybe one more if I can find h/w to test on. Ben, again, let me extend an offer: if you can articulate a "gee I think it should do this" in a few sentences, I can try to code that up. --linas From benh at kernel.crashing.org Thu Feb 24 12:35:12 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 24 Feb 2005 12:35:12 +1100 Subject: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)] In-Reply-To: <20050224013137.GF2088@austin.ibm.com> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <1109207532.5384.32.camel@gaston> <20050224013137.GF2088@austin.ibm.com> Message-ID: <1109208912.14993.0.camel@gaston> On Wed, 2005-02-23 at 19:31 -0600, Linas Vepstas wrote: > Yes, well, we now have a better recovery mechanism from Brian King; I'll > try that tomorrow. I also want to do the symbios driver, and maybe one > more if I can find h/w to test on. > > Ben, again, let me extend an offer: if you can articulate a "gee I think > it should do this" in a few sentences, I can try to code that up. Fine with me, but I'd really like some feedback from Seto first since I intend to base the recovery part on top of his error reporting patches, and he's working on a new set of those afaik.
Anyway, I'll come up with a description of what I have in mind for the recovery asap. Ben. From linas at austin.ibm.com Thu Feb 24 13:14:36 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Wed, 23 Feb 2005 20:14:36 -0600 Subject: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)] In-Reply-To: <1109207771.5384.35.camel@gaston> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <20050224011409.GE2088@austin.ibm.com> <1109207771.5384.35.camel@gaston> Message-ID: <20050224021436.GG2088@austin.ibm.com> Hi Ben, On Thu, Feb 24, 2005 at 12:16:11PM +1100, Benjamin Herrenschmidt was heard to remark: > > > The next patch will put the recovery routines into > > struct pci_driver. And we'll go from there... > > "recovery routines" isn't quite clear... what do you have in mind > exactly ? OK, see below. > Have you read my previous mails on this issue a couple of > weeks ago on this list ? Yes. And I liked them ... > There are various issues there, like properly synchronizing drivers > sharing the same bus segment, dealing with drivers that don't support > recovery on the same segment, API suitable for other platforms that do > things differently, etc... etc... Yes, well, these are the same old issues since the dawn of time. When I read your emails, I sensed that you were heading back to the "original" design point, which had been shredded, but was the one that made the most sense to me. So I took your emails in a very positive light :) --------------- So, here was the general thinking: > properly synchronizing drivers > sharing the same bus segment, This is done by having a "master eeh reset/recovery thread", which synchronizes the recovery across any/all device drivers that are affected. It does this by calling back into each device driver to tell them things like, "now the bus is frozen", "now the bus is being reset", "now the bus is finished being reset". Each device driver can handle these as desired.
In the most recent patch, I put these callbacks into a "struct eeh_recovery_ops" for now, but these should probably go into "struct pci_driver". Right now, the names and purpose of these callbacks are a rough "first cut" and subject to change. ----------- Historically, one of the issues that was repeatedly argued was whether the "master eeh reset/recovery thread" should run in the kernel or in user space. I think things are a lot simpler if this "master recovery thread" runs in the kernel. > API suitable for other platforms that do things differently, I'm currently envisioning the "master recovery thread" to be arch-specific and/or pci-bus-controller+bios/firmware-specific. This avoids having to figure out how to design a generic recovery procedure that would be arch-indep, and yet still capable of resetting a pci bus for any weird current or future pci bus design. In the current patch I sent out, this "master recovery thread" is implemented in the "rpaphp" code, and thus is specific to the rpa pci bus. ------------- > dealing with drivers that don't support > recovery on the same segment, In the current patch, if a driver failed to register a "struct eeh_recovery_ops", then that driver would get whacked by rpaphp_unconfig_pci_adapter(). This works, because, ...
rpaphp_unconfig_pci_adapter (struct slot *)   // in rpaphp_pci.c
{
    calls pci_remove_bus_device (struct pci_dev *)   // in /drivers/pci/remove.c
    {
        calls pci_destroy_dev (struct pci_dev *)
        {
            etc ...
            calls struct pci_driver->remove()
                which frees memory .. etc.
                also generates user-land events to udev, and so on.
------------- The bonus of using hotplug to recover is that any hotplug-capable driver can be recovered (although this is a heavy-handed approach). This works almost everywhere, except for anything that holds a block device, where we can't just unmount filesystems after the fact.
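For illustration only (this is not code from the patch): a minimal user-space sketch of the flow described above, in which a master recovery pass tells every driver on a segment about each bus event in turn, and drivers that registered no callbacks are removed hotplug-style. All names here (`recovery_ops`, `fake_driver`, `recover_segment`, `record_event`) are hypothetical, and the real "struct eeh_recovery_ops" may well look different.

```c
#include <stddef.h>

/* Hypothetical event names for the three notifications in the text. */
enum bus_event { BUS_FROZEN, BUS_RESETTING, BUS_RESET_DONE };

/* Hypothetical stand-in for the per-driver recovery callbacks. */
struct recovery_ops {
    void (*notify)(void *drvdata, enum bus_event ev);
};

struct fake_driver {
    const char *name;
    const struct recovery_ops *ops; /* NULL: driver has no recovery support */
    void *drvdata;
    int removed;                    /* set when we fall back to hot-unplug */
};

/* Example callback: counts how often each event was seen. */
void record_event(void *drvdata, enum bus_event ev)
{
    int *seen = drvdata;
    seen[ev]++;
}

/*
 * One pass of the "master recovery thread": drivers without callbacks
 * are marked removed, then every remaining driver hears each event
 * before the next event is sent. Returns the number of removed drivers.
 */
int recover_segment(struct fake_driver *drv, size_t n)
{
    static const enum bus_event phases[] = {
        BUS_FROZEN, BUS_RESETTING, BUS_RESET_DONE
    };
    int removed = 0;
    size_t i, p;

    for (i = 0; i < n; i++) {
        if (!drv[i].ops) {
            drv[i].removed = 1;
            removed++;
        }
    }
    for (p = 0; p < sizeof(phases) / sizeof(phases[0]); p++)
        for (i = 0; i < n; i++)
            if (drv[i].ops && drv[i].ops->notify)
                drv[i].ops->notify(drv[i].drvdata, phases[p]);
    return removed;
}
```

Sending each event to all drivers before moving to the next one is what keeps drivers that share a segment in step; whether the real thread orders things this way is an assumption of the sketch.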
--linas From michael at ellerman.id.au Thu Feb 24 15:17:58 2005 From: michael at ellerman.id.au (Michael Ellerman) Date: Thu, 24 Feb 2005 15:17:58 +1100 Subject: [RFC/PATCH] ppc64: Add mem=X option In-Reply-To: <20050223034003.GB15427@austin.ibm.com> References: <20050222192423.727023f7.michael@ellerman.id.au> <20050223034003.GB15427@austin.ibm.com> Message-ID: <200502241518.02432.michael@ellerman.id.au> Hi Olof, Thanks for the comments, I'll send an updated patch later today or tomorrow. I'll fix On Wed, 23 Feb 2005 14:40, Olof Johansson wrote: > > + for (i = 0; i < mem->cnt; i++) { > > + total += mem->region[i].size; > > + > > + if (total <= memory_limit) > > + continue; > > + > > + crop = (memory_limit - (total - mem->region[i].size)); ... > > + mem->region[i].size = crop; > > + mem->cnt = i + 1; > > + break; > > Also, if you just reduce the size of memory_limit instead of keep the > rolling total, the crop calculation will be simpler. It took a bit of > thinking to make sure it's right the way it's written now. I've rewritten this twice already, and I agree it's a bit ugly. I hadn't thought of decrementing the limit. I'll fix it to avoid a 0 sized region too. > > + /* Bolt kernel mappings for all of memory */ > > + iSeries_bolt_kernel(0, systemcfg->physicalMemorySize); > > Do you want to bolt it all, even if there's a mem= limitation? I'm not sure, it seems to boot happily either way. But I think it's better not to bolt it all - that more closely resembles the situation we're trying to simulate, ie that there is no more RAM than the limit. > > + if (unlikely(memory_limit && tce_alloc_start && tce_alloc_end)) { > > Do you need to check both tce_alloc_start and tce_alloc_end? Won't both > be set if one is? Yeah either both tce variables should be set or none. 
> > + if (base + size >= tce_alloc_start) > > + tce_alloc_start = base + size + 1; > > + > > + create_pte_mapping(tce_alloc_start, tce_alloc_end, > > + mode_rw, use_largepages); > > Could tce_alloc_end ever be below memory_limit too? No. tce_alloc_end is just ram_top in prom_init.c (before memory_limit) But I might change the prom_init.c code to make that more explicit, rather than the current code where we copy ram_top -> alloc_top_high -> tce_alloc_end. > You might need to check tce_alloc_start and end to make sure you can use > 16MB pages, or if you need 4K because of alignment/size constraints. > > Even if you don't have to, a comment related to it could be warranted. Not sure what you mean here. If start/end aren't 16MB aligned then I should use 4K pages for the mapping? > > + for (i++; i <= max_domain; i++) { > > I think I would change this to a regular while(++i < max_domain) loop > instead. Agreed. That's ugly, I just copied the for loop that was there. > > + dbg("NUMA: offlining node %ld for memory_limit\n", i); > > + node_set_offline(i); > > Just because the node doesn't get memory allocated we can't set it > offline. I.e. the cpus will still be online. Ah crud, I thought I was just offlining the memory. > > + for (i = 0; i <= max_domain; i++) > > + node_set_online(i); > > Good question, I think can go. That code was added by Matt Dobson earlier > this year. Well, if I'm not calling node_set_offline() above then it doesn't matter. He added that when he added the for_each_online_node() stuff, and if there are gaps in the node numbering then the extra for loop does have an effect, I'm not game to touch it in case it's doing something subtle that I'm missing. > > +static unsigned long __initdata memory_limit; > > +static unsigned long __initdata tce_alloc_start; > > +static unsigned long __initdata tce_alloc_end; > > A little confusing to have the same variable names here as local > statics. Maybe rename them? Or use the global. 
Benh wants to keep the prom_init <=> kernel interface clean so I'll keep them but rename them. > > + while (isxdigit(*cp) && > > + (value = isdigit(*cp) ? *cp - '0' : toupper(*cp) - 'A' + 10) < > > base) { + result = result * base + value; > > + cp++; > > + } > > Hmm. Would you mind breaking that up? It'll be a few more lines but much > easier to read. Yeah it's not pretty, I just copied it from the regular strtoul but I'll clean it up. > > +unsigned long prom_memparse(const char *ptr, const char **retptr) > > +{ > > + unsigned long ret = prom_strtoul(ptr, retptr); > > + > > + switch (**retptr) { > > + case 'G': > > + case 'g': > > + ret <<= 10; > > + case 'M': > > + case 'm': > > + ret <<= 10; > > + case 'K': > > + case 'k': > > + ret <<= 10; > > + (*retptr)++; > > Do other architectures swallow/tolerate a b/B after the unit? Could be > nice. Yes and no. That's copied from the generic memparse() which everyone else uses. And although it doesn't have a b/B case it shouldn't choke if we do have one, it'll just be ignored and the next strstr() will skip it. > > + /* Align to 16 MB == size of large page */ > > + RELOC(memory_limit) = ALIGN(RELOC(memory_limit), 0x1000000); > > Maybe a printk to say that it's been rounded up, so we don't surprise > the user? Good plan. > > + if (RELOC(memory_limit) <= RELOC(alloc_bottom)) { > > + prom_printf("Ignoring mem=%x <= alloc_bottom.\n", > > + RELOC(memory_limit)); > > + RELOC(memory_limit) = 0; > > ...or should it just be bumped up to include alloc_bottom instead? That would just mean the first allocation will fail and we'll panic, because alloc_bottom == memory_limit == alloc_top (see alloc_up() in prom_init.c). We could just get rid of the checking but I figure it's nicer to complain and still boot than panic. cheers! 
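For illustration only (not the updated patch): a small sketch of the "decrement the limit" approach to cropping the memory map discussed at the top of this mail. The `region` and `mem_map` types and the function name `apply_memory_limit` are hypothetical stand-ins for the real structures.

```c
#include <stddef.h>

/* Hypothetical, simplified memory map. */
struct region { unsigned long base, size; };
struct mem_map { size_t cnt; struct region region[8]; };

/*
 * Trim the map so that the region sizes sum to at most `limit`.
 * A limit of 0 means "no limit". Instead of keeping a running total,
 * the remaining limit is decremented as regions are consumed.
 */
void apply_memory_limit(struct mem_map *mem, unsigned long limit)
{
    size_t i;

    if (!limit)
        return;

    for (i = 0; i < mem->cnt; i++) {
        if (limit > mem->region[i].size) {
            /* Whole region fits; use it up and keep going. */
            limit -= mem->region[i].size;
            continue;
        }
        /*
         * limit <= size: this is the last region kept, cropped to what
         * is left. limit is never 0 here, so no empty region results.
         */
        mem->region[i].size = limit;
        mem->cnt = i + 1;
        return;
    }
}
```

Because the loop only subtracts when the limit is strictly larger than the region, the limit stays nonzero until the crop, which is what avoids the 0 sized region.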
-- Michael Ellerman OzLabs Canberra IBM Linux Technology Centre ==================================== Phone: +61 2 6212 1183 Email: michael at ellerman.id.au WWWeb: http://michael.ellerman.id.au From olof at austin.ibm.com Thu Feb 24 15:38:44 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Wed, 23 Feb 2005 22:38:44 -0600 Subject: [RFC/PATCH] ppc64: Add mem=X option In-Reply-To: <200502241518.02432.michael@ellerman.id.au> References: <20050222192423.727023f7.michael@ellerman.id.au> <20050223034003.GB15427@austin.ibm.com> <200502241518.02432.michael@ellerman.id.au> Message-ID: <20050224043844.GA16486@austin.ibm.com> Hi, On Thu, Feb 24, 2005 at 03:17:58PM +1100, Michael Ellerman wrote: > Thanks for the comments, I'll send an updated patch later today or tomorrow. > I'll fix Cool, a couple of comments/answers below. > On Wed, 23 Feb 2005 14:40, Olof Johansson wrote: > > > + for (i = 0; i < mem->cnt; i++) { > > > + total += mem->region[i].size; > > > + > > > + if (total <= memory_limit) > > > + continue; > > > + > > > + crop = (memory_limit - (total - mem->region[i].size)); > ... > > > + mem->region[i].size = crop; > > > + mem->cnt = i + 1; > > > + break; > > > > Also, if you just reduce the size of memory_limit instead of keep the > > rolling total, the crop calculation will be simpler. It took a bit of > > thinking to make sure it's right the way it's written now. > > I've rewritten this twice already, and I agree it's a bit ugly. I hadn't > thought of decrementing the limit. I'll fix it to avoid a 0 sized region too. > > > > + /* Bolt kernel mappings for all of memory */ > > > + iSeries_bolt_kernel(0, systemcfg->physicalMemorySize); > > > > Do you want to bolt it all, even if there's a mem= limitation?
> > I'm not sure, it seems to boot happily either way. But I think it's better not > to bolt it all - that more closely resembles the situation we're trying to > simulate, ie that there is no more RAM than the limit. The only drawback of always bolting it all is that noone will find bugs caused by referencing kernel memory that's past the limit. :-) > > > + if (unlikely(memory_limit && tce_alloc_start && tce_alloc_end)) { > > > > Do you need to check both tce_alloc_start and tce_alloc_end? Won't both > > be set if one is? > > Yeah either both tce variables should be set or none. > > > > + if (base + size >= tce_alloc_start) > > > + tce_alloc_start = base + size + 1; > > > + > > > + create_pte_mapping(tce_alloc_start, tce_alloc_end, > > > + mode_rw, use_largepages); > > > > Could tce_alloc_end ever be below memory_limit too? > > No. tce_alloc_end is just ram_top in prom_init.c (before memory_limit) > But I might change the prom_init.c code to make that more explicit, rather > than the current code where we copy ram_top -> alloc_top_high -> > tce_alloc_end. I'm happy with the way it is now, I just wanted to ask. > > You might need to check tce_alloc_start and end to make sure you can use > > 16MB pages, or if you need 4K because of alignment/size constraints. > > > > Even if you don't have to, a comment related to it could be warranted. > > Not sure what you mean here. If start/end aren't 16MB aligned then I should > use 4K pages for the mapping? Well, the current create_pte_mapping will, if 16MB pages are used, potentially map a bit more than the limit if it's not on a 16MB boundary. Likewise, if the tce tables aren't starting/ending on 16MB boundaries, more will be mapped. Either you can just ignore it and round accordingly, or you'll need to downgrade to use 4KB mappings with less granularity. The problem then becomes that you can't mix the page sizes in the same segment, so you'd need to map either the whole kernel or just the segments in question with 4K pages. 
Rounding is probably better and easier. Since you already do it for memory_limit, adding it for the tce range (start and end) is all that's needed. > > > + for (i++; i <= max_domain; i++) { > > > > I think I would change this to a regular while(++i < max_domain) loop > > instead. > > Agreed. That's ugly, I just copied the for loop that was there. > > > > + dbg("NUMA: offlining node %ld for memory_limit\n", i); > > > + node_set_offline(i); > > > > Just because the node doesn't get memory allocated we can't set it > > offline. I.e. the cpus will still be online. > > Ah crud, I thought I was just offlining the memory. > > > > + for (i = 0; i <= max_domain; i++) > > > + node_set_online(i); > > > > Good question, I think can go. That code was added by Matt Dobson earlier > > this year. > > Well, if I'm not calling node_set_offline() above then it doesn't matter. > He added that when he added the for_each_online_node() stuff, and if there are > gaps in the node numbering then the extra for loop does have an effect, I'm > not game to touch it in case it's doing something subtle that I'm missing. Sure, it can always be taken out later if someone feels up for it and can prove it won't break anything. NUMA is messy in the sense that it takes a while for a weird config case to show up that will break assumptions, etc. > > > +static unsigned long __initdata memory_limit; > > > +static unsigned long __initdata tce_alloc_start; > > > +static unsigned long __initdata tce_alloc_end; > > > > A little confusing to have the same variable names here as local > > statics. Maybe rename them? Or use the global. > > Benh wants to keep the prom_init <=> kernel interface clean so I'll keep them > but rename them. Good point. Separate is fine with me. > > > + while (isxdigit(*cp) && > > > + (value = isdigit(*cp) ? *cp - '0' : toupper(*cp) - 'A' + 10) < > > > base) { + result = result * base + value; > > > + cp++; > > > + } > > > > Hmm. Would you mind breaking that up? 
It'll be a few more lines but much > > easier to read. > > Yeah it's not pretty, I just copied it from the regular strtoul but I'll clean > it up. Ah, might as well leave it be then, there's value in keeping the code common even if the original was a bit messy. > > Do other architectures swallow/tolerate a b/B after the unit? Could be > > nice. > > Yes and no. That's copied from the generic memparse() which everyone else > uses. And although it doesn't have a b/B case it shouldn't choke if we do > have one, it'll just be ignored and the next strstr() will skip it. Good point. Should be fine the way it is then. > > > + /* Align to 16 MB == size of large page */ > > > + RELOC(memory_limit) = ALIGN(RELOC(memory_limit), 0x1000000); > > > > Maybe a printk to say that it's been rounded up, so we don't surprise > > the user? > > Good plan. > > > > + if (RELOC(memory_limit) <= RELOC(alloc_bottom)) { > > > + prom_printf("Ignoring mem=%x <= alloc_bottom.\n", > > > + RELOC(memory_limit)); > > > + RELOC(memory_limit) = 0; > > > > ...or should it just be bumped up to include alloc_bottom instead? > > That would just mean the first allocation will fail and we'll panic, because > alloc_bottom == memory_limit == alloc_top (see alloc_up() in prom_init.c). We > could just get rid of the checking but I figure it's nicer to complain and > still boot than panic. Yep, that's better. The alternative would be to add some sort of arbitrary margin for the machine to have enough memory to come up, but that's way ugly. 
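For illustration only: a sketch of the 16MB rounding suggested earlier in this mail for the tce range. The helper names are hypothetical, and the rounding direction (start down, end up, so that the large-page mapping covers the whole table) is an assumption; the mail does not spell it out.

```c
/* 16 MB, the large page size used in the patch's ALIGN() call. */
#define LARGE_PAGE_SIZE 0x1000000UL

/* Round an address down to the previous 16 MB boundary. */
unsigned long round_down_16m(unsigned long addr)
{
    return addr & ~(LARGE_PAGE_SIZE - 1);
}

/* Round an address up to the next 16 MB boundary. */
unsigned long round_up_16m(unsigned long addr)
{
    return (addr + LARGE_PAGE_SIZE - 1) & ~(LARGE_PAGE_SIZE - 1);
}

/*
 * Widen [*start, *end) so both ends sit on 16 MB boundaries. This maps
 * a bit more than strictly needed instead of downgrading to 4K pages.
 */
void round_tce_range(unsigned long *start, unsigned long *end)
{
    *start = round_down_16m(*start);
    *end = round_up_16m(*end);
}
```

An address that is already aligned is left unchanged by both helpers, so applying this to a range that is already on 16MB boundaries is a no-op.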
-Olof From seto.hidetoshi at jp.fujitsu.com Fri Feb 25 01:04:39 2005 From: seto.hidetoshi at jp.fujitsu.com (Hidetoshi Seto) Date: Thu, 24 Feb 2005 23:04:39 +0900 Subject: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)] In-Reply-To: <20050224011409.GE2088@austin.ibm.com> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <20050224011409.GE2088@austin.ibm.com> Message-ID: <421DDEF7.7080103@jp.fujitsu.com> Linas Vepstas wrote: > I *really* would like to hear from Seto or anyone else working > on this for PCI Express. Sorry to my late reply. I've been stuck in other stuffs... and it took me a long time to read these codes. It will be helpful to understand if you could divide the patch into some parts, for example arch/kernel stuff and drivers. I also agree with Greg's remark, however I know that PCI recovery will not be implemented without arch-specific codes, at least in this time. So I think what we have to do is design some generic front interfaces and implement specific background codes. Your code seems good, three callbacks, master recovery thread... they are great, I believe. But as you know still here is a basic question: "Are they also good/enough for other platforms?" It's also important to remember that PPC64 already has special infrastructure. PPC64 always uses quite cautious eeh_readX(), so it can detect every error almost synchronously in the affected context, and maybe can react to the error on the time of happening. AFAIK it's special. Most of archs don't like neither doing nervous check nor heavy firmcall in golden route such as read(). And then, PPC64 has "automatic PCI-bus isolation" system, which sounds very high-tech and efficient. Even expensive magical box called ia64 don't have such... well, anyway I think that recovery on PPC64 is blessed with such nice environment. Unfortunately or fortunately, your approach to PCI error recovery and mine are significantly different, maybe good to compare. 
Still now I use conservative designed API like: { iochk_clear(cookie,dev); io,io,io... if(iochk_read(cookie)) return -EAGAIN; } It allows drivers to make IO-critical section. Based on tradition that error checking is too heavy to do so frequently, frequency of check is flexibly adjustable. For example, impatient driver will put io into the section as many as possible, to reduce the overhead of error check. Cautious driver will put only one io, to reduce the damage of an error. You have let me realize that: "The most cautious arch I know, PPC64, would not need to use this API." I had already code prototype of ia64 specific part with this API, so it's too bad if you are disappointed at them. But in the same time, I'm afraid that currently some arch would not have both of proper chance and enough infrastructure to call callbacks. Is it possible that my API can use as such infrastructure? Imagine - possible mix: - RAS-aware driver registers callbacks to some struct on init - check before IOs (ex. block if bus recovery is processing...) - do IOs... (ex. shut up device on error etc.) - check after IOs (ex. IO rendezvous, recover, return result...) - master-recovery-thread handles extra more... : Is this sounds good for generic purposes? Ah... I might have wrote too much :-p At last, I guess I'll effort but would not be able to reply so often. However I'll be glad if you could keep me in cc and engage in this discussion. 
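For illustration only: a user-space sketch of the IO-critical-section pattern shown above. The `iochk_clear()`/`iochk_read()` names follow the snippet in this mail, but the cookie layout, `struct fake_dev`, `fake_readl()` and `read_pair()` are all hypothetical; the real ia64 prototype may work quite differently.

```c
#include <errno.h>

/* Hypothetical device: counts errors seen on its (simulated) bus. */
struct fake_dev {
    int broken;                 /* nonzero: every read fails */
    unsigned long error_count;  /* bumped on each failed access */
    unsigned int regs[2];
};

/* Hypothetical cookie: remembers the error count at section start. */
struct iocookie {
    const struct fake_dev *dev;
    unsigned long errors_at_start;
};

/* Simulated MMIO read; a failed read returns all ones. */
unsigned int fake_readl(struct fake_dev *dev, int idx)
{
    if (dev->broken) {
        dev->error_count++;
        return ~0u;
    }
    return dev->regs[idx];
}

/* Open the critical section: snapshot the device's error state. */
void iochk_clear(struct iocookie *cookie, const struct fake_dev *dev)
{
    cookie->dev = dev;
    cookie->errors_at_start = dev->error_count;
}

/* Close the section: nonzero if any error happened since iochk_clear(). */
int iochk_read(const struct iocookie *cookie)
{
    return cookie->dev->error_count != cookie->errors_at_start;
}

/* A driver routine that puts two reads inside one checked section. */
int read_pair(struct fake_dev *dev, unsigned int *a, unsigned int *b)
{
    struct iocookie cookie;

    iochk_clear(&cookie, dev);
    *a = fake_readl(dev, 0);
    *b = fake_readl(dev, 1);
    if (iochk_read(&cookie))
        return -EAGAIN;
    return 0;
}
```

How many accesses go between the two calls is the tunable part: more accesses per section means fewer checks, fewer accesses means less work is thrown away when a check fails.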
Thanks, H.Seto From brking at us.ibm.com Fri Feb 25 01:56:56 2005 From: brking at us.ibm.com (Brian King) Date: Thu, 24 Feb 2005 08:56:56 -0600 Subject: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only) In-Reply-To: <20050224010538.GD2088@austin.ibm.com> References: <20050223000810.GA32744@austin.ibm.com> <421CDAE0.8060205@us.ibm.com> <20050224010538.GD2088@austin.ibm.com> Message-ID: <421DEB38.1020504@us.ibm.com> Linas Vepstas wrote: >>>+static void ipr_eeh_frozen (struct pci_dev *pdev, void * data) >>>+{ >>>+ struct ipr_ioa_cfg *ioa_cfg = data; >> >>Probably don't need the second arg - void * data. You can get the >>ioa_cfg pointer with pci_get_drvdata(pdev) > > > OK, well, that's worth general discussion. For the IPR, no, > it may not be needed. For general C-based OO style coding, > this is the standard style for passing "user data" aka > "pointer to 'self'" aka "pointer to 'this'". I notice that > more and more OO style is creeping into the kernel, and so > I thought I'd add the standard convention for this here. > > I note that this is *not* the convention currently used in > struct pci_driver; it uses the "pci_get_drvdata(pdev)" style, > but that is frowned upon in standard OO-style circles. I think we will all agree we are not working in a standard OO-style circle;) >>>+static void ipr_eeh_perm_failure (struct pci_dev *pdev, void * data) >>>+{ >>>+ ipr_cmd->job_step = ipr_reset_shutdown_ioa; >> >>This needs to "bringdown" the adapter, but not actually touch it. Basically >>stuff like unblocking requests so that we fail them instead of hanging, etc. > > > Yes. Actually, right after this, I unconfig the pci slot, > which calls pci_remove_bus_device() .. pci_destroy_dev() .. > and eventually pci_driver->remove() and so IPR finds out about this > sooner or later anyway. The goal of this function was to provide > an alternate and/or earlier warning that the device is going away. > > Maybe its superfluous. 
I tend to add callbacks like this because > I know that sooner or later someone will want one ... Ok. I wasn't sure about that. It in that case, I think its fine to leave the callback in. I still think the perm_failure handler should do the bringdown in ipr, however. >>>+#else >>>+ if (!ipr_reset_allowed(ioa_cfg) && ipr_cmd->u.time_left >>>+ && !eeh_slot_is_isolated (ioa_cfg->pdev)) { >>>+ >>>+ ipr_cmd->u.time_left -= IPR_CHECK_FOR_RESET_TIMEOUT; >>>+ ipr_reset_start_timer(ipr_cmd, IPR_CHECK_FOR_RESET_TIMEOUT); >>>+ } else { >>>+ if (eeh_slot_is_isolated (ioa_cfg->pdev)) { >>>+ ipr_cmd->job_step = ipr_reset_poll_eeh_recovery; >>>+ } else { >>>+ ipr_cmd->job_step = ipr_reset_start_bist; >>>+ } >>>+ rc = IPR_RC_JOB_CONTINUE; >>>+ } >>>+#endif >> >>Not sure what you are trying to accomplish with this bit of code. Does >>not seem necessary to me. > > > I want to make sure that the IPR driver holds off from starting the BIST > until after the PCI slot has been reset. The callback notification > given above is async to the detection of the EEH error, and so the IPR > driver may have already decided that something is wrong, and started a > bist, before the "slot is frozen" callback arrives. So I wanted to > prevent this. > > In other words, the IPR "bist" code should make sure that EEH is not to > blame, and, if it is, wait for the EEH error to clear before continuing > with the reset. I still don't think it is needed, given the changes I suggested in my previous mail. With the changes I proposed, ipr could detect a problem and start an adapter reset before the freeze callback arrives, but that should do no harm. When the freeze callback does come, it will start a new reset job and the old one will get aborted. 
-- Brian King eServer Storage I/O IBM Linux Technology Center From cfriesen at nortel.com Fri Feb 25 03:54:17 2005 From: cfriesen at nortel.com (Chris Friesen) Date: Thu, 24 Feb 2005 10:54:17 -0600 Subject: looking for help with scomc/scomd registers on 970 Message-ID: <421E06B9.2000504@nortel.com> Not strictly linux related, but I thought someone might know the answer. I've got a request from someone that wants to be able to flush the L2 on the 970. The user manual has a procedure to do this, but it involves first flipping the cache to direct-mapped mode by setting SCOM register 0x43000 bit 0x8000. The only thing is, I can't find any linux code that ever touches the SCOM stuff, and the manual has no examples of *reading* from the SCOM area, just writing to it, so I'm not entirely sure how to do that. A google search found the following snippet of darwin code: lis r8,cFIR ; Get the Core FIR register address ori r8,r8,0x8000 ; Set to read data sync mtspr scomc,r8 ; Request the Core FIR mfspr r25,scomd ; Get the source mfspr r8,scomc ; Get back the status (we just ignore it) sync isync This implies that bit 0x8000 needs to be set to specify a read command, and that we need to read the status after the read. Does anyone know if this is in fact the case? Also, in that code they make reference to early chip revisions that returned scom reads shifted by one bit. Does anyone know which versions are affected? Thanks, Chris From will_schmidt at vnet.ibm.com Fri Feb 25 07:41:07 2005 From: will_schmidt at vnet.ibm.com (will schmidt) Date: Thu, 24 Feb 2005 14:41:07 -0600 Subject: RFC/Patch more xmon additions Message-ID: <421E3BE3.90301@vnet.ibm.com> Hi Folks, Am looking for comments on this additional function i've added to xmon on the side.. the bulk of my intent was to make it easier for me to poke at memory within a particular user process. I realize that the spacing is a bit screwed up, and the function names should eventually change. 
Because i couldnt decide on letters for the new functions, i put them under a submenu 'w'. wP will dump info on all processes. wp 0xabc will make process with pid 0xabc the active pid. <- active only with respect to xmon poking into memory. wd 0xabcd1234 - will call through the pdg/pmd functions and return the kernel address corresponding to 0xabcd1234 within the processes memory space location. wg will dump gprs of the process/thread. -Will -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: xmon_pxd_code.diff Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050224/86bee291/attachment.txt From linas at austin.ibm.com Fri Feb 25 10:14:55 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 24 Feb 2005 17:14:55 -0600 Subject: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)] In-Reply-To: <421DDEF7.7080103@jp.fujitsu.com> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <20050224011409.GE2088@austin.ibm.com> <421DDEF7.7080103@jp.fujitsu.com> Message-ID: <20050224231455.GH2088@austin.ibm.com> Hi Hide, I am very glad to hear from you. On Thu, Feb 24, 2005 at 11:04:39PM +0900, Hidetoshi Seto was heard to remark: > > It's also important to remember that PPC64 already has special > infrastructure. PPC64 always uses quite cautious eeh_readX(), > so it can detect every error almost synchronously in the affected > context, and maybe can react to the error on the time of happening. > AFAIK it's special. Most of archs don't like neither doing nervous > check nor heavy firmcall in golden route such as read(). The reason ppc64 checks for a possible PCI error every time is because this was the only thing we could think of without actually modifying any device drivers. **If** a device driver is modified, then the check for errors can be made much less frequent. 
However, we thought that most device driver maintainers would reject a ppc64-only patch, and so we picked the simplest/dumbest thing that would work. > called ia64 don't have such... well, anyway I think that recovery > on PPC64 is blessed with such nice environment. Thank you :) > Unfortunately or fortunately, your approach to PCI error recovery Let us distinguish the terms "error recovery" and "error detection" -- "detection" is finding out that an error occurred -- "recovery" is the sequence of steps taken to make the PCI device usable again. > Still now I use conservative designed API like: > { > iochk_clear(cookie,dev); > io,io,io... > if(iochk_read(cookie)) return -EAGAIN; > } > It allows drivers to make IO-critical section. Based on tradition that > error checking is too heavy to do so frequently, frequency of check > is flexibly adjustable. For example, impatient driver will put io into > the section as many as possible, to reduce the overhead of error check. > Cautious driver will put only one io, to reduce the damage of an error. Yes, this interface for "detection" would be good. I could (Ben could?) easily code up this kind of interface, once we agree what the names and arguments of the subroutines are (iochk_clear()? pci_iochk_clear? pci_ioblock_begin()/pci_ioblock_end() ?) The hard part is to start converting device drivers to use this interface; the other hard part (for you) is to decide what to do about the device drivers that have not converted to this interface. > You have let me realize that: > "The most cautious arch I know, PPC64, would not need to use this API." > I had already code prototype of ia64 specific part with this API, so > it's too bad if you are disappointed at them. I am not disappointed. It's a good idea in general. We should talk about the detection API details. Care to propose these details? (i.e. what's "cookie", how do you get a cookie, etc?)
> Imagine - possible mix: > - RAS-aware driver registers callbacks to some struct on init Yes. Which structure? struct pci_driver? > - check before IOs (ex. block if bus recovery is processing...) Many drivers do i/o in an interrupt context; we cannot block that i/o without hanging the kernel. What happens if iochk_clear() blocks, waiting for the bus to reset, while the device driver tries to do i/o from a timer interrupt? > - do IOs... (ex. shut up device on error etc.) > - check after IOs (ex. IO rendezvous, recover, return result...) Yes. > - master-recovery-thread handles extra more... How should the master recovery thread be invoked? > Is this sounds good for generic purposes? Yes. I'd like to discuss specifics of the actual names and arguments and descriptions of the subroutines as soon as possible. > Ah... I might have wrote too much :-p No. --linas From linas at austin.ibm.com Fri Feb 25 10:28:22 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 24 Feb 2005 17:28:22 -0600 Subject: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only) In-Reply-To: <421DEB38.1020504@us.ibm.com> References: <20050223000810.GA32744@austin.ibm.com> <421CDAE0.8060205@us.ibm.com> <20050224010538.GD2088@austin.ibm.com> <421DEB38.1020504@us.ibm.com> Message-ID: <20050224232822.GI2088@austin.ibm.com> On Thu, Feb 24, 2005 at 08:56:56AM -0600, Brian King was heard to remark: > > I think we will all agree we are not working in a standard OO-style > circle;) Ah, come on, more and more OO stuff is sneaking into the kernel all the time. Its not a bad style (although I agree that much foolishness can be blamed on novice programmers carried away by giddy OO concepts). > >>>+static void ipr_eeh_perm_failure (struct pci_dev *pdev, void * data) > >>>+{ > >>>+ ipr_cmd->job_step = ipr_reset_shutdown_ioa; > >> > >>This needs to "bringdown" the adapter, but not actually touch it. Basically > >>stuff like unblocking requests so that we fail them instead of hanging, etc. > > > > > > Yes. 
> > Actually, right after this, I unconfig the pci slot,
> > which calls pci_remove_bus_device() .. pci_destroy_dev() ..
> > and eventually pci_driver->remove() and so IPR finds out about this
> > sooner or later anyway. The goal of this function was to provide
> > an alternate and/or earlier warning that the device is going away.
> >
> > Maybe its superfluous. I tend to add callbacks like this because
> > I know that sooner or later someone will want one ...
>
> Ok. I wasn't sure about that. It in that case, I think its fine to leave
> the callback in. I still think the perm_failure handler should do
> the bringdown in ipr, however.

Yes, I didn't implement this because it was a quick-n-dirty hack.

The real question I'd like to pose to you is, "Are there any other
callbacks you would like to get during the pci error detection and
recovery phase?"

> >>Not sure what you are trying to accomplish with this bit of code. Does
> >>not seem necessary to me.

OK. I don't have a clear mental model of how the IPR worked at this
level of detail; I'll cut this code out.

--linas

From benh at kernel.crashing.org  Fri Feb 25 10:48:53 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 25 Feb 2005 10:48:53 +1100
Subject: [RFC/PATCH] ppc64: Add mem=X option
In-Reply-To: <20050224043844.GA16486@austin.ibm.com>
References: <20050222192423.727023f7.michael@ellerman.id.au>
	<20050223034003.GB15427@austin.ibm.com>
	<200502241518.02432.michael@ellerman.id.au>
	<20050224043844.GA16486@austin.ibm.com>
Message-ID: <1109288933.15026.29.camel@gaston>

On Wed, 2005-02-23 at 22:38 -0600, Olof Johansson wrote:
> >
> > I'm not sure, it seems to boot happily either way. But I think it's better not
> > to bolt it all - that more closely resembles the situation we're trying to
> > simulate, ie that there is no more RAM than the limit.
>
> The only drawback of always bolting it all is that noone will find bugs
> caused by referencing kernel memory that's past the limit.
:-)

He's currently not bolting it all and I think that's the way to go.

From olof at austin.ibm.com  Fri Feb 25 10:49:18 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Thu, 24 Feb 2005 17:49:18 -0600
Subject: [RFC/PATCH] ppc64: Add mem=X option
In-Reply-To: <1109288933.15026.29.camel@gaston>
References: <20050222192423.727023f7.michael@ellerman.id.au>
	<20050223034003.GB15427@austin.ibm.com>
	<200502241518.02432.michael@ellerman.id.au>
	<20050224043844.GA16486@austin.ibm.com>
	<1109288933.15026.29.camel@gaston>
Message-ID: <20050224234918.GF15818@austin.ibm.com>

On Fri, Feb 25, 2005 at 10:48:53AM +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2005-02-23 at 22:38 -0600, Olof Johansson wrote:
>
> > The only drawback of always bolting it all is that noone will find bugs
> > caused by referencing kernel memory that's past the limit. :-)
>
> He's currently not bolting it all and I think that's the way to go.

Yep, it's the preferred way to do it.

-Olof

From seto.hidetoshi at jp.fujitsu.com  Fri Feb 25 14:35:50 2005
From: seto.hidetoshi at jp.fujitsu.com (Hidetoshi Seto)
Date: Fri, 25 Feb 2005 12:35:50 +0900
Subject: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)]
In-Reply-To: <20050224231455.GH2088@austin.ibm.com>
References: <20050223002409.GA10909@austin.ibm.com>
	<20050223174356.GH13081@kroah.com>
	<20050224011409.GE2088@austin.ibm.com>
	<421DDEF7.7080103@jp.fujitsu.com>
	<20050224231455.GH2088@austin.ibm.com>
Message-ID: <421E9D16.3000606@jp.fujitsu.com>

Hi, Linas.

Linas Vepstas wrote:
> The reason ppc64 checks for a possible PCI error every time is because
> this was the only thing we could think of without actually modifying any
> device drivers. **If** a device driver is modified, then the check for
> errors can be made much less frequent. However, we thought that most
> device driver maintainers would reject a ppc64-only patch, and so
> we picked the simplest/dumbest thing that would work.

Of course driver maintainers will be happy if everything works without
modifying their drivers. Maybe we can get up to a certain point that
way, but to go beyond it (ex. device re-enabling) we would have to ask
for drivers to be modified, or to provide callbacks or something.

> Let us distinguish the terms "error recovery" and "error detection"
>
> -- "detection" is finding out that an error occurred
> -- "recovery" is the sequence of steps taken to make the PCI
>    device useable again.

OK.

> The hard part is to start converting device drivers to use this
> interface; the other hard part (for you) is to decide what to do about
> the device drivers that have not converted to this interface.

In other words, we have to support normal/non-aware drivers to a degree.
The trouble at the moment is that some archs just unwisely bring the
system down on an error such as PERR, so there is no chance to recover
from the error. PPC64 doesn't have such trouble any more.

BTW, how do ppc64 drivers deal with '~0' (all 1) data after bus
isolation? Does the weird data reach the user application?

>>Imagine - possible mix:
>> - RAS-aware driver registers callbacks to some struct on init
>
> Yes. Which structure? struct pci_driver?

pci_driver would be the major candidate, I think.

>> - check before IOs (ex. block if bus recovery is processing...)
>
> Many drivers do i/o in an interrupt context; we cannot block
> that i/o without hanging the kernel. What happens if iochk_clear()
> blocks, waiting for the bus to reset, while the device driver tries to
> do i/o from a timer interrupt?

Block was a bad word... spin? That would be bad too.
Anyway, something will be required to control subsequent i/o.
How would you solve this problem?

>> - master-recovery-thread handles extra more...
>
> How should the master recovery thread be invoked?

I have no clear idea. How about daemonize?

> I'd like to discuss specifics of the actual names and arguments
> and descriptions of the subroutines as soon as possible.
Followings are latest "generic" part of my "iomap-check" code. All comments are welcome. Thanks, H.Seto ----- diff -Nur linux-2.6.10-iomap-0/include/asm-generic/iomap.h linux-2.6.10-iomap-1/include/asm-generic/iomap.h --- linux-2.6.10-iomap-0/include/asm-generic/iomap.h 2005-02-15 15:27:27.000000000 +0900 +++ linux-2.6.10-iomap-1/include/asm-generic/iomap.h 2005-02-21 14:40:45.000000000 +0900 @@ -60,4 +60,20 @@ extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max); extern void pci_iounmap(struct pci_dev *dev, void __iomem *); +/* + * IOMAP_CHECK provides additional interfaces for drivers to detect + * some IO errors, supports drivers having ability to recover errors. + * + * All works around iomap-check depends on the design of "iocookie" + * structure. Every archtecture owning its iomap-check is free to + * define the actual design of iocookie to fit its special style. + */ +#ifndef HAVE_ARCH_IOMAP_CHECK +typedef unsigned long iocookie; +#endif + +extern void iochk_init(void); +extern void iochk_clear(iocookie *cookie, struct pci_dev *dev); +extern int iochk_read(iocookie *cookie); + #endif diff -Nur linux-2.6.10-iomap-0/lib/iomap.c linux-2.6.10-iomap-1/lib/iomap.c --- linux-2.6.10-iomap-0/lib/iomap.c 2005-02-15 15:27:27.000000000 +0900 +++ linux-2.6.10-iomap-1/lib/iomap.c 2005-02-21 14:38:17.000000000 +0900 @@ -210,3 +210,28 @@ } EXPORT_SYMBOL(pci_iomap); EXPORT_SYMBOL(pci_iounmap); + +/* + * Note that default iochk_clear-read pair interfaces could be used + * just as a replacement of traditional local_irq_save-restore pair. + * Originally they don't have any effective error check, but some + * high-reliable platforms would provide useful information to you. 
+ */ +#ifndef HAVE_ARCH_IOMAP_CHECK +#include +void iochk_init(void) { ; } + +void iochk_clear(iocookie *cookie, struct pci_dev *dev) +{ + local_irq_save(*cookie); +} + +int iochk_read(iocookie *cookie) +{ + local_irq_restore(*cookie); + return 0; +} +EXPORT_SYMBOL(iochk_init); +EXPORT_SYMBOL(iochk_clear); +EXPORT_SYMBOL(iochk_read); +#endif /* HAVE_ARCH_IOMAP_CHECK */ From david at gibson.dropbear.id.au Fri Feb 25 15:14:46 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Fri, 25 Feb 2005 15:14:46 +1100 Subject: [PPC64] Hugepage hash flushing bugfix Message-ID: <20050225041446.GC10725@localhost.localdomain> Andrew, Linus, please apply: Fix a potentially bad (although very rarely triggered) bug in the ppc64 hugepage code. hpte_update() did not correctly calculate the address for hugepages, so pte_clear() (which we use for hugepage ptes as well as normal ones) would not correctly flush the hash page table entry. Under the right circumstances this could potentially lead to duplicate hash entries, which is very bad. davem's upcoming patch to pass the virtual address directly to set_pte() and its ilk will obsolete this, but this is bad enough it should probably be fixed in the meantime. 
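
As a worked example of the address arithmetic involved, the sketch below turns the byte offset of a PTE within its page-table page into a virtual address, once for normal pages and once for hugepages. The constants (4 KB pages, 8-byte PTEs, 16 MB hugepages) and all sk_/SK_ names are assumptions for illustration, not values taken from the kernel headers.

```c
/* Assumed sizes, for illustration only. */
#define SK_PAGE_SIZE	4096UL
#define SK_PTE_SIZE	8UL
#define SK_PTRS_PER_PTE	(SK_PAGE_SIZE / SK_PTE_SIZE)	/* 512 */
#define SK_HPAGE_SIZE	(16UL << 20)

/*
 * base: virtual address covered by the first PTE in the page-table page
 * ptep: address of the PTE itself
 * huge: nonzero if the PTE maps a hugepage
 */
static unsigned long sk_pte_addr(unsigned long base, unsigned long ptep, int huge)
{
	/* byte offset of the PTE inside its page-table page */
	unsigned long off = ptep & (SK_PAGE_SIZE - 1);

	if (huge)	/* PTE index times the hugepage size */
		return base + off / SK_PTE_SIZE * SK_HPAGE_SIZE;

	/* off * 512 equals (off / 8) * 4096, i.e. index times the page size */
	return base + off * SK_PTRS_PER_PTE;
}
```

Under these assumptions the PTE at index 3 (byte offset 24) maps offset 24 * 512 = 12288 = 3 * 4096 for a normal page, but 3 * 16 MB for a hugepage. Using the normal-page formula on a hugepage PTE would therefore give the wrong address, which is the kind of miscalculation the description above refers to.
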

Signed-off-by: David Gibson 

Index: working-2.6/arch/ppc64/mm/tlb.c
===================================================================
--- working-2.6.orig/arch/ppc64/mm/tlb.c	2004-09-09 09:59:49.000000000 +1000
+++ working-2.6/arch/ppc64/mm/tlb.c	2005-02-25 14:56:47.000000000 +1100
@@ -85,8 +85,12 @@
 
 	ptepage = virt_to_page(ptep);
 	mm = (struct mm_struct *) ptepage->mapping;
-	addr = ptepage->index +
-		(((unsigned long)ptep & ~PAGE_MASK) * PTRS_PER_PTE);
+	addr = ptepage->index;
+	if (pte_huge(pte))
+		addr += ((unsigned long)ptep & ~PAGE_MASK)
+			/ sizeof(*ptep) * HPAGE_SIZE;
+	else
+		addr += ((unsigned long)ptep & ~PAGE_MASK) * PTRS_PER_PTE;
 
 	if (REGION_ID(addr) == USER_REGION_ID)
 		context = mm->context.id;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist.  NOT _the_ _other_ _way_
				| _around_!
http://www.ozlabs.org/people/dgibson

From benh at kernel.crashing.org  Fri Feb 25 15:33:38 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 25 Feb 2005 15:33:38 +1100
Subject: looking for help with scomc/scomd registers on 970
In-Reply-To: <421E06B9.2000504@nortel.com>
References: <421E06B9.2000504@nortel.com>
Message-ID: <1109306018.14992.52.camel@gaston>

On Thu, 2005-02-24 at 10:54 -0600, Chris Friesen wrote:
> Not strictly linux related, but I thought someone might know the answer.
>
> I've got a request from someone that wants to be able to flush the L2 on
> the 970.
>
> The user manual has a procedure to do this, but it involves first
> flipping the cache to direct-mapped mode by setting SCOM register
> 0x43000 bit 0x8000. The only thing is, I can't find any linux code that
> ever touches the SCOM stuff, and the manual has no examples of *reading*
> from the SCOM area, just writing to it, so I'm not entirely sure how to
> do that.
>
> A google search found the following snippet of darwin code:
>
>
> 	lis	r8,cFIR		; Get the Core FIR register address
> 	ori	r8,r8,0x8000	; Set to read data
> 	sync
> 	mtspr	scomc,r8	; Request the Core FIR
> 	mfspr	r25,scomd	; Get the source
> 	mfspr	r8,scomc	; Get back the status (we just ignore it)
> 	sync
> 	isync
>
>
>
> This implies that bit 0x8000 needs to be set to specify a read command,
> and that we need to read the status after the read. Does anyone know if
> this is in fact the case?

I think so. Also, there are some errata with early 970s where the
stuff is shifted one bit. The Darwin source is a good reference for
that.

> Also, in that code they make reference to early chip revisions that
> returned scom reads shifted by one bit. Does anyone know which versions
> are affected?

I think all non-FX.

Ben.

From michael at ellerman.id.au  Fri Feb 25 19:14:08 2005
From: michael at ellerman.id.au (Michael Ellerman)
Date: Fri, 25 Feb 2005 19:14:08 +1100
Subject: [RFC/PATCH] Updated: ppc64: Add mem=X option
In-Reply-To: <20050222192423.727023f7.michael@ellerman.id.au>
References: <20050222192423.727023f7.michael@ellerman.id.au>
Message-ID: <20050225191408.599c613d.michael@ellerman.id.au>

Hi All,

Here is an updated patch for adding support for the mem=X boot option.

Nothing major's changed, just a bunch of cleanups as suggested by Olof.

I've booted this a couple of times on a P5 LPAR. Apart from the NUMA
code, which I'd like to test further, I'm fairly happy with this.

I'll test it all a bit more next week and then hopefully it'll be ready
for 2.6.12.
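
For readers unfamiliar with the mem=X syntax, a minimal userspace sketch of parsing such a value follows. It uses the C library's strtoull() rather than the kernel's own parser, and the function names are hypothetical; the K/M/G suffix handling and the rounding up to a 16 MB boundary are modelled on what the patch in this message does on pSeries.

```c
#include <stdlib.h>

#define SK_16MB 0x1000000ULL	/* large page size the patch aligns to */

/* Parse a size such as "512M", "1G" or "0x100k" into bytes. */
static unsigned long long sk_parse_mem(const char *s)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
		break;
	default:
		break;
	}
	return v;
}

/* Round v up to a multiple of a, where a is a power of two. */
static unsigned long long sk_align_up(unsigned long long v, unsigned long long a)
{
	return (v + a - 1) & ~(a - 1);
}
```

With this sketch, mem=500M would be rounded up to 512 MB, while a value that is already a multiple of 16 MB is left unchanged.
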
cheers Summary: - Fixed comments - Rename prom_init.c variables for clarity - Removed likely/unlikely abuse - Only bolt/map up to memory_limit on iSeries/pSeries - Align memory_limit to PAGE_SIZE on iSeries - On pSeries memory_limit & tce_alloc_start/end are 16MB aligned - Don't offline numa nodes just for mem limit, just set spanned_pages = 0 - Fix stupid truncation algorithm, thanks to Olof and Stephen arch/ppc64/kernel/iSeries_setup.c | 38 +++++++---- arch/ppc64/kernel/lmb.c | 33 +++++++++ arch/ppc64/kernel/prom.c | 15 ++++ arch/ppc64/kernel/prom_init.c | 131 +++++++++++++++++++++++++++++++++++--- arch/ppc64/kernel/setup.c | 20 ++++- arch/ppc64/mm/hash_utils.c | 23 ++++++ arch/ppc64/mm/numa.c | 29 +++++++- include/asm-ppc64/lmb.h | 1 8 files changed, 259 insertions(+), 31 deletions(-) Index: latest-bk-with-mem-option/arch/ppc64/kernel/setup.c =================================================================== --- latest-bk-with-mem-option.orig/arch/ppc64/kernel/setup.c +++ latest-bk-with-mem-option/arch/ppc64/kernel/setup.c @@ -641,12 +641,11 @@ void __init setup_system(void) early_console_initialized = 1; register_console(&udbg_console); -#endif /* !CONFIG_PPC_ISERIES */ - /* Save unparsed command line copy for /proc/cmdline */ strlcpy(saved_command_line, cmd_line, COMMAND_LINE_SIZE); parse_early_param(); +#endif /* !CONFIG_PPC_ISERIES */ #if defined(CONFIG_SMP) && !defined(CONFIG_PPC_ISERIES) /* @@ -805,20 +804,31 @@ struct seq_operations cpuinfo_op = { .show = show_cpuinfo, }; -#if 0 /* XXX not currently used */ +/* + * These three variables are used to save values passed to us by prom_init() + * via the device tree. The TCE variables are needed because with a memory_limit + * in force we may need to explicitly map the TCE are at the top of RAM. + */ unsigned long memory_limit; +unsigned long tce_alloc_start; +unsigned long tce_alloc_end; +#ifdef CONFIG_PPC_ISERIES +/* + * On iSeries we just parse the mem=X option from the command line. 
+ * On pSeries it's a bit more complicated, see prom_init_mem() + */ static int __init early_parsemem(char *p) { if (!p) return 0; - memory_limit = memparse(p, &p); + memory_limit = ALIGN(memparse(p, &p), PAGE_SIZE); return 0; } early_param("mem", early_parsemem); -#endif +#endif /* CONFIG_PPC_ISERIES */ #ifdef CONFIG_PPC_MULTIPLATFORM static int __init set_preferred_console(void) Index: latest-bk-with-mem-option/arch/ppc64/kernel/lmb.c =================================================================== --- latest-bk-with-mem-option.orig/arch/ppc64/kernel/lmb.c +++ latest-bk-with-mem-option/arch/ppc64/kernel/lmb.c @@ -344,3 +344,36 @@ lmb_abs_to_phys(unsigned long aa) return pa; } + +/* + * Truncate the lmb list to memory_limit if it's set + * You must call lmb_analyze() after this. + */ +void __init lmb_apply_memory_limit(void) +{ + extern unsigned long memory_limit; + unsigned long i, limit; + struct lmb_region *mem = &(lmb.memory); + + if (! memory_limit) + return; + + limit = memory_limit; + for (i = 0; i < mem->cnt; i++) { + if (limit > mem->region[i].size) { + limit -= mem->region[i].size; + continue; + } + +#ifdef DEBUG + udbg_printf("lmb_truncate(): truncating at region %x\n", i); + udbg_printf("lmb_truncate(): total = %x\n", total); + udbg_printf("lmb_truncate(): size = %x\n", mem->region[i].size); + udbg_printf("lmb_truncate(): crop = %x\n", crop); +#endif + + mem->region[i].size = limit; + mem->cnt = i + 1; + break; + } +} Index: latest-bk-with-mem-option/include/asm-ppc64/lmb.h =================================================================== --- latest-bk-with-mem-option.orig/include/asm-ppc64/lmb.h +++ latest-bk-with-mem-option/include/asm-ppc64/lmb.h @@ -53,6 +53,7 @@ extern unsigned long __init lmb_alloc_ba extern unsigned long __init lmb_phys_mem_size(void); extern unsigned long __init lmb_end_of_DRAM(void); extern unsigned long __init lmb_abs_to_phys(unsigned long); +extern void __init lmb_apply_memory_limit(void); extern void 
lmb_dump_all(void); Index: latest-bk-with-mem-option/arch/ppc64/kernel/iSeries_setup.c =================================================================== --- latest-bk-with-mem-option.orig/arch/ppc64/kernel/iSeries_setup.c +++ latest-bk-with-mem-option/arch/ppc64/kernel/iSeries_setup.c @@ -284,7 +284,7 @@ unsigned long iSeries_process_mainstore_ return mem_blocks; } -static void __init iSeries_parse_cmdline(void) +static void __init iSeries_get_cmdline(void) { char *p, *q; @@ -304,6 +304,8 @@ static void __init iSeries_parse_cmdline /*static*/ void __init iSeries_init_early(void) { + extern unsigned long memory_limit; + DBG(" -> iSeries_init_early()\n"); ppcdbg_initialize(); @@ -351,6 +353,29 @@ static void __init iSeries_parse_cmdline */ build_iSeries_Memory_Map(); + iSeries_get_cmdline(); + + /* Save unparsed command line copy for /proc/cmdline */ + strlcpy(saved_command_line, cmd_line, COMMAND_LINE_SIZE); + + /* Parse early parameters, in particular mem=x */ + parse_early_param(); + + if (memory_limit) { + if (memory_limit > systemcfg->physicalMemorySize) + printk("Ignoring 'mem' option, value %lu is too large.\n", memory_limit); + else + systemcfg->physicalMemorySize = memory_limit; + } + + /* Bolt kernel mappings for all of memory (or just a bit if we've got a limit) */ + iSeries_bolt_kernel(0, systemcfg->physicalMemorySize); + + lmb_init(); + lmb_add(0, systemcfg->physicalMemorySize); + lmb_analyze(); /* ?? 
*/ + lmb_reserve(0, __pa(klimit)); + /* Initialize machine-dependency vectors */ #ifdef CONFIG_SMP smp_init_iSeries(); @@ -376,9 +401,6 @@ static void __init iSeries_parse_cmdline initrd_start = initrd_end = 0; #endif /* CONFIG_BLK_DEV_INITRD */ - - iSeries_parse_cmdline(); - DBG(" <- iSeries_init_early()\n"); } @@ -539,14 +561,6 @@ static void __init build_iSeries_Memory_ * nextPhysChunk */ systemcfg->physicalMemorySize = chunk_to_addr(nextPhysChunk); - - /* Bolt kernel mappings for all of memory */ - iSeries_bolt_kernel(0, systemcfg->physicalMemorySize); - - lmb_init(); - lmb_add(0, systemcfg->physicalMemorySize); - lmb_analyze(); /* ?? */ - lmb_reserve(0, __pa(klimit)); } /* Index: latest-bk-with-mem-option/arch/ppc64/kernel/prom.c =================================================================== --- latest-bk-with-mem-option.orig/arch/ppc64/kernel/prom.c +++ latest-bk-with-mem-option/arch/ppc64/kernel/prom.c @@ -875,6 +875,8 @@ static int __init early_init_dt_scan_cho const char *full_path, void *data) { u32 *prop; + u64 *prop64; + extern unsigned long memory_limit, tce_alloc_start, tce_alloc_end; if (strcmp(full_path, "/chosen") != 0) return 0; @@ -891,6 +893,18 @@ static int __init early_init_dt_scan_cho if (get_flat_dt_prop(node, "linux,iommu-force-on", NULL) != NULL) iommu_force_on = 1; + prop64 = (u64*)get_flat_dt_prop(node, "linux,memory-limit", NULL); + if (prop64) + memory_limit = *prop64; + + prop64 = (u64*)get_flat_dt_prop(node, "linux,tce-alloc-start", NULL); + if (prop64) + tce_alloc_start = *prop64; + + prop64 = (u64*)get_flat_dt_prop(node, "linux,tce-alloc-end", NULL); + if (prop64) + tce_alloc_end = *prop64; + #ifdef CONFIG_PPC_PSERIES /* To help early debugging via the front panel, we retreive a minimal * set of RTAS infos now if available @@ -1030,6 +1044,7 @@ void __init early_init_devtree(void *par lmb_init(); scan_flat_dt(early_init_dt_scan_root, NULL); scan_flat_dt(early_init_dt_scan_memory, NULL); + lmb_apply_memory_limit(); 
lmb_analyze(); systemcfg->physicalMemorySize = lmb_phys_mem_size(); lmb_reserve(0, __pa(klimit)); Index: latest-bk-with-mem-option/arch/ppc64/mm/hash_utils.c =================================================================== --- latest-bk-with-mem-option.orig/arch/ppc64/mm/hash_utils.c +++ latest-bk-with-mem-option/arch/ppc64/mm/hash_utils.c @@ -140,6 +140,8 @@ void __init htab_initialize(void) unsigned long pteg_count; unsigned long mode_rw; int i, use_largepages = 0; + unsigned long base = 0, size = 0; + extern unsigned long memory_limit, tce_alloc_start, tce_alloc_end; DBG(" -> htab_initialize()\n"); @@ -195,8 +197,6 @@ void __init htab_initialize(void) /* create bolted the linear mapping in the hash table */ for (i=0; i < lmb.memory.cnt; i++) { - unsigned long base, size; - base = lmb.memory.region[i].physbase + KERNELBASE; size = lmb.memory.region[i].size; @@ -225,6 +225,25 @@ void __init htab_initialize(void) #endif /* CONFIG_U3_DART */ create_pte_mapping(base, base + size, mode_rw, use_largepages); } + + /* + * If we have a memory_limit and we've allocated TCEs then we need to + * explicitly map the TCE area at the top of RAM. We also cope with the + * case that the TCEs start below memory_limit. + * tce_alloc_start/end are 16MB aligned so the mapping should work + * for either 4K or 16MB pages. 
+ */ + if (tce_alloc_start) { + tce_alloc_start += KERNELBASE; + tce_alloc_end += KERNELBASE; + + if (base + size >= tce_alloc_start) + tce_alloc_start = base + size + 1; + + create_pte_mapping(tce_alloc_start, tce_alloc_end, + mode_rw, use_largepages); + } + DBG(" <- htab_initialize()\n"); } #undef KB Index: latest-bk-with-mem-option/arch/ppc64/mm/numa.c =================================================================== --- latest-bk-with-mem-option.orig/arch/ppc64/mm/numa.c +++ latest-bk-with-mem-option/arch/ppc64/mm/numa.c @@ -270,6 +270,7 @@ static int __init parse_numa_properties( int max_domain = 0; long entries = lmb_end_of_DRAM() >> MEMORY_INCREMENT_SHIFT; unsigned long i; + extern unsigned long memory_limit; if (numa_enabled == 0) { printk(KERN_WARNING "NUMA disabled by user\n"); @@ -378,15 +379,37 @@ new_range: size / PAGE_SIZE; } - for (i = start ; i < (start+size); i += MEMORY_INCREMENT) - numa_memory_lookup_table[i >> MEMORY_INCREMENT_SHIFT] = - numa_domain; + for (i = start; i < (start+size) && i < lmb_end_of_DRAM(); i += MEMORY_INCREMENT) + numa_memory_lookup_table[i >> MEMORY_INCREMENT_SHIFT] = numa_domain; ranges--; if (ranges) goto new_range; } + if (memory_limit) { + unsigned long size, limit = memory_limit; + + for (i = 0; i <= max_domain; i++) { + size = init_node_data[i].node_spanned_pages * PAGE_SIZE; + if (limit > size) { + limit -= size; + continue; + } + + init_node_data[i].node_spanned_pages = limit / PAGE_SIZE; + + dbg("NUMA: truncating node %ld to 0x%lx bytes\n", i, limit); + break; + } + + while (++i <= max_domain) { + dbg("NUMA: truncating node %ld to 0x0 bytes\n", i); + init_node_data[i].node_start_pfn = 0; + init_node_data[i].node_spanned_pages = 0; + } + } + for (i = 0; i <= max_domain; i++) node_set_online(i); Index: latest-bk-with-mem-option/arch/ppc64/kernel/prom_init.c =================================================================== --- latest-bk-with-mem-option.orig/arch/ppc64/kernel/prom_init.c +++ 
latest-bk-with-mem-option/arch/ppc64/kernel/prom_init.c @@ -178,6 +178,10 @@ static int __initdata of_platform; static char __initdata prom_cmd_line[COMMAND_LINE_SIZE]; +static unsigned long __initdata prom_memory_limit; +static unsigned long __initdata prom_tce_alloc_start; +static unsigned long __initdata prom_tce_alloc_end; + static unsigned long __initdata alloc_top; static unsigned long __initdata alloc_top_high; static unsigned long __initdata alloc_bottom; @@ -385,10 +389,64 @@ static int __init prom_setprop(phandle n (u32)(unsigned long) value, (u32) valuelen); } +/* We can't use the standard versions because of RELOC headaches. */ +#define isxdigit(c) (('0' <= (c) && (c) <= '9') \ + || ('a' <= (c) && (c) <= 'f') \ + || ('A' <= (c) && (c) <= 'F')) + +#define isdigit(c) ('0' <= (c) && (c) <= '9') +#define islower(c) ('a' <= (c) && (c) <= 'z') +#define toupper(c) (islower(c) ? ((c) - 'a' + 'A') : (c)) + +unsigned long prom_strtoul(const char *cp, const char **endp) +{ + unsigned long result = 0, base = 10, value; + + if (*cp == '0') { + base = 8; + cp++; + if (toupper(*cp) == 'X') { + cp++; + base = 16; + } + } + + while (isxdigit(*cp) && + (value = isdigit(*cp) ? 
*cp - '0' : toupper(*cp) - 'A' + 10) < base) { + result = result * base + value; + cp++; + } + + if (endp) + *endp = cp; + + return result; +} + +unsigned long prom_memparse(const char *ptr, const char **retptr) +{ + unsigned long ret = prom_strtoul(ptr, retptr); + + switch (**retptr) { + case 'G': + case 'g': + ret <<= 10; + case 'M': + case 'm': + ret <<= 10; + case 'K': + case 'k': + ret <<= 10; + (*retptr)++; + default: + break; + } + return ret; +} /* * Early parsing of the command line passed to the kernel, used for - * the options that affect the iommu + * "mem=x" and the options that affect the iommu */ static void __init early_cmdline_parse(void) { @@ -419,6 +477,14 @@ static void __init early_cmdline_parse(v else if (!strncmp(opt, RELOC("force"), 5)) RELOC(iommu_force_on) = 1; } + + opt = strstr(RELOC(prom_cmd_line), RELOC("mem=")); + if (opt) { + opt += 4; + RELOC(prom_memory_limit) = prom_memparse(opt, (const char **)&opt); + /* Align to 16 MB == size of large page */ + RELOC(prom_memory_limit) = ALIGN(RELOC(prom_memory_limit), 0x1000000); + } } /* @@ -665,15 +731,7 @@ static void __init prom_init_mem(void) } } - /* Setup our top/bottom alloc points, that is top of RMO or top of - * segment 0 when running non-LPAR - */ - if ( RELOC(of_platform) == PLATFORM_PSERIES_LPAR ) - RELOC(alloc_top) = RELOC(rmo_top); - else - RELOC(alloc_top) = RELOC(rmo_top) = min(0x40000000ul, RELOC(ram_top)); RELOC(alloc_bottom) = PAGE_ALIGN(RELOC(klimit) - offset + 0x4000); - RELOC(alloc_top_high) = RELOC(ram_top); /* Check if we have an initrd after the kernel, if we do move our bottom * point to after it @@ -683,8 +741,41 @@ static void __init prom_init_mem(void) > RELOC(alloc_bottom)) RELOC(alloc_bottom) = PAGE_ALIGN(RELOC(prom_initrd_end)); } + + /* + * If prom_memory_limit is set we reduce the upper limits *except* for + * alloc_top_high. This must be the real top of RAM so we can put + * TCE's up there. 
+ */ + + RELOC(alloc_top_high) = RELOC(ram_top); + + if (RELOC(prom_memory_limit)) { + if (RELOC(prom_memory_limit) <= RELOC(alloc_bottom)) { + prom_printf("Ignoring mem=%x <= alloc_bottom.\n", + RELOC(prom_memory_limit)); + RELOC(prom_memory_limit) = 0; + } else if (RELOC(prom_memory_limit) >= RELOC(ram_top)) { + prom_printf("Ignoring mem=%x >= ram_top.\n", + RELOC(prom_memory_limit)); + RELOC(prom_memory_limit) = 0; + } else { + RELOC(ram_top) = RELOC(prom_memory_limit); + RELOC(rmo_top) = min(RELOC(rmo_top), RELOC(prom_memory_limit)); + } + } + + /* + * Setup our top alloc point, that is top of RMO or top of + * segment 0 when running non-LPAR. + */ + if ( RELOC(of_platform) == PLATFORM_PSERIES_LPAR ) + RELOC(alloc_top) = RELOC(rmo_top); + else + RELOC(alloc_top) = RELOC(rmo_top) = min(0x40000000ul, RELOC(ram_top)); prom_printf("memory layout at init:\n"); + prom_printf(" memory_limit : %x (16 MB aligned)\n", RELOC(prom_memory_limit)); prom_printf(" alloc_bottom : %x\n", RELOC(alloc_bottom)); prom_printf(" alloc_top : %x\n", RELOC(alloc_top)); prom_printf(" alloc_top_hi : %x\n", RELOC(alloc_top_high)); @@ -873,6 +964,16 @@ static void __init prom_initialize_tce_t reserve_mem(local_alloc_bottom, local_alloc_top - local_alloc_bottom); + if (RELOC(prom_memory_limit)) { + /* + * We align the start to a 16MB boundary so we can map the TCE area + * using large pages if possible. The end should be the top of RAM + * so no need to align it. 
+ */ + RELOC(prom_tce_alloc_start) = _ALIGN_DOWN(local_alloc_bottom, 0x1000000); + RELOC(prom_tce_alloc_end) = local_alloc_top; + } + /* Flag the first invalid entry */ prom_debug("ending prom_initialize_tce_table\n"); } @@ -1686,9 +1787,21 @@ unsigned long __init prom_init(unsigned */ if (RELOC(ppc64_iommu_off)) prom_setprop(_prom->chosen, "linux,iommu-off", NULL, 0); + if (RELOC(iommu_force_on)) prom_setprop(_prom->chosen, "linux,iommu-force-on", NULL, 0); + if (RELOC(prom_memory_limit)) + prom_setprop(_prom->chosen, "linux,memory-limit", + PTRRELOC(&prom_memory_limit), sizeof(RELOC(prom_memory_limit))); + + if (RELOC(prom_tce_alloc_start)) { + prom_setprop(_prom->chosen, "linux,tce-alloc-start", + PTRRELOC(&prom_tce_alloc_start), sizeof(RELOC(prom_tce_alloc_start))); + prom_setprop(_prom->chosen, "linux,tce-alloc-end", + PTRRELOC(&prom_tce_alloc_end), sizeof(RELOC(prom_tce_alloc_end))); + } + /* * Now finally create the flattened device-tree */ From johnrose at austin.ibm.com Sat Feb 26 09:37:17 2005 From: johnrose at austin.ibm.com (John Rose) Date: Fri, 25 Feb 2005 16:37:17 -0600 Subject: [PATCH] set pci_io_base dynamically if necessary Message-ID: <1109371037.27183.13.camel@sinatra.austin.ibm.com> Upon DLPAR addition of a PCI Host Brige to a system with purely virtual I/O, set pci_io_base as necessary. Please apply, if appropriate. 
Thanks- John Signed-off-by: John Rose diff -puN arch/ppc64/kernel/pSeries_pci.c~01_ppc64_pci_iobase arch/ppc64/kernel/pSeries_pci.c --- 2_6_linus_2/arch/ppc64/kernel/pSeries_pci.c~01_ppc64_pci_iobase 2005-02-25 16:26:20.000000000 -0600 +++ 2_6_linus_2-johnrose/arch/ppc64/kernel/pSeries_pci.c 2005-02-25 16:26:20.000000000 -0600 @@ -413,16 +413,18 @@ struct pci_controller * __devinit init_p unsigned int root_size_cells = 0; struct pci_controller *phb; struct pci_bus *bus; + int primary; root_size_cells = prom_n_size_cells(root); + primary = list_empty(&hose_list); phb = alloc_phb_dynamic(dn, root_size_cells); if (!phb) return NULL; pci_process_bridge_OF_ranges(phb, dn); - pci_setup_phb_io_dynamic(phb); + pci_setup_phb_io_dynamic(phb, primary); of_node_put(root); pci_devs_phb_init_dynamic(phb); diff -puN arch/ppc64/kernel/pci.c~01_ppc64_pci_iobase arch/ppc64/kernel/pci.c --- 2_6_linus_2/arch/ppc64/kernel/pci.c~01_ppc64_pci_iobase 2005-02-25 16:26:20.000000000 -0600 +++ 2_6_linus_2-johnrose/arch/ppc64/kernel/pci.c 2005-02-25 16:26:20.000000000 -0600 @@ -621,7 +621,8 @@ void __init pci_setup_phb_io(struct pci_ res->end += io_virt_offset; } -void __devinit pci_setup_phb_io_dynamic(struct pci_controller *hose) +void __devinit pci_setup_phb_io_dynamic(struct pci_controller *hose, + int primary) { unsigned long size = hose->pci_io_size; unsigned long io_virt_offset; @@ -633,6 +634,9 @@ void __devinit pci_setup_phb_io_dynamic( hose->global_number, hose->io_base_phys, (unsigned long) hose->io_base_virt); + if (primary) + pci_io_base = (unsigned long)hose->io_base_virt; + io_virt_offset = (unsigned long)hose->io_base_virt - pci_io_base; res = &hose->io_resource; res->start += io_virt_offset; diff -puN arch/ppc64/kernel/pci.h~01_ppc64_pci_iobase arch/ppc64/kernel/pci.h --- 2_6_linus_2/arch/ppc64/kernel/pci.h~01_ppc64_pci_iobase 2005-02-25 16:26:20.000000000 -0600 +++ 2_6_linus_2-johnrose/arch/ppc64/kernel/pci.h 2005-02-25 16:26:20.000000000 -0600 @@ -16,9 +16,9 @@ extern 
unsigned long isa_io_base;
 extern void pci_setup_pci_controller(struct pci_controller *hose);
 extern void pci_setup_phb_io(struct pci_controller *hose, int primary);
+extern void pci_setup_phb_io_dynamic(struct pci_controller *hose, int primary);
 extern struct pci_controller* pci_find_hose_for_OF_device(struct device_node* node);
-extern void pci_setup_phb_io_dynamic(struct pci_controller *hose);
 extern struct list_head hose_list;
_

From arnd at arndb.de  Sat Feb 26 09:02:39 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Fri, 25 Feb 2005 23:02:39 +0100
Subject: [RFC] splitting out LPAR support from CONFIG_PSERIES
In-Reply-To: <20050223044959.GA10256@austin.ibm.com>
References: <200502221723.52051.arnd@arndb.de>
	<20050223044959.GA10256@austin.ibm.com>
Message-ID: <200502252302.40252.arnd@arndb.de>

On Wednesday 23 February 2005 05:49, Olof Johansson wrote:
> I think it's a bad idea, but I don't have any real strong motivations
> for it.

Ok, everyone I've asked so far seems to agree with you, so I removed the
patch from my quilt. Thanks for the feedback.

	Arnd <><

From johnrose at austin.ibm.com  Sat Feb 26 09:40:51 2005
From: johnrose at austin.ibm.com (John Rose)
Date: Fri, 25 Feb 2005 16:40:51 -0600
Subject: [PATCH] remove unnecessary ISA ioports
Message-ID: <1109371251.27183.17.camel@sinatra.austin.ibm.com>

During boot, pSeries_request_regions() should only request I/O ports for
legacy ISA in the case that ISA exists on the system. Add a check for
this. This patch was suggested by Anton.

Please apply, if appropriate.
Thanks-
John

Signed-off-by: John Rose

diff -puN arch/ppc64/kernel/pSeries_pci.c~02_ppc64_request_regions arch/ppc64/kernel/pSeries_pci.c
--- 2_6_linus_2/arch/ppc64/kernel/pSeries_pci.c~02_ppc64_request_regions	2005-02-25 16:28:09.000000000 -0600
+++ 2_6_linus_2-johnrose/arch/ppc64/kernel/pSeries_pci.c	2005-02-25 16:28:09.000000000 -0600
@@ -529,6 +529,9 @@ EXPORT_SYMBOL(pcibios_remove_root_bus);
 static void __init pSeries_request_regions(void)
 {
+	if (!isa_io_base)
+		return;
+
 	request_region(0x20,0x20,"pic1");
 	request_region(0xa0,0x20,"pic2");
 	request_region(0x00,0x20,"dma1");
_

From johnrose at austin.ibm.com Sat Feb 26 09:50:57 2005
From: johnrose at austin.ibm.com (John Rose)
Date: Fri, 25 Feb 2005 16:50:57 -0600
Subject: [PATCH] allow dynamic enablement of EEH
Message-ID: <1109371857.27183.28.camel@sinatra.austin.ibm.com>

EEH scans the system I/O adapters at boot for EEH-capabilities. If no
EEH-capable adapters are found, the subsystem is marked disabled for the
life of the system. EEH should allow dynamic enabling of the EEH
subsystem when hotplug-adding an adapter.

Please apply, if appropriate.
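To make the reasoning above concrete, the following is a small user-space model in C, offered only as a sketch. All names (adapter_model, eeh_enabled_model, add_device_early_old(), add_device_early_new()) are hypothetical, and the model assumes that probing an EEH-capable adapter is what marks the subsystem enabled. It contrasts a guard that bails out while the subsystem is disabled with one that only rejects a missing node.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device node being hot-added. */
struct adapter_model {
    bool eeh_capable;
};

/* Hypothetical stand-in for eeh_subsystem_enabled. */
static bool eeh_enabled_model;

/* Assumption: probing a capable adapter turns the subsystem on. */
static void probe_model(const struct adapter_model *dn)
{
    if (dn->eeh_capable)
        eeh_enabled_model = true;
}

/* Old guard: a disabled subsystem prevents the probe from ever running,
 * so a hot-added capable adapter cannot enable it. */
static void add_device_early_old(const struct adapter_model *dn)
{
    if (!dn || !eeh_enabled_model)
        return;
    probe_model(dn);
}

/* New guard: only a missing node stops the probe. */
static void add_device_early_new(const struct adapter_model *dn)
{
    if (!dn)
        return;
    probe_model(dn);
}
```

In this model, a system that booted with no capable adapters stays disabled forever under the old guard, while the new guard lets the first capable hot-added adapter enable the subsystem.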
Thanks-
John

Signed-off-by: John Rose

diff -puN arch/ppc64/kernel/eeh.c~04_eeh_add_early arch/ppc64/kernel/eeh.c
--- 2_6_linus_2/arch/ppc64/kernel/eeh.c~04_eeh_add_early	2005-02-25 16:29:51.000000000 -0600
+++ 2_6_linus_2-johnrose/arch/ppc64/kernel/eeh.c	2005-02-25 16:29:51.000000000 -0600
@@ -808,7 +808,7 @@ void eeh_add_device_early(struct device_
 	struct pci_controller *phb;
 	struct eeh_early_enable_info info;
-	if (!dn || !eeh_subsystem_enabled)
+	if (!dn)
 		return;
 	phb = dn->phb;
 	if (NULL == phb || 0 == phb->buid) {
_

From tglx at linutronix.de Sun Feb 27 10:56:28 2005
From: tglx at linutronix.de (tglx at linutronix.de)
Date: Sun, 27 Feb 2005 00:56:28 +0100 (CET)
Subject: [PATCH 4/10] PPC64: C99 initializers for hw_interrupt_type structures
References: <20050227005956.1.patchmail@tglx>
Message-ID: <20050227010015.4.patchmail@tglx>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050227/c8e7d0b7/attachment.txt

From akpm at osdl.org Sun Feb 27 22:16:55 2005
From: akpm at osdl.org (Andrew Morton)
Date: Sun, 27 Feb 2005 03:16:55 -0800
Subject: [PATCH] PPC64: Generic hotplug cpu support
In-Reply-To:
References:
Message-ID: <20050227031655.67233bb5.akpm@osdl.org>

Zwane Mwaikambo wrote:
>
> Patch provides a generic hotplug cpu implementation, with the only current
> user being pmac.
BUG: using smp_processor_id() in preemptible [00000001] code: swapper/0
caller is .native_idle+0x30/0x60

--- 25/arch/ppc64/kernel/idle.c~ppc64-generic-hotplug-cpu-support-fix	2005-02-27 11:12:47.000000000 -0700
+++ 25-akpm/arch/ppc64/kernel/idle.c	2005-02-27 11:13:03.000000000 -0700
@@ -294,7 +294,7 @@ static int native_idle(void)
 		if (need_resched())
 			schedule();

-		if (cpu_is_offline(smp_processor_id()) &&
+		if (cpu_is_offline(_smp_processor_id()) &&
 		    system_state == SYSTEM_RUNNING)
 			cpu_die();
 	}
_

From benh at kernel.crashing.org Mon Feb 28 09:22:51 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Mon, 28 Feb 2005 09:22:51 +1100
Subject: [PATCH] PPC64: Generic hotplug cpu support
In-Reply-To: <20050227031655.67233bb5.akpm@osdl.org>
References: <20050227031655.67233bb5.akpm@osdl.org>
Message-ID: <1109542971.14993.217.camel@gaston>

On Sun, 2005-02-27 at 03:16 -0800, Andrew Morton wrote:
> Zwane Mwaikambo wrote:
> >
> > Patch provides a generic hotplug cpu implementation, with the only current
> > user being pmac.
>
> BUG: using smp_processor_id() in preemptible [00000001] code: swapper/0
> caller is .native_idle+0x30/0x60
>
> --- 25/arch/ppc64/kernel/idle.c~ppc64-generic-hotplug-cpu-support-fix	2005-02-27 11:12:47.000000000 -0700
> +++ 25-akpm/arch/ppc64/kernel/idle.c	2005-02-27 11:13:03.000000000 -0700
> @@ -294,7 +294,7 @@ static int native_idle(void)
>  		if (need_resched())
>  			schedule();
>
> -		if (cpu_is_offline(smp_processor_id()) &&
> +		if (cpu_is_offline(_smp_processor_id()) &&
>  		    system_state == SYSTEM_RUNNING)
>  			cpu_die();
>  	}
> _

This is the idle loop. Is that ever supposed to be preempted ?

Ben.
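For readers following the warning quoted above, here is a simplified user-space model in C of a checked CPU-id lookup. It is a sketch under stated assumptions, not the kernel's implementation: task_model and checked_cpu_id_would_warn() are hypothetical names, and the model assumes the check looks only at the preemption count, the interrupt state and whether the task is pinned to a single CPU.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the state such a check might consult. */
struct task_model {
    int  preempt_count;     /* > 0 means preemption is disabled */
    bool irqs_disabled;     /* interrupts off, so no involuntary switch */
    bool pinned_to_one_cpu; /* affinity restricts the task to one CPU */
};

/* Returns true when the model would warn: nothing visible to the check
 * guarantees that the task stays on the CPU whose id it just read. */
static bool checked_cpu_id_would_warn(const struct task_model *t)
{
    if (t->preempt_count > 0)
        return false;
    if (t->irqs_disabled)
        return false;
    if (t->pinned_to_one_cpu)
        return false;
    return true;
}
```

Under these assumptions, a task that calls the checked lookup with preemption and interrupts enabled and no single-CPU affinity triggers the warning, whether or not it could actually be moved to another CPU; the unchecked variant used in the patch skips the test entirely.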
From akpm at osdl.org Mon Feb 28 09:49:28 2005
From: akpm at osdl.org (Andrew Morton)
Date: Sun, 27 Feb 2005 14:49:28 -0800
Subject: [PATCH] PPC64: Generic hotplug cpu support
In-Reply-To: <1109542971.14993.217.camel@gaston>
References: <20050227031655.67233bb5.akpm@osdl.org> <1109542971.14993.217.camel@gaston>
Message-ID: <20050227144928.6c71adaf.akpm@osdl.org>

Benjamin Herrenschmidt wrote:
>
> > -		if (cpu_is_offline(smp_processor_id()) &&
> > +		if (cpu_is_offline(_smp_processor_id()) &&
> >  		    system_state == SYSTEM_RUNNING)
> >  			cpu_die();
> >  	}
> > _
>
> This is the idle loop. Is that ever supposed to be preempted ?

Nope, it's a false positive. We had to do the same in x86's idle loop
and probably others will hit it.

From apw at us.ibm.com Mon Feb 28 11:19:46 2005
From: apw at us.ibm.com (Amos Waterland)
Date: Sun, 27 Feb 2005 19:19:46 -0500
Subject: [RFC/PATCH] boot debugging symbols
Message-ID: <20050228001946.GA28387@kvasir.austin.ibm.com>

It is really useful when debugging early boot on simulator to have debug
symbols in the 32-bit code that uncompresses the kernel proper. Is this
patch all right, or is a CONFIG_BOOT_DEBUG_INFO also needed?
-------------- next part --------------
===== arch/ppc64/boot/Makefile 1.22 vs edited =====
--- 1.22/arch/ppc64/boot/Makefile	2005-01-08 19:42:42 -05:00
+++ edited/arch/ppc64/boot/Makefile	2005-02-27 18:52:03 -05:00
@@ -33,6 +33,11 @@
 BOOTOBJCOPY := $(CROSS32_COMPILE)objcopy
 OBJCOPYFLAGS := contents,alloc,load,readonly,data
+ifdef CONFIG_DEBUG_INFO
+BOOTCFLAGS += -g
+BOOTAFLAGS += -g
+endif
+
 src-boot := crt0.S string.S prom.c main.c zlib.c imagesize.c div64.S
 src-boot := $(addprefix $(obj)/, $(src-boot))
 obj-boot := $(addsuffix .o, $(basename $(src-boot)))

From utz.bacher at de.ibm.com Mon Feb 28 12:59:15 2005
From: utz.bacher at de.ibm.com (Utz Bacher)
Date: Mon, 28 Feb 2005 02:59:15 +0100 (CET)
Subject: [PATCH] ppc64: fix nvram partition scan
Message-ID:

Hi Paul and all,

the following patch against 2.6.11-rc4 corrects some problems with bad
NVRAM contents:
- when the checksum is incorrect, better do not trust anything (instead
  of assuming the length is correct)
- when the partition length is zero, stop looking for more partitions
  instead of looping

Please consider for inclusion.

Regards,
Utz

Signed-off-by: Utz Bacher

Index: linux-2.6.11-rc4/arch/ppc64/kernel/nvram.c
===================================================================
--- linux-2.6.11-rc4.orig/arch/ppc64/kernel/nvram.c
+++ linux-2.6.11-rc4/arch/ppc64/kernel/nvram.c
@@ -535,9 +535,21 @@
 		memcpy(&phead, header, NVRAM_HEADER_LEN);
 		c_sum = nvram_checksum(&phead);
-		if (c_sum != phead.checksum)
-			printk(KERN_WARNING "WARNING: nvram partition checksum "
-			       "was %02x, should be %02x!\n", phead.checksum, c_sum);
+		if (c_sum != phead.checksum) {
+			printk(KERN_WARNING "WARNING: nvram partition "
+			       "checksum was %02x, should be %02x! Skipping "
+			       "subsequent partitions to prevent data loss\n",
+			       phead.checksum, c_sum);
+			kfree(header);
+			return size;
+		}
+
+		if (!phead.length) {
+			printk(KERN_WARNING "nvram partition chain ends "
+			       "(zero partition length). Assuming end "
+			       "of chain.\n");
+			break;
+		}
 		tmp_part = (struct nvram_partition *)
 			kmalloc(sizeof(struct nvram_partition), GFP_KERNEL);