From linas at austin.ibm.com Fri Apr 1 06:06:22 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 31 Mar 2005 14:06:22 -0600 Subject: Symbios PCI error recovery [Was: Re: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)] In-Reply-To: <20050322175728.GE12675@colo.lackof.org> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <1109207532.5384.32.camel@gaston> <20050224013137.GF2088@austin.ibm.com> <20050226063609.GC7036@colo.lackof.org> <20050321231028.GV498@austin.ibm.com> <20050322175728.GE12675@colo.lackof.org> Message-ID: <20050331200622.GG15596@austin.ibm.com> Hmm, got distracted by other issues, so I'm answering a week late... On Tue, Mar 22, 2005 at 10:57:28AM -0700, Grant Grundler was heard to remark: > On Mon, Mar 21, 2005 at 05:10:28PM -0600, Linas Vepstas wrote: > > My current hardware will halt all i/o to/from the symbios controller > > upon detection of a PCI error. The recovery procedure that I am > > currently using is to call system firmware (aka 'bios') to raise > > and then lower the #RST pci signal line for 1/4 second, then wait 2 > > seconds for the PCI bus to settle, then restore the PCI config space > > registers (BARs, interrupt line, etc) to what they used to be. Then, > > I call sym_start_up() in an attempt to get the symbios card working > > again. And that's where I get stuck ... > > Does this process cause a SCSI bus reset? I don't get a chance to get that far. I have to bring up the PCI interfaces first, before any SCSI command can be issued. > BTW, when did sym2 get a chance to cleanup "pending" requests? Yes, the sym2 driver has mechanisms for that. > You want everything moved back to the "queued" state or failed > (flush pending IO so upper layers can retry if they want). The upper layer is the Linux block device; my understanding is that it does not retry, nor do the filesystems above that. Passing errors upwards seems to be pretty darned fatal. My goal is to limit retries to the driver.
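The recovery sequence quoted above (assert #RST for about 1/4 second, wait about 2 seconds, restore the saved config registers, then restart the driver) can be sketched as below. This is a simulation under stated assumptions, not the EEH code: `sim_pci_dev`, `fw_set_pci_reset()`, `save_config()` and `recover_after_pci_error()` are hypothetical names, the real reset goes through platform firmware, and the `restart` callback merely stands in for a driver hook such as sym_start_up().

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical simulated device: first 64 bytes of a type-0 config header. */
struct sim_pci_dev {
    uint8_t config[64];  /* live config space; cleared by #RST */
    uint8_t saved[64];   /* snapshot taken while the device was healthy */
    int in_reset;        /* 1 while #RST is asserted */
};

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Hypothetical stand-in for the firmware call that drives #RST. */
static void fw_set_pci_reset(struct sim_pci_dev *dev, int asserted)
{
    dev->in_reset = asserted;
    if (asserted)
        memset(dev->config, 0, sizeof dev->config); /* BARs, IRQ line lost */
}

/* Must run before the error, while config space is still valid. */
void save_config(struct sim_pci_dev *dev)
{
    memcpy(dev->saved, dev->config, sizeof dev->saved);
}

/* Reset, settle, restore, restart. `restart` stands in for a driver
 * hook such as sym_start_up(); returns its result, or 0 if NULL. */
int recover_after_pci_error(struct sim_pci_dev *dev, long reset_ms,
                            long settle_ms,
                            int (*restart)(struct sim_pci_dev *))
{
    fw_set_pci_reset(dev, 1);
    sleep_ms(reset_ms);        /* hold #RST, e.g. 250 ms */
    fw_set_pci_reset(dev, 0);
    sleep_ms(settle_ms);       /* let the bus settle, e.g. 2000 ms */

    assert(!dev->in_reset);    /* never touch config space under #RST */
    memcpy(dev->config, dev->saved, sizeof dev->config);
    return restart ? restart(dev) : 0;
}
```

In this sketch save_config() would be called at probe time, and recover_after_pci_error(dev, 250, 2000, restart_hook) after an error is detected. Note that the snapshot must predate the error; re-reading config space after the fault would capture the already-damaged state.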
> > Sometimes, I get the PCI error while the card is sitting there idly > > after the #RST, but more often, I get the error in sym_chip_reset(), > > immediately after the OUTB (nc_istat, SRST); > > Oh? Is this the driver trying to issue SCSI Reset? No, I am trying to reinitialize the SCSI card after the PCI bus has been reset. This has nothing to do with SCSI bus resets, as far as I know ... --linas From apw at us.ibm.com Fri Apr 1 05:34:58 2005 From: apw at us.ibm.com (Amos Waterland) Date: Thu, 31 Mar 2005 14:34:58 -0500 Subject: [patch] fix prom.c compile warning Message-ID: <20050331193458.GA4186@kvasir.watson.ibm.com> The code in unflatten_device_tree knows that get_property is written to only return with lenp equal to 1 when also returning a valid pointer. The gcc 3.3.3 compiler is not able to prove this to itself, so it warns about a possible uninitialized pointer dereference: .../arch/ppc64/kernel/prom.c: In function `unflatten_device_tree': .../arch/ppc64/kernel/prom.c:828: warning: `p' might be used uninitialized in this function Unless it is desired to rework the interaction between the two functions, this will keep the existing behavior but quiet the compiler.
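The warning comes from a common pattern: a pointer is assigned on only one path, and a separate length variable guards its later use. A minimal sketch of that pattern and of the NULL-initialisation workaround follows; `lookup_property()` is a hypothetical stand-in for get_property(), and `copy_property()` is illustrative, not the prom.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for get_property(): returns the value and sets
 * *lenp only when the property exists; otherwise returns NULL and leaves
 * *lenp untouched. */
static const char *lookup_property(const char *name, int *lenp)
{
    static const char bootargs[] = "root=/dev/sda2";

    if (strcmp(name, "bootargs") != 0)
        return NULL;
    *lenp = (int)sizeof bootargs;   /* length includes the NUL */
    return bootargs;
}

/* Copies up to bufsz bytes of the property into buf; returns the count. */
int copy_property(const char *name, int have_node, char *buf, size_t bufsz)
{
    const char *p = NULL;  /* the fix: without "= NULL" gcc may warn that
                              p "might be used uninitialized" */
    int l = 0;

    if (have_node)
        p = lookup_property(name, &l);

    /* l > 0 implies p was assigned, but the compiler cannot prove it. */
    if (l > 0) {
        size_t n = (size_t)l < bufsz ? (size_t)l : bufsz;

        assert(p != NULL);
        memcpy(buf, p, n);
        return (int)n;
    }
    return 0;
}
```

Initialising to NULL does not change behaviour on any path the invariant allows; it only gives the compiler a defined value on the path it cannot rule out.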
Signed-off-by: Amos Waterland ===== arch/ppc64/kernel/prom.c 1.127 vs edited ===== --- 1.127/arch/ppc64/kernel/prom.c 2005-03-28 17:21:21 -05:00 +++ edited/arch/ppc64/kernel/prom.c 2005-03-31 13:40:42 -05:00 @@ -825,7 +825,7 @@ { unsigned long start, mem, size; struct device_node **allnextp = &allnodes; - char *p; + char *p = NULL; int l = 0; DBG(" -> unflatten_device_tree()\n"); From linas at austin.ibm.com Fri Apr 1 06:14:09 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 31 Mar 2005 14:14:09 -0600 Subject: Symbios PCI error recovery [Was: Re: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)] In-Reply-To: <4240581C.1000906@us.ibm.com> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <1109207532.5384.32.camel@gaston> <20050224013137.GF2088@austin.ibm.com> <20050226063609.GC7036@colo.lackof.org> <20050321231028.GV498@austin.ibm.com> <4240581C.1000906@us.ibm.com> Message-ID: <20050331201409.GH15596@austin.ibm.com> On Tue, Mar 22, 2005 at 11:38:36AM -0600, Brian King was heard to remark: > Linas Vepstas wrote: > > > > My current hardware will halt all i/o to/from the symbios controller > > upon detection of a PCI error. The recovery procedure that I am > > currently using is to call system firmware (aka 'bios') to raise > > and then lower the #RST pci signal line for 1/4 second, then wait 2 > > seconds for the PCI bus to settle, then restore the PCI config space > > registers (BARs, interrupt line, etc) to what they used to be. Then, > > I call sym_start_up() in an attempt to get the symbios card working > > again. And that's where I get stuck ... > > > > My assumption is that after the #RST, that the symbios card will sit > > there, dumb and stupid, with no scripts running. But sometimes I find > > that the card has done something to make the PCI error hardware trip > > again.
Typically, this means that the card attempted to DMA to some > > address that it's not allowed to touch, or raised #SERR or possibly > > #PERR (I can't tell which). > > What config registers are you restoring? BARs, grant, latency, interrupt, cacheline size. > Is it possible symbios does not > like something in your config restore? possibly... > Another possibility is that asserting PCI reset is not cleanly resetting > the card. Does PCI reset force BIST to be run on these cards? You could > try to manually run BIST on the card after the PCI reset to see if that I didn't see BIST in the code, but I wasn't looking for it either. I could try that. > helps, or you could try power cycling the slot instead of using PCI reset. Yes, I could :( I'll try that next. Problem is, not all slots are power-cyclable, only the hotplug slots are. I've discovered that, for example, the Ethernet chips are soldered to the motherboard, and can't be power-cycled (but fortunately, those don't give me trouble). --linas From linas at austin.ibm.com Fri Apr 1 06:21:48 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 31 Mar 2005 14:21:48 -0600 Subject: RFC/Patch more xmon additions In-Reply-To: <16936.10223.704710.234312@cargo.ozlabs.ibm.com> References: <421E3BE3.90301@vnet.ibm.com> <16936.10223.704710.234312@cargo.ozlabs.ibm.com> Message-ID: <20050331202148.GI15596@austin.ibm.com> Hi Will, I just unearthed this email from the deep mound ... On Fri, Mar 04, 2005 at 08:18:39PM +1100, Paul Mackerras was heard to remark: > will schmidt writes: > > > Am looking for comments on this additional function i've added to xmon > > on the side.. > > > > the bulk of my intent was to make it easier for me to poke at memory > > within a particular user process. > > The main problem I have with it is that we seem to be accessing a lot > of kernel data structures without checking any pointers or using > mread() to read the memory safely.
One of the goals of xmon is that > it should be as reliable as possible even if kernel data structures > are corrupted, and I think your patch would reduce that reliability. Please clean up per Paul's suggestions and resubmit; as a matter of principle, it's nice to have the debugger print parsed output instead of having to count 289 bytes into some struct task or such to manually decode a bitflag ... --linas From dwmw2 at infradead.org Fri Apr 1 06:44:47 2005 From: dwmw2 at infradead.org (David Woodhouse) Date: Thu, 31 Mar 2005 21:44:47 +0100 Subject: [PATCH] Export re{serv,leas}e_pmc_hardware() for oprofile Message-ID: <1112301887.24487.363.camel@hades.cambridge.redhat.com> CONFIG_OPROFILE=m doesn't work on ppc64 if these aren't exported... Signed-off-by: David Woodhouse --- linux-2.6.11/arch/ppc64/kernel/pmc.c.orig 2005-03-31 20:31:07.000000000 +0100 +++ linux-2.6.11/arch/ppc64/kernel/pmc.c 2005-03-31 20:30:15.000000000 +0100 @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -50,6 +51,7 @@ int reserve_pmc_hardware(perf_irq_t new_ spin_unlock(&pmc_owner_lock); return err; } +EXPORT_SYMBOL_GPL(reserve_pmc_hardware); void release_pmc_hardware(void) { @@ -62,3 +64,4 @@ void release_pmc_hardware(void) spin_unlock(&pmc_owner_lock); } +EXPORT_SYMBOL_GPL(release_pmc_hardware); -- dwmw2 From jschopp at austin.ibm.com Fri Apr 1 07:42:12 2005 From: jschopp at austin.ibm.com (Joel Schopp) Date: Thu, 31 Mar 2005 15:42:12 -0600 Subject: system call for LPAR characteristics Message-ID: <424C6EB4.7040400@austin.ibm.com> Scott, Not sure if Manish got back to you or not. Saw this message go by on a mailing list (IBM internal one) and thought it might relate to what you were asking about. If it does, you might post to linuxppc64-dev at ozlabs.org and ask about /proc/ppc64/lparcfg containing PURR data. Come to think of it, that mailing list might be a good place to get your kernel questions answered.
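For reading PURR data out of /proc/ppc64/lparcfg, a minimal parsing sketch is below. It assumes the file is a header line followed by `key=value` lines and that the PURR figure appears under a key named `purr`; both the key name and its presence on a given kernel are assumptions to verify against the actual file. `lparcfg_get_u64()` is a hypothetical helper, not an existing API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: finds a "key=value" line in text (the assumed
 * lparcfg layout) and parses value as a decimal unsigned integer.
 * Returns 0 on success, -1 if the key is absent or the value is not
 * numeric. */
int lparcfg_get_u64(const char *text, const char *key, unsigned long long *out)
{
    size_t klen;
    const char *line = text;

    assert(text != NULL && key != NULL && out != NULL);
    klen = strlen(key);

    while (*line) {
        /* match "key=" at the start of a line, not a longer key */
        if (strncmp(line, key, klen) == 0 && line[klen] == '=') {
            const char *val = line + klen + 1;
            char *end;
            unsigned long long v = strtoull(val, &end, 10);

            if (end == val)
                return -1;        /* no digits after '=' */
            *out = v;
            return 0;
        }
        line = strchr(line, '\n');
        if (line == NULL)
            break;
        line++;                   /* start of the next line */
    }
    return -1;
}
```

To use it, read /proc/ppc64/lparcfg into a NUL-terminated buffer (for example with fopen() and fread()) and call lparcfg_get_u64(buf, "purr", &purr). Since PURR is a running counter, usage over an interval would presumably come from the difference between two such samples rather than from a single reading.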
-Joel -------- Original Message -------- On Thu, Mar 31, 2005 at 02:05:33PM -0600, Chakarat Skawratananond wrote: > Hi All, > > AIX has the lpar_get_info( ) system call. > Doesn't seem like we have the equivalent for LoP. > If not, is there a workaround? > > We have /proc/ppc64/lparcfg but this is configuration data, not the > current CPU use. /proc/ppc64/lparcfg has recently been amended to include CPU (PURR) usage. It might not be a feature in either of the distros yet. Jeff Scheel might know. -Olof From mikpe at csd.uu.se Fri Apr 1 08:07:34 2005 From: mikpe at csd.uu.se (Mikael Pettersson) Date: Fri, 1 Apr 2005 00:07:34 +0200 (MEST) Subject: [PATCH 2.6.12-rc1-mm5 1/3] perfctr: ppc64 arch hooks Message-ID: <200503312207.j2VM7YUI011924@alkaid.it.uu.se> Here's a 3-part patch kit which adds a ppc64 driver to perfctr, written by David Gibson. ppc64 is sufficiently different from ppc32 that this driver is kept separate from my ppc32 driver. This shouldn't matter unless people actually want to run ppc32 kernels on ppc64 processors.
ppc64 perfctr driver from David Gibson : - ppc64 arch hooks: Kconfig, syscalls numbers and tables, task struct, and process management ops (switch_to, exit, fork) Signed-off-by: Mikael Pettersson arch/ppc64/Kconfig | 1 + arch/ppc64/kernel/misc.S | 12 ++++++++++++ arch/ppc64/kernel/process.c | 6 ++++++ include/asm-ppc64/processor.h | 2 ++ include/asm-ppc64/unistd.h | 8 +++++++- 5 files changed, 28 insertions(+), 1 deletion(-) diff -rupN linux-2.6.12-rc1-mm4/arch/ppc64/Kconfig linux-2.6.12-rc1-mm4.perfctr27-ppc64-arch-hooks/arch/ppc64/Kconfig --- linux-2.6.12-rc1-mm4/arch/ppc64/Kconfig 2005-03-31 21:08:24.000000000 +0200 +++ linux-2.6.12-rc1-mm4.perfctr27-ppc64-arch-hooks/arch/ppc64/Kconfig 2005-03-31 23:28:07.000000000 +0200 @@ -297,6 +297,7 @@ config SECCOMP endmenu +source "drivers/perfctr/Kconfig" menu "General setup" diff -rupN linux-2.6.12-rc1-mm4/arch/ppc64/kernel/misc.S linux-2.6.12-rc1-mm4.perfctr27-ppc64-arch-hooks/arch/ppc64/kernel/misc.S --- linux-2.6.12-rc1-mm4/arch/ppc64/kernel/misc.S 2005-03-31 21:08:24.000000000 +0200 +++ linux-2.6.12-rc1-mm4.perfctr27-ppc64-arch-hooks/arch/ppc64/kernel/misc.S 2005-03-31 23:28:07.000000000 +0200 @@ -956,6 +956,12 @@ _GLOBAL(sys_call_table32) .llong .sys32_request_key .llong .compat_sys_keyctl .llong .compat_sys_waitid + .llong .sys_ni_syscall /* 273 reserved for sys_ioprio_set */ + .llong .sys_ni_syscall /* 274 reserved for sys_ioprio_get */ + .llong .sys_vperfctr_open /* 275 */ + .llong .sys_vperfctr_control + .llong .sys_vperfctr_write + .llong .sys_vperfctr_read .balign 8 _GLOBAL(sys_call_table) @@ -1232,3 +1238,9 @@ _GLOBAL(sys_call_table) .llong .sys_request_key /* 270 */ .llong .sys_keyctl .llong .sys_waitid + .llong .sys_ni_syscall /* 273 reserved for sys_ioprio_set */ + .llong .sys_ni_syscall /* 274 reserved for sys_ioprio_get */ + .llong .sys_vperfctr_open /* 275 */ + .llong .sys_vperfctr_control + .llong .sys_vperfctr_write + .llong .sys_vperfctr_read diff -rupN 
linux-2.6.12-rc1-mm4/arch/ppc64/kernel/process.c linux-2.6.12-rc1-mm4.perfctr27-ppc64-arch-hooks/arch/ppc64/kernel/process.c --- linux-2.6.12-rc1-mm4/arch/ppc64/kernel/process.c 2005-03-31 21:07:46.000000000 +0200 +++ linux-2.6.12-rc1-mm4.perfctr27-ppc64-arch-hooks/arch/ppc64/kernel/process.c 2005-03-31 23:28:07.000000000 +0200 @@ -36,6 +36,7 @@ #include #include #include +#include #include #include @@ -225,7 +226,9 @@ struct task_struct *__switch_to(struct t local_irq_save(flags); + perfctr_suspend_thread(&prev->thread); last = _switch(old_thread, new_thread); + perfctr_resume_thread(&current->thread); local_irq_restore(flags); @@ -323,6 +326,7 @@ void exit_thread(void) last_task_used_altivec = NULL; #endif /* CONFIG_ALTIVEC */ #endif /* CONFIG_SMP */ + perfctr_exit_thread(&current->thread); } void flush_thread(void) @@ -425,6 +429,8 @@ copy_thread(int nr, unsigned long clone_ */ kregs->nip = *((unsigned long *)ret_from_fork); + perfctr_copy_task(p, regs); + return 0; } diff -rupN linux-2.6.12-rc1-mm4/include/asm-ppc64/processor.h linux-2.6.12-rc1-mm4.perfctr27-ppc64-arch-hooks/include/asm-ppc64/processor.h --- linux-2.6.12-rc1-mm4/include/asm-ppc64/processor.h 2005-03-31 21:08:31.000000000 +0200 +++ linux-2.6.12-rc1-mm4.perfctr27-ppc64-arch-hooks/include/asm-ppc64/processor.h 2005-03-31 23:28:07.000000000 +0200 @@ -574,6 +574,8 @@ struct thread_struct { unsigned long vrsave; int used_vr; /* set if process has used altivec */ #endif /* CONFIG_ALTIVEC */ + /* performance counters */ + struct vperfctr *perfctr; }; #define ARCH_MIN_TASKALIGN 16 diff -rupN linux-2.6.12-rc1-mm4/include/asm-ppc64/unistd.h linux-2.6.12-rc1-mm4.perfctr27-ppc64-arch-hooks/include/asm-ppc64/unistd.h --- linux-2.6.12-rc1-mm4/include/asm-ppc64/unistd.h 2005-03-31 21:07:54.000000000 +0200 +++ linux-2.6.12-rc1-mm4.perfctr27-ppc64-arch-hooks/include/asm-ppc64/unistd.h 2005-03-31 23:28:07.000000000 +0200 @@ -283,8 +283,14 @@ #define __NR_request_key 270 #define __NR_keyctl 271 #define __NR_waitid 272 +/* 273
is reserved for ioprio_set */ +/* 274 is reserved for ioprio_get */ +#define __NR_vperfctr_open 275 +#define __NR_vperfctr_control (__NR_vperfctr_open+1) +#define __NR_vperfctr_write (__NR_vperfctr_open+2) +#define __NR_vperfctr_read (__NR_vperfctr_open+3) -#define __NR_syscalls 273 +#define __NR_syscalls 279 #ifdef __KERNEL__ #define NR_syscalls __NR_syscalls #endif From mikpe at csd.uu.se Fri Apr 1 08:09:04 2005 From: mikpe at csd.uu.se (Mikael Pettersson) Date: Fri, 1 Apr 2005 00:09:04 +0200 (MEST) Subject: [PATCH 2.6.12-rc1-mm5 2/3] perfctr: common updates for ppc64 Message-ID: <200503312209.j2VM94QH011932@alkaid.it.uu.se> ppc64 perfctr driver from David Gibson : - perfctr common updates: Makefile, version - perfctr virtual quirk: the ppc64 low-level driver is unable to prevent all stray overflow interrupts; on ppc64 (and only ppc64) the right action in this case is to ignore the interrupt and resume. Signed-off-by: Mikael Pettersson drivers/perfctr/Makefile | 5 ++++- drivers/perfctr/version.h | 2 +- drivers/perfctr/virtual.c | 11 ++++++++++- 3 files changed, 15 insertions(+), 3 deletions(-) diff -rupN linux-2.6.12-rc1-mm4/drivers/perfctr/Makefile linux-2.6.12-rc1-mm4.perfctr-ppc64-common-update/drivers/perfctr/Makefile --- linux-2.6.12-rc1-mm4/drivers/perfctr/Makefile 2005-03-31 21:08:26.000000000 +0200 +++ linux-2.6.12-rc1-mm4.perfctr-ppc64-common-update/drivers/perfctr/Makefile 2005-03-31 23:36:04.000000000 +0200 @@ -1,4 +1,4 @@ -# $Id: Makefile,v 1.26 2004/05/30 23:02:14 mikpe Exp $ +# $Id: Makefile,v 1.27 2005/03/23 01:29:34 mikpe Exp $ # Makefile for the Performance-monitoring counters driver. # This also covers x86_64.
@@ -8,6 +8,9 @@ tests-objs-$(CONFIG_X86) := x86_tests.o perfctr-objs-$(CONFIG_PPC32) := ppc.o tests-objs-$(CONFIG_PPC32) := ppc_tests.o +perfctr-objs-$(CONFIG_PPC64) := ppc64.o +tests-objs-$(CONFIG_PPC64) := ppc64_tests.o + perfctr-objs-y += init.o perfctr-objs-$(CONFIG_PERFCTR_INIT_TESTS) += $(tests-objs-y) perfctr-objs-$(CONFIG_PERFCTR_VIRTUAL) += virtual.o diff -rupN linux-2.6.12-rc1-mm4/drivers/perfctr/version.h linux-2.6.12-rc1-mm4.perfctr-ppc64-common-update/drivers/perfctr/version.h --- linux-2.6.12-rc1-mm4/drivers/perfctr/version.h 2005-03-31 21:08:26.000000000 +0200 +++ linux-2.6.12-rc1-mm4.perfctr-ppc64-common-update/drivers/perfctr/version.h 2005-03-31 23:36:04.000000000 +0200 @@ -1 +1 @@ -#define VERSION "2.7.14" +#define VERSION "2.7.15" diff -rupN linux-2.6.12-rc1-mm4/drivers/perfctr/virtual.c linux-2.6.12-rc1-mm4.perfctr-ppc64-common-update/drivers/perfctr/virtual.c --- linux-2.6.12-rc1-mm4/drivers/perfctr/virtual.c 2005-03-31 21:08:26.000000000 +0200 +++ linux-2.6.12-rc1-mm4.perfctr-ppc64-common-update/drivers/perfctr/virtual.c 2005-03-31 23:36:04.000000000 +0200 @@ -1,4 +1,4 @@ -/* $Id: virtual.c,v 1.111 2005/02/20 11:56:44 mikpe Exp $ +/* $Id: virtual.c,v 1.115 2005/03/28 22:39:02 mikpe Exp $ * Virtual per-process performance counters. * * Copyright (C) 1999-2005 Mikael Pettersson @@ -272,8 +272,17 @@ static void vperfctr_handle_overflow(str pmc_mask = perfctr_cpu_identify_overflow(&perfctr->cpu_state); if (!pmc_mask) { +#ifdef CONFIG_PPC64 + /* On some hardware (ppc64, in particular) it's + * impossible to control interrupts finely enough to + * eliminate overflows on counters we don't care + * about. So in this case just restart the counters + * and keep going. */ + vperfctr_resume(perfctr); +#else printk(KERN_ERR "%s: BUG! 
pid %d has unidentifiable overflow source\n", __FUNCTION__, tsk->pid); +#endif return; } perfctr->ireload_needed = 1; From mikpe at csd.uu.se Fri Apr 1 08:09:49 2005 From: mikpe at csd.uu.se (Mikael Pettersson) Date: Fri, 1 Apr 2005 00:09:49 +0200 (MEST) Subject: [PATCH 2.6.12-rc1-mm5 3/3] perfctr: ppc64 driver core Message-ID: <200503312209.j2VM9nCe011940@alkaid.it.uu.se> ppc64 perfctr driver from David Gibson : - ppc64 perfctr driver core Signed-off-by: Mikael Pettersson drivers/perfctr/ppc64.c | 743 ++++++++++++++++++++++++++++++++++++++++++ drivers/perfctr/ppc64_tests.c | 322 ++++++++++++++++++ drivers/perfctr/ppc64_tests.h | 12 include/asm-ppc64/perfctr.h | 166 +++++++++ 4 files changed, 1243 insertions(+) diff -rupN linux-2.6.12-rc1-mm4/drivers/perfctr/ppc64.c linux-2.6.12-rc1-mm4.perfctr-ppc64-driver/drivers/perfctr/ppc64.c --- linux-2.6.12-rc1-mm4/drivers/perfctr/ppc64.c 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.12-rc1-mm4.perfctr-ppc64-driver/drivers/perfctr/ppc64.c 2005-03-31 23:37:37.000000000 +0200 @@ -0,0 +1,743 @@ +/* + * PPC64 performance-monitoring counters driver. + * + * based on Mikael Pettersson's 32 bit ppc code + * Copyright (C) 2004 David Gibson, IBM Corporation. + * Copyright (C) 2004 Mikael Pettersson + */ + +#include +#include +#include +#include +#include +#include +#include /* tb_ticks_per_jiffy */ +#include +#include + +#include "ppc64_tests.h" + +extern void ppc64_enable_pmcs(void); + +/* Support for lazy perfctr SPR updates. */ +struct per_cpu_cache { /* roughly a subset of perfctr_cpu_state */ + unsigned int id; /* cache owner id */ + /* Physically indexed cache of the MMCRs. */ + unsigned long ppc64_mmcr0, ppc64_mmcr1, ppc64_mmcra; +}; +static DEFINE_PER_CPU(struct per_cpu_cache, per_cpu_cache); +#define __get_cpu_cache(cpu) (&per_cpu(per_cpu_cache, cpu)) +#define get_cpu_cache() (&__get_cpu_var(per_cpu_cache)) + +/* Structure for counter snapshots, as 32-bit values. 
*/ +struct perfctr_low_ctrs { + unsigned int tsc; + unsigned int pmc[8]; +}; + +static unsigned int new_id(void) +{ + static DEFINE_SPINLOCK(lock); + static unsigned int counter; + int id; + + spin_lock(&lock); + id = ++counter; + spin_unlock(&lock); + return id; +} + +static inline unsigned int read_pmc(unsigned int pmc) +{ + switch (pmc) { + case 0: + return mfspr(SPRN_PMC1); + break; + case 1: + return mfspr(SPRN_PMC2); + break; + case 2: + return mfspr(SPRN_PMC3); + break; + case 3: + return mfspr(SPRN_PMC4); + break; + case 4: + return mfspr(SPRN_PMC5); + break; + case 5: + return mfspr(SPRN_PMC6); + break; + case 6: + return mfspr(SPRN_PMC7); + break; + case 7: + return mfspr(SPRN_PMC8); + break; + + default: + return -EINVAL; + } +} + +static inline void write_pmc(int pmc, s32 val) +{ + switch (pmc) { + case 0: + mtspr(SPRN_PMC1, val); + break; + case 1: + mtspr(SPRN_PMC2, val); + break; + case 2: + mtspr(SPRN_PMC3, val); + break; + case 3: + mtspr(SPRN_PMC4, val); + break; + case 4: + mtspr(SPRN_PMC5, val); + break; + case 5: + mtspr(SPRN_PMC6, val); + break; + case 6: + mtspr(SPRN_PMC7, val); + break; + case 7: + mtspr(SPRN_PMC8, val); + break; + } +} + +#ifdef CONFIG_PERFCTR_INTERRUPT_SUPPORT +static void perfctr_default_ihandler(unsigned long pc) +{ + unsigned int mmcr0 = mfspr(SPRN_MMCR0); + + mmcr0 &= ~MMCR0_PMXE; + mtspr(SPRN_MMCR0, mmcr0); +} + +static perfctr_ihandler_t perfctr_ihandler = perfctr_default_ihandler; + +void do_perfctr_interrupt(struct pt_regs *regs) +{ + unsigned long mmcr0; + + /* interrupts are disabled here, so we don't need to + * preempt_disable() */ + + (*perfctr_ihandler)(instruction_pointer(regs)); + + /* clear PMAO so the interrupt doesn't reassert immediately */ + mmcr0 = mfspr(SPRN_MMCR0) & ~MMCR0_PMAO; + mtspr(SPRN_MMCR0, mmcr0); +} + +void perfctr_cpu_set_ihandler(perfctr_ihandler_t ihandler) +{ + perfctr_ihandler = ihandler ? 
ihandler : perfctr_default_ihandler; +} + +#else +#define perfctr_cstatus_has_ictrs(cstatus) 0 +#endif + + +#if defined(CONFIG_SMP) && defined(CONFIG_PERFCTR_INTERRUPT_SUPPORT) + +static inline void +set_isuspend_cpu(struct perfctr_cpu_state *state, int cpu) +{ + state->isuspend_cpu = cpu; +} + +static inline int +is_isuspend_cpu(const struct perfctr_cpu_state *state, int cpu) +{ + return state->isuspend_cpu == cpu; +} + +static inline void clear_isuspend_cpu(struct perfctr_cpu_state *state) +{ + state->isuspend_cpu = NR_CPUS; +} + +#else +static inline void set_isuspend_cpu(struct perfctr_cpu_state *state, int cpu) { } +static inline int is_isuspend_cpu(const struct perfctr_cpu_state *state, int cpu) { return 1; } +static inline void clear_isuspend_cpu(struct perfctr_cpu_state *state) { } +#endif + + +static void ppc64_clear_counters(void) +{ + mtspr(SPRN_MMCR0, 0); + mtspr(SPRN_MMCR1, 0); + mtspr(SPRN_MMCRA, 0); + + mtspr(SPRN_PMC1, 0); + mtspr(SPRN_PMC2, 0); + mtspr(SPRN_PMC3, 0); + mtspr(SPRN_PMC4, 0); + mtspr(SPRN_PMC5, 0); + mtspr(SPRN_PMC6, 0); + + if (cpu_has_feature(CPU_FTR_PMC8)) { + mtspr(SPRN_PMC7, 0); + mtspr(SPRN_PMC8, 0); + } +} + +/* + * Driver methods, internal and exported. + */ + +static void perfctr_cpu_write_control(const struct perfctr_cpu_state *state) +{ + struct per_cpu_cache *cache; + unsigned long long value; + + cache = get_cpu_cache(); + /* + * Order matters here: update threshmult and event + * selectors before updating global control, which + * potentially enables PMIs. + * + * Since mtspr doesn't accept a runtime value for the + * SPR number, unroll the loop so each mtspr targets + * a constant SPR. + * + * For processors without MMCR2, we ensure that the + * cache and the state indicate the same value for it, + * preventing any actual mtspr to it. Ditto for MMCR1. 
+ */ + value = state->control.mmcra; + if (value != cache->ppc64_mmcra) { + cache->ppc64_mmcra = value; + mtspr(SPRN_MMCRA, value); + } + value = state->control.mmcr1; + if (value != cache->ppc64_mmcr1) { + cache->ppc64_mmcr1 = value; + mtspr(SPRN_MMCR1, value); + } + value = state->control.mmcr0; + if (perfctr_cstatus_has_ictrs(state->user.cstatus)) + value |= MMCR0_PMXE; + if (value != cache->ppc64_mmcr0) { + cache->ppc64_mmcr0 = value; + mtspr(SPRN_MMCR0, value); + } + cache->id = state->id; +} + +static void perfctr_cpu_read_counters(struct perfctr_cpu_state *state, + struct perfctr_low_ctrs *ctrs) +{ + unsigned int cstatus, i, pmc; + + cstatus = state->user.cstatus; + if (perfctr_cstatus_has_tsc(cstatus)) + ctrs->tsc = mftb() & 0xffffffff; + + for (i = 0; i < perfctr_cstatus_nractrs(cstatus); ++i) { + pmc = state->user.pmc[i].map; + ctrs->pmc[i] = read_pmc(pmc); + } +} + +#ifdef CONFIG_PERFCTR_INTERRUPT_SUPPORT +static void perfctr_cpu_isuspend(struct perfctr_cpu_state *state) +{ + unsigned int cstatus, nrctrs, i; + int cpu; + + cpu = smp_processor_id(); + set_isuspend_cpu(state, cpu); /* early to limit cpu's live range */ + cstatus = state->user.cstatus; + nrctrs = perfctr_cstatus_nrctrs(cstatus); + for (i = perfctr_cstatus_nractrs(cstatus); i < nrctrs; ++i) { + unsigned int pmc = state->user.pmc[i].map; + unsigned int now = read_pmc(pmc); + + state->user.pmc[i].sum += now - state->user.pmc[i].start; + state->user.pmc[i].start = now; + } +} + +static void perfctr_cpu_iresume(const struct perfctr_cpu_state *state) +{ + struct per_cpu_cache *cache; + unsigned int cstatus, nrctrs, i; + int cpu; + + cpu = smp_processor_id(); + cache = __get_cpu_cache(cpu); + if (cache->id == state->id) { + /* Clearing cache->id to force write_control() + to unfreeze MMCR0 would be done here, but it + is subsumed by resume()'s MMCR0 reload logic. */ + if (is_isuspend_cpu(state, cpu)) { + return; /* skip reload of PMCs */ + } + } + /* + * The CPU state wasn't ours. 
+ * + * The counters must be frozen before being reinitialised, + * to prevent unexpected increments and missed overflows. + * + * All unused counters must be reset to a non-overflow state. + */ + if (!(cache->ppc64_mmcr0 & MMCR0_FC)) { + cache->ppc64_mmcr0 |= MMCR0_FC; + mtspr(SPRN_MMCR0, cache->ppc64_mmcr0); + } + cstatus = state->user.cstatus; + nrctrs = perfctr_cstatus_nrctrs(cstatus); + for (i = perfctr_cstatus_nractrs(cstatus); i < nrctrs; ++i) { + write_pmc(state->user.pmc[i].map, state->user.pmc[i].start); + } +} + +/* Call perfctr_cpu_ireload() just before perfctr_cpu_resume() to + bypass internal caching and force a reload of the I-mode PMCs. */ +void perfctr_cpu_ireload(struct perfctr_cpu_state *state) +{ +#ifdef CONFIG_SMP + clear_isuspend_cpu(state); +#else + get_cpu_cache()->id = 0; +#endif +} + +/* PRE: the counters have been suspended and sampled by perfctr_cpu_suspend() */ +unsigned int perfctr_cpu_identify_overflow(struct perfctr_cpu_state *state) +{ + unsigned int cstatus, nractrs, nrctrs, i; + unsigned int pmc_mask = 0; + int nr_pmcs = 6; + + if (cpu_has_feature(CPU_FTR_PMC8)) + nr_pmcs = 8; + + cstatus = state->user.cstatus; + nractrs = perfctr_cstatus_nractrs(cstatus); + nrctrs = perfctr_cstatus_nrctrs(cstatus); + + /* Ickity, ickity, ick. We don't have fine enough interrupt + * control to disable interrupts on all the counters we're not + * interested in. So, we have to deal with overflows on actrs + * and unused PMCs as well as the ones we actually care + * about.
*/ + for (i = 0; i < nractrs; ++i) { + int pmc = state->user.pmc[i].map; + unsigned int val = read_pmc(pmc); + + /* For actrs, force a sample if they overflowed */ + + if ((int)val < 0) { + state->user.pmc[i].sum += val - state->user.pmc[i].start; + state->user.pmc[i].start = 0; + write_pmc(pmc, 0); + } + } + for (; i < nrctrs; ++i) { + if ((int)state->user.pmc[i].start < 0) { /* PPC64-specific */ + int pmc = state->user.pmc[i].map; + /* XXX: "+=" to correct for overshots */ + state->user.pmc[i].start = state->control.ireset[pmc]; + pmc_mask |= (1 << i); + } + } + + /* Clear any unused overflowed counters, so we don't loop on + * the interrupt */ + for (i = 0; i < nr_pmcs; ++i) { + if (! (state->unused_pmcs & (1<control.header.nractrs; + nrctrs = i + state->control.header.nrictrs; + for(; i < nrctrs; ++i) { + unsigned int pmc = state->user.pmc[i].map; + if ((int)state->control.ireset[pmc] < 0) /* PPC64-specific */ + return -EINVAL; + state->user.pmc[i].start = state->control.ireset[pmc]; + } + return 0; +} + +#else /* CONFIG_PERFCTR_INTERRUPT_SUPPORT */ +static inline void perfctr_cpu_isuspend(struct perfctr_cpu_state *state) { } +static inline void perfctr_cpu_iresume(const struct perfctr_cpu_state *state) { } +static inline int check_ireset(struct perfctr_cpu_state *state) { return 0; } +#endif /* CONFIG_PERFCTR_INTERRUPT_SUPPORT */ + +static int check_control(struct perfctr_cpu_state *state) +{ + unsigned int i, nractrs, nrctrs, pmc_mask, pmc; + unsigned int nr_pmcs = 6; + + if (cpu_has_feature(CPU_FTR_PMC8)) + nr_pmcs = 8; + + nractrs = state->control.header.nractrs; + nrctrs = nractrs + state->control.header.nrictrs; + if (nrctrs < nractrs || nrctrs > nr_pmcs) + return -EINVAL; + + pmc_mask = 0; + for (i = 0; i < nrctrs; ++i) { + pmc = state->control.pmc_map[i]; + state->user.pmc[i].map = pmc; + if (pmc >= nr_pmcs || (pmc_mask & (1<control.mmcr0 & MMCR0_PMXE) + || (state->control.mmcr0 & MMCR0_PMAO) + || (state->control.mmcr0 & MMCR0_TBEE) ) + return -EINVAL; 
+ + state->unused_pmcs = ((1 << nr_pmcs)-1) & ~pmc_mask; + + state->id = new_id(); + + return 0; +} + +int perfctr_cpu_update_control(struct perfctr_cpu_state *state, int is_global) +{ + int err; + + clear_isuspend_cpu(state); + state->user.cstatus = 0; + + /* disallow i-mode counters if we cannot catch the interrupts */ + if (!(perfctr_info.cpu_features & PERFCTR_FEATURE_PCINT) + && state->control.header.nrictrs) + return -EPERM; + + err = check_control(state); /* may initialise state->cstatus */ + if (err < 0) + return err; + err = check_ireset(state); + if (err < 0) + return err; + state->user.cstatus |= perfctr_mk_cstatus(state->control.header.tsc_on, + state->control.header.nractrs, + state->control.header.nrictrs); + return 0; +} + +/* + * get_reg_offset() maps SPR numbers to offsets into struct perfctr_cpu_control. + */ +static const struct { + unsigned int spr; + unsigned int offset; + unsigned int size; +} reg_offsets[] = { + { SPRN_MMCR0, offsetof(struct perfctr_cpu_control, mmcr0), sizeof(long) }, + { SPRN_MMCR1, offsetof(struct perfctr_cpu_control, mmcr1), sizeof(long) }, + { SPRN_MMCRA, offsetof(struct perfctr_cpu_control, mmcra), sizeof(long) }, + { SPRN_PMC1, offsetof(struct perfctr_cpu_control, ireset[1-1]), sizeof(int) }, + { SPRN_PMC2, offsetof(struct perfctr_cpu_control, ireset[2-1]), sizeof(int) }, + { SPRN_PMC3, offsetof(struct perfctr_cpu_control, ireset[3-1]), sizeof(int) }, + { SPRN_PMC4, offsetof(struct perfctr_cpu_control, ireset[4-1]), sizeof(int) }, + { SPRN_PMC5, offsetof(struct perfctr_cpu_control, ireset[5-1]), sizeof(int) }, + { SPRN_PMC6, offsetof(struct perfctr_cpu_control, ireset[6-1]), sizeof(int) }, + { SPRN_PMC7, offsetof(struct perfctr_cpu_control, ireset[7-1]), sizeof(int) }, + { SPRN_PMC8, offsetof(struct perfctr_cpu_control, ireset[8-1]), sizeof(int) }, +}; + +static int get_reg_offset(unsigned int spr, unsigned int *size) +{ + unsigned int i; + + for(i = 0; i < ARRAY_SIZE(reg_offsets); ++i) + if (spr == reg_offsets[i].spr) 
{ + *size = reg_offsets[i].size; + return reg_offsets[i].offset; + } + return -1; +} + +static int access_regs(struct perfctr_cpu_control *control, + void *argp, unsigned int argbytes, int do_write) +{ + struct perfctr_cpu_reg *regs; + unsigned int i, nr_regs, size; + int offset; + + nr_regs = argbytes / sizeof(struct perfctr_cpu_reg); + if (nr_regs * sizeof(struct perfctr_cpu_reg) != argbytes) + return -EINVAL; + regs = (struct perfctr_cpu_reg*)argp; + + for(i = 0; i < nr_regs; ++i) { + offset = get_reg_offset(regs[i].nr, &size); + if (offset < 0) + return -EINVAL; + if (size == sizeof(long)) { + unsigned long *where = (unsigned long*)((char*)control + offset); + if (do_write) + *where = regs[i].value; + else + regs[i].value = *where; + } else { + unsigned int *where = (unsigned int*)((char*)control + offset); + if (do_write) + *where = regs[i].value; + else + regs[i].value = *where; + } + } + return argbytes; +} + +int perfctr_cpu_control_write(struct perfctr_cpu_control *control, unsigned int domain, + const void *srcp, unsigned int srcbytes) +{ + if (domain != PERFCTR_DOMAIN_CPU_REGS) + return -EINVAL; + return access_regs(control, (void*)srcp, srcbytes, 1); +} + +int perfctr_cpu_control_read(const struct perfctr_cpu_control *control, unsigned int domain, + void *dstp, unsigned int dstbytes) +{ + if (domain != PERFCTR_DOMAIN_CPU_REGS) + return -EINVAL; + return access_regs((struct perfctr_cpu_control*)control, dstp, dstbytes, 0); +} + +void perfctr_cpu_suspend(struct perfctr_cpu_state *state) +{ + unsigned int i, cstatus; + struct perfctr_low_ctrs now; + + /* quiesce the counters */ + mtspr(SPRN_MMCR0, MMCR0_FC); + get_cpu_cache()->ppc64_mmcr0 = MMCR0_FC; + + if (perfctr_cstatus_has_ictrs(state->user.cstatus)) + perfctr_cpu_isuspend(state); + + perfctr_cpu_read_counters(state, &now); + cstatus = state->user.cstatus; + if (perfctr_cstatus_has_tsc(cstatus)) + state->user.tsc_sum += now.tsc - state->user.tsc_start; + + for (i = 0; i < 
perfctr_cstatus_nractrs(cstatus); ++i) + state->user.pmc[i].sum += now.pmc[i] - state->user.pmc[i].start; +} + +void perfctr_cpu_resume(struct perfctr_cpu_state *state) +{ + struct perfctr_low_ctrs now; + unsigned int i, cstatus; + + if (perfctr_cstatus_has_ictrs(state->user.cstatus)) + perfctr_cpu_iresume(state); + perfctr_cpu_write_control(state); + + perfctr_cpu_read_counters(state, &now); + cstatus = state->user.cstatus; + if (perfctr_cstatus_has_tsc(cstatus)) + state->user.tsc_start = now.tsc; + + for (i = 0; i < perfctr_cstatus_nractrs(cstatus); ++i) + state->user.pmc[i].start = now.pmc[i]; + + /* XXX: if (SMP && start.tsc == now.tsc) ++now.tsc; */ +} + +void perfctr_cpu_sample(struct perfctr_cpu_state *state) +{ + unsigned int i, cstatus, nractrs; + struct perfctr_low_ctrs now; + + perfctr_cpu_read_counters(state, &now); + cstatus = state->user.cstatus; + if (perfctr_cstatus_has_tsc(cstatus)) { + state->user.tsc_sum += now.tsc - state->user.tsc_start; + state->user.tsc_start = now.tsc; + } + nractrs = perfctr_cstatus_nractrs(cstatus); + for(i = 0; i < nractrs; ++i) { + state->user.pmc[i].sum += now.pmc[i] - state->user.pmc[i].start; + state->user.pmc[i].start = now.pmc[i]; + } +} + +static void perfctr_cpu_clear_counters(void) +{ + struct per_cpu_cache *cache; + + cache = get_cpu_cache(); + memset(cache, 0, sizeof *cache); + cache->id = 0; + + ppc64_clear_counters(); +} + +/**************************************************************** + * * + * Processor detection and initialisation procedures. * + * * + ****************************************************************/ + +static void ppc64_cpu_setup(void) +{ + /* allow user to initialize these???? 
*/ + + unsigned long long mmcr0 = mfspr(SPRN_MMCR0); + unsigned long long mmcra = mfspr(SPRN_MMCRA); + + + ppc64_enable_pmcs(); + + mmcr0 |= MMCR0_FC; + mtspr(SPRN_MMCR0, mmcr0); + + mmcr0 |= MMCR0_FCM1|MMCR0_PMXE|MMCR0_FCECE; + mmcr0 |= MMCR0_PMC1CE|MMCR0_PMCjCE; + mtspr(SPRN_MMCR0, mmcr0); + + mmcra |= MMCRA_SAMPLE_ENABLE; + mtspr(SPRN_MMCRA, mmcra); + + printk("setup on cpu %d, mmcr0 %lx\n", smp_processor_id(), + mfspr(SPRN_MMCR0)); + printk("setup on cpu %d, mmcr1 %lx\n", smp_processor_id(), + mfspr(SPRN_MMCR1)); + printk("setup on cpu %d, mmcra %lx\n", smp_processor_id(), + mfspr(SPRN_MMCRA)); + +/* mtmsrd(mfmsr() | MSR_PMM); */ + + ppc64_clear_counters(); + + mmcr0 = mfspr(SPRN_MMCR0); + mmcr0 &= ~MMCR0_PMAO; + mmcr0 &= ~MMCR0_FC; + mtspr(SPRN_MMCR0, mmcr0); + + printk("start on cpu %d, mmcr0 %llx\n", smp_processor_id(), mmcr0); +} + + +static void perfctr_cpu_clear_one(void *ignore) +{ + /* PREEMPT note: when called via on_each_cpu(), + this is in IRQ context with preemption disabled. */ + perfctr_cpu_clear_counters(); +} + +static void perfctr_cpu_reset(void) +{ + on_each_cpu(perfctr_cpu_clear_one, NULL, 1, 1); + perfctr_cpu_set_ihandler(NULL); +} + +int __init perfctr_cpu_init(void) +{ + extern unsigned long ppc_proc_freq; + extern unsigned long ppc_tb_freq; + + perfctr_info.cpu_features = PERFCTR_FEATURE_RDTSC + | PERFCTR_FEATURE_RDPMC | PERFCTR_FEATURE_PCINT; + + perfctr_cpu_name = "PowerPC64"; + + perfctr_info.cpu_khz = ppc_proc_freq / 1000; + /* We need to round here rather than truncating, because in a + * few cases the raw ratio can end up being 7.9999 or + * suchlike */ + perfctr_info.tsc_to_cpu_mult = + (ppc_proc_freq + ppc_tb_freq - 1) / ppc_tb_freq; + + on_each_cpu((void *)ppc64_cpu_setup, NULL, 0, 1); + + perfctr_ppc64_init_tests(); + + perfctr_cpu_reset(); + return 0; +} + +void __exit perfctr_cpu_exit(void) +{ + perfctr_cpu_reset(); +} + +/**************************************************************** + * * + * Hardware reservation. 
* + * * + ****************************************************************/ + +static spinlock_t service_mutex = SPIN_LOCK_UNLOCKED; +static const char *current_service = NULL; + +const char *perfctr_cpu_reserve(const char *service) +{ + const char *ret; + + spin_lock(&service_mutex); + + ret = current_service; + if (ret) + goto out; + + ret = "unknown driver (oprofile?)"; + if (reserve_pmc_hardware(do_perfctr_interrupt) != 0) + goto out; + + current_service = service; + ret = NULL; + + out: + spin_unlock(&service_mutex); + return ret; +} + +void perfctr_cpu_release(const char *service) +{ + spin_lock(&service_mutex); + + if (service != current_service) { + printk(KERN_ERR "%s: attempt by %s to release while reserved by %s\n", + __FUNCTION__, service, current_service); + goto out; + } + + /* power down the counters */ + perfctr_cpu_reset(); + current_service = NULL; + release_pmc_hardware(); + + out: + spin_unlock(&service_mutex); +} diff -rupN linux-2.6.12-rc1-mm4/drivers/perfctr/ppc64_tests.c linux-2.6.12-rc1-mm4.perfctr-ppc64-driver/drivers/perfctr/ppc64_tests.c --- linux-2.6.12-rc1-mm4/drivers/perfctr/ppc64_tests.c 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.12-rc1-mm4.perfctr-ppc64-driver/drivers/perfctr/ppc64_tests.c 2005-03-31 23:37:37.000000000 +0200 @@ -0,0 +1,322 @@ +/* + * Performance-monitoring counters driver. + * Optional PPC64-specific init-time tests. + * + * Copyright (C) 2004 David Gibson, IBM Corporation. 
+ * Copyright (C) 2004 Mikael Pettersson + */ +#include +#include +#include +#include +#include +#include +#include /* for tb_ticks_per_jiffy */ +#include "ppc64_tests.h" + +#define NITER 256 +#define X2(S) S"; "S +#define X8(S) X2(X2(X2(S))) + +static void __init do_read_tbl(unsigned int unused) +{ + unsigned int i, dummy; + for(i = 0; i < NITER/8; ++i) + __asm__ __volatile__(X8("mftbl %0") : "=r"(dummy)); +} + +static void __init do_read_pmc1(unsigned int unused) +{ + unsigned int i, dummy; + for(i = 0; i < NITER/8; ++i) + __asm__ __volatile__(X8("mfspr %0," __stringify(SPRN_PMC1)) : "=r"(dummy)); +} + +static void __init do_read_pmc2(unsigned int unused) +{ + unsigned int i, dummy; + for(i = 0; i < NITER/8; ++i) + __asm__ __volatile__(X8("mfspr %0," __stringify(SPRN_PMC2)) : "=r"(dummy)); +} + +static void __init do_read_pmc3(unsigned int unused) +{ + unsigned int i, dummy; + for(i = 0; i < NITER/8; ++i) + __asm__ __volatile__(X8("mfspr %0," __stringify(SPRN_PMC3)) : "=r"(dummy)); +} + +static void __init do_read_pmc4(unsigned int unused) +{ + unsigned int i, dummy; + for(i = 0; i < NITER/8; ++i) + __asm__ __volatile__(X8("mfspr %0," __stringify(SPRN_PMC4)) : "=r"(dummy)); +} + +static void __init do_read_mmcr0(unsigned int unused) +{ + unsigned int i, dummy; + for(i = 0; i < NITER/8; ++i) + __asm__ __volatile__(X8("mfspr %0," __stringify(SPRN_MMCR0)) : "=r"(dummy)); +} + +static void __init do_read_mmcr1(unsigned int unused) +{ + unsigned int i, dummy; + for(i = 0; i < NITER/8; ++i) + __asm__ __volatile__(X8("mfspr %0," __stringify(SPRN_MMCR1)) : "=r"(dummy)); +} + +static void __init do_write_pmc2(unsigned int arg) +{ + unsigned int i; + for(i = 0; i < NITER/8; ++i) + __asm__ __volatile__(X8("mtspr " __stringify(SPRN_PMC2) ",%0") : : "r"(arg)); +} + +static void __init do_write_pmc3(unsigned int arg) +{ + unsigned int i; + for(i = 0; i < NITER/8; ++i) + __asm__ __volatile__(X8("mtspr " __stringify(SPRN_PMC3) ",%0") : : "r"(arg)); +} + +static void __init 
do_write_pmc4(unsigned int arg) +{ + unsigned int i; + for(i = 0; i < NITER/8; ++i) + __asm__ __volatile__(X8("mtspr " __stringify(SPRN_PMC4) ",%0") : : "r"(arg)); +} + +static void __init do_write_mmcr1(unsigned int arg) +{ + unsigned int i; + for(i = 0; i < NITER/8; ++i) + __asm__ __volatile__(X8("mtspr " __stringify(SPRN_MMCR1) ",%0") : : "r"(arg)); +} + +static void __init do_write_mmcr0(unsigned int arg) +{ + unsigned int i; + for(i = 0; i < NITER/8; ++i) + __asm__ __volatile__(X8("mtspr " __stringify(SPRN_MMCR0) ",%0") : : "r"(arg)); +} + +static void __init do_empty_loop(unsigned int unused) +{ + unsigned i; + for(i = 0; i < NITER/8; ++i) + __asm__ __volatile__("" : : ); +} + +static unsigned __init run(void (*doit)(unsigned int), unsigned int arg) +{ + unsigned int start, stop; + start = mfspr(SPRN_PMC1); + (*doit)(arg); /* should take < 2^32 cycles to complete */ + stop = mfspr(SPRN_PMC1); + return stop - start; +} + +static void __init init_tests_message(void) +{ +#if 0 + printk(KERN_INFO "Please email the following PERFCTR INIT lines " + "to mikpe at csd.uu.se\n" + KERN_INFO "To remove this message, rebuild the driver " + "with CONFIG_PERFCTR_INIT_TESTS=n\n"); + printk(KERN_INFO "PERFCTR INIT: PVR 0x%08x, CPU clock %u kHz, TB clock %lu kHz\n", + pvr, + perfctr_info.cpu_khz, + tb_ticks_per_jiffy*(HZ/10)/(1000/10)); +#endif +} + +static void __init clear(void) +{ + mtspr(SPRN_MMCR0, 0); + mtspr(SPRN_MMCR1, 0); + mtspr(SPRN_MMCRA, 0); + mtspr(SPRN_PMC1, 0); + mtspr(SPRN_PMC2, 0); + mtspr(SPRN_PMC3, 0); + mtspr(SPRN_PMC4, 0); + mtspr(SPRN_PMC5, 0); + mtspr(SPRN_PMC6, 0); + mtspr(SPRN_PMC7, 0); + mtspr(SPRN_PMC8, 0); +} + +static void __init check_fcece(unsigned int pmc1ce) +{ + unsigned int mmcr0; + unsigned int pmc1; + int x = 0; + + /* JHE check out section 1.6.6.2 of the POWER5 pdf */ + + /* + * This test checks if MMCR0[FC] is set after PMC1 overflows + * when MMCR0[FCECE] is set. 
+ * 74xx documentation states this behaviour, while documentation + * for 604/750 processors doesn't mention this at all. + * + * Also output the value of PMC1 shortly after the overflow. + * This tells us if PMC1 really was frozen. On 604/750, it may not + * freeze since we don't enable PMIs. [No freeze confirmed on 750.] + * + * When pmc1ce == 0, MMCR0[PMC1CE] is zero. It's unclear whether + * this masks all PMC1 overflow events or just PMC1 PMIs. + * + * PMC1 counts processor cycles, with 100 to go before overflowing. + * FCECE is set. + * PMC1CE is clear if !pmc1ce, otherwise set. + */ + pmc1 = mfspr(SPRN_PMC1); + + mtspr(SPRN_PMC1, 0x80000000-100); + mmcr0 = MMCR0_FCECE | MMCR0_SHRFC; + + if (pmc1ce) + mmcr0 |= MMCR0_PMC1CE; + + mtspr(SPRN_MMCR0, mmcr0); + + pmc1 = mfspr(SPRN_PMC1); + + do { + do_empty_loop(0); + + pmc1 = mfspr(SPRN_PMC1); + if (x++ > 20000000) { + break; + } + } while (!(mfspr(SPRN_PMC1) & 0x80000000)); + do_empty_loop(0); + + printk(KERN_INFO "PERFCTR INIT: %s(%u): MMCR0[FC] is %u, PMC1 is %#lx\n", + __FUNCTION__, pmc1ce, + !!(mfspr(SPRN_MMCR0) & MMCR0_FC), mfspr(SPRN_PMC1)); + mtspr(SPRN_MMCR0, 0); + mtspr(SPRN_PMC1, 0); +} + +static void __init check_trigger(unsigned int pmc1ce) +{ + unsigned int mmcr0; + unsigned int pmc1; + int x = 0; + + /* + * This test checks if MMCR0[TRIGGER] is reset after PMC1 overflows. + * 74xx documentation states this behaviour, while documentation + * for 604/750 processors doesn't mention this at all. + * [No reset confirmed on 750.] + * + * Also output the values of PMC1 and PMC2 shortly after the overflow. + * PMC2 should be equal to PMC1-0x80000000. + * + * When pmc1ce == 0, MMCR0[PMC1CE] is zero. It's unclear whether + * this masks all PMC1 overflow events or just PMC1 PMIs. + * + * PMC1 counts processor cycles, with 100 to go before overflowing. + * PMC2 counts processor cycles, starting from 0. + * TRIGGER is set, so PMC2 doesn't start until PMC1 overflows. + * PMC1CE is clear if !pmc1ce, otherwise set. 
+ */ + mtspr(SPRN_PMC2, 0); + mtspr(SPRN_PMC1, 0x80000000-100); + mmcr0 = MMCR0_TRIGGER | MMCR0_SHRFC | MMCR0_FCHV; + + if (pmc1ce) + mmcr0 |= MMCR0_PMC1CE; + + mtspr(SPRN_MMCR0, mmcr0); + do { + do_empty_loop(0); + pmc1 = mfspr(SPRN_PMC1); + if (x++ > 20000000) { + break; + } + + } while (!(mfspr(SPRN_PMC1) & 0x80000000)); + do_empty_loop(0); + printk(KERN_INFO "PERFCTR INIT: %s(%u): MMCR0[TRIGGER] is %u, PMC1 is %#lx, PMC2 is %#lx\n", + __FUNCTION__, pmc1ce, + !!(mfspr(SPRN_MMCR0) & MMCR0_TRIGGER), mfspr(SPRN_PMC1), mfspr(SPRN_PMC2)); + mtspr(SPRN_MMCR0, 0); + mtspr(SPRN_PMC1, 0); + mtspr(SPRN_PMC2, 0); +} + +static void __init measure_overheads(void) +{ + int i; + unsigned int mmcr0, loop, ticks[12]; + const char *name[12]; + + clear(); + + /* PMC1 = "processor cycles", + PMC2 = "completed instructions", + not disabled in any mode, + no interrupts */ + /* mmcr0 = (0x01 << 6) | (0x02 << 0); */ + mmcr0 = MMCR0_SHRFC | MMCR0_FCWAIT; + mtspr(SPRN_MMCR0, mmcr0); + + name[0] = "mftbl"; + ticks[0] = run(do_read_tbl, 0); + name[1] = "mfspr (pmc1)"; + ticks[1] = run(do_read_pmc1, 0); + name[2] = "mfspr (pmc2)"; + ticks[2] = run(do_read_pmc2, 0); + name[3] = "mfspr (pmc3)"; + ticks[3] = run(do_read_pmc3, 0); + name[4] = "mfspr (pmc4)"; + ticks[4] = run(do_read_pmc4, 0); + name[5] = "mfspr (mmcr0)"; + ticks[5] = run(do_read_mmcr0, 0); + name[6] = "mfspr (mmcr1)"; + ticks[6] = run(do_read_mmcr1, 0); + name[7] = "mtspr (pmc2)"; + ticks[7] = run(do_write_pmc2, 0); + name[8] = "mtspr (pmc3)"; + ticks[8] = run(do_write_pmc3, 0); + name[9] = "mtspr (pmc4)"; + ticks[9] = run(do_write_pmc4, 0); + name[10] = "mtspr (mmcr1)"; + ticks[10] = run(do_write_mmcr1, 0); + name[11] = "mtspr (mmcr0)"; + ticks[11] = run(do_write_mmcr0, mmcr0); + + loop = run(do_empty_loop, 0); + + clear(); + + init_tests_message(); + printk(KERN_INFO "PERFCTR INIT: NITER == %u\n", NITER); + printk(KERN_INFO "PERFCTR INIT: loop overhead is %u cycles\n", loop); + for(i = 0; i < ARRAY_SIZE(ticks); ++i) { + 
unsigned int x; + if (!ticks[i]) + continue; + x = ((ticks[i] - loop) * 10) / NITER; + printk(KERN_INFO "PERFCTR INIT: %s cost is %u.%u cycles (%u total)\n", + name[i], x/10, x%10, ticks[i]); + } + + check_fcece(0); +#if 0 + check_fcece(1); + check_trigger(0); + check_trigger(1); +#endif +} + +void __init perfctr_ppc64_init_tests(void) +{ + preempt_disable(); + measure_overheads(); + preempt_enable(); +} diff -rupN linux-2.6.12-rc1-mm4/drivers/perfctr/ppc64_tests.h linux-2.6.12-rc1-mm4.perfctr-ppc64-driver/drivers/perfctr/ppc64_tests.h --- linux-2.6.12-rc1-mm4/drivers/perfctr/ppc64_tests.h 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.12-rc1-mm4.perfctr-ppc64-driver/drivers/perfctr/ppc64_tests.h 2005-03-31 23:37:37.000000000 +0200 @@ -0,0 +1,12 @@ +/* + * Performance-monitoring counters driver. + * Optional PPC32-specific init-time tests. + * + * Copyright (C) 2004 Mikael Pettersson + */ + +#ifdef CONFIG_PERFCTR_INIT_TESTS +extern void perfctr_ppc64_init_tests(void); +#else +static inline void perfctr_ppc64_init_tests(void) { } +#endif diff -rupN linux-2.6.12-rc1-mm4/include/asm-ppc64/perfctr.h linux-2.6.12-rc1-mm4.perfctr-ppc64-driver/include/asm-ppc64/perfctr.h --- linux-2.6.12-rc1-mm4/include/asm-ppc64/perfctr.h 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.12-rc1-mm4.perfctr-ppc64-driver/include/asm-ppc64/perfctr.h 2005-03-31 23:37:37.000000000 +0200 @@ -0,0 +1,166 @@ +/* + * PPC64 Performance-Monitoring Counters driver + * + * Copyright (C) 2004 David Gibson, IBM Corporation. 
+ * Copyright (C) 2004 Mikael Pettersson + */ +#ifndef _ASM_PPC64_PERFCTR_H +#define _ASM_PPC64_PERFCTR_H + +#include + +struct perfctr_sum_ctrs { + __u64 tsc; + __u64 pmc[8]; /* the size is not part of the user ABI */ +}; + +struct perfctr_cpu_control_header { + __u32 tsc_on; + __u32 nractrs; /* number of accumulation-mode counters */ + __u32 nrictrs; /* number of interrupt-mode counters */ +}; + +struct perfctr_cpu_state_user { + __u32 cstatus; + /* The two tsc fields must be inlined. Placing them in a + sub-struct causes unwanted internal padding on x86-64. */ + __u32 tsc_start; + __u64 tsc_sum; + struct { + __u32 map; + __u32 start; + __u64 sum; + } pmc[8]; /* the size is not part of the user ABI */ +}; + +/* cstatus is a re-encoding of control.tsc_on/nractrs/nrictrs + which should have less overhead in most cases */ +/* XXX: ppc driver internally also uses cstatus&(1<<30) */ + +static inline +unsigned int perfctr_mk_cstatus(unsigned int tsc_on, unsigned int nractrs, + unsigned int nrictrs) +{ + return (tsc_on<<31) | (nrictrs<<16) | ((nractrs+nrictrs)<<8) | nractrs; +} + +static inline unsigned int perfctr_cstatus_enabled(unsigned int cstatus) +{ + return cstatus; +} + +static inline int perfctr_cstatus_has_tsc(unsigned int cstatus) +{ + return (int)cstatus < 0; /* test and jump on sign */ +} + +static inline unsigned int perfctr_cstatus_nractrs(unsigned int cstatus) +{ + return cstatus & 0x7F; /* and with imm8 */ +} + +static inline unsigned int perfctr_cstatus_nrctrs(unsigned int cstatus) +{ + return (cstatus >> 8) & 0x7F; +} + +static inline unsigned int perfctr_cstatus_has_ictrs(unsigned int cstatus) +{ + return cstatus & (0x7F << 16); +} + +/* + * 'struct siginfo' support for perfctr overflow signals. + * In unbuffered mode, si_code is set to SI_PMC_OVF and a bitmask + * describing which perfctrs overflowed is put in si_pmc_ovf_mask. + * A bitmask is used since more than one perfctr can have overflowed + * by the time the interrupt handler runs. 
+ */ +#define SI_PMC_OVF -8 +#define si_pmc_ovf_mask _sifields._pad[0] /* XXX: use an unsigned field later */ + +#ifdef __KERNEL__ + +#if defined(CONFIG_PERFCTR) + +struct perfctr_cpu_control { + struct perfctr_cpu_control_header header; + u64 mmcr0; + u64 mmcr1; + u64 mmcra; + unsigned int ireset[8]; /* [0,0x7fffffff], for i-mode counters, physical indices */ + unsigned int pmc_map[8]; /* virtual to physical index map */ +}; + +struct perfctr_cpu_state { + /* Don't change field order here without first considering the number + of cache lines touched during sampling and context switching. */ + unsigned int id; + int isuspend_cpu; + struct perfctr_cpu_state_user user; + unsigned int unused_pmcs; + struct perfctr_cpu_control control; +}; + +/* Driver init/exit. */ +extern int perfctr_cpu_init(void); +extern void perfctr_cpu_exit(void); + +/* CPU type name. */ +extern char *perfctr_cpu_name; + +/* Hardware reservation. */ +extern const char *perfctr_cpu_reserve(const char *service); +extern void perfctr_cpu_release(const char *service); + +/* PRE: state has no running interrupt-mode counters. + Check that the new control data is valid. + Update the driver's private control data. + Returns a negative error code if the control data is invalid. */ +extern int perfctr_cpu_update_control(struct perfctr_cpu_state *state, int is_global); + +/* Parse and update control for the given domain. */ +extern int perfctr_cpu_control_write(struct perfctr_cpu_control *control, + unsigned int domain, + const void *srcp, unsigned int srcbytes); + +/* Retrieve and format control for the given domain. + Returns number of bytes written. */ +extern int perfctr_cpu_control_read(const struct perfctr_cpu_control *control, + unsigned int domain, + void *dstp, unsigned int dstbytes); + +/* Read a-mode counters. Subtract from start and accumulate into sums. + Must be called with preemption disabled. 
*/ +extern void perfctr_cpu_suspend(struct perfctr_cpu_state *state); + +/* Write control registers. Read a-mode counters into start. + Must be called with preemption disabled. */ +extern void perfctr_cpu_resume(struct perfctr_cpu_state *state); + +/* Perform an efficient combined suspend/resume operation. + Must be called with preemption disabled. */ +extern void perfctr_cpu_sample(struct perfctr_cpu_state *state); + +/* The type of a perfctr overflow interrupt handler. + It will be called in IRQ context, with preemption disabled. */ +typedef void (*perfctr_ihandler_t)(unsigned long pc); + +/* Operations related to overflow interrupt handling. */ +#ifdef CONFIG_PERFCTR_INTERRUPT_SUPPORT +extern void perfctr_cpu_set_ihandler(perfctr_ihandler_t); +extern void perfctr_cpu_ireload(struct perfctr_cpu_state*); +extern unsigned int perfctr_cpu_identify_overflow(struct perfctr_cpu_state*); +#else +static inline void perfctr_cpu_set_ihandler(perfctr_ihandler_t x) { } +#endif +static inline int perfctr_cpu_has_pending_interrupt(const struct perfctr_cpu_state *state) +{ + return 0; +} + +#endif /* CONFIG_PERFCTR */ + +#endif /* __KERNEL__ */ + +#endif /* _ASM_PPC64_PERFCTR_H */ From akpm at osdl.org Fri Apr 1 09:11:29 2005 From: akpm at osdl.org (Andrew Morton) Date: Thu, 31 Mar 2005 15:11:29 -0800 Subject: [PATCH 2.6.12-rc1-mm5 1/3] perfctr: ppc64 arch hooks In-Reply-To: <200503312207.j2VM7YUI011924@alkaid.it.uu.se> References: <200503312207.j2VM7YUI011924@alkaid.it.uu.se> Message-ID: <20050331151129.279b0618.akpm@osdl.org> Mikael Pettersson wrote: > > Here's a 3-part patch kit which adds a ppc64 driver to perfctr, > written by David Gibson . Well that seems like progress. Where do we feel that we stand wrt preparedness for merging all this up? 
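The cstatus encoding in the include/asm-ppc64/perfctr.h hunk above is easy to misread, so here is a small user-space sketch that exercises it. The five inline helpers are copied from the header, with the kernel context removed. mk_cstatus_checked() and print_cstatus() are hypothetical additions for illustration only. The TSC flag sits in bit 31, so has_tsc is a sign test. The a-mode count sits in the low bits, so nractrs is a single mask. The header's own comments describe both choices. The range check in mk_cstatus_checked() is an assumption: in the driver, perfctr_cpu_update_control() calls check_control() before perfctr_mk_cstatus(), and that is presumably where the counts are validated.

```c
#include <assert.h>
#include <stdio.h>

/* User-space copies of the cstatus helpers from asm-ppc64/perfctr.h. */
static inline unsigned int perfctr_mk_cstatus(unsigned int tsc_on,
					      unsigned int nractrs,
					      unsigned int nrictrs)
{
	/* bit 31: TSC on; bits 16-22: i-mode count;
	   bits 8-14: total count; bits 0-6: a-mode count */
	return (tsc_on<<31) | (nrictrs<<16) | ((nractrs+nrictrs)<<8) | nractrs;
}

static inline int perfctr_cstatus_has_tsc(unsigned int cstatus)
{
	return (int)cstatus < 0;	/* bit 31 doubles as the sign bit */
}

static inline unsigned int perfctr_cstatus_nractrs(unsigned int cstatus)
{
	return cstatus & 0x7F;
}

static inline unsigned int perfctr_cstatus_nrctrs(unsigned int cstatus)
{
	return (cstatus >> 8) & 0x7F;
}

static inline unsigned int perfctr_cstatus_has_ictrs(unsigned int cstatus)
{
	return cstatus & (0x7F << 16);
}

/* Hypothetical checked builder: each count field is only 7 bits wide
   (assumed to be enforced by check_control() in the real driver). */
static unsigned int mk_cstatus_checked(unsigned int tsc_on,
				       unsigned int nractrs,
				       unsigned int nrictrs)
{
	assert(tsc_on <= 1);
	assert(nractrs <= 0x7F && nrictrs <= 0x7F && nractrs + nrictrs <= 0x7F);
	return perfctr_mk_cstatus(tsc_on, nractrs, nrictrs);
}

/* Hypothetical helper: print the fields decoded from a cstatus word. */
static void print_cstatus(unsigned int cs)
{
	printf("cstatus %#010x: tsc=%d a-mode=%u total=%u i-mode=%s\n",
	       cs, perfctr_cstatus_has_tsc(cs),
	       perfctr_cstatus_nractrs(cs), perfctr_cstatus_nrctrs(cs),
	       perfctr_cstatus_has_ictrs(cs) ? "yes" : "no");
}
```

For example, print_cstatus(perfctr_mk_cstatus(1, 2, 3)) prints `cstatus 0x80030502: tsc=1 a-mode=2 total=5 i-mode=yes`. The "total" field stores nractrs + nrictrs, which lets code that walks all counters skip a separate addition.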
From david at gibson.dropbear.id.au Fri Apr 1 09:49:40 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Fri, 1 Apr 2005 09:49:40 +1000 Subject: [PATCH 2.6.12-rc1-mm5 1/3] perfctr: ppc64 arch hooks In-Reply-To: <20050331151129.279b0618.akpm@osdl.org> References: <200503312207.j2VM7YUI011924@alkaid.it.uu.se> <20050331151129.279b0618.akpm@osdl.org> Message-ID: <20050331234940.GA21676@localhost.localdomain> On Thu, Mar 31, 2005 at 03:11:29PM -0800, Andrew Morton wrote: > Mikael Pettersson wrote: > > > > Here's a 3-part patch kit which adds a ppc64 driver to perfctr, > > written by David Gibson . > > Well that seems like progress. Where do we feel that we stand wrt > preparedness for merging all this up? I'm still uneasy about it. There were sufficient changes made getting this one ready to go that I'm not confident there aren't more important things to be found. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson From akpm at osdl.org Fri Apr 1 11:33:02 2005 From: akpm at osdl.org (Andrew Morton) Date: Thu, 31 Mar 2005 17:33:02 -0800 Subject: [PATCH 2.6.12-rc1-mm5 1/3] perfctr: ppc64 arch hooks In-Reply-To: <20050331234940.GA21676@localhost.localdomain> References: <200503312207.j2VM7YUI011924@alkaid.it.uu.se> <20050331151129.279b0618.akpm@osdl.org> <20050331234940.GA21676@localhost.localdomain> Message-ID: <20050331173302.3ec64e59.akpm@osdl.org> David Gibson wrote: > > On Thu, Mar 31, 2005 at 03:11:29PM -0800, Andrew Morton wrote: > > Mikael Pettersson wrote: > > > > > > Here's a 3-part patch kit which adds a ppc64 driver to perfctr, > > > written by David Gibson . > > > > Well that seems like progress. Where do we feel that we stand wrt > > preparedness for merging all this up? > > I'm still uneasy about it. 
There were sufficient changes made getting > this one ready to go that I'm not confident there aren't more > important things to be found. That's a bit open-ended. How do we determine whether more things will be needed? How do we know when we're done? From grundler at parisc-linux.org Fri Apr 1 16:08:34 2005 From: grundler at parisc-linux.org (Grant Grundler) Date: Thu, 31 Mar 2005 23:08:34 -0700 Subject: Symbios PCI error recovery [Was: Re: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)] In-Reply-To: <20050331200622.GG15596@austin.ibm.com> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <1109207532.5384.32.camel@gaston> <20050224013137.GF2088@austin.ibm.com> <20050226063609.GC7036@colo.lackof.org> <20050321231028.GV498@austin.ibm.com> <20050322175728.GE12675@colo.lackof.org> <20050331200622.GG15596@austin.ibm.com> Message-ID: <20050401060834.GB29734@colo.lackof.org> On Thu, Mar 31, 2005 at 02:06:22PM -0600, Linas Vepstas wrote: > > Does this process cause a SCSI bus reset? > > Don't get a chance to get that far. Have to bring up the PCI interfaces > first, before any scsi command can be issued. My point is you want the scsi bus to get reset so devices drop all pending IO and stop trying to tell you how much work they've done. I thought this was possible by banging on registers in the 53c8xx chips. > > BTW, when did sym2 get a chance to cleanup "pending" requests? > > Yes, the sym2 driver has mechanisms for that. Uhm, *when*? It wasn't clear from your previous description. I would take care of this *before* trying to get the card back on its feet. > > You want everything moved back to the "queued" state or failed > > (flush pending IO so upper layers can retry if they want). > > Upper layer is the linux block device; my understanding is that it does > not retry, nor do the filesystems above that. Passing errors upwards > seems to be pretty darned fatal. My goal is to limit retries to the > driver. That's a bad idea.
Been there, done that. Upper layers can be a lot smarter about retries than the driver ever could be. While the driver knows more about the transport and why something might fail, upper layers will know alternate paths to the same devices or to the same data on different devices. Upper layers also set the recovery policy for particular storage. Trying to do recovery transparently in the drivers is also going to mess up other high-level SW like Service Guard or LifeKeeper. They want to know when a path has failed, log it, and make sure someone gets sent to service the HW if thresholds are exceeded. Let higher layers like dm, VxFS, LVM worry about recovery. > > > Sometimes, I get the PCI error while the card is sitting there idly > > > after the #RST, but more often, I get the error in sym_chip_reset(), > > > immediately after the OUTB (nc_istat, SRST); > > > > Oh? Is this the driver trying to issue SCSI Reset? > > No I am trying to reinitialize the scsi card after the pci bus has been > reset. This has nothing to do with scsi bus resets, as far as I know > ... Ok. Sounds like the card hasn't yet recovered from the PCI Bus reset. I don't know enough about programming 53c8xx chips to tell you where in the process it's dying or why. If you collect traces of which registers get read/written before it dies again, that would be a necessary step for whoever tries to sort this out.
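Grant's request for a trace of the register reads and writes leading up to the failure could look roughly like the following. This is a hypothetical user-space sketch. traced_write8() and traced_read8() stand in for wrappers around the driver's OUTB/INB-style accessors, and the chip is modelled as a plain byte array so the code runs anywhere. Each write is recorded before it is issued. That way, an access that itself provokes the PCI error, such as the OUTB (nc_istat, SRST) Linas mentions, still appears in the log.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TRACE_SLOTS 64	/* must be a power of two: the index is masked */
static_assert((TRACE_SLOTS & (TRACE_SLOTS - 1)) == 0, "power of two");

struct reg_access {
	uint16_t offset;
	uint8_t value;
	uint8_t is_write;
};

static struct reg_access trace_ring[TRACE_SLOTS];
static unsigned int trace_head;	/* total accesses recorded so far */

static void trace_record(uint16_t offset, uint8_t value, int is_write)
{
	struct reg_access *t = &trace_ring[trace_head++ & (TRACE_SLOTS - 1)];

	t->offset = offset;
	t->value = value;
	t->is_write = is_write ? 1 : 0;
}

/* Stand-in for the chip's register window (hypothetical). */
static uint8_t fake_regs[256];

static void traced_write8(uint16_t offset, uint8_t value)
{
	trace_record(offset, value, 1);	/* log before touching the HW */
	fake_regs[offset & 0xFF] = value;
}

static uint8_t traced_read8(uint16_t offset)
{
	uint8_t v = fake_regs[offset & 0xFF];

	trace_record(offset, v, 0);
	return v;
}

/* Print the last (up to TRACE_SLOTS) accesses, oldest first. */
static void trace_dump(void)
{
	unsigned int n = trace_head < TRACE_SLOTS ? trace_head : TRACE_SLOTS;
	unsigned int i;

	for (i = trace_head - n; i != trace_head; i++) {
		const struct reg_access *t = &trace_ring[i & (TRACE_SLOTS - 1)];

		printf("%c 0x%02x %s 0x%02x\n", t->is_write ? 'W' : 'R',
		       (unsigned int)t->offset, t->is_write ? "<-" : "->",
		       (unsigned int)t->value);
	}
}
```

A fixed ring keeps the cost bounded during a long reinitialization sequence. After a failure, trace_dump() shows only the most recent accesses, which are the ones that matter here.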
hth, grant From grundler at parisc-linux.org Fri Apr 1 16:15:08 2005 From: grundler at parisc-linux.org (Grant Grundler) Date: Thu, 31 Mar 2005 23:15:08 -0700 Subject: Symbios PCI error recovery [Was: Re: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)] In-Reply-To: <20050331201409.GH15596@austin.ibm.com> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <1109207532.5384.32.camel@gaston> <20050224013137.GF2088@austin.ibm.com> <20050226063609.GC7036@colo.lackof.org> <20050321231028.GV498@austin.ibm.com> <4240581C.1000906@us.ibm.com> <20050331201409.GH15596@austin.ibm.com> Message-ID: <20050401061508.GC29734@colo.lackof.org> On Thu, Mar 31, 2005 at 02:14:09PM -0600, Linas Vepstas wrote: > > What config registers are you restoring? > > BAR's, grant, latency, interrupt, cacheline size. "grant" is PCI_COMMAND? If so, I think you have all of them. You may want to leave BUS_MASTER disabled until you think the driver is in a state where it needs to do DMA again. E.g. before kicking off the scripts engine. > > helps, or you could try power cycling the slot instead of using PCI reset. > > yes I could :( I'll try that next. Problem is, not all slots are > power-cyclable, only the hotplug slots are. I've discoverd that > for example, the ethernet chips are soldered to the motherboard, and > can't be power-cycled (but fortunately, those don't give me trouble). They can if the NIC driver doesn't deal with programming the phy properly. We had a problem with tg3 because of that in the past. The phy doesn't get reset as part of the PCI Bus RESET. 
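Grant's advice is to restore the saved configuration registers but leave bus mastering off until the driver is ready to do DMA again. It can be sketched as follows. This is a user-space model, not kernel code. Config space is a plain 256-byte array, and struct cfg_space, restore_config() and enable_bus_master() are hypothetical names. In the kernel, the accesses would go through pci_read_config_word()/pci_write_config_word() and friends. Only the register offsets and the PCI_COMMAND_MASTER bit come from the PCI specification. Writing the BARs before the command register is an assumed, commonly used ordering.

```c
#include <assert.h>
#include <stdint.h>

/* Standard type-0 PCI configuration-space offsets and command bits. */
#define PCI_COMMAND         0x04
#define PCI_COMMAND_MEMORY  0x2
#define PCI_COMMAND_MASTER  0x4
#define PCI_CACHE_LINE_SIZE 0x0c
#define PCI_LATENCY_TIMER   0x0d
#define PCI_BASE_ADDRESS_0  0x10
#define PCI_BASE_ADDRESS_5  0x24
#define PCI_INTERRUPT_LINE  0x3c

/* Hypothetical stand-in for one function's config space. */
struct cfg_space {
	uint8_t b[256];
};

/* Config space is little-endian regardless of host byte order. */
static uint16_t cfg_read16(const struct cfg_space *c, unsigned int off)
{
	assert(off + 2 <= sizeof c->b);
	return (uint16_t)(c->b[off] | (c->b[off + 1] << 8));
}

static uint32_t cfg_read32(const struct cfg_space *c, unsigned int off)
{
	assert(off + 4 <= sizeof c->b);
	return (uint32_t)c->b[off] | ((uint32_t)c->b[off + 1] << 8) |
	       ((uint32_t)c->b[off + 2] << 16) | ((uint32_t)c->b[off + 3] << 24);
}

static void cfg_write16(struct cfg_space *c, unsigned int off, uint16_t v)
{
	assert(off + 2 <= sizeof c->b);
	c->b[off] = (uint8_t)v;
	c->b[off + 1] = (uint8_t)(v >> 8);
}

static void cfg_write32(struct cfg_space *c, unsigned int off, uint32_t v)
{
	assert(off + 4 <= sizeof c->b);
	c->b[off] = (uint8_t)v;
	c->b[off + 1] = (uint8_t)(v >> 8);
	c->b[off + 2] = (uint8_t)(v >> 16);
	c->b[off + 3] = (uint8_t)(v >> 24);
}

/* Restore saved state after a bus reset, but hold off bus mastering. */
static void restore_config(struct cfg_space *dev, const struct cfg_space *saved)
{
	unsigned int off;

	/* Addresses first, so decoding is only enabled once they are valid. */
	for (off = PCI_BASE_ADDRESS_0; off <= PCI_BASE_ADDRESS_5; off += 4)
		cfg_write32(dev, off, cfg_read32(saved, off));
	dev->b[PCI_CACHE_LINE_SIZE] = saved->b[PCI_CACHE_LINE_SIZE];
	dev->b[PCI_LATENCY_TIMER] = saved->b[PCI_LATENCY_TIMER];
	dev->b[PCI_INTERRUPT_LINE] = saved->b[PCI_INTERRUPT_LINE];

	/* Restore the saved command bits except bus mastering: no DMA yet. */
	cfg_write16(dev, PCI_COMMAND,
		    cfg_read16(saved, PCI_COMMAND) & (uint16_t)~PCI_COMMAND_MASTER);
}

/* Call once the driver is ready to issue DMA again. */
static void enable_bus_master(struct cfg_space *dev)
{
	cfg_write16(dev, PCI_COMMAND,
		    cfg_read16(dev, PCI_COMMAND) | PCI_COMMAND_MASTER);
}
```

Splitting enable_bus_master() out lets the caller defer it to the point Grant suggests, for example just before the SCRIPTS engine is started again.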
grant From mikpe at csd.uu.se Fri Apr 1 22:46:53 2005 From: mikpe at csd.uu.se (Mikael Pettersson) Date: Fri, 1 Apr 2005 14:46:53 +0200 Subject: [PATCH 2.6.12-rc1-mm5 1/3] perfctr: ppc64 arch hooks In-Reply-To: <20050331173302.3ec64e59.akpm@osdl.org> References: <200503312207.j2VM7YUI011924@alkaid.it.uu.se> <20050331151129.279b0618.akpm@osdl.org> <20050331234940.GA21676@localhost.localdomain> <20050331173302.3ec64e59.akpm@osdl.org> Message-ID: <16973.17085.561804.567539@alkaid.it.uu.se> Andrew Morton writes: > David Gibson wrote: > > > > On Thu, Mar 31, 2005 at 03:11:29PM -0800, Andrew Morton wrote: > > > Mikael Pettersson wrote: > > > > > > > > Here's a 3-part patch kit which adds a ppc64 driver to perfctr, > > > > written by David Gibson . > > > > > > Well that seems like progress. Where do we feel that we stand wrt > > > preparedness for merging all this up? > > > > I'm still uneasy about it. There were sufficient changes made getting > > this one ready to go that I'm not confident there aren't more > > important things to be found. > > That's a bit open-ended. How do we determine whether more things will be > needed? How do we know when we're done? I have two planned changes that will be done RSN: - On x86/x86-64, user-space uses the mmap()ed state's TSC start value as a way to detect if a user-space sampling operation (which needs to be "virtually atomic") was preempted by the kernel. On ppc{32,64} we've used the TB for the same thing up to now, but that doesn't quite work because the TB is about an order of magnitude or two too slow. So the plan is to change ppc to store a software generation counter in the mmap()ed state, and change the ppc user-space to check that one instead. - Move common stuff to . In addition, there is one unresolved issue: - A counter's value is represented by a 64-bit software sum, a 32-bit start value containing the HW counter's value at the start of the current time slice, and the current HW counter's value (now). 
The actual value is computed as sum + (now - start).
The actual value is computed as sum + (now - start). This is reflected in the mmap()ed state, which contains a variable- length { u32 map; u32 start; u64 sum; } pmc[] array. This layout is very cache-efficient on current 32 and 64-bit CPUs, but there is a _possible_ concern that it won't do on 10+ GHz CPUs. So the question is, should we change it to use 64-bit start values already now (and take more cache misses), or should that wait a few years until it becomes a necessity (causing ABI change issues)? /Mikael From will_schmidt at vnet.ibm.com Sat Apr 2 00:05:47 2005 From: will_schmidt at vnet.ibm.com (will schmidt) Date: Fri, 01 Apr 2005 08:05:47 -0600 Subject: RFC/Patch more xmon additions In-Reply-To: <20050331202148.GI15596@austin.ibm.com> References: <421E3BE3.90301@vnet.ibm.com> <16936.10223.704710.234312@cargo.ozlabs.ibm.com> <20050331202148.GI15596@austin.ibm.com> Message-ID: <424D553B.60306@vnet.ibm.com> Linas Vepstas wrote: > Hi Will, > > I just unearthed this email from the deep mound ... > > On Fri, Mar 04, 2005 at 08:18:39PM +1100, Paul Mackerras was heard to remark: > >>will schmidt writes: >> >> >>>Am looking for comments on this additional function i've added to xmon >>>on the side.. >>> >>>the bulk of my intent was to make it easier for me to poke at memory >>>within a particular user process. >> >>The main problem I have with it is that we seem to be accessing a lot >>of kernel data structures without checking any pointers or using >>mread() to read the memory safely. One of the goals of xmon is that >>it should be as reliable as possible even if kernel data structures >>are corrupted, and I think your patch would reduce that reliability. > > > Please clean up per Paul's suggestions and resubmit; as a matter of principle, > its nice to have the debugger print parsed output instead of having to count 289 > bytes into some struct task or such to manually decode a bitflag ... > > --linas YeAh, it's still on the ToDO list. 
From brking at us.ibm.com Sat Apr 2 01:27:22 2005 From: brking at us.ibm.com (Brian King) Date: Fri, 01 Apr 2005 09:27:22 -0600 Subject: Symbios PCI error recovery [Was: Re: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)] In-Reply-To: <20050401060834.GB29734@colo.lackof.org> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <1109207532.5384.32.camel@gaston> <20050224013137.GF2088@austin.ibm.com> <20050226063609.GC7036@colo.lackof.org> <20050321231028.GV498@austin.ibm.com> <20050322175728.GE12675@colo.lackof.org> <20050331200622.GG15596@austin.ibm.com> <20050401060834.GB29734@colo.lackof.org> Message-ID: <424D685A.6070505@us.ibm.com> Grant Grundler wrote: >>>You want everything moved back to the "queued" state or failed >>>(flush pending IO so upper layers can retry if they want). >> >>Upper layer is the linux block device; my understanding is that it does >>not retry, nor do the filesystems above that. Passing errors upwards >>seems to be pretty darned fatal. My goal is to limit retries to the >>driver. > > > That's a bad idea. Been there done that. > > Upper layers can be alot smarter about retries than the driver ever > could be. While the driver knows more about the transport and why > someting might fail, upper layers will know alternate pathes > to the same devices or to the same data on different devices. > Upper layers also set the recovery policy for particular storage. > > Trying to do recovery transperently in the drivers is going to also > mess up other high level SW like Service Guard or LifeKeeper. > They want to know when a path has failed, log it, and make sure > someone gets sent to service the HW if threshholds are exceeded. > > Let higher layers like dm, VxFS, LVM worry about recovery. The sym2 driver should fail everything back with DID_ERROR. In most cases, the scsi midlayer will retry if the upper layer allows retries and you will get the behavior you desire. 
If retries are not allowed, as for a tape device, the command will be failed back to the upper layer driver. -- Brian King eServer Storage I/O IBM Linux Technology Center From akpm at osdl.org Sat Apr 2 04:25:14 2005 From: akpm at osdl.org (Andrew Morton) Date: Fri, 1 Apr 2005 10:25:14 -0800 Subject: [PATCH 2.6.12-rc1-mm5 1/3] perfctr: ppc64 arch hooks In-Reply-To: <16973.17085.561804.567539@alkaid.it.uu.se> References: <200503312207.j2VM7YUI011924@alkaid.it.uu.se> <20050331151129.279b0618.akpm@osdl.org> <20050331234940.GA21676@localhost.localdomain> <20050331173302.3ec64e59.akpm@osdl.org> <16973.17085.561804.567539@alkaid.it.uu.se> Message-ID: <20050401102514.505ad059.akpm@osdl.org> Mikael Pettersson wrote: > > In addition, there is one unresolved issue: > - A counter's value is represented by a 64-bit software sum, > a 32-bit start value containing the HW counter's value at the > start of the current time slice, and the current HW counter's value > (now). The actual value is computed as sum + (now - start). > This is reflected in the mmap()ed state, which contains a variable-length > { u32 map; u32 start; u64 sum; } pmc[] array. > This layout is very cache-efficient on current 32 and 64-bit CPUs, > but there is a _possible_ concern that it won't do on 10+ GHz CPUs. > So the question is, should we change it to use 64-bit start values > already now (and take more cache misses), or should that wait a few > years until it becomes a necessity (causing ABI change issues)? I'd be inclined to make the change now, personally. ABI changes are a pain for everyone. From benh at kernel.crashing.org Sun Apr 3 11:16:28 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 03 Apr 2005 11:16:28 +1000 Subject: [PATCH] ppc64: Fix boot memory corruption Message-ID: <1112490989.6577.255.camel@gaston> Hi ! Nathan's patch "make OF node fixup code usable at runtime" is introducing a sneaky bug.
We make two passes over this code: one to measure how much memory will be needed so we can allocate a single block, and one to do the actual fixup. However, the new code does some result-checking of prom_alloc(), which breaks this mechanism: the first pass always starts at "0", so we fail to measure the additional size properly and allocate a block smaller than what we'll actually use for the fixup. This causes us to overwrite whatever sits there, with variable results depending on the memory layout of the machine (typically a crash). This patch fixes it by starting the "measure" pass with an initial size of 16 instead of 0. Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/ppc64/kernel/prom.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/prom.c 2005-04-03 10:02:55.000000000 +1000 +++ linux-work/arch/ppc64/kernel/prom.c 2005-04-03 11:08:18.000000000 +1000 @@ -601,8 +601,19 @@ /* Initialize virtual IRQ map */ virt_irq_init(); - /* Finish device-tree (pre-parsing some properties etc...) */ + /* + * Finish device-tree (pre-parsing some properties etc...) + * We do this in 2 passes. One with "measure_only" set, which + * will only measure the amount of memory needed, then we can + * allocate that memory, and call finish_node again. However, + * we must be careful as most routines will fail nowadays when + * prom_alloc() returns 0, so we must make sure our first pass + * doesn't start at 0.
We pre-initialize size to 16 for that + * reason and then remove those additional 16 bytes + */ + size = 16; finish_node(allnodes, &size, NULL, 0, 0, 1); + size -= 16; end = start = (unsigned long)abs_to_virt(lmb_alloc(size, 128)); finish_node(allnodes, &end, NULL, 0, 0, 0); BUG_ON(end != start + size); From david at gibson.dropbear.id.au Mon Apr 4 13:25:19 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Mon, 4 Apr 2005 13:25:19 +1000 Subject: [PATCH 2.6.12-rc1-mm5 1/3] perfctr: ppc64 arch hooks In-Reply-To: <16973.17085.561804.567539@alkaid.it.uu.se> References: <200503312207.j2VM7YUI011924@alkaid.it.uu.se> <20050331151129.279b0618.akpm@osdl.org> <20050331234940.GA21676@localhost.localdomain> <20050331173302.3ec64e59.akpm@osdl.org> <16973.17085.561804.567539@alkaid.it.uu.se> Message-ID: <20050404032519.GB29805@localhost.localdomain> On Fri, Apr 01, 2005 at 02:46:53PM +0200, Mikael Pettersson wrote: > Andrew Morton writes: > > David Gibson wrote: > > > > > > On Thu, Mar 31, 2005 at 03:11:29PM -0800, Andrew Morton wrote: > > > > Mikael Pettersson wrote: > > > > > > > > > > Here's a 3-part patch kit which adds a ppc64 driver to perfctr, > > > > > written by David Gibson . > > > > > > > > Well that seems like progress. Where do we feel that we stand wrt > > > > preparedness for merging all this up? > > > > > > I'm still uneasy about it. There were sufficient changes made getting > > > this one ready to go that I'm not confident there aren't more > > > important things to be found. > > > > That's a bit open-ended. How do we determine whether more things will be > > needed? How do we know when we're done? > > I have two planned changes that will be done RSN: > - On x86/x86-64, user-space uses the mmap()ed state's TSC start > value as a way to detect if a user-space sampling operation > (which needs to be "virtually atomic") was preempted by the kernel. 
> On ppc{32,64} we've used the TB for the same thing up to now, > but that doesn't quite work because the TB is about an order of magnitude > or two too slow. So the plan is to change ppc to store a > software generation counter in the mmap()ed state, and change > the ppc user-space to check that one instead. If we're going to do it for ppc, we might as well do it for all platforms. That gets us one step closer to eliminating cstatus from the user-visible stuff, too, which I think should be done. > - Move common stuff to . > > In addition, there is one unresolved issue: > - A counter's value is represented by a 64-bit software sum, > a 32-bit start value containing the HW counter's value at the > start of the current time slice, and the current HW counter's value > (now). The actual value is computed as sum + (now - start). > This is reflected in the mmap()ed state, which contains a variable-length > { u32 map; u32 start; u64 sum; } pmc[] array. > This layout is very cache-efficient on current 32 and 64-bit CPUs, > but there is a _possible_ concern that it won't do on 10+ GHz CPUs. > So the question is, should we change it to use 64-bit start values > already now (and take more cache misses), or should that wait a few > years until it becomes a necessity (causing ABI change issues)? Is there any way we could rearrange the user-visible stuff to not include the 'map' field? After all, userspace set up the counters, so it ought to know what the mapping is already... That would mean we could fit in a 64-bit start value without having to mess around to get good alignment. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_!
http://www.ozlabs.org/people/dgibson From benh at kernel.crashing.org Mon Apr 4 17:28:01 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 04 Apr 2005 17:28:01 +1000 Subject: [PATCH] ppc64: Fix semantics of __ioremap Message-ID: <1112599682.26085.35.camel@gaston> Hi ! This patch fixes ppc64 __ioremap() so that it stops implicitly adding _PAGE_GUARDED when the mapping is non-cacheable or write-through, and instead lets the callers provide the flags they want. This allows things like framebuffers to explicitly request a non-cacheable, non-guarded mapping, which is more efficient for that kind of side-effect-free memory. The patch also fixes all current callers to add _PAGE_GUARDED, except btext, which is fine without it. Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/ppc64/kernel/pSeries_setup.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/pSeries_setup.c 2005-03-10 13:43:01.000000000 +1100 +++ linux-work/arch/ppc64/kernel/pSeries_setup.c 2005-04-04 17:18:34.000000000 +1000 @@ -363,7 +363,7 @@ find_udbg_vterm(); else if (physport) { /* Map the uart for udbg. */ - comport = (void *)__ioremap(physport, 16, _PAGE_NO_CACHE); + comport = (void *)ioremap(physport, 16); udbg_init_uart(comport, default_speed); ppc_md.udbg_putc = udbg_putc; Index: linux-work/arch/ppc64/kernel/maple_setup.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/maple_setup.c 2005-01-31 14:18:14.000000000 +1100 +++ linux-work/arch/ppc64/kernel/maple_setup.c 2005-04-04 17:18:49.000000000 +1000 @@ -142,7 +142,7 @@ if (physport) { void *comport; /* Map the uart for udbg.
*/ - comport = (void *)__ioremap(physport, 16, _PAGE_NO_CACHE); + comport = (void *)ioremap(physport, 16); udbg_init_uart(comport, default_speed); ppc_md.udbg_putc = udbg_putc; Index: linux-work/arch/ppc64/kernel/pci.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/pci.c 2005-04-03 10:02:55.000000000 +1000 +++ linux-work/arch/ppc64/kernel/pci.c 2005-04-04 17:18:05.000000000 +1000 @@ -547,8 +547,9 @@ if (range == NULL || (rlen < sizeof(struct isa_range))) { printk(KERN_ERR "no ISA ranges or unexpected isa range size," "mapping 64k\n"); - __ioremap_explicit(phb_io_base_phys, (unsigned long)phb_io_base_virt, - 0x10000, _PAGE_NO_CACHE); + __ioremap_explicit(phb_io_base_phys, + (unsigned long)phb_io_base_virt, + 0x10000, _PAGE_NO_CACHE | _PAGE_GUARDED); return; } @@ -576,7 +577,7 @@ __ioremap_explicit(phb_io_base_phys, (unsigned long) phb_io_base_virt, - size, _PAGE_NO_CACHE); + size, _PAGE_NO_CACHE | _PAGE_GUARDED); } } @@ -692,7 +693,7 @@ struct resource *res; hose->io_base_virt = __ioremap(hose->io_base_phys, size, - _PAGE_NO_CACHE); + _PAGE_NO_CACHE | _PAGE_GUARDED); DBG("phb%d io_base_phys 0x%lx io_base_virt 0x%lx\n", hose->global_number, hose->io_base_phys, (unsigned long) hose->io_base_virt); @@ -780,7 +781,8 @@ if (get_bus_io_range(bus, &start_phys, &start_virt, &size)) return 1; printk("mapping IO %lx -> %lx, size: %lx\n", start_phys, start_virt, size); - if (__ioremap_explicit(start_phys, start_virt, size, _PAGE_NO_CACHE)) + if (__ioremap_explicit(start_phys, start_virt, size, + _PAGE_NO_CACHE | _PAGE_GUARDED)) return 1; return 0; Index: linux-work/arch/ppc64/mm/init.c =================================================================== --- linux-work.orig/arch/ppc64/mm/init.c 2005-04-03 10:02:55.000000000 +1000 +++ linux-work/arch/ppc64/mm/init.c 2005-04-04 17:17:01.000000000 +1000 @@ -155,7 +155,8 @@ ptep = pte_alloc_kernel(&ioremap_mm, pmdp, ea); pa = abs_to_phys(pa); - set_pte_at(&ioremap_mm, ea, 
ptep, pfn_pte(pa >> PAGE_SHIFT, __pgprot(flags))); + set_pte_at(&ioremap_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, + __pgprot(flags))); spin_unlock(&ioremap_mm.page_table_lock); } else { unsigned long va, vpn, hash, hpteg; @@ -191,12 +192,9 @@ if ((flags & _PAGE_PRESENT) == 0) flags |= pgprot_val(PAGE_KERNEL); - if (flags & (_PAGE_NO_CACHE | _PAGE_WRITETHRU)) - flags |= _PAGE_GUARDED; - for (i = 0; i < size; i += PAGE_SIZE) { + for (i = 0; i < size; i += PAGE_SIZE) map_io_page(ea+i, pa+i, flags); - } return (void __iomem *) (ea + (addr & ~PAGE_MASK)); } @@ -205,7 +203,7 @@ void __iomem * ioremap(unsigned long addr, unsigned long size) { - return __ioremap(addr, size, _PAGE_NO_CACHE); + return __ioremap(addr, size, _PAGE_NO_CACHE | _PAGE_GUARDED); } void __iomem * @@ -272,7 +270,8 @@ return 1; } if (ea != (unsigned long) area->addr) { - printk(KERN_ERR "unexpected addr return from im_get_area\n"); + printk(KERN_ERR "unexpected addr return from " + "im_get_area\n"); return 1; } } @@ -315,7 +314,8 @@ continue; if (pte_present(page)) continue; - printk(KERN_CRIT "Whee.. Swapped out page in kernel page table\n"); + printk(KERN_CRIT "Whee.. Swapped out page in kernel page" + " table\n"); } while (address < end); } @@ -352,7 +352,7 @@ * Access to IO memory should be serialized by driver. * This code is modeled after vmalloc code - unmap_vm_area() * - * XXX what about calls before mem_init_done (ie python_countermeasures()) + * XXX what about calls before mem_init_done (ie python_countermeasures()) */ void iounmap(volatile void __iomem *token) { From benh at kernel.crashing.org Tue Apr 5 16:40:57 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 05 Apr 2005 16:40:57 +1000 Subject: [PATCH] ppc32: Fix AGP and sleep again Message-ID: <1112683258.9567.13.camel@gaston> Hi ! My previous patch that added sleep support for uninorth-agp and some AGP "off" stuff in radeonfb and aty128fb is breaking some configs. 
More specifically, it has problems with rage128 setups, since the DRI code for these in X doesn't properly re-enable AGP on wakeup or console switch (unlike the radeon DRM). This patch fixes the problem for pmac once and for all by using a different approach. The AGP driver "registers" special suspend/resume callbacks with some arch code, which the fbdevs can later call to suspend and resume AGP, making sure it is resumed in the same state it was in when suspended. This is platform-specific for now. It would be too complicated to try to do a generic implementation of this at this point, due to all sorts of weird things going on with AGP on other architectures. We'll rework that whole problem cleanly once we finally merge fbdev and DRI. In the meantime, please apply this patch, which brings some r128-based laptops back into working condition as far as system sleep is concerned. Signed-off-by: Benjamin Herrenschmidt Index: linux-work/drivers/char/agp/uninorth-agp.c =================================================================== --- linux-work.orig/drivers/char/agp/uninorth-agp.c 2005-03-15 11:57:17.000000000 +1100 +++ linux-work/drivers/char/agp/uninorth-agp.c 2005-04-05 15:20:29.000000000 +1000 @@ -10,6 +10,7 @@ #include #include #include +#include #include "agp.h" /* @@ -26,6 +27,7 @@ static int uninorth_rev; static int is_u3; + static int uninorth_fetch_size(void) { int i; @@ -264,7 +266,8 @@ &scratch); } while ((scratch & PCI_AGP_COMMAND_AGP) == 0 && ++timeout < 1000); if ((scratch & PCI_AGP_COMMAND_AGP) == 0) - printk(KERN_ERR PFX "failed to write UniNorth AGP command reg\n"); + printk(KERN_ERR PFX "failed to write UniNorth AGP" + " command register\n"); if (uninorth_rev >= 0x30) { /* This is an AGP V3 */ @@ -278,13 +281,24 @@ } #ifdef CONFIG_PM -static int agp_uninorth_suspend(struct pci_dev *pdev, pm_message_t state) +/* + * These Power Management routines are _not_ called by the normal PCI PM layer, + * but directly by the video driver through function
pointers in the device + * tree. + */ +static int agp_uninorth_suspend(struct pci_dev *pdev) { + struct agp_bridge_data *bridge; u32 cmd; u8 agp; struct pci_dev *device = NULL; - if (state != PMSG_SUSPEND) + bridge = agp_find_bridge(pdev); + if (bridge == NULL) + return -ENODEV; + + /* Only one suspend supported */ + if (bridge->dev_private_data) return 0; /* turn off AGP on the video chip, if it was enabled */ @@ -309,12 +323,13 @@ printk("uninorth-agp: disabling AGP on device %s\n", pci_name(device)); cmd &= ~PCI_AGP_COMMAND_AGP; - pci_write_config_dword(device, agp + PCI_AGP_COMMAND, cmd); + pci_write_config_dword(device, agp + PCI_AGP_COMMAND, cmd); } /* turn off AGP on the bridge */ agp = pci_find_capability(pdev, PCI_CAP_ID_AGP); pci_read_config_dword(pdev, agp + PCI_AGP_COMMAND, &cmd); + bridge->dev_private_data = (void *)cmd; if (cmd & PCI_AGP_COMMAND_AGP) { printk("uninorth-agp: disabling AGP on bridge %s\n", pci_name(pdev)); @@ -329,9 +344,23 @@ static int agp_uninorth_resume(struct pci_dev *pdev) { + struct agp_bridge_data *bridge; + u32 command; + + bridge = agp_find_bridge(pdev); + if (bridge == NULL) + return -ENODEV; + + command = (u32)bridge->dev_private_data; + bridge->dev_private_data = NULL; + if (!(command & PCI_AGP_COMMAND_AGP)) + return 0; + + uninorth_agp_enable(bridge, command); + return 0; } -#endif +#endif /* CONFIG_PM */ static int uninorth_create_gatt_table(struct agp_bridge_data *bridge) { @@ -575,6 +604,12 @@ of_node_put(uninorth_node); } +#ifdef CONFIG_PM + /* Inform platform of our suspend/resume caps */ + pmac_register_agp_pm(pdev, agp_uninorth_suspend, agp_uninorth_resume); +#endif + + /* Allocate & setup our driver */ bridge = agp_alloc_bridge(); if (!bridge) return -ENOMEM; @@ -599,6 +634,11 @@ { struct agp_bridge_data *bridge = pci_get_drvdata(pdev); +#ifdef CONFIG_PM + /* Inform platform of our suspend/resume caps */ + pmac_register_agp_pm(pdev, NULL, NULL); +#endif + agp_remove_bridge(bridge); agp_put_bridge(bridge); } @@ 
-622,10 +662,6 @@ .id_table = agp_uninorth_pci_table, .probe = agp_uninorth_probe, .remove = agp_uninorth_remove, -#ifdef CONFIG_PM - .suspend = agp_uninorth_suspend, - .resume = agp_uninorth_resume, -#endif }; static int __init agp_uninorth_init(void) Index: linux-work/arch/ppc64/kernel/pmac_feature.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/pmac_feature.c 2005-03-29 15:44:35.000000000 +1000 +++ linux-work/arch/ppc64/kernel/pmac_feature.c 2005-04-05 14:39:52.000000000 +1000 @@ -674,3 +674,67 @@ dump_HT_speeds("PCI-X HT Downlink", cfg, freq); #endif } + +/* + * Early video resume hook + */ + +static void (*pmac_early_vresume_proc)(void *data) __pmacdata; +static void *pmac_early_vresume_data __pmacdata; + +void pmac_set_early_video_resume(void (*proc)(void *data), void *data) +{ + if (_machine != _MACH_Pmac) + return; + preempt_disable(); + pmac_early_vresume_proc = proc; + pmac_early_vresume_data = data; + preempt_enable(); +} +EXPORT_SYMBOL(pmac_set_early_video_resume); + + +/* + * AGP related suspend/resume code + */ + +static struct pci_dev *pmac_agp_bridge __pmacdata; +static int (*pmac_agp_suspend)(struct pci_dev *bridge) __pmacdata; +static int (*pmac_agp_resume)(struct pci_dev *bridge) __pmacdata; + +void __pmac pmac_register_agp_pm(struct pci_dev *bridge, + int (*suspend)(struct pci_dev *bridge), + int (*resume)(struct pci_dev *bridge)) +{ + if (suspend || resume) { + pmac_agp_bridge = bridge; + pmac_agp_suspend = suspend; + pmac_agp_resume = resume; + return; + } + if (bridge != pmac_agp_bridge) + return; + pmac_agp_suspend = pmac_agp_resume = NULL; + return; +} +EXPORT_SYMBOL(pmac_register_agp_pm); + +void __pmac pmac_suspend_agp_for_card(struct pci_dev *dev) +{ + if (pmac_agp_bridge == NULL || pmac_agp_suspend == NULL) + return; + if (pmac_agp_bridge->bus != dev->bus) + return; + pmac_agp_suspend(pmac_agp_bridge); +} +EXPORT_SYMBOL(pmac_suspend_agp_for_card); + +void __pmac 
pmac_resume_agp_for_card(struct pci_dev *dev) +{ + if (pmac_agp_bridge == NULL || pmac_agp_resume == NULL) + return; + if (pmac_agp_bridge->bus != dev->bus) + return; + pmac_agp_resume(pmac_agp_bridge); +} +EXPORT_SYMBOL(pmac_resume_agp_for_card); Index: linux-work/arch/ppc/platforms/pmac_feature.c =================================================================== --- linux-work.orig/arch/ppc/platforms/pmac_feature.c 2005-04-05 14:29:30.000000000 +1000 +++ linux-work/arch/ppc/platforms/pmac_feature.c 2005-04-05 15:20:06.000000000 +1000 @@ -2944,3 +2944,48 @@ if (pmac_early_vresume_proc) pmac_early_vresume_proc(pmac_early_vresume_data); } + +/* + * AGP related suspend/resume code + */ + +static struct pci_dev *pmac_agp_bridge __pmacdata; +static int (*pmac_agp_suspend)(struct pci_dev *bridge) __pmacdata; +static int (*pmac_agp_resume)(struct pci_dev *bridge) __pmacdata; + +void __pmac pmac_register_agp_pm(struct pci_dev *bridge, + int (*suspend)(struct pci_dev *bridge), + int (*resume)(struct pci_dev *bridge)) +{ + if (suspend || resume) { + pmac_agp_bridge = bridge; + pmac_agp_suspend = suspend; + pmac_agp_resume = resume; + return; + } + if (bridge != pmac_agp_bridge) + return; + pmac_agp_suspend = pmac_agp_resume = NULL; + return; +} +EXPORT_SYMBOL(pmac_register_agp_pm); + +void __pmac pmac_suspend_agp_for_card(struct pci_dev *dev) +{ + if (pmac_agp_bridge == NULL || pmac_agp_suspend == NULL) + return; + if (pmac_agp_bridge->bus != dev->bus) + return; + pmac_agp_suspend(pmac_agp_bridge); +} +EXPORT_SYMBOL(pmac_suspend_agp_for_card); + +void __pmac pmac_resume_agp_for_card(struct pci_dev *dev) +{ + if (pmac_agp_bridge == NULL || pmac_agp_resume == NULL) + return; + if (pmac_agp_bridge->bus != dev->bus) + return; + pmac_agp_resume(pmac_agp_bridge); +} +EXPORT_SYMBOL(pmac_resume_agp_for_card); Index: linux-work/drivers/video/aty/radeon_pm.c =================================================================== --- linux-work.orig/drivers/video/aty/radeon_pm.c 
2005-04-01 09:04:19.000000000 +1000 +++ linux-work/drivers/video/aty/radeon_pm.c 2005-04-05 15:21:54.000000000 +1000 @@ -2520,13 +2520,10 @@ } -static/*extern*/ int susdisking = 0; - int radeonfb_pci_suspend(struct pci_dev *pdev, pm_message_t state) { struct fb_info *info = pci_get_drvdata(pdev); struct radeonfb_info *rinfo = info->par; - u8 agp; int i; if (state == pdev->dev.power.power_state) @@ -2542,11 +2539,6 @@ */ if (state != PM_SUSPEND_MEM) goto done; - if (susdisking) { - printk("radeonfb (%s): suspending to disk but state = %d\n", - pci_name(pdev), state); - goto done; - } acquire_console_sem(); @@ -2567,27 +2559,13 @@ rinfo->lock_blank = 1; del_timer_sync(&rinfo->lvds_timer); - /* Disable AGP. The AGP host should have done it, but since ordering - * isn't always properly guaranteed in this specific case, let's make - * sure it's disabled on card side now. Ultimately, when merging fbdev - * and dri into some common infrastructure, this will be handled - * more nicely. The host bridge side will (or will not) be dealt with - * by the bridge AGP driver, we don't attempt to touch it here. +#ifdef CONFIG_PPC_PMAC + /* On powermac, we have hooks to properly suspend/resume AGP now, + * use them here. 
We'll ultimately need some generic support here, + * but the generic code isn't quite ready for that yet */ - agp = pci_find_capability(pdev, PCI_CAP_ID_AGP); - if (agp) { - u32 cmd; - - pci_read_config_dword(pdev, agp + PCI_AGP_COMMAND, &cmd); - if (cmd & PCI_AGP_COMMAND_AGP) { - printk(KERN_INFO "radeonfb (%s): AGP was enabled, " - "disabling ...\n", - pci_name(pdev)); - cmd &= ~PCI_AGP_COMMAND_AGP; - pci_write_config_dword(pdev, agp + PCI_AGP_COMMAND, - cmd); - } - } + pmac_suspend_agp_for_card(pdev); +#endif /* CONFIG_PPC_PMAC */ /* If we support wakeup from poweroff, we save all regs we can including cfg * space @@ -2699,6 +2677,15 @@ rinfo->lock_blank = 0; radeon_screen_blank(rinfo, FB_BLANK_UNBLANK, 1); +#ifdef CONFIG_PPC_PMAC + /* On powermac, we have hooks to properly suspend/resume AGP now, + * use them here. We'll ultimately need some generic support here, + * but the generic code isn't quite ready for that yet + */ + pmac_resume_agp_for_card(pdev); +#endif /* CONFIG_PPC_PMAC */ + + /* Check status of dynclk */ if (rinfo->dynclk == 1) radeon_pm_enable_dynamic_mode(rinfo); Index: linux-work/include/asm-ppc/pmac_feature.h =================================================================== --- linux-work.orig/include/asm-ppc/pmac_feature.h 2005-03-15 11:59:39.000000000 +1100 +++ linux-work/include/asm-ppc/pmac_feature.h 2005-04-05 14:29:31.000000000 +1000 @@ -305,6 +305,17 @@ #define PMAC_FTR_DEF(x) ((_MACH_Pmac << 16) | (x)) +/* The AGP driver registers itself here */ +extern void pmac_register_agp_pm(struct pci_dev *bridge, + int (*suspend)(struct pci_dev *bridge), + int (*resume)(struct pci_dev *bridge)); + +/* Those are meant to be used by video drivers to deal with AGP + * suspend resume properly + */ +extern void pmac_suspend_agp_for_card(struct pci_dev *dev); +extern void pmac_resume_agp_for_card(struct pci_dev *dev); + /* * The part below is for use by macio_asic.c only, do not rely Index: linux-work/drivers/video/aty/aty128fb.c 
=================================================================== --- linux-work.orig/drivers/video/aty/aty128fb.c 2005-04-01 09:04:18.000000000 +1000 +++ linux-work/drivers/video/aty/aty128fb.c 2005-04-05 15:22:17.000000000 +1000 @@ -2331,7 +2331,6 @@ { struct fb_info *info = pci_get_drvdata(pdev); struct aty128fb_par *par = info->par; - u8 agp; /* We don't do anything but D2, for now we return 0, but * we may want to change that. How do we know if the BIOS @@ -2369,26 +2368,13 @@ par->asleep = 1; par->lock_blank = 1; - /* Disable AGP. The AGP host should have done it, but since ordering - * isn't always properly guaranteed in this specific case, let's make - * sure it's disabled on card side now. Ultimately, when merging fbdev - * and dri into some common infrastructure, this will be handled - * more nicely. The host bridge side will (or will not) be dealt with - * by the bridge AGP driver, we don't attempt to touch it here. +#ifdef CONFIG_PPC_PMAC + /* On powermac, we have hooks to properly suspend/resume AGP now, + * use them here. We'll ultimately need some generic support here, + * but the generic code isn't quite ready for that yet */ - agp = pci_find_capability(pdev, PCI_CAP_ID_AGP); - if (agp) { - u32 cmd; - - pci_read_config_dword(pdev, agp + PCI_AGP_COMMAND, &cmd); - if (cmd & PCI_AGP_COMMAND_AGP) { - printk(KERN_INFO "aty128fb: AGP was enabled, " - "disabling ...\n"); - cmd &= ~PCI_AGP_COMMAND_AGP; - pci_write_config_dword(pdev, agp + PCI_AGP_COMMAND, - cmd); - } - } + pmac_suspend_agp_for_card(pdev); +#endif /* CONFIG_PPC_PMAC */ /* We need a way to make sure the fbdev layer will _not_ touch the * framebuffer before we put the chip to suspend state. On 2.4, I @@ -2432,6 +2418,14 @@ par->lock_blank = 0; aty128fb_blank(0, info); +#ifdef CONFIG_PPC_PMAC + /* On powermac, we have hooks to properly suspend/resume AGP now, + * use them here. 
We'll ultimately need some generic support here, + * but the generic code isn't quite ready for that yet + */ + pmac_resume_agp_for_card(pdev); +#endif /* CONFIG_PPC_PMAC */ + pdev->dev.power.power_state = PMSG_ON; printk(KERN_DEBUG "aty128fb: resumed !\n"); From benh at kernel.crashing.org Tue Apr 5 17:15:11 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 05 Apr 2005 17:15:11 +1000 Subject: PCI Error Recovery API Proposal (updated) In-Reply-To: <20050314181420.GD498@austin.ibm.com> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <20050224011409.GE2088@austin.ibm.com> <421DDEF7.7080103@jp.fujitsu.com> <20050224231455.GH2088@austin.ibm.com> <421E9D16.3000606@jp.fujitsu.com> <20050312013251.GA2609@austin.ibm.com> <4235847F.3080705@jp.fujitsu.com> <20050314181420.GD498@austin.ibm.com> Message-ID: <1112685311.9518.35.camel@gaston>

Hi !

I've been away for a while, but here is my latest update of the proposal. If we all agree with it, it will go into kernel/Documentation somewhere and we'll start implementing the ppc64 side of it.

The error recovery API support is exposed by the driver in the form of a structure of function pointers, pointed to by a new field in struct pci_driver. The absence of this pointer in pci_driver denotes a "non-aware" driver; behaviour for these is platform dependent. Platforms like ppc64 can try to simulate a hotplug remove/add.

The definition of "pci_error_token" is not covered here. It is based on Seto's work on synchronous error detection. We still need to define functions for extracting information out of an opaque error token; that is separate from this API.
This structure has the form:

    struct pci_error_handlers {
        int (*error_detected)(struct pci_dev *dev, pci_error_token error);
        int (*error_recover)(struct pci_dev *dev);
        int (*error_restart)(struct pci_dev *dev);
        int (*link_reset)(struct pci_dev *dev);
        int (*slot_reset)(struct pci_dev *dev);
    };

A driver doesn't have to implement all of these callbacks; the only mandatory one is error_detected. If a callback is not implemented, the corresponding feature is considered unsupported. For example, if error_recover and error_restart aren't there (they really go together; see the descriptions below to understand why), the driver is assumed not to do any direct recovery and to require a reset. If link_reset is not implemented, the card is assumed not to care about link resets; in that case, if recovery is supported, the core can try error_recover (but not slot_reset, unless it really did reset the slot). If slot_reset is not supported, link_reset can be called instead on a slot reset.

The first call will always be:

1) error_detected()

Error detected. This is sent once after an error has been detected. At this point, the device might not be accessible anymore, depending on the platform (the slot will be isolated on ppc64). The driver may already have "noticed" the error because of a failing IO, but this is the proper "synchronisation point": it gives the driver a chance to clean up, wait for pending stuff (timers, etc.) to complete, take semaphores, schedule, and so on; everything but touch the device. Within this function and after it returns, the driver shouldn't do any new IOs. Called in task context. This is sort of a "quiesce" point. See the note about interrupts at the end of this document.

Result codes:
 - PCIERR_RESULT_CAN_RECOVER: Return this if you think you might be able to recover the HW by just banging IOs, or if you want to be given a chance to extract some diagnostic information (see below).
 - PCIERR_RESULT_NEED_RESET: Return this if you think you can't recover unless the slot is reset.
 - PCIERR_RESULT_DISCONNECT: Return this if you think you won't recover at all (this will detach the driver? or just leave it dangling? to be decided).

So at this point, we have called error_detected() for all drivers on the segment that had the error. On ppc64, the slot is isolated. What happens next typically depends on the results from the drivers. If all drivers on the segment/slot return PCIERR_RESULT_CAN_RECOVER, we re-enable IOs on the slot (or do nothing special if the platform doesn't isolate slots) and call 2). If not, and we can reset slots, we go to 4). If neither, we have a dead slot. If it's a hotplug slot, we might still "simulate" a reset by triggering a HW unplug/replug, though.

2) error_recover()

This is the "early recovery" call. IOs are allowed again, but DMA is not (hrm... to be discussed, I prefer not), with some restrictions. This is NOT a callback for the driver to start operations again, only to peek/poke at the device, extract diagnostic information if any, and possibly do things like trigger a device-local reset, but not to restart operations. This is sent if all drivers on a segment agree that they can try to recover and no automatic link reset was performed by the HW. If the platform can't just re-enable IOs without a slot reset or a link reset, it doesn't call this callback and goes directly to 3) or 4).

All IOs should be done _synchronously_ from within this callback. Errors triggered by them will be returned via the normal pci_check_whatever() API; no new error_detected() callback will be issued due to an error happening here. However, such an error might cause IOs to be re-blocked for the whole segment, and thus invalidate the recovery that other devices on the same segment might have done, forcing the whole segment into one of the next states, that is, link reset or slot reset.
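The platform's decision after error_detected() (error_recover() only if every driver says it can recover; otherwise a slot reset if the platform can do one; otherwise a dead slot) can be sketched in C roughly as follows. This is a minimal sketch, not code from the proposal: the numeric values of the PCIERR_RESULT_* codes, the recovery_step enum, choose_step() and the can_reset_slot flag are hypothetical, and it ignores the case where a platform must reset the link or slot before IOs can be re-enabled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Result codes named in the proposal; the numeric values are assumed. */
enum pcierr_result {
    PCIERR_RESULT_CAN_RECOVER,
    PCIERR_RESULT_NEED_RESET,
    PCIERR_RESULT_DISCONNECT,
    PCIERR_RESULT_RECOVERED,
};

/* Hypothetical names for what the platform does next. */
enum recovery_step {
    STEP_ERROR_RECOVER, /* re-enable IOs, then call 2) error_recover() */
    STEP_SLOT_RESET,    /* reset the slot, then call 4) slot_reset() */
    STEP_DEAD_SLOT,     /* no way forward: the slot is dead */
};

/* results[i] is what driver i on the segment returned from
 * error_detected(); can_reset_slot says whether the platform can
 * hard-reset this slot. */
enum recovery_step choose_step(const enum pcierr_result *results,
                               size_t ndrivers, bool can_reset_slot)
{
    assert(ndrivers == 0 || results != NULL);

    for (size_t i = 0; i < ndrivers; i++) {
        /* One driver that can't recover on its own decides for the
         * whole segment. */
        if (results[i] != PCIERR_RESULT_CAN_RECOVER)
            return can_reset_slot ? STEP_SLOT_RESET : STEP_DEAD_SLOT;
    }

    /* Every driver believes it can recover without a reset. */
    return STEP_ERROR_RECOVER;
}
```

Note that a single NEED_RESET or DISCONNECT moves every driver on the segment to the slot-reset path, even those that answered CAN_RECOVER.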
Result codes:
 - PCIERR_RESULT_RECOVERED: Return this if you think your device is fully functional and you are ready to start doing your normal driver job again. Returning this is no guarantee that you'll actually be allowed to proceed, as another driver on the same segment might have failed and thus triggered a slot reset on platforms that support it.
 - PCIERR_RESULT_NEED_RESET: Return this if you think your device is not recoverable in its current state and you need a slot reset to proceed.
 - PCIERR_RESULT_DISCONNECT: As for error_detected(). Total failure: no recovery even after a reset; the driver is dead. (To be defined more precisely.)

3) link_reset()

This is called after the link has been reset. At this point, this is typically PCI Express specific, and it is done when a non-fatal error has been detected that can be "solved" by resetting the link. The driver is informed here of that reset and should check whether the device appears to be in working condition. This function acts a bit like 2) error_recover(): it is not supposed to restart normal driver IO operations right away, just "probe" the device to check its recoverability status. If all is right, the core will call error_restart() once all drivers have ack'd link_reset().

Result codes: (identical to error_recover)

4) slot_reset()

This is called after the slot has been hard reset (and the PCI BARs re-configured by the platform). If the platform supports PCI hotplug, it can implement this by toggling power on the slot off and on. Drivers here have a chance to re-initialize the hardware (re-download firmware, etc.), but they shouldn't restart normal IO processing at this point (see the note about interrupts: they aren't guaranteed to be delivered until the restart callback has been called). Upon success from this callback, the platform will call error_restart() to complete the error handling and let the driver restart normal IO request processing.
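Step 4), combined with the optional-callback rule stated earlier (a driver without slot_reset can be sent link_reset instead) and the condition for step 5), could look something like the sketch below. It is a hedged illustration under stated assumptions: pci_error_token is typedef'd to void * and struct pci_dev is a one-field stand-in because the proposal defines neither here; notify_slot_reset(), finish_slot_reset(), the demo_* callbacks, the enum values and the choice to treat a driver with neither reset callback as PCIERR_RESULT_DISCONNECT are all illustrative, not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles outside the kernel. The proposal
 * leaves pci_error_token undefined; struct pci_dev is reduced to one
 * hypothetical field used by the demo callbacks below. */
typedef void *pci_error_token;
struct pci_dev { int restarted; };

/* Result codes from the proposal; the numeric values are assumed. */
enum pcierr_result {
    PCIERR_RESULT_CAN_RECOVER,
    PCIERR_RESULT_NEED_RESET,
    PCIERR_RESULT_DISCONNECT,
    PCIERR_RESULT_RECOVERED,
};

/* The handler table as laid out in the proposal. */
struct pci_error_handlers {
    int (*error_detected)(struct pci_dev *dev, pci_error_token error);
    int (*error_recover)(struct pci_dev *dev);
    int (*error_restart)(struct pci_dev *dev);
    int (*link_reset)(struct pci_dev *dev);
    int (*slot_reset)(struct pci_dev *dev);
};

/* Tell one driver its slot was hard reset: prefer slot_reset(), fall
 * back to link_reset(). A driver with neither is assumed (not stated
 * in the proposal) to be unable to recover. */
static int notify_slot_reset(const struct pci_error_handlers *h,
                             struct pci_dev *dev)
{
    if (h->slot_reset)
        return h->slot_reset(dev);
    if (h->link_reset)
        return h->link_reset(dev);
    return PCIERR_RESULT_DISCONNECT;
}

/* Notify every driver on the segment, then send error_restart() only if
 * all of them returned PCIERR_RESULT_RECOVERED. Returns 1 if the segment
 * was restarted, 0 if at least one driver is considered dead. */
int finish_slot_reset(const struct pci_error_handlers *const *handlers,
                      struct pci_dev *const *devs, size_t n)
{
    int all_recovered = 1;

    assert(n == 0 || (handlers != NULL && devs != NULL));

    /* No early exit: every driver gets to see the reset. */
    for (size_t i = 0; i < n; i++)
        if (notify_slot_reset(handlers[i], devs[i]) != PCIERR_RESULT_RECOVERED)
            all_recovered = 0;

    if (!all_recovered)
        return 0;

    /* error_restart() is optional and its return value is ignored. */
    for (size_t i = 0; i < n; i++)
        if (handlers[i]->error_restart)
            handlers[i]->error_restart(devs[i]);
    return 1;
}

/* Hypothetical demo driver callbacks. */
int demo_reset_ok(struct pci_dev *dev)
{
    (void)dev;
    return PCIERR_RESULT_RECOVERED;
}

int demo_restart(struct pci_dev *dev)
{
    dev->restarted = 1;
    return 0;
}
```

A driver that only implements link_reset() still takes part in the slot-reset path here, which is the fallback the proposal describes.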
However, a driver can still return a critical failure from here in case it just can't get its device back from reset. There is just nothing we can do about it tho. The driver will just be considered "dead" in this case. Result codes: - PCIERR_RESULT_DISCONNECT Same as above. 5) error_restart() This is called if all drivers on the segment have returned PCIERR_RESULT_RECOVERED from one of the 3 previous callbacks. That basically tells the driver to restart activity: everything is back & running. No result code is taken into account here. If a new error happens, it will start a new error handling process. That's it. I think this covers all the possibilities. The way those callbacks are called is platform policy. A platform with no slot reset capability, for example, may want to just "ignore" drivers that can't recover (disconnect them) and try to let other cards on the same segment recover. Keep in mind that in most real life cases, though, there will be only one driver per segment. Now, there is a note about interrupts. If you get an interrupt and your device is dead or has been isolated, there is a problem :) After much thinking, I decided to leave that to the platform. That is, the recovery API only specifies that: - There is no guarantee that interrupt delivery can proceed from any device on the segment starting from the error detection and until the restart callback is sent, at which point interrupts are expected to be fully operational. - There is no guarantee that interrupt delivery is stopped, that is, a driver that gets an interrupt after detecting an error, or that detects an error within the interrupt handler such that it prevents proper ack'ing of the interrupt (and thus removal of the source), should just return IRQ_NONE. It's up to the platform to deal with that condition, typically by masking the irq source for the duration of the error handling.
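To make the callback sequence and the interrupt rule concrete, here is a minimal userspace sketch of one driver's side of the proposed API. It assumes a table of int-returning callbacks taking a struct pci_dev *, with one entry per step 1) to 5); pci_error_token, struct mydev and every my_* function are hypothetical, and struct pci_dev and irqreturn_t are local stand-ins (not the kernel definitions) so the code builds outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types (not the real kernel definitions). */
struct pci_dev { const char *name; void *drvdata; };
typedef int pci_error_token;           /* hypothetical: left open by the proposal */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

enum {
	PCIERR_RESULT_CAN_RECOVER,
	PCIERR_RESULT_NEED_RESET,
	PCIERR_RESULT_DISCONNECT,
	PCIERR_RESULT_RECOVERED,
};

struct pci_error_handlers {
	int (*error_detected)(struct pci_dev *dev, pci_error_token error);
	int (*error_recover)(struct pci_dev *dev);
	int (*error_restart)(struct pci_dev *dev);
	int (*link_reset)(struct pci_dev *dev);
	int (*slot_reset)(struct pci_dev *dev);
};

/* Hypothetical per-device driver state. */
struct mydev {
	bool offline;       /* true from error_detected() until error_restart() */
	bool lost_state;    /* device needs a full re-init, i.e. a slot reset */
	int irqs_handled;
};

static int my_error_detected(struct pci_dev *pdev, pci_error_token err)
{
	struct mydev *md = pdev->drvdata;

	(void)err;
	md->offline = true;             /* no new IOs from here on */
	return md->lost_state ? PCIERR_RESULT_NEED_RESET
			      : PCIERR_RESULT_CAN_RECOVER;
}

static int my_error_recover(struct pci_dev *pdev)
{
	struct mydev *md = pdev->drvdata;

	/* Synchronous peek/poke only (e.g. read diagnostics); no normal IO. */
	return md->lost_state ? PCIERR_RESULT_NEED_RESET
			      : PCIERR_RESULT_RECOVERED;
}

static int my_slot_reset(struct pci_dev *pdev)
{
	struct mydev *md = pdev->drvdata;

	md->lost_state = false;         /* stands in for re-downloading firmware etc. */
	return PCIERR_RESULT_RECOVERED; /* but normal IO stays stopped for now */
}

static int my_error_restart(struct pci_dev *pdev)
{
	struct mydev *md = pdev->drvdata;

	md->offline = false;            /* normal IO and interrupts resume */
	return 0;                       /* the proposal ignores this result */
}

/* While offline, don't touch the device: report IRQ_NONE and leave
 * masking the source to the platform. */
static irqreturn_t my_irq(struct pci_dev *pdev)
{
	struct mydev *md = pdev->drvdata;

	if (md->offline)
		return IRQ_NONE;
	md->irqs_handled++;
	return IRQ_HANDLED;
}

static const struct pci_error_handlers my_err_handlers = {
	.error_detected = my_error_detected,
	.error_recover  = my_error_recover,
	.error_restart  = my_error_restart,
	.link_reset     = NULL,         /* this hypothetical device never sees link resets */
	.slot_reset     = my_slot_reset,
};
```

The only state the interrupt handler consults is the offline flag set in error_detected() and cleared in error_restart(), which matches the rule that interrupts are only guaranteed to work again after the restart callback.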
It is expected that the platform "knows" which interrupts are routed to error-management capable slots and can deal with temporarily disabling that irq number during error processing (this isn't terribly complex). That means some IRQ latency for other devices sharing the interrupt, but there is simply no other way. High end platforms aren't supposed to share interrupts between many devices anyway :) Ben. From cfriesen at nortel.com Wed Apr 6 03:33:23 2005 From: cfriesen at nortel.com (Chris Friesen) Date: Tue, 05 Apr 2005 11:33:23 -0600 Subject: help, trying to invalidate entire icache on 970 Message-ID: <4252CBE3.3010701@nortel.com> I'm having issues with some code that is supposed to invalidate the entire icache on a 970. I have a little test app in userspace that overwrites an instruction and calls some kernel code to invalidate the whole cache. Unfortunately, sometimes the new instruction doesn't get run, and if I start a kernel build in the background, it occurs quite frequently. The kernel code is accessed via an ioctl() on a device node, and it looks like this:

    local_irq_save()
    sync
    repeated 513 times:
        b 128
        <31 nops>
    isync
    local_irq_restore()

Basically, I'm trying to do the brute-force method of flushing it, as described in the 970 manual. Obviously I'm missing something, but I'm not sure what. In case it matters, the machine is dual-cpu. Anyone have any ideas? Anyone have any such code that works? Chris From moilanen at austin.ibm.com Wed Apr 6 05:33:34 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Tue, 5 Apr 2005 14:33:34 -0500 Subject: help, trying to invalidate entire icache on 970 In-Reply-To: <4252CBE3.3010701@nortel.com> References: <4252CBE3.3010701@nortel.com> Message-ID: <20050405143334.33e466b0.moilanen@austin.ibm.com> On Tue, 05 Apr 2005 11:33:23 -0600 Chris Friesen wrote: > I'm having issues with some code that is supposed to invalidate the > entire icache on a 970.
> > I have a little test app in userspace that overwrites an instruction and > calls some kernel code to invalidate the whole cache. Unfortunately, > sometimes the new instruction doesn't get run, and if I start a kernel > build in the background, it occurs quite frequently. IIRC to modify an instruction you need the following sequence to do flush the icache correctly: dcbst sync icbi isync I can't remember if the 970 actually requires this sequence or not. Jake From cfriesen at nortel.com Wed Apr 6 06:07:23 2005 From: cfriesen at nortel.com (Chris Friesen) Date: Tue, 05 Apr 2005 14:07:23 -0600 Subject: help, trying to invalidate entire icache on 970 In-Reply-To: <20050405143334.33e466b0.moilanen@austin.ibm.com> References: <4252CBE3.3010701@nortel.com> <20050405143334.33e466b0.moilanen@austin.ibm.com> Message-ID: <4252EFFB.5010000@nortel.com> Jake Moilanen wrote: > IIRC to modify an instruction you need the following sequence to do > flush the icache correctly: > > dcbst > sync > icbi > isync This works if you know the address that was modified. My problem is that I have an application (emulator) that modifies its own instructions but doesn't track the addresses. Thus I need to flush the entire dcache (on the 970 this is just a "sync"), and invalidate the entire icache. Chris From cfriesen at nortel.com Wed Apr 6 07:15:42 2005 From: cfriesen at nortel.com (Chris Friesen) Date: Tue, 05 Apr 2005 15:15:42 -0600 Subject: help, trying to invalidate entire icache on 970 In-Reply-To: <4252CBE3.3010701@nortel.com> References: <4252CBE3.3010701@nortel.com> Message-ID: <4252FFFE.5090700@nortel.com> Friesen, Christopher [CAR:VC21:EXCH] wrote: > Basically, I'm trying to do the brute-force method of flushing it, as > described in the 970 manual. Obviously I'm missing something, but I'm > not sure what. In case it matters, the machine is dual-cpu. > > Anyone have any ideas? Anyone have any such code that works? I've switched over to using the en_icbi method of invalidation. 
It seems to work, but I'm calling icbi once for every cacheline and that seems suboptimal. The manual says that 4 bits of the address are used to index the icache, and thus each icbi call with en_icbi enabled will result in 16 cachelines being invalidated. Unfortunately I didn't see anywhere where it explained which bits they are and how they map to cachelines. Just calling icbi 32 times and incrementing the address by a cacheline each time didn't work. I should be able to get away with only calling it 32 times, assuming I pick exactly the right addresses, but I'm at a loss as to which addresses to use. Anyone able to help? Chris From linas at austin.ibm.com Wed Apr 6 07:43:03 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Tue, 5 Apr 2005 16:43:03 -0500 Subject: help, trying to invalidate entire icache on 970 In-Reply-To: <4252FFFE.5090700@nortel.com> References: <4252CBE3.3010701@nortel.com> <4252FFFE.5090700@nortel.com> Message-ID: <20050405214303.GO15596@austin.ibm.com> On Tue, Apr 05, 2005 at 03:15:42PM -0600, Chris Friesen was heard to remark: > Friesen, Christopher [CAR:VC21:EXCH] wrote: > > >Basically, I'm trying to do the brute-force method of flushing it, as > >described in the 970 manual. Obviously I'm missing something, but I'm > >not sure what. In case it matters, the machine is dual-cpu. > > > >Anyone have any ideas? Anyone have any such code that works? > > I've switched over to using the en_icbi method of invalidation. It > seems to work, but I'm calling icbi once for every cacheline and that > seems suboptimal. > > The manual says that 4 bits of the address are used to index the icache, > and thus each icbi call with en_icbi enabled will result in 16 I'm not quite clear on what you are doing ... but some general remarks: In general, the caches tend to be n-way set associative. So invalidating a given cache line will invalidate only one of the n ways. This might explain why your first attempt didn't work. I don't know what n is for the 970.
It is typically 2 or 4 for this class of cpu. It tends to vary from one model to another. Which "4 bits" are involved tends to vary from one core to another. Even if you found something that worked on the 970, it might not work on the next generation, since the address lines would be wired differently. Similar remarks apply for the assumption that there's only 16 or 32 cache blocks or lines or whatever ... I'm not sure I know what en_icbi is (have never scanned the 970 docs). Maybe its invalidating all cache lines that alias to the same address tag. Why, again, is it that you can't just call icbi with the address of the instruction that has been changed? --linas From cfriesen at nortel.com Wed Apr 6 08:15:46 2005 From: cfriesen at nortel.com (Chris Friesen) Date: Tue, 05 Apr 2005 16:15:46 -0600 Subject: help, trying to invalidate entire icache on 970 In-Reply-To: <20050405214303.GO15596@austin.ibm.com> References: <4252CBE3.3010701@nortel.com> <4252FFFE.5090700@nortel.com> <20050405214303.GO15596@austin.ibm.com> Message-ID: <42530E12.4090206@nortel.com> Linas Vepstas wrote: > In general, the caches tend to be n-way set associative. So > invalidating a given cache line will invalidate only one of > the n ways. This might explain why your first attempt didn't work. > I don't know what n is for the 970. It is typically 2 or 4 for this class > of cpu. It tends to vary from one model to another. The icache is direct mapped, but is indexed by four bits in the effective address such that a given physical address can be aliased to 16 positions in the cache. > Which "4 bits" are involved tends to vary from one core to another. > Even if you found something that worked on the 970, it might not work on > the next generation, since the address lines would be wired differently. > > Similar remarks apply for the assumption that there's only 16 or 32 cache > blocks or lines or whatever ... Right. This whole chunk of code is 970-specific.
We have other code for other cpus (the 74xx for instance can flash-invalidate the whole icache with one instruction). > I'm not sure I know what en_icbi is (have never scanned the 970 docs). > Maybe its invalidating all cache lines that alias to the same address > tag. Yep. I'm trying to figure those aliasing patterns out so I can minimize the number of icbi calls needed. > Why, again, is it that you can't just call icbi with the address of the > instruction that has been changed? I have a pre-existing app that modifies itself and doesn't track the addresses. All I get is the app telling me "I just modified something." Thus, I have to flush the entire dcache, and invalidate the entire icache in order to ensure that the new code gets run. It's horribly kludgy I know, but that's what I've got to deal with. Chris From paulus at samba.org Wed Apr 6 10:28:09 2005 From: paulus at samba.org (Paul Mackerras) Date: Wed, 6 Apr 2005 10:28:09 +1000 Subject: help, trying to invalidate entire icache on 970 In-Reply-To: <4252EFFB.5010000@nortel.com> References: <4252CBE3.3010701@nortel.com> <20050405143334.33e466b0.moilanen@austin.ibm.com> <4252EFFB.5010000@nortel.com> Message-ID: <16979.11545.264476.711612@cargo.ozlabs.ibm.com> Chris Friesen writes: > My problem is that I have an application (emulator) that modifies its > own instructions but doesn't track the addresses. Thus I need to flush > the entire dcache (on the 970 this is just a "sync"), and invalidate the > entire icache. Current BK now has support for making pages non-executable with mprotect. You could mprotect the pages RW to start with and have a SIGSEGV handler. When the emulator tries to execute from a page you will get a SIGSEGV, and you can flush that page (with 32 x dcbst; sync; 32 x icbi; isync) and then mprotect it RX and return from the signal handler. If the emulator writes to it you get another SIGSEGV and mprotect it back to RW. Paul. 
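Paul's mprotect() scheme can be sketched in userspace C as follows. This is a minimal sketch, assuming a single tracked code page, a 128-byte cache line (LINE) and a kernel that actually honours non-executable mappings; code_page, flush_page(), toggle_code_page() and setup_code_page() are hypothetical names. Calling mprotect() from a signal handler is not on the POSIX async-signal-safe list, although it is common practice on Linux. On non-PowerPC hosts the explicit dcbst/sync/icbi/isync loop is replaced by GCC's __builtin___clear_cache().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define LINE 128  /* assumed 970 cache-line size: 32 lines per 4K page */

static char *code_page;                     /* hypothetical emulator code page */
static size_t page_size;
static volatile sig_atomic_t page_is_exec;  /* 1 = RX, 0 = RW */
static volatile sig_atomic_t faults;        /* number of SIGSEGVs we handled */

/* Make the page's new instructions visible to instruction fetch:
 * dcbst each line; sync; icbi each line; isync. */
static void flush_page(char *p)
{
	assert(page_size % LINE == 0);
#if defined(__powerpc__) || defined(__powerpc64__)
	for (size_t off = 0; off < page_size; off += LINE)
		__asm__ volatile("dcbst 0,%0" : : "r"(p + off) : "memory");
	__asm__ volatile("sync" : : : "memory");
	for (size_t off = 0; off < page_size; off += LINE)
		__asm__ volatile("icbi 0,%0" : : "r"(p + off) : "memory");
	__asm__ volatile("isync" : : : "memory");
#else
	__builtin___clear_cache(p, p + page_size);  /* portable fallback */
#endif
}

/* RX -> RW when the emulator writes; RW -> RX (after a flush) when it executes. */
static void toggle_code_page(void)
{
	if (page_is_exec) {
		mprotect(code_page, page_size, PROT_READ | PROT_WRITE);
		page_is_exec = 0;
	} else {
		flush_page(code_page);
		mprotect(code_page, page_size, PROT_READ | PROT_EXEC);
		page_is_exec = 1;
	}
}

static void on_segv(int sig, siginfo_t *si, void *ctx)
{
	uintptr_t off = (uintptr_t)si->si_addr - (uintptr_t)code_page;

	(void)ctx;
	if (off >= page_size) {      /* not our page: fault again with default action */
		signal(sig, SIG_DFL);
		return;
	}
	faults++;
	toggle_code_page();          /* returning retries the faulting access */
}

static int setup_code_page(void)
{
	struct sigaction sa;

	page_size = (size_t)sysconf(_SC_PAGESIZE);
	code_page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (code_page == MAP_FAILED)
		return -1;
	page_is_exec = 0;

	memset(&sa, 0, sizeof sa);
	sa.sa_sigaction = on_segv;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGSEGV, &sa, NULL);
}
```

The flush cost is paid only when a page flips from writable to executable, and only for that page, rather than on every "I just modified something" notification.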
From segher at kernel.crashing.org Wed Apr 6 12:49:33 2005 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Wed, 6 Apr 2005 04:49:33 +0200 Subject: help, trying to invalidate entire icache on 970 In-Reply-To: <20050405143334.33e466b0.moilanen@austin.ibm.com> References: <4252CBE3.3010701@nortel.com> <20050405143334.33e466b0.moilanen@austin.ibm.com> Message-ID: > IIRC to modify an instruction you need the following sequence to do > flush the icache correctly: > > dcbst > sync > icbi > isync > > I can't remember if the 970 actually requires this sequence or not. It does not, as the DL1 cache is store-through; i.e., the dcbst insn is superfluous here (the sync is required in general, though!) Segher From benh at kernel.crashing.org Wed Apr 6 14:10:38 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 06 Apr 2005 14:10:38 +1000 Subject: [PATCH] ppc64: Improve mapping of vDSO Message-ID: <1112760638.9518.93.camel@gaston> Hi Andrew ! This patch reworks the way the ppc64 vDSO is mapped in user memory by the kernel to make it more robust against possible collisions with executable segments. Instead of just whacking a VMA at 1MB, I now use get_unmapped_area() with a hint, and I moved the mapping of the vDSO to after the mapping of the various ELF segments and of the interpreter, so that conflicts get caught properly (it still has to be before create_elf_tables since the latter will fill the AT_SYSINFO_EHDR with the proper address). While I was at it, I also changed the 32- and 64-bit vDSOs to link at their "natural" address of 1MB instead of 0. This is the address where they are normally mapped in the absence of conflict. By doing so, it should be possible to properly prelink once it's been verified to work on glibc.
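Since the vDSO now normally lands at its natural 1MB base but may be placed elsewhere on a collision, userspace should take its address from the AT_SYSINFO_EHDR auxiliary vector entry rather than assume it. A minimal sketch, assuming glibc's getauxval() (added in glibc 2.16, long after this thread); PPC64_VDSO_NATURAL_BASE is a hypothetical name mirroring VDSO64_MBASE from the patch.

```c
#include <assert.h>
#include <elf.h>         /* ELFMAG, SELFMAG, AT_SYSINFO_EHDR */
#include <string.h>
#include <sys/auxv.h>    /* getauxval(), glibc 2.16 or later */

/* Hypothetical name mirroring VDSO64_MBASE / VDSO32_MBASE in the patch. */
#define PPC64_VDSO_NATURAL_BASE 0x100000UL

/* True if p starts with the ELF magic "\177ELF". */
static int is_elf_header(const void *p)
{
	assert(p != NULL);
	return memcmp(p, ELFMAG, SELFMAG) == 0;
}

/* Where the kernel actually mapped the vDSO, or 0 if it reported none. */
static unsigned long vdso_base(void)
{
	return getauxval(AT_SYSINFO_EHDR);
}

/* On ppc64 with this patch: true unless a collision pushed the vDSO elsewhere. */
static int vdso_at_natural_base(void)
{
	return vdso_base() == PPC64_VDSO_NATURAL_BASE;
}
```

Reading the address from the auxiliary vector keeps callers correct whether get_unmapped_area() honoured the hint or not.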
Please apply for 2.6.12, Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/ppc64/kernel/vdso.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/vdso.c 2005-03-07 10:22:15.000000000 +1100 +++ linux-work/arch/ppc64/kernel/vdso.c 2005-04-06 13:32:41.000000000 +1000 @@ -213,13 +213,14 @@ vdso_base = VDSO64_MBASE; } + current->thread.vdso_base = 0; + /* vDSO has a problem and was disabled, just don't "enable" it for the * process */ - if (vdso_pages == 0) { - current->thread.vdso_base = 0; + if (vdso_pages == 0) return 0; - } + vma = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL); if (vma == NULL) return -ENOMEM; @@ -230,12 +231,16 @@ memset(vma, 0, sizeof(*vma)); /* - * pick a base address for the vDSO in process space. We have a default - * base of 1Mb on which we had a random offset up to 1Mb. - * XXX: Add possibility for a program header to specify that location + * pick a base address for the vDSO in process space. We try to put it + * at vdso_base which is the "natural" base for it, but we might fail + * and end up putting it elsewhere. 
*/ + vdso_base = get_unmapped_area(NULL, vdso_base, + vdso_pages << PAGE_SHIFT, 0, 0); + if (vdso_base & ~PAGE_MASK) + return (int)vdso_base; + current->thread.vdso_base = vdso_base; - /* + ((unsigned long)vma & 0x000ff000); */ vma->vm_mm = mm; vma->vm_start = current->thread.vdso_base; Index: linux-work/fs/binfmt_elf.c =================================================================== --- linux-work.orig/fs/binfmt_elf.c 2005-04-03 10:02:57.000000000 +1000 +++ linux-work/fs/binfmt_elf.c 2005-04-06 13:10:49.000000000 +1000 @@ -782,14 +782,6 @@ goto out_free_dentry; } -#ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES - retval = arch_setup_additional_pages(bprm, executable_stack); - if (retval < 0) { - send_sig(SIGKILL, current, 0); - goto out_free_dentry; - } -#endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */ - current->mm->start_stack = bprm->p; /* Now we do a little grungy work by mmaping the ELF image into @@ -949,6 +941,14 @@ set_binfmt(&elf_format); +#ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES + retval = arch_setup_additional_pages(bprm, executable_stack); + if (retval < 0) { + send_sig(SIGKILL, current, 0); + goto out_free_dentry; + } +#endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */ + compute_creds(bprm); current->flags &= ~PF_FORKNOEXEC; create_elf_tables(bprm, &loc->elf_ex, (interpreter_type == INTERPRETER_AOUT), Index: linux-work/include/asm-ppc64/vdso.h =================================================================== --- linux-work.orig/include/asm-ppc64/vdso.h 2005-03-15 11:57:38.000000000 +1100 +++ linux-work/include/asm-ppc64/vdso.h 2005-04-06 13:33:20.000000000 +1000 @@ -4,12 +4,12 @@ #ifdef __KERNEL__ /* Default link addresses for the vDSOs */ -#define VDSO32_LBASE 0 -#define VDSO64_LBASE 0 +#define VDSO32_LBASE 0x100000 +#define VDSO64_LBASE 0x100000 /* Default map addresses */ -#define VDSO32_MBASE 0x100000 -#define VDSO64_MBASE 0x100000 +#define VDSO32_MBASE VDSO32_LBASE +#define VDSO64_MBASE VDSO64_LBASE #define VDSO_VERSION_STRING LINUX_2.6.12 From benh at 
kernel.crashing.org Wed Apr 6 14:57:29 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 06 Apr 2005 14:57:29 +1000 Subject: help, trying to invalidate entire icache on 970 In-Reply-To: References: <4252CBE3.3010701@nortel.com> <20050405143334.33e466b0.moilanen@austin.ibm.com> Message-ID: <1112763449.9568.99.camel@gaston> On Wed, 2005-04-06 at 04:49 +0200, Segher Boessenkool wrote: > > IIRC to modify an instruction you need the following sequence to do > > flush the icache correctly: > > > > dcbst > > sync > > icbi > > isync > > > > I can't remember if the 970 actually requires this sequence or not. > > It does not, as the DL1 cache is store-through; i.e., the dcbst insn > is superfluous here (the sync is required in general, though!) Isn't dcbst a nop on 970 anyway ? Ben. From benh at kernel.crashing.org Wed Apr 6 15:00:13 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 06 Apr 2005 15:00:13 +1000 Subject: help, trying to invalidate entire icache on 970 In-Reply-To: <42530E12.4090206@nortel.com> References: <4252CBE3.3010701@nortel.com> <4252FFFE.5090700@nortel.com> <20050405214303.GO15596@austin.ibm.com> <42530E12.4090206@nortel.com> Message-ID: <1112763613.9567.102.camel@gaston> On Tue, 2005-04-05 at 16:15 -0600, Chris Friesen wrote: > I have a pre-existing app that modifies itself and doesn't track the > addresses. All I get is the app telling me "I just modified something." > Thus, I have to flush the entire dcache, and invalidate the entire > icache in order to ensure that the new code gets run. It's horribly > kludgy I know, but that's what I've got to deal with. Stupid idea: have you checked that maybe when it calls you back for that "I just modified something", the address is actually still in one of the registers or a known stack location where you can "peek" at it ? :) That would solve your problem once for all ... 
Unless that app gets rebuilt regularly, but in that case you should really get them to fix the callback API to take the address. Ben. From ntl at pobox.com Wed Apr 6 15:30:02 2005 From: ntl at pobox.com (Nathan Lynch) Date: Wed, 6 Apr 2005 00:30:02 -0500 Subject: [RFC/PATCH] numa: distinguish associativity domain from node id Message-ID: <20050406053002.GF3611@otto> Yes, yet another numa.c patch... this is strictly RFC for now. The ppc64 numa code makes some possibly invalid assumptions about the numbering of "associativity domains" (which may be considered NUMA nodes). As far as I've been able to determine from the architecture docs, there is no guarantee about the numbering of associativity domains, i.e. the values that are contained in ibm,associativity device node properties. Yet we seem to assume that the numbering of the domains begins at zero and that the range is contiguous, and we use the domain number for a given resource as its logical node id. This strikes me as a problem waiting to happen, and in fact I've been seeing some problems in the lab with larger machines violating or at least straining these assumptions. Consider one such case: the associativity domain for all memory in a partition is 0x1, but the processors are in shared mode (so no associativity info for them) -- all the memory is placed in node 1 while all cpus are mapped to node 0. But in this case, we should really have only one logical node, with all memory and cpus mapped to it. Another case I've seen is that of a partition with all processors and memory having an associativity domain of 0x1. We end up with everything in node 1 and an empty (yet online) node 0. I propose treating the associativity domain for a resource as a "cookie" without making any assumptions about its value. During numa init, each distinct domain is mapped to a logical node,
so that the following holds: the logical node numbering begins at zero and is contiguous, and resources added after boot which do not map to an already initialized domain are associated with logical node 0. The patch implements these, and attempts to separate the notion of associativity domain from that of logical node where appropriate. Lightly tested on Power5 LPAR with two numa nodes - it boots and the information under /sys/devices/system/node looks correct. I'm going to omit a signed-off line for now; there's probably some stupid bug introduced or something that someone will find objectionable (memory hotplug folks?)... Please review. Thanks, Nathan arch/ppc64/mm/numa.c | 207 +++++++++++++++++++++++++++---------------- 1 files changed, 131 insertions(+), 76 deletions(-) Index: linux-2.6.12-rc2/arch/ppc64/mm/numa.c =================================================================== --- linux-2.6.12-rc2.orig/arch/ppc64/mm/numa.c 2005-04-05 12:59:28.000000000 -0500 +++ linux-2.6.12-rc2/arch/ppc64/mm/numa.c 2005-04-06 00:17:08.000000000 -0500 @@ -58,6 +58,69 @@ EXPORT_SYMBOL(numa_memory_lookup_table); EXPORT_SYMBOL(numa_cpumask_lookup_table); EXPORT_SYMBOL(nr_cpus_in_node); +/* Maps nid to platform "associativity domain". */ +#define INVALID_DOMAIN (-1) +static int nid_domain[MAX_NUMNODES] = { [0 ... (MAX_NUMNODES - 1)] = + INVALID_DOMAIN }; +/* nid to platform domain id. O(1). */ +static int nid_to_domain(int nid) +{ + BUG_ON(0 > nid || nid >= MAX_NUMNODES); + return nid_domain[nid]; +} + +/* Platform domain to nid. If the given domain does not map to a node, + * return -1. O(n). + */ +static int domain_to_nid(int domain) +{ + int nid; + + for_each_node(nid) + if (domain == nid_to_domain(nid)) + return nid; + return -1; +} + +/* Associate domain with the given nid. 
*/ +static void __init assign_domain_to_nid(int domain, int nid) +{ + BUG_ON(0 > nid || nid >= MAX_NUMNODES); + BUG_ON(domain == INVALID_DOMAIN); + BUG_ON(nid_domain[nid] != INVALID_DOMAIN); + + nid_domain[nid] = domain; + dbg("OF associativity domain 0x%x mapped to node %i\n", domain, nid); +} + +/* Given a previously unencountered associativity domain, find the + * first unused slot in the nid_domain array where it can be plugged + * in. + */ +static int __init setup_domain(int domain) +{ + int nid; + + for_each_node(nid) { + int tmp = nid_to_domain(nid); + + if (tmp != INVALID_DOMAIN) + continue; + + /* Do not set up the same domain twice. */ + BUG_ON(tmp == domain); + + assign_domain_to_nid(domain, nid); + return nid; + } + + printk(KERN_WARNING "Can't associate domain 0x%x with a node, " + "MAX_NUMNODES=%i, num_online_nodes=%i\n", domain, + MAX_NUMNODES, num_online_nodes()); + + return 0; +} + static inline void map_cpu_to_node(int cpu, int node) { numa_cpu_lookup_table[cpu] = node; @@ -117,26 +180,58 @@ static struct device_node * __devinit fi /* must hold reference to node during call */ static int *of_get_associativity(struct device_node *dev) { - return (unsigned int *)get_property(dev, "ibm,associativity", NULL); + return (int *)get_property(dev, "ibm,associativity", NULL); } -static int of_node_numa_domain(struct device_node *device) +/* + * Given an OF device node, return the logical node id to which it + * belongs. If the node has no associativity information, the result + * is 0. During boot, this function will map domains to nodes as + * necessary. 
+ */ +static int of_node_to_nid(struct device_node *dn) { - int numa_domain; - unsigned int *tmp; + int *tmp, domain, nid; if (min_common_depth == -1) return 0; - tmp = of_get_associativity(device); - if (tmp && (tmp[0] >= min_common_depth)) { - numa_domain = tmp[min_common_depth]; - } else { - dbg("WARNING: no NUMA information for %s\n", - device->full_name); - numa_domain = 0; + tmp = of_get_associativity(dn); + + if (!tmp || (tmp[0] < min_common_depth)) { + dbg("no NUMA information for %s\n", dn->full_name); + return 0; } - return numa_domain; + + domain = tmp[min_common_depth]; + + /* + * POWER4 LPAR uses 0xffff as invalid node, + * just use node zero. + */ + if (domain == 0xffff) + nid = 0; + else + nid = domain_to_nid(domain); + + /* If we haven't seen this domain before, associate it with a + * node if we're still in boot. If we're up and running and + * the domain is previously unknown, we have no choice but to + * map the resource to an initialized node, so we map it to + * nid 0. + */ + if (nid < 0) { + if (system_state < SYSTEM_RUNNING) { + nid = setup_domain(domain); + } else { + nid = 0; + dbg("Resource %s has associativity domain" + " %x which was not known at boot, assigning" + " to node %i\n", dn->full_name, domain, nid); + } + } + node_set_online(nid); + return nid; } /* @@ -228,7 +323,7 @@ static unsigned long read_n_cells(int n, */ static int numa_setup_cpu(unsigned long lcpu) { - int numa_domain = 0; + int nid = 0; struct device_node *cpu = find_cpu_node(lcpu); if (!cpu) { @@ -236,27 +331,16 @@ static int numa_setup_cpu(unsigned long goto out; } - numa_domain = of_node_numa_domain(cpu); + nid = of_node_to_nid(cpu); - if (numa_domain >= num_online_nodes()) { - /* - * POWER4 LPAR uses 0xffff as invalid node, - * dont warn in this case. 
- */ - if (numa_domain != 0xffff) - printk(KERN_ERR "WARNING: cpu %ld " - "maps to invalid NUMA node %d\n", - lcpu, numa_domain); - numa_domain = 0; - } out: - node_set_online(numa_domain); + node_set_online(nid); - map_cpu_to_node(lcpu, numa_domain); + map_cpu_to_node(lcpu, nid); of_node_put(cpu); - return numa_domain; + return nid; } static int cpu_numa_callback(struct notifier_block *nfb, @@ -319,7 +403,6 @@ static int __init parse_numa_properties( struct device_node *cpu = NULL; struct device_node *memory = NULL; int addr_cells, size_cells; - int max_domain = 0; long entries = lmb_end_of_DRAM() >> MEMORY_INCREMENT_SHIFT; unsigned long i; @@ -341,7 +424,7 @@ static int __init parse_numa_properties( if (min_common_depth < 0) return min_common_depth; - max_domain = numa_setup_cpu(boot_cpuid); + numa_setup_cpu(boot_cpuid); /* * Even though we connect cpus to numa domains later in SMP init, @@ -350,20 +433,8 @@ static int __init parse_numa_properties( * As a result of hotplug we could still have cpus appear later on * with larger node ids. In that case we force the cpu into node 0. 
*/ - for_each_cpu(i) { - int numa_domain; - - cpu = find_cpu_node(i); - - if (cpu) { - numa_domain = of_node_numa_domain(cpu); - of_node_put(cpu); - - if (numa_domain < MAX_NUMNODES && - max_domain < numa_domain) - max_domain = numa_domain; - } - } + while ((cpu = of_find_node_by_type(cpu, "cpu")) != NULL) + of_node_to_nid(cpu); addr_cells = get_mem_addr_cells(); size_cells = get_mem_size_cells(); @@ -371,7 +442,7 @@ static int __init parse_numa_properties( while ((memory = of_find_node_by_type(memory, "memory")) != NULL) { unsigned long start; unsigned long size; - int numa_domain; + int nid; int ranges; unsigned int *memcell_buf; unsigned int len; @@ -389,18 +460,7 @@ new_range: start = _ALIGN_DOWN(start, MEMORY_INCREMENT); size = _ALIGN_UP(size, MEMORY_INCREMENT); - numa_domain = of_node_numa_domain(memory); - - if (numa_domain >= MAX_NUMNODES) { - if (numa_domain != 0xffff) - printk(KERN_ERR "WARNING: memory at %lx maps " - "to invalid NUMA node %d\n", start, - numa_domain); - numa_domain = 0; - } - - if (max_domain < numa_domain) - max_domain = numa_domain; + nid = of_node_to_nid(memory); if (! (size = numa_enforce_memory_limit(start, size))) { if (--ranges) @@ -412,42 +472,37 @@ new_range: /* * Initialize new node struct, or add to an existing one. 
*/ - if (init_node_data[numa_domain].node_end_pfn) { + if (init_node_data[nid].node_end_pfn) { if ((start / PAGE_SIZE) < - init_node_data[numa_domain].node_start_pfn) - init_node_data[numa_domain].node_start_pfn = + init_node_data[nid].node_start_pfn) + init_node_data[nid].node_start_pfn = start / PAGE_SIZE; if (((start / PAGE_SIZE) + (size / PAGE_SIZE)) > - init_node_data[numa_domain].node_end_pfn) - init_node_data[numa_domain].node_end_pfn = + init_node_data[nid].node_end_pfn) + init_node_data[nid].node_end_pfn = (start / PAGE_SIZE) + (size / PAGE_SIZE); - init_node_data[numa_domain].node_present_pages += + init_node_data[nid].node_present_pages += size / PAGE_SIZE; } else { - node_set_online(numa_domain); - - init_node_data[numa_domain].node_start_pfn = + init_node_data[nid].node_start_pfn = start / PAGE_SIZE; - init_node_data[numa_domain].node_end_pfn = - init_node_data[numa_domain].node_start_pfn + + init_node_data[nid].node_end_pfn = + init_node_data[nid].node_start_pfn + size / PAGE_SIZE; - init_node_data[numa_domain].node_present_pages = + init_node_data[nid].node_present_pages = size / PAGE_SIZE; } for (i = start ; i < (start+size); i += MEMORY_INCREMENT) numa_memory_lookup_table[i >> MEMORY_INCREMENT_SHIFT] = - numa_domain; + nid; if (--ranges) goto new_range; } - for (i = 0; i <= max_domain; i++) - node_set_online(i); - return 0; } @@ -632,7 +687,7 @@ void __init do_init_bootmem(void) memory = NULL; while ((memory = of_find_node_by_type(memory, "memory")) != NULL) { unsigned long mem_start, mem_size; - int numa_domain, ranges; + int devnid, ranges; unsigned int *memcell_buf; unsigned int len; @@ -644,9 +699,9 @@ void __init do_init_bootmem(void) new_range: mem_start = read_n_cells(addr_cells, &memcell_buf); mem_size = read_n_cells(size_cells, &memcell_buf); - numa_domain = numa_enabled ? of_node_numa_domain(memory) : 0; + devnid = numa_enabled ? 
of_node_to_nid(memory) : 0; - if (numa_domain != nid) + if (devnid != nid) continue; mem_size = numa_enforce_memory_limit(mem_start, mem_size); From benh at kernel.crashing.org Wed Apr 6 16:05:28 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 06 Apr 2005 16:05:28 +1000 Subject: [PATCH] ppc64: Detect altivec via firmware on unknown CPUs Message-ID: <1112767528.9517.131.camel@gaston> Hi ! This patch adds detection of the Altivec capability of the CPU via the firmware in addition to the cpu table. This allows newer CPUs that aren't in the table to still have working altivec support in the kernel. It also fixes a problem where if a CPU isn't recognized as having altivec features, and takes an altivec unavailable exception due to userland issuing altivec instructions, the kernel would happily enable it and context switch the registers ... but not all of them (it would basically forget vrsave). With this patch, the kernel will refuse to enable altivec when the feature isn't detected for the CPU (SIGILL). 
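Because the patch also sets PPC_FEATURE_HAS_ALTIVEC in cpu_user_features, userspace can check for AltiVec up front instead of issuing an instruction and risking the SIGILL this patch now delivers on CPUs without the feature. A sketch, assuming ppc64 reports cpu_user_features as the ELF AT_HWCAP entry and using glibc's getauxval() (glibc 2.16 or later, an assumption for any 2005-era system); the fallback constant is PPC_FEATURE_HAS_ALTIVEC as defined in asm/cputable.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/auxv.h>    /* getauxval(), AT_HWCAP */

#ifndef PPC_FEATURE_HAS_ALTIVEC
#define PPC_FEATURE_HAS_ALTIVEC 0x10000000UL   /* value from asm/cputable.h */
#endif

/* Pure check on an AT_HWCAP value, so it can be tested on any host. */
static bool hwcap_has_altivec(unsigned long hwcap)
{
	return (hwcap & PPC_FEATURE_HAS_ALTIVEC) != 0;
}

/* Ask the running kernel. AT_HWCAP bits are architecture-specific,
 * so only interpret them on PowerPC. */
static bool cpu_has_altivec(void)
{
#if defined(__powerpc__) || defined(__powerpc64__)
	return hwcap_has_altivec(getauxval(AT_HWCAP));
#else
	return false;
#endif
}
```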
Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/ppc64/kernel/prom.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/prom.c 2005-04-06 10:22:10.000000000 +1000 +++ linux-work/arch/ppc64/kernel/prom.c 2005-04-06 15:26:23.000000000 +1000 @@ -885,6 +885,7 @@ const char *full_path, void *data) { char *type = get_flat_dt_prop(node, "device_type", NULL); + u32 *prop; /* We are scanning "cpu" nodes only */ if (type == NULL || strcmp(type, "cpu") != 0) @@ -916,6 +917,20 @@ } } + /* Check if we have a VMX and eventually update CPU features */ + prop = (u32 *)get_flat_dt_prop(node, "ibm,vmx", NULL); + if (prop && (*prop) > 0) { + cur_cpu_spec->cpu_features |= CPU_FTR_ALTIVEC; + cur_cpu_spec->cpu_user_features |= PPC_FEATURE_HAS_ALTIVEC; + } + + /* Same goes for Apple's "altivec" property */ + prop = (u32 *)get_flat_dt_prop(node, "altivec", NULL); + if (prop) { + cur_cpu_spec->cpu_features |= CPU_FTR_ALTIVEC; + cur_cpu_spec->cpu_user_features |= PPC_FEATURE_HAS_ALTIVEC; + } + return 0; } @@ -1104,7 +1119,9 @@ DBG("Scanning CPUs ...\n"); - /* Retreive hash table size from flattened tree */ + /* Retreive hash table size from flattened tree plus other + * CPU related informations (altivec support, boot CPU ID, ...) 
+ */ scan_flat_dt(early_init_dt_scan_cpus, NULL); /* If hash size wasn't obtained above, we calculate it now based on Index: linux-work/arch/ppc64/kernel/head.S =================================================================== --- linux-work.orig/arch/ppc64/kernel/head.S 2005-04-03 10:02:55.000000000 +1000 +++ linux-work/arch/ppc64/kernel/head.S 2005-04-06 14:27:19.000000000 +1000 @@ -922,7 +922,9 @@ altivec_unavailable_common: EXCEPTION_PROLOG_COMMON(0xf20, PACA_EXGEN) #ifdef CONFIG_ALTIVEC +BEGIN_FTR_SECTION bne .load_up_altivec /* if from user, just load it up */ +END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) #endif bl .save_nvgprs addi r3,r1,STACK_FRAME_OVERHEAD From benh at kernel.crashing.org Wed Apr 6 16:13:54 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 06 Apr 2005 16:13:54 +1000 Subject: [PATCH] ppc64: remove bogus f50 hack in prom.c Message-ID: <1112768035.9568.134.camel@gaston> Hi ! The code that parses the OF device tree contains an old bogus hack which was killed a long time ago on ppc32, but survived in ppc64. It was supposed to help with a problem on the f50 which is ... a 32 bits machine :) Additionally, that hack is causing problems, so let's just get rid of it. 
Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/ppc64/kernel/prom.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/prom.c 2005-04-06 15:26:23.000000000 +1000 +++ linux-work/arch/ppc64/kernel/prom.c 2005-04-06 16:10:39.000000000 +1000 @@ -544,12 +544,6 @@ if (ip != NULL) nsizec = *ip; - /* the f50 sets the name to 'display' and 'compatible' to what we - * expect for the name -- Cort - */ - if (!strcmp(np->name, "display")) - np->name = get_property(np, "compatible", NULL); - if (!strcmp(np->name, "device-tree") || np->parent == NULL) ifunc = interpret_root_props; else if (np->type == 0) From tlnguyen at snoqualmie.dp.intel.com Thu Apr 7 04:48:52 2005 From: tlnguyen at snoqualmie.dp.intel.com (long) Date: Wed, 6 Apr 2005 11:48:52 -0700 Subject: PCI Error Recovery API Proposal (updated) Message-ID: <200504061848.j36ImqFS004886@snoqualmie.dp.intel.com> On Tue Apr 5 01:43:51 2005 Benjamin Herrenschmidt wrote: >The error recovery API support is exposed by the driver in the form of >a structure of function pointers pointed to by a new field in struct >pci_driver. The absence of this pointer in pci_driver denotes an >"non-aware" driver, behaviour on these is platform dependant. Platforms >like ppc64 can try to simulate hotplug remove/add. > >This structure has the form: > >struct pci_error_handlers >{ > int (*error_detected)(struct pci_dev *dev, pci_error_token > error); > int (*error_recover)(struct pci_dev *dev); > int (*error_restart)(struct pci_dev *dev); > int (*link_reset)(struct pci_dev *dev); > int (*slot_reset)(struct pci_dev *dev); >}; Agree. When do you plan to have this structure in struct pci_driver? >The definition of "pci_error_token" is not covered here. What is the default type of pci_error_token in API 1)? You said "within this function and after it returns, the driver shouldn't do any new IOs." 
AER code is required to pass the error severity (fatal or nonfatal) to a driver when calling API 1). I prefer that this error token be defined as an integer type, passed as either PCIERR_FATAL_DETECTED or PCIERR_NONFATAL_DETECTED. Please let me know what you think. > 3) link_reset() > > This is called after the link has been reset. This is typically >a PCI Express specific state at this point and is done wether a non fatal >error has been detected that can be "solved" by resetting the link. The >driver is informed here of that reset and should check if the device >appears to be in working condition. This function acts a bit like 2) >error_recover(), that is it is not supposed to restart normal driver IO >operations right away, just "probe" the device to check it's >recoverability status. If all is right, then the core will call >error_restart() once all driver have ack'd link_reset(). API 3) is not like error_recover(). It is PCI Express specific, for the case where a fatal error has been reported to the Root Port. This fatal error can be "solved" by resetting the link at the upstream port associated with the hierarchy in question. The upstream port driver is informed here to reset its link so that it returns to a reliable state. After the link reset completes, we go to 4) and 5). Please change your description accordingly. Thanks, Long From haveblue at us.ibm.com Thu Apr 7 05:08:31 2005 From: haveblue at us.ibm.com (Dave Hansen) Date: Wed, 06 Apr 2005 12:08:31 -0700 Subject: [RFC/PATCH] numa: distinguish associativity domain from node id In-Reply-To: <20050406053002.GF3611@otto> References: <20050406053002.GF3611@otto> Message-ID: <1112814511.14584.18.camel@localhost> On Wed, 2005-04-06 at 00:30 -0500, Nathan Lynch wrote: > The ppc64 numa code makes some possibly invalid assumptions about the > numbering of "associativity domains" (which may be considered NUMA > nodes).
As far as I've been able to determine from the architecture > docs, there is no guarantee about the numbering of associativity > domains, i.e. the values that are contained in ibm,associativity > device node properties. Yet we seem to assume that the numbering of > the domains begins at zero and that the range is contiguous, and we > use the domain number for a given resource as its logical node id. > This strikes me as a problem waiting to happen, and in fact I've been > seeing some problems in the lab with larger machines violating or at > least straining these assumptions. I think I'm responsible for at least some of the bugs that you're hitting. This introduces added complexity that I was trying to avoid when I was touching it, mostly because the power4 systems had much simpler associativity information that was laid out sequentially. Your changes look pretty good to me. One minor nit: static int of_node_to_nid(struct device_node *dn) That sounds to me like a conversion function, but it also does some setup, like setting the nodes online. I might separate the conversion function (the read-only part) from the setup one that writes part of the configuration. -- Dave From ntl at pobox.com Thu Apr 7 06:06:01 2005 From: ntl at pobox.com (Nathan Lynch) Date: Wed, 6 Apr 2005 15:06:01 -0500 Subject: [RFC/PATCH] numa: distinguish associativity domain from node id In-Reply-To: <1112814511.14584.18.camel@localhost> References: <20050406053002.GF3611@otto> <1112814511.14584.18.camel@localhost> Message-ID: <20050406200601.GG3611@otto> Hi Dave On Wed, Apr 06, 2005 at 12:08:31PM -0700, Dave Hansen wrote: > One minor nit: > > static int of_node_to_nid(struct device_node *dn) > > That sounds to me like a conversion function, but it also does some > setup, like setting the nodes online. I might separate the conversion > function (the read-only part) from the setup one that writes part of the > configuration. Yeah, that's a little sneaky and non-obvious, will fix. 
Thanks for taking a look. Nathan From benh at kernel.crashing.org Thu Apr 7 09:16:08 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 07 Apr 2005 09:16:08 +1000 Subject: PCI Error Recovery API Proposal (updated) In-Reply-To: <200504061848.j36ImqFS004886@snoqualmie.dp.intel.com> References: <200504061848.j36ImqFS004886@snoqualmie.dp.intel.com> Message-ID: <1112829368.9517.208.camel@gaston> > Agree. When do you plan to have this structure in struct pci_driver? As soon as everybody agrees on them, that is soon I hope > >The definition of "pci_error_token" is not covered here. > > What is the default type of pci_error_token in API 1)? You said "within > this function and after it returns, the driver shouldn't do any new > IOs." AER code is required to pass error severity (fatal or nonfatal) to > a driver when calling API 1). I refer this error token should be defined > as an integer type, which is passed with either PCIERR_FATAL_DETECTED or > PCIERR_NONFATAL_DETECTED. Please let me know what you think? The token should be an opaque type with accessors. You could define a pci_error_get_severity(token) to return the severity. The idea is to define accessors which return an error when the data requested isn't present in the error info. The actual content of the token is to be defined. I was thinking about a type plus a union. I was hoping Seto could provide something here ... > > 3) link_reset() > > > > This is called after the link has been reset. This is typically > >a PCI Express specific state at this point and is done wether a non fatal > >error has been detected that can be "solved" by resetting the link. The > >driver is informed here of that reset and should check if the device > >appears to be in working condition. This function acts a bit like 2) > >error_recover(), that is it is not supposed to restart normal driver IO > >operations right away, just "probe" the device to check it's > >recoverability status. 
If all is right, then the core will call > >error_restart() once all driver have ack'd link_reset(). > > API 3) is not like error_recover(). This is basically a PCI Express > specific when a fatal error has been reported to the Root Port. This > fatal error can be "solved" by resetting the link at upstream port > associated with a hierarchy in question. An upstream port driver is informed > here to reset its link to return to reliable. After a completion of link > reset, we go to 4) and 5). Please change your description accordingly. Wait ... Once you have reset the link, you call 3). At this point, the card should be operational again, right? That is, the next callback should be 5), not 4), unless the driver here decides it can't recover and needs a full hard reset of the slot (which is a different thing), in which case you end up power cycling the slot and going to 4). That is, in this regard, the action of a driver in 3) is similar to the action of a driver in "recover", in the sense that the link has been reset, the card may or may not have been (depending on whether the link reset triggers a card reset; this is device specific, and the driver will know what to do), and the driver can recover from it. The next step to expect is 5). Did I get something wrong? Ben. From anton at samba.org Thu Apr 7 10:15:19 2005 From: anton at samba.org (Anton Blanchard) Date: Thu, 7 Apr 2005 10:15:19 +1000 Subject: [RFC/PATCH] numa: distinguish associativity domain from node id In-Reply-To: <20050406053002.GF3611@otto> References: <20050406053002.GF3611@otto> Message-ID: <20050407001519.GA5193@krispykreme> Hi Nathan, > The ppc64 numa code makes some possibly invalid assumptions about the > numbering of "associativity domains" (which may be considered NUMA > nodes). As far as I've been able to determine from the architecture > docs, there is no guarantee about the numbering of associativity > domains, i.e. the values that are contained in ibm,associativity > device node properties.
Yet we seem to assume that the numbering of > the domains begins at zero and that the range is contiguous, and we > use the domain number for a given resource as its logical node id. > This strikes me as a problem waiting to happen, and in fact I've been > seeing some problems in the lab with larger machines violating or at > least straining these assumptions. I'm reluctant to have a mapping between the Linux concept of a node and the firmware concept if possible. It's nice to be able to jump on a machine and determine if it is set up correctly by looking at sysfs and /proc/device-tree. > Consider one such case: the associativity domain for all memory in a > partition is 0x1, but the processors are in shared mode (so no > associativity info for them) -- all the memory is placed in node 1 > while all cpus are mapped to node 0. But in this case, we should > really have only one logical node, with all memory and cpus mapped to > it. Even in shared processor mode it makes sense to have separate memory nodes so we can still do striping across memory controllers. For the shared processor case where all our memory is in one node that isn't zero, perhaps we could just stuff all the cpus in that node at boot. When we support memory hotplug we then add new nodes as normal. New cpus go into the node we chose at boot. > Another case I've seen is that of a partition with all processors and > memory having an associativity domain of 0x1. We end up with > everything in node 1 and an empty (yet online) node 0. I saw some core changes go in recently that may allow us to have discontiguous node numbers. I agree onlining all nodes from 0...max node is pretty ugly, but perhaps that's fixable. Also, with hot memory unplug we are going to end up with holes. The main problem with not doing a mapping arises if firmware decides to exceed the maximum node number (we have it set to 16 at the moment).
Anton From ntl at pobox.com Thu Apr 7 11:37:05 2005 From: ntl at pobox.com (Nathan Lynch) Date: Wed, 6 Apr 2005 20:37:05 -0500 Subject: [RFC/PATCH] numa: distinguish associativity domain from node id In-Reply-To: <20050407001519.GA5193@krispykreme> References: <20050406053002.GF3611@otto> <20050407001519.GA5193@krispykreme> Message-ID: <20050407013705.GH3611@otto> On Thu, Apr 07, 2005 at 10:15:19AM +1000, Anton Blanchard wrote: > > > The ppc64 numa code makes some possibly invalid assumptions about the > > numbering of "associativity domains" (which may be considered NUMA > > nodes). As far as I've been able to determine from the architecture > > docs, there is no guarantee about the numbering of associativity > > domains, i.e. the values that are contained in ibm,associativity > > device node properties. Yet we seem to assume that the numbering of > > the domains begins at zero and that the range is contiguous, and we > > use the domain number for a given resource as its logical node id. > > This strikes me as a problem waiting to happen, and in fact I've been > > seeing some problems in the lab with larger machines violating or at > > least straining these assumptions. > > Im reluctant to have a mapping between the Linux concept of a node and > the firmware concept if possible. Its nice to be able to jump on a > machine and determine if it is set up correctly by looking at sysfs and > /proc/device-tree. Ok... just throwing out an idea here. What if we could add an attribute to the node sysdevs which would give us the firmware domain number? > > Consider one such case: the associativity domain for all memory in a > > partition is 0x1, but the processors are in shared mode (so no > > associativity info for them) -- all the memory is placed in node 1 > > while all cpus are mapped to node 0. But in this case, we should > > really have only one logical node, with all memory and cpus mapped to > > it. 
> > Even in shared processor mode it makes sense to have separate memory > nodes so we can still do striping across memory controllers. For the > shared processor case where all our memory is in one node that isnt > zero, perhaps we could just stuff all the cpus in that node at boot. > When we support memory hotplug we then add new nodes as normal. New cpus > go into the node we chose at boot. OK. > > > Another case I've seen is that of a partition with all processors and > > memory having an associativity domain of 0x1. We end up with > > everything in node 1 and an empty (yet online) node 0. > > I saw some core changes go in recently that may allow us to have > discontiguous node numbers. I agree onlining all nodes from 0...max node > is pretty ugly, but perhaps thats fixable. Also, with hot memory unplug > we are going to end up with holes. The nodemap stuff, I assume. I'll look into whether we can get away with discontiguous online node numbers. > The main problem with not doing a mapping is if firmware decides to > exceed the maximum node number (we have it set to 16 at the moment). We may need to bump that up. If I'm interpreting the ibm,max-associativity-domains property correctly, it should be 32. This is from a box which (I think) can have only two domains worth of cpus and memory. I guess all the extra is to account for I/O which could be added to the system? Unlike ibm,lrdr-capacity, this property doesn't seem to be affected by the partition profile settings. 
# od -x /proc/device-tree/rtas/ibm,max-associativity-domains 0000000 0000 0005 0000 0001 0000 0001 0000 0020 0000020 0000 0020 0000 0040 ^^^^ Thanks, Nathan From benh at kernel.crashing.org Thu Apr 7 17:26:19 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 07 Apr 2005 17:26:19 +1000 Subject: patch ppc64-detect-altivec-via-firmware-on-unknown-cpus.patch added to -mm tree In-Reply-To: <200504060846.j368kkkG019176@shell0.pdx.osdl.net> References: <200504060846.j368kkkG019176@shell0.pdx.osdl.net> Message-ID: <1112858779.9517.270.camel@gaston> On Wed, 2005-04-06 at 01:46 -0700, akpm at osdl.org wrote: > This is a note to let you know that I've just added the > patch titled > > ppc64: Detect altivec via firmware on unknown CPUs > > to the -mm tree. Its filename is > > ppc64-detect-altivec-via-firmware-on-unknown-cpus.patch Hi Andrew ! Please replace it with this new version which fixes a problem if altivec is not detected by the firmware (or forbidden by it) and we still take a userland altivec exception. The kernel would have panic'd, this makes sure the error is only reported to the application. --- This patch adds detection of the Altivec capability of the CPU via the firmware in addition to the cpu table. This allows newer CPUs that aren't in the table to still have working altivec support in the kernel. It also fixes a problem where if a CPU isn't recognized as having altivec features, and takes an altivec unavailable exception due to userland issuing altivec instructions, the kernel would happily enable it and context switch the registers ... but not all of them (it would basically forget vrsave). With this patch, the kernel will refuse to enable altivec when the feature isn't detected for the CPU (SIGILL). 
Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/ppc64/kernel/prom.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/prom.c 2005-04-06 10:22:10.000000000 +1000 +++ linux-work/arch/ppc64/kernel/prom.c 2005-04-07 17:24:08.000000000 +1000 @@ -885,6 +885,7 @@ const char *full_path, void *data) { char *type = get_flat_dt_prop(node, "device_type", NULL); + u32 *prop; /* We are scanning "cpu" nodes only */ if (type == NULL || strcmp(type, "cpu") != 0) @@ -916,6 +917,20 @@ } } + /* Check if we have a VMX and eventually update CPU features */ + prop = (u32 *)get_flat_dt_prop(node, "ibm,vmx", NULL); + if (prop && (*prop) > 0) { + cur_cpu_spec->cpu_features |= CPU_FTR_ALTIVEC; + cur_cpu_spec->cpu_user_features |= PPC_FEATURE_HAS_ALTIVEC; + } + + /* Same goes for Apple's "altivec" property */ + prop = (u32 *)get_flat_dt_prop(node, "altivec", NULL); + if (prop) { + cur_cpu_spec->cpu_features |= CPU_FTR_ALTIVEC; + cur_cpu_spec->cpu_user_features |= PPC_FEATURE_HAS_ALTIVEC; + } + return 0; } @@ -1104,7 +1119,9 @@ DBG("Scanning CPUs ...\n"); - /* Retreive hash table size from flattened tree */ + /* Retreive hash table size from flattened tree plus other + * CPU related informations (altivec support, boot CPU ID, ...) 
+ */ scan_flat_dt(early_init_dt_scan_cpus, NULL); /* If hash size wasn't obtained above, we calculate it now based on Index: linux-work/arch/ppc64/kernel/head.S =================================================================== --- linux-work.orig/arch/ppc64/kernel/head.S 2005-04-03 10:02:55.000000000 +1000 +++ linux-work/arch/ppc64/kernel/head.S 2005-04-06 14:27:19.000000000 +1000 @@ -922,7 +922,9 @@ altivec_unavailable_common: EXCEPTION_PROLOG_COMMON(0xf20, PACA_EXGEN) #ifdef CONFIG_ALTIVEC +BEGIN_FTR_SECTION bne .load_up_altivec /* if from user, just load it up */ +END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) #endif bl .save_nvgprs addi r3,r1,STACK_FRAME_OVERHEAD Index: linux-work/arch/ppc64/kernel/traps.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/traps.c 2005-03-15 11:57:29.000000000 +1100 +++ linux-work/arch/ppc64/kernel/traps.c 2005-04-07 17:24:34.000000000 +1000 @@ -450,14 +450,12 @@ void altivec_unavailable_exception(struct pt_regs *regs) { -#ifndef CONFIG_ALTIVEC if (user_mode(regs)) { /* A user program has executed an altivec instruction, but this kernel doesn't support altivec. */ _exception(SIGILL, regs, ILL_ILLOPC, regs->nip); return; } -#endif printk(KERN_EMERG "Unrecoverable VMX/Altivec Unavailable Exception " "%lx at %lx\n", regs->trap, regs->nip); die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT); From seto.hidetoshi at jp.fujitsu.com Thu Apr 7 23:27:24 2005 From: seto.hidetoshi at jp.fujitsu.com (Hidetoshi Seto) Date: Thu, 07 Apr 2005 22:27:24 +0900 Subject: PCI Error Recovery API Proposal (updated) In-Reply-To: <1112829368.9517.208.camel@gaston> References: <200504061848.j36ImqFS004886@snoqualmie.dp.intel.com> <1112829368.9517.208.camel@gaston> Message-ID: <4255353C.9080307@jp.fujitsu.com> Benjamin Herrenschmidt wrote: >>>The definition of "pci_error_token" is not covered here. >> >>What is the default type of pci_error_token in API 1)? 
You said "within >>this function and after it returns, the driver shouldn't do any new >>IOs." AER code is required to pass error severity (fatal or nonfatal) to >>a driver when calling API 1). I refer this error token should be defined >>as an integer type, which is passed with either PCIERR_FATAL_DETECTED or >>PCIERR_NONFATAL_DETECTED. Please let me know what you think? > > The token should be an opaque type with accessors. You could define a > pci_error_get_severity(token) to return the severity. The idea is to > define accessors which return an error when the data requested isn't > present in the error info. The actual content of the token is to be > defined. I was thinking about a type plus a union. I was hoping Seto > could provide something here ... I agree that the token should be an opaque, implementation-dependent thing. For example, it could be a bitmask like: #define PCIERR_ERROR_DETECTED 0x00000001 /* fundamental, always ON */ #define PCIERR_VALID_INFO 0x00000010 /* optional */ #define PCIERR_SEVERITY_FATAL 0x00000100 /* optional */ #define PCIERR_SEVERITY_NONFATAL 0x00000200 /* optional */ : or to get more detail: #define PCIERR_ERROR_DETECTED 0x00000001 /* fundamental, always ON */ #define PCIERR_VALID_PCIE_AER 0x00010000 /* optional */ : and define a function like: int pci_error_get_severity(pci_error_token *t) int pci_error_get_pcie_aer(pci_error_token *t, pcie_aer_bits *dat) or (I think this would be better) a merge of them, etc. My thought was that, depending on the arch, some could have only an error bit, while others could have various info in a well-defined struct/union/etc. Anyway, the token should be an easy handle for basic use, and should also be useful for advanced use.
Thanks, H.Seto From will_schmidt at vnet.ibm.com Thu Apr 7 23:39:36 2005 From: will_schmidt at vnet.ibm.com (will schmidt) Date: Thu, 07 Apr 2005 08:39:36 -0500 Subject: RFC/Patch more xmon additions In-Reply-To: <421E3BE3.90301@vnet.ibm.com> References: <421E3BE3.90301@vnet.ibm.com> Message-ID: <42553818.5070809@vnet.ibm.com> Hi All, here's a revised version of my initial patch. - I've removed the try_spinlock code; - As an alternative to duplicating lots of functions to add mread calls in place of references, I've added setjmp(bus_error_jmp) {} around what seem more likely to be critical areas. - Cleaned up spacing. - Changed most of the function names to be xmon_xxx instead of wm_xxx. These functions show up under a submenu 'w'; use "w?" at the xmon> prompt to get the help blurb. -Will will schmidt wrote: > > Hi Folks, > Am looking for comments on this additional function i've added to xmon > on the side.. > > the bulk of my intent was to make it easier for me to poke at memory > within a particular user process. > > I realize that the spacing is a bit screwed up, and the function names > should eventually change. Because i couldnt decide on letters for the > new functions, i put them under a submenu 'w'. > > wP will dump info on all processes. > > wp 0xabc will make process with pid 0xabc the active pid. <- active > only with respect to xmon poking into memory. > > wd 0xabcd1234 - will call through the pdg/pmd functions and return the > kernel address corresponding to 0xabcd1234 within the processes memory > space location. > > wg will dump gprs of the process/thread. > > -Will -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: xmon_pxd_code_apr7.diff Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050407/effcabd2/attachment.txt From grundler at parisc-linux.org Fri Apr 8 00:46:24 2005 From: grundler at parisc-linux.org (Grant Grundler) Date: Thu, 7 Apr 2005 08:46:24 -0600 Subject: PCI Error Recovery API Proposal (updated) In-Reply-To: <1112829368.9517.208.camel@gaston> References: <200504061848.j36ImqFS004886@snoqualmie.dp.intel.com> <1112829368.9517.208.camel@gaston> Message-ID: <20050407144624.GA430@colo.lackof.org> On Thu, Apr 07, 2005 at 09:16:08AM +1000, Benjamin Herrenschmidt wrote: > The token should be an opaque type with accessors. Is the intent to let arch-specific code determine the contents? If so, shouldn't pci_error_detected() return a pci_error_code type instead of an int? > The idea is to > define accessors which return an error when the data requested isn't > present in the error info. The actual content of the token is to be > defined. I was thinking about a type plus a union. I was hoping Seto > could provide something here ... This is the cornerstone of the interface. I think it needs to be defined, along with how to use it. grant From tklauser at nuerscht.ch Fri Apr 8 00:24:40 2005 From: tklauser at nuerscht.ch (Tobias Klauser) Date: Thu, 7 Apr 2005 16:24:40 +0200 Subject: [PATCH] arch/ppc64: Replace custom MIN macro Message-ID: <20050407142440.GA776@neon> From the kerneljanitors TODO list: - min/max macros from kernel.h are safe, a lot of handcrafted MIN/MAX are not. Signed-off-by: Tobias Klauser diff -urpN linux-2.6.12-rc2.orig/arch/ppc64/kernel/signal.c linux-2.6.12-rc2/arch/ppc64/kernel/signal.c --- linux-2.6.12-rc2.orig/arch/ppc64/kernel/signal.c 2005-04-07 16:18:30.287667016 +0200 +++ linux-2.6.12-rc2/arch/ppc64/kernel/signal.c 2005-04-07 16:19:14.159997408 +0200 @@ -42,11 +42,7 @@ #define _BLOCKABLE (~(sigmask(SIGKILL) | sigmask(SIGSTOP))) -#ifndef MIN -#define MIN(a,b) (((a) < (b)) ?
(a) : (b)) -#endif - -#define GP_REGS_SIZE MIN(sizeof(elf_gregset_t), sizeof(struct pt_regs)) +#define GP_REGS_SIZE min(sizeof(elf_gregset_t), sizeof(struct pt_regs)) #define FP_REGS_SIZE sizeof(elf_fpregset_t) #define TRAMP_TRACEBACK 3 From tlnguyen at snoqualmie.dp.intel.com Fri Apr 8 06:00:34 2005 From: tlnguyen at snoqualmie.dp.intel.com (long) Date: Thu, 7 Apr 2005 13:00:34 -0700 Subject: PCI Error Recovery API Proposal (updated) Message-ID: <200504072000.j37K0YnU005975@snoqualmie.dp.intel.com> On Wed Apr 6 17:45:42 2005 Benjamin Herrenschmidt wrote: >> > 3) link_reset() >> > >> > This is called after the link has been reset. This is typically >> >a PCI Express specific state at this point and is done wether a non fatal >> >error has been detected that can be "solved" by resetting the link. The >> >driver is informed here of that reset and should check if the device >> >appears to be in working condition. This function acts a bit like 2) >> >error_recover(), that is it is not supposed to restart normal driver IO >> >operations right away, just "probe" the device to check it's >> >recoverability status. If all is right, then the core will call >> >error_restart() once all driver have ack'd link_reset(). >> >> API 3) is not like error_recover(). This is basically a PCI Express >> specific when a fatal error has been reported to the Root Port. This >> fatal error can be "solved" by resetting the link at upstream port >> associated with a hierarchy in question. An upstream port driver is informed >> here to reset its link to return to reliable. After a completion of link >> reset, we go to 4) and 5). Please change your description accordingly. > >Wait ... Once you have reset the link, you call 3). At this point, the >card should be operational again right ? That is, the next callback >should be 5) not 4). 
Unless the driver here decides it can't recover and >need a full hard reset of the slot (which is a different thing) and thus >you end up power cycling the slot and go to 4). > >That is, in this regard, the action of a driver in 3) is similar to the >action of a driver in "recover", in that sense that the link has been >reset, the card might not (depending on wether the link reset triggers a >card reset or not, this is device specific, the driver will know what to >do) and can recover from it. The next step to expect is 5). Did I get >something wrong ? Thanks for clarifying the callback after completion of link reset. Regarding API 3), AER code makes the API 3) callback to reset the link. Once the upstream Port driver completes the link reset with a return of PCIERR_RESULT_RECOVERED, meaning the link is fully operational, AER code makes the API 2) callback to the downstream device driver. If the downstream driver returns PCIERR_RESULT_RECOVERED, the next callback is API 5). This is my understanding of how things should work; please correct me if I am mistaken. Thanks, Long From raffi at raffi.at Fri Apr 8 05:13:54 2005 From: raffi at raffi.at (Raffael Himmelreich) Date: Thu, 7 Apr 2005 21:13:54 +0200 Subject: IBM RS/6000 7017-S7A CHRP RS64-II Message-ID: <20050407191354.GA7839@exception.at> Hi, first of all I want to say that I am not into the PPC64 architecture at all, so please be forgiving. The machine mentioned in the subject is listed as "Working" at , can anybody give me some hints for booting Linux on it? What I have tried so far is cross-compiling a 2.6.11.2 kernel (x86->ppc64), but when I try booting the kernel the machine throws me into the system management console after the kernel initializes the second (of 8) CPUs. The error code on the LCD says something like "no return value of software" (I do not remember exactly, but it should be close to that). Do I have to care about CHRP/PReP when cross-compiling the kernel?
Have I missed any kernel arguments/.config parameters or OpenFirmware magic? Additionally, I tried to boot a Suse(?) precompiled ppc64 2.4 kernel, which hangs after doing prom_init(). I hope someone is willing to enlighten me. PS: please Cc me in follow-ups best regards, raffi From benh at kernel.crashing.org Fri Apr 8 08:30:14 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 08 Apr 2005 08:30:14 +1000 Subject: PCI Error Recovery API Proposal (updated) In-Reply-To: <200504072000.j37K0YnU005975@snoqualmie.dp.intel.com> References: <200504072000.j37K0YnU005975@snoqualmie.dp.intel.com> Message-ID: <1112913014.9567.287.camel@gaston> On Thu, 2005-04-07 at 13:00 -0700, long wrote: > Thanks for clearifying the callback after completion of link reset. > Regarding the callback of API 3), AER code makes the callback API 3) > to reset the link. Once the upstream Port driver completes link reset > with a return of PCIERR_RESULT_RECOVERED, meaning the link fully > operational. AER code makes API 2) callback to the downstream device > driver. If the downstream driver returns PCIERR_RESULT_RECOVERED, > the next callback is API 5). This is my understanding of how things > should work, please correct me if mistaken. Hrm... I don't think the port driver should enter that picture at all. As far as I'm concerned, the port driver is part of the implementation. The defined API really only concerns the downstream device. That is, the AER can use whatever private PCI-Express API you have to trigger the link reset. I think that's why we have some misunderstanding about the definition of this callback. In my view of things, it's just a call to the downstream device after the link has been reset, and thus is very similar to 2). Ben.
From paulus at samba.org Fri Apr 8 09:00:16 2005
From: paulus at samba.org (Paul Mackerras)
Date: Fri, 8 Apr 2005 09:00:16 +1000
Subject: [PATCH] arch/ppc64: Replace custom MIN macro
In-Reply-To: <20050407142440.GA776@neon>
References: <20050407142440.GA776@neon>
Message-ID: <16981.48000.586005.656966@cargo.ozlabs.ibm.com>

Tobias Klauser writes:

> From the kerneljanitors TODO list:
> - min/max macros from kernel.h are safe, a lot of handcrafted MIN/MAX are not.

Well... OK, that removes 4 lines of code, which is good, but the
explanation needs changing - it's not a question of safety, it doesn't
matter if we evaluate sizeof() more than once. :)

Paul.

From segher at kernel.crashing.org Fri Apr 8 09:46:54 2005
From: segher at kernel.crashing.org (Segher Boessenkool)
Date: Fri, 8 Apr 2005 01:46:54 +0200
Subject: help, trying to invalidate entire icache on 970
In-Reply-To: <1112763449.9568.99.camel@gaston>
References: <4252CBE3.3010701@nortel.com> <20050405143334.33e466b0.moilanen@austin.ibm.com> <1112763449.9568.99.camel@gaston>
Message-ID:

> Isn't dcbst a nop on 970 anyway?

Not completely -- for example, it can in some cases cause some
coherency traffic. When followed by a sync, I guess it always is
a noop, though. I'm not sure; but does it really matter? :-)

Segher

From yruan at cs.princeton.edu Fri Apr 8 14:25:25 2005
From: yruan at cs.princeton.edu (Yaoping Ruan)
Date: Fri, 08 Apr 2005 00:25:25 -0400
Subject: Custom 2.6 Kernel on Power5 pSeries
Message-ID: <425607B5.9080003@cs.princeton.edu>

Hello,

Has anybody had a successful experience compiling a 2.6 kernel on Power5
servers? We recently purchased a Power5 p520 server and installed RHEL4
for ppc64 on it. While the installation was OK, I had problems compiling
a custom kernel.

I installed the kernel source RPM, generated the ppc code, and compiled the
kernel with the default config in /boot. When I updated yaboot.conf and
rebooted, it died in the middle, reporting "vfs error ...".
I'd appreciate it if anybody could share some experience with me, or give
me some links to instructions.

-Yaoping

From anton at samba.org Fri Apr 8 16:10:55 2005
From: anton at samba.org (Anton Blanchard)
Date: Fri, 8 Apr 2005 16:10:55 +1000
Subject: Remove -fno-omit-frame-pointer on ppc64
Message-ID: <20050408061055.GA1558@krispykreme>

During some code inspection using gcc 4.0 I noticed a stack frame was
being created for a number of functions that didn't require it. For
example:

c0000000000df944 <._spin_unlock>:
c0000000000df944:  fb e1 ff f0   std     r31,-16(r1)
c0000000000df948:  f8 21 ff c1   stdu    r1,-64(r1)
c0000000000df94c:  7c 3f 0b 78   mr      r31,r1
c0000000000df950:  7c 20 04 ac   lwsync
c0000000000df954:  e8 21 00 00   ld      r1,0(r1)
c0000000000df958:  38 00 00 00   li      r0,0
c0000000000df95c:  90 03 00 00   stw     r0,0(r3)
c0000000000df960:  eb e1 ff f0   ld      r31,-16(r1)
c0000000000df964:  4e 80 00 20   blr

It turns out we are adding -fno-omit-frame-pointer to ppc64, which is
causing the above behaviour. Removing that flag results in much better
code:

c0000000000d5b30 <._spin_unlock>:
c0000000000d5b30:  7c 20 04 ac   lwsync
c0000000000d5b34:  38 00 00 00   li      r0,0
c0000000000d5b38:  90 03 00 00   stw     r0,0(r3)
c0000000000d5b3c:  4e 80 00 20   blr

We don't require a frame pointer to debug on ppc64, so remove it.

Signed-off-by: Anton Blanchard

diff -puN arch/ppc64/Kconfig~no_fomit_frame_pointer arch/ppc64/Kconfig
--- gr_work_small/arch/ppc64/Kconfig~no_fomit_frame_pointer	2005-04-08 00:59:19.208299928 -0500
+++ gr_work_small-anton/arch/ppc64/Kconfig	2005-04-08 00:59:19.218298342 -0500
@@ -40,10 +40,6 @@ config COMPAT
 	bool
 	default y
 
-config FRAME_POINTER
-	bool
-	default y
-
 # We optimistically allocate largepages from the VM, so make the limit
 # large enough (16MB).
 # This badly named config option is actually
 # max order + 1
_

From tklauser at nuerscht.ch Fri Apr 8 19:33:44 2005
From: tklauser at nuerscht.ch (Tobias Klauser)
Date: Fri, 8 Apr 2005 11:33:44 +0200
Subject: [PATCH] arch/ppc64: Replace custom MIN macro
In-Reply-To: <16981.48000.586005.656966@cargo.ozlabs.ibm.com>
References: <20050407142440.GA776@neon> <16981.48000.586005.656966@cargo.ozlabs.ibm.com>
Message-ID: <20050408093344.GA834@neon>

On Fri, Apr 08, 2005 at 09:00:16AM +1000, Paul Mackerras wrote:
> Tobias Klauser writes:
>
> > From the kerneljanitors TODO list:
> > - min/max macros from kernel.h are safe, a lot of handcrafted MIN/MAX are not.
>
> Well... OK, that removes 4 lines of code, which is good, but the
> explanation needs changing - it's not a question of safety, it doesn't
> matter if we evaluate sizeof() more than once. :)

min/max from kernel.h also do type checking, so the compiler emits
warnings in case of mismatched types, although this doesn't matter for
this patch. So how about the following description:

Replace a custom MIN() macro with the min() macro from kernel.h.
This patch removes 4 lines of redundant code.

Tobias

From will_schmidt at vnet.ibm.com Sat Apr 9 00:19:06 2005
From: will_schmidt at vnet.ibm.com (will schmidt)
Date: Fri, 08 Apr 2005 09:19:06 -0500
Subject: Custom 2.6 Kernel on Power5 pSeries
In-Reply-To: <425607B5.9080003@cs.princeton.edu>
References: <425607B5.9080003@cs.princeton.edu>
Message-ID: <425692DA.2050407@vnet.ibm.com>

Yaoping Ruan wrote:
> Hello,
>
> Has anybody had a successful experience compiling a 2.6 kernel on Power5
> servers? We recently purchased a Power5 p520 server and installed RHEL4
> for ppc64 on it. While the installation was OK, I had problems compiling
> a custom kernel.
>
> I installed the kernel source RPM, generated the ppc code, and compiled the
> kernel with the default config in /boot. When I updated yaboot.conf and
> rebooted, it died in the middle, reporting "vfs error ...".
>
> I'd appreciate it if anybody could share some experience with me, or give
> me some links to instructions.
>
> -Yaoping
> _______________________________________________
> Linuxppc64-dev mailing list
> Linuxppc64-dev at ozlabs.org
> https://ozlabs.org/cgi-bin/mailman/listinfo/linuxppc64-dev

Did you also build an initrd with your new kernel (with the mkinitrd
command)?

Filesystems and SCSI drivers are usually modules, and are most likely
missing from your newly built kernel.

-Will

From tlnguyen at snoqualmie.dp.intel.com Sat Apr 9 03:18:35 2005
From: tlnguyen at snoqualmie.dp.intel.com (long)
Date: Fri, 8 Apr 2005 10:18:35 -0700
Subject: PCI Error Recovery API Proposal (updated)
Message-ID: <200504081718.j38HIZrv006858@snoqualmie.dp.intel.com>

On Thursday, April 07, 2005 3:30 PM Benjamin Herrenschmidt wrote:
>Hrm... I don't think the port driver should enter that picture at all.
>As far as I'm concerned, the port driver is part of the implementation.
>The defined API really only concerns the downstream device. That is, the
>AER can use whatever private PCI-Express API you have to trigger the
>link reset. I think that's why we have some misunderstanding about the
>definition of this callback.

The port driver is a PCI device driver, which supports PCI Express
features. Each feature has its own service driver, which is not based on
the PCI Driver Model. These service drivers should be informed of what is
going on in the hierarchy where a fatal error occurs, as well as what error
recovery action takes place. Therefore, in my view, the port driver
should be part of the error recovery process, which is based on a native SW
solution, not a FW policy. I hope you understand my concerns and will
rewrite the definition of this callback's usage.

>In my view of things, it's just a call to the downstream device after
>the link has been reset, and thus is very similar to 2).

The difference in their usages is the reason why I've kept requesting
you to have API 3) for PCI Express.
Thanks,
Long

From benh at kernel.crashing.org Sat Apr 9 09:52:45 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 09 Apr 2005 09:52:45 +1000
Subject: PCI Error Recovery API Proposal (updated)
In-Reply-To: <200504081718.j38HIZrv006858@snoqualmie.dp.intel.com>
References: <200504081718.j38HIZrv006858@snoqualmie.dp.intel.com>
Message-ID: <1113004365.9567.391.camel@gaston>

On Fri, 2005-04-08 at 10:18 -0700, long wrote:
> On Thursday, April 07, 2005 3:30 PM Benjamin Herrenschmidt wrote:
> >Hrm... I don't think the port driver should enter that picture at all.
> >As far as I'm concerned, the port driver is part of the implementation.
> >The defined API really only concerns the downstream device. That is, the
> >AER can use whatever private PCI-Express API you have to trigger the
> >link reset. I think that's why we have some misunderstanding about the
> >definition of this callback.
>
> The port driver is a PCI device driver, which supports PCI Express
> features. Each feature has its own service driver, which is not based on
> the PCI Driver Model. These service drivers should be informed of what is
> going on in the hierarchy where a fatal error occurs, as well as what error
> recovery action takes place. Therefore, in my view, the port driver
> should be part of the error recovery process, which is based on a native SW
> solution, not a FW policy. I hope you understand my concerns and will
> rewrite the definition of this callback's usage.

Well, while I agree that the port driver is part of the process, it
doesn't have to adhere to the defined API as far as the reset of its
downstream link is concerned. Please understand that this API is aimed at
a "generic" facility for normal device drivers. The port driver is a
specific part of the PCI Express implementation, and while it takes an
active role in the process, the generic API shouldn't be modified to take
into account the specific needs of a port driver.
Instead, the AER should have its own private API to the port driver to
implement the recovery process.

> >In my view of things, it's just a call to the downstream device after
> >the link has been reset, and thus is very similar to 2).

OK, if the link "above" the port driver has been reset, then yes, the
port driver in this context can act as a normal driver. In the case of
the link down from the port driver, it is not.

> The difference in their usages is the reason why I've kept requesting
> you to have API 3) for PCI Express.
>
> Thanks,
> Long

--
Benjamin Herrenschmidt

From yruan at cs.princeton.edu Sat Apr 9 11:09:40 2005
From: yruan at cs.princeton.edu (Yaoping Ruan)
Date: Fri, 08 Apr 2005 21:09:40 -0400
Subject: Custom 2.6 Kernel on Power5 pSeries
In-Reply-To: <425692DA.2050407@vnet.ibm.com>
References: <425607B5.9080003@cs.princeton.edu> <425692DA.2050407@vnet.ibm.com>
Message-ID: <42572B54.8030605@cs.princeton.edu>

Hi:

Yes, I did try mkinitrd to create the initrd image. I also tried "make
install" to generate the image and copy the files, but without luck.

The following are the steps I went through; maybe you can see something I missed:
1) get kernel-2.6.9-5.EL.src.rpm from source SRPMS
2) rpm -ivh /var/spool/up2date/kernel-2.6.9-5.EL.src.rpm
3) cd /usr/src/redhat/SPECS
4) rpmbuild -bp --target=ppc64 kernel-2.6.spec
5) cp -a /usr/src/redhat/BUILD/kernel-2.6.9/linux-2.6.9 /usr/src
6) ln -s /usr/src/linux-2.6.9 /usr/src/linux
7) cp /boot/config-2.6.9-5.EL .config (in /usr/src/linux-2.6.9)
8) make oldconfig
9) make menuconfig (added some drivers, and file systems.
   Most SCSI drivers are included, ext3 is enabled)
10) make; make modules; make modules_install; make install (check that
vmlinux, System.map and initrd are installed under /boot)
11) update /etc/yaboot.conf for the new kernel; my yaboot.conf looks as follows:

Hit for boot options

partition=2
timeout=20
install=/usr/lib/yaboot/yaboot
default=linux
delay=5
nonvram

image=/vmlinux-2.6.9-smp
	label=2.6.9-smp
	read-only
	initrd=/initrd-2.6.9-smp.img
	root=/dev/VolGroup00/LogVol00
	append="console=hvsi1 rhgb"

image=/vmlinuz-2.6.9-5.EL
	label=linux
	read-only
	initrd=/initrd-2.6.9-5.EL.img
	root=/dev/VolGroup00/LogVol00
	append="console=hvsi1 rhgb"

(2.6.9-smp is the new kernel)

When the new kernel boots, it will reboot automatically at some point.
Since I am using a dial-up console, I couldn't identify at which step it
failed.

-Yaoping

will schmidt wrote:
> Yaoping Ruan wrote:
>
>> Hello,
>>
>> Has anybody had a successful experience compiling a 2.6 kernel on Power5
>> servers? We recently purchased a Power5 p520 server and installed RHEL4
>> for ppc64 on it. While the installation was OK, I had problems compiling
>> a custom kernel.
>>
>> I installed the kernel source RPM, generated the ppc code, and compiled the
>> kernel with the default config in /boot. When I updated yaboot.conf and
>> rebooted, it died in the middle, reporting "vfs error ...".
>>
>> I'd appreciate it if anybody could share some experience with me, or give
>> me some links to instructions.
>>
>> -Yaoping
>
>
> Did you also build an initrd with your new kernel (with the mkinitrd
> command)?
>
> Filesystems and SCSI drivers are usually modules, and are most likely
> missing from your newly built kernel.
>
>
> -Will

From olof at austin.ibm.com Mon Apr 11 08:45:18 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Sun, 10 Apr 2005 17:45:18 -0500
Subject: [PATCH] [PPC64] No prefetch for NULL pointers
Message-ID: <20050410224518.GE2941@austin.ibm.com>

Hi,

For prefetches of NULL (as when walking a short linked list), PPC64 will
in some cases take a performance hit. The hardware needs to do the TLB
walk, and said walk will always miss, which means (up to) two L2 misses
as penalty. This seems to hurt overall performance, so for NULL pointers
skip the prefetch altogether.

Signed-off-by: Olof Johansson

Index: 2.6/include/asm-ppc64/processor.h
===================================================================
--- 2.6.orig/include/asm-ppc64/processor.h	2005-03-28 11:08:00.000000000 -0600
+++ 2.6/include/asm-ppc64/processor.h	2005-03-28 11:14:34.000000000 -0600
@@ -635,11 +635,17 @@ static inline unsigned long __pack_fe01(
 
 static inline void prefetch(const void *x)
 {
+	if (unlikely(!x))
+		return;
+
 	__asm__ __volatile__ ("dcbt 0,%0" : : "r" (x));
 }
 
 static inline void prefetchw(const void *x)
 {
+	if (unlikely(!x))
+		return;
+
 	__asm__ __volatile__ ("dcbtst 0,%0" : : "r" (x));
 }

From akpm at osdl.org Mon Apr 11 09:03:41 2005
From: akpm at osdl.org (Andrew Morton)
Date: Sun, 10 Apr 2005 16:03:41 -0700
Subject: [PATCH] [PPC64] No prefetch for NULL pointers
In-Reply-To: <20050410224518.GE2941@austin.ibm.com>
References: <20050410224518.GE2941@austin.ibm.com>
Message-ID: <20050410160341.59669057.akpm@osdl.org>

olof at austin.ibm.com (Olof Johansson) wrote:
>
> Hi,
>
>
> For prefetches of NULL (as when walking a short linked list), PPC64 will
> in some cases take a performance hit. The hardware needs to do the TLB
> walk, and said walk will always miss, which means (up to) two L2 misses
> as penalty. This seems to hurt overall performance, so for NULL pointers
> skip the prefetch altogether.
>

I wonder if prefetch(0) is a common thing, or if only one or two callsites
do it?
If the latter then perhaps this would be better fixed at the callsites, so
well-behaved callsites don't bear the cost of the (unneeded) test?

>
> Index: 2.6/include/asm-ppc64/processor.h
> ===================================================================
> --- 2.6.orig/include/asm-ppc64/processor.h	2005-03-28 11:08:00.000000000 -0600
> +++ 2.6/include/asm-ppc64/processor.h	2005-03-28 11:14:34.000000000 -0600
> @@ -635,11 +635,17 @@ static inline unsigned long __pack_fe01(
>
>  static inline void prefetch(const void *x)
>  {
> +	if (unlikely(!x))
> +		return;
> +
>  	__asm__ __volatile__ ("dcbt 0,%0" : : "r" (x));
>  }
>
>  static inline void prefetchw(const void *x)
>  {
> +	if (unlikely(!x))
> +		return;
> +
>  	__asm__ __volatile__ ("dcbtst 0,%0" : : "r" (x));
>  }
>

From olof at austin.ibm.com Mon Apr 11 09:11:41 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Sun, 10 Apr 2005 18:11:41 -0500
Subject: [PATCH] [PPC64] No prefetch for NULL pointers
In-Reply-To: <20050410160341.59669057.akpm@osdl.org>
References: <20050410224518.GE2941@austin.ibm.com> <20050410160341.59669057.akpm@osdl.org>
Message-ID: <20050410231141.GA17861@austin.ibm.com>

On Sun, Apr 10, 2005 at 04:03:41PM -0700, Andrew Morton wrote:
> I wonder if prefetch(0) is a common thing, or if only one or two callsites
> do it?
>
> If the latter then perhaps this would be better fixed at the callsites, so
> well-behaved callsites don't bear the cost of the (unneeded) test?

Actually we hit it pretty badly whenever someone is iterating over an
hlist that's short (i.e. one element). That's how the discussion started:
Serge Hallyn couldn't understand why his security patches added such a
performance penalty on ppc64. They were iterating over a policy list
that in his testcase had only one entry.

The unneeded test shouldn't be a big penalty; that's also why I added the
static prediction. Even when the prediction misses for NULL prefetches,
it's better than the table walks we currently take.
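The guard in Olof's patch can be tried outside the kernel with a portable sketch, assuming GCC or Clang: `__builtin_prefetch` stands in for the `dcbt` inline assembly, a local `unlikely()` macro mimics the kernel's, and `prefetch_nonnull`, `struct node` and `sum_list` are hypothetical names for illustration. It shows the case discussed in this thread: when a NULL-terminated list is walked while prefetching the next node, the last node's `next` is NULL, and the guard skips the prefetch instead of issuing one for address 0. Whether skipping it saves time depends on the CPU; the table-walk penalty described here is specific to ppc64.

```c
#include <stddef.h>

/* Local stand-in for the kernel's unlikely() branch hint. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Mirrors the patched prefetch(): return early for NULL so no prefetch of
 * address 0 is ever issued. */
static inline void prefetch_nonnull(const void *x)
{
    if (unlikely(!x))
        return;
    __builtin_prefetch(x, 0 /* read */, 3 /* high temporal locality */);
}

struct node {
    int value;
    struct node *next;
};

/* Walks a NULL-terminated list, prefetching the next node while the current
 * one is processed; for the last node, next is NULL and nothing is fetched. */
long sum_list(const struct node *n)
{
    long sum = 0;

    for (; n != NULL; n = n->next) {
        prefetch_nonnull(n->next);
        sum += n->value;
    }
    return sum;
}
```

For a one-element list, the only prefetch candidate is the NULL `next` pointer, which is exactly the short-hlist case Olof mentions; the guard turns it into a predicted-not-taken branch.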
So, we could either change the hlist iterators or the prefetch macro.
Since not all architectures hit this, I picked prefetch.

-Olof

From anton at samba.org Mon Apr 11 09:13:02 2005
From: anton at samba.org (Anton Blanchard)
Date: Mon, 11 Apr 2005 09:13:02 +1000
Subject: [PATCH] [PPC64] No prefetch for NULL pointers
In-Reply-To: <20050410160341.59669057.akpm@osdl.org>
References: <20050410224518.GE2941@austin.ibm.com> <20050410160341.59669057.akpm@osdl.org>
Message-ID: <20050410231302.GA8002@krispykreme>

> I wonder if prefetch(0) is a common thing, or if only one or two callsites
> do it?
>
> If the latter then perhaps this would be better fixed at the callsites, so
> well-behaved callsites don't bear the cost of the (unneeded) test?

NULL-terminated lists were what seemed to be hurting us most, for example
d_lookup. Prefetch (together with likely/unlikely) is the latest fad in
Linux programming and I don't trust everyone to get it right :)

Anton

From mjr at us.ibm.com Mon Apr 11 09:32:40 2005
From: mjr at us.ibm.com (Mike Ranweiler)
Date: Sun, 10 Apr 2005 16:32:40 -0700
Subject: Custom 2.6 Kernel on Power5 pSeries
In-Reply-To: <42572B54.8030605@cs.princeton.edu>
References: <425607B5.9080003@cs.princeton.edu> <425692DA.2050407@vnet.ibm.com> <42572B54.8030605@cs.princeton.edu>
Message-ID: <200504101632.40055.mjr@us.ibm.com>

On Friday 08 April 2005 18:09, Yaoping Ruan wrote:
> Hi:
>
> Yes, I did try mkinitrd to create the initrd image. I also tried "make
> install" to generate the image and copy the files, but without luck.
>
> The following are the steps I went through; maybe you can see something I missed:
> 1) get kernel-2.6.9-5.EL.src.rpm from source SRPMS
> 2) rpm -ivh /var/spool/up2date/kernel-2.6.9-5.EL.src.rpm
> 3) cd /usr/src/redhat/SPECS
> 4) rpmbuild -bp --target=ppc64 kernel-2.6.spec
> 5) cp -a /usr/src/redhat/BUILD/kernel-2.6.9/linux-2.6.9 /usr/src
> 6) ln -s /usr/src/linux-2.6.9 /usr/src/linux
> 7) cp /boot/config-2.6.9-5.EL .config (in /usr/src/linux-2.6.9)
> 8) make oldconfig
> 9) make menuconfig (added some drivers, and file systems. Most SCSI
> drivers are included, ext3 is enabled)

Have you tried just the generic RH config? Make sure you can build and boot
that, then make your changes.

Mike

> 10) make; make modules; make modules_install; make install (check that
> vmlinux, System.map and initrd are installed under /boot)
> 11) update /etc/yaboot.conf for the new kernel; my yaboot.conf looks as follows:
>
> Hit for boot options
>
> partition=2
> timeout=20
> install=/usr/lib/yaboot/yaboot
> default=linux
> delay=5
> nonvram
>
> image=/vmlinux-2.6.9-smp
> 	label=2.6.9-smp
> 	read-only
> 	initrd=/initrd-2.6.9-smp.img
> 	root=/dev/VolGroup00/LogVol00
> 	append="console=hvsi1 rhgb"
>
> image=/vmlinuz-2.6.9-5.EL
> 	label=linux
> 	read-only
> 	initrd=/initrd-2.6.9-5.EL.img
> 	root=/dev/VolGroup00/LogVol00
> 	append="console=hvsi1 rhgb"
>
> (2.6.9-smp is the new kernel)
>
> When the new kernel boots, it will reboot automatically at some point.
> Since I am using a dial-up console, I couldn't identify at which step it
> failed.
>
> -Yaoping
>
> will schmidt wrote:
> > Yaoping Ruan wrote:
> >> Hello,
> >>
> >> Has anybody had a successful experience compiling a 2.6 kernel on Power5
> >> servers? We recently purchased a Power5 p520 server and installed RHEL4
> >> for ppc64 on it. While the installation was OK, I had problems compiling
> >> a custom kernel.
> >>
> >> I installed the kernel source RPM, generated the ppc code, and compiled the
> >> kernel with the default config in /boot.
> >> When I updated yaboot.conf and rebooted, it died in the middle, reporting "vfs error ...".
> >>
> >> I'd appreciate it if anybody could share some experience with me, or give
> >> me some links to instructions.
> >>
> >> -Yaoping
> >
> > Did you also build an initrd with your new kernel (with the mkinitrd
> > command)?
> >
> > Filesystems and SCSI drivers are usually modules, and are most likely
> > missing from your newly built kernel.
> >
> >
> > -Will

From yruan at cs.princeton.edu Mon Apr 11 14:34:27 2005
From: yruan at cs.princeton.edu (Yaoping Ruan)
Date: Mon, 11 Apr 2005 00:34:27 -0400
Subject: Custom 2.6 Kernel on Power5 pSeries
In-Reply-To: <200504101632.40055.mjr@us.ibm.com>
References: <425607B5.9080003@cs.princeton.edu> <425692DA.2050407@vnet.ibm.com> <42572B54.8030605@cs.princeton.edu> <200504101632.40055.mjr@us.ibm.com>
Message-ID: <4259FE53.4080201@cs.princeton.edu>

Did you mean using the RH config installed by default as /boot/config-***?
Yes, I did. I also installed RHEL 4 on a Xeon server with 2 CPUs and
repeated the kernel compilation process there. It was successful. The only
difference between the Xeon and the Power I noticed was that "make install"
actually failed on the Power. (I installed the kernel by using
"installkernel" and "mkinitrd".)

-Yaoping

>Have you tried just the generic RH config? Make sure you can build and boot
>that, then make your changes.
>
>Mike
>
>
>>10) make; make modules; make modules_install; make install (check that
>>vmlinux, System.map and initrd are installed under /boot)
>>11) update /etc/yaboot.conf for the new kernel; my yaboot.conf looks as follows:
>>
>>Hit for boot options
>>
>>partition=2
>>timeout=20
>>install=/usr/lib/yaboot/yaboot
>>default=linux
>>delay=5
>>nonvram
>>
>>image=/vmlinux-2.6.9-smp
>>	label=2.6.9-smp
>>	read-only
>>	initrd=/initrd-2.6.9-smp.img
>>	root=/dev/VolGroup00/LogVol00
>>	append="console=hvsi1 rhgb"
>>
>>image=/vmlinuz-2.6.9-5.EL
>>	label=linux
>>	read-only
>>	initrd=/initrd-2.6.9-5.EL.img
>>	root=/dev/VolGroup00/LogVol00
>>	append="console=hvsi1 rhgb"
>>
>>(2.6.9-smp is the new kernel)
>>
>>When the new kernel boots, it will reboot automatically at some point.
>>Since I am using a dial-up console, I couldn't identify at which step it
>>failed.
>>
>>-Yaoping
>>
>>will schmidt wrote:
>>
>>>Yaoping Ruan wrote:
>>>
>>>>Hello,
>>>>
>>>>Has anybody had a successful experience compiling a 2.6 kernel on Power5
>>>>servers? We recently purchased a Power5 p520 server and installed RHEL4
>>>>for ppc64 on it. While the installation was OK, I had problems compiling
>>>>a custom kernel.
>>>>
>>>>I installed the kernel source RPM, generated the ppc code, and compiled the
>>>>kernel with the default config in /boot. When I updated yaboot.conf and
>>>>rebooted, it died in the middle, reporting "vfs error ...".
>>>>
>>>>I'd appreciate it if anybody could share some experience with me, or give
>>>>me some links to instructions.
>>>>
>>>>-Yaoping
>>>
>>>Did you also build an initrd with your new kernel (with the mkinitrd
>>>command)?
>>>
>>>Filesystems and SCSI drivers are usually modules, and are most likely
>>>missing from your newly built kernel.
>>>
>>>
>>>-Will

From benh at kernel.crashing.org Mon Apr 11 18:19:27 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Mon, 11 Apr 2005 18:19:27 +1000
Subject: [PATCH] ppc64: very basic desktop g5 sound support
Message-ID: <1113207567.8984.7.camel@gaston>

Hi!

This patch hacks the current PowerMac ALSA driver to add some basic
support for analog sound output on some desktop G5s. It has severe
limitations though:

 - Only 44100 Hz, 16 bits
 - Only works on G5 models using a TAS3004 analog codec, that is, early
   single-CPU desktops and all dual-CPU desktops to date, but none of
   the more recent ones like the iMac G5.
 - It does analog only, no digital/SPDIF support at all, no native AC3
   support

Better support would require a complete rewrite of the driver (which I
am working on, but don't hold your breath), to properly support the
diversity of Apple sound HW setups, including dual codecs, several i2s
busses, all the new codecs used in the new machines, proper clock
switching with digital, etc etc etc...

This patch applies on top of the other PowerMac sound patches I posted
in the past couple of days (new PowerBook support and sleep fixes).

Note: This is a FAQ entry for PowerMac sound support with TI codecs:
They have a feature called "DRC" which is automatically enabled for the
internal speaker (at least when auto mute control is enabled), which will
cause your sound to fade out to nothing after half a second of playback
if you don't set a proper "DRC Range" in the mixer. So if you have a
problem like that, check alsamixer and raise your DRC Range to something
reasonable.

Note2: This patch will also add auto-mute of the speaker when the line-out
jack is used on some earlier desktop G4s (and on the G5), in addition to
the headphone jack.
If that behaviour isn't what you want, just disable auto-muting and use the manual mute controls in alsamixer. Signed-off-by: Benjamin Herrenschmidt Index: linux-work/sound/ppc/pmac.c =================================================================== --- linux-work.orig/sound/ppc/pmac.c 2005-04-11 10:50:18.000000000 +1000 +++ linux-work/sound/ppc/pmac.c 2005-04-11 17:48:06.000000000 +1000 @@ -27,14 +27,13 @@ #include #include #include +#include +#include #include #include "pmac.h" #include -#ifdef CONFIG_PPC_HAS_FEATURE_CALLS #include -#else -#include -#endif +#include #if defined(CONFIG_PM) && defined(CONFIG_PMAC_PBOOK) @@ -57,22 +56,29 @@ /* * allocate DBDMA command arrays */ -static int snd_pmac_dbdma_alloc(pmac_dbdma_t *rec, int size) +static int snd_pmac_dbdma_alloc(pmac_t *chip, pmac_dbdma_t *rec, int size) { - rec->space = kmalloc(sizeof(struct dbdma_cmd) * (size + 1), GFP_KERNEL); + unsigned int rsize = sizeof(struct dbdma_cmd) * (size + 1); + + rec->space = dma_alloc_coherent(&chip->pdev->dev, rsize, + &rec->dma_base, GFP_KERNEL); if (rec->space == NULL) return -ENOMEM; rec->size = size; - memset(rec->space, 0, sizeof(struct dbdma_cmd) * (size + 1)); + memset(rec->space, 0, rsize); rec->cmds = (void __iomem *)DBDMA_ALIGN(rec->space); - rec->addr = virt_to_bus(rec->cmds); + rec->addr = rec->dma_base + (unsigned long)((char *)rec->cmds - (char *)rec->space); + return 0; } -static void snd_pmac_dbdma_free(pmac_dbdma_t *rec) +static void snd_pmac_dbdma_free(pmac_t *chip, pmac_dbdma_t *rec) { - if (rec) - kfree(rec->space); + if (rec) { + unsigned int rsize = sizeof(struct dbdma_cmd) * (rec->size + 1); + + dma_free_coherent(&chip->pdev->dev, rsize, rec->space, rec->dma_base); + } } @@ -237,7 +243,7 @@ /* continuous DMA memory type doesn't provide the physical address, * so we need to resolve the address here... 
*/ - offset = virt_to_bus(runtime->dma_area); + offset = runtime->dma_addr; for (i = 0, cp = rec->cmd.cmds; i < rec->nperiods; i++, cp++) { st_le32(&cp->phy_addr, offset); st_le16(&cp->req_count, rec->period_size); @@ -664,8 +670,8 @@ chip->capture.cur_freqs = chip->freqs_ok; /* preallocate 64k buffer */ - snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_CONTINUOUS, - snd_dma_continuous_data(GFP_KERNEL), + snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV, + &chip->pdev->dev, 64 * 1024, 64 * 1024); return 0; @@ -757,28 +763,10 @@ /* * a wrapper to feature call for compatibility */ -#if defined(CONFIG_PM) && defined(CONFIG_PMAC_PBOOK) static void snd_pmac_sound_feature(pmac_t *chip, int enable) { -#ifdef CONFIG_PPC_HAS_FEATURE_CALLS ppc_md.feature_call(PMAC_FTR_SOUND_CHIP_ENABLE, chip->node, 0, enable); -#else - if (chip->is_pbook_G3) { - pmu_suspend(); - feature_clear(chip->node, FEATURE_Sound_power); - feature_clear(chip->node, FEATURE_Sound_CLK_enable); - big_mdelay(1000); /* XXX */ - pmu_resume(); - } - if (chip->is_pbook_3400) { - feature_set(chip->node, FEATURE_IOBUS_enable); - udelay(10); - } -#endif } -#else /* CONFIG_PM && CONFIG_PMAC_PBOOK */ -#define snd_pmac_sound_feature(chip,enable) /**/ -#endif /* CONFIG_PM && CONFIG_PMAC_PBOOK */ /* * release resources @@ -786,8 +774,6 @@ static int snd_pmac_free(pmac_t *chip) { - int i; - /* stop sounds */ if (chip->initialized) { snd_pmac_dbdma_reset(chip); @@ -813,9 +799,9 @@ free_irq(chip->tx_irq, (void*)chip); if (chip->rx_irq >= 0) free_irq(chip->rx_irq, (void*)chip); - snd_pmac_dbdma_free(&chip->playback.cmd); - snd_pmac_dbdma_free(&chip->capture.cmd); - snd_pmac_dbdma_free(&chip->extra_dma); + snd_pmac_dbdma_free(chip, &chip->playback.cmd); + snd_pmac_dbdma_free(chip, &chip->capture.cmd); + snd_pmac_dbdma_free(chip, &chip->extra_dma); if (chip->macio_base) iounmap(chip->macio_base); if (chip->latch_base) @@ -826,12 +812,23 @@ iounmap(chip->playback.dma); if (chip->capture.dma) 
iounmap(chip->capture.dma); +#ifndef CONFIG_PPC64 if (chip->node) { + int i; + for (i = 0; i < 3; i++) { - if (chip->of_requested & (1 << i)) - release_OF_resource(chip->node, i); + if (chip->of_requested & (1 << i)) { + if (chip->is_k2) + release_OF_resource(chip->node->parent, + i); + else + release_OF_resource(chip->node, i); + } } } +#endif /* CONFIG_PPC64 */ + if (chip->pdev) + pci_dev_put(chip->pdev); kfree(chip); return 0; } @@ -881,6 +878,8 @@ { struct device_node *sound; unsigned int *prop, l; + struct macio_chip* macio; + u32 layout_id = 0; if (_machine != _MACH_Pmac) @@ -918,10 +917,17 @@ * if we didn't find a davbus device, try 'i2s-a' since * this seems to be what iBooks have */ - if (! chip->node) + if (! chip->node) { chip->node = find_devices("i2s-a"); + if (chip->node && chip->node->parent && chip->node->parent->parent) { + if (device_is_compatible(chip->node->parent->parent, + "K2-Keylargo")) + chip->is_k2 = 1; + } + } if (! chip->node) return -ENODEV; + sound = find_devices("sound"); while (sound && sound->parent != chip->node) sound = sound->next; @@ -966,7 +972,8 @@ chip->control_mask = MASK_IEPC | 0x11; /* disable IEE */ } if (device_is_compatible(sound, "AOAKeylargo") || - device_is_compatible(sound, "AOAbase")) { + device_is_compatible(sound, "AOAbase") || + device_is_compatible(sound, "AOAK2")) { /* For now, only support very basic TAS3004 based machines with * single frequency until proper i2s control is implemented */ @@ -975,6 +982,7 @@ case 0x46: case 0x33: case 0x29: + case 0x24: chip->num_freqs = ARRAY_SIZE(tumbler_freqs); chip->model = PMAC_SNAPPER; chip->can_byte_swap = 0; /* FIXME: check this */ @@ -987,6 +995,26 @@ chip->device_id = *prop; chip->has_iic = (find_devices("perch") != NULL); + /* We need the PCI device for DMA allocations, let's use a crude method + * for now ... 
+ */ + macio = macio_find(chip->node, macio_unknown); + if (macio == NULL) + printk(KERN_WARNING "snd-powermac: can't locate macio !\n"); + else { + struct pci_dev *pdev = NULL; + + for_each_pci_dev(pdev) { + struct device_node *np = pci_device_to_OF_node(pdev); + if (np && np == macio->of_node) { + chip->pdev = pdev; + break; + } + } + } + if (chip->pdev == NULL) + printk(KERN_WARNING "snd-powermac: can't locate macio PCI device !\n"); + detect_byte_swap(chip); /* look for a property saying what sample rates @@ -1091,8 +1119,10 @@ int err; chip->auto_mute = 1; err = snd_ctl_add(chip->card, snd_ctl_new1(&auto_mute_controls[0], chip)); - if (err < 0) + if (err < 0) { + printk(KERN_ERR "snd-powermac: Failed to add automute control\n"); return err; + } chip->hp_detect_ctl = snd_ctl_new1(&auto_mute_controls[1], chip); return snd_ctl_add(chip->card, chip->hp_detect_ctl); } @@ -1106,6 +1136,7 @@ pmac_t *chip; struct device_node *np; int i, err; + unsigned long ctrl_addr, txdma_addr, rxdma_addr; static snd_device_ops_t ops = { .dev_free = snd_pmac_dev_free, }; @@ -1127,32 +1158,59 @@ if ((err = snd_pmac_detect(chip)) < 0) goto __error; - if (snd_pmac_dbdma_alloc(&chip->playback.cmd, PMAC_MAX_FRAGS + 1) < 0 || - snd_pmac_dbdma_alloc(&chip->capture.cmd, PMAC_MAX_FRAGS + 1) < 0 || - snd_pmac_dbdma_alloc(&chip->extra_dma, 2) < 0) { + if (snd_pmac_dbdma_alloc(chip, &chip->playback.cmd, PMAC_MAX_FRAGS + 1) < 0 || + snd_pmac_dbdma_alloc(chip, &chip->capture.cmd, PMAC_MAX_FRAGS + 1) < 0 || + snd_pmac_dbdma_alloc(chip, &chip->extra_dma, 2) < 0) { err = -ENOMEM; goto __error; } np = chip->node; - if (np->n_addrs < 3 || np->n_intrs < 3) { - err = -ENODEV; - goto __error; - } + if (chip->is_k2) { + if (np->parent->n_addrs < 2 || np->n_intrs < 3) { + err = -ENODEV; + goto __error; + } + for (i = 0; i < 2; i++) { +#ifndef CONFIG_PPC64 + static char *name[2] = { "- Control", "- DMA" }; + if (! 
request_OF_resource(np->parent, i, name[i])) { + snd_printk(KERN_ERR "pmac: can't request resource %d!\n", i); + err = -ENODEV; + goto __error; + } + chip->of_requested |= (1 << i); +#endif /* CONFIG_PPC64 */ + ctrl_addr = np->parent->addrs[0].address; + txdma_addr = np->parent->addrs[1].address; + rxdma_addr = txdma_addr + 0x100; + } - for (i = 0; i < 3; i++) { - static char *name[3] = { NULL, "- Tx DMA", "- Rx DMA" }; - if (! request_OF_resource(np, i, name[i])) { - snd_printk(KERN_ERR "pmac: can't request resource %d!\n", i); + } else { + if (np->n_addrs < 3 || np->n_intrs < 3) { err = -ENODEV; goto __error; } - chip->of_requested |= (1 << i); + + for (i = 0; i < 3; i++) { +#ifndef CONFIG_PPC64 + static char *name[3] = { "- Control", "- Tx DMA", "- Rx DMA" }; + if (! request_OF_resource(np, i, name[i])) { + snd_printk(KERN_ERR "pmac: can't request resource %d!\n", i); + err = -ENODEV; + goto __error; + } + chip->of_requested |= (1 << i); +#endif /* CONFIG_PPC64 */ + ctrl_addr = np->addrs[0].address; + txdma_addr = np->addrs[1].address; + rxdma_addr = np->addrs[2].address; + } } - chip->awacs = ioremap(np->addrs[0].address, 0x1000); - chip->playback.dma = ioremap(np->addrs[1].address, 0x100); - chip->capture.dma = ioremap(np->addrs[2].address, 0x100); + chip->awacs = ioremap(ctrl_addr, 0x1000); + chip->playback.dma = ioremap(txdma_addr, 0x100); + chip->capture.dma = ioremap(rxdma_addr, 0x100); if (chip->model <= PMAC_BURGUNDY) { if (request_irq(np->intrs[0].line, snd_pmac_ctrl_intr, 0, "PMac", (void*)chip)) { @@ -1180,7 +1238,8 @@ snd_pmac_sound_feature(chip, 1); /* reset */ - out_le32(&chip->awacs->control, 0x11); + if (chip->model == PMAC_AWACS) + out_le32(&chip->awacs->control, 0x11); /* Powerbooks have odd ways of enabling inputs such as an expansion-bay CD or sound from an internal modem @@ -1232,6 +1291,8 @@ return 0; __error: + if (chip->pdev) + pci_dev_put(chip->pdev); snd_pmac_free(chip); return err; } Index: linux-work/sound/ppc/pmac.h 
=================================================================== --- linux-work.orig/sound/ppc/pmac.h 2005-04-11 10:49:18.000000000 +1000 +++ linux-work/sound/ppc/pmac.h 2005-04-11 17:45:08.000000000 +1000 @@ -60,7 +60,8 @@ * DBDMA space */ struct snd_pmac_dbdma { - unsigned long addr; + dma_addr_t dma_base; + dma_addr_t addr; struct dbdma_cmd __iomem *cmds; void *space; int size; @@ -101,6 +102,7 @@ /* h/w info */ struct device_node *node; + struct pci_dev *pdev; unsigned int revision; unsigned int manufacturer; unsigned int subframe; @@ -110,6 +112,7 @@ unsigned int has_iic : 1; unsigned int is_pbook_3400 : 1; unsigned int is_pbook_G3 : 1; + unsigned int is_k2 : 1; unsigned int can_byte_swap : 1; unsigned int can_duplex : 1; @@ -157,6 +160,7 @@ snd_kcontrol_t *speaker_sw_ctl; snd_kcontrol_t *drc_sw_ctl; /* only used for tumbler -ReneR */ snd_kcontrol_t *hp_detect_ctl; + snd_kcontrol_t *lineout_sw_ctl; /* lowlevel callbacks */ void (*set_format)(pmac_t *chip); Index: linux-work/sound/ppc/tumbler.c =================================================================== --- linux-work.orig/sound/ppc/tumbler.c 2005-04-11 11:57:07.000000000 +1000 +++ linux-work/sound/ppc/tumbler.c 2005-04-11 18:06:42.000000000 +1000 @@ -35,14 +35,19 @@ #include #include #include -#ifdef CONFIG_PPC_HAS_FEATURE_CALLS +#include #include -#else -#error old crap -#endif #include "pmac.h" #include "tumbler_volume.h" +#undef DEBUG + +#ifdef DEBUG +#define DBG(fmt...) printk(fmt) +#else +#define DBG(fmt...) 
+#endif + /* i2c address for tumbler */ #define TAS_I2C_ADDR 0x34 @@ -78,21 +83,22 @@ }; typedef struct pmac_gpio { -#ifdef CONFIG_PPC_HAS_FEATURE_CALLS unsigned int addr; -#else - void __iomem *addr; -#endif - int active_state; + u8 active_val; + u8 inactive_val; + u8 active_state; } pmac_gpio_t; typedef struct pmac_tumbler_t { pmac_keywest_t i2c; pmac_gpio_t audio_reset; pmac_gpio_t amp_mute; + pmac_gpio_t line_mute; + pmac_gpio_t line_detect; pmac_gpio_t hp_mute; pmac_gpio_t hp_detect; int headphone_irq; + int lineout_irq; unsigned int master_vol[2]; unsigned int save_master_switch[2]; unsigned int master_switch[2]; @@ -120,6 +126,7 @@ regs[0], regs[1]); if (err >= 0) break; + DBG("(W) i2c error %d\n", err); mdelay(10); } while (count--); if (err < 0) @@ -137,6 +144,7 @@ TAS_REG_MCS, (1<<6)|(2<<4)|(2<<2)|0, 0, /* terminator */ }; + DBG("(I) tumbler init client\n"); return send_init_client(i2c, regs); } @@ -151,36 +159,27 @@ TAS_REG_ACS, 0, 0, /* terminator */ }; + DBG("(I) snapper init client\n"); return send_init_client(i2c, regs); } /* * gpio access */ -#ifdef CONFIG_PPC_HAS_FEATURE_CALLS #define do_gpio_write(gp, val) \ pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL, (gp)->addr, val) #define do_gpio_read(gp) \ pmac_call_feature(PMAC_FTR_READ_GPIO, NULL, (gp)->addr, 0) #define tumbler_gpio_free(gp) /* NOP */ -#else -#define do_gpio_write(gp, val) writeb(val, (gp)->addr) -#define do_gpio_read(gp) readb((gp)->addr) -static inline void tumbler_gpio_free(pmac_gpio_t *gp) -{ - if (gp->addr) { - iounmap(gp->addr); - gp->addr = NULL; - } -} -#endif /* CONFIG_PPC_HAS_FEATURE_CALLS */ static void write_audio_gpio(pmac_gpio_t *gp, int active) { if (! gp->addr) return; - active = active ? gp->active_state : !gp->active_state; - do_gpio_write(gp, active ? 0x05 : 0x04); + active = active ? 
gp->active_val : gp->inactive_val; + + do_gpio_write(gp, active); + DBG("(I) gpio %x write %d\n", gp->addr, active); } static int read_audio_gpio(pmac_gpio_t *gp) @@ -663,7 +662,7 @@ * to avoid codec reset on ibook M7 */ -enum { TUMBLER_MUTE_HP, TUMBLER_MUTE_AMP }; +enum { TUMBLER_MUTE_HP, TUMBLER_MUTE_AMP, TUMBLER_MUTE_LINE }; static int tumbler_get_mute_switch(snd_kcontrol_t *kcontrol, snd_ctl_elem_value_t *ucontrol) { @@ -672,7 +671,18 @@ pmac_gpio_t *gp; if (! (mix = chip->mixer_data)) return -ENODEV; - gp = (kcontrol->private_value == TUMBLER_MUTE_HP) ? &mix->hp_mute : &mix->amp_mute; + switch(kcontrol->private_value) { + case TUMBLER_MUTE_HP: + gp = &mix->hp_mute; break; + case TUMBLER_MUTE_AMP: + gp = &mix->amp_mute; break; + case TUMBLER_MUTE_LINE: + gp = &mix->line_mute; break; + default: + gp = NULL; + } + if (gp == NULL) + return -EINVAL; ucontrol->value.integer.value[0] = ! read_audio_gpio(gp); return 0; } @@ -689,7 +699,18 @@ #endif if (! (mix = chip->mixer_data)) return -ENODEV; - gp = (kcontrol->private_value == TUMBLER_MUTE_HP) ? &mix->hp_mute : &mix->amp_mute; + switch(kcontrol->private_value) { + case TUMBLER_MUTE_HP: + gp = &mix->hp_mute; break; + case TUMBLER_MUTE_AMP: + gp = &mix->amp_mute; break; + case TUMBLER_MUTE_LINE: + gp = &mix->line_mute; break; + default: + gp = NULL; + } + if (gp == NULL) + return -EINVAL; val = ! read_audio_gpio(gp); if (val != ucontrol->value.integer.value[0]) { write_audio_gpio(gp, ! 
ucontrol->value.integer.value[0]); @@ -833,6 +854,14 @@ .put = tumbler_put_mute_switch, .private_value = TUMBLER_MUTE_AMP, }; +static snd_kcontrol_new_t tumbler_lineout_sw __initdata = { + .iface = SNDRV_CTL_ELEM_IFACE_MIXER, + .name = "Line Out Playback Switch", + .info = snd_pmac_boolean_mono_info, + .get = tumbler_get_mute_switch, + .put = tumbler_put_mute_switch, + .private_value = TUMBLER_MUTE_LINE, +}; static snd_kcontrol_new_t tumbler_drc_sw __initdata = { .iface = SNDRV_CTL_ELEM_IFACE_MIXER, .name = "DRC Switch", @@ -849,7 +878,21 @@ static int tumbler_detect_headphone(pmac_t *chip) { pmac_tumbler_t *mix = chip->mixer_data; - return read_audio_gpio(&mix->hp_detect); + int detect = 0; + + if (mix->hp_detect.addr) + detect |= read_audio_gpio(&mix->hp_detect); + return detect; +} + +static int tumbler_detect_lineout(pmac_t *chip) +{ + pmac_tumbler_t *mix = chip->mixer_data; + int detect = 0; + + if (mix->line_detect.addr) + detect |= read_audio_gpio(&mix->line_detect); + return detect; } static void check_mute(pmac_t *chip, pmac_gpio_t *gp, int val, int do_notify, snd_kcontrol_t *sw) @@ -868,6 +911,7 @@ { pmac_t *chip = (pmac_t*) self; pmac_tumbler_t *mix; + int headphone, lineout; if (!chip) return; @@ -875,23 +919,35 @@ mix = chip->mixer_data; snd_assert(mix, return); - if (tumbler_detect_headphone(chip)) { - /* mute speaker */ - check_mute(chip, &mix->hp_mute, 0, mix->auto_mute_notify, - chip->master_sw_ctl); + headphone = tumbler_detect_headphone(chip); + lineout = tumbler_detect_lineout(chip); + + DBG("headphone: %d, lineout: %d\n", headphone, lineout); + + if (headphone || lineout) { + /* unmute headphone/lineout & mute speaker */ + if (headphone) + check_mute(chip, &mix->hp_mute, 0, mix->auto_mute_notify, + chip->master_sw_ctl); + if (lineout && mix->line_mute.addr != 0) + check_mute(chip, &mix->line_mute, 0, mix->auto_mute_notify, + chip->lineout_sw_ctl); if (mix->anded_reset) big_mdelay(10); check_mute(chip, &mix->amp_mute, 1, mix->auto_mute_notify, 
chip->speaker_sw_ctl); mix->drc_enable = 0; } else { - /* unmute speaker */ + /* unmute speaker, mute others */ check_mute(chip, &mix->amp_mute, 0, mix->auto_mute_notify, chip->speaker_sw_ctl); if (mix->anded_reset) big_mdelay(10); check_mute(chip, &mix->hp_mute, 1, mix->auto_mute_notify, chip->master_sw_ctl); + if (mix->line_mute.addr != 0) + check_mute(chip, &mix->line_mute, 1, mix->auto_mute_notify, + chip->lineout_sw_ctl); mix->drc_enable = 1; } if (mix->auto_mute_notify) { @@ -967,7 +1023,7 @@ } /* find an audio device and get its address */ -static long tumbler_find_device(const char *device, pmac_gpio_t *gp, int is_compatible) +static long tumbler_find_device(const char *device, const char *platform, pmac_gpio_t *gp, int is_compatible) { struct device_node *node; u32 *base, addr; @@ -977,6 +1033,7 @@ else node = find_audio_device(device); if (! node) { + DBG("(W) cannot find audio device %s !\n", device); snd_printdd("cannot find device %s\n", device); return -ENODEV; } @@ -985,29 +1042,41 @@ if (! base) { base = (u32 *)get_property(node, "reg", NULL); if (!base) { + DBG("(E) cannot find address for device %s !\n", device); snd_printd("cannot find address for device %s\n", device); return -ENODEV; } - /* this only work if PPC_HAS_FEATURE_CALLS is set as we - * are only getting the low part of the address - */ addr = *base; if (addr < 0x50) addr += 0x50; } else addr = *base; -#ifdef CONFIG_PPC_HAS_FEATURE_CALLS gp->addr = addr & 0x0000ffff; -#else - gp->addr = ioremap((unsigned long)addr, 1); -#endif /* Try to find the active state, default to 0 ! */ base = (u32 *)get_property(node, "audio-gpio-active-state", NULL); - if (base) + if (base) { gp->active_state = *base; - else + gp->active_val = (*base) ? 0x5 : 0x4; + gp->inactive_val = (*base) ? 0x4 : 0x5; + } else { + u32 *prop = NULL; gp->active_state = 0; + gp->active_val = 0x4; + gp->inactive_val = 0x5; + /* hacks ... 
*/ + if (platform) + prop = (u32 *)get_property(node, platform, NULL); + if (prop) { + if (prop[3] == 0x9 && prop[4] == 0x9) { + gp->active_val = 0xd; + gp->inactive_val = 0xc; + } + } + } + + DBG("(I) GPIO device %s found, offset: %x, active state: %d !\n", + device, gp->addr, gp->active_state); return (node->n_intrs > 0) ? node->intrs[0].line : 0; } @@ -1018,6 +1087,7 @@ pmac_tumbler_t *mix = chip->mixer_data; if (mix->anded_reset) { + DBG("(I) codec anded reset !\n"); write_audio_gpio(&mix->hp_mute, 0); write_audio_gpio(&mix->amp_mute, 0); big_mdelay(200); @@ -1028,6 +1098,8 @@ write_audio_gpio(&mix->amp_mute, 0); big_mdelay(100); } else { + DBG("(I) codec normal reset !\n"); + write_audio_gpio(&mix->audio_reset, 0); big_mdelay(200); write_audio_gpio(&mix->audio_reset, 1); @@ -1045,6 +1117,8 @@ if (mix->headphone_irq >= 0) disable_irq(mix->headphone_irq); + if (mix->lineout_irq >= 0) + disable_irq(mix->lineout_irq); mix->save_master_switch[0] = mix->master_switch[0]; mix->save_master_switch[1] = mix->master_switch[1]; mix->master_switch[0] = mix->master_switch[1] = 0; @@ -1099,41 +1173,59 @@ chip->update_automute(chip, 0); if (mix->headphone_irq >= 0) enable_irq(mix->headphone_irq); + if (mix->lineout_irq >= 0) + enable_irq(mix->lineout_irq); } #endif /* initialize tumbler */ static int __init tumbler_init(pmac_t *chip) { - int irq, err; + int irq; pmac_tumbler_t *mix = chip->mixer_data; snd_assert(mix, return -EINVAL); - if (tumbler_find_device("audio-hw-reset", &mix->audio_reset, 0) < 0) - tumbler_find_device("hw-reset", &mix->audio_reset, 1); - if (tumbler_find_device("amp-mute", &mix->amp_mute, 0) < 0) - tumbler_find_device("amp-mute", &mix->amp_mute, 1); - if (tumbler_find_device("headphone-mute", &mix->hp_mute, 0) < 0) - tumbler_find_device("headphone-mute", &mix->hp_mute, 1); - irq = tumbler_find_device("headphone-detect", &mix->hp_detect, 0); + if (tumbler_find_device("audio-hw-reset", + "platform-do-hw-reset", + &mix->audio_reset, 0) < 0) + 
tumbler_find_device("hw-reset", + "platform-do-hw-reset", + &mix->audio_reset, 1); + if (tumbler_find_device("amp-mute", + "platform-do-amp-mute", + &mix->amp_mute, 0) < 0) + tumbler_find_device("amp-mute", + "platform-do-amp-mute", + &mix->amp_mute, 1); + if (tumbler_find_device("headphone-mute", + "platform-do-headphone-mute", + &mix->hp_mute, 0) < 0) + tumbler_find_device("headphone-mute", + "platform-do-headphone-mute", + &mix->hp_mute, 1); + if (tumbler_find_device("line-output-mute", + "platform-do-lineout-mute", + &mix->line_mute, 0) < 0) + tumbler_find_device("line-output-mute", + "platform-do-lineout-mute", + &mix->line_mute, 1); + irq = tumbler_find_device("headphone-detect", + NULL, &mix->hp_detect, 0); if (irq < 0) - irq = tumbler_find_device("headphone-detect", &mix->hp_detect, 1); + irq = tumbler_find_device("headphone-detect", + NULL, &mix->hp_detect, 1); if (irq < 0) - irq = tumbler_find_device("keywest-gpio15", &mix->hp_detect, 1); - - tumbler_reset_audio(chip); - - /* activate headphone status interrupts */ - if (irq >= 0) { - unsigned char val; - if ((err = request_irq(irq, headphone_intr, 0, - "Tumbler Headphone Detection", chip)) < 0) - return err; - /* activate headphone status interrupts */ - val = do_gpio_read(&mix->hp_detect); - do_gpio_write(&mix->hp_detect, val | 0x80); - } + irq = tumbler_find_device("keywest-gpio15", + NULL, &mix->hp_detect, 1); mix->headphone_irq = irq; + irq = tumbler_find_device("line-output-detect", + NULL, &mix->line_detect, 0); + if (irq < 0) + irq = tumbler_find_device("line-output-detect", + NULL, &mix->line_detect, 1); + mix->lineout_irq = irq; + + tumbler_reset_audio(chip); return 0; } @@ -1146,6 +1238,8 @@ if (mix->headphone_irq >= 0) free_irq(mix->headphone_irq, chip); + if (mix->lineout_irq >= 0) + free_irq(mix->lineout_irq, chip); tumbler_gpio_free(&mix->audio_reset); tumbler_gpio_free(&mix->amp_mute); tumbler_gpio_free(&mix->hp_mute); @@ -1207,6 +1301,8 @@ else mix->i2c.addr = TAS_I2C_ADDR; + DBG("(I) TAS 
i2c address is: %x\n", mix->i2c.addr); + if (chip->model == PMAC_TUMBLER) { mix->i2c.init_client = tumbler_init_client; mix->i2c.name = "TAS3001c"; @@ -1242,6 +1338,11 @@ chip->speaker_sw_ctl = snd_ctl_new1(&tumbler_speaker_sw, chip); if ((err = snd_ctl_add(chip->card, chip->speaker_sw_ctl)) < 0) return err; + if (mix->line_mute.addr != 0) { + chip->lineout_sw_ctl = snd_ctl_new1(&tumbler_lineout_sw, chip); + if ((err = snd_ctl_add(chip->card, chip->lineout_sw_ctl)) < 0) + return err; + } chip->drc_sw_ctl = snd_ctl_new1(&tumbler_drc_sw, chip); if ((err = snd_ctl_add(chip->card, chip->drc_sw_ctl)) < 0) return err; @@ -1254,11 +1355,32 @@ INIT_WORK(&device_change, device_change_handler, (void *)chip); #ifdef PMAC_SUPPORT_AUTOMUTE - if (mix->headphone_irq >=0 && (err = snd_pmac_add_automute(chip)) < 0) + if ((mix->headphone_irq >=0 || mix->lineout_irq >= 0) + && (err = snd_pmac_add_automute(chip)) < 0) return err; chip->detect_headphone = tumbler_detect_headphone; chip->update_automute = tumbler_update_automute; tumbler_update_automute(chip, 0); /* update the status only */ + + /* activate headphone status interrupts */ + if (mix->headphone_irq >= 0) { + unsigned char val; + if ((err = request_irq(mix->headphone_irq, headphone_intr, 0, + "Sound Headphone Detection", chip)) < 0) + return 0; + /* activate headphone status interrupts */ + val = do_gpio_read(&mix->hp_detect); + do_gpio_write(&mix->hp_detect, val | 0x80); + } + if (mix->lineout_irq >= 0) { + unsigned char val; + if ((err = request_irq(mix->lineout_irq, headphone_intr, 0, + "Sound Lineout Detection", chip)) < 0) + return 0; + /* activate headphone status interrupts */ + val = do_gpio_read(&mix->line_detect); + do_gpio_write(&mix->line_detect, val | 0x80); + } #endif return 0; Index: linux-work/arch/ppc64/kernel/pmac_feature.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/pmac_feature.c 2005-04-05 17:46:03.000000000 +1000 +++ 
linux-work/arch/ppc64/kernel/pmac_feature.c 2005-04-11 17:45:09.000000000 +1000 @@ -64,8 +64,7 @@ */ struct macio_chip macio_chips[MAX_MACIO_CHIPS] __pmacdata; -struct macio_chip* __pmac -macio_find(struct device_node* child, int type) +struct macio_chip* __pmac macio_find(struct device_node* child, int type) { while(child) { int i; @@ -78,6 +77,7 @@ } return NULL; } +EXPORT_SYMBOL_GPL(macio_find); static const char* macio_names[] __pmacdata = { @@ -250,6 +250,30 @@ return 0; } +static long __pmac g5_i2s_enable(struct device_node *node, long param, long value) +{ + /* Very crude implementation for now */ + struct macio_chip* macio = &macio_chips[0]; + unsigned long flags; + + if (value == 0) + return 0; /* don't disable yet */ + + LOCK(flags); + MACIO_BIS(KEYLARGO_FCR3, KL3_CLK45_ENABLE | KL3_CLK49_ENABLE | + KL3_I2S0_CLK18_ENABLE); + udelay(10); + MACIO_BIS(KEYLARGO_FCR1, K2_FCR1_I2S0_CELL_ENABLE | + K2_FCR1_I2S0_CLK_ENABLE_BIT | K2_FCR1_I2S0_ENABLE); + udelay(10); + MACIO_BIC(KEYLARGO_FCR1, K2_FCR1_I2S0_RESET); + UNLOCK(flags); + udelay(10); + + return 0; +} + + #ifdef CONFIG_SMP static long __pmac g5_reset_cpu(struct device_node* node, long param, long value) { @@ -337,6 +361,7 @@ { PMAC_FTR_READ_GPIO, g5_read_gpio }, { PMAC_FTR_WRITE_GPIO, g5_write_gpio }, { PMAC_FTR_GMAC_PHY_RESET, g5_eth_phy_reset }, + { PMAC_FTR_SOUND_CHIP_ENABLE, g5_i2s_enable }, #ifdef CONFIG_SMP { PMAC_FTR_RESET_CPU, g5_reset_cpu }, #endif /* CONFIG_SMP */ Index: linux-work/arch/ppc/platforms/pmac_feature.c =================================================================== --- linux-work.orig/arch/ppc/platforms/pmac_feature.c 2005-04-05 17:46:03.000000000 +1000 +++ linux-work/arch/ppc/platforms/pmac_feature.c 2005-04-11 17:45:09.000000000 +1000 @@ -74,8 +74,7 @@ */ struct macio_chip macio_chips[MAX_MACIO_CHIPS] __pmacdata; -struct macio_chip* __pmac -macio_find(struct device_node* child, int type) +struct macio_chip* __pmac macio_find(struct device_node* child, int type) { while(child) { 
int i; @@ -88,6 +87,7 @@ } return NULL; } +EXPORT_SYMBOL_GPL(macio_find); static const char* macio_names[] __pmacdata = { Index: linux-work/include/asm-ppc/dbdma.h =================================================================== --- linux-work.orig/include/asm-ppc/dbdma.h 2005-03-15 11:59:38.000000000 +1100 +++ linux-work/include/asm-ppc/dbdma.h 2005-04-11 17:45:09.000000000 +1000 @@ -88,7 +88,7 @@ #define WAIT_ALWAYS 3 /* always wait */ /* Align an address for a DBDMA command structure */ -#define DBDMA_ALIGN(x) (((unsigned)(x) + sizeof(struct dbdma_cmd) - 1) \ +#define DBDMA_ALIGN(x) (((unsigned long)(x) + sizeof(struct dbdma_cmd) - 1) \ & -sizeof(struct dbdma_cmd)) /* Useful macros */ Index: linux-work/sound/ppc/beep.c =================================================================== --- linux-work.orig/sound/ppc/beep.c 2005-03-15 12:00:38.000000000 +1100 +++ linux-work/sound/ppc/beep.c 2005-04-11 17:45:09.000000000 +1000 @@ -24,6 +24,8 @@ #include #include #include +#include +#include #include #include #include "pmac.h" @@ -35,7 +37,7 @@ int hz; int nsamples; short *buf; /* allocated wave buffer */ - unsigned long addr; /* physical address of buffer */ + dma_addr_t addr; /* physical address of buffer */ struct input_dev dev; }; @@ -217,12 +219,8 @@ return -ENOMEM; memset(beep, 0, sizeof(*beep)); - beep->buf = (short *) kmalloc(BEEP_BUFLEN * 4, GFP_KERNEL); - if (! 
beep->buf) { - kfree(beep); - return -ENOMEM; - } - beep->addr = virt_to_bus(beep->buf); + beep->buf = dma_alloc_coherent(&chip->pdev->dev, BEEP_BUFLEN * 4, + &beep->addr, GFP_KERNEL); beep->dev.evbit[0] = BIT(EV_SND); beep->dev.sndbit[0] = BIT(SND_BELL) | BIT(SND_TONE); @@ -255,7 +253,8 @@ { if (chip->beep) { input_unregister_device(&chip->beep->dev); - kfree(chip->beep->buf); + dma_free_coherent(&chip->pdev->dev, BEEP_BUFLEN * 4, + chip->beep->buf, chip->beep->addr); kfree(chip->beep); chip->beep = NULL; } Index: linux-work/include/asm-ppc/keylargo.h =================================================================== --- linux-work.orig/include/asm-ppc/keylargo.h 2005-03-15 11:59:39.000000000 +1100 +++ linux-work/include/asm-ppc/keylargo.h 2005-04-11 17:45:09.000000000 +1000 @@ -228,6 +228,11 @@ #define K2_FCR1_PCI1_BUS_RESET_N 0x00000010 #define K2_FCR1_PCI1_SLEEP_RESET_EN 0x00000020 +#define K2_FCR1_I2S0_CELL_ENABLE 0x00000400 +#define K2_FCR1_I2S0_RESET 0x00000800 +#define K2_FCR1_I2S0_CLK_ENABLE_BIT 0x00001000 +#define K2_FCR1_I2S0_ENABLE 0x00002000 + #define K2_FCR1_PCI1_CLK_ENABLE 0x00004000 #define K2_FCR1_FW_CLK_ENABLE 0x00008000 #define K2_FCR1_FW_RESET_N 0x00010000 From tlnguyen at snoqualmie.dp.intel.com Tue Apr 12 04:25:47 2005 From: tlnguyen at snoqualmie.dp.intel.com (long) Date: Mon, 11 Apr 2005 11:25:47 -0700 Subject: PCI Error Recovery API Proposal (updated) Message-ID: <200504111825.j3BIPlVW015708@snoqualmie.dp.intel.com> On Friday, April 08, 2005 4:53 PM Benjamin Herrenschmidt wrote: > > The port driver is a PCI device driver, which supports PCI Express > > features. Each feature has its own service driver, which is not > > based on PCI Driver Model. These service drivers should be informed > > of what is going on in the hierarchy where fatal error occurs as > > well as what error recovery action takes place. 
Therefore, in my view, > > the port driver should be part of error recovery process, which is > > based on a native SW solution, not a FW policy. I hope you understand > > my concerns and rewrite the definition of this callback usage. > >Well, while I agree that the port driver is part of the process, it >doesn't have to adhere to the defined API as far as reset of it's >downlink link is concerned. Please understand that this API is aimed >toward "generic" facility for normal device drivers. The port driver is >a specific of the PCI Express implementation, and while it takes an >active role in the process, the generic API shouldn't be modified to >take into account specific needs of a port driver. Instead, the AER >should have it's own private API to the port driver to implement the >recovery process. Consider a system without FW support for link/bus reset and is using a full native OS solution. In this situation a port bus driver or a PCI bridge driver would need to execute the bus/link reset when we get an error from a device. Therefore an interface is required to request a bus/link reset from a bus driver. Specifically, in PCI Express the port bus driver owns execution of a downstream link reset for a link attached to a root port or downstream switch port. The port bus driver can perform a link reset by setting the SBR bit in the configuration registers. This results in a down stream hot-reset. Is there some method the current interface supplies to support a full native OS solution that I am not seeing? The error handling interfaces, in my view, should be flexible enough to support a full native OS solution with no FW assistance. Thanks, Long From juhl-lkml at dif.dk Tue Apr 12 06:36:39 2005 From: juhl-lkml at dif.dk (Jesper Juhl) Date: Mon, 11 Apr 2005 22:36:39 +0200 (CEST) Subject: [PATCH] redundant NULL checks before kfree should go away... Message-ID: (keeping me on CC when replying will be appreciated, thanks) kfree() checks for NULL pointers. 
Checking prior to calling it is redundant. This patch removes such redundant checks from arch/ppc64/ Signed-off-by: Jesper Juhl diff -upr linux-2.6.12-rc2-mm3-orig/arch/ppc64/kernel/lparcfg.c linux-2.6.12-rc2-mm3/arch/ppc64/kernel/lparcfg.c --- linux-2.6.12-rc2-mm3-orig/arch/ppc64/kernel/lparcfg.c 2005-04-05 21:21:14.000000000 +0200 +++ linux-2.6.12-rc2-mm3/arch/ppc64/kernel/lparcfg.c 2005-04-11 22:23:37.000000000 +0200 @@ -597,9 +597,7 @@ int __init lparcfg_init(void) void __exit lparcfg_cleanup(void) { if (proc_ppc64_lparcfg) { - if (proc_ppc64_lparcfg->data) { - kfree(proc_ppc64_lparcfg->data); - } + kfree(proc_ppc64_lparcfg->data); remove_proc_entry("lparcfg", proc_ppc64_lparcfg->parent); } } diff -upr linux-2.6.12-rc2-mm3-orig/arch/ppc64/kernel/pSeries_reconfig.c linux-2.6.12-rc2-mm3/arch/ppc64/kernel/pSeries_reconfig.c --- linux-2.6.12-rc2-mm3-orig/arch/ppc64/kernel/pSeries_reconfig.c 2005-04-05 21:21:14.000000000 +0200 +++ linux-2.6.12-rc2-mm3/arch/ppc64/kernel/pSeries_reconfig.c 2005-04-11 22:25:01.000000000 +0200 @@ -294,10 +294,8 @@ static struct property *new_property(con return new; cleanup: - if (new->name) - kfree(new->name); - if (new->value) - kfree(new->value); + kfree(new->name); + kfree(new->value); kfree(new); return NULL; } diff -upr linux-2.6.12-rc2-mm3-orig/arch/ppc64/kernel/rtas_flash.c linux-2.6.12-rc2-mm3/arch/ppc64/kernel/rtas_flash.c --- linux-2.6.12-rc2-mm3-orig/arch/ppc64/kernel/rtas_flash.c 2005-04-05 21:21:14.000000000 +0200 +++ linux-2.6.12-rc2-mm3/arch/ppc64/kernel/rtas_flash.c 2005-04-11 22:25:49.000000000 +0200 @@ -565,8 +565,7 @@ static int validate_flash_release(struct static void remove_flash_pde(struct proc_dir_entry *dp) { if (dp) { - if (dp->data != NULL) - kfree(dp->data); + kfree(dp->data); dp->owner = NULL; remove_proc_entry(dp->name, dp->parent); } diff -upr linux-2.6.12-rc2-mm3-orig/arch/ppc64/kernel/scanlog.c linux-2.6.12-rc2-mm3/arch/ppc64/kernel/scanlog.c --- linux-2.6.12-rc2-mm3-orig/arch/ppc64/kernel/scanlog.c
2005-04-05 21:21:14.000000000 +0200 +++ linux-2.6.12-rc2-mm3/arch/ppc64/kernel/scanlog.c 2005-04-11 22:26:11.000000000 +0200 @@ -234,8 +234,7 @@ int __init scanlog_init(void) void __exit scanlog_cleanup(void) { if (proc_ppc64_scan_log_dump) { - if (proc_ppc64_scan_log_dump->data) - kfree(proc_ppc64_scan_log_dump->data); + kfree(proc_ppc64_scan_log_dump->data); remove_proc_entry("scan-log-dump", proc_ppc64_scan_log_dump->parent); } } From benh at kernel.crashing.org Tue Apr 12 09:56:40 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 12 Apr 2005 09:56:40 +1000 Subject: PCI Error Recovery API Proposal (updated) In-Reply-To: <200504111825.j3BIPlVW015708@snoqualmie.dp.intel.com> References: <200504111825.j3BIPlVW015708@snoqualmie.dp.intel.com> Message-ID: <1113263800.5388.9.camel@gaston> On Mon, 2005-04-11 at 11:25 -0700, long wrote: > Consider a system without FW support for link/bus reset and is using a > full native OS solution. In this situation a port bus driver or a PCI > bridge driver would need to execute the bus/link reset when we get an > error from a device. Therefore an interface is required to request a > bus/link reset from a bus driver. Specifically, in PCI Express the > port bus driver owns execution of a downstream link reset for a link > attached to a root port or downstream switch port. The port bus driver > can perform a link reset by setting the SBR bit in the configuration > registers. This results in a down stream hot-reset. > > Is there some method the current interface supplies to support a full > native OS solution that I am not seeing? The error handling interfaces, > in my view, should be flexible enough to support a full native OS > solution with no FW assistance. Oh, and I'm not saying the contrary. But again, the API I propose defines the various steps that "happen" to a device. Asking a port driver to do a link reset doesn't fit in that category.
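The layering argued for here can be sketched in a few lines of C. This is only an illustration under assumptions: every name below (`leaf_error_ops`, `port_private_ops`, `recover()`, `NEED_RESET` and the demo stubs) is hypothetical and is not the API from the proposal. Leaf drivers only receive notifications of what happens to their device and may answer that they need a reset; the link reset itself sits behind a private hook that only the platform's recovery core calls (firmware on ppc64, or, in a native PCI Express core, a port driver setting the Secondary Bus Reset bit as Long describes).

```c
#include <stddef.h>

/* Hypothetical result codes a leaf driver can return to the core. */
enum recovery_result { RECOVERED, NEED_RESET, DISCONNECT };

/* Generic leaf-driver API (hypothetical): pure notifications, plus the
 * ability to ask for a reset. No way to perform a reset is exposed here. */
struct leaf_error_ops {
	enum recovery_result (*error_detected)(void *dev);
	enum recovery_result (*reset_done)(void *dev); /* "a reset happened" */
	void (*resume)(void *dev);
};

/* Private, implementation-specific hook (hypothetical). On ppc64 it could
 * call firmware; a native PCI Express core could have the port driver set
 * the Secondary Bus Reset bit. Leaf drivers never call it directly. */
struct port_private_ops {
	int (*link_reset)(void *port);
};

/* The recovery core sequences the steps; only it talks to the port. */
static enum recovery_result recover(const struct leaf_error_ops *leaf, void *dev,
				    const struct port_private_ops *port, void *port_dev)
{
	enum recovery_result r = leaf->error_detected(dev);

	if (r == NEED_RESET) {
		if (port->link_reset(port_dev) != 0)
			return DISCONNECT;
		r = leaf->reset_done(dev); /* notify the driver after the fact */
	}
	if (r == RECOVERED)
		leaf->resume(dev);
	return r;
}

/* Demo stubs: a leaf that asks for a reset and a port that counts resets. */
static int demo_resets, demo_resumes;

static enum recovery_result demo_detected(void *dev) { (void)dev; return NEED_RESET; }
static enum recovery_result demo_reset_done(void *dev) { (void)dev; return RECOVERED; }
static void demo_resume(void *dev) { (void)dev; demo_resumes++; }
static int demo_link_reset(void *port) { (void)port; demo_resets++; return 0; }

static const struct leaf_error_ops demo_leaf = {
	.error_detected = demo_detected,
	.reset_done = demo_reset_done,
	.resume = demo_resume,
};
static const struct port_private_ops demo_port = { .link_reset = demo_link_reset };
```

With these stubs, `recover(&demo_leaf, NULL, &demo_port, NULL)` performs one link reset through the private hook and then resumes the leaf driver, without the leaf driver ever knowing how the reset was carried out.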
In the same way, the API doesn't define a way to ask the port driver or whatever to do a full card reset. It only defines a way for a card to request it, and to be notified that it happened. The actual code doing things like card resets, slot isolation, link reset, etc., is part of the implementation-specific core. The ppc64 one will use the firmware; I expect your "generic" PCI Express implementation to do it differently. In your case, you have already defined a specific model for port drivers, to which a function can be added to request a link reset; your PCI Express error recovery port will then call it. I would appreciate it if other people following this discussion could give their opinion here, just to make sure I'm not following the wrong track. But it seems to me that issuing the request for a link reset doesn't fit in the leaf driver recovery API, but rather in the private API that you will define between the PCI Express error recovery core and the PCI Express port drivers... Ben. From jk at blackdown.de Tue Apr 12 11:36:48 2005 From: jk at blackdown.de (Juergen Kreileder) Date: Tue, 12 Apr 2005 03:36:48 +0200 Subject: PER_LINUX32 fixes Message-ID: <87ll7ord1r.fsf@blackdown.de> A non-text attachment was scrubbed... Name: per_linux-2.6.12-rc2-mm3.patch Type: text/x-patch Size: 4016 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050412/0862808d/attachment.bin From benh at kernel.crashing.org Tue Apr 12 15:07:16 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 12 Apr 2005 15:07:16 +1000 Subject: [PATCH] ppc64: very basic desktop g5 sound support (#2) Message-ID: <1113282436.21548.42.camel@gaston> Hi! (Andrew: this is an update of the previous patch; it fixes a problem with the headphone being incorrectly muted on some models.) This patch hacks the current PowerMac ALSA driver to add some basic support for analog sound output to some desktop G5s.
It has severe limitations, though:

 - Only 44.1 kHz, 16-bit output.
 - Only works on G5 models using a TAS3004 analog codec, that is, early single-CPU desktops and all dual-CPU desktops to date, but none of the more recent ones like the iMac G5.
 - Analog only: no digital/SPDIF support at all, and no native AC3 support.

Better support would require a complete rewrite of the driver (which I am working on, but don't hold your breath) to properly support the diversity of Apple sound hardware setups, including dual codecs, several i2s busses, all the new codecs used in the new machines, proper clock switching with digital output, etc.

This patch applies on top of the other PowerMac sound patches I posted in the past couple of days (new PowerBook support and sleep fixes).

Note: This is a FAQ entry for PowerMac sound support with TI codecs. They have a feature called "DRC", which is automatically enabled for the internal speaker (at least when auto-mute control is enabled) and which will cause your sound to fade out to nothing after half a second of playback if you don't set a proper "DRC Range" in the mixer. So if you have a problem like that, check alsamixer and raise your DRC Range to something reasonable.

Note 2: This patch also adds auto-muting of the speaker when the line-out jack is used on some earlier desktop G4s (and on the G5), in addition to the headphone jack. If that behaviour isn't what you want, just disable auto-muting and use the manual mute controls in alsamixer.
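The auto-mute behaviour described in the second note can be modelled as a small state update. This is a simplified sketch, assuming a hypothetical `struct outputs` in place of the driver's `pmac_tumbler_t`; it mirrors the branch structure of the patch's `tumbler_update_automute()` but does no GPIO access, and `has_line_mute` stands in for the patch's `mix->line_mute.addr != 0` check.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the outputs touched by the patch's
 * tumbler_update_automute(); these are not the driver's real types. */
struct outputs {
	bool amp_muted;     /* internal speaker (amp_mute GPIO) */
	bool hp_muted;      /* headphone output (hp_mute GPIO) */
	bool line_muted;    /* line output (line_mute GPIO) */
	bool has_line_mute; /* models mix->line_mute.addr != 0 */
	bool drc_enable;    /* DRC only makes sense for the speaker */
};

/* Apply one jack-sense event. If any jack is detected, the detected
 * outputs are unmuted, the speaker is muted and DRC is switched off; an
 * output whose jack is not detected is left as it was in that branch.
 * With no jack detected, the speaker is unmuted, the jack outputs are
 * muted and DRC is switched back on. */
static void automute_update(struct outputs *o, bool headphone, bool lineout)
{
	if (headphone || lineout) {
		if (headphone)
			o->hp_muted = false;
		if (lineout && o->has_line_mute)
			o->line_muted = false;
		o->amp_muted = true;
		o->drc_enable = false;
	} else {
		o->amp_muted = false;
		o->hp_muted = true;
		if (o->has_line_mute)
			o->line_muted = true;
		o->drc_enable = true;
	}
}
```

For example, plugging in only the line-out jack mutes the speaker and disables DRC but leaves the headphone output in whatever state it was already in, which is what the `if (headphone)` guard in the patch does.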
Signed-off-by: Benjamin Herrenschmidt Index: linux-work/sound/ppc/pmac.c =================================================================== --- linux-work.orig/sound/ppc/pmac.c 2005-04-11 22:09:18.000000000 +1000 +++ linux-work/sound/ppc/pmac.c 2005-04-11 22:27:37.000000000 +1000 @@ -27,14 +27,13 @@ #include #include #include +#include +#include #include #include "pmac.h" #include -#ifdef CONFIG_PPC_HAS_FEATURE_CALLS #include -#else -#include -#endif +#include #if defined(CONFIG_PM) && defined(CONFIG_PMAC_PBOOK) @@ -57,22 +56,29 @@ /* * allocate DBDMA command arrays */ -static int snd_pmac_dbdma_alloc(pmac_dbdma_t *rec, int size) +static int snd_pmac_dbdma_alloc(pmac_t *chip, pmac_dbdma_t *rec, int size) { - rec->space = kmalloc(sizeof(struct dbdma_cmd) * (size + 1), GFP_KERNEL); + unsigned int rsize = sizeof(struct dbdma_cmd) * (size + 1); + + rec->space = dma_alloc_coherent(&chip->pdev->dev, rsize, + &rec->dma_base, GFP_KERNEL); if (rec->space == NULL) return -ENOMEM; rec->size = size; - memset(rec->space, 0, sizeof(struct dbdma_cmd) * (size + 1)); + memset(rec->space, 0, rsize); rec->cmds = (void __iomem *)DBDMA_ALIGN(rec->space); - rec->addr = virt_to_bus(rec->cmds); + rec->addr = rec->dma_base + (unsigned long)((char *)rec->cmds - (char *)rec->space); + return 0; } -static void snd_pmac_dbdma_free(pmac_dbdma_t *rec) +static void snd_pmac_dbdma_free(pmac_t *chip, pmac_dbdma_t *rec) { - if (rec) - kfree(rec->space); + if (rec) { + unsigned int rsize = sizeof(struct dbdma_cmd) * (rec->size + 1); + + dma_free_coherent(&chip->pdev->dev, rsize, rec->space, rec->dma_base); + } } @@ -237,7 +243,7 @@ /* continuous DMA memory type doesn't provide the physical address, * so we need to resolve the address here... 
*/ - offset = virt_to_bus(runtime->dma_area); + offset = runtime->dma_addr; for (i = 0, cp = rec->cmd.cmds; i < rec->nperiods; i++, cp++) { st_le32(&cp->phy_addr, offset); st_le16(&cp->req_count, rec->period_size); @@ -664,8 +670,8 @@ chip->capture.cur_freqs = chip->freqs_ok; /* preallocate 64k buffer */ - snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_CONTINUOUS, - snd_dma_continuous_data(GFP_KERNEL), + snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV, + &chip->pdev->dev, 64 * 1024, 64 * 1024); return 0; @@ -757,28 +763,10 @@ /* * a wrapper to feature call for compatibility */ -#if defined(CONFIG_PM) && defined(CONFIG_PMAC_PBOOK) static void snd_pmac_sound_feature(pmac_t *chip, int enable) { -#ifdef CONFIG_PPC_HAS_FEATURE_CALLS ppc_md.feature_call(PMAC_FTR_SOUND_CHIP_ENABLE, chip->node, 0, enable); -#else - if (chip->is_pbook_G3) { - pmu_suspend(); - feature_clear(chip->node, FEATURE_Sound_power); - feature_clear(chip->node, FEATURE_Sound_CLK_enable); - big_mdelay(1000); /* XXX */ - pmu_resume(); - } - if (chip->is_pbook_3400) { - feature_set(chip->node, FEATURE_IOBUS_enable); - udelay(10); - } -#endif } -#else /* CONFIG_PM && CONFIG_PMAC_PBOOK */ -#define snd_pmac_sound_feature(chip,enable) /**/ -#endif /* CONFIG_PM && CONFIG_PMAC_PBOOK */ /* * release resources @@ -786,8 +774,6 @@ static int snd_pmac_free(pmac_t *chip) { - int i; - /* stop sounds */ if (chip->initialized) { snd_pmac_dbdma_reset(chip); @@ -813,9 +799,9 @@ free_irq(chip->tx_irq, (void*)chip); if (chip->rx_irq >= 0) free_irq(chip->rx_irq, (void*)chip); - snd_pmac_dbdma_free(&chip->playback.cmd); - snd_pmac_dbdma_free(&chip->capture.cmd); - snd_pmac_dbdma_free(&chip->extra_dma); + snd_pmac_dbdma_free(chip, &chip->playback.cmd); + snd_pmac_dbdma_free(chip, &chip->capture.cmd); + snd_pmac_dbdma_free(chip, &chip->extra_dma); if (chip->macio_base) iounmap(chip->macio_base); if (chip->latch_base) @@ -826,12 +812,23 @@ iounmap(chip->playback.dma); if (chip->capture.dma) 
iounmap(chip->capture.dma); +#ifndef CONFIG_PPC64 if (chip->node) { + int i; + for (i = 0; i < 3; i++) { - if (chip->of_requested & (1 << i)) - release_OF_resource(chip->node, i); + if (chip->of_requested & (1 << i)) { + if (chip->is_k2) + release_OF_resource(chip->node->parent, + i); + else + release_OF_resource(chip->node, i); + } } } +#endif /* CONFIG_PPC64 */ + if (chip->pdev) + pci_dev_put(chip->pdev); kfree(chip); return 0; } @@ -881,6 +878,8 @@ { struct device_node *sound; unsigned int *prop, l; + struct macio_chip* macio; + u32 layout_id = 0; if (_machine != _MACH_Pmac) @@ -918,10 +917,17 @@ * if we didn't find a davbus device, try 'i2s-a' since * this seems to be what iBooks have */ - if (! chip->node) + if (! chip->node) { chip->node = find_devices("i2s-a"); + if (chip->node && chip->node->parent && chip->node->parent->parent) { + if (device_is_compatible(chip->node->parent->parent, + "K2-Keylargo")) + chip->is_k2 = 1; + } + } if (! chip->node) return -ENODEV; + sound = find_devices("sound"); while (sound && sound->parent != chip->node) sound = sound->next; @@ -966,7 +972,8 @@ chip->control_mask = MASK_IEPC | 0x11; /* disable IEE */ } if (device_is_compatible(sound, "AOAKeylargo") || - device_is_compatible(sound, "AOAbase")) { + device_is_compatible(sound, "AOAbase") || + device_is_compatible(sound, "AOAK2")) { /* For now, only support very basic TAS3004 based machines with * single frequency until proper i2s control is implemented */ @@ -975,6 +982,7 @@ case 0x46: case 0x33: case 0x29: + case 0x24: chip->num_freqs = ARRAY_SIZE(tumbler_freqs); chip->model = PMAC_SNAPPER; chip->can_byte_swap = 0; /* FIXME: check this */ @@ -987,6 +995,26 @@ chip->device_id = *prop; chip->has_iic = (find_devices("perch") != NULL); + /* We need the PCI device for DMA allocations, let's use a crude method + * for now ... 
+ */ + macio = macio_find(chip->node, macio_unknown); + if (macio == NULL) + printk(KERN_WARNING "snd-powermac: can't locate macio !\n"); + else { + struct pci_dev *pdev = NULL; + + for_each_pci_dev(pdev) { + struct device_node *np = pci_device_to_OF_node(pdev); + if (np && np == macio->of_node) { + chip->pdev = pdev; + break; + } + } + } + if (chip->pdev == NULL) + printk(KERN_WARNING "snd-powermac: can't locate macio PCI device !\n"); + detect_byte_swap(chip); /* look for a property saying what sample rates @@ -1091,8 +1119,10 @@ int err; chip->auto_mute = 1; err = snd_ctl_add(chip->card, snd_ctl_new1(&auto_mute_controls[0], chip)); - if (err < 0) + if (err < 0) { + printk(KERN_ERR "snd-powermac: Failed to add automute control\n"); return err; + } chip->hp_detect_ctl = snd_ctl_new1(&auto_mute_controls[1], chip); return snd_ctl_add(chip->card, chip->hp_detect_ctl); } @@ -1106,6 +1136,7 @@ pmac_t *chip; struct device_node *np; int i, err; + unsigned long ctrl_addr, txdma_addr, rxdma_addr; static snd_device_ops_t ops = { .dev_free = snd_pmac_dev_free, }; @@ -1127,32 +1158,59 @@ if ((err = snd_pmac_detect(chip)) < 0) goto __error; - if (snd_pmac_dbdma_alloc(&chip->playback.cmd, PMAC_MAX_FRAGS + 1) < 0 || - snd_pmac_dbdma_alloc(&chip->capture.cmd, PMAC_MAX_FRAGS + 1) < 0 || - snd_pmac_dbdma_alloc(&chip->extra_dma, 2) < 0) { + if (snd_pmac_dbdma_alloc(chip, &chip->playback.cmd, PMAC_MAX_FRAGS + 1) < 0 || + snd_pmac_dbdma_alloc(chip, &chip->capture.cmd, PMAC_MAX_FRAGS + 1) < 0 || + snd_pmac_dbdma_alloc(chip, &chip->extra_dma, 2) < 0) { err = -ENOMEM; goto __error; } np = chip->node; - if (np->n_addrs < 3 || np->n_intrs < 3) { - err = -ENODEV; - goto __error; - } + if (chip->is_k2) { + if (np->parent->n_addrs < 2 || np->n_intrs < 3) { + err = -ENODEV; + goto __error; + } + for (i = 0; i < 2; i++) { +#ifndef CONFIG_PPC64 + static char *name[2] = { "- Control", "- DMA" }; + if (! 
request_OF_resource(np->parent, i, name[i])) { + snd_printk(KERN_ERR "pmac: can't request resource %d!\n", i); + err = -ENODEV; + goto __error; + } + chip->of_requested |= (1 << i); +#endif /* CONFIG_PPC64 */ + ctrl_addr = np->parent->addrs[0].address; + txdma_addr = np->parent->addrs[1].address; + rxdma_addr = txdma_addr + 0x100; + } - for (i = 0; i < 3; i++) { - static char *name[3] = { NULL, "- Tx DMA", "- Rx DMA" }; - if (! request_OF_resource(np, i, name[i])) { - snd_printk(KERN_ERR "pmac: can't request resource %d!\n", i); + } else { + if (np->n_addrs < 3 || np->n_intrs < 3) { err = -ENODEV; goto __error; } - chip->of_requested |= (1 << i); + + for (i = 0; i < 3; i++) { +#ifndef CONFIG_PPC64 + static char *name[3] = { "- Control", "- Tx DMA", "- Rx DMA" }; + if (! request_OF_resource(np, i, name[i])) { + snd_printk(KERN_ERR "pmac: can't request resource %d!\n", i); + err = -ENODEV; + goto __error; + } + chip->of_requested |= (1 << i); +#endif /* CONFIG_PPC64 */ + ctrl_addr = np->addrs[0].address; + txdma_addr = np->addrs[1].address; + rxdma_addr = np->addrs[2].address; + } } - chip->awacs = ioremap(np->addrs[0].address, 0x1000); - chip->playback.dma = ioremap(np->addrs[1].address, 0x100); - chip->capture.dma = ioremap(np->addrs[2].address, 0x100); + chip->awacs = ioremap(ctrl_addr, 0x1000); + chip->playback.dma = ioremap(txdma_addr, 0x100); + chip->capture.dma = ioremap(rxdma_addr, 0x100); if (chip->model <= PMAC_BURGUNDY) { if (request_irq(np->intrs[0].line, snd_pmac_ctrl_intr, 0, "PMac", (void*)chip)) { @@ -1180,7 +1238,8 @@ snd_pmac_sound_feature(chip, 1); /* reset */ - out_le32(&chip->awacs->control, 0x11); + if (chip->model == PMAC_AWACS) + out_le32(&chip->awacs->control, 0x11); /* Powerbooks have odd ways of enabling inputs such as an expansion-bay CD or sound from an internal modem @@ -1232,6 +1291,8 @@ return 0; __error: + if (chip->pdev) + pci_dev_put(chip->pdev); snd_pmac_free(chip); return err; } Index: linux-work/sound/ppc/pmac.h 
=================================================================== --- linux-work.orig/sound/ppc/pmac.h 2005-04-11 22:09:18.000000000 +1000 +++ linux-work/sound/ppc/pmac.h 2005-04-11 22:27:37.000000000 +1000 @@ -60,7 +60,8 @@ * DBDMA space */ struct snd_pmac_dbdma { - unsigned long addr; + dma_addr_t dma_base; + dma_addr_t addr; struct dbdma_cmd __iomem *cmds; void *space; int size; @@ -101,6 +102,7 @@ /* h/w info */ struct device_node *node; + struct pci_dev *pdev; unsigned int revision; unsigned int manufacturer; unsigned int subframe; @@ -110,6 +112,7 @@ unsigned int has_iic : 1; unsigned int is_pbook_3400 : 1; unsigned int is_pbook_G3 : 1; + unsigned int is_k2 : 1; unsigned int can_byte_swap : 1; unsigned int can_duplex : 1; @@ -157,6 +160,7 @@ snd_kcontrol_t *speaker_sw_ctl; snd_kcontrol_t *drc_sw_ctl; /* only used for tumbler -ReneR */ snd_kcontrol_t *hp_detect_ctl; + snd_kcontrol_t *lineout_sw_ctl; /* lowlevel callbacks */ void (*set_format)(pmac_t *chip); Index: linux-work/sound/ppc/tumbler.c =================================================================== --- linux-work.orig/sound/ppc/tumbler.c 2005-04-11 22:14:45.000000000 +1000 +++ linux-work/sound/ppc/tumbler.c 2005-04-12 15:02:51.000000000 +1000 @@ -35,14 +35,19 @@ #include #include #include -#ifdef CONFIG_PPC_HAS_FEATURE_CALLS +#include #include -#else -#error old crap -#endif #include "pmac.h" #include "tumbler_volume.h" +#undef DEBUG + +#ifdef DEBUG +#define DBG(fmt...) printk(fmt) +#else +#define DBG(fmt...) 
+#endif + /* i2c address for tumbler */ #define TAS_I2C_ADDR 0x34 @@ -78,21 +83,22 @@ }; typedef struct pmac_gpio { -#ifdef CONFIG_PPC_HAS_FEATURE_CALLS unsigned int addr; -#else - void __iomem *addr; -#endif - int active_state; + u8 active_val; + u8 inactive_val; + u8 active_state; } pmac_gpio_t; typedef struct pmac_tumbler_t { pmac_keywest_t i2c; pmac_gpio_t audio_reset; pmac_gpio_t amp_mute; + pmac_gpio_t line_mute; + pmac_gpio_t line_detect; pmac_gpio_t hp_mute; pmac_gpio_t hp_detect; int headphone_irq; + int lineout_irq; unsigned int master_vol[2]; unsigned int save_master_switch[2]; unsigned int master_switch[2]; @@ -120,6 +126,7 @@ regs[0], regs[1]); if (err >= 0) break; + DBG("(W) i2c error %d\n", err); mdelay(10); } while (count--); if (err < 0) @@ -137,6 +144,7 @@ TAS_REG_MCS, (1<<6)|(2<<4)|(2<<2)|0, 0, /* terminator */ }; + DBG("(I) tumbler init client\n"); return send_init_client(i2c, regs); } @@ -151,36 +159,27 @@ TAS_REG_ACS, 0, 0, /* terminator */ }; + DBG("(I) snapper init client\n"); return send_init_client(i2c, regs); } /* * gpio access */ -#ifdef CONFIG_PPC_HAS_FEATURE_CALLS #define do_gpio_write(gp, val) \ pmac_call_feature(PMAC_FTR_WRITE_GPIO, NULL, (gp)->addr, val) #define do_gpio_read(gp) \ pmac_call_feature(PMAC_FTR_READ_GPIO, NULL, (gp)->addr, 0) #define tumbler_gpio_free(gp) /* NOP */ -#else -#define do_gpio_write(gp, val) writeb(val, (gp)->addr) -#define do_gpio_read(gp) readb((gp)->addr) -static inline void tumbler_gpio_free(pmac_gpio_t *gp) -{ - if (gp->addr) { - iounmap(gp->addr); - gp->addr = NULL; - } -} -#endif /* CONFIG_PPC_HAS_FEATURE_CALLS */ static void write_audio_gpio(pmac_gpio_t *gp, int active) { if (! gp->addr) return; - active = active ? gp->active_state : !gp->active_state; - do_gpio_write(gp, active ? 0x05 : 0x04); + active = active ? 
gp->active_val : gp->inactive_val; + + do_gpio_write(gp, active); + DBG("(I) gpio %x write %d\n", gp->addr, active); } static int read_audio_gpio(pmac_gpio_t *gp) @@ -663,7 +662,7 @@ * to avoid codec reset on ibook M7 */ -enum { TUMBLER_MUTE_HP, TUMBLER_MUTE_AMP }; +enum { TUMBLER_MUTE_HP, TUMBLER_MUTE_AMP, TUMBLER_MUTE_LINE }; static int tumbler_get_mute_switch(snd_kcontrol_t *kcontrol, snd_ctl_elem_value_t *ucontrol) { @@ -672,7 +671,18 @@ pmac_gpio_t *gp; if (! (mix = chip->mixer_data)) return -ENODEV; - gp = (kcontrol->private_value == TUMBLER_MUTE_HP) ? &mix->hp_mute : &mix->amp_mute; + switch(kcontrol->private_value) { + case TUMBLER_MUTE_HP: + gp = &mix->hp_mute; break; + case TUMBLER_MUTE_AMP: + gp = &mix->amp_mute; break; + case TUMBLER_MUTE_LINE: + gp = &mix->line_mute; break; + default: + gp = NULL; + } + if (gp == NULL) + return -EINVAL; ucontrol->value.integer.value[0] = ! read_audio_gpio(gp); return 0; } @@ -689,7 +699,18 @@ #endif if (! (mix = chip->mixer_data)) return -ENODEV; - gp = (kcontrol->private_value == TUMBLER_MUTE_HP) ? &mix->hp_mute : &mix->amp_mute; + switch(kcontrol->private_value) { + case TUMBLER_MUTE_HP: + gp = &mix->hp_mute; break; + case TUMBLER_MUTE_AMP: + gp = &mix->amp_mute; break; + case TUMBLER_MUTE_LINE: + gp = &mix->line_mute; break; + default: + gp = NULL; + } + if (gp == NULL) + return -EINVAL; val = ! read_audio_gpio(gp); if (val != ucontrol->value.integer.value[0]) { write_audio_gpio(gp, ! 
ucontrol->value.integer.value[0]); @@ -833,6 +854,14 @@ .put = tumbler_put_mute_switch, .private_value = TUMBLER_MUTE_AMP, }; +static snd_kcontrol_new_t tumbler_lineout_sw __initdata = { + .iface = SNDRV_CTL_ELEM_IFACE_MIXER, + .name = "Line Out Playback Switch", + .info = snd_pmac_boolean_mono_info, + .get = tumbler_get_mute_switch, + .put = tumbler_put_mute_switch, + .private_value = TUMBLER_MUTE_LINE, +}; static snd_kcontrol_new_t tumbler_drc_sw __initdata = { .iface = SNDRV_CTL_ELEM_IFACE_MIXER, .name = "DRC Switch", @@ -849,7 +878,21 @@ static int tumbler_detect_headphone(pmac_t *chip) { pmac_tumbler_t *mix = chip->mixer_data; - return read_audio_gpio(&mix->hp_detect); + int detect = 0; + + if (mix->hp_detect.addr) + detect |= read_audio_gpio(&mix->hp_detect); + return detect; +} + +static int tumbler_detect_lineout(pmac_t *chip) +{ + pmac_tumbler_t *mix = chip->mixer_data; + int detect = 0; + + if (mix->line_detect.addr) + detect |= read_audio_gpio(&mix->line_detect); + return detect; } static void check_mute(pmac_t *chip, pmac_gpio_t *gp, int val, int do_notify, snd_kcontrol_t *sw) @@ -868,6 +911,7 @@ { pmac_t *chip = (pmac_t*) self; pmac_tumbler_t *mix; + int headphone, lineout; if (!chip) return; @@ -875,23 +919,35 @@ mix = chip->mixer_data; snd_assert(mix, return); - if (tumbler_detect_headphone(chip)) { - /* mute speaker */ - check_mute(chip, &mix->hp_mute, 0, mix->auto_mute_notify, - chip->master_sw_ctl); + headphone = tumbler_detect_headphone(chip); + lineout = tumbler_detect_lineout(chip); + + DBG("headphone: %d, lineout: %d\n", headphone, lineout); + + if (headphone || lineout) { + /* unmute headphone/lineout & mute speaker */ + if (headphone) + check_mute(chip, &mix->hp_mute, 0, mix->auto_mute_notify, + chip->master_sw_ctl); + if (lineout && mix->line_mute.addr != 0) + check_mute(chip, &mix->line_mute, 0, mix->auto_mute_notify, + chip->lineout_sw_ctl); if (mix->anded_reset) big_mdelay(10); check_mute(chip, &mix->amp_mute, 1, mix->auto_mute_notify, 
chip->speaker_sw_ctl); mix->drc_enable = 0; } else { - /* unmute speaker */ + /* unmute speaker, mute others */ check_mute(chip, &mix->amp_mute, 0, mix->auto_mute_notify, chip->speaker_sw_ctl); if (mix->anded_reset) big_mdelay(10); check_mute(chip, &mix->hp_mute, 1, mix->auto_mute_notify, chip->master_sw_ctl); + if (mix->line_mute.addr != 0) + check_mute(chip, &mix->line_mute, 1, mix->auto_mute_notify, + chip->lineout_sw_ctl); mix->drc_enable = 1; } if (mix->auto_mute_notify) { @@ -967,7 +1023,7 @@ } /* find an audio device and get its address */ -static long tumbler_find_device(const char *device, pmac_gpio_t *gp, int is_compatible) +static long tumbler_find_device(const char *device, const char *platform, pmac_gpio_t *gp, int is_compatible) { struct device_node *node; u32 *base, addr; @@ -977,6 +1033,7 @@ else node = find_audio_device(device); if (! node) { + DBG("(W) cannot find audio device %s !\n", device); snd_printdd("cannot find device %s\n", device); return -ENODEV; } @@ -985,29 +1042,48 @@ if (! base) { base = (u32 *)get_property(node, "reg", NULL); if (!base) { + DBG("(E) cannot find address for device %s !\n", device); snd_printd("cannot find address for device %s\n", device); return -ENODEV; } - /* this only work if PPC_HAS_FEATURE_CALLS is set as we - * are only getting the low part of the address - */ addr = *base; if (addr < 0x50) addr += 0x50; } else addr = *base; -#ifdef CONFIG_PPC_HAS_FEATURE_CALLS gp->addr = addr & 0x0000ffff; -#else - gp->addr = ioremap((unsigned long)addr, 1); -#endif /* Try to find the active state, default to 0 ! */ base = (u32 *)get_property(node, "audio-gpio-active-state", NULL); - if (base) + if (base) { gp->active_state = *base; - else + gp->active_val = (*base) ? 0x5 : 0x4; + gp->inactive_val = (*base) ? 
0x4 : 0x5; + } else { + u32 *prop = NULL; gp->active_state = 0; + gp->active_val = 0x4; + gp->inactive_val = 0x5; + /* Here are some crude hacks to extract the GPIO polarity and + * open collector informations out of the do-platform script + * as we don't yet have an interpreter for these things + */ + if (platform) + prop = (u32 *)get_property(node, platform, NULL); + if (prop) { + if (prop[3] == 0x9 && prop[4] == 0x9) { + gp->active_val = 0xd; + gp->inactive_val = 0xc; + } + if (prop[3] == 0x1 && prop[4] == 0x1) { + gp->active_val = 0x5; + gp->inactive_val = 0x4; + } + } + } + + DBG("(I) GPIO device %s found, offset: %x, active state: %d !\n", + device, gp->addr, gp->active_state); return (node->n_intrs > 0) ? node->intrs[0].line : 0; } @@ -1018,6 +1094,7 @@ pmac_tumbler_t *mix = chip->mixer_data; if (mix->anded_reset) { + DBG("(I) codec anded reset !\n"); write_audio_gpio(&mix->hp_mute, 0); write_audio_gpio(&mix->amp_mute, 0); big_mdelay(200); @@ -1028,6 +1105,8 @@ write_audio_gpio(&mix->amp_mute, 0); big_mdelay(100); } else { + DBG("(I) codec normal reset !\n"); + write_audio_gpio(&mix->audio_reset, 0); big_mdelay(200); write_audio_gpio(&mix->audio_reset, 1); @@ -1045,6 +1124,8 @@ if (mix->headphone_irq >= 0) disable_irq(mix->headphone_irq); + if (mix->lineout_irq >= 0) + disable_irq(mix->lineout_irq); mix->save_master_switch[0] = mix->master_switch[0]; mix->save_master_switch[1] = mix->master_switch[1]; mix->master_switch[0] = mix->master_switch[1] = 0; @@ -1099,41 +1180,59 @@ chip->update_automute(chip, 0); if (mix->headphone_irq >= 0) enable_irq(mix->headphone_irq); + if (mix->lineout_irq >= 0) + enable_irq(mix->lineout_irq); } #endif /* initialize tumbler */ static int __init tumbler_init(pmac_t *chip) { - int irq, err; + int irq; pmac_tumbler_t *mix = chip->mixer_data; snd_assert(mix, return -EINVAL); - if (tumbler_find_device("audio-hw-reset", &mix->audio_reset, 0) < 0) - tumbler_find_device("hw-reset", &mix->audio_reset, 1); - if 
(tumbler_find_device("amp-mute", &mix->amp_mute, 0) < 0) - tumbler_find_device("amp-mute", &mix->amp_mute, 1); - if (tumbler_find_device("headphone-mute", &mix->hp_mute, 0) < 0) - tumbler_find_device("headphone-mute", &mix->hp_mute, 1); - irq = tumbler_find_device("headphone-detect", &mix->hp_detect, 0); + if (tumbler_find_device("audio-hw-reset", + "platform-do-hw-reset", + &mix->audio_reset, 0) < 0) + tumbler_find_device("hw-reset", + "platform-do-hw-reset", + &mix->audio_reset, 1); + if (tumbler_find_device("amp-mute", + "platform-do-amp-mute", + &mix->amp_mute, 0) < 0) + tumbler_find_device("amp-mute", + "platform-do-amp-mute", + &mix->amp_mute, 1); + if (tumbler_find_device("headphone-mute", + "platform-do-headphone-mute", + &mix->hp_mute, 0) < 0) + tumbler_find_device("headphone-mute", + "platform-do-headphone-mute", + &mix->hp_mute, 1); + if (tumbler_find_device("line-output-mute", + "platform-do-lineout-mute", + &mix->line_mute, 0) < 0) + tumbler_find_device("line-output-mute", + "platform-do-lineout-mute", + &mix->line_mute, 1); + irq = tumbler_find_device("headphone-detect", + NULL, &mix->hp_detect, 0); if (irq < 0) - irq = tumbler_find_device("headphone-detect", &mix->hp_detect, 1); + irq = tumbler_find_device("headphone-detect", + NULL, &mix->hp_detect, 1); if (irq < 0) - irq = tumbler_find_device("keywest-gpio15", &mix->hp_detect, 1); - - tumbler_reset_audio(chip); - - /* activate headphone status interrupts */ - if (irq >= 0) { - unsigned char val; - if ((err = request_irq(irq, headphone_intr, 0, - "Tumbler Headphone Detection", chip)) < 0) - return err; - /* activate headphone status interrupts */ - val = do_gpio_read(&mix->hp_detect); - do_gpio_write(&mix->hp_detect, val | 0x80); - } + irq = tumbler_find_device("keywest-gpio15", + NULL, &mix->hp_detect, 1); mix->headphone_irq = irq; + irq = tumbler_find_device("line-output-detect", + NULL, &mix->line_detect, 0); + if (irq < 0) + irq = tumbler_find_device("line-output-detect", + NULL, 
&mix->line_detect, 1); + mix->lineout_irq = irq; + + tumbler_reset_audio(chip); return 0; } @@ -1146,6 +1245,8 @@ if (mix->headphone_irq >= 0) free_irq(mix->headphone_irq, chip); + if (mix->lineout_irq >= 0) + free_irq(mix->lineout_irq, chip); tumbler_gpio_free(&mix->audio_reset); tumbler_gpio_free(&mix->amp_mute); tumbler_gpio_free(&mix->hp_mute); @@ -1207,6 +1308,8 @@ else mix->i2c.addr = TAS_I2C_ADDR; + DBG("(I) TAS i2c address is: %x\n", mix->i2c.addr); + if (chip->model == PMAC_TUMBLER) { mix->i2c.init_client = tumbler_init_client; mix->i2c.name = "TAS3001c"; @@ -1242,6 +1345,11 @@ chip->speaker_sw_ctl = snd_ctl_new1(&tumbler_speaker_sw, chip); if ((err = snd_ctl_add(chip->card, chip->speaker_sw_ctl)) < 0) return err; + if (mix->line_mute.addr != 0) { + chip->lineout_sw_ctl = snd_ctl_new1(&tumbler_lineout_sw, chip); + if ((err = snd_ctl_add(chip->card, chip->lineout_sw_ctl)) < 0) + return err; + } chip->drc_sw_ctl = snd_ctl_new1(&tumbler_drc_sw, chip); if ((err = snd_ctl_add(chip->card, chip->drc_sw_ctl)) < 0) return err; @@ -1254,11 +1362,32 @@ INIT_WORK(&device_change, device_change_handler, (void *)chip); #ifdef PMAC_SUPPORT_AUTOMUTE - if (mix->headphone_irq >=0 && (err = snd_pmac_add_automute(chip)) < 0) + if ((mix->headphone_irq >=0 || mix->lineout_irq >= 0) + && (err = snd_pmac_add_automute(chip)) < 0) return err; chip->detect_headphone = tumbler_detect_headphone; chip->update_automute = tumbler_update_automute; tumbler_update_automute(chip, 0); /* update the status only */ + + /* activate headphone status interrupts */ + if (mix->headphone_irq >= 0) { + unsigned char val; + if ((err = request_irq(mix->headphone_irq, headphone_intr, 0, + "Sound Headphone Detection", chip)) < 0) + return 0; + /* activate headphone status interrupts */ + val = do_gpio_read(&mix->hp_detect); + do_gpio_write(&mix->hp_detect, val | 0x80); + } + if (mix->lineout_irq >= 0) { + unsigned char val; + if ((err = request_irq(mix->lineout_irq, headphone_intr, 0, + "Sound Lineout 
Detection", chip)) < 0) + return 0; + /* activate headphone status interrupts */ + val = do_gpio_read(&mix->line_detect); + do_gpio_write(&mix->line_detect, val | 0x80); + } #endif return 0; Index: linux-work/arch/ppc64/kernel/pmac_feature.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/pmac_feature.c 2005-04-11 22:09:18.000000000 +1000 +++ linux-work/arch/ppc64/kernel/pmac_feature.c 2005-04-11 22:27:38.000000000 +1000 @@ -64,8 +64,7 @@ */ struct macio_chip macio_chips[MAX_MACIO_CHIPS] __pmacdata; -struct macio_chip* __pmac -macio_find(struct device_node* child, int type) +struct macio_chip* __pmac macio_find(struct device_node* child, int type) { while(child) { int i; @@ -78,6 +77,7 @@ } return NULL; } +EXPORT_SYMBOL_GPL(macio_find); static const char* macio_names[] __pmacdata = { @@ -250,6 +250,30 @@ return 0; } +static long __pmac g5_i2s_enable(struct device_node *node, long param, long value) +{ + /* Very crude implementation for now */ + struct macio_chip* macio = &macio_chips[0]; + unsigned long flags; + + if (value == 0) + return 0; /* don't disable yet */ + + LOCK(flags); + MACIO_BIS(KEYLARGO_FCR3, KL3_CLK45_ENABLE | KL3_CLK49_ENABLE | + KL3_I2S0_CLK18_ENABLE); + udelay(10); + MACIO_BIS(KEYLARGO_FCR1, K2_FCR1_I2S0_CELL_ENABLE | + K2_FCR1_I2S0_CLK_ENABLE_BIT | K2_FCR1_I2S0_ENABLE); + udelay(10); + MACIO_BIC(KEYLARGO_FCR1, K2_FCR1_I2S0_RESET); + UNLOCK(flags); + udelay(10); + + return 0; +} + + #ifdef CONFIG_SMP static long __pmac g5_reset_cpu(struct device_node* node, long param, long value) { @@ -337,6 +361,7 @@ { PMAC_FTR_READ_GPIO, g5_read_gpio }, { PMAC_FTR_WRITE_GPIO, g5_write_gpio }, { PMAC_FTR_GMAC_PHY_RESET, g5_eth_phy_reset }, + { PMAC_FTR_SOUND_CHIP_ENABLE, g5_i2s_enable }, #ifdef CONFIG_SMP { PMAC_FTR_RESET_CPU, g5_reset_cpu }, #endif /* CONFIG_SMP */ Index: linux-work/arch/ppc/platforms/pmac_feature.c =================================================================== --- 
linux-work.orig/arch/ppc/platforms/pmac_feature.c 2005-04-11 22:09:18.000000000 +1000 +++ linux-work/arch/ppc/platforms/pmac_feature.c 2005-04-11 22:27:38.000000000 +1000 @@ -74,8 +74,7 @@ */ struct macio_chip macio_chips[MAX_MACIO_CHIPS] __pmacdata; -struct macio_chip* __pmac -macio_find(struct device_node* child, int type) +struct macio_chip* __pmac macio_find(struct device_node* child, int type) { while(child) { int i; @@ -88,6 +87,7 @@ } return NULL; } +EXPORT_SYMBOL_GPL(macio_find); static const char* macio_names[] __pmacdata = { Index: linux-work/include/asm-ppc/dbdma.h =================================================================== --- linux-work.orig/include/asm-ppc/dbdma.h 2005-04-11 22:09:18.000000000 +1000 +++ linux-work/include/asm-ppc/dbdma.h 2005-04-11 22:27:38.000000000 +1000 @@ -88,7 +88,7 @@ #define WAIT_ALWAYS 3 /* always wait */ /* Align an address for a DBDMA command structure */ -#define DBDMA_ALIGN(x) (((unsigned)(x) + sizeof(struct dbdma_cmd) - 1) \ +#define DBDMA_ALIGN(x) (((unsigned long)(x) + sizeof(struct dbdma_cmd) - 1) \ & -sizeof(struct dbdma_cmd)) /* Useful macros */ Index: linux-work/sound/ppc/beep.c =================================================================== --- linux-work.orig/sound/ppc/beep.c 2005-04-11 22:09:18.000000000 +1000 +++ linux-work/sound/ppc/beep.c 2005-04-11 22:27:38.000000000 +1000 @@ -24,6 +24,8 @@ #include #include #include +#include +#include #include #include #include "pmac.h" @@ -35,7 +37,7 @@ int hz; int nsamples; short *buf; /* allocated wave buffer */ - unsigned long addr; /* physical address of buffer */ + dma_addr_t addr; /* physical address of buffer */ struct input_dev dev; }; @@ -217,12 +219,8 @@ return -ENOMEM; memset(beep, 0, sizeof(*beep)); - beep->buf = (short *) kmalloc(BEEP_BUFLEN * 4, GFP_KERNEL); - if (! 
beep->buf) { - kfree(beep); - return -ENOMEM; - } - beep->addr = virt_to_bus(beep->buf); + beep->buf = dma_alloc_coherent(&chip->pdev->dev, BEEP_BUFLEN * 4, + &beep->addr, GFP_KERNEL); beep->dev.evbit[0] = BIT(EV_SND); beep->dev.sndbit[0] = BIT(SND_BELL) | BIT(SND_TONE); @@ -255,7 +253,8 @@ { if (chip->beep) { input_unregister_device(&chip->beep->dev); - kfree(chip->beep->buf); + dma_free_coherent(&chip->pdev->dev, BEEP_BUFLEN * 4, + chip->beep->buf, chip->beep->addr); kfree(chip->beep); chip->beep = NULL; } Index: linux-work/include/asm-ppc/keylargo.h =================================================================== --- linux-work.orig/include/asm-ppc/keylargo.h 2005-04-11 22:09:18.000000000 +1000 +++ linux-work/include/asm-ppc/keylargo.h 2005-04-11 22:27:38.000000000 +1000 @@ -228,6 +228,11 @@ #define K2_FCR1_PCI1_BUS_RESET_N 0x00000010 #define K2_FCR1_PCI1_SLEEP_RESET_EN 0x00000020 +#define K2_FCR1_I2S0_CELL_ENABLE 0x00000400 +#define K2_FCR1_I2S0_RESET 0x00000800 +#define K2_FCR1_I2S0_CLK_ENABLE_BIT 0x00001000 +#define K2_FCR1_I2S0_ENABLE 0x00002000 + #define K2_FCR1_PCI1_CLK_ENABLE 0x00004000 #define K2_FCR1_FW_CLK_ENABLE 0x00008000 #define K2_FCR1_FW_RESET_N 0x00010000 From msdemlei at cl.uni-heidelberg.de Wed Apr 13 00:28:20 2005 From: msdemlei at cl.uni-heidelberg.de (Markus Demleitner) Date: Tue, 12 Apr 2005 16:28:20 +0200 Subject: iMac G5 cpufreq Message-ID: <20050412142820.GA5357@tucana.cl.uni-heidelberg.de> Hi, I just noticed that I cannot select cpufreq capabilities for an iMac G5 kernel -- I'm not enough of a kconfig buff to see what exactly keeps the power management/cpufreq from appearing, but I guess the reason is there is no driver for the G5's frequency scaling capabilities (then again, I have no idea if it requires chipset support just yet, either). So: Could anyone give me a hint what would be required to do CPU frequency scaling on G5s? 
And is there any known documentation or source code out there that might be helpful in doing whatever is required? Background: Even when stretching the intervention thresholds in my experimental fan control, I notice that OS X makes do with *much* less fan activity, and I strongly suspect they simply reduce the CPU frequency (though I can't be sure: they do draw 10 watts more power under my standard cat /dev/zero > /dev/null load, even while keeping the fans off). Now, I'd rather reduce the CPU speed than increase the fan speed, but... Thanks, Markus (PS: Don't bother to cc: me, I'm on the list) From olof at austin.ibm.com Wed Apr 13 01:31:42 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Tue, 12 Apr 2005 10:31:42 -0500 Subject: iMac G5 cpufreq In-Reply-To: <20050412142820.GA5357@tucana.cl.uni-heidelberg.de> References: <20050412142820.GA5357@tucana.cl.uni-heidelberg.de> Message-ID: <20050412153141.GC29235@austin.ibm.com> On Tue, Apr 12, 2005 at 04:28:20PM +0200, Markus Demleitner wrote: > So: Could anyone give me a hint what would be required to do CPU > frequency scaling on G5s? And is there any known documentation > or source code out there that might be helpful in doing whatever is > required? The PPC970FX User Manual, Chapter 9.8, describes how to use PowerTune. However, it takes arguments that I don't know where to find in the docs. Maybe Darwin has them somewhere? -Olof From grundler at parisc-linux.org Wed Apr 13 03:15:05 2005 From: grundler at parisc-linux.org (Grant Grundler) Date: Tue, 12 Apr 2005 11:15:05 -0600 Subject: PCI Error Recovery API Proposal (updated) In-Reply-To: <1113263800.5388.9.camel@gaston> References: <200504111825.j3BIPlVW015708@snoqualmie.dp.intel.com> <1113263800.5388.9.camel@gaston> Message-ID: <20050412171505.GD32551@colo.lackof.org> On Tue, Apr 12, 2005 at 09:56:40AM +1000, Benjamin Herrenschmidt wrote: ... > I would appreciate if other people following this discussion could give > their opinion here, just to make sure I'm not following the wrong track. > But it seems to me that issuing the request for a link reset doesn't fit > in the leaf driver recovery API, but rather in the private API that you > will define between the PCI Express error recovery core and the PCI > Express port drivers... I *think* you are on the right track. Device drivers must know a little bit about which type of bus they are on because of probe (how a driver is bound to a device). AFAIK, power control (suspend/resume and hotplug) and interrupt handling (MSI/MSI-X) have managed to abstract out bus specifics. I expect the link reset could be abstracted as well. (E.g., could one suspend and resume an EISA driver? Probably with some tweaking.) In general, the less drivers know about the bus, the more portable they are to other chipsets/architectures.
But that hasn't stopped driver writers from mucking with things like cacheline size, latency timer, or MMRBC (PCI-X). I mention this because several of the PCIe cards I'm aware of use a PCIe-to-PCI-X bridge to make a PCI-X device work in a PCIe slot, i.e., it's really a PCI-X device that knows nothing about the PCIe link upstream. Error recovery is going to be very .... "interesting" to support for those devices. grant From rlrevell at joe-job.com Wed Apr 13 04:28:53 2005 From: rlrevell at joe-job.com (Lee Revell) Date: Tue, 12 Apr 2005 14:28:53 -0400 Subject: [PATCH] ppc64: very basic desktop g5 sound support (#2) In-Reply-To: <1113282436.21548.42.camel@gaston> References: <1113282436.21548.42.camel@gaston> Message-ID: <1113330533.31159.43.camel@mindpipe> On Tue, 2005-04-12 at 15:07 +1000, Benjamin Herrenschmidt wrote: > Hi ! > > (Andrew: This is an update of the previous patch, it fixes a problem > with headphone beeing incorrectly muted on some models). > > This patch hacks the current PowerMac Alsa driver to add some basic > support of analog sound output to some desktop G5s. It has severe > limitations though: Um... why in the heck are you posting this here instead of alsa-devel? Lee From torvalds at osdl.org Wed Apr 13 04:49:38 2005 From: torvalds at osdl.org (Linus Torvalds) Date: Tue, 12 Apr 2005 11:49:38 -0700 (PDT) Subject: [PATCH] ppc64: very basic desktop g5 sound support (#2) In-Reply-To: <1113330533.31159.43.camel@mindpipe> References: <1113282436.21548.42.camel@gaston> <1113330533.31159.43.camel@mindpipe> Message-ID: On Tue, 12 Apr 2005, Lee Revell wrote: > > Um... why in the heck are you posting this here instead of alsa-devel? Which list do you think has more people interested? ppc64 or alsa? Pretty much anybody with a G5 will probably be on the ppc lists. And _nobody_ will be on the alsa lists, since it historically has never had any sound at all. In other words, don't believe that "sound" means that it must be an alsa list.
Lists make sense not because of intent, but because of who they reach. Linus From rlrevell at joe-job.com Wed Apr 13 04:52:44 2005 From: rlrevell at joe-job.com (Lee Revell) Date: Tue, 12 Apr 2005 14:52:44 -0400 Subject: [PATCH] ppc64: very basic desktop g5 sound support (#2) In-Reply-To: References: <1113282436.21548.42.camel@gaston> <1113330533.31159.43.camel@mindpipe> Message-ID: <1113331964.31159.46.camel@mindpipe> On Tue, 2005-04-12 at 11:49 -0700, Linus Torvalds wrote: > > On Tue, 12 Apr 2005, Lee Revell wrote: > > > > Um... why in the heck are you posting this here instead of alsa-devel? > > Which list do you think has more people interested? ppc64 or alsa? > OK, makes sense. I still think alsa-devel should be cc'ed, for code review purposes. Lee From schwab at suse.de Wed Apr 13 05:32:19 2005 From: schwab at suse.de (Andreas Schwab) Date: Tue, 12 Apr 2005 21:32:19 +0200 Subject: [PATCH] ppc64: very basic desktop g5 sound support (#2) In-Reply-To: <1113282436.21548.42.camel@gaston> (Benjamin Herrenschmidt's message of "Tue, 12 Apr 2005 15:07:16 +1000") References: <1113282436.21548.42.camel@gaston> Message-ID: Benjamin Herrenschmidt writes: > This patch hacks the current PowerMac Alsa driver to add some basic > support of analog sound output to some desktop G5s. It has severe > limitations though: > > - Only 44100Hz 16 bits > - Only works on G5 models using a TAS3004 analog codec, that is early > single CPU desktops and all dual CPU desktops at this date, but none > of the more recent ones like iMac G5. > - It does analog only, no digital/SPDIF support at all, no native > AC3 support On my PowerMac the internal speaker is now working, but unfortunately on the line-out I get nearly no output. I have pushed both the master and pcm control to the maximum and still barely hear anything. Andreas.
-- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From hch at lst.de Wed Apr 13 06:14:33 2005 From: hch at lst.de (Christoph Hellwig) Date: Tue, 12 Apr 2005 22:14:33 +0200 Subject: [PATCH] ppc64: very basic desktop g5 sound support (#2) In-Reply-To: References: <1113282436.21548.42.camel@gaston> Message-ID: <20050412201433.GA25869@lst.de> On Tue, Apr 12, 2005 at 09:32:19PM +0200, Andreas Schwab wrote: > On my PowerMac the internal speaker is now working, but unfortunately on > the line-out I get nearly no output. I have pushed both the master and > pcm control to the maximum and still barely hear anything. Works fine for me, but I had to turn the volume to the max on the external amplifier, so this is probably the same problem. From benh at kernel.crashing.org Wed Apr 13 08:16:18 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 13 Apr 2005 08:16:18 +1000 Subject: [PATCH] ppc64: very basic desktop g5 sound support (#2) In-Reply-To: References: <1113282436.21548.42.camel@gaston> <1113330533.31159.43.camel@mindpipe> Message-ID: <1113344178.5388.106.camel@gaston> On Tue, 2005-04-12 at 11:49 -0700, Linus Torvalds wrote: > > On Tue, 12 Apr 2005, Lee Revell wrote: > > > > Um... why in the heck are you posting this here instead of alsa-devel? > > Which list do you think has more people interested? ppc64 or alsa? > > Pretty much anybody with a G5 will probably be on the ppc lists. And > _nobody_ will be on the alsa lists, since it historically has never had > any sound at all. > > In other words, don't believe that "sound" means that it must be an alsa > list. Lists make sense not because of intent, but because of who they > reach.
Yah, that, and I intend to take over this driver. This is really only platform-specific munging in the driver itself; I pretty much don't change anything in the way the driver interfaces to Alsa. Once I am done rewriting it completely (which I have started doing, but it will take some time to complete) to get digital, AC3, etc., I will submit the new one to alsa-devel for comments, though. Ben. From benh at kernel.crashing.org Wed Apr 13 08:17:05 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 13 Apr 2005 08:17:05 +1000 Subject: [PATCH] ppc64: very basic desktop g5 sound support (#2) In-Reply-To: References: <1113282436.21548.42.camel@gaston> Message-ID: <1113344225.21548.108.camel@gaston> On Tue, 2005-04-12 at 21:32 +0200, Andreas Schwab wrote: > Benjamin Herrenschmidt writes: > > > This patch hacks the current PowerMac Alsa driver to add some basic > > support of analog sound output to some desktop G5s. It has severe > > limitations though: > > > > - Only 44100Hz 16 bits > > - Only works on G5 models using a TAS3004 analog codec, that is early > > single CPU desktops and all dual CPU desktops at this date, but none > > of the more recent ones like iMac G5. > > - It does analog only, no digital/SPDIF support at all, no native > > AC3 support > > On my PowerMac the internal speaker is now working, but unfortunately on > the line-out I get nearly no output. I have pushed both the master and > pcm control to the maximum and still barely hear anything. Yes, I noticed that too on some models, not sure what's up at this point. What about the headphone jack on the front ? That one appears to work. Ben.
From schwab at suse.de Wed Apr 13 08:33:28 2005 From: schwab at suse.de (Andreas Schwab) Date: Wed, 13 Apr 2005 00:33:28 +0200 Subject: [PATCH] ppc64: very basic desktop g5 sound support (#2) In-Reply-To: <1113344225.21548.108.camel@gaston> (Benjamin Herrenschmidt's message of "Wed, 13 Apr 2005 08:17:05 +1000") References: <1113282436.21548.42.camel@gaston> <1113344225.21548.108.camel@gaston> Message-ID: Benjamin Herrenschmidt writes: > Yes, I noticed that too on some models, not sure what's up at this > point. What about the headphone jack on the front ? That one appears to > work. Doesn't work either for me. Well, I'll have to keep my workaround a bit longer until you are ready with the rewrite. Keep up the good work! Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From benh at kernel.crashing.org Wed Apr 13 08:39:21 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 13 Apr 2005 08:39:21 +1000 Subject: [PATCH] ppc64: very basic desktop g5 sound support (#2) In-Reply-To: References: <1113282436.21548.42.camel@gaston> <1113344225.21548.108.camel@gaston> Message-ID: <1113345561.5387.114.camel@gaston> On Wed, 2005-04-13 at 00:33 +0200, Andreas Schwab wrote: > Benjamin Herrenschmidt writes: > > > Yes, I noticed that too on some models, not sure what's up at this > > point. What about the headphone jack on the front ? That one appears to > > work. > > Doesn't work either for me. Well, I'll have to keep my workaround a bit > longer until you are ready with the rewrite. Keep up the good work! Doesn't work with version 2 of the patch ? neither the headphone nor the line out jack ? hrm... does it properly detect insertion of the jack and mute the speaker in both cases ? Can you send me a tarball of your device-tree ? Ben.
From schwab at suse.de Wed Apr 13 08:57:46 2005 From: schwab at suse.de (Andreas Schwab) Date: Wed, 13 Apr 2005 00:57:46 +0200 Subject: [PATCH] ppc64: very basic desktop g5 sound support (#2) In-Reply-To: <1113345561.5387.114.camel@gaston> (Benjamin Herrenschmidt's message of "Wed, 13 Apr 2005 08:39:21 +1000") References: <1113282436.21548.42.camel@gaston> <1113344225.21548.108.camel@gaston> <1113345561.5387.114.camel@gaston> Message-ID: Benjamin Herrenschmidt writes: > Doesn't work with version 2 of the patch ? neither the headphone nor the > line out jack ? Yes. > hrm... does it properly detect insertion of the jack and mute the > speaker in both cases ? Yes, it does. > Can you send me a tarball of your device-tree ? Will do. Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From johnrose at austin.ibm.com Wed Apr 13 09:03:42 2005 From: johnrose at austin.ibm.com (John Rose) Date: Tue, 12 Apr 2005 18:03:42 -0500 Subject: [PATCH] enable CONFIG_RTAS_PROC by default Message-ID: <1113347022.16917.30.camel@sinatra.austin.ibm.com> Hi- This patch enables CONFIG_RTAS_PROC by default on pSeries. This will preserve /proc/ppc64/rtas/rmo_buffer, which is needed by librtas.
Thanks- John Signed-off-by: John Rose diff -puN arch/ppc64/Kconfig~fix_Kconfig arch/ppc64/Kconfig --- 2_6_linus_3/arch/ppc64/Kconfig~fix_Kconfig 2005-04-12 18:03:45.000000000 -0500 +++ 2_6_linus_3-johnrose/arch/ppc64/Kconfig 2005-04-12 18:03:56.000000000 -0500 @@ -262,6 +262,7 @@ config PPC_RTAS config RTAS_PROC bool "Proc interface to RTAS" depends on PPC_RTAS + default y config RTAS_FLASH tristate "Firmware flash interface" _ From benh at kernel.crashing.org Wed Apr 13 09:12:19 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 13 Apr 2005 09:12:19 +1000 Subject: iMac G5 cpufreq In-Reply-To: <20050412153141.GC29235@austin.ibm.com> References: <20050412142820.GA5357@tucana.cl.uni-heidelberg.de> <20050412153141.GC29235@austin.ibm.com> Message-ID: <1113347540.5387.125.camel@gaston> On Tue, 2005-04-12 at 10:31 -0500, Olof Johansson wrote: > On Tue, Apr 12, 2005 at 04:28:20PM +0200, Markus Demleitner wrote: > > > So: Could anyone give me a hint what would be required to do CPU > > frequency scaling on G5s? And is there any known documentation > > or source code out there that might be helpful in doing whatever is > > required? > > The PPC970FX User Manual Chapter 9.8 describes how to use > PowerTune. However, there are arguments needed to be passed in that I > don't know where to find in the docs. Maybe Darwin has them somewhere? Indeed, I haven't implemented PowerTune support for the 970FX yet. It's on my todo list. Darwin does have some bits; there are other bits I'll have to figure out in more brutal ways (like the voltage control). Ben. From benh at kernel.crashing.org Wed Apr 13 12:07:44 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 13 Apr 2005 12:07:44 +1000 Subject: [PATCH] ppc64: add PT_NOTE section to vDSO Message-ID: <1113358065.5388.164.camel@gaston> Hi !
This patch from Roland adds a PT_NOTE section to both the 32-bit and 64-bit vDSOs to expose the kernel version to glibc, thus avoiding a uname syscall on every launch. This is equivalent to the patches Roland already posted for x86 and x86-64. Note: the 64-bit .note actually uses the 32-bit format. This is normal. The ELF spec specifies a different format for 64-bit .note sections, but for some reason this was never properly implemented: core dumps, for example, all use the 32-bit .note format, and binutils cannot even read a 64-bit format .note. Our toolchain folks think we should rather stick to the 32-bit .note format everywhere and get the spec fixed some day... Signed-off-by: Roland McGrath Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/ppc64/kernel/vdso32/Makefile =================================================================== --- linux-work.orig/arch/ppc64/kernel/vdso32/Makefile 2005-04-13 11:15:21.000000000 +1000 +++ linux-work/arch/ppc64/kernel/vdso32/Makefile 2005-04-13 11:19:53.000000000 +1000 @@ -1,7 +1,7 @@ # List of files in the vdso, has to be asm only for now -obj-vdso32 = sigtramp.o gettimeofday.o datapage.o cacheflush.o +obj-vdso32 = sigtramp.o gettimeofday.o datapage.o cacheflush.o note.o # Build rules Index: linux-work/arch/ppc64/kernel/vdso32/vdso32.lds.S =================================================================== --- linux-work.orig/arch/ppc64/kernel/vdso32/vdso32.lds.S 2005-04-13 11:15:21.000000000 +1000 +++ linux-work/arch/ppc64/kernel/vdso32/vdso32.lds.S 2005-04-13 11:19:53.000000000 +1000 @@ -20,6 +20,8 @@ .gnu.version_d : { *(.gnu.version_d) } .gnu.version_r : { *(.gnu.version_r) } + .note : { *(.note.*) } :text :note + .
= ALIGN (16); .text : { @@ -87,6 +89,7 @@ PHDRS { text PT_LOAD FILEHDR PHDRS FLAGS(5); /* PF_R|PF_X */ + note PT_NOTE FLAGS(4); /* PF_R */ dynamic PT_DYNAMIC FLAGS(4); /* PF_R */ eh_frame_hdr 0x6474e550; /* PT_GNU_EH_FRAME, but ld doesn't match the name */ } Index: linux-work/arch/ppc64/kernel/vdso64/Makefile =================================================================== --- linux-work.orig/arch/ppc64/kernel/vdso64/Makefile 2005-04-13 11:15:21.000000000 +1000 +++ linux-work/arch/ppc64/kernel/vdso64/Makefile 2005-04-13 11:20:35.000000000 +1000 @@ -1,6 +1,6 @@ # List of files in the vdso, has to be asm only for now -obj-vdso64 = sigtramp.o gettimeofday.o datapage.o cacheflush.o +obj-vdso64 = sigtramp.o gettimeofday.o datapage.o cacheflush.o note.o # Build rules Index: linux-work/arch/ppc64/kernel/vdso64/vdso64.lds.S =================================================================== --- linux-work.orig/arch/ppc64/kernel/vdso64/vdso64.lds.S 2005-04-13 11:15:21.000000000 +1000 +++ linux-work/arch/ppc64/kernel/vdso64/vdso64.lds.S 2005-04-13 11:19:53.000000000 +1000 @@ -18,12 +18,14 @@ .gnu.version_d : { *(.gnu.version_d) } .gnu.version_r : { *(.gnu.version_r) } + .note : { *(.note.*) } :text :note + . = ALIGN (16); .text : { *(.text .stub .text.* .gnu.linkonce.t.*) *(.sfpr .glink) - } + } :text PROVIDE (__etext = .); PROVIDE (_etext = .); PROVIDE (etext = .); @@ -88,6 +90,7 @@ PHDRS { text PT_LOAD FILEHDR PHDRS FLAGS(5); /* PF_R|PF_X */ + note PT_NOTE FLAGS(4); /* PF_R */ dynamic PT_DYNAMIC FLAGS(4); /* PF_R */ eh_frame_hdr 0x6474e550; /* PT_GNU_EH_FRAME, but ld doesn't match the name */ } Index: linux-work/arch/ppc64/kernel/vdso32/note.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso32/note.S 2005-04-13 11:19:53.000000000 +1000 @@ -0,0 +1,25 @@ +/* + * This supplies .note.* sections to go into the PT_NOTE inside the vDSO text. 
+ * Here we can supply some information useful to userland. + */ + +#include +#include + +#define ASM_ELF_NOTE_BEGIN(name, flags, vendor, type) \ + .section name, flags; \ + .balign 4; \ + .long 1f - 0f; /* name length */ \ + .long 3f - 2f; /* data length */ \ + .long type; /* note type */ \ +0: .asciz vendor; /* vendor name */ \ +1: .balign 4; \ +2: + +#define ASM_ELF_NOTE_END \ +3: .balign 4; /* pad out section */ \ + .previous + + ASM_ELF_NOTE_BEGIN(".note.kernel-version", "a", UTS_SYSNAME, 0) + .long LINUX_VERSION_CODE + ASM_ELF_NOTE_END Index: linux-work/arch/ppc64/kernel/vdso64/note.S =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/vdso64/note.S 2005-04-13 11:23:10.000000000 +1000 @@ -0,0 +1 @@ +#include "../vdso32/note.S" From sharada at in.ibm.com Wed Apr 13 19:34:24 2005 From: sharada at in.ibm.com (R Sharada) Date: Wed, 13 Apr 2005 15:04:24 +0530 Subject: slb_flush_and_rebolt panic trace in 2.6.12-rc2-mm3 Message-ID: <20050413093424.GA2188@in.ibm.com> Hello, I was trying to boot the latest kernel version, 2.6.12-rc2-mm3 on a p630 box, and found the kernel throwing up this call trace after boot. It does come down to the login prompt, but keeps throwing out the below call trace when you attempt to do any operations on the box. 
Kernel 2.6.12-rc2-mm3 on an ppc64 Badness in slb_flush_and_rebolt at arch/ppc64/mm/slb.c:52 Call Trace: [c0000000ff6cf380] [00000000ffe9e6c0] 0xffe9e6c0 (unreliable) [c0000000ff6cf420] [c000000000048f08] .__schedule_tail+0x9c/0x1b4 [c0000000ff6cf4c0] [c0000000004574cc] .schedule+0x920/0xc50 [c0000000ff6cf5f0] [c0000000004588f0] .schedule_timeout+0xfc/0x104 [c0000000ff6cf6d0] [c000000000258e68] .tty_wait_until_sent+0x124/0x1b8 [c0000000ff6cf7d0] [c00000000025940c] .set_termios+0xbc/0x23c [c0000000ff6cf890] [c00000000025a01c] .n_tty_ioctl+0x950/0xc78 [c0000000ff6cf960] [c000000000252d9c] .tty_ioctl+0x57c/0x1120 [c0000000ff6cfb60] [c0000000000c753c] .do_ioctl+0xc0/0x12c [c0000000ff6cfc00] [c0000000000c77b4] .vfs_ioctl+0x20c/0x4d4 [c0000000ff6cfcb0] [c0000000000c7ac8] .sys_ioctl+0x4c/0x8c [c0000000ff6cfd60] [c0000000000e74e8] .compat_sys_ioctl+0x45c/0x4b4 [c0000000ff6cfe30] [c00000000000d880] syscall_exit+0x0/0x18 llm16.in.ibm.com login: AT S7=45 S0=0 L1&c1 E1 Q0 Password: Login timed out Badness in slb_flush_and_rebolt at arch/ppc64/mm/slb.2Call Trace: [c0000000ff6cf870] [c0000000ff6cfc90] 0xc0000000ff6cfc90 (unreliable) [c0000000ff6cf910] [c000000000048f08] .__schedule_tail+0x9c/0x1b4 [c0000000ff6cf9b0] [c0000000004574cc] .schedule+0x920/0xc50 [c0000000ff6cfae0] [c0000000004578d0] .wait_for_completion+0xd4/0x18c [c0000000ff6cfbe0] [c00000000004ab08] .sched_exec+0x1a8/0x200 [c0000000ff6cfce0] [c0000000000e8c94] .compat_do_execve+0x9c/0x2d0 [c0000000ff6cfd90] [c00000000001b348] .sys32_execve+0x7c/0x108 [c0000000ff6cfe30] [c00000000000d880] syscall_exit+0x0/0x18 Red Hat Enterprise Linux AS release 3.90 (Nahant) Kernel 2.6.12-rc2-mm3 on an ppc64 llm16.in.ibm.com login: Anyone else seen this problem on this kernel version? 2.6.12-rc2-mm1 was working fine for me on the same box. I use the default pSeries_defconfig config options. 
Thanks and Regards, Sharada From david at gibson.dropbear.id.au Wed Apr 13 14:34:30 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 13 Apr 2005 14:34:30 +1000 Subject: slb_flush_and_rebolt panic trace in 2.6.12-rc2-mm3 In-Reply-To: <20050413093424.GA2188@in.ibm.com> References: <20050413093424.GA2188@in.ibm.com> Message-ID: <20050413043430.GB2038@localhost.localdomain> On Wed, Apr 13, 2005 at 03:04:24PM +0530, R Sharada wrote: > Hello, > I was trying to boot the latest kernel version, 2.6.12-rc2-mm3 on a > p630 box, and found the kernel throwing up this call trace after boot. It does > come down to the login prompt, but keeps throwing out the below call trace when > you attempt to do any operations on the box. > > Kernel 2.6.12-rc2-mm3 on an ppc64 > > Badness in slb_flush_and_rebolt at arch/ppc64/mm/slb.c:52 > Call Trace: > [c0000000ff6cf380] [00000000ffe9e6c0] 0xffe9e6c0 (unreliable) > [c0000000ff6cf420] [c000000000048f08] .__schedule_tail+0x9c/0x1b4 > [c0000000ff6cf4c0] [c0000000004574cc] .schedule+0x920/0xc50 > [c0000000ff6cf5f0] [c0000000004588f0] .schedule_timeout+0xfc/0x104 > [c0000000ff6cf6d0] [c000000000258e68] .tty_wait_until_sent+0x124/0x1b8 > [c0000000ff6cf7d0] [c00000000025940c] .set_termios+0xbc/0x23c > [c0000000ff6cf890] [c00000000025a01c] .n_tty_ioctl+0x950/0xc78 > [c0000000ff6cf960] [c000000000252d9c] .tty_ioctl+0x57c/0x1120 > [c0000000ff6cfb60] [c0000000000c753c] .do_ioctl+0xc0/0x12c > [c0000000ff6cfc00] [c0000000000c77b4] .vfs_ioctl+0x20c/0x4d4 > [c0000000ff6cfcb0] [c0000000000c7ac8] .sys_ioctl+0x4c/0x8c > [c0000000ff6cfd60] [c0000000000e74e8] .compat_sys_ioctl+0x45c/0x4b4 > [c0000000ff6cfe30] [c00000000000d880] syscall_exit+0x0/0x18 > llm16.in.ibm.com login: AT S7=45 S0=0 L1&c1 E1 Q0 > Password: Login timed out Badness in slb_flush_and_rebolt at arch/ppc64/mm/slb.2Call Trace: > [c0000000ff6cf870] [c0000000ff6cfc90] 0xc0000000ff6cfc90 (unreliable) > [c0000000ff6cf910] [c000000000048f08] .__schedule_tail+0x9c/0x1b4 > 
[c0000000ff6cf9b0] [c0000000004574cc] .schedule+0x920/0xc50 > [c0000000ff6cfae0] [c0000000004578d0] .wait_for_completion+0xd4/0x18c > [c0000000ff6cfbe0] [c00000000004ab08] .sched_exec+0x1a8/0x200 > [c0000000ff6cfce0] [c0000000000e8c94] .compat_do_execve+0x9c/0x2d0 > [c0000000ff6cfd90] [c00000000001b348] .sys32_execve+0x7c/0x108 > [c0000000ff6cfe30] [c00000000000d880] syscall_exit+0x0/0x18 > > Red Hat Enterprise Linux AS release 3.90 (Nahant) > Kernel 2.6.12-rc2-mm3 on an ppc64 > > llm16.in.ibm.com login: > > Anyone else seen this problem on this kernel version? 2.6.12-rc2-mm1 was working > fine for me on the same box. I use the default pSeries_defconfig config options. Looks like we're somehow entering the context_switch() path with interrupts enabled. That's bad. The first traceback suggests a problem in the terminal code, though that could just be coincidence. What sort of console is on this machine? -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson From anton at samba.org Wed Apr 13 14:33:57 2005 From: anton at samba.org (Anton Blanchard) Date: Wed, 13 Apr 2005 14:33:57 +1000 Subject: slb_flush_and_rebolt panic trace in 2.6.12-rc2-mm3 In-Reply-To: <20050413093424.GA2188@in.ibm.com> References: <20050413093424.GA2188@in.ibm.com> Message-ID: <20050413043357.GB10014@krispykreme> Hi, > Hello, I was trying to boot the latest kernel version, 2.6.12-rc2-mm3 > on a p630 box, and found the kernel throwing up this call trace after > boot. It does come down to the login prompt, but keeps throwing out > the below call trace when you attempt to do any operations on the box. The new irqs on scheduling path looks to have caused this. We cannot call switch_slb with interrupts off. This patch should fix it. Anton -- Disable interrupts around switch_slb, required now generic code calls it with interrupts on. 
Signed-off-by: Anton Blanchard ===== include/asm-ppc64/mmu_context.h 1.23 vs edited ===== --- 1.23/include/asm-ppc64/mmu_context.h 2005-01-26 08:50:16 +11:00 +++ edited/include/asm-ppc64/mmu_context.h 2005-04-13 14:29:28 +10:00 @@ -51,6 +51,8 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, struct task_struct *tsk) { + unsigned long flags; + if (!cpu_isset(smp_processor_id(), next->cpu_vm_mask)) cpu_set(smp_processor_id(), next->cpu_vm_mask); @@ -58,6 +60,8 @@ if (prev == next) return; + local_irq_save(flags); + #ifdef CONFIG_ALTIVEC if (cur_cpu_spec->cpu_features & CPU_FTR_ALTIVEC) asm volatile ("dssall"); @@ -67,6 +71,8 @@ switch_slb(tsk, next); else switch_stab(tsk, next); + + local_irq_restore(flags); } #define deactivate_mm(tsk,mm) do { } while (0) From david at gibson.dropbear.id.au Wed Apr 13 14:39:57 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Wed, 13 Apr 2005 14:39:57 +1000 Subject: slb_flush_and_rebolt panic trace in 2.6.12-rc2-mm3 In-Reply-To: <20050413043357.GB10014@krispykreme> References: <20050413093424.GA2188@in.ibm.com> <20050413043357.GB10014@krispykreme> Message-ID: <20050413043957.GC2038@localhost.localdomain> On Wed, Apr 13, 2005 at 02:33:57PM +1000, Anton Blanchard wrote: > > Hi, > > > Hello, I was trying to boot the latest kernel version, 2.6.12-rc2-mm3 > > on a p630 box, and found the kernel throwing up this call trace after > > boot. It does come down to the login prompt, but keeps throwing out > > the below call trace when you attempt to do any operations on the box. > > The new irqs on scheduling path looks to have caused this. We cannot > call switch_slb with interrupts off. This patch should fix it. Ah, missed that change. Yes, that would explain it. Hrm.. if interrupts are now enabled through context_switch(), are we safe if we were interrupted between switch_mm() and _switch(), where we update the bolted kernel stack slb? 
-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson From anton at samba.org Wed Apr 13 14:55:23 2005 From: anton at samba.org (Anton Blanchard) Date: Wed, 13 Apr 2005 14:55:23 +1000 Subject: slb_flush_and_rebolt panic trace in 2.6.12-rc2-mm3 In-Reply-To: <20050413043957.GC2038@localhost.localdomain> References: <20050413093424.GA2188@in.ibm.com> <20050413043357.GB10014@krispykreme> <20050413043957.GC2038@localhost.localdomain> Message-ID: <20050413045523.GC10014@krispykreme> > Hrm.. if interrupts are now enabled through context_switch(), are we > safe if we were interrupted between switch_mm() and _switch(), where > we update the bolted kernel stack slb? So long as we change both the bolted SLB stack entry and the kernel stack at the same time in _switch, and we disable interrupts around there, I think we are OK. It probably wants more thought, but I'm having too much fun trying to build a toolchain to think :) Anton From benh at kernel.crashing.org Wed Apr 13 21:23:02 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 13 Apr 2005 21:23:02 +1000 Subject: [PATCH] ppc64: improve g5 sound headphone mute In-Reply-To: References: <1113282436.21548.42.camel@gaston> <1113344225.21548.108.camel@gaston> <1113345561.5387.114.camel@gaston> <1113347296.5388.121.camel@gaston> <1113350355.5387.129.camel@gaston> Message-ID: <1113391382.5463.20.camel@gaston> Hi ! This patch fixes a couple more issues with the management of the GPIOs dealing with headphone and line out mute on the G5. It should fix the remaining problems of people not getting any sound out of the headphone jack.
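The heart of the tumbler.c change that follows is the new check_audio_gpio(), which compares the value read back from a GPIO with its active value only under the mask 0xd, so bit 1 of the register is ignored. Below is a minimal user-space sketch of that comparison, not driver code. The register values are made-up inputs, the function names are hypothetical, and the notion that bit 1 tracks something other than the programmed state (such as an input level) is an assumption: the mail does not say what it represents.

```c
#include <assert.h>
#include <stdbool.h>

/* Mask taken from check_audio_gpio() in the patch: keeps bits 0, 2, 3 and drops bit 1. */
#define GPIO_STATE_MASK 0xdu

/* Masked comparison, as in check_audio_gpio(): is the GPIO in its active state? */
static bool gpio_is_active(unsigned int reg, unsigned int active_val)
{
	return (reg & GPIO_STATE_MASK) == (active_val & GPIO_STATE_MASK);
}

/* Plain equality, for contrast only: also sensitive to bit 1. */
static bool gpio_is_active_exact(unsigned int reg, unsigned int active_val)
{
	return reg == active_val;
}

static void gpio_self_test(void)
{
	/* Hypothetical active value 0x05; the readback 0x07 differs only in bit 1. */
	assert(gpio_is_active(0x07, 0x05));
	assert(!gpio_is_active_exact(0x07, 0x05));

	/* Bit 0 differs, so the GPIO is not in its active state. */
	assert(!gpio_is_active(0x04, 0x05));
}
```

With a plain equality test, a readback that differs only in bit 1 (0x07 against 0x05 above) counts as inactive, whereas the masked test treats it as active. In the patch, check_mute() then rewrites the GPIO only when this masked state differs from the requested one.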
Signed-off-by: Benjamin Herrenschmidt Index: linux-work/sound/ppc/tumbler.c =================================================================== --- linux-work.orig/sound/ppc/tumbler.c 2005-04-12 18:07:50.000000000 +1000 +++ linux-work/sound/ppc/tumbler.c 2005-04-13 18:02:26.000000000 +1000 @@ -177,11 +177,22 @@ if (! gp->addr) return; active = active ? gp->active_val : gp->inactive_val; - do_gpio_write(gp, active); DBG("(I) gpio %x write %d\n", gp->addr, active); } +static int check_audio_gpio(pmac_gpio_t *gp) +{ + int ret; + + if (! gp->addr) + return 0; + + ret = do_gpio_read(gp); + + return (ret & 0xd) == (gp->active_val & 0xd); +} + static int read_audio_gpio(pmac_gpio_t *gp) { int ret; @@ -683,7 +694,7 @@ } if (gp == NULL) return -EINVAL; - ucontrol->value.integer.value[0] = ! read_audio_gpio(gp); + ucontrol->value.integer.value[0] = !check_audio_gpio(gp); return 0; } @@ -711,7 +722,7 @@ } if (gp == NULL) return -EINVAL; - val = ! read_audio_gpio(gp); + val = ! check_audio_gpio(gp); if (val != ucontrol->value.integer.value[0]) { write_audio_gpio(gp, ! ucontrol->value.integer.value[0]); return 1; @@ -897,11 +908,11 @@ static void check_mute(pmac_t *chip, pmac_gpio_t *gp, int val, int do_notify, snd_kcontrol_t *sw) { - //pmac_tumbler_t *mix = chip->mixer_data; - if (val != read_audio_gpio(gp)) { + if (check_audio_gpio(gp) != val) { write_audio_gpio(gp, val); if (do_notify) - snd_ctl_notify(chip->card, SNDRV_CTL_EVENT_MASK_VALUE, &sw->id); + snd_ctl_notify(chip->card, SNDRV_CTL_EVENT_MASK_VALUE, + &sw->id); } } From sharada at in.ibm.com Thu Apr 14 00:06:05 2005 From: sharada at in.ibm.com (R Sharada) Date: Wed, 13 Apr 2005 19:36:05 +0530 Subject: [PATCH 0/3] kexec for ppc64 Message-ID: <20050413140605.GA5081@in.ibm.com> The patches that follow implement the kexec support for ppc64. The set contains 3 patches - ppc64-cleanup-global-interrupt-queue.patch - calls the global queue membership for all cpus (including boot-cpu) and is needed for kexec. 
Earlier behaviour was to not set the global queue membership for the boot cpu, since that is taken care of by the firmware. This, however, breaks kexec, so the boot cpu now has to be added to the global queue at boot. Initial hack written by self, and later refined and moved into xics.c by Milton Miller. - ppc64-native-hash-clear.patch - This cleans up the hash tables and invalidates the tlb for the native (non-LPAR) environment. Written by Milton Miller. - kexec-ppc64.patch - This implements the core of the arch-specific support for kexec on ppc64. The patch contains information explaining the design and contents. Written by Milton Miller, with a few fixes provided by self. The patches are created against 2.6.12-rc2-mm3. Note: 2.6.12-rc2-mm3 requires a fix (recently posted by Anton on linuxppc64-dev) for a slb_flush_and_rebolt panic condition: http://ozlabs.org/pipermail/linuxppc64-dev/2005-April/003803.html Test Status: The code has been tested on: - Power 4, p630 boxes, in both LPAR and non-LPAR environments - Power 5, LPAR and non-LPAR environments Known issues: - The network interface has to be shut down prior to kexec boot, as otherwise the network will fail with EEH errors - The ipr driver on Power 5 has been seen to intermittently throw reset errors, sometimes causing a hang - On Power 5, the e1000 module also needs to be unloaded along with an interface shutdown (ifconfig ethx down) Please review and provide comments. Thanks and Regards, Sharada From sharada at in.ibm.com Thu Apr 14 00:08:07 2005 From: sharada at in.ibm.com (R Sharada) Date: Wed, 13 Apr 2005 19:38:07 +0530 Subject: [PATCH 1/3] global interrupt queue cleanup In-Reply-To: <20050413140605.GA5081@in.ibm.com> References: <20050413140605.GA5081@in.ibm.com> Message-ID: <20050413140807.GB5081@in.ibm.com> Move the code to set global interrupt queue membership to xics.c, and remove the no longer needed extern declarations.
Also call it on all cpus (even the boot cpu) to prepare for kexec. Signed-off-by: Milton Miller Signed-off-by: R Sharada --- linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/pSeries_smp.c | 7 ---- linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/xics.c | 16 ++++++++--- linux-2.6.12-rc2-mm3-sharada/include/asm-ppc64/xics.h | 3 -- 3 files changed, 12 insertions(+), 14 deletions(-) diff -puN arch/ppc64/kernel/pSeries_smp.c~ppc64-cleanup-global-interrupt-queue arch/ppc64/kernel/pSeries_smp.c --- linux-2.6.12-rc2-mm3/arch/ppc64/kernel/pSeries_smp.c~ppc64-cleanup-global-interrupt-queue 2005-04-12 17:34:13.000000000 +0530 +++ linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/pSeries_smp.c 2005-04-12 17:34:38.000000000 +0530 @@ -329,13 +329,6 @@ static void __devinit smp_xics_setup_cpu cpu_clear(cpu, of_spin_map); - /* - * Put the calling processor into the GIQ. This is really only - * necessary from a secondary thread as the OF start-cpu interface - * performs this function for us on primary threads. - */ - rtas_set_indicator(GLOBAL_INTERRUPT_QUEUE, - (1UL << interrupt_server_size) - 1 - default_distrib_server, 1); } static DEFINE_SPINLOCK(timebase_lock); diff -puN arch/ppc64/kernel/xics.c~ppc64-cleanup-global-interrupt-queue arch/ppc64/kernel/xics.c --- linux-2.6.12-rc2-mm3/arch/ppc64/kernel/xics.c~ppc64-cleanup-global-interrupt-queue 2005-04-12 17:34:13.000000000 +0530 +++ linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/xics.c 2005-04-12 17:34:38.000000000 +0530 @@ -432,6 +432,7 @@ void xics_cause_IPI(int cpu) { ops->qirr_info(cpu, IPI_PRIORITY); } +#endif /* CONFIG_SMP */ void xics_setup_cpu(void) { @@ -439,9 +440,17 @@ void xics_setup_cpu(void) ops->cppr_info(cpu, 0xff); iosync(); -} -#endif /* CONFIG_SMP */ + /* + * Put the calling processor into the GIQ. This is really only + * necessary from a secondary thread as the OF start-cpu interface + * performs this function for us on primary threads. 
+ * + * XXX: undo of teardown on kexec needs this too, as may hotplug + */ + rtas_set_indicator(GLOBAL_INTERRUPT_QUEUE, + (1UL << interrupt_server_size) - 1 - default_distrib_server, 1); +} void xics_init_IRQ(void) { @@ -563,8 +572,7 @@ nextnode: for (; i < NR_IRQS; ++i) get_irq_desc(i)->handler = &xics_pic; - ops->cppr_info(boot_cpuid, 0xff); - iosync(); + xics_setup_cpu(); ppc64_boot_msg(0x21, "XICS Done"); } diff -puN include/asm-ppc64/xics.h~ppc64-cleanup-global-interrupt-queue include/asm-ppc64/xics.h --- linux-2.6.12-rc2-mm3/include/asm-ppc64/xics.h~ppc64-cleanup-global-interrupt-queue 2005-04-12 17:34:13.000000000 +0530 +++ linux-2.6.12-rc2-mm3-sharada/include/asm-ppc64/xics.h 2005-04-12 17:34:38.000000000 +0530 @@ -30,7 +30,4 @@ struct xics_ipi_struct { extern struct xics_ipi_struct xics_ipi_message[NR_CPUS] __cacheline_aligned; -extern unsigned int default_distrib_server; -extern unsigned int interrupt_server_size; - #endif /* _PPC64_KERNEL_XICS_H */ _ From sharada at in.ibm.com Thu Apr 14 00:08:50 2005 From: sharada at in.ibm.com (R Sharada) Date: Wed, 13 Apr 2005 19:38:50 +0530 Subject: [PATCH 2/3] native hash clear In-Reply-To: <20050413140807.GB5081@in.ibm.com> References: <20050413140605.GA5081@in.ibm.com> <20050413140807.GB5081@in.ibm.com> Message-ID: <20050413140850.GC5081@in.ibm.com> Add code to clear the hash table and invalidate the tlb for native (SMP, non-LPAR) mode. Supports 16M and 4k pages. 
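native_hpte_clear() has to issue a tlbie for every valid HPTE, but an HPTE stores only the abbreviated virtual page number (AVPN). The slot2va() helper in the mmu.h hunk of this patch therefore rebuilds the missing page-index bits from the PTEG the entry sits in. The sketch below re-derives that arithmetic in user space as a round trip. It assumes 4K pages, 256MB segments, 8 HPTEs per PTEG, a primary hash of VSID XOR page index with the secondary hash as its one's complement, and an AVPN equal to va >> 23, which is what slot2va() relies on. Large (16M) pages, which slot2va() handles by skipping the page-index step, are not modelled. All function names are illustrative, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 4K pages, 256MB segments, 8 HPTEs per group (PTEG). */
#define PAGE_SHIFT      12
#define SID_SHIFT       28
#define AVPN_SHIFT      23
#define HPTES_PER_GROUP 8

/* Primary hash: VSID XOR the 16-bit page index within the segment. */
static uint64_t primary_hash(uint64_t va)
{
	uint64_t vsid = va >> SID_SHIFT;
	uint64_t page = (va >> PAGE_SHIFT) & ((1ULL << (SID_SHIFT - PAGE_SHIFT)) - 1);

	return vsid ^ page;
}

/* Global slot number of HPTE 'idx' in the PTEG that 'va' hashes to. */
static uint64_t make_slot(uint64_t va, int secondary, unsigned idx, uint64_t hash_mask)
{
	uint64_t hash = primary_hash(va);
	uint64_t pteg = (secondary ? ~hash : hash) & hash_mask;

	assert(idx < HPTES_PER_GROUP);
	return pteg * HPTES_PER_GROUP + idx;
}

/* Same steps as slot2va() in the patch (small pages only), mask passed in. */
static uint64_t slot_to_va(uint64_t avpn, int secondary, uint64_t slot, uint64_t hash_mask)
{
	uint64_t va = avpn << AVPN_SHIFT;	/* bits 23 and up */
	uint64_t pteg = slot / HPTES_PER_GROUP;
	uint64_t vpi;

	if (secondary)
		pteg = ~pteg;			/* undo the secondary hash */

	/* hash == VSID ^ page, so XOR-ing the VSID back out leaves the page index. */
	vpi = ((va >> SID_SHIFT) ^ pteg) & hash_mask;
	va |= vpi << PAGE_SHIFT;		/* fill in bits 12 and up */
	return va;
}

/* Round trip: does slot_to_va() recover the page-aligned address? */
static int roundtrip_ok(uint64_t va, int secondary, unsigned idx, uint64_t hash_mask)
{
	uint64_t slot = make_slot(va, secondary, idx, hash_mask);
	uint64_t avpn = va >> AVPN_SHIFT;

	return slot_to_va(avpn, secondary, slot, hash_mask) ==
	       (va & ~((1ULL << PAGE_SHIFT) - 1));
}
```

Bits 12 to 22 of the address come only from the PTEG index, so in this sketch the round trip holds only when hash_mask covers at least 11 bits (0x7ff). With a narrower mask, roundtrip_ok() can return 0.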
Signed-off-by: Milton Miller
Signed-off-by: R Sharada
---

 linux-2.6.12-rc2-mm3-sharada/arch/ppc64/mm/hash_native.c | 47 ++++++++++++++-
 linux-2.6.12-rc2-mm3-sharada/include/asm-ppc64/mmu.h | 22 +++++++
 2 files changed, 68 insertions(+), 1 deletion(-)

diff -puN arch/ppc64/mm/hash_native.c~ppc64-native-hash-clear arch/ppc64/mm/hash_native.c
--- linux-2.6.12-rc2-mm3/arch/ppc64/mm/hash_native.c~ppc64-native-hash-clear	2005-04-12 17:36:12.000000000 +0530
+++ linux-2.6.12-rc2-mm3-sharada/arch/ppc64/mm/hash_native.c	2005-04-12 17:36:35.000000000 +0530
@@ -304,6 +304,50 @@ static void native_hpte_invalidate(unsig
 	local_irq_restore(flags);
 }
+/*
+ * clear all mappings on kexec. All cpus are in real mode (or they will
+ * be when they isi), and we are the only one left. We rely on our kernel
+ * mapping being 0xC0's and the hardware ignoring those two real bits.
+ *
+ * TODO: add batching support when enabled. remember, no dynamic memory here,
+ * although there is the control page available...
+ */
+static void native_hpte_clear(void)
+{
+	unsigned long slot, slots, flags;
+	HPTE *hptep = htab_address;
+	Hpte_dword0 dw0;
+	unsigned long pteg_count;
+
+	pteg_count = htab_hash_mask + 1;
+
+	local_irq_save(flags);
+
+	/* we take the tlbie lock and hold it. Some hardware will
+	 * deadlock if we try to tlbie from two processors at once.
+	 */
+	spin_lock(&native_tlbie_lock);
+
+	slots = pteg_count * HPTES_PER_GROUP;
+
+	for (slot = 0; slot < slots; slot++, hptep++) {
+		/*
+		 * we could lock the pte here, but we are the only cpu
+		 * running, right? and for crash dump, we probably
+		 * don't want to wait for a maybe bad cpu.
+		 */
+		dw0 = hptep->dw0.dw0;
+
+		if (dw0.v) {
+			hptep->dw0.dword0 = 0;
+			tlbie(slot2va(dw0.avpn, dw0.l, dw0.h, slot), dw0.l);
+		}
+	}
+
+	spin_unlock(&native_tlbie_lock);
+	local_irq_restore(flags);
+}
+
 static void native_flush_hash_range(unsigned long context,
 				    unsigned long number, int local)
 {
@@ -416,7 +460,8 @@ void hpte_init_native(void)
 	ppc_md.hpte_updatepp = native_hpte_updatepp;
 	ppc_md.hpte_updateboltedpp = native_hpte_updateboltedpp;
 	ppc_md.hpte_insert = native_hpte_insert;
-	ppc_md.hpte_remove = native_hpte_remove;
+	ppc_md.hpte_remove = native_hpte_remove;
+	ppc_md.hpte_clear_all = native_hpte_clear;
 	if (tlb_batching_enabled())
 		ppc_md.flush_hash_range = native_flush_hash_range;
 	htab_finish_init();
diff -puN include/asm-ppc64/mmu.h~ppc64-native-hash-clear include/asm-ppc64/mmu.h
--- linux-2.6.12-rc2-mm3/include/asm-ppc64/mmu.h~ppc64-native-hash-clear	2005-04-12 17:36:12.000000000 +0530
+++ linux-2.6.12-rc2-mm3-sharada/include/asm-ppc64/mmu.h	2005-04-12 17:36:35.000000000 +0530
@@ -164,6 +164,28 @@ static inline void tlbiel(unsigned long
 	asm volatile("ptesync": : :"memory");
 }
+static inline unsigned long slot2va(unsigned long avpn, unsigned long large,
+		unsigned long secondary, unsigned long slot)
+{
+	unsigned long va;
+
+	va = avpn << 23;
+
+	if (!large) {
+		unsigned long vpi, pteg;
+
+		pteg = slot / HPTES_PER_GROUP;
+		if (secondary)
+			pteg = ~pteg;
+
+		vpi = ((va >> 28) ^ pteg) & htab_hash_mask;
+
+		va |= vpi << PAGE_SHIFT;
+	}
+
+	return va;
+}
+
 /*
  * Handle a fault by adding an HPTE. If the address can't be determined
  * to be valid via Linux page tables, return 1.
If handled return 0
_

From sharada at in.ibm.com Thu Apr 14 00:09:48 2005
From: sharada at in.ibm.com (R Sharada)
Date: Wed, 13 Apr 2005 19:39:48 +0530
Subject: [PATCH 3/3] kexec ppc64
In-Reply-To: <20050413140850.GC5081@in.ibm.com>
References: <20050413140605.GA5081@in.ibm.com> <20050413140807.GB5081@in.ibm.com> <20050413140850.GC5081@in.ibm.com>
Message-ID: <20050413140948.GD5081@in.ibm.com>

kexec support for ppc64 platforms.

A couple of notes:

1) We copy the pages in virtual mode, using the full base kernel and a
statically allocated stack. At kexec_prepare time we scan the pages and, if
any overlap our (0, _end[]) range, we return -ETXTBSY.

On PowerPC 64 systems running in LPAR (logical partitioning) mode, only a
small region of memory, referred to as the RMO, can be accessed in real
mode. Since Linux runs with only one zone of memory in the memory allocator,
and that zone can be orders of magnitude larger than the RMO, looping until
the allocator happens to return source pages inside that region is not
feasible. Copying in virtual mode means we do not have to write hash table
generation code and call the hypervisor to insert translations; instead we
rely on the pinned kernel linear mapping.

The kernel already has "move to linked location" built in, so there is no
requirement to load it at 0. If we want to load something other than a
kernel, a stub can be written to copy a linear chunk in real mode.

2) The start entry point gets passed parameters from the kernel. Slaves are
started at a fixed address after code is copied from the entry point.

All CPUs get passed their firmware-assigned physical id in r3 (most calling
conventions use this register for the first argument). This is used to
distinguish each CPU from all other CPUs. Since firmware is no longer
around, the only way to provide this information is to pass it in.

A single CPU, referred to here as the master and the one executing the kexec
call, branches to start with the address of start in r4.
While this address could be calculated, we have to load it into a GPR to
branch to this point anyway, so specifying which register holds it costs
nothing. A stack of unspecified size is available at r1 (also the common
calling convention).

All remaining running CPUs are sent to begin executing at absolute address
0x60 after the first 0x100 bytes from start have been copied to address 0.
This convention was chosen because it matches what the kernel has been
doing itself (only gpr3 is defined).

Note: this is not quite the convention of the kexec bootblock v2 in the
kernel. A stub has been written to convert between them, and we may adjust
the kernel in the future to allow this directly without any stub.

3) Destination pages can be placed anywhere, even where they would not be
accessible in real mode. This will allow us to place RAM disks above the
RMO if we choose.

Signed-off-by: Milton Miller
Signed-off-by: R Sharada
---

 linux-2.6.12-rc2-mm3-sharada/arch/ppc64/Kconfig | 17
 linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/Makefile | 1
 linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/head.S | 6
 linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/machine_kexec.c | 301 ++++++++++
 linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/misc.S | 179 +++++
 linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/mpic.c | 29
 linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/mpic.h | 3
 linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/pSeries_setup.c | 10
 linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/setup.c | 19
 linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/xics.c | 25
 linux-2.6.12-rc2-mm3-sharada/include/asm-ppc64/kexec.h | 45 +
 linux-2.6.12-rc2-mm3-sharada/include/asm-ppc64/machdep.h | 1
 linux-2.6.12-rc2-mm3-sharada/include/asm-ppc64/xics.h | 1
 13 files changed, 620 insertions(+), 17 deletions(-)

diff -puN arch/ppc64/Kconfig~kexec-ppc64 arch/ppc64/Kconfig
--- linux-2.6.12-rc2-mm3/arch/ppc64/Kconfig~kexec-ppc64	2005-04-12 17:46:06.000000000 +0530
+++ linux-2.6.12-rc2-mm3-sharada/arch/ppc64/Kconfig	2005-04-12 17:47:29.000000000 +0530
@@ -119,6 +119,23 @@ config PPC_SPLPAR
 	  processors, that is, which share physical processors between
 	  two or more partitions.
+config KEXEC
+	bool "kexec system call (EXPERIMENTAL)"
+	depends on PPC_MULTIPLATFORM && EXPERIMENTAL
+	help
+	  kexec is a system call that implements the ability to shut down your
+	  current kernel, and to start another kernel. It is like a reboot
+	  but it is independent of the system firmware. And like a reboot
+	  you can start any kernel with it, not just Linux.
+
+	  The name comes from the similarity to the exec system call.
+
+	  It is an ongoing process to be certain the hardware in a machine
+	  is properly shut down, so do not be surprised if this code does not
+	  initially work for you. It may help to enable device hotplugging
+	  support. As of this writing the exact hardware interface is
+	  strongly in flux, so no good recommendation can be made.
+
 config IBMVIO
 	depends on PPC_PSERIES || PPC_ISERIES
 	bool
diff -puN arch/ppc64/kernel/Makefile~kexec-ppc64 arch/ppc64/kernel/Makefile
--- linux-2.6.12-rc2-mm3/arch/ppc64/kernel/Makefile~kexec-ppc64	2005-04-12 17:46:06.000000000 +0530
+++ linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/Makefile	2005-04-12 17:47:29.000000000 +0530
@@ -34,6 +34,7 @@ obj-$(CONFIG_PPC_PSERIES) += pSeries_pci
 			pSeries_nvram.o rtasd.o ras.o pSeries_reconfig.o \
 			xics.o rtas.o pSeries_setup.o pSeries_iommu.o
+obj-$(CONFIG_KEXEC)		+= machine_kexec.o
 obj-$(CONFIG_EEH)		+= eeh.o
 obj-$(CONFIG_PROC_FS)		+= proc_ppc64.o
 obj-$(CONFIG_RTAS_FLASH)	+= rtas_flash.o
diff -puN arch/ppc64/kernel/head.S~kexec-ppc64 arch/ppc64/kernel/head.S
--- linux-2.6.12-rc2-mm3/arch/ppc64/kernel/head.S~kexec-ppc64	2005-04-12 17:46:06.000000000 +0530
+++ linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/head.S	2005-04-12 17:47:29.000000000 +0530
@@ -1194,7 +1194,7 @@ _GLOBAL(pSeries_secondary_smp_init)
 	bl	.__restore_cpu_setup
 	/* Set up a paca value for this processor. Since we have the
-	 * physical cpu id in r3, we need to search the pacas to find
+	 * physical cpu id in r24, we need to search the pacas to find
 	 * which logical id maps to our physical one.
 	 */
 	LOADADDR(r13, paca)		/* Get base vaddr of paca array */
@@ -1207,8 +1207,8 @@ _GLOBAL(pSeries_secondary_smp_init)
 	cmpwi	r5,NR_CPUS
 	blt	1b
-99:	HMT_LOW				/* Couldn't find our CPU id */
-	b	99b
+	mr	r3,r24			/* not found, copy phys to r3 */
+	b	.kexec_wait		/* next kernel might do better */
 2:	mtspr	SPRG3,r13		/* Save vaddr of paca in SPRG3 */
 	/* From now on, r24 is expected to be logica cpuid */
diff -puN /dev/null arch/ppc64/kernel/machine_kexec.c
--- /dev/null	2005-04-12 01:39:31.008997000 +0530
+++ linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/machine_kexec.c	2005-04-12 17:47:29.000000000 +0530
@@ -0,0 +1,301 @@
+/*
+ * machine_kexec.c - handle transition of Linux booting another kernel
+ *
+ * Copyright (C) 2004-2005, IBM Corp.
+ *
+ * Created by: Milton D Miller II
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2. See the file COPYING for more details.
+ */
+
+
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include /* _end */
+#include
+
+#define HASH_GROUP_SIZE 0x80	/* size of each hash group, asm/mmu.h */
+
+/* Have this around till we move it into crash specific file */
+note_buf_t crash_notes[NR_CPUS];
+
+/* Dummy for now. Not sure if we need to have a crash shutdown in here
+ * and, if so, what it will achieve. Leaving it in for now so the code
+ * compiles in the generic kexec environment.
+ */
+void machine_crash_shutdown(void)
+{
+	/* do nothing right now */
+	/* smp_relase_cpus() if we want smp on panic kernel */
+	/* cpu_irq_down to isolate us until we are ready */
+}
+
+int machine_kexec_prepare(struct kimage *image)
+{
+	int i;
+	unsigned long begin, end;	/* limits of segment */
+	unsigned long low, high;	/* limits of blocked memory range */
+	struct device_node *node;
+	unsigned long *basep;
+	unsigned int *sizep;
+
+	if (!ppc_md.hpte_clear_all)
+		return -ENOENT;
+
+	/*
+	 * Since we use the kernel fault handlers and paging code to
+	 * handle the virtual mode, we must make sure no destination
+	 * overlaps kernel static data or bss.
+	 */
+	for(i = 0; i < image->nr_segments; i++)
+		if (image->segment[i].mem < __pa(_end))
+			return -ETXTBSY;
+
+	/*
+	 * For non-LPAR, we absolutely can not overwrite the mmu hash
+	 * table, since we are still using the bolted entries in it to
+	 * do the copy. Check that here.
+	 *
+	 * It is safe if the end is below the start of the blocked
+	 * region (end <= low), or if the beginning is after the
+	 * end of the blocked region (begin >= high). Use the
+	 * boolean identity !(a || b) === (!a && !b).
+	 */
+	if (htab_address) {
+		low = __pa(htab_address);
+		high = low + (htab_hash_mask + 1) * HASH_GROUP_SIZE;
+
+		for(i = 0; i < image->nr_segments; i++) {
+			begin = image->segment[i].mem;
+			end = begin + image->segment[i].memsz;
+
+			if ((begin < high) && (end > low))
+				return -ETXTBSY;
+		}
+	}
+
+	/* We also should not overwrite the tce tables */
+	for (node = of_find_node_by_type(NULL, "pci"); node != NULL;
+			node = of_find_node_by_type(node, "pci")) {
+		basep = (unsigned long *)get_property(node, "linux,tce-base",
+							NULL);
+		sizep = (unsigned int *)get_property(node, "linux,tce-size",
+							NULL);
+		if (basep == NULL || sizep == NULL)
+			continue;
+
+		low = *basep;
+		high = low + (*sizep);
+
+		for(i = 0; i < image->nr_segments; i++) {
+			begin = image->segment[i].mem;
+			end = begin + image->segment[i].memsz;
+
+			if ((begin < high) && (end > low))
+				return -ETXTBSY;
+		}
+	}
+
+	return 0;
+}
+
+void machine_kexec_cleanup(struct kimage *image)
+{
+	/* we do nothing in prepare that needs to be undone */
+}
+
+#define IND_FLAGS (IND_DESTINATION | IND_INDIRECTION | IND_DONE | IND_SOURCE)
+
+static void copy_segments(unsigned long ind)
+{
+	unsigned long entry;
+	unsigned long *ptr;
+	void *dest;
+	void *addr;
+
+	/*
+	 * We rely on kexec_load to create a list that properly
+	 * initializes these pointers before they are used.
+	 * We will still crash if the list is wrong, but at least
+	 * the compiler will be quiet.
+	 */
+	ptr = NULL;
+	dest = NULL;
+
+	for (entry = ind; !(entry & IND_DONE); entry = *ptr++) {
+		addr = __va(entry & PAGE_MASK);
+
+		switch (entry & IND_FLAGS) {
+		case IND_DESTINATION:
+			dest = addr;
+			break;
+		case IND_INDIRECTION:
+			ptr = addr;
+			break;
+		case IND_SOURCE:
+			copy_page(dest, addr);
+			dest += PAGE_SIZE;
+		}
+	}
+}
+
+void kexec_copy_flush(struct kimage *image)
+{
+	long i, nr_segments = image->nr_segments;
+	struct kexec_segment ranges[KEXEC_SEGMENT_MAX];
+
+	/* save the ranges on the stack to efficiently flush the icache */
+	memcpy(ranges, image->segment, sizeof(ranges));
+
+	/*
+	 * After this call we may not use anything allocated in dynamic
+	 * memory, including *image.
+	 *
+	 * Only globals and the stack are allowed.
+	 */
+	copy_segments(image->head);
+
+	/*
+	 * we need to clear the icache for all dest pages sometime,
+	 * including ones that were in place on the original copy
+	 */
+	for (i = 0; i < nr_segments; i++)
+		flush_icache_range(ranges[i].mem + KERNELBASE,
+				ranges[i].mem + KERNELBASE +
+				ranges[i].memsz);
+}
+
+#ifdef CONFIG_SMP
+
+/* FIXME: we should schedule this function to be called on all cpus based
+ * on calling the interrupts, but we would like to call it off irq level
+ * so that the interrupt controller is clean.
+ */
+void kexec_smp_down(void *arg)
+{
+	if (ppc_md.cpu_irq_down)
+		ppc_md.cpu_irq_down();
+
+	local_irq_disable();
+	kexec_smp_wait();
+	/* NOTREACHED */
+}
+
+static void kexec_prepare_cpus(void)
+{
+	int my_cpu, i, notified=-1;
+
+	smp_call_function(kexec_smp_down, NULL, 0, /* wait */0);
+	my_cpu = get_cpu();
+
+	/* check the other cpus are now down (via paca hw cpu id == -1) */
+	for (i=0; i < NR_CPUS; i++) {
+		if (i == my_cpu)
+			continue;
+
+		while (paca[i].hw_cpu_id != -1) {
+			if (!cpu_possible(i)) {
+				printk("kexec: cpu %d hw_cpu_id %d is not"
+						" possible, ignoring\n",
+						i, paca[i].hw_cpu_id);
+				break;
+			}
+			if (!cpu_online(i)) {
+				/* Fixme: this can be spinning in
+				 * pSeries_secondary_wait with a paca
+				 * waiting for it to go online.
+				 */
+				printk("kexec: cpu %d hw_cpu_id %d is not"
+						" online, ignoring\n",
+						i, paca[i].hw_cpu_id);
+				break;
+			}
+			if (i != notified) {
+				printk( "kexec: waiting for cpu %d (physical"
+						" %d) to go down\n",
+						i, paca[i].hw_cpu_id);
+				notified = i;
+			}
+		}
+	}
+
+	/* after we tell the others to go down */
+	if (ppc_md.cpu_irq_down)
+		ppc_md.cpu_irq_down();
+
+	put_cpu();
+
+	local_irq_disable();
+}
+
+#else /* ! SMP */
+
+static void kexec_prepare_cpus(void)
+{
+	/*
+	 * move the secondaries to us so that we can copy
+	 * the new kernel 0-0x100 safely
+	 *
+	 * do this if kexec in setup.c ?
+	 */
+	smp_relase_cpus();
+	if (ppc_md.cpu_irq_down)
+		ppc_md.cpu_irq_down();
+	local_irq_disable();
+}
+
+#endif /* SMP */
+
+/*
+ * kexec thread structure and stack.
+ *
+ * We need to make sure that this is 16384-byte aligned due to the
+ * way process stacks are handled. It also must be statically allocated
+ * or allocated as part of the kimage, because everything else may be
+ * overwritten when we copy the kexec image. We piggyback on the
+ * "init_task" linker section here to statically allocate a stack.
+ *
+ * We could use a smaller stack if we don't care about anything using
+ * current, but that audit has not been performed.
+ */
+union thread_union kexec_stack
+	__attribute__((__section__(".data.init_task"))) = { };
+
+/* Our assembly helper, in misc.S */
+extern NORET_TYPE void kexec_sequence(void *newstack, unsigned long start,
+		void *image, void *control, void (*clear_all)(void)) ATTRIB_NORET;
+
+/* too late to fail here */
+void machine_kexec(struct kimage *image)
+{
+
+	/* prepare control code if any */
+
+	/* shut down other cpus into our wait loop and quiesce interrupts */
+	kexec_prepare_cpus();
+
+	/* switch to a statically allocated stack. Based on irq stack code.
+	 * XXX: the task struct will likely be invalid once we do the copy!
+	 */
+	kexec_stack.thread_info.task = current_thread_info()->task;
+	kexec_stack.thread_info.flags = 0;
+
+	/* Some things are best done in assembly. Finding globals with
+	 * a toc is easier in C, so pass in what we can.
+	 */
+	kexec_sequence(&kexec_stack, image->start, image,
+			page_address(image->control_code_page),
+			ppc_md.hpte_clear_all);
+	/* NOTREACHED */
+}
diff -puN arch/ppc64/kernel/misc.S~kexec-ppc64 arch/ppc64/kernel/misc.S
--- linux-2.6.12-rc2-mm3/arch/ppc64/kernel/misc.S~kexec-ppc64	2005-04-12 17:46:06.000000000 +0530
+++ linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/misc.S	2005-04-12 17:47:29.000000000 +0530
@@ -680,6 +680,181 @@ _GLOBAL(kernel_thread)
 	ld	r30,-16(r1)
 	blr
+/* kexec_wait(phys_cpu)
+ *
+ * wait for the flag to change, indicating this kernel is going away but
+ * the slave code for the next one is at addresses 0 to 0x100.
+ *
+ * This is used by all slaves.
+ *
+ * Physical (hardware) cpu id should be in r3.
+ */
+_GLOBAL(kexec_wait)
+	bl	1f
+1:	mflr	r5
+	addi	r5,5,kexec_flag-1b
+
+99:	HMT_LOW
+#ifdef CONFIG_KEXEC		/* use no memory without kexec */
+	lwz	r4,0(r5)
+	cmpwi	0,r4,0
+	bnea	0x60
+#endif
+	b	99b
+
+/* this can be in text because we won't change it until we are
+ * running in real mode anyway
+ */
+kexec_flag:
+	.long	0
+
+
+#ifdef CONFIG_KEXEC
+
+/* kexec_smp_wait(void)
+ *
+ * call with interrupts off
+ * note: this is a terminal routine, it does not save lr
+ *
+ * get phys id from paca
+ * set paca id to -1 to say we got here
+ * switch to real mode
+ * join other cpus in kexec_wait(phys_id)
+ */
+_GLOBAL(kexec_smp_wait)
+	lhz	r3,PACAHWCPUID(r13)
+	li	r4,-1
+	sth	r4,PACAHWCPUID(r13)	/* let others know we left */
+	bl	real_mode
+	b	.kexec_wait
+
+/*
+ * switch to real mode (turn mmu off)
+ * we use the early kernel trick that the hardware ignores bits
+ * 0 and 1 (big endian) of the effective address in real mode
+ *
+ * don't overwrite r3 here, it is live for kexec_wait above.
+ */
+real_mode:	/* assume normal blr return */
+1:	li	r9,MSR_RI
+	li	r10,MSR_DR|MSR_IR
+	mflr	r11		/* return address to SRR0 */
+	mfmsr	r12
+	andc	r9,r12,r9
+	andc	r10,r12,r10
+
+	mtmsrd	r9,1
+	mtspr	SPRN_SRR1,r10
+	mtspr	SPRN_SRR0,r11
+	rfid
+
+
+/*
+ * kexec_sequence(newstack, start, image, control, clear_all())
+ *
+ * does the grungy work with stack switching and real mode switches
+ * also does simple calls to other code
+ */
+
+_GLOBAL(kexec_sequence)
+	mflr	r0
+	std	r0,16(r1)
+
+	/* switch stacks to newstack -- &kexec_stack.stack */
+	stdu	r1,THREAD_SIZE-112(r3)
+	mr	r1,r3
+
+	li	r0,0
+	std	r0,16(r1)
+
+	/* save regs for local vars on new stack.
+	 * yes, we won't go back, but ...
+	 */
+	std	r31,-8(r1)
+	std	r30,-16(r1)
+	std	r29,-24(r1)
+	std	r28,-32(r1)
+	std	r27,-40(r1)
+	std	r26,-48(r1)
+	std	r25,-56(r1)
+
+	stdu	r1,-112-64(r1)
+
+	/* save args into preserved regs */
+	mr	r31,r3			/* newstack (both) */
+	mr	r30,r4			/* start (real) */
+	mr	r29,r5			/* image (virt) */
+	mr	r28,r6			/* control, unused */
+	mr	r27,r7			/* clear_all() fn desc */
+	mr	r26,r8			/* spare */
+	lhz	r25,PACAHWCPUID(r13)	/* get our phys cpu from paca */
+
+	/* disable interrupts, we are overwriting kernel data next */
+	mfmsr	r3
+	rlwinm	r3,r3,0,17,15
+	mtmsrd	r3,1
+
+	/* copy dest pages, flush whole dest image */
+	mr	r3,r29
+	bl	.kexec_copy_flush	/* (image) */
+
+	/* turn off mmu */
+	bl	real_mode
+
+	/* clear out hardware hash page table and tlb */
+	ld	r5,0(r27)		/* deref function descriptor */
+	mtctr	r5
+	bctrl				/* ppc_md.hpte_clear_all(void); */
+
+/*
+ * kexec image calling is:
+ *   the first 0x100 bytes of the entry point are copied to 0
+ *
+ *   all slaves branch to slave = 0x60 (absolute)
+ *           slave(phys_cpu_id);
+ *
+ *   master goes to start = entry point
+ *           start(phys_cpu_id, start, 0);
+ *
+ *
+ * a wrapper is needed to call existing kernels, here is an approximate
+ * description of one method:
+ *
+ * v2: (2.6.10)
+ *   start will be near the boot_block (maybe 0x100 bytes before it?)
+ *   it will have a 0x60, which will b to boot_block, where it will wait
+ *   and 0 will store phys into struct boot-block and load r3 from there,
+ *   copy kernel 0-0x100 and tell slaves to back down to 0x60 again
+ *
+ * v1: (2.6.9)
+ *   boot block will have all cpus scanning device tree to see if they
+ *   are the boot cpu ?????
+ *   other device tree differences (prop sizes, va vs pa, etc)...
+ */
+
+	/* copy 0x100 bytes starting at start to 0 */
+	li	r3,0
+	mr	r4,r30
+	li	r5,0x100
+	li	r6,0
+	bl	.copy_and_flush	/* (dest, src, copy limit, start offset) */
+1:	/* assume normal blr return */
+
+	/* release other cpus to the new kernel secondary start at 0x60 */
+	mflr	r5
+	li	r6,1
+	stw	r6,kexec_flag-1b(5)
+	mr	r3,r25	# my phys cpu
+	mr	r4,r30	# start, aka phys mem offset
+	mtlr	4
+	li	r5,0
+	blr	/* image->start(physid, image->start, 0); */
+#endif /* CONFIG_KEXEC */
+
 #ifdef CONFIG_PPC_RTAS		/* hack hack hack */
 #define ppc_rtas	sys_ni_syscall
 #endif
 /* Why isn't this a) automatic, b) written in 'C'? */
 	.balign 8
 _GLOBAL(sys_call_table32)
@@ -951,7 +1126,7 @@ _GLOBAL(sys_call_table32)
 	.llong .compat_sys_mq_timedreceive /* 265 */
 	.llong .compat_sys_mq_notify
 	.llong .compat_sys_mq_getsetattr
-	.llong .sys_ni_syscall		/* 268 reserved for sys_kexec_load */
+	.llong .compat_sys_kexec_load	/* 268 reserved for sys_kexec_load */
 	.llong .sys32_add_key
 	.llong .sys32_request_key
 	.llong .compat_sys_keyctl
@@ -1233,7 +1408,7 @@ _GLOBAL(sys_call_table)
 	.llong .sys_mq_timedreceive	/* 265 */
 	.llong .sys_mq_notify
 	.llong .sys_mq_getsetattr
-	.llong .sys_ni_syscall		/* 268 reserved for sys_kexec_load */
+	.llong .sys_kexec_load		/* 268 reserved for sys_kexec_load */
 	.llong .sys_add_key
 	.llong .sys_request_key		/* 270 */
 	.llong .sys_keyctl
diff -puN arch/ppc64/kernel/mpic.c~kexec-ppc64 arch/ppc64/kernel/mpic.c
--- linux-2.6.12-rc2-mm3/arch/ppc64/kernel/mpic.c~kexec-ppc64	2005-04-12 17:46:06.000000000 +0530
+++ linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/mpic.c	2005-04-12 17:47:29.000000000 +0530
@@ -792,6 +792,35 @@ void mpic_setup_this_cpu(void)
 #endif /* CONFIG_SMP */
 }
+/*
+ * XXX: someone who knows mpic should check this.
+ * do we need to eoi the ipi here (see xics comments)?
+ * or can we reset the mpic in the new kernel?
+ */
+void mpic_teardown_this_cpu(void)
+{
+	struct mpic *mpic = mpic_primary;
+	unsigned long flags;
+	u32 msk = 1 << hard_smp_processor_id();
+	unsigned int i;
+
+	BUG_ON(mpic == NULL);
+
+	DBG("%s: teardown_this_cpu(%d)\n", mpic->name, hard_smp_processor_id());
+	spin_lock_irqsave(&mpic_lock, flags);
+
+	/* let the mpic know we don't want intrs. */
+	for (i = 0; i < mpic->num_sources ; i++)
+		mpic_irq_write(i, MPIC_IRQ_DESTINATION,
+			mpic_irq_read(i, MPIC_IRQ_DESTINATION) & ~msk);
+
+	/* Set current processor priority to max */
+	mpic_cpu_write(MPIC_CPU_CURRENT_TASK_PRI, 0xf);
+
+	spin_unlock_irqrestore(&mpic_lock, flags);
+}
+
+
 void mpic_send_ipi(unsigned int ipi_no, unsigned int cpu_mask)
 {
 	struct mpic *mpic = mpic_primary;
diff -puN arch/ppc64/kernel/mpic.h~kexec-ppc64 arch/ppc64/kernel/mpic.h
--- linux-2.6.12-rc2-mm3/arch/ppc64/kernel/mpic.h~kexec-ppc64	2005-04-12 17:46:06.000000000 +0530
+++ linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/mpic.h	2005-04-12 17:47:29.000000000 +0530
@@ -255,6 +255,9 @@ extern unsigned int mpic_irq_get_priorit
 /* Setup a non-boot CPU */
 extern void mpic_setup_this_cpu(void);
+/* Clean up for kexec (or cpu offline or ...) */
+extern void mpic_teardown_this_cpu(void);
+
 /* Request IPIs on primary mpic */
 extern void mpic_request_ipis(void);
diff -puN arch/ppc64/kernel/pSeries_setup.c~kexec-ppc64 arch/ppc64/kernel/pSeries_setup.c
--- linux-2.6.12-rc2-mm3/arch/ppc64/kernel/pSeries_setup.c~kexec-ppc64	2005-04-12 17:46:06.000000000 +0530
+++ linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/pSeries_setup.c	2005-04-12 17:47:29.000000000 +0530
@@ -195,14 +195,16 @@ static void __init pSeries_setup_arch(vo
 {
 	/* Fixup ppc_md depending on the type of interrupt controller */
 	if (ppc64_interrupt_controller == IC_OPEN_PIC) {
-		ppc_md.init_IRQ = pSeries_init_mpic;
-		ppc_md.get_irq = mpic_get_irq;
+		ppc_md.init_IRQ = pSeries_init_mpic;
+		ppc_md.get_irq = mpic_get_irq;
+		ppc_md.cpu_irq_down = mpic_teardown_this_cpu;
 		/* Allocate the mpic now, so that find_and_init_phbs() can
 		 * fill the ISUs */
 		pSeries_setup_mpic();
 	} else {
-		ppc_md.init_IRQ = xics_init_IRQ;
-		ppc_md.get_irq = xics_get_irq;
+		ppc_md.init_IRQ = xics_init_IRQ;
+		ppc_md.get_irq = xics_get_irq;
+		ppc_md.cpu_irq_down = xics_teardown_cpu;
 	}
 #ifdef CONFIG_SMP
diff -puN arch/ppc64/kernel/setup.c~kexec-ppc64 arch/ppc64/kernel/setup.c
--- linux-2.6.12-rc2-mm3/arch/ppc64/kernel/setup.c~kexec-ppc64	2005-04-12 17:46:06.000000000 +0530
+++ linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/setup.c	2005-04-12 17:47:29.000000000 +0530
@@ -677,29 +677,32 @@ void __init setup_system(void)
 	DBG(" <- setup_system()\n");
 }
-
-void machine_restart(char *cmd)
+/* also used by kexec */
+void machine_shutdown(void)
 {
 	if (ppc_md.nvram_sync)
 		ppc_md.nvram_sync();
+}
+
+void machine_restart(char *cmd)
+{
+	machine_shutdown();
 	ppc_md.restart(cmd);
 }
 EXPORT_SYMBOL(machine_restart);
-
+
 void machine_power_off(void)
 {
-	if (ppc_md.nvram_sync)
-		ppc_md.nvram_sync();
+	machine_shutdown();
 	ppc_md.power_off();
 }
 EXPORT_SYMBOL(machine_power_off);
-
+
 void machine_halt(void)
 {
-	if (ppc_md.nvram_sync)
-		ppc_md.nvram_sync();
+	machine_shutdown();
 	ppc_md.halt();
 }
diff -puN arch/ppc64/kernel/xics.c~kexec-ppc64 arch/ppc64/kernel/xics.c
--- linux-2.6.12-rc2-mm3/arch/ppc64/kernel/xics.c~kexec-ppc64	2005-04-12 17:46:06.000000000 +0530
+++ linux-2.6.12-rc2-mm3-sharada/arch/ppc64/kernel/xics.c	2005-04-12 17:47:29.000000000 +0530
@@ -647,6 +647,31 @@ static void xics_set_affinity(unsigned i
 	}
 }
+void xics_teardown_cpu(void)
+{
+	int cpu = smp_processor_id();
+	int status;
+
+	ops->cppr_info(cpu, 0x00);
+	iosync();
+
+	/*
+	 * we need to EOI the IPI if we got here from kexec down IPI
+	 *
+	 * xics doesn't care if we duplicate an EOI as long as we
+	 * don't EOI and raise priority.
+	 *
+	 * probably need to check all the other interrupts too
+	 * should we be flagging idle loop instead?
+	 * or creating some task to be scheduled?
+	 */
+	ops->xirr_info_set(cpu, XICS_IPI);
+
+	status = rtas_set_indicator(GLOBAL_INTERRUPT_QUEUE,
+		(1UL << interrupt_server_size) - 1 - default_distrib_server, 0);
+	WARN_ON(status != 0);
+}
+
 #ifdef CONFIG_HOTPLUG_CPU
 /* Interrupts are disabled. */
diff -puN /dev/null include/asm-ppc64/kexec.h
--- /dev/null	2005-04-12 01:39:31.008997000 +0530
+++ linux-2.6.12-rc2-mm3-sharada/include/asm-ppc64/kexec.h	2005-04-12 17:48:16.000000000 +0530
@@ -0,0 +1,45 @@
+#ifndef _PPC64_KEXEC_H
+#define _PPC64_KEXEC_H
+
+/*
+ * KEXEC_SOURCE_MEMORY_LIMIT maximum page get_free_page can return.
+ * I.e. Maximum page that is mapped directly into kernel memory,
+ * and kmap is not required.
+ *
+ * Someone correct me if FIXADDR_START - PAGEOFFSET is not the correct
+ * calculation for the amount of memory directly mappable into the
+ * kernel memory space.
+ */
+
+/* Maximum physical address we can use pages from */
+/* XXX: since we copy virt we can use any page we allocate */
+#define KEXEC_SOURCE_MEMORY_LIMIT (-1UL)
+
+/* Maximum address we can reach in physical address mode */
+/* XXX: I want to allow initrd in highmem. otherwise set to rmo on lpar */
+#define KEXEC_DESTINATION_MEMORY_LIMIT (-1UL)
+
+/* Maximum address we can use for the control code buffer */
+/* XXX: unused today, ppc32 uses TASK_SIZE, probably left over from use_mm */
+#define KEXEC_CONTROL_MEMORY_LIMIT (-1UL)
+
+/* XXX: today we don't use this at all, although we have a static stack */
+#define KEXEC_CONTROL_CODE_SIZE 4096
+
+/* The native architecture */
+#define KEXEC_ARCH KEXEC_ARCH_PPC64
+
+#define MAX_NOTE_BYTES 1024
+
+#ifndef __ASSEMBLY__
+
+typedef u32 note_buf_t[MAX_NOTE_BYTES/4];
+
+extern note_buf_t crash_notes[];
+
+extern void kexec_smp_wait(void);	/* get and clear paca physid, wait for
+					   master to copy new code to 0 */
+
+#endif /* __ASSEMBLY__ */
+#endif /* _PPC64_KEXEC_H */
+
diff -puN include/asm-ppc64/machdep.h~kexec-ppc64 include/asm-ppc64/machdep.h
--- linux-2.6.12-rc2-mm3/include/asm-ppc64/machdep.h~kexec-ppc64	2005-04-12 17:46:06.000000000 +0530
+++ linux-2.6.12-rc2-mm3-sharada/include/asm-ppc64/machdep.h	2005-04-12 17:47:29.000000000 +0530
@@ -85,6 +85,7 @@ struct machdep_calls {
 	void		(*init_IRQ)(void);
 	int		(*get_irq)(struct pt_regs *);
+	void		(*cpu_irq_down)(void);
 	/* PCI stuff */
 	void		(*pcibios_fixup)(void);
diff -puN include/asm-ppc64/xics.h~kexec-ppc64 include/asm-ppc64/xics.h
--- linux-2.6.12-rc2-mm3/include/asm-ppc64/xics.h~kexec-ppc64	2005-04-12 17:46:06.000000000 +0530
+++ linux-2.6.12-rc2-mm3-sharada/include/asm-ppc64/xics.h	2005-04-12 17:47:29.000000000 +0530
@@ -17,6 +17,7 @@
 void xics_init_IRQ(void);
 int xics_get_irq(struct pt_regs *);
 void xics_setup_cpu(void);
+void xics_teardown_cpu(void);
 void xics_cause_IPI(int cpu);
 void xics_request_IPIs(void);
 void xics_migrate_irqs_away(void);
_

From sharada at in.ibm.com Thu Apr 14 00:11:57 2005
From: sharada at in.ibm.com (R Sharada)
Date: Wed, 13 Apr 2005 19:41:57 +0530
Subject: [0/5] kexec tools for ppc64
Message-ID: <20050413141157.GE5081@in.ibm.com>

The following set of files, from Milton Miller, provides a basic tools package to
test and use the ppc64 kexec support in the kernel. The tools package
contains the following files:

 - README
 - Makefile
 - fs2dt.c - creates a flattened device tree from the /proc/device-tree interface
 - loadem.c - loads the kexec kernel and lets you kexec into the new kernel
 - v2wrap.S - provides the trampoline code for the transition from the old
   kernel to the new kernel

The tools package can be downloaded as a tarball from the following link:

http://www.kernel.org/pub/linux/kernel/people/suparna/kdump/ppc64-tools-20050331.tar.gz

We are planning to integrate this with the kexec-tools package from Eric
Biederman.

Thanks and Regards,
Sharada

From sharada at in.ibm.com Thu Apr 14 00:12:39 2005
From: sharada at in.ibm.com (R Sharada)
Date: Wed, 13 Apr 2005 19:42:39 +0530
Subject: [1/5] README
In-Reply-To: <20050413141157.GE5081@in.ibm.com>
References: <20050413141157.GE5081@in.ibm.com>
Message-ID: <20050413141239.GF5081@in.ibm.com>

This package contains files used to develop and test kexec on the ppc64
platform. Specifically:

 - v2wrap, a minimal wrapper to call the kernel entry point from the kexec
   entry-point context;
 - fs2dt, a program to generate a boot-block and device tree structure (the
   structured arguments the kernel expects to find on entry) from a file
   system representation; and
 - loadem, a minimalist program to call kexec with user-specified arguments
   ("load the contents of this file here").

After the description are some examples, followed by some build and design
notes.

Description:

v2wrap

  Stub assembly code for the trampoline during kexec boot. It converts the
  master CPU thread's argument registers from those provided by kexec to
  those expected by a 2.6.10 or later kernel. The boot-block and device
  tree are assumed to immediately follow the code, and the kernel is located
  by reading the last word in this stub, which is patched by the kexec
  user-space tools.
Since some platforms do not allow cpus to be stopped and restarted, the cpus must always be running valid code, which expands the trampoline beyond a simple store and load. Secondary cpus are expected to start at address 0x60 in a copy of this code relocated to address 0. The secondary cpus are told to move to the master copy of the code, the kernel code is copied to address 0, and then the secondaries are moved to the kernel's code block before the master cpu thread is transferred to the new kernel. fs2dt fs2dt creates a flattened data block of the device-tree from a file system representation, such as that provided by the kernel in /proc/device-tree. This flattened data block is passed by reference to the new kernel (its address is placed in a register). Because the block must itself appear in the list of reserved memory regions that it contains, the address at which it will be loaded is passed as an argument with the -b flag. The syntax for invoking the tool is ./fs2dt [-c cpu] [-b base] dir (e.g., ./fs2dt -b 0x5000100 /proc/device-tree). The tool writes to stdout, which can be redirected to a file or piped directly into another tool such as loadem. loadem loadem is a small program that can either invoke the reboot syscall with the kexec argument, or read files (and/or stdin) into memory and call the kexec_load syscall without any interpretation of the loaded contents. Although written to test the ppc64 kexec kernel code, it should work on any architecture (the -c option would need a trivial change for little endian).
Arguments supported by loadem: -a address => start a new segment at the given address -c offset => chain this segment: place the current segment's address, anchored at offset bytes, into the previous segment -e => save the current address as the kexec entry point -f file => load this file into the current segment -k => perform a kexec reboot -p => load the kernel for a kexec reboot in the panic case -r size => round up each segment's memsz to the given page-size boundary (the beginning of each segment must be page aligned by the user) -s => read input from stdin and add it to the current segment -t type => provide the native architecture (needed when running loadem as a 32-bit utility on a 64-bit kernel) -z count => zero extend the segment by count bytes by increasing memsz -4, -8 => pad the current segment to a 4- or 8-byte boundary with zeros Examples: In the examples below, the trampoline v2wrap and the device-tree generated by fs2dt are loaded into the first segment. The second segment is the kernel to be loaded, which can be obtained from vmlinux with objcopy. The word anchored (ending) at offset 0x100 in the first segment is patched with the address of the beginning of the second segment, giving v2wrap the location of the kernel. Optionally, a ramdisk can be loaded into an additional segment, which the kernel locates via properties placed in the device-tree. On PPC64, the kernel can be loaded at any word-aligned address that fits within the real mode limit, and the kernel will copy itself to its linked execution location.
Extract the kernel load image from the ELF file: objcopy -O binary vmlinux vmlinux.kexec Example for testing a simple kernel, without a ramdisk (one line): ./fs2dt -b 0x5000100 /proc/device-tree/ | ./loadem -t 21 -a 0x5000000 -e -f v2wrap -s -a 0x6000000 -c 0x100 -f vmlinux.kexec -r 0x1000 Example loading two cpios into an initramfs at address 0x38000000 and clearing 1MB at 768MB on PPC64 with a 4k page size (one line): /fs2dt -b 0x5000100 /proc/device-tree/ | /loadem -t 0x15 -r 0x1000 -a 0x5000000 -e -f /v2wrap -s -a 0x6000000 -c 0x100 -f vmlinux.kexec -a 0x38000000 -f fs_large.cpio.gz -4 -f kexec.cpio.gz -a 0x30000000 -z 0x10000 Notes: The programs were developed with the following goals in mind: 1) Do only what is necessary 2) Be simple and easy to port 3) Provide explicit control to a power user 4) Be of minimal size 5) Be easily ported to other libraries These goals are reflected in choices such as minimalist error reporting and making no calls to printf. Goals 4 and 5 are intended to suit the tools to a limited initramfs environment. While the utilities can be compiled as 32-bit or 64-bit static executables, the included Makefile as posted selects -m64; the -m32 CFLAGS line is present but commented out. When a 32-bit loadem runs on a 64-bit kernel, the compat wrapper requires the native architecture to be specified explicitly; this is done with the -t flag (-t 21, or equivalently -t 0x15, for ppc64, as in the examples above). In addition to the files included, loadem.c includes kexec-syscall.h, which can be obtained from the kexec-tools package. As of our testing with version 1.99, the file was located in the kexec subdirectory. Simply copy it into the directory where you unpacked these files. The anchor point used for the -c flag is defined as the address of the end of the word, which does not move as the word size changes, so the anchor point is independent of word size. For big endian machines, this is the address of the byte following the word, i.e. the word sits at a word offset of -1 from the anchor. For little endian machines, offset 0 should be used and the chain point would be the first byte.
Note that the code currently does not detect endianness; it is hardcoded for big endian. Passing an initramfs or initrd to a ppc64 kernel requires their addresses to be placed as properties in the device tree. While fs2dt will automatically reserve the memory for the initrd, changing the initrd size or load address requires copying the device-tree and editing the binary property files in the copied tree. From sharada at in.ibm.com Thu Apr 14 00:13:01 2005 From: sharada at in.ibm.com (R Sharada) Date: Wed, 13 Apr 2005 19:43:01 +0530 Subject: [2/5] Makefile In-Reply-To: <20050413141239.GF5081@in.ibm.com> References: <20050413141157.GE5081@in.ibm.com> <20050413141239.GF5081@in.ibm.com> Message-ID: <20050413141301.GG5081@in.ibm.com>
CC=/usr/local/ppc64-4.0/bin/gcc
CFLAGS=-m64 --static -ffreestanding
#CFLAGS=-m32 --static -ffreestanding
BACKUP=../cpy/

kexec.cpio.gz: kexec.cpio
	gzip -c9 $< > $@
	gzip -l $@

kexec.cpio: loadem v2wrap fs2dt
	ls -1d $^ | cpio -C4 -ovHnewc > $@

loadem: loadem.c kexec-syscall.h

fs2dt: fs2dt.c

v2wrap.o: v2wrap.S

v2wrap.elf: v2wrap.o
	ld -Ttext=0 -e 0 -o v2wrap.elf v2wrap.o

v2wrap: v2wrap.elf
	objcopy -O binary v2wrap.elf v2wrap

clean:
	sed -ne 's/^\([^=:#]*\):.*/\1/p' Makefile | xargs rm -f

backup: clean
	ls | cpio -ovHnewc > $(BACKUP)tools.`date +%Y%m%d%H%M`
From sharada at in.ibm.com Thu Apr 14 00:14:03 2005 From: sharada at in.ibm.com (R Sharada) Date: Wed, 13 Apr 2005 19:44:03 +0530 Subject: [3/5] flattened device tree creation In-Reply-To: <20050413141301.GG5081@in.ibm.com> References: <20050413141157.GE5081@in.ibm.com> <20050413141239.GF5081@in.ibm.com> <20050413141301.GG5081@in.ibm.com> Message-ID: <20050413141403.GH5081@in.ibm.com> /* * fs2dt: creates a flattened device-tree * * Copyright (C) 2004,2005 Milton D Miller II, IBM Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation (version 2 of the License).
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ #include #include #include #include #include #include #include #include #define MAXPATH 1024 /* max path name length */ #define NAMESPACE 16384 /* max bytes for property names */ #define TREEWORDS 65536 /* max 32 bit words for property values */ #define MEMRESERVE 256 /* max number of reserved memory blocks */ enum { ERR_NONE, ERR_USAGE, ERR_OPENDIR, ERR_READDIR, ERR_STAT, ERR_OPEN, ERR_READ, ERR_RESERVE, }; void err(const char *str, int rc) { if (errno) perror(str); else fprintf(stderr, "%s: unrecoverable error\n", str); exit(rc); } typedef unsigned dvt; struct stat statbuf[1]; char pathname[MAXPATH], *pathstart; char propnames[NAMESPACE]; dvt dtstruct[TREEWORDS], *dt; unsigned long long mem_rsrv[2*MEMRESERVE]; void reserve(unsigned long long where, unsigned long long length) { unsigned long long *mr; mr = mem_rsrv; while(mr[1]) mr += 2; mr[0] = where; mr[1] = length; } /* look for properties we need to reserve memory space for */ void checkprop(char *name, dvt *data) { static unsigned long long base, size, end; if ((data == NULL) && (base || size || end)) err((void *)data, ERR_RESERVE); else if (!strcmp(name, "linux,rtas-base")) base = *data; else if (!strcmp(name, "linux,initrd-start") || !strcmp(name, "linux,tce-base")) base = *(unsigned long long *) data; else if (!strcmp(name, "rtas-size") || !strcmp(name, "linux,tce-size")) size = *data; else if (!strcmp(name, "linux,initrd-end")) end = *(unsigned long long *) data; if (size && end) err(name, ERR_RESERVE); if (base && size) { reserve(base, size); base = size = 0; } if (base && 
end) { reserve(base, end-base); base = end = 0; } } /* * return the property index for a property name, creating a new one * if needed. */ dvt propnum(const char *name) { dvt offset = 0; while(propnames[offset]) if (strcmp(name, propnames+offset)) offset += strlen(propnames+offset)+1; else return offset; strcpy(propnames+offset, name); return offset; } /* put all properties (files) in the property structure */ void putprops(char *fn, DIR *dir) { struct dirent *dp; while ((dp = readdir(dir)) != NULL) { strcpy(fn, dp->d_name); if (lstat(pathname, statbuf)) err(pathname, ERR_STAT); if (S_ISREG(statbuf[0].st_mode)) { int fd, len = statbuf[0].st_size; *dt++ = 3; *dt++ = len; *dt++ = propnum(fn); if ((len >= 8) && ((unsigned long)dt & 0x4)) dt++; fd = open(pathname, O_RDONLY); if (fd == -1) err(pathname, ERR_OPEN); if (read(fd, dt, len) != len) err(pathname, ERR_READ); close(fd); checkprop(fn, dt); dt += (len + 3)/4; } } fn[0] = '\0'; if (errno) err(pathname, ERR_READDIR); checkprop(pathname, NULL); } /* * put a node (directory) in the property structure. first properties * then children. */ void putnode(void) { DIR *dir; char *dn; struct dirent *dp; *dt++ = 1; strcpy((void *)dt, *pathstart ? 
pathstart : "/"); while(*dt) dt++; if (dt[-1] & 0xff) dt++; dir = opendir(pathname); if (!dir) err(pathname, ERR_OPENDIR); strcat(pathname, "/"); dn = pathname + strlen(pathname); putprops(dn, dir); rewinddir(dir); while ((dp = readdir(dir)) != NULL) { strcpy(dn, dp->d_name); if (!strcmp(dn, ".") || !strcmp(dn, "..")) continue; if (lstat(pathname, statbuf)) err(pathname, ERR_STAT); if (S_ISDIR(statbuf[0].st_mode)) putnode(); } if (errno) err(pathname, ERR_READDIR); *dt++ = 2; closedir(dir); dn[-1] = '\0'; } /* boot block version 2 as defined by the linux kernel */ struct bootblock { unsigned magic, totalsize, off_dt_struct, off_dt_strings, off_mem_rsvmap, version, last_comp_version, boot_physid; } bb[1]; main(int argc, char *argv[], char *envp[]) { unsigned len; unsigned long long me; me = 0x01ff8000; while (1) switch(getopt(argc, argv, "c:b:")) { case -1: goto opt; case 'c': bb->boot_physid = strtoul(optarg, NULL, 0); break; case 'b': me = strtoull(optarg, NULL, 0); break; default: fprintf(stderr, "usage: fs2dt [-c cpu ] [-b base] dir\n"); err(argv[0], ERR_USAGE); } opt: if (optind < argc) strcpy(pathname, argv[optind]); else strcpy(pathname, "."); pathstart = pathname + strlen(pathname); dt = dtstruct; putnode(); *dt++ = 9; len = sizeof(bb[0]); len += 7; len &= ~7; bb->off_mem_rsvmap = len; for (len = 1; mem_rsrv[len]; len += 2) ; len+= 3; len *= sizeof(mem_rsrv[0]); bb->off_dt_struct = bb->off_mem_rsvmap + len; len = dt - dtstruct; len *= sizeof(dvt); bb->off_dt_strings = bb->off_dt_struct + len; len = propnum(""); len += 3; len &= ~3; bb->totalsize = bb->off_dt_strings + len; bb->magic = 0xd00dfeed; bb->version = 2; bb->last_comp_version = 2; reserve(me, bb->totalsize); write(1, bb, bb->off_mem_rsvmap); write(1, mem_rsrv, bb->off_dt_struct - bb->off_mem_rsvmap); write(1, dtstruct, bb->off_dt_strings - bb->off_dt_struct); write(1, propnames, bb->totalsize - bb->off_dt_strings); exit(ERR_NONE); } From sharada at in.ibm.com Thu Apr 14 00:14:43 2005 From: sharada 
at in.ibm.com (R Sharada) Date: Wed, 13 Apr 2005 19:44:43 +0530 Subject: [4/5] load and boot kexec kernel In-Reply-To: <20050413141403.GH5081@in.ibm.com> References: <20050413141157.GE5081@in.ibm.com> <20050413141239.GF5081@in.ibm.com> <20050413141301.GG5081@in.ibm.com> <20050413141403.GH5081@in.ibm.com> Message-ID: <20050413141443.GI5081@in.ibm.com> /* * loadem: bare-bones load and execute something with kexec * * Copyright (C) 2004 - 2005 Milton D Miller II, IBM Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation (version 2 of the License). * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
*/ #include #include #include #include #include struct proper_kexec_segment { void *buf; size_t bufsz; size_t mem; size_t memsz; }; struct kexec_segment { void *buf; size_t bufsz; void *mem; size_t memsz; }; extern void __fix_the_kexec_segment_structures(void); #include "kexec-syscall.h" void err(char *what, char *which); void use(char *why); #define __str(x) #x #define str(x) __str(x) struct proper_kexec_segment seg[KEXEC_MAX_SEGMENTS]; unsigned long entry; unsigned long flags; unsigned long mask; struct stat statbuf[1]; char *myname; #include int main(int argc, char *argv[]) { int f=-1, fd, t, arg; unsigned long *lp; char *cp; void *p; if (sizeof(struct proper_kexec_segment) != sizeof(struct kexec_segment)) __fix_the_kexec_segment_structures(); myname = argv[0]; for (cp = myname; *cp; cp++) if (*cp == '/') myname = cp + 1; for(;;) switch (arg = getopt(argc, argv, "a:c:ef:kpr:st:z:48")) { case 'a': /* at Address */ if (++f >= KEXEC_MAX_SEGMENTS) use("Maximum number of segments (" str(KEXEC_MAX_SEGMENT) ") exceeded.\n"); seg[f].mem = strtoul(optarg, NULL, 0); break; case 'c': /* Chain previous */ if (f < 1) use("-c: can not chain first file"); lp = seg[f-1].buf + strtoul(optarg, NULL, 0); lp[-1] = seg[f].mem; /* Big endian, LE use 0 */ break; case 'e': /* Entry point */ entry = seg[f].mem + seg[f].memsz; break; case 'f': /* load File */ fd = open(optarg, O_RDONLY); if (fd == -1) err("open", optarg); if (fstat(fd, statbuf)) err("stat", optarg); t = statbuf[0].st_size; p = realloc(seg[f].buf, seg[f].bufsz + t); if (!p) err("malloc", optarg); seg[f].buf = p; if (read(fd, seg[f].buf + seg[f].bufsz, t) != t) err("read", optarg); close(fd); seg[f].bufsz += t; seg[f].memsz += t; break; case 'k': /* Kexec now */ kexec_reboot(); break; case 'p': /* load Panic kernel */ flags |= KEXEC_FLAG_ON_PANIC; break; case 's': /* load Stdin */ #define CHUNK 4096 t = 0; do { seg[f].bufsz += t; seg[f].memsz += t; seg[f].buf = realloc(seg[f].buf, seg[f].bufsz + CHUNK); if (!seg[f].buf) 
err("realloc",""); t = read(0, seg[f].buf + seg[f].bufsz, CHUNK); } while (t > 0); seg[f].buf = realloc(seg[f].buf, seg[f].bufsz); if (!seg[f].buf) err("realloc",""); break; #undef CHUNK case '8': /* pad to 8 byte bounary */ case '4': /* pad to 4 byte bounary */ t = arg - '0'; t -= seg[f].bufsz & (t-1); if (t == arg - '0') break; seg[f].buf = realloc(seg[f].buf, seg[f].bufsz + t); if (!seg[f].buf) err("malloc", ""); cp = seg[f].buf + seg[f].bufsz; seg[f].memsz += t; seg[f].bufsz += t; while (t--) *cp++ = 0; break; case 'r': /* Round up page size; move this to kernel */ mask = strtoul(optarg, NULL, 0); if (mask & (mask-1)) use("round value must be a power of 2\n"); mask--; /* convert to a mask */ break; case 't': /* set architecture Type */ t = strtoul(optarg, NULL, 0); flags |= t << 16; break; case 'z': /* Zero extend segment by bumping memsz */ seg[f].memsz += strtoul(optarg, NULL, 0); break; case -1: if (optind < argc) use("unexpected option or argument\n"); for (t=0; t <= f; t++) { /* move this to the kernel */ seg[t].memsz += mask; seg[t].memsz &= ~mask; } if (errno = -kexec_load((void *)entry, ++f, (struct kexec_segment *)seg, flags)) err("kexec", "syscall"); exit(0); default: use("unknown option or argument\n"); } } /* a bit of user frendliness, while still minimizing library usage for size */ #define eput(s) write(2, s, sizeof(s)-1) #define vput(s) {for (l=0; s[l]; l++); write(2, s, l);} void use(char *why) { int l; eput("Usage: "); vput(myname); eput(" options\n" "\t where options are:\n" "\t-a address start a new segment At address\n" "\t-c offset at offset in previous segment" " put a Chain pointer to here\n" "\t-e assign Entry point here\n" "\t-f file load a File\n" "\t-k kexec the new kernel NOW\n" "\t-p load panic kernel\n" "\t-r size round up segments memsz to size\n" "\t-s read from Standard input\n" "\t-t type set architecture flag to given number\n" "\t-4 4-byte align size, padding with 0\n" "\t-8 8-byte align size, padding with 0\n" "\t-z count 
zero extend segment by count bytes" " after all files.\n"); exit(1); } #include #include #define TOP_NIB_SHIFT(x) (8*sizeof(x) - 4) #define TOP_NIB(x) (x >> TOP_NIB_SHIFT(x)) void err(char *what, char *which) { unsigned err = errno; int l, i; char erro[2*sizeof(err)+6]; for (l=0; l < 2*sizeof(err); l++, err <<= 4) if (TOP_NIB(err)) break; for(i=0; i < 2*sizeof(err) - l; i++, err <<= 4) erro[i+4] = "0123456789abcdef"[TOP_NIB(err)]; erro[0] = ' '; erro[1] = '('; erro[2] = '0'; erro[3] = 'x'; erro[4+i+0] = ')'; erro[4+i+1] = '\n'; erro[4+i+2] = '\0'; err = errno; /* save again before the writes below */ vput(myname); eput(": "); vput(what); eput(" "); vput(which); eput(": "); vput(strerror(err)); vput(erro); exit(2); } From sharada at in.ibm.com Thu Apr 14 00:15:12 2005 From: sharada at in.ibm.com (R Sharada) Date: Wed, 13 Apr 2005 19:45:12 +0530 Subject: [5/5] trampoline code In-Reply-To: <20050413141443.GI5081@in.ibm.com> References: <20050413141157.GE5081@in.ibm.com> <20050413141239.GF5081@in.ibm.com> <20050413141301.GG5081@in.ibm.com> <20050413141403.GH5081@in.ibm.com> <20050413141443.GI5081@in.ibm.com> Message-ID: <20050413141512.GJ5081@in.ibm.com> # # kexec: Linux boots Linux # # Copyright (C) 2004 - 2005, Milton D Miller II, IBM Corporation # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation (version 2 of the License). # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
#
#	v2wrap.S
#	a wrapper to place in front of a v2 device tree
#	to call a ppc64 kernel with the expected arguments
#	of kernel(device-tree, phys-offset, 0)
#
#	calling convention:
#		r3 = physical number of this cpu (all cpus)
#		r4 = address of this chunk (master only)
#	master enters at start (aka first byte of this chunk)
#	slaves (additional cpus), if any, enter a copy of the
#	first 0x100 bytes of this code relocated to 0x0
#
#	in other words,
#		a copy of the first 0x100 bytes of this code is copied to 0
#		and the slaves are sent to address 0x60
#		with r3 = their physical cpu number.

# look a bit like a Linux kernel here ...
	.machine ppc64
	.org 0
start:	b	master
	tweq	0,0
secondary_hold:
	.llong	0

	.org 0x20	# need a bit more space than after slave,
master:
	std	4,secondary_hold@l(0)	# bring slaves up here to this copy
	sync				# try to get the slaves to see this
	or	1,1,1			# low priority to let other thread catch up
	isync
	mr	5,3			# save cpu id to r5
	addi	3,4,0x100		# r3 = boot param block
	lwz	6,20(3)			# fetch version number
	cmpwi	0,6,2			# v2 ?
	blt	80f
	stw	5,28(3)			# save my cpu number as boot_cpu_phys
80:	b	81f

	.org 0x60	# ABI: slaves start at 0x60 with r3=phys
slave:	ld	4,secondary_hold@l(0)
	cmpdi	0,4,0
	beq	slave

	# ahh, master told us where he is running from
	# jump into our copy of the code up there so this code can change
	addi	5,4,1f-start
	mtctr	5
	bctr

	# ok, now wait for the master to tell us to go back to the new block
1:	ld	5,copied@l(4)
	cmpdi	0,5,0
	beq	1b
	ba	0x60

	.long	0	# just an eye-catcher, delete if space needed
	.long	0	# just an eye-catcher, delete if space needed

81:		# master continues here
	or	3,3,3		# ok back to high, let's boot
	lis	6,0x1
	mtctr	6		# delay a bit for slaves to catch up
83:	bdnz	83b		# before we overwrite 0-100 again

	ld	4,-8(3)		# kernel pointer is at -8(bb) by loader
	addi	5,4,-8		# prepare copy with update form instructions
	li	6,0x100/8
	mtctr	6
	li	6,-8
85:	ldu	7,8(5)
	stdu	7,8(6)
	bdnz	85b

	li	5,0		# r5 will be 0 for kernel
	dcbst	0,5		# store dcache, flush icache
	dcbst	0,6		# 0 and 0xf8 covers us with 128 byte lines
	mtctr	4		# prepare branch too
	sync
	icbi	0,5
	icbi	0,6
	sync
	isync
	std	6,-16(3)	# send slaves back down
	bctr			# start kernel

	.org 0xf0
copied:	.llong	0
kernel:	.llong	0
	.org 0x100
__end_stub:
	.equ boot_block, . - start
From jjarvis at staffmail.ed.ac.uk Thu Apr 14 00:13:25 2005 From: jjarvis at staffmail.ed.ac.uk (James Jarvis) Date: Wed, 13 Apr 2005 15:13:25 +0100 Subject: G5 iMac Kernel Compile error with xfrm4_output.c:119: error: too few arguments to function Message-ID: <1113401605.30761.8.camel@hp-eval.ucs.ed.ac.uk> I have been skulking on the list in listening mode for a while now... most of it is over my head but I guess it is useful to have G5 iMac compilation testers.
I am using a patched 2.6.12rc2 kernel net/ipv4/xfrm4_output.c: In function `xfrm4_output': net/ipv4/xfrm4_output.c:119: warning: passing arg 1 of pointer to function from incompatible pointer type net/ipv4/xfrm4_output.c:119: error: too few arguments to function make[2]: *** [net/ipv4/xfrm4_output.o] Error 1 make[1]: *** [net/ipv4] Error 2 make: *** [net] Error 2 I have attached the config file. I am compiling on a Yellow Dog base install. Cheers, James -------------- next part -------------- # # Automatically generated make config: don't edit # Linux kernel version: 2.6.11.7 # Wed Apr 13 14:47:22 2005 # CONFIG_64BIT=y CONFIG_MMU=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_GENERIC_ISA_DMA=y CONFIG_HAVE_DEC_LOCK=y CONFIG_EARLY_PRINTK=y CONFIG_COMPAT=y CONFIG_FRAME_POINTER=y CONFIG_FORCE_MAX_ZONEORDER=13 # # Code maturity level options # CONFIG_EXPERIMENTAL=y CONFIG_CLEAN_COMPILE=y CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 # # General setup # CONFIG_LOCALVERSION="PPC-PIE" CONFIG_SWAP=y CONFIG_SYSVIPC=y # CONFIG_POSIX_MQUEUE is not set # CONFIG_BSD_PROCESS_ACCT is not set CONFIG_SYSCTL=y # CONFIG_AUDIT is not set CONFIG_HOTPLUG=y CONFIG_KOBJECT_UEVENT=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y # CONFIG_EMBEDDED is not set CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y CONFIG_KALLSYMS_EXTRA_PASS=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_SHMEM=y CONFIG_CC_ALIGN_FUNCTIONS=0 CONFIG_CC_ALIGN_LABELS=0 CONFIG_CC_ALIGN_LOOPS=0 CONFIG_CC_ALIGN_JUMPS=0 # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 # # Loadable module support # CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y CONFIG_OBSOLETE_MODPARM=y # CONFIG_MODVERSIONS is not set # CONFIG_MODULE_SRCVERSION_ALL is not set CONFIG_KMOD=y CONFIG_SYSVIPC_COMPAT=y # # Platform support # # CONFIG_PPC_ISERIES is not set CONFIG_PPC_MULTIPLATFORM=y # CONFIG_PPC_PSERIES is not set CONFIG_PPC_PMAC=y # CONFIG_PPC_MAPLE is not set CONFIG_PPC=y CONFIG_PPC64=y 
CONFIG_PPC_OF=y CONFIG_ALTIVEC=y CONFIG_U3_DART=y CONFIG_PPC_PMAC64=y CONFIG_BOOTX_TEXT=y # CONFIG_POWER4_ONLY is not set # CONFIG_IOMMU_VMERGE is not set # CONFIG_SMP is not set # CONFIG_PREEMPT is not set CONFIG_GENERIC_HARDIRQS=y CONFIG_SECCOMP=y # # General setup # CONFIG_PCI=y CONFIG_PCI_DOMAINS=y CONFIG_BINFMT_ELF=y CONFIG_BINFMT_MISC=m # CONFIG_PCI_LEGACY_PROC is not set CONFIG_PCI_NAMES=y # CONFIG_PCI_DEBUG is not set # # PCCARD (PCMCIA/CardBus) support # # CONFIG_PCCARD is not set # # PCI Hotplug Support # # CONFIG_HOTPLUG_PCI is not set CONFIG_PROC_DEVICETREE=y # CONFIG_CMDLINE_BOOL is not set # # Device Drivers # # # Generic Driver Options # # CONFIG_STANDALONE is not set CONFIG_PREVENT_FIRMWARE_BUILD=y CONFIG_FW_LOADER=y # CONFIG_DEBUG_DRIVER is not set # # Memory Technology Devices (MTD) # # CONFIG_MTD is not set # # Parallel port support # # CONFIG_PARPORT is not set # # Plug and Play support # # # Block devices # # CONFIG_BLK_DEV_FD is not set # CONFIG_BLK_CPQ_DA is not set # CONFIG_BLK_CPQ_CISS_DA is not set # CONFIG_BLK_DEV_DAC960 is not set # CONFIG_BLK_DEV_UMEM is not set # CONFIG_BLK_DEV_COW_COMMON is not set CONFIG_BLK_DEV_LOOP=y CONFIG_BLK_DEV_CRYPTOLOOP=y CONFIG_BLK_DEV_NBD=m # CONFIG_BLK_DEV_SX8 is not set # CONFIG_BLK_DEV_UB is not set CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_COUNT=16 CONFIG_BLK_DEV_RAM_SIZE=12288 CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="" # CONFIG_CDROM_PKTCDVD is not set # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y # CONFIG_ATA_OVER_ETH is not set # # ATA/ATAPI/MFM/RLL support # CONFIG_IDE=m CONFIG_BLK_DEV_IDE=m # # Please see Documentation/ide.txt for help/info on IDE drives # # CONFIG_BLK_DEV_IDE_SATA is not set CONFIG_BLK_DEV_IDEDISK=m # CONFIG_IDEDISK_MULTI_MODE is not set CONFIG_BLK_DEV_IDECD=m # CONFIG_BLK_DEV_IDETAPE is not set CONFIG_BLK_DEV_IDEFLOPPY=m CONFIG_BLK_DEV_IDESCSI=m # CONFIG_IDE_TASK_IOCTL is not set # # IDE chipset support/bugfixes 
# CONFIG_IDE_GENERIC=m CONFIG_BLK_DEV_IDEPCI=y CONFIG_IDEPCI_SHARE_IRQ=y # CONFIG_BLK_DEV_OFFBOARD is not set CONFIG_BLK_DEV_GENERIC=m # CONFIG_BLK_DEV_OPTI621 is not set # CONFIG_BLK_DEV_SL82C105 is not set CONFIG_BLK_DEV_IDEDMA_PCI=y # CONFIG_BLK_DEV_IDEDMA_FORCED is not set CONFIG_IDEDMA_PCI_AUTO=y # CONFIG_IDEDMA_ONLYDISK is not set # CONFIG_BLK_DEV_AEC62XX is not set # CONFIG_BLK_DEV_ALI15X3 is not set # CONFIG_BLK_DEV_AMD74XX is not set CONFIG_BLK_DEV_CMD64X=m # CONFIG_BLK_DEV_TRIFLEX is not set # CONFIG_BLK_DEV_CY82C693 is not set # CONFIG_BLK_DEV_CS5520 is not set # CONFIG_BLK_DEV_CS5530 is not set # CONFIG_BLK_DEV_HPT34X is not set # CONFIG_BLK_DEV_HPT366 is not set # CONFIG_BLK_DEV_SC1200 is not set # CONFIG_BLK_DEV_PIIX is not set # CONFIG_BLK_DEV_NS87415 is not set # CONFIG_BLK_DEV_PDC202XX_OLD is not set # CONFIG_BLK_DEV_PDC202XX_NEW is not set # CONFIG_BLK_DEV_SVWKS is not set # CONFIG_BLK_DEV_SIIMAGE is not set # CONFIG_BLK_DEV_SLC90E66 is not set # CONFIG_BLK_DEV_TRM290 is not set # CONFIG_BLK_DEV_VIA82CXXX is not set # CONFIG_IDE_ARM is not set CONFIG_BLK_DEV_IDEDMA=y # CONFIG_IDEDMA_IVB is not set CONFIG_IDEDMA_AUTO=y # CONFIG_BLK_DEV_HD is not set # # SCSI device support # CONFIG_SCSI=y CONFIG_SCSI_PROC_FS=y # # SCSI support type (disk, tape, CD-ROM) # CONFIG_BLK_DEV_SD=y CONFIG_CHR_DEV_ST=y # CONFIG_CHR_DEV_OSST is not set CONFIG_BLK_DEV_SR=y CONFIG_BLK_DEV_SR_VENDOR=y CONFIG_CHR_DEV_SG=y # # Some SCSI devices (e.g. 
CD jukebox) support multiple LUNs # CONFIG_SCSI_MULTI_LUN=y CONFIG_SCSI_CONSTANTS=y # CONFIG_SCSI_LOGGING is not set # # SCSI Transport Attributes # CONFIG_SCSI_SPI_ATTRS=y # CONFIG_SCSI_FC_ATTRS is not set # CONFIG_SCSI_ISCSI_ATTRS is not set # # SCSI low-level drivers # # CONFIG_BLK_DEV_3W_XXXX_RAID is not set # CONFIG_SCSI_3W_9XXX is not set # CONFIG_SCSI_ACARD is not set # CONFIG_SCSI_AACRAID is not set CONFIG_SCSI_AIC7XXX=y CONFIG_AIC7XXX_CMDS_PER_DEVICE=253 CONFIG_AIC7XXX_RESET_DELAY_MS=15000 CONFIG_AIC7XXX_DEBUG_ENABLE=y CONFIG_AIC7XXX_DEBUG_MASK=0 CONFIG_AIC7XXX_REG_PRETTY_PRINT=y # CONFIG_SCSI_AIC7XXX_OLD is not set # CONFIG_SCSI_AIC79XX is not set # CONFIG_MEGARAID_NEWGEN is not set # CONFIG_MEGARAID_LEGACY is not set CONFIG_SCSI_SATA=y # CONFIG_SCSI_SATA_AHCI is not set CONFIG_SCSI_SATA_SVW=y # CONFIG_SCSI_ATA_PIIX is not set # CONFIG_SCSI_SATA_NV is not set # CONFIG_SCSI_SATA_PROMISE is not set # CONFIG_SCSI_SATA_QSTOR is not set # CONFIG_SCSI_SATA_SX4 is not set # CONFIG_SCSI_SATA_SIL is not set # CONFIG_SCSI_SATA_SIS is not set # CONFIG_SCSI_SATA_ULI is not set # CONFIG_SCSI_SATA_VIA is not set # CONFIG_SCSI_SATA_VITESSE is not set # CONFIG_SCSI_BUSLOGIC is not set # CONFIG_SCSI_DMX3191D is not set # CONFIG_SCSI_EATA is not set # CONFIG_SCSI_FUTURE_DOMAIN is not set # CONFIG_SCSI_GDTH is not set # CONFIG_SCSI_IPS is not set # CONFIG_SCSI_INITIO is not set # CONFIG_SCSI_INIA100 is not set CONFIG_SCSI_SYM53C8XX_2=y CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=0 CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16 CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64 # CONFIG_SCSI_SYM53C8XX_IOMAPPED is not set # CONFIG_SCSI_IPR is not set # CONFIG_SCSI_QLOGIC_FC is not set # CONFIG_SCSI_QLOGIC_1280 is not set CONFIG_SCSI_QLA2XXX=y # CONFIG_SCSI_QLA21XX is not set # CONFIG_SCSI_QLA22XX is not set # CONFIG_SCSI_QLA2300 is not set # CONFIG_SCSI_QLA2322 is not set # CONFIG_SCSI_QLA6312 is not set # CONFIG_SCSI_DC395x is not set # CONFIG_SCSI_DC390T is not set # CONFIG_SCSI_DEBUG is not set # # 
Multi-device support (RAID and LVM) # CONFIG_MD=y CONFIG_BLK_DEV_MD=y CONFIG_MD_LINEAR=y CONFIG_MD_RAID0=y CONFIG_MD_RAID1=y CONFIG_MD_RAID10=y CONFIG_MD_RAID5=y # CONFIG_MD_RAID6 is not set # CONFIG_MD_MULTIPATH is not set # CONFIG_MD_FAULTY is not set # CONFIG_BLK_DEV_DM is not set # # Fusion MPT device support # CONFIG_FUSION=y CONFIG_FUSION_MAX_SGE=40 CONFIG_FUSION_CTL=y # # IEEE 1394 (FireWire) support # CONFIG_IEEE1394=m # # Subsystem Options # # CONFIG_IEEE1394_VERBOSEDEBUG is not set CONFIG_IEEE1394_OUI_DB=y CONFIG_IEEE1394_EXTRA_CONFIG_ROMS=y CONFIG_IEEE1394_CONFIG_ROM_IP1394=y # # Device Drivers # # CONFIG_IEEE1394_PCILYNX is not set CONFIG_IEEE1394_OHCI1394=m # # Protocol Drivers # CONFIG_IEEE1394_VIDEO1394=m CONFIG_IEEE1394_SBP2=m # CONFIG_IEEE1394_SBP2_PHYS_DMA is not set CONFIG_IEEE1394_ETH1394=m CONFIG_IEEE1394_DV1394=m CONFIG_IEEE1394_RAWIO=m # CONFIG_IEEE1394_CMP is not set # # I2O device support # # CONFIG_I2O is not set # # Macintosh device drivers # CONFIG_ADB=y CONFIG_ADB_PMU=y # CONFIG_PMAC_SMU is not set # CONFIG_PMAC_PBOOK is not set # CONFIG_PMAC_BACKLIGHT is not set # CONFIG_INPUT_ADBHID is not set CONFIG_THERM_PM72=y # # Networking support # CONFIG_NET=y # # Networking options # CONFIG_PACKET=y # CONFIG_PACKET_MMAP is not set CONFIG_UNIX=y CONFIG_NET_KEY=y CONFIG_INET=y CONFIG_IP_MULTICAST=y # CONFIG_IP_ADVANCED_ROUTER is not set # CONFIG_IP_PNP is not set CONFIG_NET_IPIP=m # CONFIG_NET_IPGRE is not set # CONFIG_IP_MROUTE is not set # CONFIG_ARPD is not set CONFIG_SYN_COOKIES=y CONFIG_INET_AH=y CONFIG_INET_ESP=y CONFIG_INET_IPCOMP=y CONFIG_INET_TUNNEL=y CONFIG_IP_TCPDIAG=y # CONFIG_IP_TCPDIAG_IPV6 is not set # # IP: Virtual Server Configuration # # CONFIG_IP_VS is not set CONFIG_IPV6=m # CONFIG_IPV6_PRIVACY is not set # CONFIG_INET6_AH is not set # CONFIG_INET6_ESP is not set # CONFIG_INET6_IPCOMP is not set # CONFIG_INET6_TUNNEL is not set # CONFIG_IPV6_TUNNEL is not set CONFIG_NETFILTER=y # CONFIG_NETFILTER_DEBUG is not set # # IP: 
Netfilter Configuration # CONFIG_IP_NF_CONNTRACK=m # CONFIG_IP_NF_CT_ACCT is not set # CONFIG_IP_NF_CONNTRACK_MARK is not set # CONFIG_IP_NF_CT_PROTO_SCTP is not set CONFIG_IP_NF_FTP=m CONFIG_IP_NF_IRC=m CONFIG_IP_NF_TFTP=m CONFIG_IP_NF_AMANDA=m CONFIG_IP_NF_QUEUE=m CONFIG_IP_NF_IPTABLES=m CONFIG_IP_NF_MATCH_LIMIT=m CONFIG_IP_NF_MATCH_IPRANGE=m CONFIG_IP_NF_MATCH_MAC=m CONFIG_IP_NF_MATCH_PKTTYPE=m CONFIG_IP_NF_MATCH_MARK=m CONFIG_IP_NF_MATCH_MULTIPORT=m CONFIG_IP_NF_MATCH_TOS=m CONFIG_IP_NF_MATCH_RECENT=m CONFIG_IP_NF_MATCH_ECN=m CONFIG_IP_NF_MATCH_DSCP=m CONFIG_IP_NF_MATCH_AH_ESP=m CONFIG_IP_NF_MATCH_LENGTH=m CONFIG_IP_NF_MATCH_TTL=m CONFIG_IP_NF_MATCH_TCPMSS=m CONFIG_IP_NF_MATCH_HELPER=m CONFIG_IP_NF_MATCH_STATE=m CONFIG_IP_NF_MATCH_CONNTRACK=m CONFIG_IP_NF_MATCH_OWNER=m # CONFIG_IP_NF_MATCH_ADDRTYPE is not set # CONFIG_IP_NF_MATCH_REALM is not set # CONFIG_IP_NF_MATCH_SCTP is not set # CONFIG_IP_NF_MATCH_COMMENT is not set # CONFIG_IP_NF_MATCH_HASHLIMIT is not set CONFIG_IP_NF_FILTER=m CONFIG_IP_NF_TARGET_REJECT=m CONFIG_IP_NF_TARGET_LOG=m CONFIG_IP_NF_TARGET_ULOG=m CONFIG_IP_NF_TARGET_TCPMSS=m CONFIG_IP_NF_NAT=m CONFIG_IP_NF_NAT_NEEDED=y CONFIG_IP_NF_TARGET_MASQUERADE=m CONFIG_IP_NF_TARGET_REDIRECT=m CONFIG_IP_NF_TARGET_NETMAP=m CONFIG_IP_NF_TARGET_SAME=m CONFIG_IP_NF_NAT_SNMP_BASIC=m CONFIG_IP_NF_NAT_IRC=m CONFIG_IP_NF_NAT_FTP=m CONFIG_IP_NF_NAT_TFTP=m CONFIG_IP_NF_NAT_AMANDA=m CONFIG_IP_NF_MANGLE=m CONFIG_IP_NF_TARGET_TOS=m CONFIG_IP_NF_TARGET_ECN=m CONFIG_IP_NF_TARGET_DSCP=m CONFIG_IP_NF_TARGET_MARK=m CONFIG_IP_NF_TARGET_CLASSIFY=m # CONFIG_IP_NF_RAW is not set CONFIG_IP_NF_ARPTABLES=m CONFIG_IP_NF_ARPFILTER=m CONFIG_IP_NF_ARP_MANGLE=m # # IPv6: Netfilter Configuration (EXPERIMENTAL) # # CONFIG_IP6_NF_QUEUE is not set # CONFIG_IP6_NF_IPTABLES is not set CONFIG_XFRM=y CONFIG_XFRM_USER=y # # SCTP Configuration (EXPERIMENTAL) # # CONFIG_IP_SCTP is not set # CONFIG_SCTP_HMAC_NONE is not set # CONFIG_SCTP_HMAC_SHA1 is not set # CONFIG_SCTP_HMAC_MD5 is not set # 
CONFIG_ATM is not set # CONFIG_BRIDGE is not set # CONFIG_VLAN_8021Q is not set # CONFIG_DECNET is not set CONFIG_LLC=m # CONFIG_LLC2 is not set # CONFIG_IPX is not set CONFIG_ATALK=m CONFIG_DEV_APPLETALK=y # CONFIG_IPDDP is not set # CONFIG_X25 is not set # CONFIG_LAPB is not set # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set # # QoS and/or fair queueing # # CONFIG_NET_SCHED is not set # CONFIG_NET_CLS_ROUTE is not set # # Network testing # # CONFIG_NET_PKTGEN is not set # CONFIG_NETPOLL is not set # CONFIG_NET_POLL_CONTROLLER is not set # CONFIG_HAMRADIO is not set # CONFIG_IRDA is not set # CONFIG_BT is not set CONFIG_NETDEVICES=y # CONFIG_DUMMY is not set # CONFIG_BONDING is not set # CONFIG_EQUALIZER is not set CONFIG_TUN=y # # ARCnet devices # # CONFIG_ARCNET is not set # # Ethernet (10 or 100Mbit) # CONFIG_NET_ETHERNET=y CONFIG_MII=y # CONFIG_HAPPYMEAL is not set CONFIG_SUNGEM=m CONFIG_NET_VENDOR_3COM=y CONFIG_VORTEX=m CONFIG_TYPHOON=m # # Tulip family network device support # CONFIG_NET_TULIP=y CONFIG_DE2104X=m CONFIG_TULIP=m # CONFIG_TULIP_MWI is not set CONFIG_TULIP_MMIO=y # CONFIG_TULIP_NAPI is not set # CONFIG_DE4X5 is not set # CONFIG_WINBOND_840 is not set # CONFIG_DM9102 is not set # CONFIG_HP100 is not set CONFIG_NET_PCI=y CONFIG_PCNET32=m # CONFIG_AMD8111_ETH is not set # CONFIG_ADAPTEC_STARFIRE is not set # CONFIG_B44 is not set # CONFIG_FORCEDETH is not set # CONFIG_DGRS is not set # CONFIG_EEPRO100 is not set # CONFIG_E100 is not set # CONFIG_FEALNX is not set # CONFIG_NATSEMI is not set CONFIG_NE2K_PCI=m # CONFIG_8139CP is not set CONFIG_8139TOO=m CONFIG_8139TOO_PIO=y # CONFIG_8139TOO_TUNE_TWISTER is not set # CONFIG_8139TOO_8129 is not set # CONFIG_8139_OLD_RX_RESET is not set # CONFIG_SIS900 is not set # CONFIG_EPIC100 is not set # CONFIG_SUNDANCE is not set # CONFIG_VIA_RHINE is not set # # Ethernet (1000 Mbit) # CONFIG_ACENIC=m # CONFIG_ACENIC_OMIT_TIGON_I is not set # CONFIG_DL2K is not set 
CONFIG_E1000=m # CONFIG_E1000_NAPI is not set # CONFIG_NS83820 is not set # CONFIG_HAMACHI is not set # CONFIG_YELLOWFIN is not set # CONFIG_R8169 is not set # CONFIG_SK98LIN is not set # CONFIG_VIA_VELOCITY is not set CONFIG_TIGON3=m # # Ethernet (10000 Mbit) # # CONFIG_IXGB is not set # CONFIG_S2IO is not set # # Token Ring devices # # CONFIG_TR is not set # # Wireless LAN (non-hamradio) # # CONFIG_NET_RADIO is not set # # Wan interfaces # # CONFIG_WAN is not set # CONFIG_FDDI is not set # CONFIG_HIPPI is not set # CONFIG_PPP is not set # CONFIG_SLIP is not set # CONFIG_NET_FC is not set # CONFIG_SHAPER is not set # CONFIG_NETCONSOLE is not set # # ISDN subsystem # # CONFIG_ISDN is not set # # Telephony Support # # CONFIG_PHONE is not set # # Input device support # CONFIG_INPUT=y # # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y CONFIG_INPUT_MOUSEDEV_PSAUX=y CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 # CONFIG_INPUT_JOYDEV is not set # CONFIG_INPUT_TSDEV is not set CONFIG_INPUT_EVDEV=y CONFIG_INPUT_EVBUG=m # # Input Device Drivers # CONFIG_INPUT_KEYBOARD=y # CONFIG_KEYBOARD_ATKBD is not set # CONFIG_KEYBOARD_SUNKBD is not set # CONFIG_KEYBOARD_LKKBD is not set # CONFIG_KEYBOARD_XTKBD is not set # CONFIG_KEYBOARD_NEWTON is not set CONFIG_INPUT_MOUSE=y # CONFIG_MOUSE_PS2 is not set # CONFIG_MOUSE_SERIAL is not set # CONFIG_MOUSE_VSXXXAA is not set CONFIG_INPUT_JOYSTICK=y # CONFIG_JOYSTICK_ANALOG is not set # CONFIG_JOYSTICK_A3D is not set # CONFIG_JOYSTICK_ADI is not set # CONFIG_JOYSTICK_COBRA is not set # CONFIG_JOYSTICK_GF2K is not set # CONFIG_JOYSTICK_GRIP is not set # CONFIG_JOYSTICK_GRIP_MP is not set # CONFIG_JOYSTICK_GUILLEMOT is not set # CONFIG_JOYSTICK_INTERACT is not set # CONFIG_JOYSTICK_SIDEWINDER is not set # CONFIG_JOYSTICK_TMDC is not set # CONFIG_JOYSTICK_IFORCE is not set # CONFIG_JOYSTICK_WARRIOR is not set # CONFIG_JOYSTICK_MAGELLAN is not set # CONFIG_JOYSTICK_SPACEORB is not set # CONFIG_JOYSTICK_SPACEBALL is not 
set # CONFIG_JOYSTICK_STINGER is not set # CONFIG_JOYSTICK_TWIDJOY is not set # CONFIG_JOYSTICK_JOYDUMP is not set CONFIG_INPUT_TOUCHSCREEN=y # CONFIG_TOUCHSCREEN_GUNZE is not set # CONFIG_TOUCHSCREEN_ELO is not set # CONFIG_TOUCHSCREEN_MTOUCH is not set # CONFIG_TOUCHSCREEN_MK712 is not set CONFIG_INPUT_MISC=y CONFIG_INPUT_UINPUT=m # # Hardware I/O ports # # CONFIG_SERIO is not set # CONFIG_GAMEPORT is not set CONFIG_SOUND_GAMEPORT=y # # Character devices # CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_HW_CONSOLE=y # CONFIG_SERIAL_NONSTANDARD is not set # # Serial drivers # # CONFIG_SERIAL_8250 is not set # # Non-8250 serial port support # CONFIG_SERIAL_CORE=y CONFIG_SERIAL_CORE_CONSOLE=y CONFIG_SERIAL_PMACZILOG=y CONFIG_SERIAL_PMACZILOG_CONSOLE=y # CONFIG_SERIAL_JSM is not set CONFIG_UNIX98_PTYS=y CONFIG_LEGACY_PTYS=y CONFIG_LEGACY_PTY_COUNT=256 # # IPMI # # CONFIG_IPMI_HANDLER is not set # # Watchdog Cards # # CONFIG_WATCHDOG is not set # CONFIG_RTC is not set # CONFIG_GEN_RTC is not set # CONFIG_DTLK is not set # CONFIG_R3964 is not set # CONFIG_APPLICOM is not set # # Ftape, the floppy tape device driver # # CONFIG_AGP is not set # CONFIG_DRM is not set CONFIG_RAW_DRIVER=y CONFIG_MAX_RAW_DEVS=256 # # TPM devices # # CONFIG_TCG_TPM is not set # # I2C support # CONFIG_I2C=y CONFIG_I2C_CHARDEV=y # # I2C Algorithms # CONFIG_I2C_ALGOBIT=y # CONFIG_I2C_ALGOPCF is not set # CONFIG_I2C_ALGOPCA is not set # # I2C Hardware Bus support # # CONFIG_I2C_ALI1535 is not set # CONFIG_I2C_ALI1563 is not set # CONFIG_I2C_ALI15X3 is not set # CONFIG_I2C_AMD756 is not set # CONFIG_I2C_AMD8111 is not set # CONFIG_I2C_I801 is not set # CONFIG_I2C_I810 is not set # CONFIG_I2C_PIIX4 is not set # CONFIG_I2C_ISA is not set CONFIG_I2C_KEYWEST=y # CONFIG_I2C_MPC is not set # CONFIG_I2C_NFORCE2 is not set # CONFIG_I2C_PARPORT_LIGHT is not set # CONFIG_I2C_PROSAVAGE is not set # CONFIG_I2C_SAVAGE4 is not set # CONFIG_SCx200_ACB is not set # CONFIG_I2C_SIS5595 is not set # CONFIG_I2C_SIS630 is not 
set # CONFIG_I2C_SIS96X is not set # CONFIG_I2C_STUB is not set # CONFIG_I2C_VIA is not set # CONFIG_I2C_VIAPRO is not set # CONFIG_I2C_VOODOO3 is not set # CONFIG_I2C_PCA_ISA is not set # # Hardware Sensors Chip support # # CONFIG_I2C_SENSOR is not set # CONFIG_SENSORS_ADM1021 is not set # CONFIG_SENSORS_ADM1025 is not set # CONFIG_SENSORS_ADM1026 is not set # CONFIG_SENSORS_ADM1031 is not set # CONFIG_SENSORS_ASB100 is not set # CONFIG_SENSORS_DS1621 is not set # CONFIG_SENSORS_FSCHER is not set # CONFIG_SENSORS_FSCPOS is not set # CONFIG_SENSORS_GL518SM is not set # CONFIG_SENSORS_GL520SM is not set # CONFIG_SENSORS_IT87 is not set # CONFIG_SENSORS_LM63 is not set # CONFIG_SENSORS_LM75 is not set # CONFIG_SENSORS_LM77 is not set # CONFIG_SENSORS_LM78 is not set # CONFIG_SENSORS_LM80 is not set # CONFIG_SENSORS_LM83 is not set # CONFIG_SENSORS_LM85 is not set # CONFIG_SENSORS_LM87 is not set # CONFIG_SENSORS_LM90 is not set # CONFIG_SENSORS_LM92 is not set # CONFIG_SENSORS_MAX1619 is not set # CONFIG_SENSORS_PC87360 is not set # CONFIG_SENSORS_SMSC47B397 is not set # CONFIG_SENSORS_SIS5595 is not set # CONFIG_SENSORS_SMSC47M1 is not set # CONFIG_SENSORS_VIA686A is not set # CONFIG_SENSORS_W83781D is not set # CONFIG_SENSORS_W83L785TS is not set # CONFIG_SENSORS_W83627HF is not set # # Other I2C Chip support # # CONFIG_SENSORS_DS1337 is not set # CONFIG_SENSORS_EEPROM is not set # CONFIG_SENSORS_PCF8574 is not set # CONFIG_SENSORS_PCF8591 is not set # CONFIG_SENSORS_RTC8564 is not set # CONFIG_I2C_DEBUG_CORE is not set # CONFIG_I2C_DEBUG_ALGO is not set # CONFIG_I2C_DEBUG_BUS is not set # CONFIG_I2C_DEBUG_CHIP is not set # # Dallas's 1-wire bus # # CONFIG_W1 is not set # # Misc devices # # # Multimedia devices # # CONFIG_VIDEO_DEV is not set # # Digital Video Broadcasting Devices # # CONFIG_DVB is not set # # Graphics support # CONFIG_FB=y CONFIG_FB_CFB_FILLRECT=y CONFIG_FB_CFB_COPYAREA=y CONFIG_FB_CFB_IMAGEBLIT=y CONFIG_FB_SOFT_CURSOR=y CONFIG_FB_MACMODES=y 
CONFIG_FB_MODE_HELPERS=y # CONFIG_FB_TILEBLITTING is not set # CONFIG_FB_CIRRUS is not set # CONFIG_FB_PM2 is not set # CONFIG_FB_CYBER2000 is not set CONFIG_FB_OF=y # CONFIG_FB_CONTROL is not set # CONFIG_FB_PLATINUM is not set # CONFIG_FB_VALKYRIE is not set # CONFIG_FB_CT65550 is not set # CONFIG_FB_ASILIANT is not set # CONFIG_FB_IMSTT is not set # CONFIG_FB_VGA16 is not set # CONFIG_FB_NVIDIA is not set # CONFIG_FB_RIVA is not set # CONFIG_FB_MATROX is not set # CONFIG_FB_RADEON_OLD is not set CONFIG_FB_RADEON=y CONFIG_FB_RADEON_I2C=y # CONFIG_FB_RADEON_DEBUG is not set # CONFIG_FB_ATY128 is not set # CONFIG_FB_ATY is not set # CONFIG_FB_SAVAGE is not set # CONFIG_FB_SIS is not set # CONFIG_FB_NEOMAGIC is not set # CONFIG_FB_KYRO is not set # CONFIG_FB_3DFX is not set # CONFIG_FB_VOODOO1 is not set # CONFIG_FB_TRIDENT is not set # CONFIG_FB_S1D13XXX is not set # CONFIG_FB_VIRTUAL is not set # # Console display driver support # # CONFIG_VGA_CONSOLE is not set CONFIG_DUMMY_CONSOLE=y CONFIG_FRAMEBUFFER_CONSOLE=y # CONFIG_FONTS is not set CONFIG_FONT_8x8=y CONFIG_FONT_8x16=y # # Logo configuration # CONFIG_LOGO=y CONFIG_LOGO_LINUX_MONO=y CONFIG_LOGO_LINUX_VGA16=y CONFIG_LOGO_LINUX_CLUT224=y # CONFIG_BACKLIGHT_LCD_SUPPORT is not set # # Sound # # CONFIG_SOUND is not set # # USB support # CONFIG_USB_ARCH_HAS_HCD=y CONFIG_USB_ARCH_HAS_OHCI=y CONFIG_USB=y # CONFIG_USB_DEBUG is not set # # Miscellaneous USB options # CONFIG_USB_DEVICEFS=y # CONFIG_USB_BANDWIDTH is not set # CONFIG_USB_DYNAMIC_MINORS is not set # CONFIG_USB_OTG is not set # # USB Host Controller Drivers # CONFIG_USB_EHCI_HCD=y # CONFIG_USB_EHCI_SPLIT_ISO is not set # CONFIG_USB_EHCI_ROOT_HUB_TT is not set CONFIG_USB_OHCI_HCD=y # CONFIG_USB_OHCI_BIG_ENDIAN is not set CONFIG_USB_OHCI_LITTLE_ENDIAN=y # CONFIG_USB_UHCI_HCD is not set # CONFIG_USB_SL811_HCD is not set # # USB Device Class drivers # # CONFIG_USB_BLUETOOTH_TTY is not set CONFIG_USB_ACM=m CONFIG_USB_PRINTER=m # # NOTE: USB_STORAGE enables SCSI, 
and 'SCSI disk support' may also be needed; see USB_STORAGE Help for more information # CONFIG_USB_STORAGE=m CONFIG_USB_STORAGE_DEBUG=y CONFIG_USB_STORAGE_DATAFAB=y CONFIG_USB_STORAGE_FREECOM=y CONFIG_USB_STORAGE_ISD200=y CONFIG_USB_STORAGE_DPCM=y # CONFIG_USB_STORAGE_USBAT is not set CONFIG_USB_STORAGE_SDDR09=y CONFIG_USB_STORAGE_SDDR55=y CONFIG_USB_STORAGE_JUMPSHOT=y # # USB Input Devices # CONFIG_USB_HID=y CONFIG_USB_HIDINPUT=y CONFIG_HID_FF=y CONFIG_HID_PID=y CONFIG_LOGITECH_FF=y CONFIG_THRUSTMASTER_FF=y CONFIG_USB_HIDDEV=y # CONFIG_USB_AIPTEK is not set CONFIG_USB_WACOM=m # CONFIG_USB_KBTAB is not set # CONFIG_USB_POWERMATE is not set # CONFIG_USB_MTOUCH is not set # CONFIG_USB_EGALAX is not set # CONFIG_USB_XPAD is not set # CONFIG_USB_ATI_REMOTE is not set # # USB Imaging devices # # CONFIG_USB_MDC800 is not set # CONFIG_USB_MICROTEK is not set # # USB Multimedia devices # # CONFIG_USB_DABUSB is not set # # Video4Linux support is needed for USB Multimedia device support # # # USB Network Adapters # # CONFIG_USB_CATC is not set # CONFIG_USB_KAWETH is not set # CONFIG_USB_PEGASUS is not set # CONFIG_USB_RTL8150 is not set # CONFIG_USB_USBNET is not set CONFIG_USB_MON=y # # USB port drivers # # # USB Serial Converter support # CONFIG_USB_SERIAL=m CONFIG_USB_SERIAL_GENERIC=y CONFIG_USB_SERIAL_BELKIN=m CONFIG_USB_SERIAL_WHITEHEAT=m CONFIG_USB_SERIAL_DIGI_ACCELEPORT=m # CONFIG_USB_SERIAL_CP2101 is not set # CONFIG_USB_SERIAL_CYPRESS_M8 is not set CONFIG_USB_SERIAL_EMPEG=m CONFIG_USB_SERIAL_FTDI_SIO=m CONFIG_USB_SERIAL_VISOR=m CONFIG_USB_SERIAL_IPAQ=m CONFIG_USB_SERIAL_IR=m CONFIG_USB_SERIAL_EDGEPORT=m CONFIG_USB_SERIAL_EDGEPORT_TI=m # CONFIG_USB_SERIAL_GARMIN is not set # CONFIG_USB_SERIAL_IPW is not set CONFIG_USB_SERIAL_KEYSPAN_PDA=m CONFIG_USB_SERIAL_KEYSPAN=m CONFIG_USB_SERIAL_KEYSPAN_MPR=y CONFIG_USB_SERIAL_KEYSPAN_USA28=y CONFIG_USB_SERIAL_KEYSPAN_USA28X=y CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y 
CONFIG_USB_SERIAL_KEYSPAN_USA19=y CONFIG_USB_SERIAL_KEYSPAN_USA18X=y CONFIG_USB_SERIAL_KEYSPAN_USA19W=y CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y CONFIG_USB_SERIAL_KEYSPAN_USA49W=y CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y CONFIG_USB_SERIAL_KLSI=m CONFIG_USB_SERIAL_KOBIL_SCT=m CONFIG_USB_SERIAL_MCT_U232=m CONFIG_USB_SERIAL_PL2303=m CONFIG_USB_SERIAL_SAFE=m CONFIG_USB_SERIAL_SAFE_PADDED=y # CONFIG_USB_SERIAL_TI is not set CONFIG_USB_SERIAL_CYBERJACK=m CONFIG_USB_SERIAL_XIRCOM=m CONFIG_USB_SERIAL_OMNINET=m CONFIG_USB_EZUSB=y # # USB Miscellaneous drivers # # CONFIG_USB_EMI62 is not set # CONFIG_USB_EMI26 is not set # CONFIG_USB_AUERSWALD is not set # CONFIG_USB_RIO500 is not set # CONFIG_USB_LEGOTOWER is not set # CONFIG_USB_LCD is not set # CONFIG_USB_LED is not set # CONFIG_USB_CYTHERM is not set # CONFIG_USB_PHIDGETKIT is not set # CONFIG_USB_PHIDGETSERVO is not set # CONFIG_USB_IDMOUSE is not set # CONFIG_USB_SISUSBVGA is not set # CONFIG_USB_TEST is not set # # USB ATM/DSL drivers # # # USB Gadget Support # # CONFIG_USB_GADGET is not set # # MMC/SD Card support # # CONFIG_MMC is not set # # InfiniBand support # # CONFIG_INFINIBAND is not set # # File systems # CONFIG_EXT2_FS=y CONFIG_EXT2_FS_XATTR=y # CONFIG_EXT2_FS_POSIX_ACL is not set # CONFIG_EXT2_FS_SECURITY is not set CONFIG_EXT3_FS=y CONFIG_EXT3_FS_XATTR=y # CONFIG_EXT3_FS_POSIX_ACL is not set # CONFIG_EXT3_FS_SECURITY is not set CONFIG_JBD=y # CONFIG_JBD_DEBUG is not set CONFIG_FS_MBCACHE=y # CONFIG_REISERFS_FS is not set # CONFIG_JFS_FS is not set CONFIG_FS_POSIX_ACL=y # # XFS support # CONFIG_XFS_FS=y CONFIG_XFS_EXPORT=y # CONFIG_XFS_RT is not set # CONFIG_XFS_QUOTA is not set # CONFIG_XFS_SECURITY is not set # CONFIG_XFS_POSIX_ACL is not set # CONFIG_MINIX_FS is not set # CONFIG_ROMFS_FS is not set CONFIG_QUOTA=y # CONFIG_QFMT_V1 is not set # CONFIG_QFMT_V2 is not set CONFIG_QUOTACTL=y CONFIG_DNOTIFY=y CONFIG_AUTOFS_FS=m CONFIG_AUTOFS4_FS=m # # CD-ROM/DVD Filesystems # 
CONFIG_ISO9660_FS=y CONFIG_JOLIET=y CONFIG_ZISOFS=y CONFIG_ZISOFS_FS=y CONFIG_UDF_FS=m CONFIG_UDF_NLS=y # # DOS/FAT/NT Filesystems # CONFIG_FAT_FS=m CONFIG_MSDOS_FS=m CONFIG_VFAT_FS=m CONFIG_FAT_DEFAULT_CODEPAGE=437 CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1" # CONFIG_NTFS_FS is not set # # Pseudo filesystems # CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_SYSFS=y # CONFIG_DEVFS_FS is not set # CONFIG_DEVPTS_FS_XATTR is not set CONFIG_TMPFS=y # CONFIG_TMPFS_XATTR is not set # CONFIG_HUGETLBFS is not set # CONFIG_HUGETLB_PAGE is not set CONFIG_RAMFS=y # # Miscellaneous filesystems # # CONFIG_ADFS_FS is not set # CONFIG_AFFS_FS is not set CONFIG_HFS_FS=y CONFIG_HFSPLUS_FS=y # CONFIG_BEFS_FS is not set # CONFIG_BFS_FS is not set # CONFIG_EFS_FS is not set CONFIG_CRAMFS=y # CONFIG_VXFS_FS is not set # CONFIG_HPFS_FS is not set # CONFIG_QNX4FS_FS is not set # CONFIG_SYSV_FS is not set # CONFIG_UFS_FS is not set # # Network File Systems # CONFIG_NFS_FS=m CONFIG_NFS_V3=y CONFIG_NFS_V4=y CONFIG_NFS_DIRECTIO=y CONFIG_NFSD=m CONFIG_NFSD_V3=y CONFIG_NFSD_V4=y CONFIG_NFSD_TCP=y CONFIG_LOCKD=m CONFIG_LOCKD_V4=y CONFIG_EXPORTFS=y CONFIG_SUNRPC=m CONFIG_SUNRPC_GSS=m CONFIG_RPCSEC_GSS_KRB5=m CONFIG_RPCSEC_GSS_SPKM3=m CONFIG_SMB_FS=m # CONFIG_SMB_NLS_DEFAULT is not set CONFIG_CIFS=m CONFIG_CIFS_STATS=y # CONFIG_CIFS_XATTR is not set # CONFIG_CIFS_EXPERIMENTAL is not set CONFIG_NCP_FS=m # CONFIG_NCPFS_PACKET_SIGNING is not set # CONFIG_NCPFS_IOCTL_LOCKING is not set # CONFIG_NCPFS_STRONG is not set # CONFIG_NCPFS_NFS_NS is not set # CONFIG_NCPFS_OS2_NS is not set # CONFIG_NCPFS_SMALLDOS is not set # CONFIG_NCPFS_NLS is not set # CONFIG_NCPFS_EXTRAS is not set CONFIG_CODA_FS=m # CONFIG_CODA_FS_OLD_API is not set CONFIG_AFS_FS=m CONFIG_RXRPC=m # # Partition Types # CONFIG_PARTITION_ADVANCED=y # CONFIG_ACORN_PARTITION is not set # CONFIG_OSF_PARTITION is not set # CONFIG_AMIGA_PARTITION is not set # CONFIG_ATARI_PARTITION is not set CONFIG_MAC_PARTITION=y CONFIG_MSDOS_PARTITION=y # 
CONFIG_BSD_DISKLABEL is not set # CONFIG_MINIX_SUBPARTITION is not set # CONFIG_SOLARIS_X86_PARTITION is not set # CONFIG_UNIXWARE_DISKLABEL is not set # CONFIG_LDM_PARTITION is not set # CONFIG_SGI_PARTITION is not set # CONFIG_ULTRIX_PARTITION is not set # CONFIG_SUN_PARTITION is not set # CONFIG_EFI_PARTITION is not set # # Native Language Support # CONFIG_NLS=y CONFIG_NLS_DEFAULT="iso8859-1" CONFIG_NLS_CODEPAGE_437=y # CONFIG_NLS_CODEPAGE_737 is not set # CONFIG_NLS_CODEPAGE_775 is not set # CONFIG_NLS_CODEPAGE_850 is not set # CONFIG_NLS_CODEPAGE_852 is not set # CONFIG_NLS_CODEPAGE_855 is not set # CONFIG_NLS_CODEPAGE_857 is not set # CONFIG_NLS_CODEPAGE_860 is not set # CONFIG_NLS_CODEPAGE_861 is not set # CONFIG_NLS_CODEPAGE_862 is not set # CONFIG_NLS_CODEPAGE_863 is not set # CONFIG_NLS_CODEPAGE_864 is not set # CONFIG_NLS_CODEPAGE_865 is not set # CONFIG_NLS_CODEPAGE_866 is not set # CONFIG_NLS_CODEPAGE_869 is not set # CONFIG_NLS_CODEPAGE_936 is not set # CONFIG_NLS_CODEPAGE_950 is not set # CONFIG_NLS_CODEPAGE_932 is not set # CONFIG_NLS_CODEPAGE_949 is not set # CONFIG_NLS_CODEPAGE_874 is not set # CONFIG_NLS_ISO8859_8 is not set # CONFIG_NLS_CODEPAGE_1250 is not set # CONFIG_NLS_CODEPAGE_1251 is not set # CONFIG_NLS_ASCII is not set CONFIG_NLS_ISO8859_1=m # CONFIG_NLS_ISO8859_2 is not set # CONFIG_NLS_ISO8859_3 is not set # CONFIG_NLS_ISO8859_4 is not set # CONFIG_NLS_ISO8859_5 is not set # CONFIG_NLS_ISO8859_6 is not set # CONFIG_NLS_ISO8859_7 is not set # CONFIG_NLS_ISO8859_9 is not set # CONFIG_NLS_ISO8859_13 is not set # CONFIG_NLS_ISO8859_14 is not set # CONFIG_NLS_ISO8859_15 is not set # CONFIG_NLS_KOI8_R is not set # CONFIG_NLS_KOI8_U is not set CONFIG_NLS_UTF8=y # # Profiling support # # CONFIG_PROFILING is not set # # Kernel hacking # # CONFIG_PRINTK_TIME is not set CONFIG_DEBUG_KERNEL=y # CONFIG_MAGIC_SYSRQ is not set CONFIG_LOG_BUF_SHIFT=17 # CONFIG_SCHEDSTATS is not set # CONFIG_DEBUG_SLAB is not set # CONFIG_DEBUG_SPINLOCK is not set # 
CONFIG_DEBUG_SPINLOCK_SLEEP is not set # CONFIG_DEBUG_KOBJECT is not set # CONFIG_DEBUG_INFO is not set # CONFIG_DEBUG_FS is not set # CONFIG_DEBUG_STACKOVERFLOW is not set # CONFIG_KPROBES is not set # CONFIG_DEBUG_STACK_USAGE is not set # CONFIG_DEBUGGER is not set # CONFIG_PPCDBG is not set # CONFIG_IRQSTACKS is not set # # Security options # # CONFIG_KEYS is not set # CONFIG_SECURITY is not set # # Cryptographic options # CONFIG_CRYPTO=y CONFIG_CRYPTO_HMAC=y CONFIG_CRYPTO_NULL=m CONFIG_CRYPTO_MD4=m CONFIG_CRYPTO_MD5=y CONFIG_CRYPTO_SHA1=y CONFIG_CRYPTO_SHA256=m CONFIG_CRYPTO_SHA512=m # CONFIG_CRYPTO_WP512 is not set # CONFIG_CRYPTO_TGR192 is not set CONFIG_CRYPTO_DES=y CONFIG_CRYPTO_BLOWFISH=m CONFIG_CRYPTO_TWOFISH=m CONFIG_CRYPTO_SERPENT=m CONFIG_CRYPTO_AES=m CONFIG_CRYPTO_CAST5=m CONFIG_CRYPTO_CAST6=m # CONFIG_CRYPTO_TEA is not set # CONFIG_CRYPTO_ARC4 is not set # CONFIG_CRYPTO_KHAZAD is not set # CONFIG_CRYPTO_ANUBIS is not set CONFIG_CRYPTO_DEFLATE=y # CONFIG_CRYPTO_MICHAEL_MIC is not set # CONFIG_CRYPTO_CRC32C is not set # CONFIG_CRYPTO_TEST is not set # # Hardware crypto devices # # # Library routines # CONFIG_CRC_CCITT=m CONFIG_CRC32=y CONFIG_LIBCRC32C=m CONFIG_ZLIB_INFLATE=y CONFIG_ZLIB_DEFLATE=y From tom.l.nguyen at intel.com Thu Apr 14 09:23:30 2005 From: tom.l.nguyen at intel.com (Nguyen, Tom L) Date: Wed, 13 Apr 2005 16:23:30 -0700 Subject: PCI Error Recovery API Proposal (updated) Message-ID: On Tuesday, April 12, 2005 10:15 AM Grant Grundler wrote: > > I would appreciate if other people following this discussion could give > > their opinion here, just to make sure I'm not following the wrong track. > > But it seems to me that issuing the request for a link reset doesn't fit > > in the leaf driver recovery API, but rather in the private API that you > > will define between the PCI Express error recovery core and the PCI > > Express port drivers... > >I *think* you are on the right track. Well, let's go with Ben's proposal as it is. 
Thanks, Long

From andrea at suse.de Thu Apr 14 10:31:13 2005 From: andrea at suse.de (Andrea Arcangeli) Date: Thu, 14 Apr 2005 02:31:13 +0200 Subject: [PATCH] ppc64: improve g5 sound headphone mute In-Reply-To: <1113391382.5463.20.camel@gaston> References: <1113344225.21548.108.camel@gaston> <1113345561.5387.114.camel@gaston> <1113347296.5388.121.camel@gaston> <1113350355.5387.129.camel@gaston> <1113391382.5463.20.camel@gaston> Message-ID: <20050414003113.GX1521@opteron.random>

On Wed, Apr 13, 2005 at 09:23:02PM +1000, Benjamin Herrenschmidt wrote:
> Hi !
>
> This patch fixes a couple more issues with the management of the GPIOs
> dealing with headphone and line out mute on the G5. It should fix the
> remaining problems of people not getting any sound out of the headphone
> jack.

It works great here with all patches applied, thanks!

From iamroot at ca.ibm.com Fri Apr 15 10:54:03 2005 From: iamroot at ca.ibm.com (Omkhar Arasaratnam) Date: Thu, 14 Apr 2005 20:54:03 -0400 Subject: [Fwd: 2.4.30 Build Error Using pSeries_defconfig on ppc64] Message-ID: <425F10AB.8090604@ca.ibm.com>

Any ideas, guys?

O

-------- Original Message --------
Subject: 2.4.30 Build Error Using pSeries_defconfig on ppc64
Date: Thu, 14 Apr 2005 15:44:54 -0400
From: Omkhar Arasaratnam
To: Linux Kernel list , baude at us.ibm.com

So, here's what I do:

cd /usr/src/linux
make mrproper
cp arch/ppc64/configs/pSeries_defconfig .config
make menuconfig
make dep
make clean
make vmlinux

Eventually it bombs out with several messages such as:

ioctl32.c:X: error: (near initialization for `ioctl_translations[Y]')

culminating in:

ioctl32.c:4597: error: (near initialization for `ioctl_translations[691]')
make[1]: *** [ioctl32.o] Error 1
make[1]: Leaving directory `/usr/src/linux-2.4.30/arch/ppc64/kernel'
make: *** [_dir_arch/ppc64/kernel] Error 2

Obviously something ain't right - ideas?

O

From sharada at in.ibm.com Fri Apr 15 19:20:26 2005 From: sharada at in.ibm.com (R Sharada) Date: Fri, 15 Apr 2005 14:50:26 +0530 Subject: RTAS error log sequence numbers In-Reply-To: <20041105050933.GC8470@krispykreme.ozlabs.ibm.com> References: <20041104155032.GB1268@krispykreme.ozlabs.ibm.com> <20041104163759.GR10026@austin.ibm.com> <418A61FE.7030500@austin.ibm.com> <20041105050933.GC8470@krispykreme.ozlabs.ibm.com> Message-ID: <20050415092026.GA3030@in.ibm.com>

Is this RTAS Scan Event Error a normal thing to ignore?
I came across this bunch of errors while booting newer (2.6.12-rc based) kernels on a p630, and I am trying to understand whether they are a concern. The default SUSE 2.6 kernel on the box does not report these errors. What do they indicate? The hardware in the box has not changed (nothing was added or removed).

Apr 15 16:37:01 llm15 kernel: RTAS: event: 1, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 2, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 3, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 4, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 5, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 6, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 7, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 8, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 9, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 10, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 1, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 2, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 3, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 4, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 5, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 6, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 7, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 8, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 9, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 10, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 1, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 2, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 3, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 4, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 5, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 6, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 7, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 8, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 9, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 10, Type: Internal Device Failure, Severity: 5
. . .
Apr 15 16:37:01 llm15 kernel: RTAS: event: 98, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 99, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 100, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 101, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 102, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 103, Type: Internal Device Failure, Severity: 5
Apr 15 16:37:01 llm15 kernel: RTAS: event: 104, Type: Internal Device Failure, Severity: 5

The boot itself seems to complete without problems, however. Why are these errors reported on the same box with newer kernels, while the default kernel does not complain?
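The log above shows event numbers 1 to 10 three times before the count climbs to 104. To see how often the sequence appears to restart in a longer log, a small filter over the syslog lines could count the restarts. This is a hypothetical sketch, not part of the kernel or the ppc64 userspace tools. The helper names `parse_rtas_event()` and `count_restarts()` are made up for illustration, and the code assumes the exact `RTAS: event: N, Type: ..., Severity: S` line format shown in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse a kernel log line such as
 *   "... kernel: RTAS: event: 7, Type: Internal Device Failure, Severity: 5"
 * Returns 1 and fills *seq and *severity on success, 0 otherwise. */
static int parse_rtas_event(const char *line, int *seq, int *severity)
{
    const char *p = strstr(line, "RTAS: event:");

    if (p == NULL)
        return 0;
    /* %*[^,] skips the free-form Type text up to the next comma. */
    return sscanf(p, "RTAS: event: %d, Type: %*[^,], Severity: %d",
                  seq, severity) == 2;
}

/* Hypothetical helper: count how often the event number fails to
 * increase, i.e. how many times the sequence appears to restart.
 * Lines that are not RTAS event lines are skipped. */
static int count_restarts(const char *const lines[], size_t n)
{
    int prev = -1;
    int restarts = 0;

    assert(lines != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int seq, severity;

        if (!parse_rtas_event(lines[i], &seq, &severity))
            continue;
        if (prev >= 0 && seq <= prev)
            restarts++;
        prev = seq;
    }
    return restarts;
}
```

On the excerpt shown above, each drop from 10 back to 1 would count as one restart. The lines elided by ". . ." are unknown, so the sketch says nothing about them.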
Default kernel output:

Apr 15 19:49:20 llm15 kernel: RTAS daemon started
Apr 15 19:49:20 llm15 kernel: RTAS: 13 -------- RTAS event begin --------
Apr 15 19:49:20 llm15 kernel: RTAS 0: 04440003 000003f8 d6009d00 17334900
Apr 15 19:49:20 llm15 kernel: RTAS 1: 20050414 49424d00 00000000 00000000
Apr 15 19:49:20 llm15 kernel: RTAS 2: 00000000 00000000 00000000 00000000
Apr 15 19:49:20 llm15 kernel: RTAS 3: 49424d00 55302e31 2d50312d 43310000
Apr 15 19:49:20 llm15 kernel: RTAS 4: 00503034 4b2326fe 02a00011 37020002
Apr 15 19:49:20 llm15 kernel: RTAS 5: 00000000 00000000 00000000 f1800001
Apr 15 19:49:20 llm15 kernel: RTAS 6: 0001ffff ffffffff 01000000 00000000
Apr 15 19:49:20 llm15 kernel: RTAS 7: 42343138 20202020 38303033 32364645
Apr 15 19:49:20 llm15 kernel: RTAS 8: 30303030 30303030 30303030 30303030
Apr 15 19:49:20 llm15 kernel: RTAS 9: 036c4432 50464134 050a1021 00000100
Apr 15 19:49:20 llm15 kernel: RTAS 10: 02714001 00000020 434d5020 44415441
Apr 15 19:49:20 llm15 kernel: RTAS 11: 00000100 00000000 00000000 f1800001
Apr 15 19:49:20 llm15 kernel: RTAS 12: 53595320 44415441 00000000 00000000
Apr 15 19:49:20 llm15 kernel: RTAS 13: 20051733 17011404 20051733 17011404
Apr 15 19:49:20 llm15 kernel: RTAS 14: 20054833 17011404 53524320 44415441
Apr 15 19:49:20 llm15 kernel: RTAS 15: 37020002 00000000 00000001 00018210
Apr 15 19:49:20 llm15 kernel: RTAS 16: 00000000 00000000 00000000 00000000
Apr 15 19:49:20 llm15 kernel: RTAS 17: 00000000 00000000 00000000 00000000
Apr 15 19:49:20 llm15 kernel: RTAS 18: 00000000 00000000 00000000 00000000
Apr 15 19:49:20 llm15 kernel: RTAS 19: 00000000 00000000 00000000 00000001

Thanks and Regards,
Sharada

On Fri, Nov 05, 2004 at 04:09:33PM +1100, Anton Blanchard wrote:
>
> > Moving this to earlier in the boot sequence would be nice but I'm not
> > sure it's worth the effort. Is there any way to guarantee that this is
> > done before anyone calls log_error()?
>
> Since the userspace tools can handle it, I'm OK to ignore the issue.
>
> Anton

From sharada at in.ibm.com Fri Apr 15 22:42:29 2005 From: sharada at in.ibm.com (R Sharada) Date: Fri, 15 Apr 2005 18:12:29 +0530 Subject: query related to create_slbe Message-ID: <20050415124229.GA1674@in.ibm.com>

Hello,

I was trying to understand the slb code and the creation of slb entries in the Linux kernel, and came up with a few queries in the process. The code I am referring to is the 2.6.12-rc2-mm3 code base.

1) In slb.c, within slb_initialize():

	create_slbe(KERNELBASE, get_kernel_vsid(KERNELBASE), flags, 0);
	create_slbe(VMALLOCBASE, get_kernel_vsid(KERNELBASE),
		    SLB_VSID_KERNEL, 1);
	/* We don't bolt the stack for the time being - we're in boot,
	 * so the stack is in the bolted segment.  By the time it goes
	 * elsewhere, we'll call _switch() which will bolt in the new
	 * one. */
	asm volatile("isync":::"memory");
#endif

	get_paca()->stab_rr = SLB_NUM_BOLTED;

=> We seem to be creating slb entries for kernelbase and vmallocbase (2 entries); however, in mmu.h

#define SLB_NUM_BOLTED 3

=> seems to indicate we are bolting 3 entries. Perhaps I am missing something?

2) Relating to the same code above, create_slbe seems to take the vsid as an argument, but never seems to be using it, as seen in the code below:

static inline void create_slbe(unsigned long ea, unsigned long vsid,
			       unsigned long flags, unsigned long entry)
{
	asm volatile("slbmte %0,%1" :
		     : "r" (mk_vsid_data(ea, flags)),
		       "r" (mk_esid_data(ea, entry))
		     : "memory" );
}

=> Again, am I missing something?
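The two helpers that create_slbe() passes to slbmte can be modeled in user-space C. The model shows why the vsid argument is redundant: the VSID half of the entry is recomputed from the effective address. This is a sketch under stated assumptions, not the kernel source:

- The constant values are modeled on the 2.6-era ppc64 headers and should be checked against mmu.h in your tree.
- `fake_kernel_vsid()` is a hypothetical stand-in for get_kernel_vsid(). The real function scrambles the ESID rather than returning it.
- `next_victim()` is a hypothetical model of the round-robin replacement that keeps the first SLB_NUM_BOLTED slots out of rotation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, modeled on the 2.6-era ppc64 headers; verify them
 * against include/asm-ppc64/mmu.h before relying on them. */
#define SID_SHIFT        28                      /* 256MB segments */
#define ESID_MASK        0xfffffffff0000000ULL
#define SLB_ESID_V       0x0000000008000000ULL   /* entry valid bit */
#define SLB_VSID_SHIFT   12
#define SLB_NUM_BOLTED   3    /* KERNELBASE, VMALLOCBASE, kernel stack */
#define SLB_NUM_ENTRIES  64   /* assumed SLB size (POWER4/POWER5) */

#define KERNELBASE   0xC000000000000000ULL
#define VMALLOCBASE  0xD000000000000000ULL

/* Hypothetical stand-in for get_kernel_vsid(): the real function
 * scrambles the ESID into a VSID; returning the ESID keeps the
 * numbers readable. */
static uint64_t fake_kernel_vsid(uint64_t ea)
{
    return ea >> SID_SHIFT;
}

/* RB operand of slbmte: segment number, valid bit and slot index. */
static uint64_t mk_esid_data(uint64_t ea, unsigned int entry)
{
    assert(entry < SLB_NUM_ENTRIES);
    return (ea & ESID_MASK) | SLB_ESID_V | entry;
}

/* RS operand of slbmte: the VSID is derived from ea itself, which is
 * why a separate vsid parameter to create_slbe() was never read. */
static uint64_t mk_vsid_data(uint64_t ea, uint64_t flags)
{
    return (fake_kernel_vsid(ea) << SLB_VSID_SHIFT) | flags;
}

/* Model of the round-robin victim choice on an SLB miss: after the
 * last slot, wrap to SLB_NUM_BOLTED instead of 0, so the bolted
 * slots are never replaced. */
static unsigned int next_victim(unsigned int stab_rr)
{
    unsigned int next = stab_rr + 1;

    if (next >= SLB_NUM_ENTRIES)
        next = SLB_NUM_BOLTED;
    return next;
}
```

Under these assumptions, the bolted KERNELBASE entry in slot 0 encodes its ESID half as 0xC000000008000000 (segment bits, valid bit, slot 0). A counter that reaches the last slot wraps back to SLB_NUM_BOLTED rather than 0, so slots 0 to 2 stay reserved whether or not the stack slot is actually in use.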
Thanks and Regards,
Sharada

From olof at austin.ibm.com Fri Apr 15 23:05:49 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Fri, 15 Apr 2005 08:05:49 -0500
Subject: query related to create_slbe
In-Reply-To: <20050415124229.GA1674@in.ibm.com>
References: <20050415124229.GA1674@in.ibm.com>
Message-ID: <20050415130549.GB5404@austin.ibm.com>

On Fri, Apr 15, 2005 at 06:12:29PM +0530, R Sharada wrote:
> => we seem to be creating slb entries for kernelbase and vmallocbase (2 entries)
> however in mmu.h
>
> #define SLB_NUM_BOLTED 3
>
> => seems to indicate we are bolting 3 entries. Perhaps I am missing something?

Yes, there might be three:

* Kernelbase (slot 0)
* vmallocbase (slot 1)
* kernel stack (slot 2) --> IF and only if it's not the same as kernelbase

We always set aside those slots since it's easier to do it that way than
to have to deal with it sometimes being two, sometimes three.

> 2) Relating to the same code above, create_slbe seems to take the vsid as an
> argument, but never seems to be using it, as seen in the code below:
>
> static inline void create_slbe(unsigned long ea, unsigned long vsid,
> 			       unsigned long flags, unsigned long entry)
> {
> 	asm volatile("slbmte %0,%1" :
> 		     : "r" (mk_vsid_data(ea, flags)),
> 		       "r" (mk_esid_data(ea, entry))
> 		     : "memory" );
> }
>
> => again, am I missing something?

Nope, you're right. It looks like that argument is a leftover from an
earlier implementation. I'll submit a patch to remove it. Thanks for
spotting it!

-Olof

From olof at austin.ibm.com Fri Apr 15 23:08:25 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Fri, 15 Apr 2005 08:08:25 -0500
Subject: [PATCH] [PPC64] Remove unused argument to create_slbe
In-Reply-To: <20050415124229.GA1674@in.ibm.com>
References: <20050415124229.GA1674@in.ibm.com>
Message-ID: <20050415130825.GC5404@austin.ibm.com>

Hi,

Remove vsid argument to create_slbe, since it's no longer used.

Spotted by R Sharada.
Signed-off-by: Olof Johansson

Index: 2.6/arch/ppc64/mm/slb.c
===================================================================
--- 2.6.orig/arch/ppc64/mm/slb.c	2005-03-29 18:11:13.000000000 -0600
+++ 2.6/arch/ppc64/mm/slb.c	2005-04-15 08:02:03.000000000 -0500
@@ -33,8 +33,8 @@ static inline unsigned long mk_vsid_data
 	return (get_kernel_vsid(ea) << SLB_VSID_SHIFT) | flags;
 }
 
-static inline void create_slbe(unsigned long ea, unsigned long vsid,
-			       unsigned long flags, unsigned long entry)
+static inline void create_slbe(unsigned long ea, unsigned long flags,
+			       unsigned long entry)
 {
 	asm volatile("slbmte %0,%1" :
 		     : "r" (mk_vsid_data(ea, flags)),
@@ -145,9 +145,8 @@ void slb_initialize(void)
 	asm volatile("isync":::"memory");
 	asm volatile("slbmte %0,%0"::"r" (0) : "memory");
 	asm volatile("isync; slbia; isync":::"memory");
-	create_slbe(KERNELBASE, get_kernel_vsid(KERNELBASE), flags, 0);
-	create_slbe(VMALLOCBASE, get_kernel_vsid(KERNELBASE),
-		    SLB_VSID_KERNEL, 1);
+	create_slbe(KERNELBASE, flags, 0);
+	create_slbe(VMALLOCBASE, SLB_VSID_KERNEL, 1);
 	/* We don't bolt the stack for the time being - we're in boot,
 	 * so the stack is in the bolted segment.  By the time it goes
 	 * elsewhere, we'll call _switch() which will bolt in the new

From moilanen at austin.ibm.com Sat Apr 16 00:27:56 2005
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Fri, 15 Apr 2005 09:27:56 -0500
Subject: RTAS error log sequence numbers
In-Reply-To: <20050415092026.GA3030@in.ibm.com>
References: <20041104155032.GB1268@krispykreme.ozlabs.ibm.com>
	<20041104163759.GR10026@austin.ibm.com>
	<418A61FE.7030500@austin.ibm.com>
	<20041105050933.GC8470@krispykreme.ozlabs.ibm.com>
	<20050415092026.GA3030@in.ibm.com>
Message-ID: <20050415092756.44509fcf.moilanen@austin.ibm.com>

> default kernel output:
>
> Apr 15 19:49:20 llm15 kernel: RTAS daemon started
> Apr 15 19:49:20 llm15 kernel: RTAS: 13 -------- RTAS event begin --------
> Apr 15 19:49:20 llm15 kernel: RTAS 0: 04440003 000003f8 d6009d00 17334900
> Apr 15 19:49:20 llm15 kernel: RTAS 1: 20050414 49424d00 00000000 00000000
> Apr 15 19:49:20 llm15 kernel: RTAS 2: 00000000 00000000 00000000 00000000
> Apr 15 19:49:20 llm15 kernel: RTAS 3: 49424d00 55302e31 2d50312d 43310000
> Apr 15 19:49:20 llm15 kernel: RTAS 4: 00503034 4b2326fe 02a00011 37020002
> Apr 15 19:49:20 llm15 kernel: RTAS 5: 00000000 00000000 00000000 f1800001
> Apr 15 19:49:20 llm15 kernel: RTAS 6: 0001ffff ffffffff 01000000 00000000
> Apr 15 19:49:20 llm15 kernel: RTAS 7: 42343138 20202020 38303033 32364645
> Apr 15 19:49:20 llm15 kernel: RTAS 8: 30303030 30303030 30303030 30303030
> Apr 15 19:49:20 llm15 kernel: RTAS 9: 036c4432 50464134 050a1021 00000100
> Apr 15 19:49:20 llm15 kernel: RTAS 10: 02714001 00000020 434d5020 44415441
> Apr 15 19:49:20 llm15 kernel: RTAS 11: 00000100 00000000 00000000 f1800001
> Apr 15 19:49:20 llm15 kernel: RTAS 12: 53595320 44415441 00000000 00000000
> Apr 15 19:49:20 llm15 kernel: RTAS 13: 20051733 17011404 20051733 17011404
> Apr 15 19:49:20 llm15 kernel: RTAS 14: 20054833 17011404 53524320 44415441
> Apr 15 19:49:20 llm15 kernel: RTAS 15: 37020002 00000000 00000001 00018210
> Apr 15 19:49:20 llm15 kernel: RTAS 16: 00000000 00000000 00000000 00000000
> Apr 15 19:49:20 llm15 kernel: RTAS 17: 00000000 00000000 00000000 00000000
> Apr 15 19:49:20 llm15 kernel: RTAS 18: 00000000 00000000 00000000 00000000
> Apr 15 19:49:20 llm15 kernel: RTAS 19: 00000000 00000000 00000000 00000001

Looks like it's complaining about some type of failure with whatever
device is in your U0.1-P1-C1 slot.

==== Event-Scan (13) Begin ================================================
Version: 00000004
Severity: 00000002 (Warning)
Type 00000003 (Internal Device Failure)
Location Code: U0.1-P1-C1
No log debug data present
Status: unrecoverable bypassed new
Date/Time: 20050414 17334900
==== IBM Service Processor Log ===========================================

Jake

From nfont at austin.ibm.com Sat Apr 16 01:27:08 2005
From: nfont at austin.ibm.com (Nathan Fontenot)
Date: Fri, 15 Apr 2005 10:27:08 -0500
Subject: RTAS error log sequence numbers
In-Reply-To: <20050415092026.GA3030@in.ibm.com>
References: <20041104155032.GB1268@krispykreme.ozlabs.ibm.com>
	<20041105050933.GC8470@krispykreme.ozlabs.ibm.com>
	<20050415092026.GA3030@in.ibm.com>
Message-ID: <200504151027.08459.nfont@austin.ibm.com>

On Friday 15 April 2005 4:20 am, R Sharada wrote:
> Is this RTAS Scan Event Error a normal thing to ignore?

Here's a full dump of the rtas event you posted for the default kernel.
It appears that this is just a warning about a fully recovered event and
could probably be ignored. The location code for the device it's warning
on is below if you're interested.

This appears to be different from the newer events, which have a different
severity. If you can send me the full rtas event I can send you a dump of it.

-Nathan F.

==== Raw RTAS Event Begin =================================================
0x00000000: 04440003 000003f8 d6009d00 17334900 [.D...........3I.]
0x00000010: 20050414 49424d00 00000000 00000000 [ ...IBM.........]
0x00000020: 00000000 00000000 00000000 00000000 [................]
0x00000030: 49424d00 55302e31 2d50312d 43310000 [IBM.U0.1-P1-C1..]
0x00000040: 00503034 4b2326fe 02a00011 37020002 [.P04K#&.....7...]
0x00000050: 00000000 00000000 00000000 f1800001 [................]
0x00000060: 0001ffff ffffffff 01000000 00000000 [................]
0x00000070: 42343138 20202020 38303033 32364645 [B418    800326FE]
0x00000080: 30303030 30303030 30303030 30303030 [0000000000000000]
0x00000090: 036c4432 50464134 050a1021 00000100 [.lD2PFA4...!....]
0x000000a0: 02714001 00000020 434d5020 44415441 [.q@.... CMP DATA]
0x000000b0: 00000100 00000000 00000000 f1800001 [................]
0x000000c0: 53595320 44415441 00000000 00000000 [SYS DATA........]
0x000000d0: 20051733 17011404 20051733 17011404 [ ..3.... ..3....]
0x000000e0: 20054833 17011404 53524320 44415441 [ .H3....SRC DATA]
0x000000f0: 37020002 00000000 00000001 00018210 [7...............]
0x00000100: 00000000 00000000 00000000 00000000 [................]
0x00000110: 00000000 00000000 00000000 00000000 [................]
0x00000120: 00000000 00000000 00000000 00000000 [................]
0x00000130: 00000000 00000000 00000000 00000001 [................]
==== Raw RTAS Event End ===================================================

==== Event-Scan Begin =====================================================
Version: 00000004
Severity: 00000002 (Warning)
Disposition: 00000000 (Fully Recovered)
Extended: 00000001
Log Length: 000003f8
Initiator 00000000 (Unknown)
Target 00000000 (Unknown)
Type 00000003 (Internal Device Failure)
Location Code: U0.1-P1-C1
Status: unrecoverable bypassed new
Date/Time: 20050414 17334900
==== Event-Scan End =======================================================

From benh at kernel.crashing.org Sat Apr 16 17:23:57 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 16 Apr 2005 17:23:57 +1000
Subject: query related to create_slbe
In-Reply-To: <20050415124229.GA1674@in.ibm.com>
References: <20050415124229.GA1674@in.ibm.com>
Message-ID: <1113636237.5462.252.camel@gaston>

On Fri, 2005-04-15 at 18:12 +0530, R Sharada wrote:
> Hello,
> 	I was trying to understand the slb code and the creation of slb entries
> in the Linux kernel, and came up with a few queries in the process. The code
> I am referring to is 2.6.12-rc2-mm3 code-base.
>
> 1) in slb.c
> within slb_initialize:
> 	create_slbe(KERNELBASE, get_kernel_vsid(KERNELBASE), flags, 0);
> 	create_slbe(VMALLOCBASE, get_kernel_vsid(KERNELBASE),
> 		    SLB_VSID_KERNEL, 1);
> 	/* We don't bolt the stack for the time being - we're in boot,
> 	 * so the stack is in the bolted segment.  By the time it goes
> 	 * elsewhere, we'll call _switch() which will bolt in the new
> 	 * one. */
> 	asm volatile("isync":::"memory");
> #endif
>
> 	get_paca()->stab_rr = SLB_NUM_BOLTED;
>
> => we seem to be creating slb entries for kernelbase and vmallocbase (2 entries)
> however in mmu.h
>
> #define SLB_NUM_BOLTED 3
>
> => seems to indicate we are bolting 3 entries. Perhaps I am missing something?

The third one is the current kernel stack.
As the comment in the code explains, we do not need to bolt it at this
point, since we are booting, thus using init_mm's stack which is in the
bolted kernel segment already. The next time we switch processes, the new
stack will be bolted.

> 2) Relating to the same code above, create_slbe seems to take the vsid as an
> argument, but never seems to be using it, as seen in the code below:
>
> static inline void create_slbe(unsigned long ea, unsigned long vsid,
> 			       unsigned long flags, unsigned long entry)
> {
> 	asm volatile("slbmte %0,%1" :
> 		     : "r" (mk_vsid_data(ea, flags)),
> 		       "r" (mk_esid_data(ea, entry))
> 		     : "memory" );
> }
>
> => again, am I missing something?

No, I suppose this is historical crap, and we end up calling
get_kernel_vsid() twice, good catch! Nothing really bad here though. I'm
adding it to my TODO list and will submit a patch one of these days if
nobody beats me to it.

Ben.

From benh at kernel.crashing.org Sat Apr 16 17:26:14 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 16 Apr 2005 17:26:14 +1000
Subject: [PATCH] [PPC64] Remove unused argument to create_slbe
In-Reply-To: <20050415130825.GC5404@austin.ibm.com>
References: <20050415124229.GA1674@in.ibm.com>
	<20050415130825.GC5404@austin.ibm.com>
Message-ID: <1113636374.5462.254.camel@gaston>

On Fri, 2005-04-15 at 08:08 -0500, Olof Johansson wrote:
> Hi,
>
> Remove vsid argument to create_slbe, since it's no longer used.

Ah, you did beat me to it! :)

Ben.
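The slot accounting described by Olof and Ben in this thread (slot 0 for KERNELBASE, slot 1 for VMALLOCBASE, slot 2 always reserved for the kernel stack, and replacement starting from stab_rr = SLB_NUM_BOLTED) can be modelled in user space. This is a sketch under stated assumptions, not kernel code: the constants (SLB_NUM_ENTRIES = 64, ESID_MASK, SLB_ESID_V and the two base addresses) are assumed from 2.6.12-era ppc64 headers, the model_* helpers are hypothetical, and the increment-then-wrap rule is assumed to mirror the round-robin replacement in slb_low.S.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 2.6.12-era ppc64 values; check them against your tree. */
#define SLB_NUM_BOLTED   3
#define SLB_NUM_ENTRIES  64                      /* assumed SLB size */
#define ESID_MASK        0xfffffffff0000000ULL   /* 256MB segments */
#define SLB_ESID_V       0x0000000008000000ULL   /* entry valid bit */
#define KERNELBASE       0xc000000000000000ULL
#define VMALLOCBASE      0xd000000000000000ULL

/* ESID word handed to slbmte: segment number, valid bit and slot index,
 * modelled on mk_esid_data(ea, entry). The VSID word is likewise
 * derived from ea inside mk_vsid_data(), which is why create_slbe()
 * never needed a separate vsid argument. */
uint64_t model_esid_data(uint64_t ea, unsigned int slot)
{
	return (ea & ESID_MASK) | SLB_ESID_V | slot;
}

/* Slot 2 only needs its own entry when the stack lies outside the
 * bolted kernel segment; the slot is reserved either way. */
int model_stack_needs_own_slot(uint64_t stack_ea)
{
	return (stack_ea & ESID_MASK) != (KERNELBASE & ESID_MASK);
}

/* Round-robin victim selection for SLB misses: advance stab_rr and wrap
 * back to SLB_NUM_BOLTED, never to 0, so the bolted slots survive. */
unsigned int model_next_victim(unsigned int *stab_rr)
{
	unsigned int slot = *stab_rr + 1;

	if (slot >= SLB_NUM_ENTRIES)
		slot = SLB_NUM_BOLTED;
	assert(slot >= SLB_NUM_BOLTED);
	*stab_rr = slot;
	return slot;
}
```

For example, model_esid_data(KERNELBASE, 0) yields 0xc000000008000000, and a stab_rr of 63 wraps to slot 3 on the next allocation. Reserving all three slots unconditionally, as Olof explains, keeps this wrap point constant instead of depending on where the current stack lives.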
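Nathan's decoded Version, Severity, Disposition, Extended, Initiator, Target, Type and Log Length fields, from the RTAS thread earlier in this archive, come from the first two words of the raw event (04440003 000003f8). The sketch below recovers them with shifts. It assumes the field order of struct rtas_error_log in the ppc64 rtas.h header (version 8 bits, severity 3, disposition 2, extended 1, two reserved bits, initiator 4, target 4, type 8, then a 32-bit extended log length). It uses explicit shifts rather than C bitfields because bitfield layout is compiler- and endian-dependent. The severity names are assumed to match the RTAS_SEVERITY_* values 0 through 5.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded first two words of an RTAS error log header.  Field order is
 * assumed from struct rtas_error_log (most significant bits first). */
struct rtas_hdr {
	unsigned int version;      /* bits 31..24 */
	unsigned int severity;     /* bits 23..21 */
	unsigned int disposition;  /* bits 20..19 */
	unsigned int extended;     /* bit 18; bits 17..16 are reserved */
	unsigned int initiator;    /* bits 15..12 */
	unsigned int target;       /* bits 11..8 */
	unsigned int type;         /* bits 7..0 */
	uint32_t ext_log_length;   /* the whole second word */
};

struct rtas_hdr decode_rtas_hdr(uint32_t w0, uint32_t w1)
{
	struct rtas_hdr h;

	h.version        = (w0 >> 24) & 0xff;
	h.severity       = (w0 >> 21) & 0x7;
	h.disposition    = (w0 >> 19) & 0x3;
	h.extended       = (w0 >> 18) & 0x1;
	h.initiator      = (w0 >> 12) & 0xf;
	h.target         = (w0 >> 8) & 0xf;
	h.type           = w0 & 0xff;
	h.ext_log_length = w1;
	return h;
}

/* Severity names, assumed to match the RTAS_SEVERITY_* values 0..5. */
const char *rtas_severity_name(unsigned int sev)
{
	static const char *const names[] = {
		"No Error", "Event", "Warning", "Error Sync", "Error", "Fatal"
	};

	assert(sev < 8);	/* the field is only 3 bits wide */
	return sev < 6 ? names[sev] : "Unknown";
}
```

Given 0x04440003 and 0x000003f8 from the dump, it reports version 4, severity 2 (Warning), disposition 0, extended 1, initiator 0, target 0, type 3 and a log length of 0x3f8. These values match the Event-Scan output.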
From schwab at suse.de Sat Apr 16 22:56:07 2005
From: schwab at suse.de (Andreas Schwab)
Date: Sat, 16 Apr 2005 14:56:07 +0200
Subject: [PATCH] ppc64: improve g5 sound headphone mute
In-Reply-To: <1113391382.5463.20.camel@gaston> (Benjamin Herrenschmidt's message of "Wed, 13 Apr 2005 21:23:02 +1000")
References: <1113282436.21548.42.camel@gaston>
	<1113344225.21548.108.camel@gaston>
	<1113345561.5387.114.camel@gaston>
	<1113347296.5388.121.camel@gaston>
	<1113350355.5387.129.camel@gaston>
	<1113391382.5463.20.camel@gaston>
Message-ID:

Benjamin Herrenschmidt writes:

> This patch fixes a couple more issues with the management of the GPIOs
> dealing with headphone and line out mute on the G5. It should fix the
> remaining problems of people not getting any sound out of the headphone
> jack.

There's still a minor problem: when booting with line-out plugged (didn't
try headphone yet) the initial volume settings are still not right.
Unplugging and plugging again fixes this.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

From benh at kernel.crashing.org Sun Apr 17 00:27:01 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sun, 17 Apr 2005 00:27:01 +1000
Subject: [PATCH] ppc64: improve g5 sound headphone mute
In-Reply-To:
References: <1113282436.21548.42.camel@gaston>
	<1113344225.21548.108.camel@gaston>
	<1113345561.5387.114.camel@gaston>
	<1113347296.5388.121.camel@gaston>
	<1113350355.5387.129.camel@gaston>
	<1113391382.5463.20.camel@gaston>
Message-ID: <1113661621.5516.267.camel@gaston>

On Sat, 2005-04-16 at 14:56 +0200, Andreas Schwab wrote:
> Benjamin Herrenschmidt writes:
>
> > This patch fixes a couple more issues with the management of the GPIOs
> > dealing with headphone and line out mute on the G5.
> > It should fix the remaining problems of people not getting any sound out
> > of the headphone jack.
>
> There's still a minor problem: when booting with line-out plugged (didn't
> try headphone yet) the initial volume settings are still not right.
> Unplugging and plugging again fixes this.

It can be either alsa not restoring the setup properly, or a bug in the
driver I'm still chasing where the headphone detection happens before
proper initialisation of the rest of the driver ...

From sonny at burdell.org Sun Apr 17 10:48:41 2005
From: sonny at burdell.org (Sonny Rao)
Date: Sat, 16 Apr 2005 20:48:41 -0400
Subject: 2.6.12-rc2-mm3 -- "Badness in slb_flush_and_rebolt"
Message-ID: <20050417004841.GA4298@kevlar.burdell.org>

Apr 15 15:57:23 itcopus122 kernel: Badness in slb_flush_and_rebolt at arch/ppc64/mm/slb.c:52
Apr 15 15:57:23 itcopus122 kernel: Call Trace:
Apr 15 15:57:23 itcopus122 kernel: [c0000007ae0ab880] [c000000badcfc840] 0xc000000badcfc840 (unreliable)
Apr 15 15:57:23 itcopus122 kernel: [c0000007ae0ab910] [c00000000004cdb4] .__schedule_tail+0xb8/0x1b8
Apr 15 15:57:23 itcopus122 kernel: [c0000007ae0ab9b0] [c0000000003be190] .schedule+0x728/0xc68
Apr 15 15:57:23 itcopus122 kernel: [c0000007ae0abae0] [c0000000003bee08] .wait_for_completion+0xd4/0x184
Apr 15 15:57:23 itcopus122 kernel: [c0000007ae0abbe0] [c00000000004f53c] .sched_exec+0x1ac/0x1fc
Apr 15 15:57:23 itcopus122 kernel: [c0000007ae0abce0] [c0000000000eecb0] .compat_do_execve+0xa0/0x36c
Apr 15 15:57:23 itcopus122 kernel: [c0000007ae0abd90] [c00000000001b578] .sys32_execve+0x7c/0x104
Apr 15 15:57:23 itcopus122 kernel: [c0000007ae0abe30] [c00000000000d800] syscall_exit+0x0/0x18


I got an awful lot of these with varying stack traces on
2.6.12-rc2-mm3, it looks like this code is being entered with
interrupts enabled. Is this worth worrying about?

Machine is an ML8 with 64GB of RAM.
Sonny

From sonny at burdell.org Sun Apr 17 11:19:00 2005
From: sonny at burdell.org (Sonny Rao)
Date: Sat, 16 Apr 2005 21:19:00 -0400
Subject: 2.6.12-rc2-mm3 -- "Badness in slb_flush_and_rebolt"
In-Reply-To: <20050417004841.GA4298@kevlar.burdell.org>
References: <20050417004841.GA4298@kevlar.burdell.org>
Message-ID: <20050417011900.GA4423@kevlar.burdell.org>

On Sat, Apr 16, 2005 at 08:48:41PM -0400, Sonny Rao wrote:
> Apr 15 15:57:23 itcopus122 kernel: Badness in slb_flush_and_rebolt at arch/ppc64/mm/slb.c:52
> Apr 15 15:57:23 itcopus122 kernel: Call Trace:
> Apr 15 15:57:23 itcopus122 kernel: [c0000007ae0ab880] [c000000badcfc840] 0xc000000badcfc840 (unreliable)
> Apr 15 15:57:23 itcopus122 kernel: [c0000007ae0ab910] [c00000000004cdb4] .__schedule_tail+0xb8/0x1b8
> Apr 15 15:57:23 itcopus122 kernel: [c0000007ae0ab9b0] [c0000000003be190] .schedule+0x728/0xc68
> Apr 15 15:57:23 itcopus122 kernel: [c0000007ae0abae0] [c0000000003bee08] .wait_for_completion+0xd4/0x184
> Apr 15 15:57:23 itcopus122 kernel: [c0000007ae0abbe0] [c00000000004f53c] .sched_exec+0x1ac/0x1fc
> Apr 15 15:57:23 itcopus122 kernel: [c0000007ae0abce0] [c0000000000eecb0] .compat_do_execve+0xa0/0x36c
> Apr 15 15:57:23 itcopus122 kernel: [c0000007ae0abd90] [c00000000001b578] .sys32_execve+0x7c/0x104
> Apr 15 15:57:23 itcopus122 kernel: [c0000007ae0abe30] [c00000000000d800] syscall_exit+0x0/0x18
>
>
> I got an awful lot of these with varying stack traces on
> 2.6.12-rc2-mm3, it looks like this code is being entered with
> interrupts enabled. Is this worth worrying about?
>
> Machine is an ML8 with 64GB of RAM.

Doh, just saw the older thread on this, sorry for the dup.
Sonny

From anton at samba.org Sun Apr 17 11:44:02 2005
From: anton at samba.org (Anton Blanchard)
Date: Sun, 17 Apr 2005 11:44:02 +1000
Subject: 2.6.12-rc2-mm3 -- "Badness in slb_flush_and_rebolt"
In-Reply-To: <20050417004841.GA4298@kevlar.burdell.org>
References: <20050417004841.GA4298@kevlar.burdell.org>
Message-ID: <20050417014402.GF18835@krispykreme>

Hi,

> I got an awful lot of these with varying stack traces on
> 2.6.12-rc2-mm3, it looks like this code is being entered with
> interrupts enabled. Is this worth worrying about?

Try this patch:

http://ozlabs.org/ppc64-patches/patch.pl?id=750

Anton

From benh at kernel.crashing.org Mon Apr 18 17:37:36 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Mon, 18 Apr 2005 17:37:36 +1000
Subject: SMU driver
Message-ID: <1113809857.5462.308.camel@gaston>

Hi!

Below is a patch against current 2.6.12-rc2 rewriting most of the SMU
driver. It is now interrupt-based and provides a /dev/smu userland
interface for sending commands. I appended after the patch the source
code for a small userland tool that blinks the front LED using that
interface.

Now that the basic brick is there, I'll start toying with accessing the
sensors, the i2c bus, etc... to get some proper thermal control, but I'll
be fairly busy for the rest of the week with LCA, so if somebody wants to
beat me to it and figure out how these things work, go for it. I've got
the fan commands pretty much figured out already (thanks Markus and the
OF code). For any serious hacking, for those who haven't done so yet, I
really suggest looking at the Forth words of the SMU OF drivers.
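Judging from smu_write() and smu_read_command() in the patch that follows, a /dev/smu request consists of a header (cmdtype, cmd, data_len) immediately followed by up to SMU_MAX_DATA (254) payload bytes. The reply read back is a header (status, reply_len) followed by the reply data. The sketch below builds and parses such buffers. The header structs are guesses: only the field names come from the patch, while the field types, order, padding and the value 0 for SMU_CMDTYPE_SMU are assumptions. Real code should use the header shipped with the driver. The example command byte 0x8e with data byte 0x81 is the RTC "get date/time" request built by the driver's old smu_fill_get_rtc_cmd().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define SMU_MAX_DATA 254	/* payload limit enforced by smu_write() */

/* Hypothetical mirrors of the driver's user headers: field names follow
 * the patch, everything else (types, order, padding) is a guess. */
struct smu_cmd_hdr_guess {
	uint32_t cmdtype;	/* 0 assumed to mean SMU_CMDTYPE_SMU */
	uint8_t  cmd;		/* SMU command byte */
	uint8_t  pad[3];
	uint32_t data_len;	/* payload bytes that follow the header */
};

struct smu_reply_hdr_guess {
	int32_t  status;	/* 0, or a negative errno such as -EIO */
	uint32_t reply_len;	/* reply bytes that follow the header */
};

/* Header immediately followed by the payload, as smu_write() reads it.
 * Returns the number of bytes to write(), or -1 if it does not fit. */
ssize_t build_request(uint8_t *out, size_t outlen, uint8_t cmd,
		      const uint8_t *data, uint32_t len)
{
	struct smu_cmd_hdr_guess hdr;

	if (len > SMU_MAX_DATA || outlen < sizeof(hdr) + len)
		return -1;
	memset(&hdr, 0, sizeof(hdr));
	hdr.cmd = cmd;
	hdr.data_len = len;
	memcpy(out, &hdr, sizeof(hdr));
	if (len)
		memcpy(out + sizeof(hdr), data, len);
	return (ssize_t)(sizeof(hdr) + len);
}

/* Split what read() returned into status and reply data.  The driver
 * reports the full reply_len but copies at most the caller's buffer
 * size, so clamp the length to the bytes actually received. */
int parse_reply(const uint8_t *buf, size_t n, int32_t *status,
		const uint8_t **data, uint32_t *len)
{
	struct smu_reply_hdr_guess hdr;

	if (n < sizeof(hdr))
		return -1;
	memcpy(&hdr, buf, sizeof(hdr));
	*status = hdr.status;
	*data = buf + sizeof(hdr);
	*len = hdr.reply_len;
	if (*len > n - sizeof(hdr))
		*len = (uint32_t)(n - sizeof(hdr));
	return 0;
}

/* One write() queues the command; a blocking read() then waits for
 * completion and returns the reply header plus data. */
ssize_t smu_transact(int fd, uint8_t cmd, const uint8_t *data,
		     uint32_t len, uint8_t *reply, size_t replylen)
{
	uint8_t buf[sizeof(struct smu_cmd_hdr_guess) + SMU_MAX_DATA];
	ssize_t n = build_request(buf, sizeof(buf), cmd, data, len);

	if (n < 0 || write(fd, buf, (size_t)n) != n)
		return -1;
	return read(fd, reply, replylen);
}
```

A caller would open /dev/smu, pass command 0x8e with the single data byte 0x81 to smu_transact(), and then hand the bytes read to parse_reply(). With a descriptor opened without O_NONBLOCK, the read() sleeps until the SMU has answered, as smu_read_command() does.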
Index: linux-work/drivers/macintosh/smu.c =================================================================== --- linux-work.orig/drivers/macintosh/smu.c 2005-04-18 11:57:19.000000000 +1000 +++ linux-work/drivers/macintosh/smu.c 2005-04-18 17:30:18.000000000 +1000 @@ -8,21 +8,10 @@ */ /* - * For now, this driver includes: - * - RTC get & set - * - reboot & shutdown commands - * all synchronous with IRQ disabled (ugh) - * * TODO: - * rework in a way the PMU driver works, that is asynchronous - * with a queue of commands. I'll do that as soon as I have an - * SMU based machine at hand. Some more cleanup is needed too, - * like maybe fitting it into a platform device, etc... - * Also check what's up with cache coherency, and if we really - * can't do better than flushing the cache, maybe build a table - * of command len/reply len like the PMU driver to only flush - * what is actually necessary. - * --BenH. + * - Maybe add timeout to commands ? + * - i2c interface + * - blocking version of time functions */ #include @@ -36,6 +25,9 @@ #include #include #include +#include +#include +#include #include #include @@ -45,6 +37,7 @@ #include #include #include +#include #define DEBUG_SMU 1 @@ -57,20 +50,26 @@ /* * This is the command buffer passed to the SMU hardware */ +#define SMU_MAX_DATA 254 + struct smu_cmd_buf { u8 cmd; u8 length; - u8 data[0x0FFE]; + u8 data[SMU_MAX_DATA]; }; struct smu_device { spinlock_t lock; struct device_node *of_node; - int db_ack; /* doorbell ack GPIO */ - int db_req; /* doorbell req GPIO */ + int doorbell; /* doorbell gpio */ u32 __iomem *db_buf; /* doorbell buffer */ + int db_irq; + int msg; + int msg_irq; struct smu_cmd_buf *cmd_buf; /* command buffer virtual */ u32 cmd_buf_abs; /* command buffer absolute */ + struct list_head cmd_list; + struct smu_cmd *cmd_cur; }; /* @@ -79,90 +78,240 @@ */ static struct smu_device *smu; + /* - * SMU low level communication stuff + * SMU driver low level stuff */ -static inline int smu_cmd_stat(struct 
smu_cmd_buf *cmd_buf, u8 cmd_ack) + +static void smu_start_cmd(void) { - rmb(); - return cmd_buf->cmd == cmd_ack && cmd_buf->length != 0; + unsigned long faddr, fend; + struct smu_cmd *cmd; + + if (list_empty(&smu->cmd_list)) + return; + + /* Fetch first command in queue */ + cmd = list_entry(smu->cmd_list.next, struct smu_cmd, link); + smu->cmd_cur = cmd; + list_del(&cmd->link); + + DPRINTK("SMU: starting cmd %x, %d bytes data\n", cmd->cmd, + cmd->data_len); + DPRINTK("SMU: data buffer: %02x %02x %02x %02x ...\n", + ((u8 *)cmd->data_buf)[0], ((u8 *)cmd->data_buf)[1], + ((u8 *)cmd->data_buf)[2], ((u8 *)cmd->data_buf)[3]); + + /* Fill the SMU command buffer */ + smu->cmd_buf->cmd = cmd->cmd; + smu->cmd_buf->length = cmd->data_len; + memcpy(smu->cmd_buf->data, cmd->data_buf, cmd->data_len); + + /* Flush command and data to RAM */ + faddr = (unsigned long)smu->cmd_buf; + fend = faddr + smu->cmd_buf->length + 2; + flush_inval_dcache_range(faddr, fend); + + /* This isn't exactly a DMA mapping here, I suspect + * the SMU is actually communicating with us via i2c to the + * northbridge or the CPU to access RAM. 
+ */ + writel(smu->cmd_buf_abs, smu->db_buf); + + /* Ring the SMU doorbell */ + pmac_do_feature_call(PMAC_FTR_WRITE_GPIO, NULL, smu->doorbell, 4); } -static inline u8 smu_save_ack_cmd(struct smu_cmd_buf *cmd_buf) + +static irqreturn_t smu_db_intr(int irq, void *arg, struct pt_regs *regs) { - return (~cmd_buf->cmd) & 0xff; + unsigned long flags; + struct smu_cmd *cmd; + void (*done)(struct smu_cmd *cmd, void *misc) = NULL; + void *misc = NULL; + u8 gpio; + int rc = 0; + + /* SMU completed the command, well, we hope, let's make sure + * of it + */ + gpio = pmac_do_feature_call(PMAC_FTR_READ_GPIO, NULL, smu->doorbell); + if ((gpio & 7) != 7) + return IRQ_HANDLED; + + spin_lock_irqsave(&smu->lock, flags); + + cmd = smu->cmd_cur; + smu->cmd_cur = NULL; + if (cmd == NULL) + goto bail; + + if (rc == 0) { + unsigned long faddr; + int reply_len; + u8 ack; + + /* CPU might have brought back the cache line, so we need + * to flush again before peeking at the SMU response. We + * flush the entire buffer for now as we haven't read the + * reply length (it's only 2 cache lines anyway) + */ + faddr = (unsigned long)smu->cmd_buf; + flush_inval_dcache_range(faddr, faddr + 256); + + /* Now check ack */ + ack = (~cmd->cmd) & 0xff; + if (ack != smu->cmd_buf->cmd) { + DPRINTK("SMU: incorrect ack, want %x got %x\n", + ack, smu->cmd_buf->cmd); + rc = -EIO; + } + reply_len = rc == 0 ? smu->cmd_buf->length : 0; + DPRINTK("SMU: reply len: %d\n", reply_len); + if (reply_len > cmd->reply_len) { + printk(KERN_WARNING "SMU: reply buffer too small, " + "got %d bytes for a %d bytes buffer\n", + reply_len, cmd->reply_len); + reply_len = cmd->reply_len; + } + cmd->reply_len = reply_len; + if (cmd->reply_buf && reply_len) + memcpy(cmd->reply_buf, smu->cmd_buf->data, reply_len); + } + + /* Now complete the command.
Write status last in order as we lost + * ownership of the command structure as soon as it's no longer -1 + */ + done = cmd->done; + misc = cmd->misc; + wmb(); + cmd->status = rc; + bail: + /* Start next command if any */ + smu_start_cmd(); + spin_unlock_irqrestore(&smu->lock, flags); + + /* Call command completion handler if any */ + if (done) + done(cmd, misc); + + /* It's an edge interrupt, nothing to do */ + return IRQ_HANDLED; } -static void smu_send_cmd(struct smu_device *dev) + +static irqreturn_t smu_msg_intr(int irq, void *arg, struct pt_regs *regs) { - /* SMU command buf is currently cacheable, we need a physical - * address. This isn't exactly a DMA mapping here, I suspect - * the SMU is actually communicating with us via i2c to the - * northbridge or the CPU to access RAM. + /* I don't quite know what to do with this one, we seem to never + * receive it, so I suspect we have to arm it someway in the SMU + * to start getting events that way. */ - writel(dev->cmd_buf_abs, dev->db_buf); - /* Ring the SMU doorbell */ - pmac_do_feature_call(PMAC_FTR_WRITE_GPIO, NULL, dev->db_req, 4); - pmac_do_feature_call(PMAC_FTR_READ_GPIO, NULL, dev->db_req, 4); + printk(KERN_INFO "SMU: message interrupt !\n"); + + /* It's an edge interrupt, nothing to do */ + return IRQ_HANDLED; } -static int smu_cmd_done(struct smu_device *dev) + +/* + * Queued command management. 
+ * + */ + +int smu_queue_cmd(struct smu_cmd *cmd) { - unsigned long wait = 0; - int gpio; + unsigned long flags; - /* Check the SMU doorbell */ - do { - gpio = pmac_do_feature_call(PMAC_FTR_READ_GPIO, - NULL, dev->db_ack); - if ((gpio & 7) == 7) - return 0; - udelay(100); - } while(++wait < 10000); + if (smu == NULL) + return -ENODEV; + if (cmd->data_len > SMU_MAX_DATA || + cmd->reply_len > SMU_MAX_DATA) + return -EINVAL; - printk(KERN_ERR "SMU timeout !\n"); - return -ENXIO; + cmd->status = 1; + spin_lock_irqsave(&smu->lock, flags); + list_add_tail(&cmd->link, &smu->cmd_list); + if (smu->cmd_cur == NULL) + smu_start_cmd(); + spin_unlock_irqrestore(&smu->lock, flags); + + return 0; } -static int smu_do_cmd(struct smu_device *dev) + +int smu_queue_simple(struct smu_simple_cmd *scmd, u8 command, + unsigned int data_len, + void (*done)(struct smu_cmd *cmd, void *misc), + void *misc, ...) +{ + struct smu_cmd *cmd = &scmd->cmd; + va_list list; + int i; + + if (data_len > sizeof(scmd->buffer)) + return -EINVAL; + + memset(scmd, 0, sizeof(*scmd)); + cmd->cmd = command; + cmd->data_len = data_len; + cmd->data_buf = scmd->buffer; + cmd->reply_len = sizeof(scmd->buffer); + cmd->reply_buf = scmd->buffer; + cmd->done = done; + cmd->misc = misc; + + va_start(list, misc); + for (i = 0; i < data_len; ++i) + scmd->buffer[i] = (u8)va_arg(list, int); + va_end(list); + + return smu_queue_cmd(cmd); +} + + +void smu_poll(void) { - int rc; - u8 cmd_ack; + u8 gpio; - DPRINTK("SMU do_cmd %02x len=%d %02x\n", - dev->cmd_buf->cmd, dev->cmd_buf->length, - dev->cmd_buf->data[0]); - - cmd_ack = smu_save_ack_cmd(dev->cmd_buf); - - /* Clear cmd_buf cache lines */ - flush_inval_dcache_range((unsigned long)dev->cmd_buf, - ((unsigned long)dev->cmd_buf) + - sizeof(struct smu_cmd_buf)); - smu_send_cmd(dev); - rc = smu_cmd_done(dev); - if (rc == 0) - rc = smu_cmd_stat(dev->cmd_buf, cmd_ack) ? 
0 : -1; - - DPRINTK("SMU do_cmd %02x len=%d %02x => %d (%02x)\n", - dev->cmd_buf->cmd, dev->cmd_buf->length, - dev->cmd_buf->data[0], rc, cmd_ack); + if (smu == NULL) + return; - return rc; + gpio = pmac_do_feature_call(PMAC_FTR_READ_GPIO, NULL, smu->doorbell); + if ((gpio & 7) == 7) + smu_db_intr(smu->db_irq, smu, NULL); } + +void smu_done_complete(struct smu_cmd *cmd, void *misc) +{ + struct completion *comp = misc; + + complete(comp); +} + + +void smu_spinwait_cmd(struct smu_cmd *cmd) +{ + while(cmd->status == 1) + smu_poll(); +} + + + /* RTC low level commands */ static inline int bcd2hex (int n) { return (((n & 0xf0) >> 4) * 10) + (n & 0xf); } + static inline int hex2bcd (int n) { return ((n / 10) << 4) + (n % 10); } + #if 0 static inline void smu_fill_set_pwrup_timer_cmd(struct smu_cmd_buf *cmd_buf) { @@ -171,14 +320,12 @@ cmd_buf->data[0] = 0x00; memset(cmd_buf->data + 1, 0, 7); } - static inline void smu_fill_get_pwrup_timer_cmd(struct smu_cmd_buf *cmd_buf) { cmd_buf->cmd = 0x8e; cmd_buf->length = 1; cmd_buf->data[0] = 0x01; } - static inline void smu_fill_dis_pwrup_timer_cmd(struct smu_cmd_buf *cmd_buf) { cmd_buf->cmd = 0x8e; @@ -202,94 +349,85 @@ cmd_buf->data[7] = hex2bcd(time->tm_year - 100); } -static inline void smu_fill_get_rtc_cmd(struct smu_cmd_buf *cmd_buf) -{ - cmd_buf->cmd = 0x8e; - cmd_buf->length = 1; - cmd_buf->data[0] = 0x81; -} - -static void smu_parse_get_rtc_reply(struct smu_cmd_buf *cmd_buf, - struct rtc_time *time) -{ - time->tm_sec = bcd2hex(cmd_buf->data[0]); - time->tm_min = bcd2hex(cmd_buf->data[1]); - time->tm_hour = bcd2hex(cmd_buf->data[2]); - time->tm_wday = bcd2hex(cmd_buf->data[3]); - time->tm_mday = bcd2hex(cmd_buf->data[4]); - time->tm_mon = bcd2hex(cmd_buf->data[5]) - 1; - time->tm_year = bcd2hex(cmd_buf->data[6]) + 100; -} -int smu_get_rtc_time(struct rtc_time *time) +int smu_get_rtc_time(struct rtc_time *time, int spinwait) { - unsigned long flags; + struct smu_simple_cmd cmd; int rc; if (smu == NULL) return -ENODEV; 
memset(time, 0, sizeof(struct rtc_time)); - spin_lock_irqsave(&smu->lock, flags); - smu_fill_get_rtc_cmd(smu->cmd_buf); - rc = smu_do_cmd(smu); - if (rc == 0) - smu_parse_get_rtc_reply(smu->cmd_buf, time); - spin_unlock_irqrestore(&smu->lock, flags); - - return rc; + rc = smu_queue_simple(&cmd, SMU_CMD_RTC_COMMAND, 1, NULL, NULL, + SMU_CMD_RTC_GET_DATETIME); + if (rc) + return rc; + smu_spinwait_simple(&cmd); + + time->tm_sec = bcd2hex(cmd.buffer[0]); + time->tm_min = bcd2hex(cmd.buffer[1]); + time->tm_hour = bcd2hex(cmd.buffer[2]); + time->tm_wday = bcd2hex(cmd.buffer[3]); + time->tm_mday = bcd2hex(cmd.buffer[4]); + time->tm_mon = bcd2hex(cmd.buffer[5]) - 1; + time->tm_year = bcd2hex(cmd.buffer[6]) + 100; + + return 0; } -int smu_set_rtc_time(struct rtc_time *time) +int smu_set_rtc_time(struct rtc_time *time, int spinwait) { - unsigned long flags; + struct smu_simple_cmd cmd; int rc; if (smu == NULL) return -ENODEV; - spin_lock_irqsave(&smu->lock, flags); - smu_fill_set_rtc_cmd(smu->cmd_buf, time); - rc = smu_do_cmd(smu); - spin_unlock_irqrestore(&smu->lock, flags); + rc = smu_queue_simple(&cmd, SMU_CMD_RTC_COMMAND, 8, NULL, NULL, + SMU_CMD_RTC_SET_DATETIME, + hex2bcd(time->tm_sec), + hex2bcd(time->tm_min), + hex2bcd(time->tm_hour), + time->tm_wday, + hex2bcd(time->tm_mday), + hex2bcd(time->tm_mon + 1), + hex2bcd(time->tm_year - 100)); + if (rc) + return rc; + smu_spinwait_simple(&cmd); - return rc; + return 0; } void smu_shutdown(void) { - const unsigned char *command = "SHUTDOWN"; - unsigned long flags; + struct smu_simple_cmd cmd; if (smu == NULL) return; - spin_lock_irqsave(&smu->lock, flags); - smu->cmd_buf->cmd = 0xaa; - smu->cmd_buf->length = strlen(command); - strcpy(smu->cmd_buf->data, command); - smu_do_cmd(smu); + if (smu_queue_simple(&cmd, SMU_CMD_POWER_COMMAND, 8, NULL, NULL, + 'S', 'H', 'U', 'T', 'D', 'O', 'W', 'N')) + return; + smu_spinwait_simple(&cmd); for (;;) ; - spin_unlock_irqrestore(&smu->lock, flags); } void smu_restart(void) { - const
unsigned char *command = "RESTART"; - unsigned long flags; + struct smu_simple_cmd cmd; if (smu == NULL) return; - spin_lock_irqsave(&smu->lock, flags); - smu->cmd_buf->cmd = 0xaa; - smu->cmd_buf->length = strlen(command); - strcpy(smu->cmd_buf->data, command); - smu_do_cmd(smu); + if (smu_queue_simple(&cmd, SMU_CMD_POWER_COMMAND, 7, NULL, NULL, + 'R', 'E', 'S', 'T', 'A', 'R', 'T')) + return; + smu_spinwait_simple(&cmd); for (;;) ; - spin_unlock_irqrestore(&smu->lock, flags); } int smu_present(void) @@ -318,7 +456,11 @@ memset(smu, 0, sizeof(*smu)); spin_lock_init(&smu->lock); + INIT_LIST_HEAD(&smu->cmd_list); smu->of_node = np; + smu->db_irq = NO_IRQ; + smu->msg_irq = NO_IRQ; + /* smu_cmdbuf_abs is in the low 2G of RAM, can be converted to a * 32 bits value safely */ @@ -331,8 +473,8 @@ goto fail; } data = (u32 *)get_property(np, "reg", NULL); - of_node_put(np); if (data == NULL) { + of_node_put(np); printk(KERN_ERR "SMU: Can't find doorbell GPIO address !\n"); goto fail; } @@ -341,8 +483,31 @@ * and ack. GPIOs are at 0x50, best would be to find that out * in the device-tree though. */ - smu->db_req = 0x50 + *data; - smu->db_ack = 0x50 + *data; + smu->doorbell = *data; + if (smu->doorbell < 0x50) + smu->doorbell += 0x50; + if (np->n_intrs > 0) + smu->db_irq = np->intrs[0].line; + + of_node_put(np); + + /* Now look for the smu-interrupt GPIO */ + do { + np = of_find_node_by_name(NULL, "smu-interrupt"); + if (np == NULL) + break; + data = (u32 *)get_property(np, "reg", NULL); + if (data == NULL) { + of_node_put(np); + break; + } + smu->msg = *data; + if (smu->msg < 0x50) + smu->msg += 0x50; + if (np->n_intrs > 0) + smu->msg_irq = np->intrs[0].line; + of_node_put(np); + } while(0); /* Doorbell buffer is currently hard-coded, I didn't find a proper * device-tree entry giving the address. 
Best would probably to use @@ -362,3 +527,253 @@ return -ENXIO; } + +static int smu_late_init(void) +{ + /* + * Try to request the interrupts + */ + + if (smu->db_irq != NO_IRQ) { + if (request_irq(smu->db_irq, smu_db_intr, + SA_SHIRQ, "SMU doorbell", smu) < 0) { + printk(KERN_WARNING "SMU: can't " + "request interrupt %d\n", + smu->db_irq); + smu->db_irq = NO_IRQ; + } + } + + if (smu->msg_irq != NO_IRQ) { + if (request_irq(smu->msg_irq, smu_msg_intr, + SA_SHIRQ, "SMU message", smu) < 0) { + printk(KERN_WARNING "SMU: can't " + "request interrupt %d\n", + smu->msg_irq); + smu->msg_irq = NO_IRQ; + } + } + + return 0; +} +arch_initcall(smu_late_init); + +/* + * Userland driver interface + */ + +static LIST_HEAD(smu_clist); +static DEFINE_SPINLOCK(smu_clist_lock); + +enum smu_file_mode { + smu_file_commands, + smu_file_events, + smu_file_closing +}; + +struct smu_private +{ + struct list_head list; + enum smu_file_mode mode; + int busy; + struct smu_cmd cmd; + spinlock_t lock; + wait_queue_head_t wait; + u8 buffer[SMU_MAX_DATA]; +}; + + +static int smu_open(struct inode *inode, struct file *file) +{ + struct smu_private *pp; + unsigned long flags; + + pp = kmalloc(sizeof(struct smu_private), GFP_KERNEL); + if (pp == 0) + return -ENOMEM; + memset(pp, 0, sizeof(struct smu_private)); + spin_lock_init(&pp->lock); + pp->mode = smu_file_commands; + init_waitqueue_head(&pp->wait); + + spin_lock_irqsave(&smu_clist_lock, flags); + list_add(&pp->list, &smu_clist); + spin_unlock_irqrestore(&smu_clist_lock, flags); + file->private_data = pp; + + return 0; +} + + +static void smu_user_cmd_done(struct smu_cmd *cmd, void *misc) +{ + struct smu_private *pp = misc; + + wake_up_interruptible(&pp->wait); +} + + +static ssize_t smu_write(struct file *file, const char __user *buf, + size_t count, loff_t *ppos) +{ + struct smu_private *pp = file->private_data; + unsigned long flags; + struct smu_user_cmd_hdr hdr; + int rc = 0; + + if (pp->busy) + return -EBUSY; + else if 
(copy_from_user(&hdr, buf, sizeof(hdr))) + return -EFAULT; + else if (hdr.cmdtype == SMU_CMDTYPE_WANTS_EVENTS) { + pp->mode = smu_file_events; + return 0; + } else if (hdr.cmdtype != SMU_CMDTYPE_SMU) + return -EINVAL; + else if (pp->mode != smu_file_commands) + return -EBADFD; + else if (hdr.data_len > SMU_MAX_DATA) + return -EINVAL; + + spin_lock_irqsave(&pp->lock, flags); + if (pp->busy) { + spin_unlock_irqrestore(&pp->lock, flags); + return -EBUSY; + } + pp->busy = 1; + pp->cmd.status = 1; + spin_unlock_irqrestore(&pp->lock, flags); + + if (copy_from_user(pp->buffer, buf + sizeof(hdr), hdr.data_len)) { + pp->busy = 0; + return -EFAULT; + } + + pp->cmd.cmd = hdr.cmd; + pp->cmd.data_len = hdr.data_len; + pp->cmd.reply_len = SMU_MAX_DATA; + pp->cmd.data_buf = pp->buffer; + pp->cmd.reply_buf = pp->buffer; + pp->cmd.done = smu_user_cmd_done; + pp->cmd.misc = pp; + rc = smu_queue_cmd(&pp->cmd); + if (rc < 0) + return rc; + return count; +} + + +static ssize_t smu_read_command(struct file *file, struct smu_private *pp, + char __user *buf, size_t count) +{ + DECLARE_WAITQUEUE(wait, current); + struct smu_user_reply_hdr hdr; + int size, rc = 0; + + if (!pp->busy) + return 0; + if (count < sizeof(struct smu_user_reply_hdr)) + return -EOVERFLOW; + if (pp->cmd.status == 1) { + if (file->f_flags & O_NONBLOCK) + return -EAGAIN; + add_wait_queue(&pp->wait, &wait); + for (;;) { + set_current_state(TASK_INTERRUPTIBLE); + rc = 0; + if (pp->cmd.status != 1) + break; + rc = -ERESTARTSYS; + if (signal_pending(current)) + break; + schedule(); + } + set_current_state(TASK_RUNNING); + remove_wait_queue(&pp->wait, &wait); + if (rc) + return rc; + } + if (pp->cmd.status != 0) + pp->cmd.reply_len = 0; + size = sizeof(hdr) + pp->cmd.reply_len; + if (count < size) + size = count; + rc = size; + hdr.status = pp->cmd.status; + hdr.reply_len = pp->cmd.reply_len; + if (copy_to_user(buf, &hdr, sizeof(hdr))) + return -EFAULT; + size -= sizeof(hdr); + if (size && copy_to_user(buf + sizeof(hdr), 
pp->buffer, size)) + return -EFAULT; + pp->busy = 0; + + return rc; +} + + +static ssize_t smu_read_events(struct file *file, struct smu_private *pp, + char __user *buf, size_t count) +{ + /* Not implemented */ + msleep_interruptible(1000); + return 0; +} + + +static ssize_t smu_read(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + struct smu_private *pp = file->private_data; + + if (pp->mode == smu_file_commands) + return smu_read_command(file, pp, buf, count); + if (pp->mode == smu_file_events) + return smu_read_events(file, pp, buf, count); + + return -EBADFD; +} + + +static int smu_release(struct inode *inode, struct file *file) +{ + struct smu_private *pp = file->private_data; + unsigned long flags; + + if (pp == 0) + return 0; + + file->private_data = NULL; + pp->mode = smu_file_closing; + // XXX wait completion + spin_lock_irqsave(&smu_clist_lock, flags); + list_del(&pp->list); + spin_unlock_irqrestore(&smu_clist_lock, flags); + kfree(pp); + + return 0; +} + + +static struct file_operations smu_device_fops __pmacdata = { + .llseek = no_llseek, + .read = smu_read, + .write = smu_write, + .open = smu_open, + .release = smu_release, +}; + +static struct miscdevice pmu_device __pmacdata = { + MISC_DYNAMIC_MINOR, "smu", &smu_device_fops +}; + + +static int smu_device_init(void) +{ + if (!smu) + return -ENODEV; + if (misc_register(&pmu_device) < 0) + printk(KERN_ERR "via-pmu: cannot register misc device.\n"); + return 0; +} +device_initcall(smu_device_init); Index: linux-work/include/asm-ppc64/smu.h =================================================================== --- linux-work.orig/include/asm-ppc64/smu.h 2005-04-18 11:57:19.000000000 +1000 +++ linux-work/include/asm-ppc64/smu.h 2005-04-18 13:05:27.000000000 +1000 @@ -1,22 +1,269 @@ +#ifndef _SMU_H +#define _SMU_H + /* * Definitions for talking to the SMU chip in newer G5 PowerMacs */ #include +#include + +/* + * Known SMU commands + * + * Most of what is below comes from looking at the 
Open Firmware driver, + * though this is still incomplete and could use better documentation here + * or there... + */ + + +/* + * Partition info commands + * + * I do not know what those are for at this point + */ +#define SMU_CMD_PARTITION_COMMAND 0x3e + + +/* + * Fan control + * + * This is a "mux" for fan control commands, first byte is the + * "sub" command. + */ +#define SMU_CMD_FAN_COMMAND 0x4a + + +/* + * Battery access + * + * Same command number as the PMU, could it be same syntax ? + */ +#define SMU_CMD_BATTERY_COMMAND 0x6f +#define SMU_CMD_GET_BATTERY_INFO 0x00 + +/* + * Real time clock control + * + * This is a "mux", first data byte contains the "sub" command. + * The "RTC" part of the SMU controls the date, time, powerup + * timer, but also a PRAM + * + * Dates are in BCD format on 7 bytes: + * [sec] [min] [hour] [weekday] [month day] [month] [year] + * with month being 1 based and year minus 100 + */ +#define SMU_CMD_RTC_COMMAND 0x8e +#define SMU_CMD_RTC_SET_PWRUP_TIMER 0x00 /* i: 7 bytes date */ +#define SMU_CMD_RTC_GET_PWRUP_TIMER 0x01 /* o: 7 bytes date */ +#define SMU_CMD_RTC_STOP_PWRUP_TIMER 0x02 +#define SMU_CMD_RTC_SET_PRAM_BYTE_ACC 0x20 /* i: 1 byte (address?) */ +#define SMU_CMD_RTC_SET_PRAM_AUTOINC 0x21 /* i: 1 byte (data?) */ +#define SMU_CMD_RTC_SET_PRAM_LO_BYTES 0x22 /* i: 10 bytes */ +#define SMU_CMD_RTC_SET_PRAM_HI_BYTES 0x23 /* i: 10 bytes */ +#define SMU_CMD_RTC_GET_PRAM_BYTE 0x28 /* i: 1 byte (address?) */ +#define SMU_CMD_RTC_GET_PRAM_LO_BYTES 0x29 /* o: 10 bytes */ +#define SMU_CMD_RTC_GET_PRAM_HI_BYTES 0x2a /* o: 10 bytes */ +#define SMU_CMD_RTC_SET_DATETIME 0x80 /* i: 7 bytes date */ +#define SMU_CMD_RTC_GET_DATETIME 0x81 /* o: 7 bytes date */ + + /* + * i2c commands + * + * No description yet + */ +#define SMU_CMD_I2C_COMMAND 0x9a + + +/* + * Power supply control + * + * The "sub" command is an ASCII string in the data, the + * data length is that of the string.
+ * + * The VSLEW command can be used to get or set the voltage slewing. + * - length 5 (only "VSLEW") : it returns "DONE" and 3 bytes of + * reply at data offset 6, 7 and 8. + * - length 8 ("VSLEWxyz") has 3 additional bytes appended, and is + * used to set the voltage slewing point. The SMU replies with "DONE". + * I have yet to figure out the exact meaning of those 3 bytes in + * both cases. + * + */ +#define SMU_CMD_POWER_COMMAND 0xaa +#define SMU_CMD_POWER_RESTART "RESTART" +#define SMU_CMD_POWER_SHUTDOWN "SHUTDOWN" +#define SMU_CMD_POWER_VOLTAGE_SLEW "VSLEW" + +/* Misc commands + * + * This command seems to be a grab bag of various things + */ +#define SMU_CMD_MISC_df_COMMAND 0xdf +#define SMU_CMD_MISC_df_SET_DISPLAY_LIT 0x02 /* i: 1 byte */ +#define SMU_CMD_MISC_df_NMI_OPTION 0x04 + +/* + * Version info commands + * + * I haven't quite tried to figure out how these work + */ +#define SMU_CMD_VERSION_COMMAND 0xea + + +/* + * Misc commands + * + * This command seems to be a grab bag of various things + */ +#define SMU_CMD_MISC_ee_COMMAND 0xee +#define SMU_CMD_MISC_ee_GET_DATABLOCK_REC 0x02 +#define SMU_CMD_MISC_ee_LEDS_CTRL 0x04 /* i: 00 (00,01) [00] */ +#define SMU_CMD_MISC_ee_GET_DATA 0x05 /* i: 00 , o: ?? */ + + + +/* + * - Kernel side interface - + */ + +#ifdef __KERNEL__ /* - * Basic routines for use by architecture.
To be extended as - * we understand more of the chip + * Asynchronous SMU commands + * + * Fill up this structure and submit it via smu_queue_cmd(), + * and get notified by the optional done() callback, or because + * status becomes != 1 + */ + +struct smu_cmd; + +struct smu_cmd +{ + u8 cmd; /* command */ + int data_len; /* data len */ + int reply_len; /* reply len */ + void *data_buf; /* data buffer */ + void *reply_buf; /* reply buffer */ + int status; /* command status */ + void (*done)(struct smu_cmd *cmd, void *misc); + void *misc; + struct list_head link; +}; + +/* + * Queues an SMU command, all fields have to be initialized + */ +extern int smu_queue_cmd(struct smu_cmd *cmd); + +/* + * Simple command wrapper. This structure embeds a small buffer + * to ease sending simple SMU commands from the stack + */ +struct smu_simple_cmd +{ + struct smu_cmd cmd; + u8 buffer[16]; +}; + +/* + * Queues a simple command. All fields will be initialized by that + * function + */ +extern int smu_queue_simple(struct smu_simple_cmd *scmd, u8 command, + unsigned int data_len, + void (*done)(struct smu_cmd *cmd, void *misc), + void *misc, + ...); + +/* + * Completion helper. Pass it to smu_queue_simple or as 'done' + * member to smu_queue_cmd, it will call complete() on the struct + * completion passed in the "misc" argument + */ +extern void smu_done_complete(struct smu_cmd *cmd, void *misc); + +/* + * Synchronous helpers. Will spin-wait for completion of a command + */ +extern void smu_spinwait_cmd(struct smu_cmd *cmd); + +static inline void smu_spinwait_simple(struct smu_simple_cmd *scmd) +{ + smu_spinwait_cmd(&scmd->cmd); +} + +/* + * Poll routine to call if blocked with irqs off + */ +extern void smu_poll(void); + + +/* + * Init routine, presence check....
*/ extern int smu_init(void); extern int smu_present(void); + + +/* + * Common command wrappers + */ extern void smu_shutdown(void); extern void smu_restart(void); -extern int smu_get_rtc_time(struct rtc_time *time); -extern int smu_set_rtc_time(struct rtc_time *time); +extern int smu_get_rtc_time(struct rtc_time *time, int spinwait); +extern int smu_set_rtc_time(struct rtc_time *time, int spinwait); /* * SMU command buffer absolute address, exported by pmac_setup, * this is allocated very early during boot. */ extern unsigned long smu_cmdbuf_abs; + +#endif /* __KERNEL__ */ + +/* + * - Userland interface - + */ + +/* + * A given instance of the device can be configured for 2 different + * things at the moment: + * + * - sending SMU commands (default at open() time) + * - receiving SMU events (not yet implemented) + * + * Commands are written with write() of a command block. They can be + * "driver" commands (for example to switch to event reception mode) + * or real SMU commands. They are made of a header followed by command + * data if any. + * + * For SMU commands (not for driver commands), you can then read() back + * a reply. The reader will be blocked or not depending on how the device + * file is opened. poll() isn't implemented yet. The reply will consist + * of a header as well, followed by the reply data if any. You should + * always provide a buffer large enough for the maximum reply data, I + * recommend one page.
+ * + * It is illegal to send SMU commands through a file descriptor configured + * for events reception + * + */ +struct smu_user_cmd_hdr +{ + __u32 cmdtype; +#define SMU_CMDTYPE_SMU 0 /* SMU command */ +#define SMU_CMDTYPE_WANTS_EVENTS 1 /* switch fd to events mode */ + + __u8 cmd; /* SMU command byte */ + __u32 data_len; /* Length of data following */ +}; + +struct smu_user_reply_hdr +{ + __u32 status; /* Command status */ + __u32 reply_len; /* Length of data following */ +}; + +#endif /* _SMU_H */ Index: linux-work/arch/ppc64/kernel/pmac_time.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/pmac_time.c 2005-04-18 11:57:19.000000000 +1000 +++ linux-work/arch/ppc64/kernel/pmac_time.c 2005-04-18 12:16:44.000000000 +1000 @@ -89,7 +89,7 @@ #ifdef CONFIG_PMAC_SMU case SYS_CTRLER_SMU: - smu_get_rtc_time(tm); + smu_get_rtc_time(tm, 1); break; #endif /* CONFIG_PMAC_SMU */ default: @@ -133,7 +133,7 @@ #ifdef CONFIG_PMAC_SMU case SYS_CTRLER_SMU: - return smu_set_rtc_time(tm); + return smu_set_rtc_time(tm, 1); #endif /* CONFIG_PMAC_SMU */ default: return -ENODEV; And now the tool I talked about (it needs smu.h from the kernel's include/asm-ppc64; I usually symlink it into the working directory where I build the test programs) #include #include #include #include #include #include #include #include "smu.h" int fd; struct smu_user_cmd_hdr *chdr; struct smu_user_reply_hdr *rhdr; unsigned char *cdata, *rdata; unsigned char buffer[1024]; static int sleepled(int state) { int rc; chdr->cmdtype = SMU_CMDTYPE_SMU; chdr->cmd = 0xee; chdr->data_len = 3; cdata[0] = 0x04; cdata[1] = 0x00; cdata[2] = state ?
0x01 : 0x00; rc = write(fd, chdr, sizeof(*chdr) + 3); if (rc < 0) { perror("error writing command"); return rc; } rc = read(fd, rhdr, 1024); if (rc < 0) { perror("error reading reply"); return rc; } printf("sleepled, status: %d, reply_len: %d\n", rhdr->status, rhdr->reply_len); return 0; } int main(int argc, char **argv) { chdr = (struct smu_user_cmd_hdr *)buffer; rhdr = (struct smu_user_reply_hdr *)buffer; cdata = &buffer[sizeof(*chdr)]; rdata = &buffer[sizeof(*rhdr)]; fd = open("/dev/smu", O_RDWR); if (fd == -1) { perror("can't open /dev/smu"); return -1; } if (argc < 2) { sleepled(1); usleep(100); sleepled(0); return 0; } sleepled(strtol(argv[1], NULL, 10)); return 0; } From linas at austin.ibm.com Tue Apr 19 05:38:33 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Mon, 18 Apr 2005 14:38:33 -0500 Subject: PATCH [PPC64]: dead processes never reaped Message-ID: <20050418193833.GW15596@austin.ibm.com> Hi, The patch below appears to fix a problem where a number of dead processes linger on the system. On a highly loaded system, dozens of processes were found stuck in do_exit(), calling their very last schedule(), and then being lost forever. Processes that are PF_DEAD are cleaned up *after* the context switch, in a routine called finish_task_switch(task_t *prev). The "prev" gets the value returned by _switch() in entry.S, but this value comes from __switch_to (struct task_struct *prev, struct task_struct *new) { old_thread = &current->thread; ///XXX shouldn't this be prev, not current? last = _switch(old_thread, new_thread); return last; } The way I see it, "prev" and "current" are almost always going to be pointing at the same thing; however, if a "need resched" happens, or there's a preempt or some-such, then prev and current won't be the same; in which case, finish_task_switch() will end up cleaning up the old current, instead of prev.
This will result in dead processes hanging around, which will never be scheduled again, and will never get a chance to have put_task_struct() called on them. This patch fixes this. Signed-off-by: Linas Vepstas --- arch/ppc64/kernel/process.c.orig 2005-04-18 14:26:42.000000000 -0500 +++ arch/ppc64/kernel/process.c 2005-04-18 14:27:54.000000000 -0500 @@ -204,7 +204,7 @@ struct task_struct *__switch_to(struct t flush_tlb_pending(); new_thread = &new->thread; - old_thread = &current->thread; + old_thread = &prev->thread; local_irq_save(flags); last = _switch(old_thread, new_thread); From benh at kernel.crashing.org Tue Apr 19 11:01:25 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 19 Apr 2005 11:01:25 +1000 Subject: PATCH [PPC64]: dead processes never reaped In-Reply-To: <20050418193833.GW15596@austin.ibm.com> References: <20050418193833.GW15596@austin.ibm.com> Message-ID: <1113872485.5516.326.camel@gaston> On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote: > > Hi, > > The patch below appears to fix a problem where a number of dead processes > linger on the system. On a highly loaded system, dozens of processes > were found stuck in do_exit(), calling their very last schedule(), and > then being lost forever. > > Processes that are PF_DEAD are cleaned up *after* the context switch, > in a routine called finish_task_switch(task_t *prev). The "prev" gets > the value returned by _switch() in entry.S, but this value comes from > > __switch_to (struct task_struct *prev, > struct task_struct *new) > { > old_thread = &current->thread; ///XXX shouldn't this be prev, not current?
This will result in dead processes > hanging around, which will never be scheduled again, and will never > get a chance to have put_task_struct() called on them. I wonder why we bother doing all that at all... we could just return "prev" from __switch_to() no ? Like x86 does... Ben. From benh at kernel.crashing.org Tue Apr 19 11:07:01 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 19 Apr 2005 11:07:01 +1000 Subject: PATCH [PPC64]: dead processes never reaped In-Reply-To: <20050418193833.GW15596@austin.ibm.com> References: <20050418193833.GW15596@austin.ibm.com> Message-ID: <1113872821.5516.330.camel@gaston> On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote: > > Hi, > > The patch below appears to fix a problem where a number of dead processes > linger on the system. On a highly loaded system, dozens of processes > were found stuck in do_exit(), calling thier very last schedule(), and > then being lost forever. > > Processes that are PF_DEAD are cleaned up *after* the context switch, > in a routine called finish_task_switch(task_t *prev). The "prev" gets > the value returned by _switch() in entry.S, but this value comes from > > __switch_to (struct task_struct *prev, > struct task_struct *new) > { > old_thread = ¤t->thread; ///XXX shouldn't this be prev, not current? > last = _switch(old_thread, new_thread); > return last; > } > > The way I see it, "prev" and "current" are almost always going to be > pointing at the same thing; however, if a "need resched" happens, > or there's a pre-emept or some-such, then prev and current won't be > the same; in which case, finish_task_switch() will end up cleaning > up the old current, instead of prev. This will result in dead processes > hanging around, which will never be scheduled again, and will never > get a chance to have put_task_struct() called on them. Ok, thinking moer about this ... that will need maybe some help from Ingo so I fully understand where schedule's are allowed ... 
We are basically in the middle of the scheduler here, so I wonder how much of the scheduler itself can be preempted or so ... Basically, under which circumstances can prev and current be different ? Ben. From nickpiggin at yahoo.com.au Tue Apr 19 11:25:15 2005 From: nickpiggin at yahoo.com.au (Nick Piggin) Date: Tue, 19 Apr 2005 11:25:15 +1000 Subject: PATCH [PPC64]: dead processes never reaped In-Reply-To: <1113872821.5516.330.camel@gaston> References: <20050418193833.GW15596@austin.ibm.com> <1113872821.5516.330.camel@gaston> Message-ID: <1113873915.5074.6.camel@npiggin-nld.site> On Tue, 2005-04-19 at 11:07 +1000, Benjamin Herrenschmidt wrote: > On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote: > > > > Hi, > > > > The patch below appears to fix a problem where a number of dead processes > > linger on the system. On a highly loaded system, dozens of processes > > were found stuck in do_exit(), calling their very last schedule(), and > > then being lost forever. > > > > Processes that are PF_DEAD are cleaned up *after* the context switch, > > in a routine called finish_task_switch(task_t *prev). The "prev" gets > > the value returned by _switch() in entry.S, but this value comes from > > > > __switch_to (struct task_struct *prev, > > struct task_struct *new) > > { > > old_thread = &current->thread; ///XXX shouldn't this be prev, not current? > > last = _switch(old_thread, new_thread); > > return last; > > } > > > > The way I see it, "prev" and "current" are almost always going to be > > pointing at the same thing; however, if a "need resched" happens, > > or there's a preempt or some-such, then prev and current won't be > > the same; in which case, finish_task_switch() will end up cleaning > > up the old current, instead of prev. This will result in dead processes > > hanging around, which will never be scheduled again, and will never > > get a chance to have put_task_struct() called on them. > > Ok, thinking more about this ...
that will need maybe some help from > Ingo so I fully understand where schedule's are allowed ... We are > basically in the middle of the scheduler here, so I wonder how much of > the scheduler itself can be preempted or so ... > Not much. schedule() has a small preempt window at the beginning and end of the function. The context switch is of course run with preempt disabled. Ie. your switch_to should never get preempted. > Basically, under which circumstances can prev and current be different ? > Depends on your context switch, really. prev == current before you switch, and when you switch to 'new' it is different. However, I think the 'new' current has *its* old prev on the stack (which == new current). You just have to preserve the old 'prev' somehow (ie. the thread you switched away from). -- SUSE Labs, Novell Inc. From paulus at samba.org Tue Apr 19 19:53:24 2005 From: paulus at samba.org (Paul Mackerras) Date: Tue, 19 Apr 2005 19:53:24 +1000 Subject: [PATCH] ppc64: Fix irq parsing on powermac Message-ID: <16996.54548.378523.626499@cargo.ozlabs.ibm.com> When I tried Ben's patches to the powermac sound driver on my G5, I found that it was taking enormous numbers of sound DMA transmit interrupts. This turned out to be because it was incorrectly configured as level-sensitive instead of edge-sensitive, which in turn was because the code that parses the interrupt tree that Open Firmware gives us was incorrectly assigning another device the same irq number as the sound DMA transmit interrupt (i.e. 1). This patch fixes the problem, in a somewhat quick and dirty way for now, but one which will work for all the machines we currently run on. Ultimately Ben and I want to do something more general and robust, but this should go in for 2.6.12. 
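A hypothetical, standalone restatement of the numbering rule applied by the patch below when an interrupt's controller is itself cascaded (`ic->parent`): lines behind `u3` are shifted by 128, lines behind `mac-io` keep their number, and any other cascaded controller (such as `k2-sata-root`) is skipped. Presumably, before the change such a controller's line was used unmodified, which is how it could collide with a `mac-io` line like the sound DMA interrupt (1). The function name and the -1 "ignored" return value are illustrative assumptions, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace model, not kernel code: maps a raw interrupt
 * line of a cascaded controller to the number Linux would use, or -1
 * when the patch would ignore it (break out of the parsing loop).
 * parent_name may be NULL when the parent has no "name" property.
 */
static int cascaded_irq_line(const char *parent_name, int line)
{
	if (parent_name && strcmp(parent_name, "u3") == 0)
		return line + 128;	/* u3 lines are offset by 128 */
	if (parent_name && strcmp(parent_name, "mac-io") == 0)
		return line;		/* mac-io lines are used as-is */
	return -1;			/* e.g. k2-sata-root: ignored */
}
```

For example, `cascaded_irq_line("u3", 1)` yields 129 while `cascaded_irq_line("mac-io", 1)` stays 1, so the two no longer collide, and `cascaded_irq_line("k2-sata-root", 1)` returns -1.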
Signed-off-by: Paul Mackerras diff -urN linux/arch/ppc64/kernel/prom.c g5-ppc64/arch/ppc64/kernel/prom.c --- linux/arch/ppc64/kernel/prom.c 2005-04-05 16:04:02.000000000 +1000 +++ g5-ppc64/arch/ppc64/kernel/prom.c 2005-04-19 12:26:50.000000000 +1000 @@ -321,6 +321,10 @@ char *name = get_property(ic->parent, "name", NULL); if (name && !strcmp(name, "u3")) np->intrs[intrcount].line += 128; + else if (!(name && !strcmp(name, "mac-io"))) + /* ignore other cascaded controllers, such as + the k2-sata-root */ + break; } np->intrs[intrcount].sense = 1; if (n > 1) From linas at austin.ibm.com Wed Apr 20 03:22:31 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Tue, 19 Apr 2005 12:22:31 -0500 Subject: PATCH [PPC64]: dead processes never reaped In-Reply-To: <1113872821.5516.330.camel@gaston> References: <20050418193833.GW15596@austin.ibm.com> <1113872821.5516.330.camel@gaston> Message-ID: <20050419172231.GY15596@austin.ibm.com> On Tue, Apr 19, 2005 at 11:07:01AM +1000, Benjamin Herrenschmidt was heard to remark: > On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote: > > > > Hi, > > > > The patch below appears to fix a problem where a number of dead processes > > linger on the system. On a highly loaded system, dozens of processes > > were found stuck in do_exit(), calling their very last schedule(), and > > then being lost forever. > > > > Processes that are PF_DEAD are cleaned up *after* the context switch, > > in a routine called finish_task_switch(task_t *prev). The "prev" gets > > the value returned by _switch() in entry.S, but this value comes from > > > > __switch_to (struct task_struct *prev, > > struct task_struct *new) > > { > > old_thread = &current->thread; ///XXX shouldn't this be prev, not current?
> > last = _switch(old_thread, new_thread); > > return last; > > } > > > > The way I see it, "prev" and "current" are almost always going to be > > pointing at the same thing; however, if a "need resched" happens, > > or there's a preempt or some-such, then prev and current won't be > > the same; in which case, finish_task_switch() will end up cleaning > > up the old current, instead of prev. This will result in dead processes > > hanging around, which will never be scheduled again, and will never > > get a chance to have put_task_struct() called on them. > > Ok, thinking more about this ... that will need maybe some help from > Ingo so I fully understand where schedule's are allowed ... We are > basically in the middle of the scheduler here, so I wonder how much of > the scheduler itself can be preempted or so ... > > Basically, under which circumstances can prev and current be different ? I remember finding a path through void __sched schedule(void) that took a branch through the goto need_resched; that would result in this. It takes a bit of mental gymnastics to see how this might happen. FWIW, I can send you a debug session showing all CPUs idle and 44 dead processes sitting in do_exit(). All but two of these were Java threads, so this seems to be some sort of thread-scheduling subtlety. (The two that weren't Java threads were a find|grep pair that must have gotten tangled in.) Given that the patch seems to fix the problem, I didn't dig much deeper.
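Nick's point, that a resumed task's stack still holds its own stale `prev`, can be reproduced in userspace. The sketch below is a toy model built on POSIX `ucontext` (`getcontext`, `makecontext`, `swapcontext`), not on the kernel's `_switch`: tasks switch A to B to C and back to A, and a global written just before each switch stands in for the register that carries the outgoing task across the switch. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

/* Toy "task": a name plus a saved user-level context. */
struct toy_task {
	const char *name;
	ucontext_t ctx;
};

static struct toy_task tasks[3];		/* A (the caller), B, C */
/* Stands in for the register carrying the outgoing task across the switch. */
static struct toy_task *volatile carried_last;
static char stack_b[64 * 1024], stack_c[64 * 1024];

/* Switch from prev to next; returns the task that ran just before us. */
static struct toy_task *toy_switch_to(struct toy_task *prev,
				      struct toy_task *next)
{
	carried_last = prev;			/* written by the outgoing task */
	swapcontext(&prev->ctx, &next->ctx);
	/* Back on prev's stack: 'prev' is our own stale local, while
	 * carried_last was written by whichever task switched to us. */
	return carried_last;
}

static void task_b(void) { toy_switch_to(&tasks[1], &tasks[2]); }
static void task_c(void) { toy_switch_to(&tasks[2], &tasks[0]); }

/* Runs A -> B -> C -> A and reports what A sees when it resumes. */
static void run_switch_demo(struct toy_task **stale_prev,
			    struct toy_task **real_last)
{
	struct toy_task *prev = &tasks[0];

	tasks[0].name = "A";
	tasks[1].name = "B";
	tasks[2].name = "C";

	getcontext(&tasks[1].ctx);
	tasks[1].ctx.uc_stack.ss_sp = stack_b;
	tasks[1].ctx.uc_stack.ss_size = sizeof(stack_b);
	tasks[1].ctx.uc_link = NULL;
	makecontext(&tasks[1].ctx, task_b, 0);

	getcontext(&tasks[2].ctx);
	tasks[2].ctx.uc_stack.ss_sp = stack_c;
	tasks[2].ctx.uc_stack.ss_size = sizeof(stack_c);
	tasks[2].ctx.uc_link = NULL;
	makecontext(&tasks[2].ctx, task_c, 0);

	*real_last = toy_switch_to(prev, &tasks[1]);
	*stale_prev = prev;	/* still A: the value saved on A's own stack */
}
```

When A resumes, its local `prev` still points at A, while the carried value points at C, the task that actually ran last. This only illustrates why the "last" task has to travel through the switch rather than be recomputed from the resumed task's locals; it does not settle which path in schedule() makes prev and current differ.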
--linas From linas at austin.ibm.com Wed Apr 20 03:39:30 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Tue, 19 Apr 2005 12:39:30 -0500 Subject: PATCH [PPC64]: dead processes never reaped In-Reply-To: <1113872485.5516.326.camel@gaston> References: <20050418193833.GW15596@austin.ibm.com> <1113872485.5516.326.camel@gaston> Message-ID: <20050419173930.GZ15596@austin.ibm.com> On Tue, Apr 19, 2005 at 11:01:25AM +1000, Benjamin Herrenschmidt was heard to remark: > On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote: > > > > Hi, > > > > The patch below appears to fix a problem where a number of dead processes > > linger on the system. On a highly loaded system, dozens of processes > > were found stuck in do_exit(), calling their very last schedule(), and > > then being lost forever. > > > > Processes that are PF_DEAD are cleaned up *after* the context switch, > > in a routine called finish_task_switch(task_t *prev). The "prev" gets > > the value returned by _switch() in entry.S, but this value comes from > > > > __switch_to (struct task_struct *prev, > > struct task_struct *new) > > { > > old_thread = &current->thread; ///XXX shouldn't this be prev, not current? > > last = _switch(old_thread, new_thread); > > return last; > > } > > > > The way I see it, "prev" and "current" are almost always going to be > > pointing at the same thing; however, if a "need resched" happens, > > or there's a preempt or some-such, then prev and current won't be > > the same; in which case, finish_task_switch() will end up cleaning > > up the old current, instead of prev. This will result in dead processes > > hanging around, which will never be scheduled again, and will never > > get a chance to have put_task_struct() called on them. > > I wonder why we bother doing all that at all... we could just return > "prev" from __switch_to() no ? Like x86 does... Probably. I assume this funny two-step is a leftover from a 2.4 kernel design point.
Naively, we could return "prev"; this would save a few cycles. Cut the "addi r3,r3,-THREAD" from entry.S as well. I was being conservative with the patch, making the smallest change possible. Do you want this larger patch? --linas From arnd at arndb.de Wed Apr 20 09:49:21 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Wed, 20 Apr 2005 01:49:21 +0200 Subject: [PATCH 0/4] ppc64: prepare for integration of BPA platform Message-ID: <200504200149.22063.arnd@arndb.de> This series of patches adds a bit of infrastructure in preparation for getting the Broadband Processor Architecture (BPA) into the kernel as a new platform type of ppc64. BPA is currently used in a single machine from IBM, with others likely to be added at a later point. None of these preparation patches are really specific to the architecture itself. Hopefully, I will be able to send the actual platform code really soon now. BPA and pSeries can share some code, mostly because they are both using rtas. The first two patches split out the common code from the pSeries_pci implementation into a generic rtas_pci base. The nvram and watchdog drivers are pretty generic and are first used in the new machine. Arnd <>< From arnd at arndb.de Wed Apr 20 09:52:56 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Wed, 20 Apr 2005 01:52:56 +0200 Subject: [PATCH 1/4] ppc64: rename arch/ppc64/kernel/pSeries_pci.c In-Reply-To: <200504200149.22063.arnd@arndb.de> References: <200504200149.22063.arnd@arndb.de> Message-ID: <200504200152.58965.arnd@arndb.de> Rename pSeries_pci.c to rtas_pci.c in preparation for generalizing it for use by BPA. Most of the file can be used by any machine that implements rtas.
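Much of what becomes shareable is the way config-space accesses are encoded for RTAS. The sketch below restates, as standalone helpers, the packing done in rtas_read_config()/rtas_write_config() and the offset check in config_access_valid() from the file being moved. The helper names are hypothetical; only the bit layout and limits are taken from that code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Config address as built by rtas_read_config()/rtas_write_config():
 * register bits 8-11 (extended config space) go to bits 28-31, then
 * bus number, devfn and the low register byte.
 */
static uint32_t rtas_config_addr(unsigned int busno, unsigned int devfn,
				 unsigned int where)
{
	return ((where & 0xf00u) << 20) | (busno << 16) |
	       (devfn << 8) | (where & 0xffu);
}

/* Same rule as config_access_valid(): the first 256 bytes are always
 * accessible, offsets up to 4095 only with extended config space. */
static int config_offset_valid(unsigned int where, int has_ext_space)
{
	if (where < 256)
		return 1;
	return where < 4096 && has_ext_space;
}

/* A non-zero 64-bit BUID is passed to the ibm_read_pci_config /
 * ibm_write_pci_config calls as two 32-bit halves. */
static void split_buid(uint64_t buid, uint32_t *hi, uint32_t *lo)
{
	*hi = (uint32_t)(buid >> 32);
	*lo = (uint32_t)(buid & 0xffffffffu);
}
```

For example, register 0x104 of devfn 0x10 on bus 1 packs to 0x10011004, and is only reachable when the device node has `pci_ext_config_space` set.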
Signed-off-by: Arnd Bergmann --- linux-2.6-ppc.orig/arch/ppc64/kernel/Makefile 2005-03-31 19:11:15.464956912 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/Makefile 2005-03-31 19:11:37.754895872 -0500 @@ -32,13 +32,14 @@ obj-$(CONFIG_PPC_MULTIPLATFORM) += nvram obj-$(CONFIG_PPC_PSERIES) += pSeries_pci.o pSeries_lpar.o pSeries_hvCall.o \ pSeries_nvram.o rtasd.o ras.o pSeries_reconfig.o \ - xics.o rtas.o pSeries_setup.o pSeries_iommu.o + xics.o pSeries_setup.o pSeries_iommu.o obj-$(CONFIG_EEH) += eeh.o obj-$(CONFIG_PROC_FS) += proc_ppc64.o obj-$(CONFIG_RTAS_FLASH) += rtas_flash.o obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_MODULES) += module.o ppc_ksyms.o +obj-$(CONFIG_PPC_RTAS) += rtas.o rtas_pci.o obj-$(CONFIG_RTAS_PROC) += rtas-proc.o obj-$(CONFIG_SCANLOG) += scanlog.o obj-$(CONFIG_VIOPATH) += viopath.o --- linux-2.6-ppc.orig/arch/ppc64/kernel/pSeries_pci.c 2005-03-31 19:11:15.466956608 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/pSeries_pci.c 1969-12-31 19:00:00.000000000 -0500 @@ -1,602 +0,0 @@ -/* - * pSeries_pci.c - * - * Copyright (C) 2001 Dave Engebretsen, IBM Corporation - * Copyright (C) 2003 Anton Blanchard , IBM - * - * pSeries specific routines for PCI. - * - * Based on code from pci.c and chrp_pci.c - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. 
- * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - */ - -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include -#include -#include -#include -#include - -#include "mpic.h" -#include "pci.h" - -/* RTAS tokens */ -static int read_pci_config; -static int write_pci_config; -static int ibm_read_pci_config; -static int ibm_write_pci_config; - -static int s7a_workaround; - -extern struct mpic *pSeries_mpic; - -static int config_access_valid(struct device_node *dn, int where) -{ - if (where < 256) - return 1; - if (where < 4096 && dn->pci_ext_config_space) - return 1; - - return 0; -} - -static int rtas_read_config(struct device_node *dn, int where, int size, u32 *val) -{ - int returnval = -1; - unsigned long buid, addr; - int ret; - - if (!dn) - return PCIBIOS_DEVICE_NOT_FOUND; - if (!config_access_valid(dn, where)) - return PCIBIOS_BAD_REGISTER_NUMBER; - - addr = ((where & 0xf00) << 20) | (dn->busno << 16) | - (dn->devfn << 8) | (where & 0xff); - buid = dn->phb->buid; - if (buid) { - ret = rtas_call(ibm_read_pci_config, 4, 2, &returnval, - addr, buid >> 32, buid & 0xffffffff, size); - } else { - ret = rtas_call(read_pci_config, 2, 2, &returnval, addr, size); - } - *val = returnval; - - if (ret) - return PCIBIOS_DEVICE_NOT_FOUND; - - if (returnval == EEH_IO_ERROR_VALUE(size) - && eeh_dn_check_failure (dn, NULL)) - return PCIBIOS_DEVICE_NOT_FOUND; - - return PCIBIOS_SUCCESSFUL; -} - -static int rtas_pci_read_config(struct pci_bus *bus, - unsigned int devfn, - int where, int size, u32 *val) -{ - struct device_node *busdn, *dn; - - if (bus->self) - busdn = pci_device_to_OF_node(bus->self); - else - busdn = bus->sysdata; /* must be a phb */ - - /* Search only direct children of the bus */ - for (dn = busdn->child; dn; dn = dn->sibling) - if (dn->devfn == devfn) - 
return rtas_read_config(dn, where, size, val); - return PCIBIOS_DEVICE_NOT_FOUND; -} - -static int rtas_write_config(struct device_node *dn, int where, int size, u32 val) -{ - unsigned long buid, addr; - int ret; - - if (!dn) - return PCIBIOS_DEVICE_NOT_FOUND; - if (!config_access_valid(dn, where)) - return PCIBIOS_BAD_REGISTER_NUMBER; - - addr = ((where & 0xf00) << 20) | (dn->busno << 16) | - (dn->devfn << 8) | (where & 0xff); - buid = dn->phb->buid; - if (buid) { - ret = rtas_call(ibm_write_pci_config, 5, 1, NULL, addr, buid >> 32, buid & 0xffffffff, size, (ulong) val); - } else { - ret = rtas_call(write_pci_config, 3, 1, NULL, addr, size, (ulong)val); - } - - if (ret) - return PCIBIOS_DEVICE_NOT_FOUND; - - return PCIBIOS_SUCCESSFUL; -} - -static int rtas_pci_write_config(struct pci_bus *bus, - unsigned int devfn, - int where, int size, u32 val) -{ - struct device_node *busdn, *dn; - - if (bus->self) - busdn = pci_device_to_OF_node(bus->self); - else - busdn = bus->sysdata; /* must be a phb */ - - /* Search only direct children of the bus */ - for (dn = busdn->child; dn; dn = dn->sibling) - if (dn->devfn == devfn) - return rtas_write_config(dn, where, size, val); - return PCIBIOS_DEVICE_NOT_FOUND; -} - -struct pci_ops rtas_pci_ops = { - rtas_pci_read_config, - rtas_pci_write_config -}; - -int is_python(struct device_node *dev) -{ - char *model = (char *)get_property(dev, "model", NULL); - - if (model && strstr(model, "Python")) - return 1; - - return 0; -} - -static int get_phb_reg_prop(struct device_node *dev, - unsigned int addr_size_words, - struct reg_property64 *reg) -{ - unsigned int *ui_ptr = NULL, len; - - /* Found a PHB, now figure out where his registers are mapped. 
*/ - ui_ptr = (unsigned int *)get_property(dev, "reg", &len); - if (ui_ptr == NULL) - return 1; - - if (addr_size_words == 1) { - reg->address = ((struct reg_property32 *)ui_ptr)->address; - reg->size = ((struct reg_property32 *)ui_ptr)->size; - } else { - *reg = *((struct reg_property64 *)ui_ptr); - } - - return 0; -} - -static void python_countermeasures(struct device_node *dev, - unsigned int addr_size_words) -{ - struct reg_property64 reg_struct; - void __iomem *chip_regs; - volatile u32 val; - - if (get_phb_reg_prop(dev, addr_size_words, &reg_struct)) - return; - - /* Python's register file is 1 MB in size. */ - chip_regs = ioremap(reg_struct.address & ~(0xfffffUL), 0x100000); - - /* - * Firmware doesn't always clear this bit which is critical - * for good performance - Anton - */ - -#define PRG_CL_RESET_VALID 0x00010000 - - val = in_be32(chip_regs + 0xf6030); - if (val & PRG_CL_RESET_VALID) { - printk(KERN_INFO "Python workaround: "); - val &= ~PRG_CL_RESET_VALID; - out_be32(chip_regs + 0xf6030, val); - /* - * We must read it back for changes to - * take effect - */ - val = in_be32(chip_regs + 0xf6030); - printk("reg0: %x\n", val); - } - - iounmap(chip_regs); -} - -void __init init_pci_config_tokens (void) -{ - read_pci_config = rtas_token("read-pci-config"); - write_pci_config = rtas_token("write-pci-config"); - ibm_read_pci_config = rtas_token("ibm,read-pci-config"); - ibm_write_pci_config = rtas_token("ibm,write-pci-config"); -} - -unsigned long __devinit get_phb_buid (struct device_node *phb) -{ - int addr_cells; - unsigned int *buid_vals; - unsigned int len; - unsigned long buid; - - if (ibm_read_pci_config == -1) return 0; - - /* PHB's will always be children of the root node, - * or so it is promised by the current firmware.
*/ - if (phb->parent == NULL) - return 0; - if (phb->parent->parent) - return 0; - - buid_vals = (unsigned int *) get_property(phb, "reg", &len); - if (buid_vals == NULL) - return 0; - - addr_cells = prom_n_addr_cells(phb); - if (addr_cells == 1) { - buid = (unsigned long) buid_vals[0]; - } else { - buid = (((unsigned long)buid_vals[0]) << 32UL) | - (((unsigned long)buid_vals[1]) & 0xffffffff); - } - return buid; -} - -static int phb_set_bus_ranges(struct device_node *dev, - struct pci_controller *phb) -{ - int *bus_range; - unsigned int len; - - bus_range = (int *) get_property(dev, "bus-range", &len); - if (bus_range == NULL || len < 2 * sizeof(int)) { - return 1; - } - - phb->first_busno = bus_range[0]; - phb->last_busno = bus_range[1]; - - return 0; -} - -static int __devinit setup_phb(struct device_node *dev, - struct pci_controller *phb, - unsigned int addr_size_words) -{ - pci_setup_pci_controller(phb); - - if (is_python(dev)) - python_countermeasures(dev, addr_size_words); - - if (phb_set_bus_ranges(dev, phb)) - return 1; - - phb->arch_data = dev; - phb->ops = &rtas_pci_ops; - phb->buid = get_phb_buid(dev); - - return 0; -} - -static void __devinit add_linux_pci_domain(struct device_node *dev, - struct pci_controller *phb, - struct property *of_prop) -{ - memset(of_prop, 0, sizeof(struct property)); - of_prop->name = "linux,pci-domain"; - of_prop->length = sizeof(phb->global_number); - of_prop->value = (unsigned char *)&of_prop[1]; - memcpy(of_prop->value, &phb->global_number, sizeof(phb->global_number)); - prom_add_property(dev, of_prop); -} - -static struct pci_controller * __init alloc_phb(struct device_node *dev, - unsigned int addr_size_words) -{ - struct pci_controller *phb; - struct property *of_prop; - - phb = alloc_bootmem(sizeof(struct pci_controller)); - if (phb == NULL) - return NULL; - - of_prop = alloc_bootmem(sizeof(struct property) + - sizeof(phb->global_number)); - if (!of_prop) - return NULL; - - if (setup_phb(dev, phb, addr_size_words)) - 
return NULL; - - add_linux_pci_domain(dev, phb, of_prop); - - return phb; -} - -static struct pci_controller * __devinit alloc_phb_dynamic(struct device_node *dev, unsigned int addr_size_words) -{ - struct pci_controller *phb; - - phb = (struct pci_controller *)kmalloc(sizeof(struct pci_controller), - GFP_KERNEL); - if (phb == NULL) - return NULL; - - if (setup_phb(dev, phb, addr_size_words)) - return NULL; - - phb->is_dynamic = 1; - - /* TODO: linux,pci-domain? */ - - return phb; -} - -unsigned long __init find_and_init_phbs(void) -{ - struct device_node *node; - struct pci_controller *phb; - unsigned int root_size_cells = 0; - unsigned int index; - unsigned int *opprop = NULL; - struct device_node *root = of_find_node_by_path("/"); - - if (ppc64_interrupt_controller == IC_OPEN_PIC) { - opprop = (unsigned int *)get_property(root, - "platform-open-pic", NULL); - } - - root_size_cells = prom_n_size_cells(root); - - index = 0; - - for (node = of_get_next_child(root, NULL); - node != NULL; - node = of_get_next_child(root, node)) { - if (node->type == NULL || strcmp(node->type, "pci") != 0) - continue; - - phb = alloc_phb(node, root_size_cells); - if (!phb) - continue; - - pci_process_bridge_OF_ranges(phb, node); - pci_setup_phb_io(phb, index == 0); - - if (ppc64_interrupt_controller == IC_OPEN_PIC && pSeries_mpic) { - int addr = root_size_cells * (index + 2) - 1; - mpic_assign_isu(pSeries_mpic, index, opprop[addr]); - } - - index++; - } - - of_node_put(root); - pci_devs_phb_init(); - - /* - * pci_probe_only and pci_assign_all_buses can be set via properties - * in chosen. 
- */ - if (of_chosen) { - int *prop; - - prop = (int *)get_property(of_chosen, "linux,pci-probe-only", - NULL); - if (prop) - pci_probe_only = *prop; - - prop = (int *)get_property(of_chosen, - "linux,pci-assign-all-buses", NULL); - if (prop) - pci_assign_all_buses = *prop; - } - - return 0; -} - -struct pci_controller * __devinit init_phb_dynamic(struct device_node *dn) -{ - struct device_node *root = of_find_node_by_path("/"); - unsigned int root_size_cells = 0; - struct pci_controller *phb; - struct pci_bus *bus; - int primary; - - root_size_cells = prom_n_size_cells(root); - - primary = list_empty(&hose_list); - phb = alloc_phb_dynamic(dn, root_size_cells); - if (!phb) - return NULL; - - pci_process_bridge_OF_ranges(phb, dn); - - pci_setup_phb_io_dynamic(phb, primary); - of_node_put(root); - - pci_devs_phb_init_dynamic(phb); - phb->last_busno = 0xff; - bus = pci_scan_bus(phb->first_busno, phb->ops, phb->arch_data); - phb->bus = bus; - phb->last_busno = bus->subordinate; - - return phb; -} -EXPORT_SYMBOL(init_phb_dynamic); - -#if 0 -void pcibios_name_device(struct pci_dev *dev) -{ - struct device_node *dn; - - /* - * Add IBM loc code (slot) as a prefix to the device names for service - */ - dn = pci_device_to_OF_node(dev); - if (dn) { - char *loc_code = get_property(dn, "ibm,loc-code", 0); - if (loc_code) { - int loc_len = strlen(loc_code); - if (loc_len < sizeof(dev->dev.name)) { - memmove(dev->dev.name+loc_len+1, dev->dev.name, - sizeof(dev->dev.name)-loc_len-1); - memcpy(dev->dev.name, loc_code, loc_len); - dev->dev.name[loc_len] = ' '; - dev->dev.name[sizeof(dev->dev.name)-1] = '\0'; - } - } - } -} -DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, pcibios_name_device); -#endif - -static void check_s7a(void) -{ - struct device_node *root; - char *model; - - root = of_find_node_by_path("/"); - if (root) { - model = get_property(root, "model", NULL); - if (model && !strcmp(model, "IBM,7013-S7A")) - s7a_workaround = 1; - of_node_put(root); - } -} - -/* 
RPA-specific bits for removing PHBs */ -int pcibios_remove_root_bus(struct pci_controller *phb) -{ - struct pci_bus *b = phb->bus; - struct resource *res; - int rc, i; - - res = b->resource[0]; - if (!res->flags) { - printk(KERN_ERR "%s: no IO resource for PHB %s\n", __FUNCTION__, - b->name); - return 1; - } - - rc = unmap_bus_range(b); - if (rc) { - printk(KERN_ERR "%s: failed to unmap IO on bus %s\n", - __FUNCTION__, b->name); - return 1; - } - - if (release_resource(res)) { - printk(KERN_ERR "%s: failed to release IO on bus %s\n", - __FUNCTION__, b->name); - return 1; - } - - for (i = 1; i < 3; ++i) { - res = b->resource[i]; - if (!res->flags && i == 0) { - printk(KERN_ERR "%s: no MEM resource for PHB %s\n", - __FUNCTION__, b->name); - return 1; - } - if (res->flags && release_resource(res)) { - printk(KERN_ERR - "%s: failed to release IO %d on bus %s\n", - __FUNCTION__, i, b->name); - return 1; - } - } - - list_del(&phb->list_node); - if (phb->is_dynamic) - kfree(phb); - - return 0; -} -EXPORT_SYMBOL(pcibios_remove_root_bus); - -static void __init pSeries_request_regions(void) -{ - if (!isa_io_base) - return; - - request_region(0x20,0x20,"pic1"); - request_region(0xa0,0x20,"pic2"); - request_region(0x00,0x20,"dma1"); - request_region(0x40,0x20,"timer"); - request_region(0x80,0x10,"dma page reg"); - request_region(0xc0,0x20,"dma2"); -} - -void __init pSeries_final_fixup(void) -{ - struct pci_dev *dev = NULL; - - check_s7a(); - - for_each_pci_dev(dev) { - pci_read_irq_line(dev); - if (s7a_workaround) { - if (dev->irq > 16) { - dev->irq -= 3; - pci_write_config_byte(dev, PCI_INTERRUPT_LINE, dev->irq); - } - } - } - - phbs_remap_io(); - pSeries_request_regions(); - - pci_addr_cache_build(); -} - -/* - * Assume the winbond 82c105 is the IDE controller on a - * p610. We should probably be more careful in case - * someone tries to plug in a similar adapter. 
- */ -static void fixup_winbond_82c105(struct pci_dev* dev) -{ - int i; - unsigned int reg; - - if (!(systemcfg->platform & PLATFORM_PSERIES)) - return; - - printk("Using INTC for W82c105 IDE controller.\n"); - pci_read_config_dword(dev, 0x40, &reg); - /* Enable LEGIRQ to use INTC instead of ISA interrupts */ - pci_write_config_dword(dev, 0x40, reg | (1<<11)); - - for (i = 0; i < DEVICE_COUNT_RESOURCE; ++i) { - /* zap the 2nd function of the winbond chip */ - if (dev->resource[i].flags & IORESOURCE_IO - && dev->bus->number == 0 && dev->devfn == 0x81) - dev->resource[i].flags &= ~IORESOURCE_IO; - } -} -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_WINBOND, PCI_DEVICE_ID_WINBOND_82C105, - fixup_winbond_82c105); --- linux-2.6-ppc.orig/arch/ppc64/kernel/rtas_pci.c 1969-12-31 19:00:00.000000000 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/rtas_pci.c 2005-03-31 19:11:37.758895264 -0500 @@ -0,0 +1,602 @@ +/* + * pSeries_pci.c + * + * Copyright (C) 2001 Dave Engebretsen, IBM Corporation + * Copyright (C) 2003 Anton Blanchard , IBM + * + * pSeries specific routines for PCI. + * + * Based on code from pci.c and chrp_pci.c + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details.
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "mpic.h" +#include "pci.h" + +/* RTAS tokens */ +static int read_pci_config; +static int write_pci_config; +static int ibm_read_pci_config; +static int ibm_write_pci_config; + +static int s7a_workaround; + +extern struct mpic *pSeries_mpic; + +static int config_access_valid(struct device_node *dn, int where) +{ + if (where < 256) + return 1; + if (where < 4096 && dn->pci_ext_config_space) + return 1; + + return 0; +} + +static int rtas_read_config(struct device_node *dn, int where, int size, u32 *val) +{ + int returnval = -1; + unsigned long buid, addr; + int ret; + + if (!dn) + return PCIBIOS_DEVICE_NOT_FOUND; + if (!config_access_valid(dn, where)) + return PCIBIOS_BAD_REGISTER_NUMBER; + + addr = ((where & 0xf00) << 20) | (dn->busno << 16) | + (dn->devfn << 8) | (where & 0xff); + buid = dn->phb->buid; + if (buid) { + ret = rtas_call(ibm_read_pci_config, 4, 2, &returnval, + addr, buid >> 32, buid & 0xffffffff, size); + } else { + ret = rtas_call(read_pci_config, 2, 2, &returnval, addr, size); + } + *val = returnval; + + if (ret) + return PCIBIOS_DEVICE_NOT_FOUND; + + if (returnval == EEH_IO_ERROR_VALUE(size) + && eeh_dn_check_failure (dn, NULL)) + return PCIBIOS_DEVICE_NOT_FOUND; + + return PCIBIOS_SUCCESSFUL; +} + +static int rtas_pci_read_config(struct pci_bus *bus, + unsigned int devfn, + int where, int size, u32 *val) +{ + struct device_node *busdn, *dn; + + if (bus->self) + busdn = pci_device_to_OF_node(bus->self); + else + busdn = bus->sysdata; /* must be a phb */ + + /* Search only direct children of the bus */ + for (dn = busdn->child; dn; dn = dn->sibling) + if (dn->devfn == devfn) + 
return rtas_read_config(dn, where, size, val); + return PCIBIOS_DEVICE_NOT_FOUND; +} + +static int rtas_write_config(struct device_node *dn, int where, int size, u32 val) +{ + unsigned long buid, addr; + int ret; + + if (!dn) + return PCIBIOS_DEVICE_NOT_FOUND; + if (!config_access_valid(dn, where)) + return PCIBIOS_BAD_REGISTER_NUMBER; + + addr = ((where & 0xf00) << 20) | (dn->busno << 16) | + (dn->devfn << 8) | (where & 0xff); + buid = dn->phb->buid; + if (buid) { + ret = rtas_call(ibm_write_pci_config, 5, 1, NULL, addr, buid >> 32, buid & 0xffffffff, size, (ulong) val); + } else { + ret = rtas_call(write_pci_config, 3, 1, NULL, addr, size, (ulong)val); + } + + if (ret) + return PCIBIOS_DEVICE_NOT_FOUND; + + return PCIBIOS_SUCCESSFUL; +} + +static int rtas_pci_write_config(struct pci_bus *bus, + unsigned int devfn, + int where, int size, u32 val) +{ + struct device_node *busdn, *dn; + + if (bus->self) + busdn = pci_device_to_OF_node(bus->self); + else + busdn = bus->sysdata; /* must be a phb */ + + /* Search only direct children of the bus */ + for (dn = busdn->child; dn; dn = dn->sibling) + if (dn->devfn == devfn) + return rtas_write_config(dn, where, size, val); + return PCIBIOS_DEVICE_NOT_FOUND; +} + +struct pci_ops rtas_pci_ops = { + rtas_pci_read_config, + rtas_pci_write_config +}; + +int is_python(struct device_node *dev) +{ + char *model = (char *)get_property(dev, "model", NULL); + + if (model && strstr(model, "Python")) + return 1; + + return 0; +} + +static int get_phb_reg_prop(struct device_node *dev, + unsigned int addr_size_words, + struct reg_property64 *reg) +{ + unsigned int *ui_ptr = NULL, len; + + /* Found a PHB, now figure out where his registers are mapped. 
*/ + ui_ptr = (unsigned int *)get_property(dev, "reg", &len); + if (ui_ptr == NULL) + return 1; + + if (addr_size_words == 1) { + reg->address = ((struct reg_property32 *)ui_ptr)->address; + reg->size = ((struct reg_property32 *)ui_ptr)->size; + } else { + *reg = *((struct reg_property64 *)ui_ptr); + } + + return 0; +} + +static void python_countermeasures(struct device_node *dev, + unsigned int addr_size_words) +{ + struct reg_property64 reg_struct; + void __iomem *chip_regs; + volatile u32 val; + + if (get_phb_reg_prop(dev, addr_size_words, &reg_struct)) + return; + + /* Python's register file is 1 MB in size. */ + chip_regs = ioremap(reg_struct.address & ~(0xfffffUL), 0x100000); + + /* + * Firmware doesn't always clear this bit which is critical + * for good performance - Anton + */ + +#define PRG_CL_RESET_VALID 0x00010000 + + val = in_be32(chip_regs + 0xf6030); + if (val & PRG_CL_RESET_VALID) { + printk(KERN_INFO "Python workaround: "); + val &= ~PRG_CL_RESET_VALID; + out_be32(chip_regs + 0xf6030, val); + /* + * We must read it back for changes to + * take effect + */ + val = in_be32(chip_regs + 0xf6030); + printk("reg0: %x\n", val); + } + + iounmap(chip_regs); +} + +void __init init_pci_config_tokens (void) +{ + read_pci_config = rtas_token("read-pci-config"); + write_pci_config = rtas_token("write-pci-config"); + ibm_read_pci_config = rtas_token("ibm,read-pci-config"); + ibm_write_pci_config = rtas_token("ibm,write-pci-config"); +} + +unsigned long __devinit get_phb_buid (struct device_node *phb) +{ + int addr_cells; + unsigned int *buid_vals; + unsigned int len; + unsigned long buid; + + if (ibm_read_pci_config == -1) return 0; + + /* PHB's will always be children of the root node, + * or so it is promised by the current firmware.
*/ + if (phb->parent == NULL) + return 0; + if (phb->parent->parent) + return 0; + + buid_vals = (unsigned int *) get_property(phb, "reg", &len); + if (buid_vals == NULL) + return 0; + + addr_cells = prom_n_addr_cells(phb); + if (addr_cells == 1) { + buid = (unsigned long) buid_vals[0]; + } else { + buid = (((unsigned long)buid_vals[0]) << 32UL) | + (((unsigned long)buid_vals[1]) & 0xffffffff); + } + return buid; +} + +static int phb_set_bus_ranges(struct device_node *dev, + struct pci_controller *phb) +{ + int *bus_range; + unsigned int len; + + bus_range = (int *) get_property(dev, "bus-range", &len); + if (bus_range == NULL || len < 2 * sizeof(int)) { + return 1; + } + + phb->first_busno = bus_range[0]; + phb->last_busno = bus_range[1]; + + return 0; +} + +static int __devinit setup_phb(struct device_node *dev, + struct pci_controller *phb, + unsigned int addr_size_words) +{ + pci_setup_pci_controller(phb); + + if (is_python(dev)) + python_countermeasures(dev, addr_size_words); + + if (phb_set_bus_ranges(dev, phb)) + return 1; + + phb->arch_data = dev; + phb->ops = &rtas_pci_ops; + phb->buid = get_phb_buid(dev); + + return 0; +} + +static void __devinit add_linux_pci_domain(struct device_node *dev, + struct pci_controller *phb, + struct property *of_prop) +{ + memset(of_prop, 0, sizeof(struct property)); + of_prop->name = "linux,pci-domain"; + of_prop->length = sizeof(phb->global_number); + of_prop->value = (unsigned char *)&of_prop[1]; + memcpy(of_prop->value, &phb->global_number, sizeof(phb->global_number)); + prom_add_property(dev, of_prop); +} + +static struct pci_controller * __init alloc_phb(struct device_node *dev, + unsigned int addr_size_words) +{ + struct pci_controller *phb; + struct property *of_prop; + + phb = alloc_bootmem(sizeof(struct pci_controller)); + if (phb == NULL) + return NULL; + + of_prop = alloc_bootmem(sizeof(struct property) + + sizeof(phb->global_number)); + if (!of_prop) + return NULL; + + if (setup_phb(dev, phb, addr_size_words)) + 
return NULL; + + add_linux_pci_domain(dev, phb, of_prop); + + return phb; +} + +static struct pci_controller * __devinit alloc_phb_dynamic(struct device_node *dev, unsigned int addr_size_words) +{ + struct pci_controller *phb; + + phb = (struct pci_controller *)kmalloc(sizeof(struct pci_controller), + GFP_KERNEL); + if (phb == NULL) + return NULL; + + if (setup_phb(dev, phb, addr_size_words)) + return NULL; + + phb->is_dynamic = 1; + + /* TODO: linux,pci-domain? */ + + return phb; +} + +unsigned long __init find_and_init_phbs(void) +{ + struct device_node *node; + struct pci_controller *phb; + unsigned int root_size_cells = 0; + unsigned int index; + unsigned int *opprop = NULL; + struct device_node *root = of_find_node_by_path("/"); + + if (ppc64_interrupt_controller == IC_OPEN_PIC) { + opprop = (unsigned int *)get_property(root, + "platform-open-pic", NULL); + } + + root_size_cells = prom_n_size_cells(root); + + index = 0; + + for (node = of_get_next_child(root, NULL); + node != NULL; + node = of_get_next_child(root, node)) { + if (node->type == NULL || strcmp(node->type, "pci") != 0) + continue; + + phb = alloc_phb(node, root_size_cells); + if (!phb) + continue; + + pci_process_bridge_OF_ranges(phb, node); + pci_setup_phb_io(phb, index == 0); + + if (ppc64_interrupt_controller == IC_OPEN_PIC && pSeries_mpic) { + int addr = root_size_cells * (index + 2) - 1; + mpic_assign_isu(pSeries_mpic, index, opprop[addr]); + } + + index++; + } + + of_node_put(root); + pci_devs_phb_init(); + + /* + * pci_probe_only and pci_assign_all_buses can be set via properties + * in chosen. 
+ */ + if (of_chosen) { + int *prop; + + prop = (int *)get_property(of_chosen, "linux,pci-probe-only", + NULL); + if (prop) + pci_probe_only = *prop; + + prop = (int *)get_property(of_chosen, + "linux,pci-assign-all-buses", NULL); + if (prop) + pci_assign_all_buses = *prop; + } + + return 0; +} + +struct pci_controller * __devinit init_phb_dynamic(struct device_node *dn) +{ + struct device_node *root = of_find_node_by_path("/"); + unsigned int root_size_cells = 0; + struct pci_controller *phb; + struct pci_bus *bus; + int primary; + + root_size_cells = prom_n_size_cells(root); + + primary = list_empty(&hose_list); + phb = alloc_phb_dynamic(dn, root_size_cells); + if (!phb) + return NULL; + + pci_process_bridge_OF_ranges(phb, dn); + + pci_setup_phb_io_dynamic(phb, primary); + of_node_put(root); + + pci_devs_phb_init_dynamic(phb); + phb->last_busno = 0xff; + bus = pci_scan_bus(phb->first_busno, phb->ops, phb->arch_data); + phb->bus = bus; + phb->last_busno = bus->subordinate; + + return phb; +} +EXPORT_SYMBOL(init_phb_dynamic); + +#if 0 +void pcibios_name_device(struct pci_dev *dev) +{ + struct device_node *dn; + + /* + * Add IBM loc code (slot) as a prefix to the device names for service + */ + dn = pci_device_to_OF_node(dev); + if (dn) { + char *loc_code = get_property(dn, "ibm,loc-code", 0); + if (loc_code) { + int loc_len = strlen(loc_code); + if (loc_len < sizeof(dev->dev.name)) { + memmove(dev->dev.name+loc_len+1, dev->dev.name, + sizeof(dev->dev.name)-loc_len-1); + memcpy(dev->dev.name, loc_code, loc_len); + dev->dev.name[loc_len] = ' '; + dev->dev.name[sizeof(dev->dev.name)-1] = '\0'; + } + } + } +} +DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, pcibios_name_device); +#endif + +static void check_s7a(void) +{ + struct device_node *root; + char *model; + + root = of_find_node_by_path("/"); + if (root) { + model = get_property(root, "model", NULL); + if (model && !strcmp(model, "IBM,7013-S7A")) + s7a_workaround = 1; + of_node_put(root); + } +} + +/* 
RPA-specific bits for removing PHBs */ +int pcibios_remove_root_bus(struct pci_controller *phb) +{ + struct pci_bus *b = phb->bus; + struct resource *res; + int rc, i; + + res = b->resource[0]; + if (!res->flags) { + printk(KERN_ERR "%s: no IO resource for PHB %s\n", __FUNCTION__, + b->name); + return 1; + } + + rc = unmap_bus_range(b); + if (rc) { + printk(KERN_ERR "%s: failed to unmap IO on bus %s\n", + __FUNCTION__, b->name); + return 1; + } + + if (release_resource(res)) { + printk(KERN_ERR "%s: failed to release IO on bus %s\n", + __FUNCTION__, b->name); + return 1; + } + + for (i = 1; i < 3; ++i) { + res = b->resource[i]; + if (!res->flags && i == 0) { + printk(KERN_ERR "%s: no MEM resource for PHB %s\n", + __FUNCTION__, b->name); + return 1; + } + if (res->flags && release_resource(res)) { + printk(KERN_ERR + "%s: failed to release IO %d on bus %s\n", + __FUNCTION__, i, b->name); + return 1; + } + } + + list_del(&phb->list_node); + if (phb->is_dynamic) + kfree(phb); + + return 0; +} +EXPORT_SYMBOL(pcibios_remove_root_bus); + +static void __init pSeries_request_regions(void) +{ + if (!isa_io_base) + return; + + request_region(0x20,0x20,"pic1"); + request_region(0xa0,0x20,"pic2"); + request_region(0x00,0x20,"dma1"); + request_region(0x40,0x20,"timer"); + request_region(0x80,0x10,"dma page reg"); + request_region(0xc0,0x20,"dma2"); +} + +void __init pSeries_final_fixup(void) +{ + struct pci_dev *dev = NULL; + + check_s7a(); + + for_each_pci_dev(dev) { + pci_read_irq_line(dev); + if (s7a_workaround) { + if (dev->irq > 16) { + dev->irq -= 3; + pci_write_config_byte(dev, PCI_INTERRUPT_LINE, dev->irq); + } + } + } + + phbs_remap_io(); + pSeries_request_regions(); + + pci_addr_cache_build(); +} + +/* + * Assume the winbond 82c105 is the IDE controller on a + * p610. We should probably be more careful in case + * someone tries to plug in a similar adapter. 
+ */ +static void fixup_winbond_82c105(struct pci_dev* dev) +{ + int i; + unsigned int reg; + + if (!(systemcfg->platform & PLATFORM_PSERIES)) + return; + + printk("Using INTC for W82c105 IDE controller.\n"); + pci_read_config_dword(dev, 0x40, &reg); + /* Enable LEGIRQ to use INTC instead of ISA interrupts */ + pci_write_config_dword(dev, 0x40, reg | (1<<11)); + + for (i = 0; i < DEVICE_COUNT_RESOURCE; ++i) { + /* zap the 2nd function of the winbond chip */ + if (dev->resource[i].flags & IORESOURCE_IO + && dev->bus->number == 0 && dev->devfn == 0x81) + dev->resource[i].flags &= ~IORESOURCE_IO; + } +} +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_WINBOND, PCI_DEVICE_ID_WINBOND_82C105, + fixup_winbond_82c105); From arnd at arndb.de Wed Apr 20 10:15:24 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Wed, 20 Apr 2005 02:15:24 +0200 Subject: [PATCH 2/4] ppc64: Split out pSeries specific code from rtas_pci.c In-Reply-To: <200504200149.22063.arnd@arndb.de> References: <200504200149.22063.arnd@arndb.de> Message-ID: <200504200215.25740.arnd@arndb.de> BPA is using rtas for PCI but should not be confused by pSeries code. This also avoids some #ifdefs. Other platforms that want to use rtas_pci.c should also create their own platform_pci.c with platform specific fixups.
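The split described above amounts to keeping machine quirks behind a per-platform hook while the RTAS code stays generic. The sketch below is a hypothetical userspace model of that arrangement, not the kernel's actual hook mechanism (which this patch does not show): `struct platform_pci_hooks`, `generic_pci_finish()`, `pseries_fixup_sketch()` and `struct fake_pci_dev` are invented names. The pSeries hook models only the S7A interrupt adjustment from pSeries_final_fixup() (IRQs above 16 lowered by 3); reading the IRQ line, writing PCI_INTERRUPT_LINE, remapping I/O and requesting the legacy regions are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI device: only the IRQ number matters here. */
struct fake_pci_dev {
	int irq;
};

/* Hypothetical hook table: the shared RTAS code knows nothing about
 * specific machines and only calls whatever the platform registered. */
struct platform_pci_hooks {
	const char *name;
	void (*final_fixup)(struct fake_pci_dev *devs, size_t n);
};

/* pSeries-only state, set when the root "model" is IBM,7013-S7A. */
static bool s7a_workaround;

/* Models the S7A rule in pSeries_final_fixup(): IRQs above 16 drop by 3. */
static void pseries_fixup_sketch(struct fake_pci_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (s7a_workaround && devs[i].irq > 16)
			devs[i].irq -= 3;
}

static const struct platform_pci_hooks pseries_hooks = {
	.name = "pSeries",
	.final_fixup = pseries_fixup_sketch,
};

/* A hypothetical platform without quirks just leaves the hook NULL. */
static const struct platform_pci_hooks no_quirk_hooks = {
	.name = "no-quirks",
	.final_fixup = NULL,
};

/* Shared code: identical for every RTAS-based platform. */
static void generic_pci_finish(const struct platform_pci_hooks *hooks,
			       struct fake_pci_dev *devs, size_t n)
{
	assert(hooks != NULL);
	if (hooks->final_fixup)
		hooks->final_fixup(devs, n);
}
```

A second RTAS platform would supply its own table (or none at all) without touching the shared code, which is the kind of separation that lets the patch drop some #ifdefs.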
Signed-off-by: Arnd Bergmann --- linux-2.6-ppc.orig/arch/ppc64/kernel/mpic.h 2005-04-01 13:29:21.078978944 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/mpic.h 2005-04-01 13:30:10.431932216 -0500 @@ -265,3 +265,6 @@ extern void mpic_send_ipi(unsigned int i extern int mpic_get_one_irq(struct mpic *mpic, struct pt_regs *regs); /* This one gets to the primary mpic */ extern int mpic_get_irq(struct pt_regs *regs); + +/* global mpic for pSeries */ +extern struct mpic *pSeries_mpic; --- linux-2.6-ppc.orig/arch/ppc64/kernel/pSeries_pci.c 1969-12-31 19:00:00.000000000 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/pSeries_pci.c 2005-04-01 13:29:25.018912520 -0500 @@ -0,0 +1,138 @@ +/* + * arch/ppc64/kernel/pSeries_pci.c + * + * Copyright (C) 2001 Dave Engebretsen, IBM Corporation + * Copyright (C) 2003 Anton Blanchard , IBM + * + * pSeries specific routines for PCI. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include +#include +#include +#include +#include + +#include +#include + +#include "pci.h" + +static int __initdata s7a_workaround; + +#if 0 +void pcibios_name_device(struct pci_dev *dev) +{ + struct device_node *dn; + + /* + * Add IBM loc code (slot) as a prefix to the device names for service + */ + dn = pci_device_to_OF_node(dev); + if (dn) { + char *loc_code = get_property(dn, "ibm,loc-code", 0); + if (loc_code) { + int loc_len = strlen(loc_code); + if (loc_len < sizeof(dev->dev.name)) { + memmove(dev->dev.name+loc_len+1, dev->dev.name, + sizeof(dev->dev.name)-loc_len-1); + memcpy(dev->dev.name, loc_code, loc_len); + dev->dev.name[loc_len] = ' '; + dev->dev.name[sizeof(dev->dev.name)-1] = '\0'; + } + } + } +} +DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, pcibios_name_device); +#endif + +static void __init check_s7a(void) +{ + struct device_node *root; + char *model; + + root = of_find_node_by_path("/"); + if (root) { + model = get_property(root, "model", NULL); + if (model && !strcmp(model, "IBM,7013-S7A")) + s7a_workaround = 1; + of_node_put(root); + } +} + +static void __init pSeries_request_regions(void) +{ + if (!isa_io_base) + return; + + request_region(0x20,0x20,"pic1"); + request_region(0xa0,0x20,"pic2"); + request_region(0x00,0x20,"dma1"); + request_region(0x40,0x20,"timer"); + request_region(0x80,0x10,"dma page reg"); + request_region(0xc0,0x20,"dma2"); +} + +void __init pSeries_final_fixup(void) +{ + struct pci_dev *dev = NULL; + + check_s7a(); + + for_each_pci_dev(dev) { + pci_read_irq_line(dev); + if (s7a_workaround) { + if (dev->irq > 16) { + dev->irq -= 3; + pci_write_config_byte(dev, PCI_INTERRUPT_LINE, dev->irq); + } + } + } + + phbs_remap_io(); + pSeries_request_regions(); + + pci_addr_cache_build(); +} + +/* + * Assume 
the winbond 82c105 is the IDE controller on a + * p610. We should probably be more careful in case + * someone tries to plug in a similar adapter. + */ +static void fixup_winbond_82c105(struct pci_dev* dev) +{ + int i; + unsigned int reg; + + if (!(systemcfg->platform & PLATFORM_PSERIES)) + return; + + printk("Using INTC for W82c105 IDE controller.\n"); + pci_read_config_dword(dev, 0x40, ®); + /* Enable LEGIRQ to use INTC instead of ISA interrupts */ + pci_write_config_dword(dev, 0x40, reg | (1<<11)); + + for (i = 0; i < DEVICE_COUNT_RESOURCE; ++i) { + /* zap the 2nd function of the winbond chip */ + if (dev->resource[i].flags & IORESOURCE_IO + && dev->bus->number == 0 && dev->devfn == 0x81) + dev->resource[i].flags &= ~IORESOURCE_IO; + } +} +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_WINBOND, PCI_DEVICE_ID_WINBOND_82C105, + fixup_winbond_82c105); --- linux-2.6-ppc.orig/arch/ppc64/kernel/rtas_pci.c 2005-04-01 13:28:56.169905768 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/rtas_pci.c 2005-04-01 13:29:25.019912368 -0500 @@ -1,12 +1,12 @@ /* - * pSeries_pci.c + * arch/ppc64/kernel/rtas_pci.c * * Copyright (C) 2001 Dave Engebretsen, IBM Corporation * Copyright (C) 2003 Anton Blanchard , IBM * - * pSeries specific routines for PCI. + * RTAS specific routines for PCI. 
* - * Based on code from pci.c and chrp_pci.c + * Based on code from pci.c, chrp_pci.c and pSeries_pci.c * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -48,10 +48,6 @@ static int write_pci_config; static int ibm_read_pci_config; static int ibm_write_pci_config; -static int s7a_workaround; - -extern struct mpic *pSeries_mpic; - static int config_access_valid(struct device_node *dn, int where) { if (where < 256) @@ -385,12 +381,12 @@ unsigned long __init find_and_init_phbs( pci_process_bridge_OF_ranges(phb, node); pci_setup_phb_io(phb, index == 0); - +#ifdef CONFIG_PPC_PSERIES if (ppc64_interrupt_controller == IC_OPEN_PIC && pSeries_mpic) { int addr = root_size_cells * (index + 2) - 1; mpic_assign_isu(pSeries_mpic, index, opprop[addr]); } - +#endif index++; } @@ -448,46 +444,6 @@ struct pci_controller * __devinit init_p } EXPORT_SYMBOL(init_phb_dynamic); -#if 0 -void pcibios_name_device(struct pci_dev *dev) -{ - struct device_node *dn; - - /* - * Add IBM loc code (slot) as a prefix to the device names for service - */ - dn = pci_device_to_OF_node(dev); - if (dn) { - char *loc_code = get_property(dn, "ibm,loc-code", 0); - if (loc_code) { - int loc_len = strlen(loc_code); - if (loc_len < sizeof(dev->dev.name)) { - memmove(dev->dev.name+loc_len+1, dev->dev.name, - sizeof(dev->dev.name)-loc_len-1); - memcpy(dev->dev.name, loc_code, loc_len); - dev->dev.name[loc_len] = ' '; - dev->dev.name[sizeof(dev->dev.name)-1] = '\0'; - } - } - } -} -DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, pcibios_name_device); -#endif - -static void check_s7a(void) -{ - struct device_node *root; - char *model; - - root = of_find_node_by_path("/"); - if (root) { - model = get_property(root, "model", NULL); - if (model && !strcmp(model, "IBM,7013-S7A")) - s7a_workaround = 1; - of_node_put(root); - } -} - /* RPA-specific bits for removing PHBs */ int pcibios_remove_root_bus(struct pci_controller 
*phb) { @@ -537,66 +493,3 @@ int pcibios_remove_root_bus(struct pci_c return 0; } EXPORT_SYMBOL(pcibios_remove_root_bus); - -static void __init pSeries_request_regions(void) -{ - if (!isa_io_base) - return; - - request_region(0x20,0x20,"pic1"); - request_region(0xa0,0x20,"pic2"); - request_region(0x00,0x20,"dma1"); - request_region(0x40,0x20,"timer"); - request_region(0x80,0x10,"dma page reg"); - request_region(0xc0,0x20,"dma2"); -} - -void __init pSeries_final_fixup(void) -{ - struct pci_dev *dev = NULL; - - check_s7a(); - - for_each_pci_dev(dev) { - pci_read_irq_line(dev); - if (s7a_workaround) { - if (dev->irq > 16) { - dev->irq -= 3; - pci_write_config_byte(dev, PCI_INTERRUPT_LINE, dev->irq); - } - } - } - - phbs_remap_io(); - pSeries_request_regions(); - - pci_addr_cache_build(); -} - -/* - * Assume the winbond 82c105 is the IDE controller on a - * p610. We should probably be more careful in case - * someone tries to plug in a similar adapter. - */ -static void fixup_winbond_82c105(struct pci_dev* dev) -{ - int i; - unsigned int reg; - - if (!(systemcfg->platform & PLATFORM_PSERIES)) - return; - - printk("Using INTC for W82c105 IDE controller.\n"); - pci_read_config_dword(dev, 0x40, ®); - /* Enable LEGIRQ to use INTC instead of ISA interrupts */ - pci_write_config_dword(dev, 0x40, reg | (1<<11)); - - for (i = 0; i < DEVICE_COUNT_RESOURCE; ++i) { - /* zap the 2nd function of the winbond chip */ - if (dev->resource[i].flags & IORESOURCE_IO - && dev->bus->number == 0 && dev->devfn == 0x81) - dev->resource[i].flags &= ~IORESOURCE_IO; - } -} -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_WINBOND, PCI_DEVICE_ID_WINBOND_82C105, - fixup_winbond_82c105); From arnd at arndb.de Wed Apr 20 10:22:34 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Wed, 20 Apr 2005 02:22:34 +0200 Subject: [PATCH 3/4] ppc64: add a simple nvram lowlevel driver In-Reply-To: <200504200149.22063.arnd@arndb.de> References: <200504200149.22063.arnd@arndb.de> Message-ID: 
<200504200222.34999.arnd@arndb.de> Unlike pSeries, we don't want to use rtas for accessing nvram because the nvram device is rather large and it already is mapped into the physical address space, which makes a much simpler and faster design possible. The firmware provides the location and size of the nvram in the device tree, so it does not really contain any hardware specific bits and could be used on other machines as well. From: Utz Bacher Signed-off-by: Arnd Bergmann Index: linus-2.5/arch/ppc64/kernel/bpa_nvram.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linus-2.5/arch/ppc64/kernel/bpa_nvram.c 2005-04-20 01:55:36.000000000 +0200 @@ -0,0 +1,118 @@ +/* + * NVRAM for CPBW + * + * (C) Copyright IBM Corp. 2005 + * + * Authors : Utz Bacher + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
+ */ + +#include +#include +#include +#include +#include + +#include +#include +#include + +static void __iomem *bpa_nvram_start; +static long bpa_nvram_len; +static spinlock_t bpa_nvram_lock = SPIN_LOCK_UNLOCKED; + +static ssize_t bpa_nvram_read(char *buf, size_t count, loff_t *index) +{ + unsigned long flags; + + if (*index >= bpa_nvram_len) + return 0; + if (*index + count > bpa_nvram_len) + count = bpa_nvram_len - *index; + + spin_lock_irqsave(&bpa_nvram_lock, flags); + + memcpy_fromio(buf, bpa_nvram_start + *index, count); + + spin_unlock_irqrestore(&bpa_nvram_lock, flags); + + *index += count; + return count; +} + +static ssize_t bpa_nvram_write(char *buf, size_t count, loff_t *index) +{ + unsigned long flags; + + if (*index >= bpa_nvram_len) + return 0; + if (*index + count > bpa_nvram_len) + count = bpa_nvram_len - *index; + + spin_lock_irqsave(&bpa_nvram_lock, flags); + + memcpy_toio(bpa_nvram_start + *index, buf, count); + + spin_unlock_irqrestore(&bpa_nvram_lock, flags); + + *index += count; + return count; +} + +static ssize_t bpa_nvram_get_size(void) +{ + return bpa_nvram_len; +} + +int __init bpa_nvram_init(void) +{ + struct device_node *nvram_node; + unsigned long *buffer; + int proplen; + unsigned long nvram_addr; + int ret; + + ret = -ENODEV; + nvram_node = of_find_node_by_type(NULL, "nvram"); + if (!nvram_node) + goto out; + + ret = -EIO; + buffer = (unsigned long *)get_property(nvram_node, "reg", &proplen); + if (proplen != 2*sizeof(unsigned long)) + goto out; + + ret = -ENODEV; + nvram_addr = buffer[0]; + bpa_nvram_len = buffer[1]; + if ( (!bpa_nvram_len) || (!nvram_addr) ) + goto out; + + bpa_nvram_start = ioremap(nvram_addr, bpa_nvram_len); + if (!bpa_nvram_start) + goto out; + + printk(KERN_INFO "BPA NVRAM, %luk mapped to %p\n", + bpa_nvram_len >> 10, bpa_nvram_start); + + ppc_md.nvram_read = bpa_nvram_read; + ppc_md.nvram_write = bpa_nvram_write; + ppc_md.nvram_size = bpa_nvram_get_size; + +out: + of_node_put(nvram_node); + return ret; +} 
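A minimal sketch, not part of the posted patch. bpa_nvram_init() accepts the firmware "reg" property only if it is exactly two unsigned longs (base address, then length) and both values are non-zero. The userspace C below reproduces just that validation so it can be exercised without a device tree. `parse_nvram_reg` and `struct nvram_window` are hypothetical names, and the NULL check is an addition that the posted code does not make. Note also that, as posted, bpa_nvram_init() leaves ret at -ENODEV on the success path, so it appears to return -ENODEV even after installing the ppc_md callbacks. The sketch returns 0 on success.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical holder for the decoded NVRAM window. */
struct nvram_window {
	unsigned long addr;	/* physical base address, reg cell 0 */
	unsigned long len;	/* size in bytes, reg cell 1 */
};

/*
 * Mirrors the checks in bpa_nvram_init(): the property must be exactly
 * 2 * sizeof(unsigned long) bytes, and neither cell may be zero.
 * The NULL check is an addition; the posted code does not test the pointer.
 * Returns 0 on success, -EIO for a malformed property, -ENODEV for a
 * zero address or length (the same error codes the patch uses).
 */
static int parse_nvram_reg(const unsigned long *prop, int proplen,
			   struct nvram_window *w)
{
	if (prop == NULL || proplen != (int)(2 * sizeof(unsigned long)))
		return -EIO;
	if (prop[0] == 0 || prop[1] == 0)
		return -ENODEV;

	w->addr = prop[0];
	w->len = prop[1];
	return 0;
}
```

The addresses used when exercising this are arbitrary; real values come from the firmware.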
Index: linus-2.5/include/asm-ppc64/nvram.h =================================================================== --- linus-2.5.orig/include/asm-ppc64/nvram.h 2005-04-20 01:54:03.000000000 +0200 +++ linus-2.5/include/asm-ppc64/nvram.h 2005-04-20 01:55:36.000000000 +0200 @@ -70,6 +70,7 @@ extern int pSeries_nvram_init(void); extern int pmac_nvram_init(void); +extern int bpa_nvram_init(void); /* PowerMac specific nvram stuffs */ From arnd at arndb.de Wed Apr 20 10:29:11 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Wed, 20 Apr 2005 02:29:11 +0200 Subject: [PATCH 4/4] ppc64: add an rtas based watchdog driver In-Reply-To: <200504200149.22063.arnd@arndb.de> References: <200504200149.22063.arnd@arndb.de> Message-ID: <200504200229.13640.arnd@arndb.de> Add a watchdog using the RTAS OS surveillance service. This is provided as a simpler alternative to rtasd. The added value is that it works with standard watchdog client programs and can therefore also do user space monitoring. On BPA, rtasd is not really useful because the hardware does not have much to report with event-scan. The driver should also work on other platforms that support the OS surveillance rtas calls. From: Utz Bacher Signed-off-by: Arnd Bergmann --- linux-2.6-ppc.orig/drivers/char/watchdog/Kconfig 2005-03-18 07:08:59.836902728 -0500 +++ linux-2.6-ppc/drivers/char/watchdog/Kconfig 2005-03-18 07:09:12.047905480 -0500 @@ -414,6 +414,16 @@ config WATCHDOG_RIO machines. The watchdog timeout period is normally one minute but can be changed with a boot-time parameter. +# ppc64 RTAS watchdog +config WATCHDOG_RTAS + tristate "RTAS watchdog" + depends on WATCHDOG && PPC_RTAS + help + This driver adds watchdog support for the RTAS watchdog. + + To compile this driver as a module, choose M here. The module + will be called wdrtas. 
+ # # ISA-based Watchdog Cards # --- linux-2.6-ppc.orig/drivers/char/watchdog/Makefile 2005-03-18 07:08:59.857899536 -0500 +++ linux-2.6-ppc/drivers/char/watchdog/Makefile 2005-03-18 07:09:52.344904960 -0500 @@ -33,6 +33,7 @@ obj-$(CONFIG_USBPCWATCHDOG) += pcwd_usb. obj-$(CONFIG_IXP4XX_WATCHDOG) += ixp4xx_wdt.o obj-$(CONFIG_IXP2000_WATCHDOG) += ixp2000_wdt.o obj-$(CONFIG_8xx_WDT) += mpc8xx_wdt.o +obj-$(CONFIG_WATCHDOG_RTAS) += wdrtas.o # Only one watchdog can succeed. We probe the hardware watchdog # drivers first, then the softdog driver. This means if your hardware --- linux-2.6-ppc.orig/drivers/char/watchdog/wdrtas.c 1969-12-31 19:00:00.000000000 -0500 +++ linux-2.6-ppc/drivers/char/watchdog/wdrtas.c 2005-03-18 07:09:12.051904872 -0500 @@ -0,0 +1,691 @@ +/* + * FIXME: add wdrtas_get_status and wdrtas_get_boot_status as soon as + * RTAS calls are available + */ + +/* + * RTAS watchdog driver + * + * (C) Copyright IBM Corp. 2005 + * device driver to exploit watchdog RTAS functions + * + * Authors : Utz Bacher + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#define WDRTAS_MAGIC_CHAR 42 +#define WDRTAS_SUPPORTED_MASK (WDIOF_SETTIMEOUT | \ + WDIOF_MAGICCLOSE) + +MODULE_AUTHOR("Utz Bacher "); +MODULE_DESCRIPTION("RTAS watchdog driver"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_MISCDEV(WATCHDOG_MINOR); +MODULE_ALIAS_MISCDEV(TEMP_MINOR); + +#ifdef CONFIG_WATCHDOG_NOWAYOUT +static int wdrtas_nowayout = 1; +#else +static int wdrtas_nowayout = 0; +#endif + +static volatile int wdrtas_miscdev_open = 0; +static char wdrtas_expect_close = 0; + +static int wdrtas_interval; + +#define WDRTAS_THERMAL_SENSOR 3 +static int wdrtas_token_get_sensor_state; +#define WDRTAS_SURVEILLANCE_IND 9000 +static int wdrtas_token_set_indicator; +#define WDRTAS_SP_SPI 28 +static int wdrtas_token_get_sp; +static int wdrtas_token_event_scan; + +#define WDRTAS_DEFAULT_INTERVAL 300 + +#define WDRTAS_LOGBUFFER_LEN 128 +static char wdrtas_logbuffer[WDRTAS_LOGBUFFER_LEN]; + + +/*** watchdog access functions */ + +/** + * wdrtas_set_interval - sets the watchdog interval + * @interval: new interval + * + * returns 0 on success, <0 on failures + * + * wdrtas_set_interval sets the watchdog keepalive interval by calling the + * RTAS function set-indicator (surveillance). The unit of interval is + * seconds. 
+ */ +static int +wdrtas_set_interval(int interval) +{ + long result; + static int print_msg = 10; + + /* rtas uses minutes */ + interval = (interval + 59) / 60; + + result = rtas_call(wdrtas_token_set_indicator, 3, 1, NULL, + WDRTAS_SURVEILLANCE_IND, 0, interval); + if ( (result < 0) && (print_msg) ) { + printk("wdrtas: setting the watchdog to %i timeout failed: " + "%li\n", interval, result); + print_msg--; + } + + return result; +} + +/** + * wdrtas_get_interval - returns the current watchdog interval + * @fallback_value: value (in seconds) to use, if the RTAS call fails + * + * returns the interval + * + * wdrtas_get_interval returns the current watchdog keepalive interval + * as reported by the RTAS function ibm,get-system-parameter. The unit + * of the return value is seconds. + */ +static int +wdrtas_get_interval(int fallback_value) +{ + long result; + char value[4]; + + result = rtas_call(wdrtas_token_get_sp, 3, 1, NULL, + WDRTAS_SP_SPI, (void *)__pa(&value), 4); + if ( (value[0] != 0) || (value[1] != 2) || (value[3] != 0) || + (result < 0) ) { + printk("wdrtas: could not get sp_spi watchdog timeout (%li). 
" + "Continuing\n", result); + return fallback_value; + } + + /* rtas uses minutes */ + return ((int)value[2]) * 60; +} + +/** + * wdrtas_timer_start - starts watchdog + * + * wdrtas_timer_start starts the watchdog by calling the RTAS function + * set-interval (surveillance) + */ +static void +wdrtas_timer_start(void) +{ + wdrtas_set_interval(wdrtas_interval); +} + +/** + * wdrtas_timer_stop - stops watchdog + * + * wdrtas_timer_stop stops the watchdog timer by calling the RTAS function + * set-interval (surveillance) + */ +static void +wdrtas_timer_stop(void) +{ + wdrtas_set_interval(0); +} + +/** + * wdrtas_log_scanned_event - logs an event we received during keepalive + * + * wdrtas_log_scanned_event prints a message to the log buffer dumping + * the results of the last event-scan call + */ +static void +wdrtas_log_scanned_event(void) +{ + int i; + + for (i = 0; i < WDRTAS_LOGBUFFER_LEN; i += 16) + printk("wdrtas: dumping event (line %i/%i), data = " + "%02x %02x %02x %02x %02x %02x %02x %02x " + "%02x %02x %02x %02x %02x %02x %02x %02x\n", + (i / 16) + 1, (WDRTAS_LOGBUFFER_LEN / 16), + wdrtas_logbuffer[i + 0], wdrtas_logbuffer[i + 1], + wdrtas_logbuffer[i + 2], wdrtas_logbuffer[i + 3], + wdrtas_logbuffer[i + 4], wdrtas_logbuffer[i + 5], + wdrtas_logbuffer[i + 6], wdrtas_logbuffer[i + 7], + wdrtas_logbuffer[i + 8], wdrtas_logbuffer[i + 9], + wdrtas_logbuffer[i + 10], wdrtas_logbuffer[i + 11], + wdrtas_logbuffer[i + 12], wdrtas_logbuffer[i + 13], + wdrtas_logbuffer[i + 14], wdrtas_logbuffer[i + 15]); +} + +/** + * wdrtas_timer_keepalive - resets watchdog timer to keep system alive + * + * wdrtas_timer_keepalive restarts the watchdog timer by calling the + * RTAS function event-scan and repeats these calls as long as there are + * events available. All events will be dumped. 
+ */ +static void +wdrtas_timer_keepalive(void) +{ + long result; + + do { + result = rtas_call(wdrtas_token_event_scan, 4, 1, NULL, + RTAS_EVENT_SCAN_ALL_EVENTS, 0, + (void *)__pa(wdrtas_logbuffer), + WDRTAS_LOGBUFFER_LEN); + if (result < 0) + printk("wdrtas: event-scan failed: %li\n",result); + if (result == 0) + wdrtas_log_scanned_event(); + } while (result == 0); +} + +/** + * wdrtas_get_temperature - returns current temperature + * + * returns temperature or <0 on failures + * + * wdrtas_get_temperature returns the current temperature in Fahrenheit. It + * uses the RTAS call get-sensor-state, token 3 to do so + */ +static int +wdrtas_get_temperature(void) +{ + long result; + int temperature = 0; + + result = rtas_call(wdrtas_token_get_sensor_state, 2, 2, + (void *)__pa(&temperature), + WDRTAS_THERMAL_SENSOR, 0); + + if (result < 0) + printk("wdrtas: reading the thermal sensor faild: %li\n", + result); + else + temperature = ((temperature * 9) / 5) + 32; /* fahrenheit */ + + return temperature; +} + +/** + * wdrtas_get_status - returns the status of the watchdog + * + * returns a bitmask of defines WDIOF_... as defined in + * include/linux/watchdog.h + */ +static int +wdrtas_get_status(void) +{ + return 0; /* TODO */ +} + +/** + * wdrtas_get_boot_status - returns the reason for the last boot + * + * returns a bitmask of defines WDIOF_... as defined in + * include/linux/watchdog.h, indicating why the watchdog rebooted the system + */ +static int +wdrtas_get_boot_status(void) +{ + return 0; /* TODO */ +} + +/*** watchdog API and operations stuff */ + +/* wdrtas_write - called when watchdog device is written to + * @file: file structure + * @buf: user buffer with data + * @len: amount to data written + * @ppos: position in file + * + * returns the number of successfully processed characters, which is always + * the number of bytes passed to this function + * + * wdrtas_write processes all the data given to it and looks for the magic + * character 'V'. 
This character allows the watchdog device to be closed + * properly. + */ +static ssize_t +wdrtas_write(struct file *file, const char __user *buf, + size_t len, loff_t *ppos) +{ + int i; + char c; + + if (!len) + goto out; + + if (!wdrtas_nowayout) { + wdrtas_expect_close = 0; + /* look for 'V' */ + for (i = 0; i < len; i++) { + if (get_user(c, buf + i)) + return -EFAULT; + /* allow to close device */ + if (c == 'V') + wdrtas_expect_close = WDRTAS_MAGIC_CHAR; + } + } + + wdrtas_timer_keepalive(); + +out: + return len; +} + +/** + * wdrtas_ioctl - ioctl function for the watchdog device + * @inode: inode structure + * @file: file structure + * @cmd: command for ioctl + * @arg: argument pointer + * + * returns 0 on success, <0 on failure + * + * wdrtas_ioctl implements the watchdog API ioctls + */ +static int +wdrtas_ioctl(struct inode *inode, struct file *file, + unsigned int cmd, unsigned long arg) +{ + int __user *argp = (void *)arg; + int i; + static struct watchdog_info wdinfo = { + .options = WDRTAS_SUPPORTED_MASK, + .firmware_version = 0, + .identity = "wdrtas" + }; + + switch (cmd) { + case WDIOC_GETSUPPORT: + if (copy_to_user(argp, &wdinfo, sizeof(wdinfo))) + return -EFAULT; + return 0; + + case WDIOC_GETSTATUS: + i = wdrtas_get_status(); + return put_user(i, argp); + + case WDIOC_GETBOOTSTATUS: + i = wdrtas_get_boot_status(); + return put_user(i, argp); + + case WDIOC_GETTEMP: + if (wdrtas_token_get_sensor_state == RTAS_UNKNOWN_SERVICE) + return -EOPNOTSUPP; + + i = wdrtas_get_temperature(); + return put_user(i, argp); + + case WDIOC_SETOPTIONS: + if (get_user(i, argp)) + return -EFAULT; + if (i & WDIOS_DISABLECARD) + wdrtas_timer_stop(); + if (i & WDIOS_ENABLECARD) { + wdrtas_timer_keepalive(); + wdrtas_timer_start(); + } + if (i & WDIOS_TEMPPANIC) { + /* not implemented. 
Done by H8 */ + } + return 0; + + case WDIOC_KEEPALIVE: + wdrtas_timer_keepalive(); + return 0; + + case WDIOC_SETTIMEOUT: + if (get_user(i, argp)) + return -EFAULT; + + if (wdrtas_set_interval(i)) + return -EINVAL; + + wdrtas_timer_keepalive(); + + if (wdrtas_token_get_sp == RTAS_UNKNOWN_SERVICE) + wdrtas_interval = i; + else + wdrtas_interval = wdrtas_get_interval(i); + /* fallthrough */ + + case WDIOC_GETTIMEOUT: + return put_user(wdrtas_interval, argp); + + default: + return -ENOIOCTLCMD; + } +} + +/** + * wdrtas_open - open function of watchdog device + * @inode: inode structure + * @file: file structure + * + * returns 0 on success, -EBUSY if the file has been opened already, <0 on + * other failures + * + * function called when watchdog device is opened + */ +static int +wdrtas_open(struct inode *inode, struct file *file) +{ + /* only open once */ + if (xchg(&wdrtas_miscdev_open,1)) + return -EBUSY; + + wdrtas_timer_start(); + wdrtas_timer_keepalive(); + + return nonseekable_open(inode, file); +} + +/** + * wdrtas_close - close function of watchdog device + * @inode: inode structure + * @file: file structure + * + * returns 0 on success + * + * close function. Always succeeds + */ +static int +wdrtas_close(struct inode *inode, struct file *file) +{ + /* only stop watchdog, if this was announced using 'V' before */ + if (wdrtas_expect_close == WDRTAS_MAGIC_CHAR) + wdrtas_timer_stop(); + else { + printk("wdrtas: got unexpected close. 
Watchdog " + "not stopped.\n"); + wdrtas_timer_keepalive(); + } + + wdrtas_expect_close = 0; + xchg(&wdrtas_miscdev_open,0); + return 0; +} + +/** + * wdrtas_temp_read - gives back the temperature in fahrenheit + * @file: file structure + * @buf: user buffer + * @count: number of bytes to be read + * @ppos: position in file + * + * returns always 1 or -EFAULT in case of user space copy failures, <0 on + * other failures + * + * wdrtas_temp_read gives the temperature to the users by copying this + * value as one byte into the user space buffer. The unit is Fahrenheit... + */ +static ssize_t +wdrtas_temp_read(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + int temperature = 0; + + temperature = wdrtas_get_temperature(); + if (temperature < 0) + return temperature; + + if (copy_to_user(buf, &temperature, 1)) + return -EFAULT; + + return 1; +} + +/** + * wdrtas_temp_open - open function of temperature device + * @inode: inode structure + * @file: file structure + * + * returns 0 on success, <0 on failure + * + * function called when temperature device is opened + */ +static int +wdrtas_temp_open(struct inode *inode, struct file *file) +{ + return nonseekable_open(inode, file); +} + +/** + * wdrtas_temp_close - close function of temperature device + * @inode: inode structure + * @file: file structure + * + * returns 0 on success + * + * close function. 
Always succeeds + */ +static int +wdrtas_temp_close(struct inode *inode, struct file *file) +{ + return 0; +} + +/** + * wdrtas_reboot - reboot notifier function + * @nb: notifier block structure + * @code: reboot code + * @ptr: unused + * + * returns NOTIFY_DONE + * + * wdrtas_reboot stops the watchdog in case of a reboot + */ +static int +wdrtas_reboot(struct notifier_block *this, unsigned long code, void *ptr) +{ + if ( (code==SYS_DOWN) || (code==SYS_HALT) ) + wdrtas_timer_stop(); + + return NOTIFY_DONE; +} + +/*** initialization stuff */ + +static struct file_operations wdrtas_fops = { + .owner = THIS_MODULE, + .llseek = no_llseek, + .write = wdrtas_write, + .ioctl = wdrtas_ioctl, + .open = wdrtas_open, + .release = wdrtas_close, +}; + +static struct miscdevice wdrtas_miscdev = { + .minor = WATCHDOG_MINOR, + .name = "watchdog", + .fops = &wdrtas_fops, +}; + +static struct file_operations wdrtas_temp_fops = { + .owner = THIS_MODULE, + .llseek = no_llseek, + .read = wdrtas_temp_read, + .open = wdrtas_temp_open, + .release = wdrtas_temp_close, +}; + +static struct miscdevice wdrtas_tempdev = { + .minor = TEMP_MINOR, + .name = "temperature", + .fops = &wdrtas_temp_fops, +}; + +static struct notifier_block wdrtas_notifier = { + .notifier_call = wdrtas_reboot, +}; + +/** + * wdrtas_get_tokens - reads in RTAS tokens + * + * returns 0 on succes, <0 on failure + * + * wdrtas_get_tokens reads in the tokens for the RTAS calls used in + * this watchdog driver. It tolerates, if "get-sensor-state" and + * "ibm,get-system-parameter" are not available. + */ +static int +wdrtas_get_tokens(void) +{ + wdrtas_token_get_sensor_state = rtas_token("get-sensor-state"); + if (wdrtas_token_get_sensor_state == RTAS_UNKNOWN_SERVICE) { + printk("wdrtas: couldn't get token for get-sensor-state. 
" + "Trying to continue without temperature support.\n"); + } + + wdrtas_token_get_sp = rtas_token("ibm,get-system-parameter"); + if (wdrtas_token_get_sp == RTAS_UNKNOWN_SERVICE) { + printk("wdrtas: couldn't get token for " + "ibm,get-system-parameter. Trying to continue with " + "a default timeout value of %i seconds.\n", + WDRTAS_DEFAULT_INTERVAL); + } + + wdrtas_token_set_indicator = rtas_token("set-indicator"); + if (wdrtas_token_set_indicator == RTAS_UNKNOWN_SERVICE) { + printk("wdrtas: couldn't get token for set-indicator. " + "Terminating watchdog code.\n"); + return -EIO; + } + + wdrtas_token_event_scan = rtas_token("event-scan"); + if (wdrtas_token_event_scan == RTAS_UNKNOWN_SERVICE) { + printk("wdrtas: couldn't get token for event-scan. " + "Terminating watchdog code.\n"); + return -EIO; + } + + return 0; +} + +/** + * wdrtas_unregister_devs - unregisters the misc dev handlers + * + * wdrtas_register_devs unregisters the watchdog and temperature watchdog + * misc devs + */ +static void +wdrtas_unregister_devs(void) +{ + misc_deregister(&wdrtas_miscdev); + if (wdrtas_token_get_sensor_state != RTAS_UNKNOWN_SERVICE) + misc_deregister(&wdrtas_tempdev); +} + +/** + * wdrtas_register_devs - registers the misc dev handlers + * + * returns 0 on succes, <0 on failure + * + * wdrtas_register_devs registers the watchdog and temperature watchdog + * misc devs + */ +static int +wdrtas_register_devs(void) +{ + int result; + + result = misc_register(&wdrtas_miscdev); + if (result) { + printk("wdrtas: couldn't register watchdog misc device. " + "Terminating watchdog code.\n"); + return result; + } + + if (wdrtas_token_get_sensor_state != RTAS_UNKNOWN_SERVICE) { + result = misc_register(&wdrtas_tempdev); + if (result) { + printk("wdrtas: couldn't register watchdog " + "temperature misc device. 
Continuing without " + "temperature support.\n"); + wdrtas_token_get_sensor_state = RTAS_UNKNOWN_SERVICE; + } + } + + return 0; +} + +/** + * wdrtas_init - init function of the watchdog driver + * + * returns 0 on succes, <0 on failure + * + * registers the file handlers and the reboot notifier + */ +static int __init +wdrtas_init(void) +{ + if (wdrtas_get_tokens()) + return -ENODEV; + + if (wdrtas_register_devs()) + return -ENODEV; + + if (register_reboot_notifier(&wdrtas_notifier)) { + printk("wdrtas: could not register reboot notifier. " + "Terminating watchdog code.\n"); + wdrtas_unregister_devs(); + return -ENODEV; + } + + if (wdrtas_token_get_sp == RTAS_UNKNOWN_SERVICE) + wdrtas_interval = WDRTAS_DEFAULT_INTERVAL; + else + wdrtas_interval = wdrtas_get_interval(WDRTAS_DEFAULT_INTERVAL); + + return 0; +} + +/** + * wdrtas_exit - exit function of the watchdog driver + * + * unregisters the file handlers and the reboot notifier + */ +static void __exit +wdrtas_exit(void) +{ + if (!wdrtas_nowayout) + wdrtas_timer_stop(); + + wdrtas_unregister_devs(); + + unregister_reboot_notifier(&wdrtas_notifier); +} + +module_init(wdrtas_init); +module_exit(wdrtas_exit); From pj at sgi.com Wed Apr 20 12:27:57 2005 From: pj at sgi.com (Paul Jackson) Date: Tue, 19 Apr 2005 19:27:57 -0700 Subject: [PATCH 1/4] ppc64: rename arch/ppc64/kernel/pSeries_pci.c In-Reply-To: <20050420021245.GA7257@wohnheim.fh-wedel.de> References: <200504200149.22063.arnd@arndb.de> <200504200152.58965.arnd@arndb.de> <20050420021245.GA7257@wohnheim.fh-wedel.de> Message-ID: <20050419192757.28aa55e1.pj@sgi.com> > You might want to be consistent wrt. braces for one-line conditional > statements. Perhaps he is consistent - just not to any of the rules that you considered. For example, I will add braces even to a one-line conditional if due to wrapping long lines, it really takes two or more screen lines for the conditional. -- I won't rest till it's the best ... 
Programmer, Linux Scalability Paul Jackson 1.650.933.1373, 1.925.600.0401 From joern at wohnheim.fh-wedel.de Wed Apr 20 12:12:45 2005 From: joern at wohnheim.fh-wedel.de (Jörn Engel) Date: Wed, 20 Apr 2005 04:12:45 +0200 Subject: [PATCH 1/4] ppc64: rename arch/ppc64/kernel/pSeries_pci.c In-Reply-To: <200504200152.58965.arnd@arndb.de> References: <200504200149.22063.arnd@arndb.de> <200504200152.58965.arnd@arndb.de> Message-ID: <20050420021245.GA7257@wohnheim.fh-wedel.de> On Wed, 20 April 2005 01:52:56 +0200, Arnd Bergmann wrote: > > - if (buid) { > - ret = rtas_call(ibm_read_pci_config, 4, 2, &returnval, > - addr, buid >> 32, buid & 0xffffffff, size); > - } else { > - ret = rtas_call(read_pci_config, 2, 2, &returnval, addr, size); > - } > - *val = returnval; > - > - if (ret) > - return PCIBIOS_DEVICE_NOT_FOUND; You might want to be consistent wrt. braces for one-line conditional statements. Jörn -- Optimizations always bust things, because all optimizations are, in the long haul, a form of cheating, and cheaters eventually get caught. -- Larry Wall From benh at kernel.crashing.org Wed Apr 20 15:44:10 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 20 Apr 2005 15:44:10 +1000 Subject: PATCH [PPC64]: dead processes never reaped In-Reply-To: <20050418193833.GW15596@austin.ibm.com> References: <20050418193833.GW15596@austin.ibm.com> Message-ID: <1113975850.5515.377.camel@gaston> On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote: > > Hi, > > The patch below appears to fix a problem where a number of dead processes > linger on the system. On a highly loaded system, dozens of processes > were found stuck in do_exit(), calling their very last schedule(), and > then being lost forever. Ok, we spent some time with Paul decrypting what _switch_to is supposed to do.
Our understanding at this point is that the current code is correct on both ppc32 and ppc64, that is: the "prev" passed in is always "current" and we don't see how it can be anything else. We use a local variable instead of current in the common code because accessing current can be slow on some architectures. I don't see any codepath where prev != current before switch_to. If we didn't do some black magic that I explain below, _switch_to would switch the entire context, including the stack, and thus including the value of "prev". This means that we would always come back with prev being current, which is useless for reaping the old task. What we want is for the "prev" that was passed to _switch_to() to be returned, so that we can reap that previous task despite the change of context; basically, prev has to be invariant across the context change in switch_to. On ppc & ppc64, we implement that by passing prev (or its thread counterpart) to the assembly context switch code in r3. This code will preserve it and return it as-is (or re-transformed from thread to task). So your problem must be somewhere else. I've looked at the need_resched code path and we always reload prev = current from a non-preemptible region, so it can't be wrong. This was verified on 2.6.12-rc2; there might be something else wrong in an older kernel. Ben. From Lev_Makhlis at bmc.com Thu Apr 21 02:02:15 2005 From: Lev_Makhlis at bmc.com (Makhlis, Lev) Date: Wed, 20 Apr 2005 11:02:15 -0500 Subject: [PATCH] /proc/ppc64/lparcfg permissions Message-ID: <1114012954.11706.13.camel@levlinux.boston.bmc.com> Hi, Is there a good reason that /proc/ppc64/lparcfg is only readable to root? (For comparison, any user can run lparstat(1) on AIX.) If not, the patch below fixes it.
Signed-off-by: Lev Makhlis --- linux-2.6.11.7/arch/ppc64/kernel/lparcfg.c 2005-04-07 14:57:15.000000000 -0400 +++ linux-lparcfg-mode/arch/ppc64/kernel/lparcfg.c 2005-04-20 11:23:17.000000000 -0400 @@ -559,7 +559,7 @@ struct file_operations lparcfg_fops = { int __init lparcfg_init(void) { struct proc_dir_entry *ent; - mode_t mode = S_IRUSR; + mode_t mode = S_IRUGO; /* Allow writing if we have FW_FEATURE_SPLPAR */ if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) { From ppc-dev at storix.com Wed Apr 20 22:05:24 2005 From: ppc-dev at storix.com (ppc-dev at storix.com) Date: Wed, 20 Apr 2005 12:05:24 +0000 Subject: zImage.initrd Message-ID: <200504201205.24660.ppc-dev@storix.com> I am attempting to understand the makefile for creating a zImage.initrd. Essentially, I know it creates several 32-bit .o files, links them together and runs addnote. I am particularly interested in (and baffled by) kernel-vmlinux.o. How do I get from a 64-bit vmlinux to a 32-bit kernel-vmlinux.o? Also, I notice that kernel-vmlinux.c is empty. Is this just a dummy placeholder? Any information or insight as to how this works would be greatly appreciated. Thanks David Huffman Storix, Inc. From will_schmidt at vnet.ibm.com Fri Apr 22 00:45:06 2005 From: will_schmidt at vnet.ibm.com (will schmidt) Date: Thu, 21 Apr 2005 09:45:06 -0500 Subject: [PATCH] /proc/ppc64/lparcfg permissions In-Reply-To: <1114012954.11706.13.camel@levlinux.boston.bmc.com> References: <1114012954.11706.13.camel@levlinux.boston.bmc.com> Message-ID: <4267BC72.5040302@vnet.ibm.com> No reason that I'm aware of. Acked-by: Will Schmidt Makhlis, Lev wrote: > Hi, > > Is there a good reason that /proc/ppc64/lparcfg is only readable to root? > (For comparison, any user can run lparstat(1) on AIX.) > > If not, the patch below fixes it.
> > Signed-off-by: Lev Makhlis > > --- linux-2.6.11.7/arch/ppc64/kernel/lparcfg.c 2005-04-07 > 14:57:15.000000000 -0400 > +++ linux-lparcfg-mode/arch/ppc64/kernel/lparcfg.c 2005-04-20 > 11:23:17.000000000 -0400 > @@ -559,7 +559,7 @@ struct file_operations lparcfg_fops = { > int __init lparcfg_init(void) > { > struct proc_dir_entry *ent; > - mode_t mode = S_IRUSR; > + mode_t mode = S_IRUGO; > > /* Allow writing if we have FW_FEATURE_SPLPAR */ > if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) { > > _______________________________________________ > Linuxppc64-dev mailing list > Linuxppc64-dev at ozlabs.org > https://ozlabs.org/cgi-bin/mailman/listinfo/linuxppc64-dev From arnd at arndb.de Fri Apr 22 16:49:59 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 22 Apr 2005 08:49:59 +0200 Subject: [PATCH 1/2] ppc64: fix read/write on large /dev/nvram Message-ID: <200504220850.00339.arnd@arndb.de> For large nvram devices on ppc64, reading and writing fails because of oversized arguments to kmalloc. This patch makes the driver use __get_free_page instead of kmalloc and sanitizes error handling while touching the functions. 
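The clamping that the reworked dev_nvram_read()/dev_nvram_write() perform before allocating a single page can be sketched in userspace roughly as follows. This is a minimal sketch, not code from the patch: clamp_nvram_count() and SKETCH_PAGE_SIZE are hypothetical names, the page size is assumed to be 4096, and loff_t/ssize_t are replaced by long long.

```c
#include <stddef.h>

/* Assumed page size for this sketch; the kernel's PAGE_SIZE depends on the config. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical helper mirroring the patch's logic:
 *   if (*ppos >= size || size < 0) goto out;   (returns 0)
 *   count = min_t(size_t, count, size - *ppos);
 *   count = min(count, PAGE_SIZE);
 */
static size_t clamp_nvram_count(size_t count, long long pos, long long size)
{
	if (size < 0 || pos >= size)
		return 0;                      /* at or past the end: nothing to transfer */
	if (count > (size_t)(size - pos))
		count = (size_t)(size - pos);  /* never run past the end of NVRAM */
	if (count > SKETCH_PAGE_SIZE)
		count = SKETCH_PAGE_SIZE;      /* fits in one __get_free_page() buffer */
	return count;
}
```

Capping each call at one page means a large read or write simply returns a short count, and the caller loops; this is what removes the oversized kmalloc() for large NVRAM devices.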
Signed-off-by: Arnd Bergmann --- linus-2.5.orig/arch/ppc64/kernel/nvram.c 2005-04-16 21:40:59.000000000 +0200 +++ linus-2.5/arch/ppc64/kernel/nvram.c 2005-04-22 08:05:39.000000000 +0200 @@ -81,80 +81,74 @@ static ssize_t dev_nvram_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) { - ssize_t len; - char *tmp_buffer; - int size; + ssize_t ret; + char *tmp = NULL; + ssize_t size; + + ret = -ENODEV; + if (!ppc_md.nvram_size) + goto out; - if (ppc_md.nvram_size == NULL) - return -ENODEV; + ret = 0; size = ppc_md.nvram_size(); + if (*ppos >= size || size < 0) + goto out; - if (!access_ok(VERIFY_WRITE, buf, count)) - return -EFAULT; - if (*ppos >= size) - return 0; - if (count > size) - count = size; - - tmp_buffer = (char *) kmalloc(count, GFP_KERNEL); - if (!tmp_buffer) { - printk(KERN_ERR "dev_read_nvram: kmalloc failed\n"); - return -ENOMEM; - } - - len = ppc_md.nvram_read(tmp_buffer, count, ppos); - if ((long)len <= 0) { - kfree(tmp_buffer); - return len; - } - - if (copy_to_user(buf, tmp_buffer, len)) { - kfree(tmp_buffer); - return -EFAULT; - } + count = min_t(size_t, count, size - *ppos); + count = min(count, PAGE_SIZE); - kfree(tmp_buffer); - return len; + ret = -ENOMEM; + tmp = (char *) __get_free_page(GFP_KERNEL); + if (!tmp) + goto out; + + ret = ppc_md.nvram_read(tmp, count, ppos); + if (ret <= 0) + goto out; + + if (copy_to_user(buf, tmp, ret)) + ret = -EFAULT; + +out: + free_page((unsigned long)tmp); + return ret; } static ssize_t dev_nvram_write(struct file *file, const char __user *buf, - size_t count, loff_t *ppos) + size_t count, loff_t *ppos) { - ssize_t len; - char * tmp_buffer; - int size; + ssize_t ret; + char *tmp = NULL; + ssize_t size; + + ret = -ENODEV; + if (!ppc_md.nvram_size) + goto out; - if (ppc_md.nvram_size == NULL) - return -ENODEV; + ret = 0; size = ppc_md.nvram_size(); + if (*ppos >= size || size < 0) + goto out; - if (!access_ok(VERIFY_READ, buf, count)) - return -EFAULT; - if (*ppos >= size) - return 0; - if 
(count > size) - count = size; - - tmp_buffer = (char *) kmalloc(count, GFP_KERNEL); - if (!tmp_buffer) { - printk(KERN_ERR "dev_nvram_write: kmalloc failed\n"); - return -ENOMEM; - } - - if (copy_from_user(tmp_buffer, buf, count)) { - kfree(tmp_buffer); - return -EFAULT; - } + count = min_t(size_t, count, size - *ppos); + count = min(count, PAGE_SIZE); - len = ppc_md.nvram_write(tmp_buffer, count, ppos); - if ((long)len <= 0) { - kfree(tmp_buffer); - return len; - } + ret = -ENOMEM; + tmp = (char *) __get_free_page(GFP_KERNEL); + if (!tmp) + goto out; + + ret = -EFAULT; + if (copy_from_user(tmp, buf, count)) + goto out; + + ret = ppc_md.nvram_write(tmp, count, ppos); + +out: + free_page((unsigned long)tmp); + return ret; - kfree(tmp_buffer); - return len; } static int dev_nvram_ioctl(struct inode *inode, struct file *file, From arnd at arndb.de Fri Apr 22 16:54:38 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 22 Apr 2005 08:54:38 +0200 Subject: [PATCH 2/2] ppc64: fix warning in arch/ppc64/kernel/nvram.c In-Reply-To: <200504220850.00339.arnd@arndb.de> References: <200504220850.00339.arnd@arndb.de> Message-ID: <200504220854.38954.arnd@arndb.de> gcc-3.4 warns about part being possibly used without initialization in nvram_create_os_partition. Converting the function to use list_for_each_entry fixes this.
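The difference between list_for_each() plus list_entry() and list_for_each_entry() can be illustrated with a small userspace re-creation of the kernel's intrusive list. This is a simplified sketch, not the kernel's <linux/list.h>: the macros are reduced versions, struct part is a hypothetical stand-in for struct nvram_partition, and __typeof__ is a GCC/Clang extension. Because the loop itself assigns the entry pointer, the compiler can see that it is initialized, which is why the conversion silences the warning.

```c
#include <stddef.h>

/* Minimal intrusive doubly linked list, modelled on the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* Old style: walk raw nodes; each caller converts them with list_entry() by hand. */
#define list_for_each(p, head) \
	for ((p) = (head)->next; (p) != (head); (p) = (p)->next)

/* New style: the loop cursor is the containing structure itself. */
#define list_for_each_entry(pos, head, member)                             \
	for ((pos) = list_entry((head)->next, __typeof__(*(pos)), member); \
	     &(pos)->member != (head);                                     \
	     (pos) = list_entry((pos)->member.next, __typeof__(*(pos)), member))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical stand-in for struct nvram_partition. */
struct part {
	int signature;
	struct list_head partition;
};

/* Count entries whose signature matches, like the NVRAM_SIG_FREE filter in the patch. */
static int count_signature(struct list_head *head, int sig)
{
	struct part *part;
	int n = 0;

	list_for_each_entry(part, head, partition) {
		if (part->signature != sig)
			continue;
		n++;
	}
	return n;
}
```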
Signed-off-by: Arnd Bergmann --- linus-2.5.orig/arch/ppc64/kernel/nvram.c 2005-04-22 08:05:39.000000000 +0200 +++ linus-2.5/arch/ppc64/kernel/nvram.c 2005-04-22 08:07:16.000000000 +0200 @@ -332,9 +332,8 @@ */ static int nvram_create_os_partition(void) { - struct list_head * p; struct nvram_partition * part; - struct nvram_partition * new_part = NULL; + struct nvram_partition * new_part; struct nvram_partition * free_part = NULL; int seq_init[2] = { 0, 0 }; loff_t tmp_index; @@ -343,8 +342,7 @@ /* Find a free partition that will give us the maximum needed size If can't find one that will give us the minimum size needed */ - list_for_each(p, &nvram_part->partition) { - part = list_entry(p, struct nvram_partition, partition); + list_for_each_entry(part, &nvram_part->partition, partition) { if (part->header.signature != NVRAM_SIG_FREE) continue; From ntl at pobox.com Sat Apr 23 03:04:23 2005 From: ntl at pobox.com (Nathan Lynch) Date: Fri, 22 Apr 2005 12:04:23 -0500 Subject: [PATCH] /proc/ppc64/lparcfg permissions In-Reply-To: <4267BC72.5040302@vnet.ibm.com> References: <1114012954.11706.13.camel@levlinux.boston.bmc.com> <4267BC72.5040302@vnet.ibm.com> Message-ID: <20050422170423.GC18688@otto> > Makhlis, Lev wrote: > >Hi, > > > >Is there a good reason that /proc/ppc64/lparcfg is only readable to root? > >(For comparison, any user can run lparstat(1) on AIX.) > > No reason that I'm aware of. > I'd be a little more cautious (paranoid?). Reading lparcfg causes several hypervisor and RTAS calls. It seems there's potential for local DOS if we were to make this world-readable. 
Nathan From Lev_Makhlis at bmc.com Sat Apr 23 03:31:11 2005 From: Lev_Makhlis at bmc.com (Makhlis, Lev) Date: Fri, 22 Apr 2005 12:31:11 -0500 Subject: [PATCH] /proc/ppc64/lparcfg permissions Message-ID: <1114191113.16759.1.camel@levlinux.boston.bmc.com> On Fri, 2005-04-22 at 13:04 -0400, Nathan Lynch wrote: > > Makhlis, Lev wrote: > > >Hi, > > > > > >Is there a good reason that /proc/ppc64/lparcfg is only readable to > root? > > >(For comparison, any user can run lparstat(1) on AIX.) > > > > No reason that I'm aware of. > > > > I'd be a little more cautious (paranoid?). Reading lparcfg causes > several hypervisor and RTAS calls. It seems there's potential for > local DOS if we were to make this world-readable. Do you know if lpar_get_info() on AIX makes the same hypervisor calls? From ntl at pobox.com Sat Apr 23 03:36:26 2005 From: ntl at pobox.com (Nathan Lynch) Date: Fri, 22 Apr 2005 12:36:26 -0500 Subject: [PATCH] /proc/ppc64/lparcfg permissions In-Reply-To: <1114191113.16759.1.camel@levlinux.boston.bmc.com> References: <1114191113.16759.1.camel@levlinux.boston.bmc.com> Message-ID: <20050422173626.GD18688@otto> On Fri, Apr 22, 2005 at 12:31:11PM -0500, Makhlis, Lev wrote: > On Fri, 2005-04-22 at 13:04 -0400, Nathan Lynch wrote: > > > Makhlis, Lev wrote: > > > >Hi, > > > > > > > >Is there a good reason that /proc/ppc64/lparcfg is only readable to > > root? > > > >(For comparison, any user can run lparstat(1) on AIX.) > > > > > > No reason that I'm aware of. > > > > > > > I'd be a little more cautious (paranoid?). Reading lparcfg causes > > several hypervisor and RTAS calls. It seems there's potential for > > local DOS if we were to make this world-readable. > > Do you know if lpar_get_info() on AIX makes the same hypervisor calls? No, I don't know. From davem at davemloft.net Mon Apr 25 12:45:22 2005 From: davem at davemloft.net (David S. 
Miller) Date: Sun, 24 Apr 2005 19:45:22 -0700 Subject: [PATCH] ppc64: fix semtimedop compat syscall In-Reply-To: <16960.32211.960982.13262@cargo.ozlabs.ibm.com> References: <20050321171349.GA23908@krispykreme> <20050321135031.6f35932a.davem@redhat.com> <20050322142211.1b4337e4.sfr@canb.auug.org.au> <200503221342.46575.arnd@arndb.de> <16960.32211.960982.13262@cargo.ozlabs.ibm.com> Message-ID: <20050424194522.0c31dd1c.davem@davemloft.net> On Wed, 23 Mar 2005 07:19:31 +1100 Paul Mackerras wrote: > Arnd Bergmann writes: > > > One problem is that sign extension can not be expressed in architecture > > independent C code. > > On which architectures does (long)(int) x not give the desired result? It all depends upon what the processor ABI says about incoming arguments, specifically whether the caller is expected to sign extend them or not. If the caller is expected to sign extend, then "(long) (int) x" will get optimized away by the compiler. From sfr at canb.auug.org.au Mon Apr 25 13:29:22 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Mon, 25 Apr 2005 13:29:22 +1000 Subject: [PATCH] ppc64: fix semtimedop compat syscall In-Reply-To: <20050424194522.0c31dd1c.davem@davemloft.net> References: <20050321171349.GA23908@krispykreme> <20050321135031.6f35932a.davem@redhat.com> <20050322142211.1b4337e4.sfr@canb.auug.org.au> <200503221342.46575.arnd@arndb.de> <16960.32211.960982.13262@cargo.ozlabs.ibm.com> <20050424194522.0c31dd1c.davem@davemloft.net> Message-ID: <20050425132922.1e570a10.sfr@canb.auug.org.au> On Sun, 24 Apr 2005 19:45:22 -0700 "David S. Miller" wrote: > > On Wed, 23 Mar 2005 07:19:31 +1100 > Paul Mackerras wrote: > > > Arnd Bergmann writes: > > > > > One problem is that sign extension can not be expressed in architecture > > > independent C code. > > > > On which architectures does (long)(int) x not give the desired result? 
> > It all depends upon what the processor ABI says about incoming arguments, > specifically whether the caller is expected to sign extend them or not. > > If the caller is expected to sign extend, then "(long) (int) x" will get > optimized away by the compiler. Even if x is declared as unsigned int? And is there an architecture like this? -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ From davem at davemloft.net Mon Apr 25 13:50:32 2005 From: davem at davemloft.net (David S. Miller) Date: Sun, 24 Apr 2005 20:50:32 -0700 Subject: [PATCH] ppc64: fix semtimedop compat syscall In-Reply-To: <20050425132922.1e570a10.sfr@canb.auug.org.au> References: <20050321171349.GA23908@krispykreme> <20050321135031.6f35932a.davem@redhat.com> <20050322142211.1b4337e4.sfr@canb.auug.org.au> <200503221342.46575.arnd@arndb.de> <16960.32211.960982.13262@cargo.ozlabs.ibm.com> <20050424194522.0c31dd1c.davem@davemloft.net> <20050425132922.1e570a10.sfr@canb.auug.org.au> Message-ID: <20050424205032.38da977f.davem@davemloft.net> On Mon, 25 Apr 2005 13:29:22 +1000 Stephen Rothwell wrote: > > It all depends upon what the processor ABI says about incoming arguments, > > specifically whether the caller is expected to sign extend them or not. > > > > If the caller is expected to sign extend, then "(long) (int) x" will get > > optimized away by the compiler. > > Even if x is declared as unsigned int? No, not in that case. I was mentioning the case where x is declared as a signed int.
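The distinction being discussed (sign extension for a signed int argument versus zero extension for an unsigned int one) can be shown with a small C sketch. It assumes the low 32 bits of the register have already been isolated as a uint32_t, which sidesteps exactly the ABI question in the thread: whether the upper 32 bits can be trusted. Converting an out-of-range uint32_t to int32_t is implementation-defined in C; GCC and Clang define it as two's-complement wrap, which is assumed here. Both function names are hypothetical.

```c
#include <stdint.h>

/* Treat the 32-bit argument as a signed int: the "(long)(int)x" idiom sign-extends. */
static int64_t as_signed_arg(uint32_t reg32)
{
	return (int64_t)(int32_t)reg32;
}

/* Treat the 32-bit argument as an unsigned int: the conversion zero-extends. */
static int64_t as_unsigned_arg(uint32_t reg32)
{
	return (int64_t)reg32;
}
```

With the bit pattern 0xffffffff, the first function yields -1 and the second yields 4294967295, so the declared type of a compat syscall parameter decides which widening the generated code performs.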
From sfr at canb.auug.org.au Mon Apr 25 14:05:22 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Mon, 25 Apr 2005 14:05:22 +1000 Subject: [PATCH] ppc64: fix semtimedop compat syscall In-Reply-To: <20050424205032.38da977f.davem@davemloft.net> References: <20050321171349.GA23908@krispykreme> <20050321135031.6f35932a.davem@redhat.com> <20050322142211.1b4337e4.sfr@canb.auug.org.au> <200503221342.46575.arnd@arndb.de> <16960.32211.960982.13262@cargo.ozlabs.ibm.com> <20050424194522.0c31dd1c.davem@davemloft.net> <20050425132922.1e570a10.sfr@canb.auug.org.au> <20050424205032.38da977f.davem@davemloft.net> Message-ID: <20050425140522.6c8fc355.sfr@canb.auug.org.au> On Sun, 24 Apr 2005 20:50:32 -0700 "David S. Miller" wrote: > > On Mon, 25 Apr 2005 13:29:22 +1000 > Stephen Rothwell wrote: > > > > It all depends upon what the processor ABI says about incoming arguments, > > > specifically whether the caller is expected to sign extend them or not. > > > > > > If the caller is expected to sign extend, then "(long) (int) x" will get > > > optimized away by the compiler. > > > > Even if x is declared as unsigned int? > > No, not in that case. I was mentioning the case where x is declared > as a signed int. ok, I lost some context, sorry. So, if we declared all the parameters to the compat syscalls to be "unsigned int" (or better compat_arg_t), then we would be able to do all the type conversions in generic C code (assuming that the syscall interface zero extends all the arguments) and then all the asm wrappers would go away, right? -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ From davem at davemloft.net Mon Apr 25 14:07:57 2005 From: davem at davemloft.net (David S.
Miller) Date: Sun, 24 Apr 2005 21:07:57 -0700 Subject: [PATCH] ppc64: fix semtimedop compat syscall In-Reply-To: <20050425140522.6c8fc355.sfr@canb.auug.org.au> References: <20050321171349.GA23908@krispykreme> <20050321135031.6f35932a.davem@redhat.com> <20050322142211.1b4337e4.sfr@canb.auug.org.au> <200503221342.46575.arnd@arndb.de> <16960.32211.960982.13262@cargo.ozlabs.ibm.com> <20050424194522.0c31dd1c.davem@davemloft.net> <20050425132922.1e570a10.sfr@canb.auug.org.au> <20050424205032.38da977f.davem@davemloft.net> <20050425140522.6c8fc355.sfr@canb.auug.org.au> Message-ID: <20050424210757.3b930006.davem@davemloft.net> On Mon, 25 Apr 2005 14:05:22 +1000 Stephen Rothwell wrote: > So, if we declared all the parameters to the compat syscalls to be > "unsigned int" (or better compat_arg_t), then we would be able to do all > the type conversions in generic C code (assuming that the syscall > interface zero extends all the arguments) and then all the asm wrappers > would go away, right? It would work but it would suck :-) You'll eat a whole entire stack frame just to extend some arguments. Perhaps tail calls will eliminate this to some degree, but tail calls result in return address branch cache misses on some platforms, so it's still not zero cost. I think a generator for the asm stubs is the best idea, really. I didn't write the sparc64 stubs in assembler for fun; it really is needed to keep it cheap enough.
From olh at suse.de Wed Apr 27 07:24:40 2005 From: olh at suse.de (Olaf Hering) Date: Tue, 26 Apr 2005 23:24:40 +0200 Subject: [RFC] splitting out LPAR support from CONFIG_PSERIES In-Reply-To: <200502231425.42041.arnd@arndb.de> References: <200502221723.52051.arnd@arndb.de> <20050223044959.GA10256@austin.ibm.com> <200502231425.42041.arnd@arndb.de> Message-ID: <20050426212440.GA11668@suse.de> On Wed, Feb 23, Arnd Bergmann wrote: > On Wednesday 23 February 2005 05:49, Olof Johansson wrote: > +++ linux-2.6-ppc/arch/ppc64/Kconfig 2005-01-07 12:48:05.167929480 -0500 > config PPC_RTAS > - bool "Proc interface to RTAS" > + bool > depends on PPC_PSERIES > + default y > + > +config RTAS_PROC > + bool "Proc interface to RTAS" > + depends on PPC_RTAS > > config RTAS_FLASH > tristate "Firmware flash interface" > - depends on PPC_RTAS > + depends on RTAS_PROC > > config SCANLOG > tristate "Scanlog dump interface" > - depends on PPC_RTAS > + depends on RTAS_PROC && PPC_PSERIES Can you or someone else add help texts to all these options? From olh at suse.de Wed Apr 27 07:36:16 2005 From: olh at suse.de (Olaf Hering) Date: Tue, 26 Apr 2005 23:36:16 +0200 Subject: [PATCH] AGP support for powermac G5 In-Reply-To: <16945.2617.625095.404994@cargo.ozlabs.ibm.com> References: <16945.2617.625095.404994@cargo.ozlabs.ibm.com> Message-ID: <20050426213616.GA11758@suse.de> On Fri, Mar 11, Paul Mackerras wrote: > This patch adds AGP support for the U3 northbridge used in Apple G5 > machines to drivers/char/agp/uninorth-agp.c. This patch is based on > earlier work by Jerome Glisse. With this patch, the driver works in > both ppc32 and ppc64 kernels.
> > Signed-off-by: Paul Mackerras > > diff -urN linux-2.5/drivers/char/agp/Kconfig g5/drivers/char/agp/Kconfig > --- linux-2.5/drivers/char/agp/Kconfig 2005-03-07 14:01:44.000000000 +1100 > +++ g5/drivers/char/agp/Kconfig 2005-03-11 13:53:47.000000000 +1100 > @@ -1,6 +1,6 @@ > config AGP > tristate "/dev/agpgart (AGP Support)" if !GART_IOMMU > - depends on ALPHA || IA64 || PPC32 || X86 > + depends on ALPHA || IA64 || PPC || X86 2.6.12-rc3 has flush_dcache_range undefined when both new options are built as modules. The .config is based on arch/ppc64/defconfig: *** Warning: ".flush_dcache_range" [drivers/char/agp/uninorth-agp.ko] undefined! From olh at suse.de Wed Apr 27 07:38:48 2005 From: olh at suse.de (Olaf Hering) Date: Tue, 26 Apr 2005 23:38:48 +0200 Subject: [PATCH] remove asm/bootinfo.h include in arch/ppc64/boot/main.c Message-ID: <20050426213848.GB11758@suse.de> The defines in bootinfo.h are not used, so the include can be removed. According to Ben, birecs are not used on ppc64: on ppc64, we made the decision of enforcing the presence of an OF device-tree and either an OF-like client interface or a kexec-like flattened tree. So if your bootloader wants to say things to the kernel, it can do so by adding properties to the device-tree. Compile-tested with defconfig. Signed-off-by: Olaf Hering Index: linux-2.6.12-rc3-olh/arch/ppc64/boot/main.c =================================================================== --- linux-2.6.12-rc3-olh.orig/arch/ppc64/boot/main.c +++ linux-2.6.12-rc3-olh/arch/ppc64/boot/main.c @@ -14,7 +14,6 @@ #include #include #include -#include extern void *finddevice(const char *); extern int getprop(void *, const char *, void *, int); From olh at suse.de Wed Apr 27 07:56:19 2005 From: olh at suse.de (Olaf Hering) Date: Tue, 26 Apr 2005 23:56:19 +0200 Subject: [PATCH] remove unused arch/ppc64/boot/start.c Message-ID: <20050426215619.GA11873@suse.de> start.c is not referenced in arch/ppc64/boot/Makefile. Compile-tested with the defconfig.
Signed-off-by: Olaf Hering Index: linux-2.6.12-rc3-olh/arch/ppc64/boot/start.c =================================================================== --- linux-2.6.12-rc3-olh.orig/arch/ppc64/boot/start.c +++ /dev/null @@ -1,654 +0,0 @@ -/* - * Copyright (C) Paul Mackerras 1997. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ -#include -#include -#include -#include - -#include - -int (*prom)(void *); - -void *chosen_handle; -void *stdin; -void *stdout; -void *stderr; - -void exit(void); -void *finddevice(const char *name); -int getprop(void *phandle, const char *name, void *buf, int buflen); -void chrpboot(int a1, int a2, void *prom); /* in main.c */ - -void printk(char *fmt, ...); - -void -start(int a1, int a2, void *promptr) -{ - prom = (int (*)(void *)) promptr; - chosen_handle = finddevice("/chosen"); - if (chosen_handle == (void *) -1) - exit(); - if (getprop(chosen_handle, "stdout", &stdout, sizeof(stdout)) != 4) - exit(); - stderr = stdout; - if (getprop(chosen_handle, "stdin", &stdin, sizeof(stdin)) != 4) - exit(); - - chrpboot(a1, a2, promptr); - for (;;) - exit(); -} - -int -write(void *handle, void *ptr, int nb) -{ - struct prom_args { - char *service; - int nargs; - int nret; - void *ihandle; - void *addr; - int len; - int actual; - } args; - - args.service = "write"; - args.nargs = 3; - args.nret = 1; - args.ihandle = handle; - args.addr = ptr; - args.len = nb; - args.actual = -1; - (*prom)(&args); - return args.actual; -} - -int -read(void *handle, void *ptr, int nb) -{ - struct prom_args { - char *service; - int nargs; - int nret; - void *ihandle; - void *addr; - int len; - int actual; - } args; - - args.service = "read"; - args.nargs = 3; - args.nret = 1; - args.ihandle = handle; - args.addr = ptr; - args.len = nb; - args.actual = -1; - 
(*prom)(&args); - return args.actual; -} - -void -exit() -{ - struct prom_args { - char *service; - } args; - - for (;;) { - args.service = "exit"; - (*prom)(&args); - } -} - -void -pause(void) -{ - struct prom_args { - char *service; - } args; - - args.service = "enter"; - (*prom)(&args); -} - -void * -finddevice(const char *name) -{ - struct prom_args { - char *service; - int nargs; - int nret; - const char *devspec; - void *phandle; - } args; - - args.service = "finddevice"; - args.nargs = 1; - args.nret = 1; - args.devspec = name; - args.phandle = (void *) -1; - (*prom)(&args); - return args.phandle; -} - -void * -claim(unsigned long virt, unsigned long size, unsigned long align) -{ - struct prom_args { - char *service; - int nargs; - int nret; - unsigned int virt; - unsigned int size; - unsigned int align; - void *ret; - } args; - - args.service = "claim"; - args.nargs = 3; - args.nret = 1; - args.virt = virt; - args.size = size; - args.align = align; - (*prom)(&args); - return args.ret; -} - -int -getprop(void *phandle, const char *name, void *buf, int buflen) -{ - struct prom_args { - char *service; - int nargs; - int nret; - void *phandle; - const char *name; - void *buf; - int buflen; - int size; - } args; - - args.service = "getprop"; - args.nargs = 4; - args.nret = 1; - args.phandle = phandle; - args.name = name; - args.buf = buf; - args.buflen = buflen; - args.size = -1; - (*prom)(&args); - return args.size; -} - -int -putc(int c, void *f) -{ - char ch = c; - - if (c == '\n') - putc('\r', f); - return write(f, &ch, 1) == 1? c: -1; -} - -int -putchar(int c) -{ - return putc(c, stdout); -} - -int -fputs(char *str, void *f) -{ - int n = strlen(str); - - return write(f, str, n) == n? 
0: -1; -} - -int -readchar(void) -{ - char ch; - - for (;;) { - switch (read(stdin, &ch, 1)) { - case 1: - return ch; - case -1: - printk("read(stdin) returned -1\r\n"); - return -1; - } - } -} - -static char line[256]; -static char *lineptr; -static int lineleft; - -int -getchar(void) -{ - int c; - - if (lineleft == 0) { - lineptr = line; - for (;;) { - c = readchar(); - if (c == -1 || c == 4) - break; - if (c == '\r' || c == '\n') { - *lineptr++ = '\n'; - putchar('\n'); - break; - } - switch (c) { - case 0177: - case '\b': - if (lineptr > line) { - putchar('\b'); - putchar(' '); - putchar('\b'); - --lineptr; - } - break; - case 'U' & 0x1F: - while (lineptr > line) { - putchar('\b'); - putchar(' '); - putchar('\b'); - --lineptr; - } - break; - default: - if (lineptr >= &line[sizeof(line) - 1]) - putchar('\a'); - else { - putchar(c); - *lineptr++ = c; - } - } - } - lineleft = lineptr - line; - lineptr = line; - } - if (lineleft == 0) - return -1; - --lineleft; - return *lineptr++; -} - - - -/* String functions lifted from lib/vsprintf.c and lib/ctype.c */ -unsigned char _ctype[] = { -_C,_C,_C,_C,_C,_C,_C,_C, /* 0-7 */ -_C,_C|_S,_C|_S,_C|_S,_C|_S,_C|_S,_C,_C, /* 8-15 */ -_C,_C,_C,_C,_C,_C,_C,_C, /* 16-23 */ -_C,_C,_C,_C,_C,_C,_C,_C, /* 24-31 */ -_S|_SP,_P,_P,_P,_P,_P,_P,_P, /* 32-39 */ -_P,_P,_P,_P,_P,_P,_P,_P, /* 40-47 */ -_D,_D,_D,_D,_D,_D,_D,_D, /* 48-55 */ -_D,_D,_P,_P,_P,_P,_P,_P, /* 56-63 */ -_P,_U|_X,_U|_X,_U|_X,_U|_X,_U|_X,_U|_X,_U, /* 64-71 */ -_U,_U,_U,_U,_U,_U,_U,_U, /* 72-79 */ -_U,_U,_U,_U,_U,_U,_U,_U, /* 80-87 */ -_U,_U,_U,_P,_P,_P,_P,_P, /* 88-95 */ -_P,_L|_X,_L|_X,_L|_X,_L|_X,_L|_X,_L|_X,_L, /* 96-103 */ -_L,_L,_L,_L,_L,_L,_L,_L, /* 104-111 */ -_L,_L,_L,_L,_L,_L,_L,_L, /* 112-119 */ -_L,_L,_L,_P,_P,_P,_P,_C, /* 120-127 */ -0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 128-143 */ -0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 144-159 */ -_S|_SP,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P, /* 160-175 */ -_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P, /* 176-191 */ 
-_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U, /* 192-207 */ -_U,_U,_U,_U,_U,_U,_U,_P,_U,_U,_U,_U,_U,_U,_U,_L, /* 208-223 */ -_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L, /* 224-239 */ -_L,_L,_L,_L,_L,_L,_L,_P,_L,_L,_L,_L,_L,_L,_L,_L}; /* 240-255 */ - -size_t strnlen(const char * s, size_t count) -{ - const char *sc; - - for (sc = s; count-- && *sc != '\0'; ++sc) - /* nothing */; - return sc - s; -} - -unsigned long simple_strtoul(const char *cp,char **endp,unsigned int base) -{ - unsigned long result = 0,value; - - if (!base) { - base = 10; - if (*cp == '0') { - base = 8; - cp++; - if ((*cp == 'x') && isxdigit(cp[1])) { - cp++; - base = 16; - } - } - } - while (isxdigit(*cp) && - (value = isdigit(*cp) ? *cp-'0' : toupper(*cp)-'A'+10) < base) { - result = result*base + value; - cp++; - } - if (endp) - *endp = (char *)cp; - return result; -} - -long simple_strtol(const char *cp,char **endp,unsigned int base) -{ - if(*cp=='-') - return -simple_strtoul(cp+1,endp,base); - return simple_strtoul(cp,endp,base); -} - -static int skip_atoi(const char **s) -{ - int i=0; - - while (isdigit(**s)) - i = i*10 + *((*s)++) - '0'; - return i; -} - -#define ZEROPAD 1 /* pad with zero */ -#define SIGN 2 /* unsigned/signed long */ -#define PLUS 4 /* show plus */ -#define SPACE 8 /* space if plus */ -#define LEFT 16 /* left justified */ -#define SPECIAL 32 /* 0x */ -#define LARGE 64 /* use 'ABCDEF' instead of 'abcdef' */ - -static char * number(char * str, long long num, int base, int size, int precision, int type) -{ - char c,sign,tmp[66]; - const char *digits="0123456789abcdefghijklmnopqrstuvwxyz"; - int i; - - if (type & LARGE) - digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"; - if (type & LEFT) - type &= ~ZEROPAD; - if (base < 2 || base > 36) - return 0; - c = (type & ZEROPAD) ? 
'0' : ' '; - sign = 0; - if (type & SIGN) { - if (num < 0) { - sign = '-'; - num = -num; - size--; - } else if (type & PLUS) { - sign = '+'; - size--; - } else if (type & SPACE) { - sign = ' '; - size--; - } - } - if (type & SPECIAL) { - if (base == 16) - size -= 2; - else if (base == 8) - size--; - } - i = 0; - if (num == 0) - tmp[i++]='0'; - else while (num != 0) - tmp[i++] = digits[do_div(num,base)]; - if (i > precision) - precision = i; - size -= precision; - if (!(type&(ZEROPAD+LEFT))) - while(size-->0) - *str++ = ' '; - if (sign) - *str++ = sign; - if (type & SPECIAL) { - if (base==8) - *str++ = '0'; - else if (base==16) { - *str++ = '0'; - *str++ = digits[33]; - } - } - if (!(type & LEFT)) - while (size-- > 0) - *str++ = c; - while (i < precision--) - *str++ = '0'; - while (i-- > 0) - *str++ = tmp[i]; - while (size-- > 0) - *str++ = ' '; - return str; -} - -/* Forward decl. needed for IP address printing stuff... */ -int sprintf(char * buf, const char *fmt, ...); - -int vsprintf(char *buf, const char *fmt, va_list args) -{ - int len; - unsigned long long num; - int i, base; - char * str; - const char *s; - - int flags; /* flags to number() */ - - int field_width; /* width of output field */ - int precision; /* min. # of digits for integers; max - number of chars for from string */ - int qualifier; /* 'h', 'l', or 'L' for integer fields */ - /* 'z' support added 23/7/1999 S.H. 
*/ - /* 'z' changed to 'Z' --davidm 1/25/99 */ - - - for (str=buf ; *fmt ; ++fmt) { - if (*fmt != '%') { - *str++ = *fmt; - continue; - } - - /* process flags */ - flags = 0; - repeat: - ++fmt; /* this also skips first '%' */ - switch (*fmt) { - case '-': flags |= LEFT; goto repeat; - case '+': flags |= PLUS; goto repeat; - case ' ': flags |= SPACE; goto repeat; - case '#': flags |= SPECIAL; goto repeat; - case '0': flags |= ZEROPAD; goto repeat; - } - - /* get field width */ - field_width = -1; - if (isdigit(*fmt)) - field_width = skip_atoi(&fmt); - else if (*fmt == '*') { - ++fmt; - /* it's the next argument */ - field_width = va_arg(args, int); - if (field_width < 0) { - field_width = -field_width; - flags |= LEFT; - } - } - - /* get the precision */ - precision = -1; - if (*fmt == '.') { - ++fmt; - if (isdigit(*fmt)) - precision = skip_atoi(&fmt); - else if (*fmt == '*') { - ++fmt; - /* it's the next argument */ - precision = va_arg(args, int); - } - if (precision < 0) - precision = 0; - } - - /* get the conversion qualifier */ - qualifier = -1; - if (*fmt == 'h' || *fmt == 'l' || *fmt == 'L' || *fmt =='Z') { - qualifier = *fmt; - ++fmt; - } - - /* default base */ - base = 10; - - switch (*fmt) { - case 'c': - if (!(flags & LEFT)) - while (--field_width > 0) - *str++ = ' '; - *str++ = (unsigned char) va_arg(args, int); - while (--field_width > 0) - *str++ = ' '; - continue; - - case 's': - s = va_arg(args, char *); - if (!s) - s = ""; - - len = strnlen(s, precision); - - if (!(flags & LEFT)) - while (len < field_width--) - *str++ = ' '; - for (i = 0; i < len; ++i) - *str++ = *s++; - while (len < field_width--) - *str++ = ' '; - continue; - - case 'p': - if (field_width == -1) { - field_width = 2*sizeof(void *); - flags |= ZEROPAD; - } - str = number(str, - (unsigned long) va_arg(args, void *), 16, - field_width, precision, flags); - continue; - - - case 'n': - if (qualifier == 'l') { - long * ip = va_arg(args, long *); - *ip = (str - buf); - } else if 
(qualifier == 'Z') { - size_t * ip = va_arg(args, size_t *); - *ip = (str - buf); - } else { - int * ip = va_arg(args, int *); - *ip = (str - buf); - } - continue; - - case '%': - *str++ = '%'; - continue; - - /* integer number formats - set up the flags and "break" */ - case 'o': - base = 8; - break; - - case 'X': - flags |= LARGE; - case 'x': - base = 16; - break; - - case 'd': - case 'i': - flags |= SIGN; - case 'u': - break; - - default: - *str++ = '%'; - if (*fmt) - *str++ = *fmt; - else - --fmt; - continue; - } - if (qualifier == 'L') - num = va_arg(args, long long); - else if (qualifier == 'l') { - num = va_arg(args, unsigned long); - if (flags & SIGN) - num = (signed long) num; - } else if (qualifier == 'Z') { - num = va_arg(args, size_t); - } else if (qualifier == 'h') { - num = (unsigned short) va_arg(args, int); - if (flags & SIGN) - num = (signed short) num; - } else { - num = va_arg(args, unsigned int); - if (flags & SIGN) - num = (signed int) num; - } - str = number(str, num, base, field_width, precision, flags); - } - *str = '\0'; - return str-buf; -} - -int sprintf(char * buf, const char *fmt, ...) -{ - va_list args; - int i; - - va_start(args, fmt); - i=vsprintf(buf,fmt,args); - va_end(args); - return i; -} - -static char sprint_buf[1024]; - -void -printk(char *fmt, ...) -{ - va_list args; - int n; - - va_start(args, fmt); - n = vsprintf(sprint_buf, fmt, args); - va_end(args); - write(stdout, sprint_buf, n); -} - -int -printf(char *fmt, ...) 
-{ - va_list args; - int n; - - va_start(args, fmt); - n = vsprintf(sprint_buf, fmt, args); - va_end(args); - write(stdout, sprint_buf, n); - return n; -} From johnrose at austin.ibm.com Thu Apr 28 08:56:12 2005 From: johnrose at austin.ibm.com (John Rose) Date: Wed, 27 Apr 2005 17:56:12 -0500 Subject: [PATCH] enable CONFIG_RTAS_PROC by default In-Reply-To: <1113347022.16917.30.camel@sinatra.austin.ibm.com> References: <1113347022.16917.30.camel@sinatra.austin.ibm.com> Message-ID: <1114642572.6048.48.camel@sinatra.austin.ibm.com> Hi Paul- Is it possible to get this in for 2.6.12? This would save us some headaches, as librtas/DLPAR is pretty hamstrung if this isn't enabled. Thanks- John On Tue, 2005-04-12 at 18:03, John Rose wrote: > Hi- > > This patch enables CONFIG_RTAS_PROC by default on pSeries. This will > preserve /proc/ppc64/rtas/rmo_buffer, which is needed by librtas. > > Thanks- > John > > Signed-off-by: John Rose > > diff -puN arch/ppc64/Kconfig~fix_Kconfig arch/ppc64/Kconfig > --- 2_6_linus_3/arch/ppc64/Kconfig~fix_Kconfig 2005-04-12 18:03:45.000000000 -0500 > +++ 2_6_linus_3-johnrose/arch/ppc64/Kconfig 2005-04-12 18:03:56.000000000 -0500 > @@ -262,6 +262,7 @@ config PPC_RTAS > config RTAS_PROC > bool "Proc interface to RTAS" > depends on PPC_RTAS > + default y > > config RTAS_FLASH > tristate "Firmware flash interface" > > _ From benh at kernel.crashing.org Thu Apr 28 10:23:11 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 28 Apr 2005 10:23:11 +1000 Subject: [PATCH] ppc64: Fix return value of some vDSO calls Message-ID: <1114647791.7112.196.camel@gaston> Hi ! The ppc vDSO would not properly clear the return value for some calls, which will be a problem when interfacing those calls with glibc. This should be fixed before 2.6.12 is released (as it is the first kernel with the ppc vDSO) so that we don't have to play with symbol versioning and ugly workarounds. 
Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/ppc64/kernel/vdso32/cacheflush.S =================================================================== --- linux-work.orig/arch/ppc64/kernel/vdso32/cacheflush.S 2005-03-07 10:22:15.000000000 +1100 +++ linux-work/arch/ppc64/kernel/vdso32/cacheflush.S 2005-04-28 10:20:05.000000000 +1000 @@ -47,6 +47,7 @@ addi r6,r6,128 bdnz 1b isync + li r3,0 blr .cfi_endproc V_FUNCTION_END(__kernel_sync_dicache) @@ -59,6 +60,7 @@ .cfi_startproc sync isync + li r3,0 blr .cfi_endproc V_FUNCTION_END(__kernel_sync_dicache_p5) Index: linux-work/arch/ppc64/kernel/vdso64/cacheflush.S =================================================================== --- linux-work.orig/arch/ppc64/kernel/vdso64/cacheflush.S 2005-03-07 10:22:15.000000000 +1100 +++ linux-work/arch/ppc64/kernel/vdso64/cacheflush.S 2005-04-28 10:19:58.000000000 +1000 @@ -47,6 +47,7 @@ addi r6,r6,128 bdnz 1b isync + li r3,0 blr .cfi_endproc V_FUNCTION_END(__kernel_sync_dicache) @@ -59,6 +60,7 @@ .cfi_startproc sync isync + li r3,0 blr .cfi_endproc V_FUNCTION_END(__kernel_sync_dicache_p5) Index: linux-work/arch/ppc64/kernel/vdso32/gettimeofday.S =================================================================== --- linux-work.orig/arch/ppc64/kernel/vdso32/gettimeofday.S 2005-03-10 13:43:01.000000000 +1100 +++ linux-work/arch/ppc64/kernel/vdso32/gettimeofday.S 2005-04-28 10:18:48.000000000 +1000 @@ -58,6 +58,7 @@ stw r5,TZONE_TZ_DSTTIME(r11) 1: mtlr r12 + li r3,0 blr 2: mr r3,r10 From benh at kernel.crashing.org Thu Apr 28 11:33:59 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 28 Apr 2005 11:33:59 +1000 Subject: [PATCH] ppc64: update to use the new 4L headers Message-ID: <1114652039.7112.213.camel@gaston> Hi ! This patch converts ppc64 to use the generic pgtable-nopud.h instead of the "fixup" header. 
Signed-off-by: Benjamin Herrenschmidt Index: linux-work/include/asm-ppc64/pgalloc.h =================================================================== --- linux-work.orig/include/asm-ppc64/pgalloc.h 2005-04-28 10:58:13.000000000 +1000 +++ linux-work/include/asm-ppc64/pgalloc.h 2005-04-28 11:14:57.000000000 +1000 @@ -27,7 +27,7 @@ kmem_cache_free(zero_cache, pgd); } -#define pgd_populate(MM, PGD, PMD) pgd_set(PGD, PMD) +#define pud_populate(MM, PUD, PMD) pud_set(PUD, PMD) static inline pmd_t * pmd_alloc_one(struct mm_struct *mm, unsigned long addr) Index: linux-work/include/asm-ppc64/pgtable.h =================================================================== --- linux-work.orig/include/asm-ppc64/pgtable.h 2005-04-28 10:58:13.000000000 +1000 +++ linux-work/include/asm-ppc64/pgtable.h 2005-04-28 11:14:57.000000000 +1000 @@ -1,8 +1,6 @@ #ifndef _PPC64_PGTABLE_H #define _PPC64_PGTABLE_H -#include - /* * This file contains the functions and defines necessary to modify and use * the ppc64 hashed page table. 
@@ -17,6 +15,8 @@ #include #endif /* __ASSEMBLY__ */ +#include + /* PMD_SHIFT determines what a second-level page table entry can map */ #define PMD_SHIFT (PAGE_SHIFT + PAGE_SHIFT - 3) #define PMD_SIZE (1UL << PMD_SHIFT) @@ -228,12 +228,13 @@ #define pmd_page_kernel(pmd) \ (__bpn_to_ba(pmd_val(pmd) >> PMD_TO_PTEPAGE_SHIFT)) #define pmd_page(pmd) virt_to_page(pmd_page_kernel(pmd)) -#define pgd_set(pgdp, pmdp) (pgd_val(*(pgdp)) = (__ba_to_bpn(pmdp))) -#define pgd_none(pgd) (!pgd_val(pgd)) -#define pgd_bad(pgd) ((pgd_val(pgd)) == 0) -#define pgd_present(pgd) (pgd_val(pgd) != 0UL) -#define pgd_clear(pgdp) (pgd_val(*(pgdp)) = 0UL) -#define pgd_page(pgd) (__bpn_to_ba(pgd_val(pgd))) + +#define pud_set(pudp, pmdp) (pud_val(*(pudp)) = (__ba_to_bpn(pmdp))) +#define pud_none(pud) (!pud_val(pud)) +#define pud_bad(pud) ((pud_val(pud)) == 0UL) +#define pud_present(pud) (pud_val(pud) != 0UL) +#define pud_clear(pudp) (pud_val(*(pudp)) = 0UL) +#define pud_page(pud) (__bpn_to_ba(pud_val(pud))) /* * Find an entry in a page-table-directory. We combine the address region @@ -245,12 +246,13 @@ #define pgd_offset(mm, address) ((mm)->pgd + pgd_index(address)) /* Find an entry in the second-level page table.. */ -#define pmd_offset(dir,addr) \ - ((pmd_t *) pgd_page(*(dir)) + (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))) +#define pmd_offset(pudp,addr) \ + ((pmd_t *) pud_page(*(pudp)) + (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))) /* Find an entry in the third-level page table.. 
*/ #define pte_offset_kernel(dir,addr) \ - ((pte_t *) pmd_page_kernel(*(dir)) + (((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))) + ((pte_t *) pmd_page_kernel(*(dir)) \ + + (((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))) #define pte_offset_map(dir,addr) pte_offset_kernel((dir), (addr)) #define pte_offset_map_nested(dir,addr) pte_offset_kernel((dir), (addr)) @@ -582,20 +584,23 @@ static inline pte_t *find_linux_pte(pgd_t *pgdir, unsigned long ea) { pgd_t *pg; + pud_t *pu; pmd_t *pm; pte_t *pt = NULL; pte_t pte; pg = pgdir + pgd_index(ea); if (!pgd_none(*pg)) { - - pm = pmd_offset(pg, ea); - if (pmd_present(*pm)) { - pt = pte_offset_kernel(pm, ea); - pte = *pt; - if (!pte_present(pte)) - pt = NULL; - } + pu = pud_offset(pg, ea); + if (!pud_none(*pu)) { + pm = pmd_offset(pu, ea); + if (pmd_present(*pm)) { + pt = pte_offset_kernel(pm, ea); + pte = *pt; + if (!pte_present(pte)) + pt = NULL; + } + } } return pt; Index: linux-work/arch/ppc64/mm/init.c =================================================================== --- linux-work.orig/arch/ppc64/mm/init.c 2005-04-28 10:58:10.000000000 +1000 +++ linux-work/arch/ppc64/mm/init.c 2005-04-28 11:14:57.000000000 +1000 @@ -136,14 +136,78 @@ #else +static void unmap_im_area_pte(pmd_t *pmd, unsigned long addr, + unsigned long end) +{ + pte_t *pte; + + pte = pte_offset_kernel(pmd, addr); + do { + pte_t ptent = ptep_get_and_clear(&ioremap_mm, addr, pte); + WARN_ON(!pte_none(ptent) && !pte_present(ptent)); + } while (pte++, addr += PAGE_SIZE, addr != end); +} + +static inline void unmap_im_area_pmd(pud_t *pud, unsigned long addr, + unsigned long end) +{ + pmd_t *pmd; + unsigned long next; + + pmd = pmd_offset(pud, addr); + do { + next = pmd_addr_end(addr, end); + if (pmd_none_or_clear_bad(pmd)) + continue; + unmap_im_area_pte(pmd, addr, next); + } while (pmd++, addr = next, addr != end); +} + +static inline void unmap_im_area_pud(pgd_t *pgd, unsigned long addr, + unsigned long end) +{ + pud_t *pud; + unsigned long next; + + pud = 
pud_offset(pgd, addr); + do { + next = pud_addr_end(addr, end); + if (pud_none_or_clear_bad(pud)) + continue; + unmap_im_area_pmd(pud, addr, next); + } while (pud++, addr = next, addr != end); +} + +static void unmap_im_area(unsigned long addr, unsigned long end) +{ + struct mm_struct *mm = &ioremap_mm; + unsigned long next; + pgd_t *pgd; + + spin_lock(&mm->page_table_lock); + + pgd = pgd_offset_i(addr); + flush_cache_vunmap(addr, end); + do { + next = pgd_addr_end(addr, end); + if (pgd_none_or_clear_bad(pgd)) + continue; + unmap_im_area_pud(pgd, addr, next); + } while (pgd++, addr = next, addr != end); + flush_tlb_kernel_range(start, end); + + spin_unlock(&mm->page_table_lock); +} + /* * map_io_page currently only called by __ioremap * map_io_page adds an entry to the ioremap page table * and adds an entry to the HPT, possibly bolting it */ -static void map_io_page(unsigned long ea, unsigned long pa, int flags) +static int map_io_page(unsigned long ea, unsigned long pa, int flags) { pgd_t *pgdp; + pud_t *pudp; pmd_t *pmdp; pte_t *ptep; unsigned long vsid; @@ -151,9 +215,15 @@ if (mem_init_done) { spin_lock(&ioremap_mm.page_table_lock); pgdp = pgd_offset_i(ea); - pmdp = pmd_alloc(&ioremap_mm, pgdp, ea); + pudp = pud_alloc(&ioremap_mm, pgdp, ea); + if (!pudp) + return -ENOMEM; + pmdp = pmd_alloc(&ioremap_mm, pudp, ea); + if (!pmdp) + return -ENOMEM; ptep = pte_alloc_kernel(&ioremap_mm, pmdp, ea); - + if (!ptep) + return -ENOMEM; pa = abs_to_phys(pa); set_pte_at(&ioremap_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, __pgprot(flags))); @@ -181,6 +251,7 @@ panic("map_io_page: could not insert mapping"); } } + return 0; } @@ -194,9 +265,14 @@ flags |= pgprot_val(PAGE_KERNEL); for (i = 0; i < size; i += PAGE_SIZE) - map_io_page(ea+i, pa+i, flags); + if (map_io_page(ea+i, pa+i, flags)) + goto failure; return (void __iomem *) (ea + (addr & ~PAGE_MASK)); + failure: + if (mem_init_done) + unmap_im_area(ea, ea + size); + return NULL; } @@ -206,10 +282,11 @@ return __ioremap(addr, 
size, _PAGE_NO_CACHE | _PAGE_GUARDED); } -void __iomem * -__ioremap(unsigned long addr, unsigned long size, unsigned long flags) +void __iomem * __ioremap(unsigned long addr, unsigned long size, + unsigned long flags) { unsigned long pa, ea; + void __iomem *ret; /* * Choose an address to map it to. @@ -232,12 +309,16 @@ if (area == NULL) return NULL; ea = (unsigned long)(area->addr); + ret = __ioremap_com(addr, pa, ea, size, flags); + if (!ret) + im_free(area->addr); } else { ea = ioremap_bot; - ioremap_bot += size; + ret = __ioremap_com(addr, pa, ea, size, flags); + if (ret) + ioremap_bot += size; } - - return __ioremap_com(addr, pa, ea, size, flags); + return ret; } #define IS_PAGE_ALIGNED(_val) ((_val) == ((_val) & PAGE_MASK)) @@ -246,6 +327,7 @@ unsigned long size, unsigned long flags) { struct vm_struct *area; + void __iomem *ret; /* For now, require page-aligned values for pa, ea, and size */ if (!IS_PAGE_ALIGNED(pa) || !IS_PAGE_ALIGNED(ea) || @@ -276,7 +358,12 @@ } } - if (__ioremap_com(pa, pa, ea, size, flags) != (void *) ea) { + ret = __ioremap_com(pa, pa, ea, size, flags); + if (ret == NULL) { + printk(KERN_ERR "ioremap_explicit() allocation failure !\n"); + return 1; + } + if (ret != (void *) ea) { printk(KERN_ERR "__ioremap_com() returned unexpected addr\n"); return 1; } @@ -284,69 +371,6 @@ return 0; } -static void unmap_im_area_pte(pmd_t *pmd, unsigned long address, - unsigned long size) -{ - unsigned long base, end; - pte_t *pte; - - if (pmd_none(*pmd)) - return; - if (pmd_bad(*pmd)) { - pmd_ERROR(*pmd); - pmd_clear(pmd); - return; - } - - pte = pte_offset_kernel(pmd, address); - base = address & PMD_MASK; - address &= ~PMD_MASK; - end = address + size; - if (end > PMD_SIZE) - end = PMD_SIZE; - - do { - pte_t page; - page = ptep_get_and_clear(&ioremap_mm, base + address, pte); - address += PAGE_SIZE; - pte++; - if (pte_none(page)) - continue; - if (pte_present(page)) - continue; - printk(KERN_CRIT "Whee.. 
Swapped out page in kernel page" - " table\n"); - } while (address < end); -} - -static void unmap_im_area_pmd(pgd_t *dir, unsigned long address, - unsigned long size) -{ - unsigned long base, end; - pmd_t *pmd; - - if (pgd_none(*dir)) - return; - if (pgd_bad(*dir)) { - pgd_ERROR(*dir); - pgd_clear(dir); - return; - } - - pmd = pmd_offset(dir, address); - base = address & PGDIR_MASK; - address &= ~PGDIR_MASK; - end = address + size; - if (end > PGDIR_SIZE) - end = PGDIR_SIZE; - - do { - unmap_im_area_pte(pmd, base + address, end - address); - address = (address + PMD_SIZE) & PMD_MASK; - pmd++; - } while (address < end); -} - /* * Unmap an IO region and remove it from imalloc'd list. * Access to IO memory should be serialized by driver. @@ -356,39 +380,19 @@ */ void iounmap(volatile void __iomem *token) { - unsigned long address, start, end, size; - struct mm_struct *mm; - pgd_t *dir; + unsigned long address, size; void *addr; - if (!mem_init_done) { + if (!mem_init_done) return; - } addr = (void *) ((unsigned long __force) token & PAGE_MASK); - if ((size = im_free(addr)) == 0) { + if ((size = im_free(addr)) == 0) return; - } address = (unsigned long)addr; - start = address; - end = address + size; - - mm = &ioremap_mm; - spin_lock(&mm->page_table_lock); - - dir = pgd_offset_i(address); - flush_cache_vunmap(address, end); - do { - unmap_im_area_pmd(dir, address, end - address); - address = (address + PGDIR_SIZE) & PGDIR_MASK; - dir++; - } while (address && (address < end)); - flush_tlb_kernel_range(start, end); - - spin_unlock(&mm->page_table_lock); - return; + unmap_im_area(address, address + size); } static int iounmap_subset_regions(unsigned long addr, unsigned long size) Index: linux-work/arch/ppc64/mm/hugetlbpage.c =================================================================== --- linux-work.orig/arch/ppc64/mm/hugetlbpage.c 2005-04-28 10:58:10.000000000 +1000 +++ linux-work/arch/ppc64/mm/hugetlbpage.c 2005-04-28 11:14:57.000000000 +1000 @@ -42,7 +42,7 @@ 
return (addr & ~REGION_MASK) >> HUGEPGDIR_SHIFT; } -static pgd_t *hugepgd_offset(struct mm_struct *mm, unsigned long addr) +static pud_t *hugepgd_offset(struct mm_struct *mm, unsigned long addr) { int index; @@ -52,21 +52,21 @@ index = hugepgd_index(addr); BUG_ON(index >= PTRS_PER_HUGEPGD); - return mm->context.huge_pgdir + index; + return (pud_t *)(mm->context.huge_pgdir + index); } -static inline pte_t *hugepte_offset(pgd_t *dir, unsigned long addr) +static inline pte_t *hugepte_offset(pud_t *dir, unsigned long addr) { int index; - if (pgd_none(*dir)) + if (pud_none(*dir)) return NULL; index = (addr >> HPAGE_SHIFT) % PTRS_PER_HUGEPTE; - return (pte_t *)pgd_page(*dir) + index; + return (pte_t *)pud_page(*dir) + index; } -static pgd_t *hugepgd_alloc(struct mm_struct *mm, unsigned long addr) +static pud_t *hugepgd_alloc(struct mm_struct *mm, unsigned long addr) { BUG_ON(! in_hugepage_area(mm->context, addr)); @@ -90,10 +90,9 @@ return hugepgd_offset(mm, addr); } -static pte_t *hugepte_alloc(struct mm_struct *mm, pgd_t *dir, - unsigned long addr) +static pte_t *hugepte_alloc(struct mm_struct *mm, pud_t *dir, unsigned long addr) { - if (! pgd_present(*dir)) { + if (! pud_present(*dir)) { pte_t *new; spin_unlock(&mm->page_table_lock); @@ -104,7 +103,7 @@ * Because we dropped the lock, we should re-check the * entry, as somebody else could have populated it.. */ - if (pgd_present(*dir)) { + if (pud_present(*dir)) { if (new) kmem_cache_free(zero_cache, new); } else { @@ -115,7 +114,7 @@ ptepage = virt_to_page(new); ptepage->mapping = (void *) mm; ptepage->index = addr & HUGEPGDIR_MASK; - pgd_populate(mm, dir, new); + pud_populate(mm, dir, new); } } @@ -124,28 +123,28 @@ static pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) { - pgd_t *pgd; + pud_t *pud; BUG_ON(! in_hugepage_area(mm->context, addr)); - pgd = hugepgd_offset(mm, addr); - if (! pgd) + pud = hugepgd_offset(mm, addr); + if (! 
pud) return NULL; - return hugepte_offset(pgd, addr); + return hugepte_offset(pud, addr); } static pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr) { - pgd_t *pgd; + pud_t *pud; BUG_ON(! in_hugepage_area(mm->context, addr)); - pgd = hugepgd_alloc(mm, addr); - if (! pgd) + pud = hugepgd_alloc(mm, addr); + if (! pud) return NULL; - return hugepte_alloc(mm, pgd, addr); + return hugepte_alloc(mm, pud, addr); } static void set_huge_pte(struct mm_struct *mm, struct vm_area_struct *vma, @@ -709,10 +708,10 @@ /* cleanup any hugepte pages leftover */ for (i = 0; i < PTRS_PER_HUGEPGD; i++) { - pgd_t *pgd = pgdir + i; + pud_t *pud = (pud_t *)(pgdir + i); - if (! pgd_none(*pgd)) { - pte_t *pte = (pte_t *)pgd_page(*pgd); + if (! pud_none(*pud)) { + pte_t *pte = (pte_t *)pud_page(*pud); struct page *ptepage = virt_to_page(pte); ptepage->mapping = NULL; @@ -720,7 +719,7 @@ BUG_ON(memcmp(pte, empty_zero_page, PAGE_SIZE)); kmem_cache_free(zero_cache, pte); } - pgd_clear(pgd); + pud_clear(pud); } BUG_ON(memcmp(pgdir, empty_zero_page, PAGE_SIZE)); From nickpiggin at yahoo.com.au Thu Apr 28 11:49:36 2005 From: nickpiggin at yahoo.com.au (Nick Piggin) Date: Thu, 28 Apr 2005 11:49:36 +1000 Subject: [PATCH] ppc64: update to use the new 4L headers In-Reply-To: <1114652039.7112.213.camel@gaston> References: <1114652039.7112.213.camel@gaston> Message-ID: <42704130.9050005@yahoo.com.au> Benjamin Herrenschmidt wrote: > Hi ! > > This patch converts ppc64 to use the generic pgtable-nopud.h instead of > the "fixup" header. > Great! Nice work Ben. 
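Ben's conversion works because pgtable-nopud.h folds the PUD level into the PGD: pud_t simply wraps a pgd_t, pud_offset() hands back the PGD slot itself, and pgd_none() always reports the entry as present, so the real emptiness check moves to pud_none(). That is why the patched find_linux_pte() calls pgd_none() and then pud_none() on what is physically the same entry. The user-space sketch below imitates that shape under stated assumptions: the `toy_*` types and helpers, the 4K page / 512-entry geometry, and the static tables are all hypothetical. They are not the kernel's definitions and not ppc64's real layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy geometry (not ppc64's): 4K pages, 512-entry tables. */
#define TOY_PAGE_SHIFT 12
#define TOY_PTRS       512
#define TOY_PMD_SHIFT  (TOY_PAGE_SHIFT + 9)
#define TOY_PGD_SHIFT  (TOY_PMD_SHIFT + 9)
#define TOY_IDX(ea, shift) ((size_t)(((ea) >> (shift)) & (TOY_PTRS - 1)))

typedef struct { uint64_t pfn; } toy_pte_t;     /* pfn == 0 means "none" */
typedef struct { toy_pte_t *page; } toy_pmd_t;  /* points at a PTE page */
typedef struct { toy_pmd_t *page; } toy_pgd_t;  /* points at a PMD page */

/* Folded PUD: the type just wraps a PGD entry, mirroring pgtable-nopud.h. */
typedef struct { toy_pgd_t pgd; } toy_pud_t;

/* With the PUD folded, a PGD entry is never reported as "none"... */
static int toy_pgd_none(toy_pgd_t pgd) { (void)pgd; return 0; }
/* ...so the real emptiness test happens at the PUD level instead. */
static int toy_pud_none(toy_pud_t pud) { return pud.pgd.page == NULL; }

/* Consumes no address bits: it reinterprets the PGD slot as a PUD slot. */
static toy_pud_t *toy_pud_offset(toy_pgd_t *pgd, uint64_t ea)
{
    (void)ea;
    return (toy_pud_t *)pgd;
}

/* Same order of checks as the patched find_linux_pte(). */
static toy_pte_t *toy_find_pte(toy_pgd_t *pgdir, uint64_t ea)
{
    toy_pgd_t *pg = pgdir + TOY_IDX(ea, TOY_PGD_SHIFT);
    toy_pud_t *pu;
    toy_pmd_t *pm;
    toy_pte_t *pt;

    if (toy_pgd_none(*pg))
        return NULL;
    pu = toy_pud_offset(pg, ea);
    if (toy_pud_none(*pu))
        return NULL;
    pm = pu->pgd.page + TOY_IDX(ea, TOY_PMD_SHIFT);
    if (pm->page == NULL)
        return NULL;
    pt = pm->page + TOY_IDX(ea, TOY_PAGE_SHIFT);
    return pt->pfn != 0 ? pt : NULL;
}

/* Static backing store for a single mapped path (enough for the demo). */
static toy_pgd_t toy_pgdir[TOY_PTRS];
static toy_pmd_t toy_pmd_page[TOY_PTRS];
static toy_pte_t toy_pte_page[TOY_PTRS];

/* Populate the path for one address; every mapping shares these pages. */
static void toy_map_one(uint64_t ea, uint64_t pfn)
{
    toy_pgd_t *pg = toy_pgdir + TOY_IDX(ea, TOY_PGD_SHIFT);
    toy_pmd_t *pm;

    assert(pfn != 0);            /* 0 is reserved for "not present" */
    pg->page = toy_pmd_page;     /* the slot pud_populate() would fill */
    pm = toy_pmd_page + TOY_IDX(ea, TOY_PMD_SHIFT);
    pm->page = toy_pte_page;
    toy_pte_page[TOY_IDX(ea, TOY_PAGE_SHIFT)].pfn = pfn;
}
```

After toy_map_one(ea, 42), toy_find_pte(toy_pgdir, ea) returns the entry holding pfn 42, while the next page, or an address in a different PGD slot, yields NULL.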
> - > -static void unmap_im_area_pmd(pgd_t *dir, unsigned long address, > - unsigned long size) > -{ > - unsigned long base, end; > - pmd_t *pmd; > - > - if (pgd_none(*dir)) > - return; > - if (pgd_bad(*dir)) { > - pgd_ERROR(*dir); > - pgd_clear(dir); > - return; > - } > - > - pmd = pmd_offset(dir, address); > - base = address & PGDIR_MASK; > - address &= ~PGDIR_MASK; > - end = address + size; > - if (end > PGDIR_SIZE) > - end = PGDIR_SIZE; > - > - do { > - unmap_im_area_pte(pmd, base + address, end - address); > - address = (address + PMD_SIZE) & PMD_MASK; > - pmd++; > - } while (address < end); > -} Just a bit off-topic: I wonder how many more of these open coded pt walks exist in arch code (yes I see you've cleaned yours up - good). I guess they don't matter much? They will probably get converted when arches convert to the 4L headers. -- SUSE Labs, Novell Inc. From cfriesen at nortel.com Thu Apr 28 12:04:26 2005 From: cfriesen at nortel.com (Chris Friesen) Date: Wed, 27 Apr 2005 20:04:26 -0600 Subject: [PATCH] ppc64: update to use the new 4L headers In-Reply-To: <42704130.9050005@yahoo.com.au> References: <1114652039.7112.213.camel@gaston> <42704130.9050005@yahoo.com.au> Message-ID: <427044AA.5030402@nortel.com> Nick Piggin wrote: > Just a bit off-topic: I wonder how many more of these open > coded pt walks exist in arch code (yes I see you've cleaned > yours up - good). I know there's open coded walks outside the tree (I maintain one) due to there being no suitable function available from within it... I needed something like: pte_t *va_to_ptep_map(struct mm_struct *mm, unsigned int addr) There was code in follow_page() that did basically what I needed, but it was all contained within that function so I had to re-implement it.
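The "hand calculating" referred to here is the base/mask arithmetic in the removed unmap_im_area_pmd(). Ben's replacement helpers instead advance with pmd_addr_end(), pud_addr_end() and pgd_addr_end(), which return the next table boundary clamped to the end of the range. Below is a minimal user-space sketch of that loop. It assumes a hypothetical 8K block size in place of PMD_SIZE, and `toy_addr_end()` and `toy_split_range()` are illustrative names. The boundary test is modeled on the generic kernel macro's `boundary - 1 < end - 1` comparison, which keeps a range that ends at the very top of the address space (end wrapped to 0) working.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block size standing in for PMD_SIZE. */
#define TOY_BLOCK_SIZE ((uint64_t)0x2000)
#define TOY_BLOCK_MASK (~(TOY_BLOCK_SIZE - 1))

/* Next block boundary after addr, clamped to end. Subtracting 1 on both
 * sides makes end == 0 (a range ending at the top of the address space)
 * compare as the largest value instead of the smallest. */
static uint64_t toy_addr_end(uint64_t addr, uint64_t end)
{
    uint64_t boundary = (addr + TOY_BLOCK_SIZE) & TOY_BLOCK_MASK;

    return (boundary - 1 < end - 1) ? boundary : end;
}

/* Record each [addr, next) chunk and return the number of chunks, using
 * the same do { next = ...; } while (addr = next, addr != end) shape as
 * the new unmap_im_area_*() walkers. Only the first max chunks are stored. */
static size_t toy_split_range(uint64_t addr, uint64_t end,
                              uint64_t starts[], uint64_t ends[], size_t max)
{
    size_t n = 0;
    uint64_t next;

    assert(addr != end);  /* callers pass a non-empty range */
    do {
        next = toy_addr_end(addr, end);
        if (n < max) {
            starts[n] = addr;
            ends[n] = next;
        }
        n++;
    } while (addr = next, addr != end);
    return n;
}
```

For example, splitting 0x1000..0x5000 yields the chunks [0x1000,0x2000), [0x2000,0x4000) and [0x4000,0x5000), so only the first and last iterations handle a partial block and no per-level masking is needed in the caller.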
Chris From nickpiggin at yahoo.com.au Thu Apr 28 12:15:10 2005 From: nickpiggin at yahoo.com.au (Nick Piggin) Date: Thu, 28 Apr 2005 12:15:10 +1000 Subject: [PATCH] ppc64: update to use the new 4L headers In-Reply-To: <427044AA.5030402@nortel.com> References: <1114652039.7112.213.camel@gaston> <42704130.9050005@yahoo.com.au> <427044AA.5030402@nortel.com> Message-ID: <4270472E.9050708@yahoo.com.au> Chris Friesen wrote: > Nick Piggin wrote: > >> Just a bit off-topic: I wonder how many more of these open >> coded pt walks exist in arch code (yes I see you've cleaned >> yours up - good). > > > I know there's open coded walks outside the tree (I maintain one) due to > there being no suitable function available from within it... > Oh - I meant hand calculating the addresses rather than using the pmd_addr_end and friends... but: > I needed something like: > > pte_t *va_to_ptep_map(struct mm_struct *mm, unsigned int addr) > > There was code in follow_page() that did basically what I needed, but it > was all contained within that function so I had to re-implement it. > If you can break out exactly what you need, and make that inline or otherwise available via the correct header, I'm sure it would have a good chance of being merged. Keep in mind that you shouldn't introduce an inefficiency to follow_page; however, if that is not possible, you could simply duplicate what you need in a separate function in mm/memory.c and use that - better to do it once there than a lot of times in random places. -- SUSE Labs, Novell Inc.
From arnd at arndb.de Thu Apr 28 17:59:26 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Thu, 28 Apr 2005 09:59:26 +0200 Subject: [PATCH 3/4] ppc64: Add driver for BPA iommu In-Reply-To: <200504190318.32556.arnd@arndb.de> References: <200504190318.32556.arnd@arndb.de> Message-ID: <200504280813.j3S8DNLc019256@post.webmailer.de> Implementation of software load support for the BE iommu. This is very different from other iommu code on ppc64, since we only do a static mapping. The mapping is currently hardcoded; it should really be read from the firmware, but the firmware does not set up the device nodes yet. Software load is OK as long as all the DMA windows fit below 4GB. Currently, there is a single 512MB DMA window for PCI, USB and Ethernet at 0x20000000 for our RAM. No machine we have has more than 512 MB anyway, but we might want to change the firmware to use separate DMA windows for each device. Signed-off-by: Arnd Bergmann Index: linus-2.5/arch/ppc64/kernel/Makefile =================================================================== --- linus-2.5.orig/arch/ppc64/kernel/Makefile 2005-04-22 07:01:07.000000000 +0200 +++ linus-2.5/arch/ppc64/kernel/Makefile 2005-04-22 07:02:10.000000000 +0200 @@ -34,7 +34,8 @@ pSeries_nvram.o rtasd.o ras.o pSeries_reconfig.o \ pSeries_setup.o pSeries_iommu.o -obj-$(CONFIG_PPC_BPA) += bpa_setup.o bpa_nvram.o bpa_iic.o spider-pic.o +obj-$(CONFIG_PPC_BPA) += bpa_setup.o bpa_iommu.o bpa_nvram.o \ + bpa_iic.o spider-pic.o obj-$(CONFIG_EEH) += eeh.o obj-$(CONFIG_PROC_FS) += proc_ppc64.o Index: linus-2.5/arch/ppc64/kernel/bpa_iommu.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linus-2.5/arch/ppc64/kernel/bpa_iommu.c 2005-04-22 07:01:39.000000000 +0200 @@ -0,0 +1,433 @@ +/* + * IOMMU implementation for Broadband Processor Architecture + * We just establish a linear mapping at boot
by setting all the + * IOPT cache entries in the CPU. + * The mapping functions should be identical to pci_direct_iommu, + * except for the handling of the high order bit that is required + * by the Spider bridge. These should be split into a separate + * file at the point where we get a different bridge chip. + * + * Copyright (C) 2005 IBM Deutschland Entwicklung GmbH, + * Arnd Bergmann + * + * Based on linear mapping + * Copyright (C) 2003 Benjamin Herrenschmidt (benh at kernel.crashing.org) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#undef DEBUG + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "pci.h" +#include "bpa_iommu.h" + +#endif + +/* some constants */ +enum { + /* segment table entries */ + IOST_VALID_MASK = 0x8000000000000000ul, + IOST_TAG_MASK = 0x3000000000000000ul, + IOST_PT_BASE_MASK = 0x000003fffffff000ul, + IOST_NNPT_MASK = 0x0000000000000fe0ul, + IOST_PS_MASK = 0x000000000000000ful, + + IOST_PS_4K = 0x1, + IOST_PS_64K = 0x3, + IOST_PS_1M = 0x5, + IOST_PS_16M = 0x7, + + /* iopt tag register */ + IOPT_VALID_MASK = 0x0000000200000000ul, + IOPT_TAG_MASK = 0x00000001fffffffful, + + /* iopt cache register */ + IOPT_PROT_MASK = 0xc000000000000000ul, + IOPT_PROT_NONE = 0x0000000000000000ul, + IOPT_PROT_READ = 0x4000000000000000ul, + IOPT_PROT_WRITE = 0x8000000000000000ul, + IOPT_PROT_RW = 0xc000000000000000ul, + IOPT_COHERENT = 0x2000000000000000ul, + + IOPT_ORDER_MASK = 0x1800000000000000ul, + /* order access to same IOID/VC on same address */ + IOPT_ORDER_ADDR = 0x0800000000000000ul, + /* similar, but only after a write access */ + IOPT_ORDER_WRITES = 0x1000000000000000ul, + /* Order all accesses 
to same IOID/VC */ + IOPT_ORDER_VC = 0x1800000000000000ul, + + IOPT_RPN_MASK = 0x000003fffffff000ul, + IOPT_HINT_MASK = 0x0000000000000800ul, + IOPT_IOID_MASK = 0x00000000000007fful, + + IOSTO_ENABLE = 0x8000000000000000ul, + IOSTO_ORIGIN = 0x000003fffffff000ul, + IOSTO_HW = 0x0000000000000800ul, + IOSTO_SW = 0x0000000000000400ul, + + IOCMD_CONF_TE = 0x0000800000000000ul, + + /* memory mapped registers */ + IOC_PT_CACHE_DIR = 0x000, + IOC_ST_CACHE_DIR = 0x800, + IOC_PT_CACHE_REG = 0x910, + IOC_ST_ORIGIN = 0x918, + IOC_CONF = 0x930, + + /* The high bit needs to be set on every DMA address, + only 2GB are addressable */ + BPA_DMA_VALID = 0x80000000, + BPA_DMA_MASK = 0x7fffffff, +}; + +static inline unsigned long +get_iopt_entry(unsigned long real_address, unsigned long ioid, + unsigned long prot) +{ + return (prot & IOPT_PROT_MASK) + | (IOPT_COHERENT) + | (IOPT_ORDER_VC) + | (real_address & IOPT_RPN_MASK) + | (ioid & IOPT_IOID_MASK); +} + +typedef struct { + unsigned long val; +} ioste; + +/* cause link error for invalid use */ +extern unsigned long __ioc_invalid_page_size; + +static inline ioste +get_iost_entry(unsigned long iopt_base, unsigned long io_address, unsigned page_size) +{ + unsigned long ps; + unsigned long iostep; + unsigned long nnpt; + unsigned long shift; + + switch (page_size) { + case 0x1000000: + ps = IOST_PS_16M; + nnpt = 0; /* one page per segment */ + shift = 5; /* segment has 16 iopt entries */ + break; + + case 0x100000: + ps = IOST_PS_1M; + nnpt = 0; /* one page per segment */ + shift = 1; /* segment has 256 iopt entries */ + break; + + case 0x10000: + ps = IOST_PS_64K; + nnpt = 0x07; /* 8 pages per io page table */ + shift = 0; /* all entries are used */ + break; + + case 0x1000: + ps = IOST_PS_4K; + nnpt = 0x7f; /* 128 pages per io page table */ + shift = 0; /* all entries are used */ + break; + + default: /* not a known compile time constant */ + ps = __ioc_invalid_page_size; + break; + } + + iostep = iopt_base + + /* need 8 bytes per 
iopte */ + (((io_address / page_size * 8) + /* align io page tables on 4k page boundaries */ + << shift) + /* nnpt+1 pages go into each iopt */ + & ~(nnpt << 12)); + + nnpt++; /* XXX is this right? */ + return (ioste) { + .val = IOST_VALID_MASK + | (iostep & IOST_PT_BASE_MASK) + | ((nnpt << 5) & IOST_NNPT_MASK) + | (ps & IOST_PS_MASK) + }; +} + +/* compute the address of an io pte */ +static inline unsigned long +get_ioptep(ioste iost_entry, unsigned long io_address) +{ + unsigned long iopt_base; + unsigned long ps; + unsigned long iopt_offset; + + iopt_base = iost_entry.val & IOST_PT_BASE_MASK; + ps = iost_entry.val & IOST_PS_MASK; + + iopt_offset = ((io_address & 0x0fffffff) >> (7 + 2 * ps)) & 0x7fff8ul; + return iopt_base + iopt_offset; +} + +/* compute the tag field of the iopt cache entry */ +static inline unsigned long +get_ioc_tag(ioste iost_entry, unsigned long io_address) +{ + unsigned long iopte = get_ioptep(iost_entry, io_address); + + return IOPT_VALID_MASK + | ((iopte & 0x00000000000000ff8ul) >> 3) + | ((iopte & 0x0000003fffffc0000ul) >> 9); +} + +/* compute the hashed 6 bit index for the 4-way associative pte cache */ +static inline unsigned long +get_ioc_hash(ioste iost_entry, unsigned long io_address) +{ + unsigned long iopte = get_ioptep(iost_entry, io_address); + + return ((iopte & 0x000000000000001f8ul) >> 3) + ^ ((iopte & 0x00000000000020000ul) >> 17) + ^ ((iopte & 0x00000000000010000ul) >> 15) + ^ ((iopte & 0x00000000000008000ul) >> 13) + ^ ((iopte & 0x00000000000004000ul) >> 11) + ^ ((iopte & 0x00000000000002000ul) >> 9) + ^ ((iopte & 0x00000000000001000ul) >> 7); +} + +/* same as above, but pretend that we have a simpler 1-way associative + pte cache with an 8 bit index */ +static inline unsigned long +get_ioc_hash_1way(ioste iost_entry, unsigned long io_address) +{ + unsigned long iopte = get_ioptep(iost_entry, io_address); + + return ((iopte & 0x000000000000001f8ul) >> 3) + ^ ((iopte & 0x00000000000020000ul) >> 17) + ^ ((iopte & 
0x00000000000010000ul) >> 15) + ^ ((iopte & 0x00000000000008000ul) >> 13) + ^ ((iopte & 0x00000000000004000ul) >> 11) + ^ ((iopte & 0x00000000000002000ul) >> 9) + ^ ((iopte & 0x00000000000001000ul) >> 7) + ^ ((iopte & 0x0000000000000c000ul) >> 8); +} + +static inline ioste +get_iost_cache(void __iomem *base, unsigned long index) +{ + unsigned long __iomem *p = (base + IOC_ST_CACHE_DIR); + return (ioste) { in_be64(&p[index]) }; +} + +static inline void +set_iost_cache(void __iomem *base, unsigned long index, ioste ste) +{ + unsigned long __iomem *p = (base + IOC_ST_CACHE_DIR); + pr_debug("ioste %02lx was %016lx, store %016lx", index, + get_iost_cache(base, index).val, ste.val); + out_be64(&p[index], ste.val); + pr_debug(" now %016lx\n", get_iost_cache(base, index).val); +} + +static inline unsigned long +get_iopt_cache(void __iomem *base, unsigned long index, unsigned long *tag) +{ + unsigned long __iomem *tags = (void *)(base + IOC_PT_CACHE_DIR); + unsigned long __iomem *p = (void *)(base + IOC_PT_CACHE_REG); + + *tag = tags[index]; + rmb(); + return *p; +} + +static inline void +set_iopt_cache(void __iomem *base, unsigned long index, + unsigned long tag, unsigned long val) +{ + unsigned long __iomem *tags = base + IOC_PT_CACHE_DIR; + unsigned long __iomem *p = base + IOC_PT_CACHE_REG; + pr_debug("iopt %02lx was v%016lx/t%016lx, store v%016lx/t%016lx\n", + index, get_iopt_cache(base, index, &oldtag), oldtag, val, tag); + + out_be64(p, val); + out_be64(&tags[index], tag); +} + +static inline void +set_iost_origin(void __iomem *base) +{ + unsigned long __iomem *p = base + IOC_ST_ORIGIN; + unsigned long origin = IOSTO_ENABLE | IOSTO_SW; + + pr_debug("iost_origin %016lx, now %016lx\n", in_be64(p), origin); + out_be64(p, origin); +} + +static inline void +set_iocmd_config(void __iomem *base) +{ + unsigned long __iomem *p = base + 0xc00; + unsigned long conf; + + conf = in_be64(p); + pr_debug("iost_conf %016lx, now %016lx\n", conf, conf | IOCMD_CONF_TE); + out_be64(p, 
conf | IOCMD_CONF_TE); +} + +#ifdef __KERNEL__ + +/* FIXME: get these from the device tree */ +#define ioc_base 0x20000511000ull +#define ioc_mmio_base 0x20000510000ull +#define ioid 0x48a +#define iopt_phys_offset (- 0x20000000) /* We have a 512MB offset from the SB */ +#define io_page_size 0x1000000 + +static unsigned long map_iopt_entry(unsigned long address) +{ + switch (address >> 20) { + case 0x600: + address = 0x24020000000ull; /* spider i/o */ + break; +#if 0 + case 0xa00: + address = 0x2400000000ull; /* iic */ + break; +#endif + default: + address += iopt_phys_offset; + break; + } + + return get_iopt_entry(address, ioid, IOPT_PROT_RW); +} + +static void iommu_bus_setup_null(struct pci_bus *b) { } +static void iommu_dev_setup_null(struct pci_dev *d) { } + +/* initialize the iommu to support a simple linear mapping */ +static void bpa_map_iommu(void) +{ + unsigned long address; + void __iomem *base; + ioste ioste; + unsigned long index; + + base = __ioremap(ioc_base, 0x1000, _PAGE_NO_CACHE); + pr_debug("%lx mapped to %p\n", ioc_base, base); + set_iocmd_config(base); + iounmap(base); + + base = __ioremap(ioc_mmio_base, 0x1000, _PAGE_NO_CACHE); + pr_debug("%lx mapped to %p\n", ioc_mmio_base, base); + + set_iost_origin(base); + + for (address = 0; address < 0x100000000ul; address += io_page_size) { + ioste = get_iost_entry(0x10000000000ul, address, io_page_size); + if ((address & 0xfffffff) == 0) /* segment start */ + set_iost_cache(base, address >> 28, ioste); + index = get_ioc_hash_1way(ioste, address); + pr_debug("addr %08lx, index %02lx, ioste %016lx\n", + address, index, ioste.val); + set_iopt_cache(base, + get_ioc_hash_1way(ioste, address), + get_ioc_tag(ioste, address), + map_iopt_entry(address)); + } + iounmap(base); +} + + +static void *bpa_alloc_coherent(struct device *hwdev, size_t size, + dma_addr_t *dma_handle, unsigned int __nocast flag) +{ + void *ret; + + ret = (void *)__get_free_pages(flag, get_order(size)); + if (ret != NULL) { + memset(ret, 
0, size); + *dma_handle = virt_to_abs(ret) | BPA_DMA_VALID; + } + return ret; +} + +static void bpa_free_coherent(struct device *hwdev, size_t size, + void *vaddr, dma_addr_t dma_handle) +{ + free_pages((unsigned long)vaddr, get_order(size)); +} + +static dma_addr_t bpa_map_single(struct device *hwdev, void *ptr, + size_t size, enum dma_data_direction direction) +{ + return virt_to_abs(ptr) | BPA_DMA_VALID; +} + +static void bpa_unmap_single(struct device *hwdev, dma_addr_t dma_addr, + size_t size, enum dma_data_direction direction) +{ +} + +static int bpa_map_sg(struct device *hwdev, struct scatterlist *sg, + int nents, enum dma_data_direction direction) +{ + int i; + + for (i = 0; i < nents; i++, sg++) { + sg->dma_address = (page_to_phys(sg->page) + sg->offset) + | BPA_DMA_VALID; + sg->dma_length = sg->length; + } + + return nents; +} + +static void bpa_unmap_sg(struct device *hwdev, struct scatterlist *sg, + int nents, enum dma_data_direction direction) +{ +} + +static int bpa_dma_supported(struct device *dev, u64 mask) +{ + return mask < 0x100000000ull; +} + +void bpa_init_iommu(void) +{ + bpa_map_iommu(); + + /* Direct I/O, IOMMU off */ + ppc_md.iommu_dev_setup = iommu_dev_setup_null; + ppc_md.iommu_bus_setup = iommu_bus_setup_null; + + pci_dma_ops.alloc_coherent = bpa_alloc_coherent; + pci_dma_ops.free_coherent = bpa_free_coherent; + pci_dma_ops.map_single = bpa_map_single; + pci_dma_ops.unmap_single = bpa_unmap_single; + pci_dma_ops.map_sg = bpa_map_sg; + pci_dma_ops.unmap_sg = bpa_unmap_sg; + pci_dma_ops.dma_supported = bpa_dma_supported; +} Index: linus-2.5/arch/ppc64/kernel/bpa_iommu.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linus-2.5/arch/ppc64/kernel/bpa_iommu.h 2005-04-22 07:01:39.000000000 +0200 @@ -0,0 +1,6 @@ +#ifndef BPA_IOMMU_H +#define BPA_IOMMU_H + +void bpa_init_iommu(void); + +#endif Index: linus-2.5/arch/ppc64/kernel/bpa_setup.c 
=================================================================== --- linus-2.5.orig/arch/ppc64/kernel/bpa_setup.c 2005-04-22 06:59:58.000000000 +0200 +++ linus-2.5/arch/ppc64/kernel/bpa_setup.c 2005-04-22 07:01:39.000000000 +0200 @@ -46,6 +46,7 @@ #include "pci.h" #include "bpa_iic.h" +#include "bpa_iommu.h" #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -179,7 +180,7 @@ hpte_init_native(); - pci_direct_iommu_init(); + bpa_init_iommu(); ppc64_interrupt_controller = IC_BPA_IIC; From arnd at arndb.de Thu Apr 28 17:58:18 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Thu, 28 Apr 2005 09:58:18 +0200 Subject: [PATCH 2/4] ppc64: Add driver for BPA interrupt controllers In-Reply-To: <200504190318.32556.arnd@arndb.de> References: <200504190318.32556.arnd@arndb.de> Message-ID: <200504280813.j3S8DNLb019256@post.webmailer.de> Add support for the integrated interrupt controller on BPA CPUs. There is one of those for each SMT thread. The mapping of interrupt numbers to HW interrupt sources is described in arch/ppc64/kernel/bpa_iic.h. This version hardcodes the 'Spider' chip as the secondary interrupt controller. That is not really generic for the architecture, but at the moment it is the only secondary PIC that exists. A little more work will be needed on this as soon as we have boards with multiple external interrupt controllers. 
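The numbering scheme can be made concrete with a minimal user-space sketch that is not part of the patch. It decodes an IIC pending-register image into a Linux IRQ number. The constants are copied from the enum in bpa_iic.h, and the branches follow iic_get_irq() and iic_external_get_irq(). Three parts are assumptions for illustration: the helper name iic_decode(), the host-side struct copy, and the spider_nr parameter, which stands in for the value spider_get_irq() would read from the Spider TIR_CS register. One behaviour differs from the patch. iic_external_get_irq() adds the Spider result to the node offset even when spider_get_irq() returns -1, whereas this sketch returns -1 in that case.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the enum in bpa_iic.h. */
enum {
	IIC_EXT_OFFSET   = 0x00, /* start of south bridge IRQs */
	IIC_NUM_EXT      = 0x40, /* number of south bridge IRQs */
	IIC_SPE_OFFSET   = 0x40, /* start of SPE interrupts */
	IIC_CLASS_STRIDE = 0x10, /* SPE IRQs per class */
	IIC_IPI_OFFSET   = 0x70, /* start of IPI IRQs */
	IIC_NUM_IPIS     = 0x10, /* IRQs reserved for IPIs */
	IIC_NODE_STRIDE  = 0x80, /* total IRQs per node */
};

enum { IIC_VALID = 0x80, IIC_IPI = 0x40 };

/* Layout claims from the bpa_iic.h comment, checked at compile time. */
static_assert(IIC_SPE_OFFSET + 3 * IIC_CLASS_STRIDE == IIC_IPI_OFFSET,
	      "three SPE classes end where the IPI range starts");
static_assert(IIC_IPI_OFFSET + IIC_NUM_IPIS == IIC_NODE_STRIDE,
	      "IPIs fill the rest of a 128-IRQ node");
static_assert(4 * IIC_NODE_STRIDE == 512,
	      "512 IRQ lines leave room for four nodes");

/* Host-side copy of struct iic_pending_bits from bpa_iic.c. */
struct iic_pending_bits {
	uint32_t data;
	uint8_t flags;
	uint8_t class;
	uint8_t source; /* high nibble: node, low nibble: unit */
	uint8_t prio;
};

/*
 * Hypothetical helper: map one pending-register image to an IRQ number.
 * spider_nr stands in for the result of spider_get_irq() (-1 if none).
 */
static int iic_decode(struct iic_pending_bits p, int spider_nr)
{
	int node = p.source >> 4;
	int unit = p.source & 0xf;

	if (!(p.flags & IIC_VALID))
		return -1;
	if (p.flags & IIC_IPI)
		return IIC_IPI_OFFSET + (p.prio >> 4);

	switch (unit) {
	case 0x00:
	case 0x0b: /* units that may have the Spider PIC attached */
		if (p.prio > 0x3f || p.class != 2 ||
		    spider_nr < 0 || spider_nr >= IIC_NUM_EXT)
			return -1;
		return IIC_EXT_OFFSET + spider_nr + node * IIC_NODE_STRIDE;
	case 0x01: case 0x02: case 0x03: case 0x04:
	case 0x07: case 0x08: case 0x09: case 0x0a: /* SPEs */
		if (p.class > 2)
			return -1;
		return IIC_SPE_OFFSET + p.class * IIC_CLASS_STRIDE +
		       node * IIC_NODE_STRIDE + unit;
	default:
		return -1;
	}
}
```

Because of the per-node stride of 0x80, an SPE interrupt from node 1 lands 128 numbers above the same SPE interrupt from node 0. A class 2 interrupt from unit 1, for example, decodes to 0x61 on node 0 and 0xe1 on node 1.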
Signed-off-by: Arnd Bergmann Index: linus-2.5/arch/ppc64/Kconfig =================================================================== --- linus-2.5.orig/arch/ppc64/Kconfig 2005-04-22 06:59:52.000000000 +0200 +++ linus-2.5/arch/ppc64/Kconfig 2005-04-22 06:59:58.000000000 +0200 @@ -106,6 +106,21 @@ bool default y +config XICS + depends on PPC_PSERIES + bool + default y + +config MPIC + depends on PPC_PSERIES || PPC_PMAC || PPC_MAPLE + bool + default y + +config BPA_IIC + depends on PPC_BPA + bool + default y + # VMX is pSeries only for now until somebody writes the iSeries # exception vectors for it config ALTIVEC Index: linus-2.5/arch/ppc64/kernel/Makefile =================================================================== --- linus-2.5.orig/arch/ppc64/kernel/Makefile 2005-04-22 06:59:52.000000000 +0200 +++ linus-2.5/arch/ppc64/kernel/Makefile 2005-04-22 07:01:07.000000000 +0200 @@ -28,13 +28,13 @@ mf.o HvLpEvent.o iSeries_proc.o iSeries_htab.o \ iSeries_iommu.o -obj-$(CONFIG_PPC_MULTIPLATFORM) += nvram.o i8259.o prom_init.o prom.o mpic.o +obj-$(CONFIG_PPC_MULTIPLATFORM) += nvram.o i8259.o prom_init.o prom.o obj-$(CONFIG_PPC_PSERIES) += pSeries_pci.o pSeries_lpar.o pSeries_hvCall.o \ pSeries_nvram.o rtasd.o ras.o pSeries_reconfig.o \ - xics.o pSeries_setup.o pSeries_iommu.o + pSeries_setup.o pSeries_iommu.o -obj-$(CONFIG_PPC_BPA) += bpa_setup.o bpa_nvram.o +obj-$(CONFIG_PPC_BPA) += bpa_setup.o bpa_nvram.o bpa_iic.o spider-pic.o obj-$(CONFIG_EEH) += eeh.o obj-$(CONFIG_PROC_FS) += proc_ppc64.o @@ -50,6 +50,8 @@ obj-$(CONFIG_BOOTX_TEXT) += btext.o obj-$(CONFIG_HVCS) += hvcserver.o obj-$(CONFIG_IBMVIO) += vio.o +obj-$(CONFIG_XICS) += xics.o +obj-$(CONFIG_MPIC) += mpic.o obj-$(CONFIG_PPC_PMAC) += pmac_setup.o pmac_feature.o pmac_pci.o \ pmac_time.o pmac_nvram.o pmac_low_i2c.o Index: linus-2.5/arch/ppc64/kernel/bpa_iic.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ 
linus-2.5/arch/ppc64/kernel/bpa_iic.c 2005-04-22 06:59:58.000000000 +0200 @@ -0,0 +1,270 @@ +/* + * BPA Internal Interrupt Controller + * + * (C) Copyright IBM Deutschland Entwicklung GmbH 2005 + * + * Author: Arnd Bergmann + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#include "bpa_iic.h" + +struct iic_pending_bits { + u32 data; + u8 flags; + u8 class; + u8 source; + u8 prio; +}; + +enum iic_pending_flags { + IIC_VALID = 0x80, + IIC_IPI = 0x40, +}; + +struct iic_regs { + struct iic_pending_bits pending; + struct iic_pending_bits pending_destr; + u64 generate; + u64 prio; +}; + +struct iic { + struct iic_regs __iomem *regs; +}; + +static DEFINE_PER_CPU(struct iic, iic); + +void iic_local_enable(void) +{ + out_be64(&__get_cpu_var(iic).regs->prio, 0xff); +} + +void iic_local_disable(void) +{ + out_be64(&__get_cpu_var(iic).regs->prio, 0x0); +} + +static unsigned int iic_startup(unsigned int irq) +{ + return 0; +} + +static void iic_enable(unsigned int irq) +{ + iic_local_enable(); +} + +static void iic_disable(unsigned int irq) +{ +} + +static void iic_end(unsigned int irq) +{ + iic_local_enable(); +} + +static struct hw_interrupt_type iic_pic = { + .typename = " BPA-IIC ", + .startup = iic_startup, + .enable = iic_enable, + 
.disable = iic_disable, + .end = iic_end, +}; + +static int iic_external_get_irq(struct iic_pending_bits pending) +{ + int irq; + unsigned char node, unit; + + node = pending.source >> 4; + unit = pending.source & 0xf; + irq = -1; + + /* + * This mapping is specific to the Broadband + * Engine. We might need to get the numbers + * from the device tree to support future CPUs. + */ + switch (unit) { + case 0x00: + case 0x0b: + /* + * One of these units can be connected + * to an external interrupt controller. + */ + if (pending.prio > 0x3f || + pending.class != 2) + break; + irq = IIC_EXT_OFFSET + + spider_get_irq(pending.prio + node * IIC_NODE_STRIDE) + + node * IIC_NODE_STRIDE; + break; + case 0x01 ... 0x04: + case 0x07 ... 0x0a: + /* + * These units are connected to the SPEs + */ + if (pending.class > 2) + break; + irq = IIC_SPE_OFFSET + + pending.class * IIC_CLASS_STRIDE + + node * IIC_NODE_STRIDE + + unit; + break; + } + if (irq == -1) + printk(KERN_WARNING "Unexpected interrupt class %02x, " + "source %02x, prio %02x, cpu %02x\n", pending.class, + pending.source, pending.prio, smp_processor_id()); + return irq; +} + +/* Get an IRQ number from the pending state register of the IIC */ +int iic_get_irq(struct pt_regs *regs) +{ + struct iic *iic; + int irq; + struct iic_pending_bits pending; + + iic = &__get_cpu_var(iic); + *(unsigned long *) &pending = + in_be64((unsigned long __iomem *) &iic->regs->pending_destr); + + irq = -1; + if (pending.flags & IIC_VALID) { + if (pending.flags & IIC_IPI) { + irq = IIC_IPI_OFFSET + (pending.prio >> 4); +/* + if (irq > 0x80) + printk(KERN_WARNING "Unexpected IPI prio %02x" + "on CPU %02x\n", pending.prio, + smp_processor_id()); +*/ + } else { + irq = iic_external_get_irq(pending); + } + } + return irq; +} + +static struct iic_regs __iomem *find_iic(int cpu) +{ + struct device_node *np; + int nodeid = cpu / 2; + unsigned long regs; + struct iic_regs __iomem *iic_regs; + + for (np = of_find_node_by_type(NULL, "cpu"); + np; + np 
= of_find_node_by_type(np, "cpu")) { + if (nodeid == *(int *)get_property(np, "node-id", NULL)) + break; + } + + if (!np) { + printk(KERN_WARNING "IIC: CPU %d not found\n", cpu); + iic_regs = NULL; + } else { + regs = *(long *)get_property(np, "iic", NULL); + + /* hack until we have decided on the devtree info */ + regs += 0x400; + if (cpu & 1) + regs += 0x20; + + printk(KERN_DEBUG "IIC for CPU %d at %lx\n", cpu, regs); + iic_regs = __ioremap(regs, sizeof(struct iic_regs), + _PAGE_NO_CACHE); + } + return iic_regs; +} + +#ifdef CONFIG_SMP +void iic_setup_cpu(void) +{ + out_be64(&__get_cpu_var(iic).regs->prio, 0xff); +} + +void iic_cause_IPI(int cpu, int mesg) +{ + out_be64(&per_cpu(iic, cpu).regs->generate, mesg); +} + +static irqreturn_t iic_ipi_action(int irq, void *dev_id, struct pt_regs *regs) +{ + + smp_message_recv(irq - IIC_IPI_OFFSET, regs); + return IRQ_HANDLED; +} + +static void iic_request_ipi(int irq, const char *name) +{ + /* IPIs are marked SA_INTERRUPT as they must run with irqs + * disabled */ + get_irq_desc(irq)->handler = &iic_pic; + get_irq_desc(irq)->status |= IRQ_PER_CPU; + request_irq(irq, iic_ipi_action, SA_INTERRUPT, name, NULL); +} + +void iic_request_IPIs(void) +{ + iic_request_ipi(IIC_IPI_OFFSET + PPC_MSG_CALL_FUNCTION, "IPI-call"); + iic_request_ipi(IIC_IPI_OFFSET + PPC_MSG_RESCHEDULE, "IPI-resched"); +#ifdef CONFIG_DEBUGGER + iic_request_ipi(IIC_IPI_OFFSET + PPC_MSG_DEBUGGER_BREAK, "IPI-debug"); +#endif /* CONFIG_DEBUGGER */ +} +#endif /* CONFIG_SMP */ + +static void iic_setup_spe_handlers(void) +{ + int be, isrc; + + /* Assume two threads per BE are present */ + for (be=0; be < num_present_cpus() / 2; be++) { + for (isrc = 0; isrc < IIC_CLASS_STRIDE * 3; isrc++) { + int irq = IIC_NODE_STRIDE * be + IIC_SPE_OFFSET + isrc; + get_irq_desc(irq)->handler = &iic_pic; + } + } +} + +void iic_init_IRQ(void) +{ + int cpu, irq_offset; + struct iic *iic; + + irq_offset = 0; + for_each_cpu(cpu) { + iic = &per_cpu(iic, cpu); + iic->regs = 
find_iic(cpu); + if (iic->regs) + out_be64(&iic->regs->prio, 0xff); + } + iic_setup_spe_handlers(); +} Index: linus-2.5/arch/ppc64/kernel/bpa_iic.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linus-2.5/arch/ppc64/kernel/bpa_iic.h 2005-04-22 06:59:58.000000000 +0200 @@ -0,0 +1,62 @@ +#ifndef ASM_BPA_IIC_H +#define ASM_BPA_IIC_H +#ifdef __KERNEL__ +/* + * Mapping of IIC pending bits into per-node + * interrupt numbers. + * + * IRQ FF CC SS PP FF CC SS PP Description + * + * 00-3f 80 02 +0 00 - 80 02 +0 3f South Bridge + * 00-3f 80 02 +b 00 - 80 02 +b 3f South Bridge + * 41-4a 80 00 +1 ** - 80 00 +a ** SPU Class 0 + * 51-5a 80 01 +1 ** - 80 01 +a ** SPU Class 1 + * 61-6a 80 02 +1 ** - 80 02 +a ** SPU Class 2 + * 70-7f C0 ** ** 00 - C0 ** ** 0f IPI + * + * F flags + * C class + * S source + * P Priority + * + node number + * * don't care + * + * A node consists of a Broadband Engine and an optional + * south bridge device providing a maximum of 64 IRQs. + * The south bridge may be connected to either IOIF0 + * or IOIF1. + * Each SPE is represented as three IRQ lines, one per + * interrupt class. + * 16 IRQ numbers are reserved for inter processor + * interruptions, although these are only used in the + * range of the first node. + * + * This scheme needs 128 IRQ numbers per BIF node ID, + * which means that with the total of 512 lines + * available, we can have a maximum of four nodes. 
+ */ + +enum { + IIC_EXT_OFFSET = 0x00, /* Start of south bridge IRQs */ + IIC_NUM_EXT = 0x40, /* Number of south bridge IRQs */ + IIC_SPE_OFFSET = 0x40, /* Start of SPE interrupts */ + IIC_CLASS_STRIDE = 0x10, /* SPE IRQs per class */ + IIC_IPI_OFFSET = 0x70, /* Start of IPI IRQs */ + IIC_NUM_IPIS = 0x10, /* IRQs reserved for IPI */ + IIC_NODE_STRIDE = 0x80, /* Total IRQs per node */ +}; + +extern void iic_init_IRQ(void); +extern int iic_get_irq(struct pt_regs *regs); +extern void iic_cause_IPI(int cpu, int mesg); +extern void iic_request_IPIs(void); +extern void iic_setup_cpu(void); +extern void iic_local_enable(void); +extern void iic_local_disable(void); + + +extern void spider_init_IRQ(void); +extern int spider_get_irq(unsigned long int_pending); + +#endif +#endif /* ASM_BPA_IIC_H */ Index: linus-2.5/arch/ppc64/kernel/bpa_setup.c =================================================================== --- linus-2.5.orig/arch/ppc64/kernel/bpa_setup.c 2005-04-22 06:59:52.000000000 +0200 +++ linus-2.5/arch/ppc64/kernel/bpa_setup.c 2005-04-22 06:59:58.000000000 +0200 @@ -45,6 +45,7 @@ #include #include "pci.h" +#include "bpa_iic.h" #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -143,6 +144,9 @@ static void __init bpa_setup_arch(void) { + ppc_md.init_IRQ = iic_init_IRQ; + ppc_md.get_irq = iic_get_irq; + #ifdef CONFIG_SMP smp_init_pSeries(); #endif @@ -158,7 +162,7 @@ /* Find and initialize PCI host bridges */ init_pci_config_tokens(); find_and_init_phbs(); - + spider_init_IRQ(); #ifdef CONFIG_DUMMY_CONSOLE conswitchp = &dummy_con; #endif Index: linus-2.5/arch/ppc64/kernel/pSeries_smp.c =================================================================== --- linus-2.5.orig/arch/ppc64/kernel/pSeries_smp.c 2005-04-22 06:58:22.000000000 +0200 +++ linus-2.5/arch/ppc64/kernel/pSeries_smp.c 2005-04-22 06:59:58.000000000 +0200 @@ -1,5 +1,5 @@ /* - * SMP support for pSeries machines. + * SMP support for pSeries and BPA machines. 
* * Dave Engebretsen, Peter Bergner, and * Mike Corrigan {engebret|bergner|mikec}@us.ibm.com @@ -47,6 +47,7 @@ #include #include "mpic.h" +#include "bpa_iic.h" #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -286,6 +287,7 @@ return 1; } +#ifdef CONFIG_XICS static inline void smp_xics_do_message(int cpu, int msg) { set_bit(msg, &xics_ipi_message[cpu].value); @@ -334,6 +336,37 @@ rtas_set_indicator(GLOBAL_INTERRUPT_QUEUE, (1UL << interrupt_server_size) - 1 - default_distrib_server, 1); } +#endif /* CONFIG_XICS */ +#ifdef CONFIG_BPA_IIC +static void smp_iic_message_pass(int target, int msg) +{ + unsigned int i; + + if (target < NR_CPUS) { + iic_cause_IPI(target, msg); + } else { + for_each_online_cpu(i) { + if (target == MSG_ALL_BUT_SELF + && i == smp_processor_id()) + continue; + iic_cause_IPI(i, msg); + } + } +} + +static int __init smp_iic_probe(void) +{ + iic_request_IPIs(); + + return cpus_weight(cpu_possible_map); +} + +static void __devinit smp_iic_setup_cpu(int cpu) +{ + if (cpu != boot_cpuid) + iic_setup_cpu(); +} +#endif /* CONFIG_BPA_IIC */ static DEFINE_SPINLOCK(timebase_lock); static unsigned long timebase = 0; @@ -388,14 +421,15 @@ return 1; } - +#ifdef CONFIG_MPIC static struct smp_ops_t pSeries_mpic_smp_ops = { .message_pass = smp_mpic_message_pass, .probe = smp_mpic_probe, .kick_cpu = smp_pSeries_kick_cpu, .setup_cpu = smp_mpic_setup_cpu, }; - +#endif +#ifdef CONFIG_XICS static struct smp_ops_t pSeries_xics_smp_ops = { .message_pass = smp_xics_message_pass, .probe = smp_xics_probe, @@ -403,6 +437,16 @@ .setup_cpu = smp_xics_setup_cpu, .cpu_bootable = smp_pSeries_cpu_bootable, }; +#endif +#ifdef CONFIG_BPA_IIC +static struct smp_ops_t bpa_iic_smp_ops = { + .message_pass = smp_iic_message_pass, + .probe = smp_iic_probe, + .kick_cpu = smp_pSeries_kick_cpu, + .setup_cpu = smp_iic_setup_cpu, + .cpu_bootable = smp_pSeries_cpu_bootable, +}; +#endif /* This is called very early */ void __init smp_init_pSeries(void) @@ -411,10 +455,25 @@ DBG(" -> 
smp_init_pSeries()\n"); - if (ppc64_interrupt_controller == IC_OPEN_PIC) + switch (ppc64_interrupt_controller) { +#ifdef CONFIG_MPIC + case IC_OPEN_PIC: smp_ops = &pSeries_mpic_smp_ops; - else + break; +#endif +#ifdef CONFIG_XICS + case IC_PPC_XIC: smp_ops = &pSeries_xics_smp_ops; + break; +#endif +#ifdef CONFIG_BPA_IIC + case IC_BPA_IIC: + smp_ops = &bpa_iic_smp_ops; + break; +#endif + default: + panic("Invalid interrupt controller"); + } #ifdef CONFIG_HOTPLUG_CPU smp_ops->cpu_disable = pSeries_cpu_disable; Index: linus-2.5/arch/ppc64/kernel/smp.c =================================================================== --- linus-2.5.orig/arch/ppc64/kernel/smp.c 2005-04-22 06:58:22.000000000 +0200 +++ linus-2.5/arch/ppc64/kernel/smp.c 2005-04-22 06:59:58.000000000 +0200 @@ -71,7 +71,7 @@ int smt_enabled_at_boot = 1; -#ifdef CONFIG_PPC_MULTIPLATFORM +#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_PMAC) || defined(CONFIG_PPC_MAPLE) void smp_mpic_message_pass(int target, int msg) { /* make sure we're sending something that translates to an IPI */ Index: linus-2.5/arch/ppc64/kernel/spider-pic.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linus-2.5/arch/ppc64/kernel/spider-pic.c 2005-04-22 06:59:58.000000000 +0200 @@ -0,0 +1,191 @@ +/* + * External Interrupt Controller on Spider South Bridge + * + * (C) Copyright IBM Deutschland Entwicklung GmbH 2005 + * + * Author: Arnd Bergmann + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include +#include + +#include +#include +#include + +#include "bpa_iic.h" + +/* register layout taken from Spider spec, table 7.4-4 */ +enum { + TIR_DEN = 0x004, /* Detection Enable Register */ + TIR_MSK = 0x084, /* Mask Level Register */ + TIR_EDC = 0x0c0, /* Edge Detection Clear Register */ + TIR_PNDA = 0x100, /* Pending Register A */ + TIR_PNDB = 0x104, /* Pending Register B */ + TIR_CS = 0x144, /* Current Status Register */ + TIR_LCSA = 0x150, /* Level Current Status Register A */ + TIR_LCSB = 0x154, /* Level Current Status Register B */ + TIR_LCSC = 0x158, /* Level Current Status Register C */ + TIR_LCSD = 0x15c, /* Level Current Status Register D */ + TIR_CFGA = 0x200, /* Setting Register A0 */ + TIR_CFGB = 0x204, /* Setting Register B0 */ + /* 0x208 ... 0x3ff Setting Register An/Bn */ + TIR_PPNDA = 0x400, /* Packet Pending Register A */ + TIR_PPNDB = 0x404, /* Packet Pending Register B */ + TIR_PIERA = 0x408, /* Packet Output Error Register A */ + TIR_PIERB = 0x40c, /* Packet Output Error Register B */ + TIR_PIEN = 0x444, /* Packet Output Enable Register */ + TIR_PIPND = 0x454, /* Packet Output Pending Register */ + TIRDID = 0x484, /* Spider Device ID Register */ + REISTIM = 0x500, /* Reissue Command Timeout Time Setting */ + REISTIMEN = 0x504, /* Reissue Command Timeout Setting */ + REISWAITEN = 0x508, /* Reissue Wait Control*/ +}; + +static void __iomem *spider_pics[4]; + +static void __iomem *spider_get_pic(int irq) +{ + int node = irq / IIC_NODE_STRIDE; + irq %= IIC_NODE_STRIDE; + + if (irq >= IIC_EXT_OFFSET && + irq < IIC_EXT_OFFSET + IIC_NUM_EXT && + spider_pics) + return spider_pics[node]; + return NULL; +} + +static int spider_get_nr(unsigned int irq) +{ + return (irq % IIC_NODE_STRIDE) - IIC_EXT_OFFSET; +} + +static void __iomem 
*spider_get_irq_config(int irq) +{ + void __iomem *pic; + pic = spider_get_pic(irq); + return pic + TIR_CFGA + 8 * spider_get_nr(irq); +} + +static void spider_enable_irq(unsigned int irq) +{ + void __iomem *cfg = spider_get_irq_config(irq); + irq = spider_get_nr(irq); + + out_be32(cfg, in_be32(cfg) | 0x3107000eu); + out_be32(cfg + 4, in_be32(cfg + 4) | 0x00020000u | irq); +} + +static void spider_disable_irq(unsigned int irq) +{ + void __iomem *cfg = spider_get_irq_config(irq); + irq = spider_get_nr(irq); + + out_be32(cfg, in_be32(cfg) & ~0x30000000u); +} + +static unsigned int spider_startup_irq(unsigned int irq) +{ + spider_enable_irq(irq); + return 0; +} + +static void spider_shutdown_irq(unsigned int irq) +{ + spider_disable_irq(irq); +} + +static void spider_end_irq(unsigned int irq) +{ + spider_enable_irq(irq); +} + +static void spider_ack_irq(unsigned int irq) +{ + spider_disable_irq(irq); + iic_local_enable(); +} + +static struct hw_interrupt_type spider_pic = { + .typename = " SPIDER ", + .startup = spider_startup_irq, + .shutdown = spider_shutdown_irq, + .enable = spider_enable_irq, + .disable = spider_disable_irq, + .ack = spider_ack_irq, + .end = spider_end_irq, +}; + + +int spider_get_irq(unsigned long int_pending) +{ + void __iomem *regs = spider_get_pic(int_pending); + unsigned long cs; + int irq; + + cs = in_be32(regs + TIR_CS); + + irq = cs >> 24; + if (irq != 63) + return irq; + + return -1; +} + +void spider_init_IRQ(void) +{ + int node; + struct device_node *dn; + unsigned int *property; + long spiderpic; + int n; + +/* FIXME: detect multiple PICs as soon as the device tree has them */ + for (node = 0; node < 1; node++) { + dn = of_find_node_by_path("/"); + n = prom_n_addr_cells(dn); + property = (unsigned int *) get_property(dn, + "platform-spider-pic", NULL); + + if (!property) + continue; + for (spiderpic = 0; n > 0; --n) + spiderpic = (spiderpic << 32) + *property++; + printk(KERN_DEBUG "SPIDER addr: %lx\n", spiderpic); + spider_pics[node] 
= __ioremap(spiderpic, 0x800, _PAGE_NO_CACHE); + for (n = 0; n < IIC_NUM_EXT; n++) { + int irq = n + IIC_EXT_OFFSET + node * IIC_NODE_STRIDE; + get_irq_desc(irq)->handler = &spider_pic; + + /* do not mask any interrupts because of level */ + out_be32(spider_pics[node] + TIR_MSK, 0x0); + + /* disable edge detection clear */ + /* out_be32(spider_pics[node] + TIR_EDC, 0x0); */ + + /* enable interrupt packets to be output */ + out_be32(spider_pics[node] + TIR_PIEN, + in_be32(spider_pics[node] + TIR_PIEN) | 0x1); + + /* Enable the interrupt detection enable bit. Do this last! */ + out_be32(spider_pics[node] + TIR_DEN, + in_be32(spider_pics[node] +TIR_DEN) | 0x1); + + } + } +} From arnd at arndb.de Thu Apr 28 17:56:59 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Thu, 28 Apr 2005 09:56:59 +0200 Subject: [PATCH 1/4] ppc64: add BPA platform type In-Reply-To: <200504190318.32556.arnd@arndb.de> References: <200504190318.32556.arnd@arndb.de> Message-ID: <200504280813.j3S8DNLa019256@post.webmailer.de> This adds the basic support for running on BPA machines. It should be possible to configure a kernel for any combination of CONFIG_PPC_BPA with any of the other multiplatform targets. 
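Several platforms can be built into one kernel because setup.c keeps a NULL-terminated table of struct machdep_calls pointers. This patch adds &bpa_md to that table under #ifdef CONFIG_PPC_BPA, and each entry supplies a probe() hook such as bpa_probe(). The user-space sketch below models the selection, assuming a first-match walk over that table with the platform value reported by firmware. Several parts are placeholders for illustration, not the kernel's definitions: the trimmed-down struct, its name field, the select_machine() helper, pseries_probe(), and the numeric PLATFORM_* values.

```c
#include <stddef.h>

/* Placeholder platform IDs; the real PLATFORM_* values are in the ppc64 headers. */
enum { PLATFORM_PSERIES = 0x100, PLATFORM_BPA = 0x1000 };

#define CONFIG_PPC_BPA 1 /* pretend Kconfig selected the BPA platform */

/* Trimmed-down stand-in for struct machdep_calls: a label and the probe hook. */
struct machdep_calls {
	const char *name;
	int (*probe)(int platform);
};

/* Stand-in probe for another platform, for illustration only. */
static int pseries_probe(int platform)
{
	return platform == PLATFORM_PSERIES;
}

/* Same logic as bpa_probe() in the patch. */
static int bpa_probe(int platform)
{
	if (platform != PLATFORM_BPA)
		return 0;
	return 1;
}

static const struct machdep_calls pseries_md = { "pSeries", pseries_probe };
static const struct machdep_calls bpa_md = { "BPA", bpa_probe };

/* NULL-terminated table, shaped like machines[] in setup.c. */
static const struct machdep_calls *const machines[] = {
	&pseries_md,
#ifdef CONFIG_PPC_BPA
	&bpa_md,
#endif
	NULL
};

/* Hypothetical helper: return the first entry whose probe() accepts the platform. */
static const struct machdep_calls *select_machine(int platform)
{
	const struct machdep_calls *const *mach;

	for (mach = machines; *mach != NULL; mach++)
		if ((*mach)->probe(platform))
			return *mach;
	return NULL;
}
```

The table is built with #ifdef, so configuring without CONFIG_PPC_BPA removes bpa_md from the walk entirely. A BPA platform value would then match no entry.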
Signed-off-by: Arnd Bergmann Index: linus-2.5/MAINTAINERS =================================================================== --- linus-2.5.orig/MAINTAINERS 2005-04-22 06:57:27.000000000 +0200 +++ linus-2.5/MAINTAINERS 2005-04-22 06:57:59.000000000 +0200 @@ -493,6 +493,13 @@ W: http://sourceforge.net/projects/bonding/ S: Supported +BROADBAND PROCESSOR ARCHITECTURE +P: Arnd Bergmann +M: arnd at arndb.de +L: linuxppc64-dev at ozlabs.org +W: http://linuxppc64.org +S: Supported + BTTV VIDEO4LINUX DRIVER P: Gerd Knorr M: kraxel at bytesex.org Index: linus-2.5/arch/ppc64/Kconfig =================================================================== --- linus-2.5.orig/arch/ppc64/Kconfig 2005-04-22 06:57:28.000000000 +0200 +++ linus-2.5/arch/ppc64/Kconfig 2005-04-22 06:58:22.000000000 +0200 @@ -73,6 +73,10 @@ bool " IBM pSeries & new iSeries" default y +config PPC_BPA + bool " Broadband Processor Architecture" + depends on PPC_MULTIPLATFORM + config PPC_PMAC depends on PPC_MULTIPLATFORM bool " Apple G5 based machines" @@ -252,7 +256,7 @@ config PPC_RTAS bool - depends on PPC_PSERIES + depends on PPC_PSERIES || PPC_BPA default y config RTAS_PROC Index: linus-2.5/arch/ppc64/Makefile =================================================================== --- linus-2.5.orig/arch/ppc64/Makefile 2005-04-22 06:52:24.000000000 +0200 +++ linus-2.5/arch/ppc64/Makefile 2005-04-22 06:57:59.000000000 +0200 @@ -83,12 +83,14 @@ boottarget-$(CONFIG_PPC_PSERIES) := zImage zImage.initrd boottarget-$(CONFIG_PPC_MAPLE) := zImage zImage.initrd boottarget-$(CONFIG_PPC_ISERIES) := vmlinux.sminitrd vmlinux.initrd vmlinux.sm +boottarget-$(CONFIG_PPC_BPA) := zImage zImage.initrd $(boottarget-y): vmlinux $(Q)$(MAKE) $(build)=$(boot) $(boot)/$@ bootimage-$(CONFIG_PPC_PSERIES) := $(boot)/zImage bootimage-$(CONFIG_PPC_PMAC) := vmlinux bootimage-$(CONFIG_PPC_MAPLE) := $(boot)/zImage +bootimage-$(CONFIG_PPC_BPA) := zImage bootimage-$(CONFIG_PPC_ISERIES) := vmlinux BOOTIMAGE := $(bootimage-y) install: vmlinux 
Index: linus-2.5/arch/ppc64/kernel/Makefile =================================================================== --- linus-2.5.orig/arch/ppc64/kernel/Makefile 2005-04-22 06:57:31.000000000 +0200 +++ linus-2.5/arch/ppc64/kernel/Makefile 2005-04-22 06:59:10.000000000 +0200 @@ -34,6 +34,8 @@ pSeries_nvram.o rtasd.o ras.o pSeries_reconfig.o \ xics.o pSeries_setup.o pSeries_iommu.o +obj-$(CONFIG_PPC_BPA) += bpa_setup.o bpa_nvram.o + obj-$(CONFIG_EEH) += eeh.o obj-$(CONFIG_PROC_FS) += proc_ppc64.o obj-$(CONFIG_RTAS_FLASH) += rtas_flash.o @@ -60,6 +62,7 @@ obj-$(CONFIG_PPC_PMAC) += pmac_smp.o smp-tbsync.o obj-$(CONFIG_PPC_ISERIES) += iSeries_smp.o obj-$(CONFIG_PPC_PSERIES) += pSeries_smp.o +obj-$(CONFIG_PPC_BPA) += pSeries_smp.o obj-$(CONFIG_PPC_MAPLE) += smp-tbsync.o endif Index: linus-2.5/arch/ppc64/kernel/bpa_setup.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linus-2.5/arch/ppc64/kernel/bpa_setup.c 2005-04-22 06:58:22.000000000 +0200 @@ -0,0 +1,207 @@ +/* + * linux/arch/ppc/kernel/bpa_setup.c + * + * Copyright (C) 1995 Linus Torvalds + * Adapted from 'alpha' version by Gary Thomas + * Modified by Cort Dougan (cort at cs.nmt.edu) + * Modified by PPC64 Team, IBM Corp + * Modified by BPA Team, IBM Deutschland Entwicklung GmbH + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ +#undef DEBUG + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "pci.h" + +#ifdef DEBUG +#define DBG(fmt...) udbg_printf(fmt) +#else +#define DBG(fmt...) 
+#endif + +extern void pSeries_get_boot_time(struct rtc_time *rtc_time); +extern void pSeries_get_rtc_time(struct rtc_time *rtc_time); +extern int pSeries_set_rtc_time(struct rtc_time *rtc_time); + +extern unsigned long ppc_proc_freq; +extern unsigned long ppc_tb_freq; + +void bpa_get_cpuinfo(struct seq_file *m) +{ + struct device_node *root; + const char *model = ""; + + root = of_find_node_by_path("/"); + if (root) + model = get_property(root, "model", NULL); + seq_printf(m, "machine\t\t: BPA %s\n", model); + of_node_put(root); +} + +static void __init bpa_progress(char *s, unsigned short hex) +{ + printk("*** %04x : %s\n", hex, s ? s : ""); +} + +extern void setup_default_decr(void); + +/* Some sane defaults: 125 MHz timebase, 1GHz processor */ +#define DEFAULT_TB_FREQ 125000000UL +#define DEFAULT_PROC_FREQ (DEFAULT_TB_FREQ * 8) + +/* FIXME: consolidate this into rtas.c or similar */ +static void __init pSeries_calibrate_decr(void) +{ + struct device_node *cpu; + struct div_result divres; + unsigned int *fp; + int node_found; + + /* + * The cpu node should have a timebase-frequency property + * to tell us the rate at which the decrementer counts. 
+ */ + cpu = of_find_node_by_type(NULL, "cpu"); + + ppc_tb_freq = DEFAULT_TB_FREQ; /* hardcoded default */ + node_found = 0; + if (cpu != 0) { + fp = (unsigned int *)get_property(cpu, "timebase-frequency", + NULL); + if (fp != 0) { + node_found = 1; + ppc_tb_freq = *fp; + } + } + if (!node_found) + printk(KERN_ERR "WARNING: Estimating decrementer frequency " + "(not found)\n"); + + ppc_proc_freq = DEFAULT_PROC_FREQ; + node_found = 0; + if (cpu != 0) { + fp = (unsigned int *)get_property(cpu, "clock-frequency", + NULL); + if (fp != 0) { + node_found = 1; + ppc_proc_freq = *fp; + } + } + if (!node_found) + printk(KERN_ERR "WARNING: Estimating processor frequency " + "(not found)\n"); + + of_node_put(cpu); + + printk(KERN_INFO "time_init: decrementer frequency = %lu.%.6lu MHz\n", + ppc_tb_freq/1000000, ppc_tb_freq%1000000); + printk(KERN_INFO "time_init: processor frequency = %lu.%.6lu MHz\n", + ppc_proc_freq/1000000, ppc_proc_freq%1000000); + + tb_ticks_per_jiffy = ppc_tb_freq / HZ; + tb_ticks_per_sec = tb_ticks_per_jiffy * HZ; + tb_ticks_per_usec = ppc_tb_freq / 1000000; + tb_to_us = mulhwu_scale_factor(ppc_tb_freq, 1000000); + div128_by_32(1024*1024, 0, tb_ticks_per_sec, &divres); + tb_to_xs = divres.result_low; + + setup_default_decr(); +} + +static void __init bpa_setup_arch(void) +{ +#ifdef CONFIG_SMP + smp_init_pSeries(); +#endif + + /* init to some ~sane value until calibrate_delay() runs */ + loops_per_jiffy = 50000000; + + if (ROOT_DEV == 0) { + printk("No ramdisk, default root is /dev/hda2\n"); + ROOT_DEV = Root_HDA2; + } + + /* Find and initialize PCI host bridges */ + init_pci_config_tokens(); + find_and_init_phbs(); + +#ifdef CONFIG_DUMMY_CONSOLE + conswitchp = &dummy_con; +#endif + + // bpa_nvram_init(); +} + +/* + * Early initialization. 
Relocation is on but do not reference unbolted pages + */ +static void __init bpa_init_early(void) +{ + DBG(" -> bpa_init_early()\n"); + + hpte_init_native(); + + pci_direct_iommu_init(); + + ppc64_interrupt_controller = IC_BPA_IIC; + + DBG(" <- bpa_init_early()\n"); +} + + +static int __init bpa_probe(int platform) +{ + if (platform != PLATFORM_BPA) + return 0; + + return 1; +} + +struct machdep_calls __initdata bpa_md = { + .probe = bpa_probe, + .setup_arch = bpa_setup_arch, + .init_early = bpa_init_early, + .get_cpuinfo = bpa_get_cpuinfo, + .restart = rtas_restart, + .power_off = rtas_power_off, + .halt = rtas_halt, + .get_boot_time = pSeries_get_boot_time, + .get_rtc_time = pSeries_get_rtc_time, + .set_rtc_time = pSeries_set_rtc_time, + .calibrate_decr = pSeries_calibrate_decr, + .progress = bpa_progress, +}; Index: linus-2.5/arch/ppc64/kernel/cpu_setup_power4.S =================================================================== --- linus-2.5.orig/arch/ppc64/kernel/cpu_setup_power4.S 2005-04-22 06:52:24.000000000 +0200 +++ linus-2.5/arch/ppc64/kernel/cpu_setup_power4.S 2005-04-22 06:57:59.000000000 +0200 @@ -73,7 +73,21 @@ _GLOBAL(__setup_cpu_power4) blr - + +_GLOBAL(__setup_cpu_be) + /* Set large page sizes LP=0: 16MB, LP=1: 64KB */ + addi r3, 0, 0 + ori r3, r3, HID6_LB + sldi r3, r3, 32 + nor r3, r3, r3 + mfspr r4, SPRN_HID6 + and r4, r4, r3 + addi r3, 0, 0x02000 + sldi r3, r3, 32 + or r4, r4, r3 + mtspr SPRN_HID6, r4 + blr + _GLOBAL(__setup_cpu_ppc970) mfspr r0,SPRN_HID0 li r11,5 /* clear DOZE and SLEEP */ Index: linus-2.5/arch/ppc64/kernel/cputable.c =================================================================== --- linus-2.5.orig/arch/ppc64/kernel/cputable.c 2005-04-22 06:52:24.000000000 +0200 +++ linus-2.5/arch/ppc64/kernel/cputable.c 2005-04-22 06:57:59.000000000 +0200 @@ -34,6 +34,7 @@ extern void __setup_cpu_power3(unsigned long offset, struct cpu_spec* spec); extern void __setup_cpu_power4(unsigned long offset, struct cpu_spec* spec); extern void 
__setup_cpu_ppc970(unsigned long offset, struct cpu_spec* spec); +extern void __setup_cpu_be(unsigned long offset, struct cpu_spec* spec); /* We only set the altivec features if the kernel was compiled with altivec @@ -162,6 +163,16 @@ __setup_cpu_power4, COMMON_PPC64_FW }, + { /* BE DD1.x */ + 0xffff0000, 0x00700000, "Broadband Engine", + CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | + CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_ALTIVEC_COMP | + CPU_FTR_SMT, + COMMON_USER_PPC64 | PPC_FEATURE_HAS_ALTIVEC_COMP, + 128, 128, + __setup_cpu_be, + COMMON_PPC64_FW + }, { /* default match */ 0x00000000, 0x00000000, "POWER4 (compatible)", CPU_FTR_SPLIT_ID_CACHE | CPU_FTR_USE_TB | CPU_FTR_HPTE_TABLE | Index: linus-2.5/arch/ppc64/kernel/irq.c =================================================================== --- linus-2.5.orig/arch/ppc64/kernel/irq.c 2005-04-22 06:52:24.000000000 +0200 +++ linus-2.5/arch/ppc64/kernel/irq.c 2005-04-22 06:57:59.000000000 +0200 @@ -395,6 +395,9 @@ if (ppc64_interrupt_controller == IC_OPEN_PIC) return real_irq; /* no mapping for openpic (for now) */ + if (ppc64_interrupt_controller == IC_BPA_IIC) + return real_irq; /* no mapping for iic either */ + /* don't map interrupts < MIN_VIRT_IRQ */ if (real_irq < MIN_VIRT_IRQ) { virt_irq_to_real_map[real_irq] = real_irq; Index: linus-2.5/arch/ppc64/kernel/proc_ppc64.c =================================================================== --- linus-2.5.orig/arch/ppc64/kernel/proc_ppc64.c 2005-04-22 06:52:24.000000000 +0200 +++ linus-2.5/arch/ppc64/kernel/proc_ppc64.c 2005-04-22 06:57:59.000000000 +0200 @@ -53,7 +53,7 @@ if (!root) return 1; - if (!(systemcfg->platform & PLATFORM_PSERIES)) + if (!(systemcfg->platform & (PLATFORM_PSERIES | PLATFORM_BPA))) return 0; if (!proc_mkdir("rtas", root)) Index: linus-2.5/arch/ppc64/kernel/prom_init.c =================================================================== --- linus-2.5.orig/arch/ppc64/kernel/prom_init.c 2005-04-22 06:52:24.000000000 +0200 +++ 
linus-2.5/arch/ppc64/kernel/prom_init.c 2005-04-22 06:58:20.000000000 +0200 @@ -1731,9 +1731,9 @@ &getprop_rval, sizeof(getprop_rval)); /* - * On pSeries, copy the CPU hold code + * On pSeries and BPA, copy the CPU hold code */ - if (RELOC(of_platform) & PLATFORM_PSERIES) + if (RELOC(of_platform) & (PLATFORM_PSERIES | PLATFORM_BPA)) copy_and_flush(0, KERNELBASE - offset, 0x100, 0); /* Index: linus-2.5/arch/ppc64/kernel/setup.c =================================================================== --- linus-2.5.orig/arch/ppc64/kernel/setup.c 2005-04-22 06:52:24.000000000 +0200 +++ linus-2.5/arch/ppc64/kernel/setup.c 2005-04-22 06:57:59.000000000 +0200 @@ -348,6 +348,7 @@ extern struct machdep_calls pSeries_md; extern struct machdep_calls pmac_md; extern struct machdep_calls maple_md; +extern struct machdep_calls bpa_md; /* Ultimately, stuff them in an elf section like initcalls... */ static struct machdep_calls __initdata *machines[] = { @@ -360,6 +361,9 @@ #ifdef CONFIG_PPC_MAPLE &maple_md, #endif /* CONFIG_PPC_MAPLE */ +#ifdef CONFIG_PPC_BPA + &bpa_md, +#endif NULL }; Index: linus-2.5/arch/ppc64/kernel/traps.c =================================================================== --- linus-2.5.orig/arch/ppc64/kernel/traps.c 2005-04-22 06:57:28.000000000 +0200 +++ linus-2.5/arch/ppc64/kernel/traps.c 2005-04-22 06:57:59.000000000 +0200 @@ -126,6 +126,10 @@ printk("POWERMAC "); nl = 1; break; + case PLATFORM_BPA: + printk("BPA "); + nl = 1; + break; } if (nl) printk("\n"); Index: linus-2.5/include/asm-ppc64/mmu.h =================================================================== --- linus-2.5.orig/include/asm-ppc64/mmu.h 2005-04-22 06:52:24.000000000 +0200 +++ linus-2.5/include/asm-ppc64/mmu.h 2005-04-22 06:57:59.000000000 +0200 @@ -196,6 +196,8 @@ #define SLB_VSID_N 0x0000000000000200 /* no-execute */ #define SLB_VSID_L 0x0000000000000100 /* largepage (4M) */ #define SLB_VSID_C 0x0000000000000080 /* class */ +#define SLB_VSID_LS 0x0000000000000070 /* LS if 1, then second 
+ large page size */ #define SLB_VSID_KERNEL (SLB_VSID_KP|SLB_VSID_C) #define SLB_VSID_USER (SLB_VSID_KP|SLB_VSID_KS) Index: linus-2.5/include/asm-ppc64/processor.h =================================================================== --- linus-2.5.orig/include/asm-ppc64/processor.h 2005-04-22 06:57:29.000000000 +0200 +++ linus-2.5/include/asm-ppc64/processor.h 2005-04-22 06:57:59.000000000 +0200 @@ -217,14 +217,22 @@ #define HID0_ABE (1<<3) /* Address Broadcast Enable */ #define HID0_BHTE (1<<2) /* Branch History Table Enable */ #define HID0_BTCD (1<<1) /* Branch target cache disable */ +#define SPRN_HID6 0x3F9 /* Hardware Implementation Register 6 */ +#define HID6_LB (0x0F<<12) /* Concurrent Large Page Modes */ +#define HID6_DLP (1<<20) /* Disable all large page modes (4K only) */ #define SPRN_MSRDORM 0x3F1 /* Hardware Implementation Register 1 */ #define SPRN_HID1 0x3F1 /* Hardware Implementation Register 1 */ #define SPRN_IABR 0x3F2 /* Instruction Address Breakpoint Register */ #define SPRN_NIADORM 0x3F3 /* Hardware Implementation Register 2 */ #define SPRN_HID4 0x3F4 /* 970 HID4 */ #define SPRN_HID5 0x3F6 /* 970 HID5 */ -#define SPRN_TSC 0x3FD /* Thread switch control */ -#define SPRN_TST 0x3FC /* Thread switch timeout */ +#define SPRN_TSCR 0x399 /* Thread switch control on BE */ +#define SPRN_TTR 0x39A /* Thread switch timeout on BE */ +#define TSCR_DEC_ENABLE 0x200000 /* Decrementer Interrupt */ +#define TSCR_EE_ENABLE 0x100000 /* External Interrupt */ +#define TSCR_EE_BOOST 0x080000 /* External Interrupt Boost */ +#define SPRN_TSC 0x3FD /* Thread switch control on others */ +#define SPRN_TST 0x3FC /* Thread switch timeout on others */ #define SPRN_IAC1 0x3F4 /* Instruction Address Compare 1 */ #define SPRN_IAC2 0x3F5 /* Instruction Address Compare 2 */ #define SPRN_ICCR 0x3FB /* Instruction Cache Cacheability Register */ @@ -411,8 +419,9 @@ #define PV_POWER5 0x003A #define PV_POWER5p 0x003B #define PV_970FX 0x003C -#define PV_630 0x0040 -#define PV_630p 
0x0041 +#define PV_630 0x0040 +#define PV_630p 0x0041 +#define PV_BE 0x0070 /* Platforms supported by PPC64 */ #define PLATFORM_PSERIES 0x0100 @@ -421,6 +430,7 @@ #define PLATFORM_LPAR 0x0001 #define PLATFORM_POWERMAC 0x0400 #define PLATFORM_MAPLE 0x0500 +#define PLATFORM_BPA 0x1000 /* Compatibility with drivers coming from PPC32 world */ #define _machine (systemcfg->platform) @@ -432,6 +442,7 @@ #define IC_INVALID 0 #define IC_OPEN_PIC 1 #define IC_PPC_XIC 2 +#define IC_BPA_IIC 3 #define XGLUE(a,b) a##b #define GLUE(a,b) XGLUE(a,b) Index: linus-2.5/include/asm-ppc64/smp.h =================================================================== --- linus-2.5.orig/include/asm-ppc64/smp.h 2005-04-22 06:52:24.000000000 +0200 +++ linus-2.5/include/asm-ppc64/smp.h 2005-04-22 06:58:20.000000000 +0200 @@ -85,6 +85,14 @@ extern struct smp_ops_t *smp_ops; +#ifdef CONFIG_PPC_PSERIES +void vpa_init(int cpu); +#else +static inline void vpa_init(int cpu) +{ +} +#endif /* CONFIG_PPC_PSERIES */ + #endif /* __ASSEMBLY__ */ #endif /* !(_PPC64_SMP_H) */

From arnd at arndb.de Thu Apr 28 17:54:00 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Thu, 28 Apr 2005 09:54:00 +0200
Subject: [PATCH 0/4] ppc64: Introduce BPA platform
Message-ID: <200504190318.32556.arnd@arndb.de>

This series of patches adds support for a fifth platform type in the
ppc64 architecture tree. The Broadband Processor Architecture (BPA) is
currently used in a single machine from IBM, with others likely to be
added later.

I have already sent some preparation patches; this series needs to be
applied on top of them.

The first three patches add the actual platform code, which should be
usable for any BPA-compatible implementation.

The final patch introduces a new file system to make use of the SPUs
inside the processors. This patch is still at a prototype stage and is
not intended for merging yet.
Arnd <><

From arnd at arndb.de Thu Apr 28 18:00:30 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Thu, 28 Apr 2005 10:00:30 +0200
Subject: [PATCH 4/4] ppc64: Add SPU file system
In-Reply-To: <200504190318.32556.arnd@arndb.de>
References: <200504190318.32556.arnd@arndb.de>
Message-ID: <200504280813.j3S8DNLd019256@post.webmailer.de>

This is an early version of the SPU file system, which is used to run
code on the Synergistic Processing Units of the Broadband Engine.

The file system provides a namespace similar to POSIX shared memory or
message queues. Users that have write permissions on the file system
can create directories in the spufs root.

Every directory represents an SPU context. A context is currently
mapped to a physical SPU, but that is going to change to a
virtualization scheme in future updates.

An SPU context directory contains a predefined set of files used for
manipulating the state of the logical SPU. Users can change permissions
on those files, but cannot add or remove files without removing the
complete directory.

The current set of files is:

/mem
  The contents of the local store memory of the SPU. This can be
  accessed like a regular shared memory file and contains both code and
  data in the address space of the SPU. The file operations implemented
  so far are read(), write() and mmap(). We will need our own address
  space operations as soon as we allow the SPU context to be scheduled
  away from the physical SPU into page cache.

/run
  A stub file that lets us do ioctl. The only ioctl method we need is
  the spu_run() call. spu_run suspends the current thread on the host
  CPU and transfers the flow of execution to the SPU. The ioctl call
  returns to the calling thread when a state is entered that cannot be
  handled by the kernel, e.g. an error in the SPU code or an exit()
  from it. When a signal is pending for the host CPU thread, the ioctl
  is interrupted and the SPU is stopped in order to call the signal
  handler.
/mbox
  The first SPU-to-CPU communication mailbox. This file is read-only
  and can be read in units of 32 bits. The file can only be used in
  non-blocking mode; even poll() will not block on it. When no data is
  available in the mailbox, read() returns EAGAIN.

/ibox
  The second SPU-to-CPU communication mailbox. This file is similar to
  the first mailbox file, but can be read in blocking I/O mode, and the
  poll family of system calls can be used to wait for it.

/wbox
  The CPU-to-SPU communication mailbox. It is write-only and can be
  written in units of 32 bits. If the mailbox is full, write() will
  block and poll can be used to wait for it to become empty again.

Other files are planned but are currently either not implemented or
not functional.

Signed-off-by: Arnd Bergmann
--- linux-2.6-ppc.orig/arch/ppc64/kernel/Makefile 2005-04-01 12:52:36.202930304 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/Makefile 2005-04-01 12:53:32.093954472 -0500 @@ -53,6 +53,7 @@ obj-$(CONFIG_HVCS) += hvcserver.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_XICS) += xics.o obj-$(CONFIG_MPIC) += mpic.o +obj-$(CONFIG_SPU_FS) += spu_base.o obj-$(CONFIG_PPC_PMAC) += pmac_setup.o pmac_feature.o pmac_pci.o \ pmac_time.o pmac_nvram.o pmac_low_i2c.o --- linux-2.6-ppc.orig/arch/ppc64/kernel/spu_base.c 1969-12-31 19:00:00.000000000 -0500 +++ linux-2.6-ppc/arch/ppc64/kernel/spu_base.c 2005-04-01 12:53:32.095954168 -0500 @@ -0,0 +1,573 @@ +/* + * Low-level SPU handling + * + * (C) Copyright IBM Deutschland Entwicklung GmbH 2005 + * + * Author: Arnd Bergmann + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#define DEBUG 1 + +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include "bpa_iic.h" + +static int __spu_trap_invalid_dma(struct spu *spu) +{ + pr_debug("%s\n", __FUNCTION__); + force_sig(SIGBUS, /* info, */ spu->task); + return 0; +} + +static int __spu_trap_dma_align(struct spu *spu) +{ + pr_debug("%s\n", __FUNCTION__); + force_sig(SIGBUS, /* info, */ spu->task); + return 0; +} + +static int __spu_trap_error(struct spu *spu) +{ + pr_debug("%s\n", __FUNCTION__); + force_sig(SIGILL, /* info, */ spu->task); + return 0; +} + +static int __spu_trap_data_seg(struct spu *spu, unsigned long ea) +{ + struct spu_priv2 __iomem *priv2; + struct mm_struct *mm; + + pr_debug("%s\n", __FUNCTION__); + + if (REGION_ID(ea) != USER_REGION_ID) { + printk("invalid region access at %016lx\n", ea); + return 1; + } + + priv2 = spu->priv2; + mm = spu->mm; + + if (spu->slb_replace >= 8) + spu->slb_replace = 0; + + out_be64(&priv2->slb_index_W, spu->slb_replace); + out_be64(&priv2->slb_vsid_RW, + (get_vsid(mm->context.id, ea) << SLB_VSID_SHIFT) + | SLB_VSID_USER); + out_be64(&priv2->slb_esid_RW, (ea & ESID_MASK) | SLB_ESID_V); + out_be64(&priv2->mfc_control_RW, MFC_CNTL_RESTART_DMA_COMMAND); + + printk("set slb %d context %lx, ea %016lx, vsid %016lx, esid %016lx\n", + spu->slb_replace, mm->context.id, ea, + (get_vsid(mm->context.id, ea) << SLB_VSID_SHIFT)| SLB_VSID_USER, + (ea & ESID_MASK) | SLB_ESID_V); + return 0; +} + +static int __spu_trap_data_map(struct spu *spu, unsigned long ea) +{ + unsigned long dsisr; + struct spu_priv1 __iomem *priv1; + + pr_debug("%s\n", __FUNCTION__); + priv1 = spu->priv1; + dsisr = in_be64(&priv1->mfc_dsisr_RW); + + if (dsisr & 
MFC_DSISR_PTE_NOT_FOUND) { + printk("pte lookup ea %016lx, dsisr %lx\n", ea, dsisr); + wake_up(&spu->stop_wq); + } else { + printk("unexpected data fault ea %016lx, dsisr %lx\n", ea, dsisr); + } + + return 0; +} + +static int __spu_trap_mailbox(struct spu *spu) +{ + pr_debug("%s\n", __FUNCTION__); + wake_up(&spu->mbox_wq); + return 0; +} + +static int __spu_trap_stop(struct spu *spu) +{ + pr_debug("%s\n", __FUNCTION__); + spu->stop_code = in_be32(&spu->problem->spu_status_R); + wake_up(&spu->stop_wq); + return 0; +} + +static int __spu_trap_halt(struct spu *spu) +{ + pr_debug("%s\n", __FUNCTION__); + spu->stop_code = in_be32(&spu->problem->spu_status_R); + wake_up(&spu->stop_wq); + return 0; +} + +static int __spu_trap_tag_group(struct spu *spu) +{ + pr_debug("%s\n", __FUNCTION__); + /* wake_up(&spu->dma_wq); */ + return 0; +} + +static irqreturn_t +spu_irq_class_0(int irq, void *data, struct pt_regs *regs) +{ + struct spu *spu; + unsigned long stat; + + spu = data; + stat = in_be64(&spu->priv1->int_stat_class0_RW); + + if (stat & 1) /* invalid MFC DMA */ + __spu_trap_invalid_dma(spu); + + if (stat & 2) /* invalid DMA alignment */ + __spu_trap_dma_align(spu); + + if (stat & 4) /* error on SPU */ + __spu_trap_error(spu); + + out_be64(&spu->priv1->int_stat_class0_RW, stat); + return stat ? IRQ_HANDLED : IRQ_NONE; +} + +static irqreturn_t +spu_irq_class_1(int irq, void *data, struct pt_regs *regs) +{ + struct spu *spu; + unsigned long stat, dar; + + spu = data; + stat = in_be64(&spu->priv1->int_stat_class1_RW); + dar = in_be64(&spu->priv1->mfc_dar_RW); + + if (stat & 1) /* segment fault */ + __spu_trap_data_seg(spu, dar); + + if (stat & 2) { /* mapping fault */ + __spu_trap_data_map(spu, dar); + } + + if (stat & 4) /* ls compare & suspend on get */ + ; + + if (stat & 8) /* ls compare & suspend on put */ + ; + + out_be64(&spu->priv1->int_stat_class1_RW, stat); + return stat ?
IRQ_HANDLED : IRQ_NONE; +} + +static irqreturn_t +spu_irq_class_2(int irq, void *data, struct pt_regs *regs) +{ + struct spu *spu; + unsigned long stat; + + spu = data; + stat = in_be64(&spu->priv1->int_stat_class2_RW); + + if (stat & 1) /* mailbox */ + __spu_trap_mailbox(spu); + + if (stat & 2) /* SPU stop-and-signal */ + __spu_trap_stop(spu); + + if (stat & 4) /* SPU halted */ + __spu_trap_halt(spu); + + if (stat & 8) /* DMA tag group complete */ + __spu_trap_tag_group(spu); + + out_be64(&spu->priv1->int_stat_class2_RW, stat); + return stat ? IRQ_HANDLED : IRQ_NONE; +} + +static int +spu_request_irqs(struct spu *spu) +{ + int ret; + int irq_base; + + irq_base = IIC_NODE_STRIDE * spu->node + IIC_SPE_OFFSET; + + ret = request_irq(irq_base + spu->isrc, + spu_irq_class_0, 0, "spe_class0", spu); + if (ret) + goto out; + out_be64(&spu->priv1->int_mask_class0_RW, 0x7); + + ret = request_irq(irq_base + IIC_CLASS_STRIDE + spu->isrc, + spu_irq_class_1, 0, "spe_class1", spu); + if (ret) + goto out1; + out_be64(&spu->priv1->int_mask_class1_RW, 0x3); + + ret = request_irq(irq_base + 2*IIC_CLASS_STRIDE + spu->isrc, + spu_irq_class_2, 0, "spe_class2", spu); + if (ret) + goto out2; + out_be64(&spu->priv1->int_mask_class2_RW, 0xf); + goto out; + +out2: + free_irq(irq_base + IIC_CLASS_STRIDE + spu->isrc, spu); +out1: + free_irq(irq_base + spu->isrc, spu); +out: + return ret; +} + +static void +spu_free_irqs(struct spu *spu) +{ + int irq_base; + + irq_base = IIC_NODE_STRIDE * spu->node + IIC_SPE_OFFSET; + + free_irq(irq_base + spu->isrc, spu); + free_irq(irq_base + IIC_CLASS_STRIDE + spu->isrc, spu); + free_irq(irq_base + 2*IIC_CLASS_STRIDE + spu->isrc, spu); +} + +static LIST_HEAD(spu_list); +static DECLARE_MUTEX(spu_mutex); + +struct spu *spu_alloc(void) +{ + struct spu *spu; + + down(&spu_mutex); + if (!list_empty(&spu_list)) { + spu = list_entry(spu_list.next, struct spu, list); + list_del_init(&spu->list); + printk("Got SPU %x\n", spu->isrc); + } else { + printk("No SPU 
left\n"); + spu = NULL; + } + up(&spu_mutex); + return spu; +} +EXPORT_SYMBOL(spu_alloc); + +void spu_free(struct spu *spu) +{ + down(&spu_mutex); + list_add_tail(&spu->list, &spu_list); + up(&spu_mutex); +} +EXPORT_SYMBOL(spu_free); + +extern int hash_page(unsigned long ea, unsigned long access, unsigned long trap); //XXX +static int spu_handle_pte_fault(struct spu *spu) +{ + struct spu_problem __iomem *prob; + struct spu_priv1 __iomem *priv1; + struct spu_priv2 __iomem *priv2; + unsigned long ea, access, is_write; + struct mm_struct *mm; + struct vm_area_struct *vma; + int ret; + + printk("%s\n", __FUNCTION__); + prob = spu->problem; + priv1 = spu->priv1; + priv2 = spu->priv2; + + ea = in_be64(&priv1->mfc_dar_RW); + access = _PAGE_PRESENT | _PAGE_USER; + is_write = in_be64(&priv1->mfc_dsisr_RW) & 0x02000000; + mm = spu->mm; + + ret = hash_page(ea, access, 0x300); + if (ret < 0) { + printk("error in hash_page!\n"); + ret = -EFAULT; + goto out_err; + } + + printk("current %ld, spu %ld, ea %ld\n", current->mm->context.id, mm->context.id, ea); + if (!ret) { + printk("hash inserted, vsid %lx\n", get_vsid(current->mm->context.id, ea)); + goto out_restart; + } + + ret = -EFAULT; + if (ea >= TASK_SIZE) + goto out_err; + + down_read(&mm->mmap_sem); + vma = find_vma(mm, ea); + if (!vma) + goto out; + + if (is_write) { + if (!(vma->vm_flags & VM_WRITE)) + goto out; + } + + ret = 0; +/* FIXME add missing code from do_page_fault */ + switch (handle_mm_fault(mm, vma, ea, is_write)) { + case VM_FAULT_MINOR: + printk("minor\n"); + current->min_flt++; + break; + case VM_FAULT_MAJOR: + printk("major\n"); + current->maj_flt++; + break; + case VM_FAULT_SIGBUS: + ret = -EFAULT; + break; + case VM_FAULT_OOM: + ret = -ENOMEM; + break; + default: + BUG(); + } +out: + up_read(&mm->mmap_sem); + if (ret) + goto out_err; +out_restart: + out_be64(&priv2->mfc_control_RW, MFC_CNTL_RESTART_DMA_COMMAND); +out_err: + printk("%s: returning %d\n", __FUNCTION__, ret); + return ret; +} + +int 
spu_run(struct spu *spu) +{ + struct spu_problem __iomem *prob; + struct spu_priv1 __iomem *priv1; + struct spu_priv2 __iomem *priv2; + unsigned long status; + int count = 10; + int ret; + + prob = spu->problem; + priv1 = spu->priv1; + priv2 = spu->priv2; + spu->mm = current->mm; + spu->task = current; + out_be32(&prob->spu_runcntl_RW, SPU_RUNCNTL_RUNNABLE); + + do { + ret = wait_event_interruptible(spu->stop_wq, + (!((status = in_be32(&prob->spu_status_R)) & 0x1)) + || (in_be64(&priv1->mfc_dsisr_RW) & MFC_DSISR_PTE_NOT_FOUND)); + + if (status & SPU_STATUS_STOPPED_BY_STOP) + ret = -EAGAIN; + else if (status & SPU_STATUS_STOPPED_BY_HALT) + ret = -EIO; + else if (in_be64(&priv1->mfc_dsisr_RW) & MFC_DSISR_PTE_NOT_FOUND) + ret = spu_handle_pte_fault(spu); + + } while (!ret && count--); + out_be32(&prob->spu_runcntl_RW, SPU_RUNCNTL_STOP); + out_be64(&priv2->slb_invalidate_all_W, 0); + spu->mm = NULL; + spu->task = NULL; + + return ret; +} +EXPORT_SYMBOL(spu_run); + +static void __iomem * __init map_spe_prop(struct device_node *n, + const char *name) +{ + struct address_prop { + unsigned long address; + unsigned int len; + } __attribute__((packed)) *prop; + + void *p; + int proplen; + + p = get_property(n, name, &proplen); + if (proplen != sizeof (struct address_prop)) + return NULL; + + prop = p; + + return ioremap(prop->address, prop->len); +} + +static void spu_unmap(struct spu *spu) +{ + iounmap(spu->priv2); + iounmap(spu->priv1); + iounmap(spu->problem); + iounmap((u8 __iomem *)spu->local_store); +} + +static int __init spu_map_device(struct spu *spu, struct device_node *spe) +{ + unsigned int *isrc_prop; + int ret; + + ret = -ENODEV; + isrc_prop = (u32 *)get_property(spe, "isrc", NULL); + if (!isrc_prop) + goto out; + spu->isrc = *isrc_prop; + + spu->name = get_property(spe, "name", NULL); + if (!spu->name) + goto out; + + /* we use local store as ram, not io memory */ + spu->local_store = (u8 __force *) map_spe_prop(spe, "local-store"); + if (!spu->local_store) + 
goto out; + + spu->problem= map_spe_prop(spe, "problem"); + if (!spu->problem) + goto out_unmap; + + spu->priv1= map_spe_prop(spe, "priv1"); + if (!spu->priv1) + goto out_unmap; + + spu->priv2= map_spe_prop(spe, "priv2"); + if (!spu->priv2) + goto out_unmap; + ret = 0; + goto out; + +out_unmap: + spu_unmap(spu); +out: + return ret; +} + +static int __init find_spu_node_id(struct device_node *spe) +{ + unsigned int *id; + struct device_node *cpu; + + cpu = spe->parent->parent; + id = (unsigned int *)get_property(cpu, "node-id", NULL); + + return id ? *id : 0; +} + +static int __init create_spu(struct device_node *spe) +{ + struct spu *spu; + int ret; + + ret = -ENOMEM; + spu = kmalloc(sizeof (*spu), GFP_KERNEL); + if (!spu) + goto out; + + ret = spu_map_device(spu, spe); + if (ret) + goto out_free; + + spu->node = find_spu_node_id(spe); + spu->stop_code = 0; + spu->slb_replace = 0; + spu->mm = NULL; + + out_be64(&spu->priv1->mfc_sdr_RW, mfspr(SPRN_SDR1)); + out_be64(&spu->priv1->mfc_sr1_RW, 0x33); + + init_waitqueue_head(&spu->stop_wq); + init_waitqueue_head(&spu->mbox_wq); + + ret = spu_request_irqs(spu); + if (ret) + goto out_unmap; + + down(&spu_mutex); + list_add(&spu->list, &spu_list); + up(&spu_mutex); + + printk(KERN_DEBUG "Using SPE %s %02x %p %p %p %p\n", + spu->name, spu->isrc, spu->local_store, + spu->problem, spu->priv1, spu->priv2); + goto out; + +out_unmap: + spu_unmap(spu); +out_free: + kfree(spu); +out: + return ret; +} + +static void destroy_spu(struct spu *spu) +{ + list_del_init(&spu->list); + + spu_free_irqs(spu); + spu_unmap(spu); + kfree(spu); +} + +static void cleanup_spu_base(void) +{ + struct spu *spu, *tmp; + down(&spu_mutex); + list_for_each_entry_safe(spu, tmp, &spu_list, list) + destroy_spu(spu); + up(&spu_mutex); +} +module_exit(cleanup_spu_base); + +static int __init init_spu_base(void) +{ + struct device_node *node; + int ret; + + ret = -ENODEV; + for (node = of_find_node_by_type(NULL, "spe"); + node; node = of_find_node_by_type(node, 
"spe")) { + ret = create_spu(node); + if (ret) { + printk(KERN_WARNING "%s: Error initializing %s\n", + __FUNCTION__, node->name); + cleanup_spu_base(); + break; + } + } + return ret; +} +module_init(init_spu_base); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Arnd Bergmann "); --- linux-2.6-ppc.orig/arch/ppc64/mm/hash_utils.c 2005-04-01 12:39:56.616981816 -0500 +++ linux-2.6-ppc/arch/ppc64/mm/hash_utils.c 2005-04-01 12:53:32.098953712 -0500 @@ -355,6 +355,7 @@ int hash_page(unsigned long ea, unsigned return ret; } +EXPORT_SYMBOL_GPL(hash_page); void flush_hash_page(unsigned long context, unsigned long ea, pte_t pte, int local) --- linux-2.6-ppc.orig/fs/Kconfig 2005-04-01 12:39:56.619981360 -0500 +++ linux-2.6-ppc/fs/Kconfig 2005-04-01 12:53:32.100953408 -0500 @@ -853,6 +853,16 @@ config HUGETLBFS config HUGETLB_PAGE def_bool HUGETLBFS +config SPU_FS + tristate "SPU file system" + default m + depends on PPC_BPA + help + The SPU file system is used to access Synergistic Processing + Units on machines implementing the Broadband Processor + Architecture. 
+ + config RAMFS bool default y --- linux-2.6-ppc.orig/fs/Makefile 2005-04-01 12:39:56.621981056 -0500 +++ linux-2.6-ppc/fs/Makefile 2005-04-01 12:53:32.102953104 -0500 @@ -95,3 +95,4 @@ obj-$(CONFIG_BEFS_FS) += befs/ obj-$(CONFIG_HOSTFS) += hostfs/ obj-$(CONFIG_HPPFS) += hppfs/ obj-$(CONFIG_DEBUG_FS) += debugfs/ +obj-$(CONFIG_SPU_FS) += spufs/ --- linux-2.6-ppc.orig/fs/spufs/Makefile 1969-12-31 19:00:00.000000000 -0500 +++ linux-2.6-ppc/fs/spufs/Makefile 2005-04-01 12:53:32.104952800 -0500 @@ -0,0 +1,3 @@ +obj-$(CONFIG_SPU_FS) += spufs.o + +spufs-y += inode.o --- linux-2.6-ppc.orig/fs/spufs/inode.c 1969-12-31 19:00:00.000000000 -0500 +++ linux-2.6-ppc/fs/spufs/inode.c 2005-04-01 12:53:32.107952344 -0500 @@ -0,0 +1,991 @@ +/* + * SPU file system + * + * (C) Copyright IBM Deutschland Entwicklung GmbH 2005 + * + * Author: Arnd Bergmann + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +/* SPU context abstraction */ +struct spu_context { + struct spu *spu; /* pointer to a physical SPU if SPUFS_DIRECT */ + struct rw_semaphore backing_sema; /* protects the above */ + spinlock_t mmio_lock; /* protects mmio access */ + long sig; + + struct kref kref; +}; + +static struct spu_context * +alloc_spu_context(void) +{ + struct spu_context *ctx; + ctx = kmalloc(sizeof *ctx, GFP_KERNEL); + if (!ctx) + goto out; + ctx->spu = spu_alloc(); + if (!ctx->spu) + goto out_free; + init_rwsem(&ctx->backing_sema); + spin_lock_init(&ctx->mmio_lock); + kref_init(&ctx->kref); + goto out; +out_free: + kfree(ctx); + ctx = NULL; +out: + return ctx; +} + +static void +destroy_spu_context(struct kref *kref) +{ + struct spu_context *ctx; + ctx = container_of(kref, struct spu_context, kref); + if (ctx->spu) + spu_free(ctx->spu); + kfree(ctx); +} + +static struct spu_context * +get_spu_context(struct spu_context *ctx) +{ + kref_get(&ctx->kref); + return ctx; +} + +static void +put_spu_context(struct spu_context *ctx) +{ + kref_put(&ctx->kref, &destroy_spu_context); +} + +/* The magic number for our file system */ +enum { + SPUFS_MAGIC = 0x23c9b64e, +}; + +/* bits in the inode flags */ +enum { + SPUFS_DIRECT, /* Data resides on a physical SPU */ +}; + +struct spufs_inode_info { + struct spu_context *i_ctx; + struct inode vfs_inode; +}; + +static kmem_cache_t *spufs_inode_cache; +#define SPUFS_I(inode) container_of(inode, struct spufs_inode_info, vfs_inode) + +/* Information about the backing dev, same as ramfs */ + +static struct backing_dev_info spufs_backing_dev_info = { + .ra_pages = 0, /* No readahead */ + .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK | + BDI_CAP_MAP_DIRECT | BDI_CAP_MAP_COPY | BDI_CAP_READ_MAP | + BDI_CAP_WRITE_MAP, +}; + +static struct address_space_operations spufs_aops = { + .readpage = simple_readpage, + 
.prepare_write = simple_prepare_write, + .commit_write = simple_commit_write, +}; + +/* File operations */ + +static int +spufs_open(struct inode *inode, struct file *file) +{ + struct spufs_inode_info *i = SPUFS_I(inode); + file->private_data = i->i_ctx; + return 0; +} + +static ssize_t +spufs_read(struct file *file, char __user *buffer, size_t size, loff_t *pos) +{ + struct spu *spu; + struct spu_context *ctx; + int ret; + + ctx = file->private_data; + spu = ctx->spu; + + down_read(&ctx->backing_sema); + if (spu->number & 0/*1*/) { + ret = generic_file_read(file, buffer, size, pos); + goto out; + } + + ret = 0; + size = min_t(ssize_t, LS_SIZE - *pos, size); + if (size <= 0) + goto out; + *pos += size; + ret = copy_to_user(buffer, spu->local_store + *pos - size, size); + ret = ret ? -EFAULT : size; + +out: + up_read(&ctx->backing_sema); + return ret; +} + +static ssize_t +spufs_write(struct file *file, const char __user *buffer, size_t size, loff_t *pos) +{ + struct spu_context *ctx = file->private_data; + struct spu *spu = ctx->spu; + + if (spu->number & 0) //1) + return generic_file_write(file, buffer, size, pos); + + size = min_t(ssize_t, LS_SIZE - *pos, size); + if (size <= 0) + return -EFBIG; + *pos += size; + return copy_from_user(spu->local_store + *pos - size, + buffer, size) ? 
-EFAULT : size; +} + +static int +spufs_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct spu_context *ctx = file->private_data; + struct spu *spu = ctx->spu; + unsigned long pfn; + + if (spu->number & 0) //1) + return generic_file_mmap(file, vma); + + vma->vm_flags |= VM_RESERVED; + pfn = __pa(spu->local_store) >> PAGE_SHIFT; + /* + * This will work for actual SPUs, but not for vmalloc memory: + */ + if (remap_pfn_range(vma, vma->vm_start, pfn, + vma->vm_end-vma->vm_start, vma->vm_page_prot)) + return -EAGAIN; + /**/ + return 0; +} + +static struct file_operations spufs_mem_fops = { + .open = spufs_open, + .read = spufs_read, + .write = spufs_write, + .mmap = spufs_mmap, + .llseek = generic_file_llseek, +}; + +/* generic open function for all pipe-like files */ +static int spufs_pipe_open(struct inode *inode, struct file *file) +{ + struct spufs_inode_info *i = SPUFS_I(inode); + file->private_data = i->i_ctx; + + return nonseekable_open(inode, file); +} + +static ssize_t spufs_mbox_read(struct file *file, char __user *buf, + size_t len, loff_t *pos) +{ + struct spu_context *ctx; + struct spu_problem __iomem *prob; + u32 mbox_stat; + u32 mbox_data; + + if (len < 4) + return -EINVAL; + + ctx = file->private_data; + prob = ctx->spu->problem; + mbox_stat = in_be32(&prob->mb_stat_R); + if (!(mbox_stat & 0x0000ff)) + return -EAGAIN; + + mbox_data = in_be32(&prob->pu_mb_R); + + if (copy_to_user(buf, &mbox_data, sizeof mbox_data)) + return -EFAULT; + + return 4; +} + +static struct file_operations spufs_mbox_fops = { + .open = spufs_pipe_open, + .read = spufs_mbox_read, +}; + +static ssize_t spufs_ibox_read(struct file *file, char __user *buf, + size_t len, loff_t *pos) +{ + struct spu_context *ctx; + struct spu_problem __iomem *prob; + struct spu_priv2 __iomem *priv2; + u32 mbox_stat; + u32 ibox_data; + ssize_t ret; + + if (len < 4) + return -EINVAL; + + ctx = file->private_data; + prob = ctx->spu->problem; + priv2 = ctx->spu->priv2; + + mbox_stat = 
in_be32(&prob->mb_stat_R); + if (!(mbox_stat & 0xff0000)) + return -EAGAIN; + + ibox_data = in_be64(&priv2->puint_mb_R); + + ret = 4; + if (copy_to_user(buf, &ibox_data, sizeof ibox_data)) + ret = -EFAULT; + + return ret; +} + +static unsigned int spufs_ibox_poll(struct file *file, poll_table *wait) +{ + struct spu_context *ctx; + struct spu_problem __iomem *prob; + u32 mbox_stat; + unsigned int mask; + + ctx = file->private_data; + prob = ctx->spu->problem; + mbox_stat = in_be32(&prob->mb_stat_R); + + poll_wait(file, &ctx->spu->mbox_wq, wait); + + mask = 0; + if (mbox_stat & 0xff0000) + mask |= POLLIN | POLLRDNORM; + + return mask; +} + +static struct file_operations spufs_ibox_fops = { + .open = spufs_pipe_open, + .read = spufs_ibox_read, + .poll = spufs_ibox_poll, +}; + +static ssize_t spufs_wbox_write(struct file *file, const char __user *buf, + size_t len, loff_t *pos) +{ + struct spu_context *ctx; + struct spu_problem __iomem *prob; + u32 mbox_stat; + u32 wbox_data; + + if (len < 4) + return -EINVAL; + + ctx = file->private_data; + prob = ctx->spu->problem; + mbox_stat = in_be32(&prob->mb_stat_R); + if (!(mbox_stat & 0x00ff00)) + return -EAGAIN; + + if (copy_from_user(&wbox_data, buf, sizeof wbox_data)) + return -EFAULT; + + out_be32(&prob->spu_mb_W, wbox_data); + + return 4; +} + +static unsigned int spufs_wbox_poll(struct file *file, poll_table *wait) +{ + struct spu_context *ctx; + struct spu_problem __iomem *prob; + u32 mbox_stat; + unsigned int mask; + + ctx = file->private_data; + prob = ctx->spu->problem; + mbox_stat = in_be32(&prob->mb_stat_R); + + poll_wait(file, &ctx->spu->mbox_wq, wait); + + mask = 0; + if (mbox_stat & 0x00ff00) + mask = POLLOUT | POLLWRNORM; + + return mask; +} + +static struct file_operations spufs_wbox_fops = { + .open = spufs_pipe_open, + .write = spufs_wbox_write, + .poll = spufs_wbox_poll, +}; + +static int spufs_run_open(struct inode *inode, struct file *file) +{ + struct spufs_inode_info *i = SPUFS_I(inode); + 
file->private_data = i->i_ctx; + + return nonseekable_open(inode, file); +} + +struct spufs_run_arg { + u32 npc; /* inout: Next Program Counter */ + u32 status; /* out: SPU status */ +}; + +static long spufs_run_ioctl(struct file *file, unsigned int num, + unsigned long arg) +{ + struct spu_context *ctx; + struct spu_problem __iomem *prob; + struct spufs_run_arg data; + int ret; + + if (num != _IOWR('s', 0, struct spufs_run_arg)) + return -EINVAL; + + if (copy_from_user(&data, (void __user *)arg, sizeof data)) + return -EFAULT; + + ctx = file->private_data; + prob = ctx->spu->problem; + out_be32(&prob->spu_npc_RW, data.npc); + wmb(); + + ret = spu_run(ctx->spu); +/* + prob->spu_npc_RW = data.npc; + ctx->spu->mm = current->mm; + wmb(); + prob->spu_runcntl_RW = SPU_RUNCNTL_RUNNABLE; + mb(); + + ret = wait_event_interruptible(ctx->spu->stop_wq, + prob->spu_status_R & 0x3e); + + prob->spu_runcntl_RW = SPU_RUNCNTL_STOP; + ctx->spu->mm = NULL; +*/ + data.status = in_be32(&prob->spu_status_R); + data.npc = in_be32(&prob->spu_npc_RW); + if (copy_to_user((void __user *)arg, &data, sizeof data)) + ret = -EFAULT; + + return ret; +} + +static struct file_operations spufs_run_fops = { + .open = spufs_run_open, + .unlocked_ioctl = spufs_run_ioctl, + .compat_ioctl = spufs_run_ioctl, +}; + + +/**** spufs attributes + * + * Attributes in spufs behave similarly to those in sysfs: + * + * Writing to an attribute immediately sets a value; an open file can be + * written to multiple times. + * + * Reading from an attribute creates a buffer from the value that might get + * read with multiple read calls. When the attribute has been read completely, + * no further read calls are possible until the file is opened again. + * + * All spufs attributes contain a text representation of a numeric value that + * is accessed with the get() and set() functions. + * + * Perhaps these file operations could be put in debugfs or libfs instead; + * they are not really SPU-specific.
+ */ + +struct spufs_attr { + long (*get)(struct spu_context *); + void (*set)(struct spu_context *, long); + struct spu_context *ctx; + char get_buf[24]; /* enough to store a long and "\n\0" */ + char set_buf[24]; + struct semaphore sem; /* protects access to these buffers */ +}; + +/* spufs_attr_open is called by an actual attribute open file operation + * to set the attribute specific access operations. */ +static int spufs_attr_open(struct inode *inode, struct file *file, + long (*get)(struct spu_context *), + void (*set)(struct spu_context *, long)) +{ + struct spufs_attr *attr; + + /* reading/writing needs the respective get/set operation; + * check before allocating so the error path cannot leak attr */ + if (((file->f_mode & FMODE_READ) && !get) || + ((file->f_mode & FMODE_WRITE) && !set)) + return -EACCES; + + attr = kmalloc(sizeof *attr, GFP_KERNEL); + if (!attr) + return -ENOMEM; + + attr->get = get; + attr->set = set; + attr->ctx = SPUFS_I(inode)->i_ctx; + init_MUTEX(&attr->sem); + + file->private_data = attr; + + return nonseekable_open(inode, file); +} + +static int spufs_attr_close(struct inode *inode, struct file *file) +{ + kfree(file->private_data); + return 0; +} + +/* read from the buffer that is filled with the get function */ +static ssize_t spufs_attr_read(struct file *file, char __user *buf, + size_t len, loff_t *ppos) +{ + struct spufs_attr *attr; + size_t size; + ssize_t ret; + + attr = file->private_data; + + down(&attr->sem); + if (*ppos) /* continued read */ + size = strlen(attr->get_buf); + else /* first read */ + size = scnprintf(attr->get_buf, sizeof (attr->get_buf), + "%ld\n", attr->get(attr->ctx)); + + ret = simple_read_from_buffer(buf, len, ppos, attr->get_buf, size); + up(&attr->sem); + return ret; +} + +/* interpret the buffer as a number to call the set function with */ +static ssize_t spufs_attr_write(struct file *file, const char __user *buf, + size_t len, loff_t *ppos) +{ + struct spufs_attr *attr; + long val; + size_t size; + ssize_t ret; + + + attr = file->private_data; +
down(&attr->sem); + ret = -EFAULT; + size = min(sizeof (attr->set_buf) - 1, len); + if (copy_from_user(attr->set_buf, buf, size)) + goto out; + + ret = len; /* claim we got the whole input */ + attr->set_buf[size] = '\0'; + val = simple_strtol(attr->set_buf, NULL, 0); + attr->set(attr->ctx, val); +out: + up(&attr->sem); + return ret; +} + +#define spufs_attribute(name) \ +static int name ## _open(struct inode *inode, struct file *file) \ +{ \ + return spufs_attr_open(inode, file, &name ## _get, &name ## _set); \ +} \ +static struct file_operations name = { \ + .open = name ## _open, \ + .release = spufs_attr_close, \ + .read = spufs_attr_read, \ + .write = spufs_attr_write, \ +}; + + +static void spufs_signal1_type_set(struct spu_context *ctx, long val) +{ + ctx->sig = val; +} + +static long spufs_signal1_type_get(struct spu_context *ctx) +{ + return ctx->sig; +} + +spufs_attribute(spufs_signal1_type); + +static void spufs_class0_stat_set(struct spu_context *ctx, long val) +{ + out_be64(&ctx->spu->priv1->int_stat_class0_RW, val); +} + +static long spufs_class0_stat_get(struct spu_context *ctx) +{ + return in_be64(&ctx->spu->priv1->int_stat_class0_RW); +} + +spufs_attribute(spufs_class0_stat); + +static void spufs_class1_stat_set(struct spu_context *ctx, long val) +{ + out_be64(&ctx->spu->priv1->int_stat_class1_RW, val); +} + +static long spufs_class1_stat_get(struct spu_context *ctx) +{ + return in_be64(&ctx->spu->priv1->int_stat_class1_RW); +} + +spufs_attribute(spufs_class1_stat); + +static void spufs_class2_stat_set(struct spu_context *ctx, long val) +{ + out_be64(&ctx->spu->priv1->int_stat_class2_RW, val); +} + +static long spufs_class2_stat_get(struct spu_context *ctx) +{ + return in_be64(&ctx->spu->priv1->int_stat_class2_RW); +} + +spufs_attribute(spufs_class2_stat); + +static void spufs_class0_mask_set(struct spu_context *ctx, long val) +{ + out_be64(&ctx->spu->priv1->int_mask_class0_RW, val); +} + +static long spufs_class0_mask_get(struct spu_context *ctx) 
+{ + return in_be64(&ctx->spu->priv1->int_mask_class0_RW); +} + +spufs_attribute(spufs_class0_mask); + +static void spufs_class1_mask_set(struct spu_context *ctx, long val) +{ + out_be64(&ctx->spu->priv1->int_mask_class1_RW, val); +} + +static long spufs_class1_mask_get(struct spu_context *ctx) +{ + return in_be64(&ctx->spu->priv1->int_mask_class1_RW); +} + +spufs_attribute(spufs_class1_mask); + +static void spufs_class2_mask_set(struct spu_context *ctx, long val) +{ + out_be64(&ctx->spu->priv1->int_mask_class2_RW, val); +} + +static long spufs_class2_mask_get(struct spu_context *ctx) +{ + return in_be64(&ctx->spu->priv1->int_mask_class2_RW); +} + +spufs_attribute(spufs_class2_mask); + +#define priv1_attr(name) \ +static void spufs_ ## name ## _set(struct spu_context *ctx, long val) \ +{ out_be64(&ctx->spu->priv1->name, val); } \ +static long spufs_ ## name ## _get(struct spu_context *ctx) \ +{ return in_be64(&ctx->spu->priv1->name); } \ +spufs_attribute(spufs_ ## name) + +#define priv2_attr(name) \ +static void spufs_ ## name ## _set(struct spu_context *ctx, long val) \ +{ out_be64(&ctx->spu->priv2->name, val); } \ +static long spufs_ ## name ## _get(struct spu_context *ctx) \ +{ return in_be64(&ctx->spu->priv2->name); } \ +spufs_attribute(spufs_ ## name) + +priv1_attr(mfc_sr1_RW); +priv1_attr(mfc_fir_R); +priv1_attr(mfc_fir_status_or_W); +priv1_attr(mfc_fir_status_and_W); +priv1_attr(mfc_fir_mask_R); +priv1_attr(mfc_fir_mask_or_W); +priv1_attr(mfc_fir_mask_and_W); +priv1_attr(mfc_fir_chkstp_enable_RW); +priv1_attr(mfc_cer_R); +priv1_attr(mfc_dsisr_RW); +priv1_attr(mfc_dsir_R); +priv1_attr(mfc_sdr_RW); +priv2_attr(mfc_control_RW); + +/* Inode operations */ + +static struct inode * +spufs_alloc_inode(struct super_block *sb) +{ + struct spufs_inode_info *ei; + + ei = kmem_cache_alloc(spufs_inode_cache, SLAB_KERNEL); + if (!ei) + return NULL; + return &ei->vfs_inode; +} + +static void +spufs_destroy_inode(struct inode *inode) +{ + kmem_cache_free(spufs_inode_cache, 
SPUFS_I(inode)); +} + +static void +spufs_init_once(void *p, kmem_cache_t * cachep, unsigned long flags) +{ + struct spufs_inode_info *ei = p; + + if ((flags & (SLAB_CTOR_VERIFY|SLAB_CTOR_CONSTRUCTOR)) == + SLAB_CTOR_CONSTRUCTOR) { + inode_init_once(&ei->vfs_inode); + } +} + +static struct inode * +spufs_new_inode(struct super_block *sb, int mode) +{ + struct inode *inode; + + inode = new_inode(sb); + if (!inode) + goto out; + + inode->i_mode = mode; + inode->i_uid = current->fsuid; + inode->i_gid = current->fsgid; + inode->i_blksize = PAGE_CACHE_SIZE; + inode->i_blocks = 0; + inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME; +out: + return inode; +} + +static int +spufs_setattr(struct dentry *dentry, struct iattr *attr) +{ + struct inode *inode = dentry->d_inode; + +/* dump_stack(); + printk("ia_size %lld, i_size:%lld\n", attr->ia_size, inode->i_size); +*/ + if (attr->ia_size != inode->i_size) + return -EINVAL; + return inode_setattr(inode, attr); +} + +/* +static int +spufs_create(struct inode *dir, struct dentry *dentry, + int mode, struct nameidata *nd) +{ + static struct inode_operations iops = { + .getattr = simple_getattr, + .setattr = spufs_setattr, + }; + + + struct inode *inode; + int ret; + + ret = -ENOSPC; + inode = spufs_new_inode(dir->i_sb, S_IFREG | mode); + if (!inode) + goto out; + inode->i_op = &iops; + inode->i_fop = &spufs_mem_fops; + inode->i_size = LS_SIZE; + SPUFS_I(inode)->i_spu = spu_alloc(); + if (!SPUFS_I(inode)->i_spu) + goto out_iput; + inode->i_mapping->a_ops = &spufs_aops; + inode->i_mapping->backing_dev_info = &spufs_backing_dev_info; + d_instantiate(dentry, inode); + dget(dentry); + return 0; +out_iput: + iput(inode); +out: + return ret; +} +*/ + +static void +spufs_delete_inode(struct inode *inode) +{ + if (SPUFS_I(inode)->i_ctx) + put_spu_context(SPUFS_I(inode)->i_ctx); + clear_inode(inode); +} + +static struct tree_descr spufs_dir_contents[] = { + { "mem", &spufs_mem_fops, 0644, }, + { "run", &spufs_run_fops, 0400, 
}, + { "mbox", &spufs_mbox_fops, 0400, }, + { "ibox", &spufs_ibox_fops, 0400, }, + { "wbox", &spufs_wbox_fops, 0200, }, + { "signal1_type", &spufs_signal1_type, 0600, }, + { "signal2_type", &spufs_signal1_type, 0600, }, + +#if 1 /* debugging only */ + { "class0_mask", &spufs_class0_mask, 0600, }, + { "class1_mask", &spufs_class1_mask, 0600, }, + { "class2_mask", &spufs_class2_mask, 0600, }, + { "class0_stat", &spufs_class0_stat, 0600, }, + { "class1_stat", &spufs_class1_stat, 0600, }, + { "class2_stat", &spufs_class2_stat, 0600, }, + { "sr1", &spufs_mfc_sr1_RW, 0600, }, + { "fir", &spufs_mfc_fir_R, 0400, }, + { "fir_status_or", &spufs_mfc_fir_status_or_W, 0200, }, + { "fir_status_and", &spufs_mfc_fir_status_and_W, 0200, }, + { "fir_mask", &spufs_mfc_fir_mask_R, 0400, }, + { "fir_mask_or", &spufs_mfc_fir_mask_or_W, 0200, }, + { "fir_mask_and", &spufs_mfc_fir_mask_and_W, 0200, }, + { "fir_chkstp", &spufs_mfc_fir_chkstp_enable_RW, 0600, }, + { "cer", &spufs_mfc_cer_R, 0400, }, + { "dsisr", &spufs_mfc_dsisr_RW, 0600, }, + { "dsir", &spufs_mfc_dsir_R, 0400, }, + { "cntl", &spufs_mfc_control_RW, 0600, }, + { "sdr", &spufs_mfc_sdr_RW, 0600, }, +#endif + {}, +}; + +static int +spufs_fill_dir(struct dentry *dir, struct tree_descr *files, + int mode, struct spu_context *ctx) +{ + struct inode *inode; + struct dentry *dentry; + int ret; + + static struct inode_operations iops = { + .getattr = simple_getattr, + .setattr = spufs_setattr, + }; + + ret = -ENOSPC; + while (files->name && files->name[0]) { + dentry = d_alloc_name(dir, files->name); + if (!dentry) + goto out; + inode = spufs_new_inode(dir->d_sb, + S_IFREG | (files->mode & mode)); + if (!inode) + goto out; + inode->i_op = &iops; + inode->i_fop = files->ops; + inode->i_mapping->a_ops = &spufs_aops; + inode->i_mapping->backing_dev_info = &spufs_backing_dev_info; + SPUFS_I(inode)->i_ctx = get_spu_context(ctx); + + d_add(dentry, inode); + files++; + } + return 0; +out: + // FIXME: remove all files that are left + return
ret; +} + +static int +spufs_mkdir(struct inode *dir, struct dentry *dentry, int mode) +{ + int ret; + struct inode *inode; + struct spu_context *ctx; + + ret = -ENOSPC; + inode = spufs_new_inode(dir->i_sb, mode | S_IFDIR); + if (!inode) + goto out; + + if (dir->i_mode & S_ISGID) { + inode->i_gid = dir->i_gid; + inode->i_mode |= S_ISGID; + } + ctx = alloc_spu_context(); + SPUFS_I(inode)->i_ctx = ctx; + if (!ctx) + goto out_iput; + + inode->i_op = &simple_dir_inode_operations; + inode->i_fop = &simple_dir_operations; + ret = spufs_fill_dir(dentry, spufs_dir_contents, mode, ctx); + if (ret) + goto out_free_ctx; + + d_instantiate(dentry, inode); + dget(dentry); + dir->i_nlink++; + goto out; + +out_free_ctx: + put_spu_context(ctx); +out_iput: + iput(inode); +out: + return ret; +} + +/* This looks really wrong! */ +static int spufs_rmdir(struct inode *root, struct dentry *dir_dentry) +{ + struct dentry *dentry; + int err; + + spin_lock(&dcache_lock); + + /* check if any entry is used */ + err = -EBUSY; + list_for_each_entry(dentry, &dir_dentry->d_subdirs, d_child) { + if (d_unhashed(dentry) || !dentry->d_inode) + continue; + if (atomic_read(&dentry->d_count) != 1) + goto out; + } + /* remove all entries */ + err = 0; + list_for_each_entry(dentry, &dir_dentry->d_subdirs, d_child) { + if (d_unhashed(dentry) || !dentry->d_inode) + continue; + atomic_dec(&dentry->d_count); + __d_drop(dentry); + } +out: + spin_unlock(&dcache_lock); + if (!err) { + shrink_dcache_parent(dir_dentry); + err = simple_rmdir(root, dir_dentry); + } + return err; +} + +/* File system initialization */ + +static int +spufs_create_root(struct super_block *sb) { + static struct inode_operations spufs_dir_inode_operations = { + .lookup = simple_lookup, + .mkdir = spufs_mkdir, + .rmdir = spufs_rmdir, +// .rename = simple_rename, // XXX maybe + }; + + struct inode *inode; + int ret; + + ret = -ENOMEM; + inode = spufs_new_inode(sb, S_IFDIR | 0777); + + if (inode) { + inode->i_op = 
&spufs_dir_inode_operations; + inode->i_fop = &simple_dir_operations; + SPUFS_I(inode)->i_ctx = NULL; + sb->s_root = d_alloc_root(inode); + if (!sb->s_root) + iput(inode); + else + ret = 0; + } + return ret; +} + +static int +spufs_fill_super(struct super_block *sb, void *data, int silent) +{ + static struct super_operations s_ops = { + .alloc_inode = spufs_alloc_inode, + .destroy_inode = spufs_destroy_inode, + .statfs = simple_statfs, + .delete_inode = spufs_delete_inode, + .drop_inode = generic_delete_inode, + }; + + sb->s_maxbytes = MAX_LFS_FILESIZE; + sb->s_blocksize = PAGE_CACHE_SIZE; + sb->s_blocksize_bits = PAGE_CACHE_SHIFT; + sb->s_magic = SPUFS_MAGIC; + sb->s_op = &s_ops; + + return spufs_create_root(sb); +} + +static struct super_block * +spufs_get_sb(struct file_system_type *fstype, int flags, + const char *name, void *data) +{ + return get_sb_single(fstype, flags, data, spufs_fill_super); +} + +static struct file_system_type spufs_type = { + .owner = THIS_MODULE, + .name = "spufs", + .get_sb = spufs_get_sb, + .kill_sb = kill_litter_super, +}; + +static int spufs_init(void) +{ + int ret; + ret = -ENOMEM; + spufs_inode_cache = kmem_cache_create("spufs_inode_cache", + sizeof(struct spufs_inode_info), 0, + SLAB_HWCACHE_ALIGN, spufs_init_once, NULL); + + if (!spufs_inode_cache) + goto out; + ret = register_filesystem(&spufs_type); + if (ret) + kmem_cache_destroy(spufs_inode_cache); +out: + return ret; +} +module_init(spufs_init); + +static void spufs_exit(void) +{ + unregister_filesystem(&spufs_type); + kmem_cache_destroy(spufs_inode_cache); +} +module_exit(spufs_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Arnd Bergmann "); + --- linux-2.6-ppc.orig/include/asm-ppc64/spu.h 1969-12-31 19:00:00.000000000 -0500 +++ linux-2.6-ppc/include/asm-ppc64/spu.h 2005-04-01 12:55:10.054973928 -0500 @@ -0,0 +1,459 @@ +/* + * SPU core / file system interface and HW structures + * + * (C) Copyright IBM Deutschland Entwicklung GmbH 2005 + * + * Author: Arnd Bergmann + * + 
* This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#ifndef _SPU_H +#define _SPU_H + +#define LS_ORDER (6) /* 256 kb */ + +#define LS_SIZE (PAGE_SIZE << LS_ORDER) + +struct spu { + char *name; + u8 *local_store; + struct spu_problem __iomem *problem; + struct spu_priv1 __iomem *priv1; + struct spu_priv2 __iomem *priv2; + struct list_head list; + int number; + u32 isrc; + u32 node; + struct kref kref; + size_t ls_size; + unsigned int slb_replace; + struct mm_struct *mm; + struct task_struct *task; + + u32 stop_code; + wait_queue_head_t stop_wq; + wait_queue_head_t mbox_wq; +}; + +struct spu *spu_alloc(void); +void spu_free(struct spu *spu); +int spu_run(struct spu *spu); + +/* + * This defines the Local Store, Problem Area and Privilege Area of an SPU.
+ */ + +union MFC_TagSizeClassCmd { + struct { + u16 mfc_size; + u16 mfc_tag; + u8 pad; + u8 mfc_rclassid; + u16 mfc_cmd; + } u; + struct { + u32 mfc_size_tag32; + u32 mfc_class_cmd32; + } by32; + u64 all64; +}; + +struct MFC_cq_sr { + u64 mfc_cq_data0_RW; + u64 mfc_cq_data1_RW; + u64 mfc_cq_data2_RW; + u64 mfc_cq_data3_RW; +}; + +struct spu_problem { + u8 pad_0x0000_0x3000[0x3000 - 0x0000]; /* 0x0000 */ + + /* DMA Area */ + u8 pad_0x3000_0x3004[0x4]; /* 0x3000 */ + u32 mfc_lsa_W; /* 0x3004 */ + u64 mfc_ea_W; /* 0x3008 */ + union MFC_TagSizeClassCmd mfc_union_W; /* 0x3010 */ + u8 pad_0x3018_0x3104[0xec]; /* 0x3018 */ + u32 dma_qstatus_R; /* 0x3104 */ + u8 pad_0x3108_0x3204[0xfc]; /* 0x3108 */ + u32 dma_querytype_RW; /* 0x3204 */ + u8 pad_0x3208_0x321c[0x14]; /* 0x3208 */ + u32 dma_querymask_RW; /* 0x321c */ + u8 pad_0x3220_0x322c[0xc]; /* 0x3220 */ + u32 dma_tagstatus_R; /* 0x322c */ +#define DMA_TAGSTATUS_INTR_ANY 1u +#define DMA_TAGSTATUS_INTR_ALL 2u + u8 pad_0x3230_0x4000[0x4000 - 0x3230]; /* 0x3230 */ + + /* SPU Control Area */ + u8 pad_0x4000_0x4004[0x4]; /* 0x4000 */ + u32 pu_mb_R; /* 0x4004 */ + u8 pad_0x4008_0x400c[0x4]; /* 0x4008 */ + u32 spu_mb_W; /* 0x400c */ + u8 pad_0x4010_0x4014[0x4]; /* 0x4010 */ + u32 mb_stat_R; /* 0x4014 */ + u8 pad_0x4018_0x401c[0x4]; /* 0x4018 */ + u32 spu_runcntl_RW; /* 0x401c */ +#define SPU_RUNCNTL_STOP 0L +#define SPU_RUNCNTL_RUNNABLE 1L + u8 pad_0x4020_0x4024[0x4]; /* 0x4020 */ + u32 spu_status_R; /* 0x4024 */ +#define SPU_STATUS_STOPPED 0x0 +#define SPU_STATUS_RUNNING 0x1 +#define SPU_STATUS_STOPPED_BY_STOP 0x2 +#define SPU_STATUS_STOPPED_BY_HALT 0x4 +#define SPU_STATUS_WAITING_FOR_CHANNEL 0x8 +#define SPU_STATUS_SINGLE_STEP 0x10 + u8 pad_0x4028_0x402c[0x4]; /* 0x4028 */ + u32 spu_spe_R; /* 0x402c */ + u8 pad_0x4030_0x4034[0x4]; /* 0x4030 */ + u32 spu_npc_RW; /* 0x4034 */ + u8 pad_0x4038_0x14000[0x14000 - 0x4038]; /* 0x4038 */ + + /* Signal Notification Area */ + u8 pad_0x14000_0x1400c[0xc]; /* 0x14000 */ + u32 
signal_notify1; /* 0x1400c */ + u8 pad_0x14010_0x1c00c[0x7ffc]; /* 0x14010 */ + u32 signal_notify2; /* 0x1c00c */ +} __attribute__ ((aligned(0x20000))); + +/* SPU Privilege 2 State Area */ +struct spu_priv2 { + /* MFC Registers */ + u8 pad_0x0000_0x1100[0x1100 - 0x0000]; /* 0x0000 */ + + /* SLB Management Registers */ + u8 pad_0x1100_0x1108[0x8]; /* 0x1100 */ + u64 slb_index_W; /* 0x1108 */ +#define SLB_INDEX_MASK 0x7L + u64 slb_esid_RW; /* 0x1110 */ + u64 slb_vsid_RW; /* 0x1118 */ +#define SLB_VSID_SUPERVISOR_STATE (0x1ull << 11) +#define SLB_VSID_SUPERVISOR_STATE_MASK (0x1ull << 11) +#define SLB_VSID_PROBLEM_STATE (0x1ull << 10) +#define SLB_VSID_PROBLEM_STATE_MASK (0x1ull << 10) +#define SLB_VSID_EXECUTE_SEGMENT (0x1ull << 9) +#define SLB_VSID_NO_EXECUTE_SEGMENT (0x1ull << 9) +#define SLB_VSID_EXECUTE_SEGMENT_MASK (0x1ull << 9) +#define SLB_VSID_4K_PAGE (0x0 << 8) +#define SLB_VSID_LARGE_PAGE (0x1ull << 8) +#define SLB_VSID_PAGE_SIZE_MASK (0x1ull << 8) +#define SLB_VSID_CLASS_MASK (0x1ull << 7) +#define SLB_VSID_VIRTUAL_PAGE_SIZE_MASK (0x1ull << 6) + u64 slb_invalidate_entry_W; /* 0x1120 */ + u64 slb_invalidate_all_W; /* 0x1128 */ + u8 pad_0x1130_0x2000[0x2000 - 0x1130]; /* 0x1130 */ + + /* Context Save / Restore Area */ + struct MFC_cq_sr spuq[16]; /* 0x2000 */ + struct MFC_cq_sr puq[8]; /* 0x2200 */ + u8 pad_0x2300_0x3000[0x3000 - 0x2300]; /* 0x2300 */ + + /* MFC Control */ + u64 mfc_control_RW; /* 0x3000 */ +#define MFC_CNTL_RESUME_DMA_QUEUE (0ull << 0) +#define MFC_CNTL_SUSPEND_DMA_QUEUE (1ull << 0) +#define MFC_CNTL_SUSPEND_DMA_QUEUE_MASK (1ull << 0) +#define MFC_CNTL_NORMAL_DMA_QUEUE_OPERATION (0ull << 8) +#define MFC_CNTL_SUSPEND_IN_PROGRESS (1ull << 8) +#define MFC_CNTL_SUSPEND_COMPLETE (3ull << 8) +#define MFC_CNTL_SUSPEND_DMA_STATUS_MASK (3ull << 8) +#define MFC_CNTL_DMA_QUEUES_EMPTY (1ull << 14) +#define MFC_CNTL_DMA_QUEUES_EMPTY_MASK (1ull << 14) +#define MFC_CNTL_PURGE_DMA_REQUEST (1ull << 15) +#define MFC_CNTL_PURGE_DMA_IN_PROGRESS (1ull << 24) 
+#define MFC_CNTL_PURGE_DMA_COMPLETE (3ull << 24) +#define MFC_CNTL_PURGE_DMA_STATUS_MASK (3ull << 24) +#define MFC_CNTL_RESTART_DMA_COMMAND (1ull << 32) +#define MFC_CNTL_DMA_COMMAND_REISSUE_PENDING (1ull << 32) +#define MFC_CNTL_DMA_COMMAND_REISSUE_STATUS_MASK (1ull << 32) +#define MFC_CNTL_MFC_PRIVILEGE_STATE (2ull << 33) +#define MFC_CNTL_MFC_PROBLEM_STATE (3ull << 33) +#define MFC_CNTL_MFC_KEY_PROTECTION_STATE_MASK (3ull << 33) +#define MFC_CNTL_DECREMENTER_HALTED (1ull << 35) +#define MFC_CNTL_DECREMENTER_RUNNING (1ull << 40) +#define MFC_CNTL_DECREMENTER_STATUS_MASK (1ull << 40) + u8 pad_0x3008_0x4000[0x4000 - 0x3008]; /* 0x3008 */ + + /* Interrupt Mailbox */ + u64 puint_mb_R; /* 0x4000 */ + u8 pad_0x4008_0x4040[0x4040 - 0x4008]; /* 0x4008 */ + + /* SPU Control */ + u64 spu_privcntl_RW; /* 0x4040 */ +#define SPU_PRIVCNTL_MODE_NORMAL (0x0ull << 0) +#define SPU_PRIVCNTL_MODE_SINGLE_STEP (0x1ull << 0) +#define SPU_PRIVCNTL_MODE_MASK (0x1ull << 0) +#define SPU_PRIVCNTL_NO_ATTENTION_EVENT (0x0ull << 1) +#define SPU_PRIVCNTL_ATTENTION_EVENT (0x1ull << 1) +#define SPU_PRIVCNTL_ATTENTION_EVENT_MASK (0x1ull << 1) +#define SPU_PRIVCNT_LOAD_REQUEST_NORMAL (0x0ull << 2) +#define SPU_PRIVCNT_LOAD_REQUEST_ENABLE_MASK (0x1ull << 2) + u8 pad_0x4048_0x4058[0x10]; /* 0x4048 */ + u64 spu_lslr_RW; /* 0x4058 */ + u64 spu_chnlcntptr_RW; /* 0x4060 */ + u64 spu_chnlcnt_RW; /* 0x4068 */ + u64 spu_chnldata_RW; /* 0x4070 */ + u64 spu_cfg_RW; /* 0x4078 */ + u8 pad_0x4080_0x5000[0x5000 - 0x4080]; /* 0x4080 */ + + /* PV2_ImplRegs: Implementation-specific privileged-state 2 regs */ + u64 spu_pm_trace_tag_status_RW; /* 0x5000 */ + u64 spu_tag_status_query_RW; /* 0x5008 */ +#define TAG_STATUS_QUERY_CONDITION_BITS (0x3ull << 32) +#define TAG_STATUS_QUERY_MASK_BITS (0xffffffffull) + u64 spu_cmd_buf1_RW; /* 0x5010 */ +#define SPU_COMMAND_BUFFER_1_LSA_BITS (0x7ffffull << 32) +#define SPU_COMMAND_BUFFER_1_EAH_BITS (0xffffffffull) + u64 spu_cmd_buf2_RW; /* 0x5018 */ +#define 
SPU_COMMAND_BUFFER_2_EAL_BITS ((0xffffffffull) << 32) +#define SPU_COMMAND_BUFFER_2_TS_BITS (0xffffull << 16) +#define SPU_COMMAND_BUFFER_2_TAG_BITS (0x3full) + u64 spu_atomic_status_RW; /* 0x5020 */ +} __attribute__ ((aligned(0x20000))); + +/* SPU Privilege 1 State Area */ +struct spu_priv1 { + /* Control and Configuration Area */ + u64 mfc_sr1_RW; /* 0x000 */ +#define MFC_STATE1_LOCAL_STORAGE_DECODE_MASK 0x01ull +#define MFC_STATE1_BUS_TLBIE_MASK 0x02ull +#define MFC_STATE1_REAL_MODE_OFFSET_ENABLE_MASK 0x04ull +#define MFC_STATE1_PROBLEM_STATE_MASK 0x08ull +#define MFC_STATE1_RELOCATE_MASK 0x10ull +#define MFC_STATE1_MASTER_RUN_CONTROL_MASK 0x20ull + u64 mfc_lpid_RW; /* 0x008 */ + u64 spu_idr_RW; /* 0x010 */ + u64 mfc_vr_RO; /* 0x018 */ +#define MFC_VERSION_BITS (0xffff << 16) +#define MFC_REVISION_BITS (0xffff) +#define MFC_GET_VERSION_BITS(vr) (((vr) & MFC_VERSION_BITS) >> 16) +#define MFC_GET_REVISION_BITS(vr) ((vr) & MFC_REVISION_BITS) + u64 spu_vr_RO; /* 0x020 */ +#define SPU_VERSION_BITS (0xffff << 16) +#define SPU_REVISION_BITS (0xffff) +#define SPU_GET_VERSION_BITS(vr) (((vr) & SPU_VERSION_BITS) >> 16) +#define SPU_GET_REVISION_BITS(vr) ((vr) & SPU_REVISION_BITS) + u8 pad_0x28_0x100[0x100 - 0x28]; /* 0x28 */ + + + /* Interrupt Area */ + u64 int_mask_class0_RW; /* 0x100 */ +#define CLASS0_ENABLE_DMA_ALIGNMENT_INTR 0x1L +#define CLASS0_ENABLE_INVALID_DMA_COMMAND_INTR 0x2L +#define CLASS0_ENABLE_SPU_ERROR_INTR 0x4L +#define CLASS0_ENABLE_MFC_FIR_INTR 0x8L + u64 int_mask_class1_RW; /* 0x108 */ +#define CLASS1_ENABLE_SEGMENT_FAULT_INTR 0x1L +#define CLASS1_ENABLE_STORAGE_FAULT_INTR 0x2L +#define CLASS1_ENABLE_LS_COMPARE_SUSPEND_ON_GET_INTR 0x4L +#define CLASS1_ENABLE_LS_COMPARE_SUSPEND_ON_PUT_INTR 0x8L + u64 int_mask_class2_RW; /* 0x110 */ +#define CLASS2_ENABLE_MAILBOX_INTR 0x1L +#define CLASS2_ENABLE_SPU_STOP_INTR 0x2L +#define CLASS2_ENABLE_SPU_HALT_INTR 0x4L +#define CLASS2_ENABLE_SPU_DMA_TAG_GROUP_COMPLETE_INTR 0x8L + u8 pad_0x118_0x140[0x28]; /* 0x118 */ + u64
int_stat_class0_RW; /* 0x140 */ + u64 int_stat_class1_RW; /* 0x148 */ + u64 int_stat_class2_RW; /* 0x150 */ + u8 pad_0x158_0x180[0x28]; /* 0x158 */ + u64 int_route_RW; /* 0x180 */ + + /* Interrupt Routing */ + u8 pad_0x188_0x200[0x200 - 0x188]; /* 0x188 */ + + /* Atomic Unit Control Area */ + u64 mfc_atomic_flush_RW; /* 0x200 */ +#define mfc_atomic_flush_enable 0x1L + u8 pad_0x208_0x280[0x78]; /* 0x208 */ + u64 resource_allocation_groupID_RW; /* 0x280 */ + u64 resource_allocation_enable_RW; /* 0x288 */ + u8 pad_0x290_0x380[0x380 - 0x290]; /* 0x290 */ + + /* MFC Fault Isolation Area */ + /* mfc_fir_R: MFC Fault Isolation Register. + * mfc_fir_status_or_W: MFC Fault Isolation Status OR Register. + * mfc_fir_status_and_W: MFC Fault Isolation Status AND Register. + * mfc_fir_mask_R: MFC FIR Mask Register. + * mfc_fir_mask_or_W: MFC FIR Mask OR Register. + * mfc_fir_mask_and_W: MFC FIR Mask AND Register. + * mfc_fir_chkstp_enable_W: MFC FIR Checkstop Enable Register. + */ + u64 mfc_fir_R; /* 0x380 */ + u64 mfc_fir_status_or_W; /* 0x388 */ + u64 mfc_fir_status_and_W; /* 0x390 */ + u64 mfc_fir_mask_R; /* 0x398 */ + u64 mfc_fir_mask_or_W; /* 0x3a0 */ + u64 mfc_fir_mask_and_W; /* 0x3a8 */ + u64 mfc_fir_chkstp_enable_RW; /* 0x3b0 */ + u8 pad_0x3b8_0x3c8[0x3c8 - 0x3b8]; /* 0x3b8 */ + + /* SPU_Cache_ImplRegs: Implementation-dependent cache registers */ + + u64 smf_sbi_signal_sel; /* 0x3c8 */ +#define smf_sbi_mask_lsb 56 +#define smf_sbi_shift (63 - smf_sbi_mask_lsb) +#define smf_sbi_mask (0x301LL << smf_sbi_shift) +#define smf_sbi_bus0_bits (0x001LL << smf_sbi_shift) +#define smf_sbi_bus2_bits (0x100LL << smf_sbi_shift) +#define smf_sbi2_bus0_bits (0x201LL << smf_sbi_shift) +#define smf_sbi2_bus2_bits (0x300LL << smf_sbi_shift) + u64 smf_ato_signal_sel; /* 0x3d0 */ +#define smf_ato_mask_lsb 35 +#define smf_ato_shift (63 - smf_ato_mask_lsb) +#define smf_ato_mask (0x3LL << smf_ato_shift) +#define smf_ato_bus0_bits (0x2LL << smf_ato_shift) +#define smf_ato_bus2_bits (0x1LL << 
smf_ato_shift) + u8 pad_0x3d8_0x400[0x400 - 0x3d8]; /* 0x3d8 */ + + /* TLB Management Registers */ + u64 mfc_sdr_RW; /* 0x400 */ + u8 pad_0x408_0x500[0xf8]; /* 0x408 */ + u64 tlb_index_hint_RO; /* 0x500 */ + u64 tlb_index_W; /* 0x508 */ + u64 tlb_vpn_RW; /* 0x510 */ + u64 tlb_rpn_RW; /* 0x518 */ + u8 pad_0x520_0x540[0x20]; /* 0x520 */ + u64 tlb_invalidate_entry_W; /* 0x540 */ + u64 tlb_invalidate_all_W; /* 0x548 */ + u8 pad_0x550_0x580[0x580 - 0x550]; /* 0x550 */ + + /* SPU_MMU_ImplRegs: Implementation-dependent MMU registers */ + u64 smm_hid; /* 0x580 */ +#define PAGE_SIZE_MASK 0xf000000000000000ull +#define PAGE_SIZE_16MB_64KB 0x2000000000000000ull + u8 pad_0x588_0x600[0x600 - 0x588]; /* 0x588 */ + + /* MFC Status/Control Area */ + u64 mfc_accr_RW; /* 0x600 */ +#define MFC_ACCR_EA_ACCESS_GET (1 << 0) +#define MFC_ACCR_EA_ACCESS_PUT (1 << 1) +#define MFC_ACCR_LS_ACCESS_GET (1 << 3) +#define MFC_ACCR_LS_ACCESS_PUT (1 << 4) + u8 pad_0x608_0x610[0x8]; /* 0x608 */ + u64 mfc_dsisr_RW; /* 0x610 */ +#define MFC_DSISR_PTE_NOT_FOUND (1 << 30) +#define MFC_DSISR_ACCESS_DENIED (1 << 27) +#define MFC_DSISR_ATOMIC (1 << 26) +#define MFC_DSISR_ACCESS_PUT (1 << 25) +#define MFC_DSISR_ADDR_MATCH (1 << 22) +#define MFC_DSISR_LS (1 << 17) +#define MFC_DSISR_L (1 << 16) +#define MFC_DSISR_ADDRESS_OVERFLOW (1 << 0) + u8 pad_0x618_0x620[0x8]; /* 0x618 */ + u64 mfc_dar_RW; /* 0x620 */ + u8 pad_0x628_0x700[0x700 - 0x628]; /* 0x628 */ + + /* Replacement Management Table (RMT) Area */ + u64 rmt_index_RW; /* 0x700 */ + u8 pad_0x708_0x710[0x8]; /* 0x708 */ + u64 rmt_data1_RW; /* 0x710 */ + u8 pad_0x718_0x800[0x800 - 0x718]; /* 0x718 */ + + /* Control/Configuration Registers */ + u64 mfc_dsir_R; /* 0x800 */ +#define MFC_DSIR_Q (1 << 31) +#define MFC_DSIR_SPU_QUEUE MFC_DSIR_Q + u64 mfc_lsacr_RW; /* 0x808 */ +#define MFC_LSACR_COMPARE_MASK ((~0ull) << 32) +#define MFC_LSACR_COMPARE_ADDR ((~0ull) >> 32) + u64 mfc_lscrr_R; /* 0x810 */ +#define MFC_LSCRR_Q (1 << 31) +#define MFC_LSCRR_SPU_QUEUE 
MFC_LSCRR_Q +#define MFC_LSCRR_QI_SHIFT 32 +#define MFC_LSCRR_QI_MASK ((~0ull) << MFC_LSCRR_QI_SHIFT) + u8 pad_0x818_0x900[0x900 - 0x818]; /* 0x818 */ + + /* Real Mode Support Registers */ + u64 mfc_rm_boundary; /* 0x900 */ + u8 pad_0x908_0x938[0x30]; /* 0x908 */ + u64 smf_dma_signal_sel; /* 0x938 */ +#define mfc_dma1_mask_lsb 41 +#define mfc_dma1_shift (63 - mfc_dma1_mask_lsb) +#define mfc_dma1_mask (0x3LL << mfc_dma1_shift) +#define mfc_dma1_bits (0x1LL << mfc_dma1_shift) +#define mfc_dma2_mask_lsb 43 +#define mfc_dma2_shift (63 - mfc_dma2_mask_lsb) +#define mfc_dma2_mask (0x3LL << mfc_dma2_shift) +#define mfc_dma2_bits (0x1LL << mfc_dma2_shift) + u8 pad_0x940_0xa38[0xf8]; /* 0x940 */ + u64 smm_signal_sel; /* 0xa38 */ +#define smm_sig_mask_lsb 12 +#define smm_sig_shift (63 - smm_sig_mask_lsb) +#define smm_sig_mask (0x3LL << smm_sig_shift) +#define smm_sig_bus0_bits (0x2LL << smm_sig_shift) +#define smm_sig_bus2_bits (0x1LL << smm_sig_shift) + u8 pad_0xa40_0xc00[0xc00 - 0xa40]; /* 0xa40 */ + + /* DMA Command Error Area */ + u64 mfc_cer_R; /* 0xc00 */ +#define MFC_CER_Q (1 << 31) +#define MFC_CER_SPU_QUEUE MFC_CER_Q + u8 pad_0xc08_0x1000[0x1000 - 0xc08]; /* 0xc08 */ + + /* PV1_ImplRegs: Implementation-dependent privileged-state 1 regs */ + /* DMA Command Error Area */ + u64 spu_ecc_cntl_RW; /* 0x1000 */ +#define SPU_ECC_CNTL_E (1ull << 0ull) +#define SPU_ECC_CNTL_ENABLE SPU_ECC_CNTL_E +#define SPU_ECC_CNTL_DISABLE (~SPU_ECC_CNTL_E & 1L) +#define SPU_ECC_CNTL_S (1ull << 1ull) +#define SPU_ECC_STOP_AFTER_ERROR SPU_ECC_CNTL_S +#define SPU_ECC_CONTINUE_AFTER_ERROR (~SPU_ECC_CNTL_S & 2L) +#define SPU_ECC_CNTL_B (1ull << 2ull) +#define SPU_ECC_BACKGROUND_ENABLE SPU_ECC_CNTL_B +#define SPU_ECC_BACKGROUND_DISABLE (~SPU_ECC_CNTL_B & 4L) +#define SPU_ECC_CNTL_I_SHIFT 3ull +#define SPU_ECC_CNTL_I_MASK (3ull << SPU_ECC_CNTL_I_SHIFT) +#define SPU_ECC_WRITE_ALWAYS (~SPU_ECC_CNTL_I & 12L) +#define SPU_ECC_WRITE_CORRECTABLE (1ull << SPU_ECC_CNTL_I_SHIFT) +#define 
SPU_ECC_WRITE_UNCORRECTABLE (3ull << SPU_ECC_CNTL_I_SHIFT) +#define SPU_ECC_CNTL_D (1ull << 5ull) +#define SPU_ECC_DETECTION_ENABLE SPU_ECC_CNTL_D +#define SPU_ECC_DETECTION_DISABLE (~SPU_ECC_CNTL_D & 32L) + u64 spu_ecc_stat_RW; /* 0x1008 */ +#define SPU_ECC_CORRECTED_ERROR (1ull << 0ul) +#define SPU_ECC_UNCORRECTED_ERROR (1ull << 1ul) +#define SPU_ECC_SCRUB_COMPLETE (1ull << 2ul) +#define SPU_ECC_SCRUB_IN_PROGRESS (1ull << 3ul) +#define SPU_ECC_INSTRUCTION_ERROR (1ull << 4ul) +#define SPU_ECC_DATA_ERROR (1ull << 5ul) +#define SPU_ECC_DMA_ERROR (1ull << 6ul) +#define SPU_ECC_STATUS_CNT_MASK (256ull << 8) + u64 spu_ecc_addr_RW; /* 0x1010 */ + u64 spu_err_mask_RW; /* 0x1018 */ +#define SPU_ERR_ILLEGAL_INSTR (1ull << 0ul) +#define SPU_ERR_ILLEGAL_CHANNEL (1ull << 1ul) + u8 pad_0x1020_0x1028[0x1028 - 0x1020]; /* 0x1020 */ + + /* SPU Debug-Trace Bus (DTB) Selection Registers */ + u64 spu_trig0_sel; /* 0x1028 */ + u64 spu_trig1_sel; /* 0x1030 */ + u64 spu_trig2_sel; /* 0x1038 */ + u64 spu_trig3_sel; /* 0x1040 */ + u64 spu_trace_sel; /* 0x1048 */ +#define spu_trace_sel_mask 0x1f1fLL +#define spu_trace_sel_bus0_bits 0x1000LL +#define spu_trace_sel_bus2_bits 0x0010LL + u64 spu_event0_sel; /* 0x1050 */ + u64 spu_event1_sel; /* 0x1058 */ + u64 spu_event2_sel; /* 0x1060 */ + u64 spu_event3_sel; /* 0x1068 */ + u64 spu_trace_cntl; /* 0x1070 */ +} __attribute__ ((aligned(0x2000))); + +#endif --- linux-2.6-ppc.orig/mm/memory.c 2005-04-01 12:39:56.630979688 -0500 +++ linux-2.6-ppc/mm/memory.c 2005-04-01 12:53:32.111951736 -0500 @@ -2108,6 +2108,7 @@ unsigned long vmalloc_to_pfn(void * vmal { return page_to_pfn(vmalloc_to_page(vmalloc_addr)); } +EXPORT_SYMBOL_GPL(handle_mm_fault); EXPORT_SYMBOL(vmalloc_to_pfn); From olof at austin.ibm.com Fri Apr 29 00:05:58 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Thu, 28 Apr 2005 09:05:58 -0500 Subject: [PATCH 3/4] ppc64: Add driver for BPA iommu In-Reply-To: <200504280813.j3S8DNLc019256@post.webmailer.de> References: 
<200504190318.32556.arnd@arndb.de> <200504280813.j3S8DNLc019256@post.webmailer.de> Message-ID: <20050428140558.GB1023@austin.ibm.com> Hi, Some comments below. On Thu, Apr 28, 2005 at 09:59:26AM +0200, Arnd Bergmann wrote: > Index: linus-2.5/arch/ppc64/kernel/bpa_iommu.c > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ linus-2.5/arch/ppc64/kernel/bpa_iommu.c 2005-04-22 07:01:39.000000000 +0200 > @@ -0,0 +1,433 @@ > +/* some constants */ > +enum { > + /* segment table entries */ [...] > +}; Hmm. I thought the benefit of enum was to be able to do type checking later on if it's a typed enum. Here you mix different definitions in the same large untyped enum declaration. Can they be moved to a bpa_iommu.h file and #defined instead? > +/* cause link error for invalid use */ > +extern unsigned long __ioc_invalid_page_size; [...] > + default: /* not a known compile time constant */ > + ps = __ioc_invalid_page_size; > + break; > + } Why do we need to detect this at link time? > + nnpt++; /* XXX is this right? */ Well, does it work? :-) > + return (ioste) { > + .val = IOST_VALID_MASK > + | (iostep & IOST_PT_BASE_MASK) > + | ((nnpt << 5) & IOST_NNPT_MASK) > + | (ps & IOST_PS_MASK) > + }; Can you create a mk_ioste() inline instead of doing this construct? > +static inline unsigned long > +get_ioptep(ioste iost_entry, unsigned long io_address) > +{ > + unsigned long iopt_base; > + unsigned long ps; > + unsigned long iopt_offset; > + > + iopt_base = iost_entry.val & IOST_PT_BASE_MASK; > + ps = iost_entry.val & IOST_PS_MASK; > + > + iopt_offset = ((io_address & 0x0fffffff) >> (7 + 2 * ps)) & 0x7fff8ul; Magic. Can we get it explained either by defines instead of constants or by a comment? 
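The offset expression flagged as magic above can be unpacked into named steps. The following is a minimal userspace sketch, not code from the patch. It assumes the page-size encoding from the bpa_iommu.h posted later in this thread (IOST_PS_4K = 1, IOST_PS_64K = 3, IOST_PS_1M = 5, IOST_PS_16M = 7, so a page is 2^(10 + 2*ps) bytes), 8-byte IOPT entries and 256 MB segments. The function names iopt_offset_magic() and iopt_offset_explained() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Page-size codes as in bpa_iommu.h later in this thread (assumed here). */
enum { PS_4K = 1, PS_64K = 3, PS_1M = 5, PS_16M = 7 };

/* The one-liner as posted in get_ioptep(). */
static uint64_t iopt_offset_magic(uint64_t io_address, unsigned ps)
{
	return ((io_address & 0x0fffffff) >> (7 + 2 * ps)) & 0x7fff8ul;
}

/* The same computation, one step at a time. */
static uint64_t iopt_offset_explained(uint64_t io_address, unsigned ps)
{
	assert(ps == PS_4K || ps == PS_64K || ps == PS_1M || ps == PS_16M);

	/* Offset inside the 256MB segment: the low 28 address bits. */
	uint64_t seg_offset = io_address & 0x0fffffff;
	/* A page is 2^(10 + 2*ps) bytes: 4K, 64K, 1M or 16M. */
	uint64_t page_number = seg_offset >> (10 + 2 * ps);
	/* Each IOPT entry is 8 bytes wide. */
	uint64_t byte_offset = page_number << 3;
	/* page_number < 0x10000, so this mask never changes the value here;
	 * it is kept only to mirror the posted expression. */
	return byte_offset & 0x7fff8ul;
}
```

Shifting by 7 + 2*ps instead of 10 + 2*ps is the page-number shift minus 3, which folds the multiply-by-8 into the shift; in the one-liner the mask then clears the three low bits that this leaves behind. The updated get_ioptep() later in the thread splits the expression the same way.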
> +/* compute the hashed 6 bit index for the 4-way associative pte cache */ > +static inline unsigned long > +get_ioc_hash(ioste iost_entry, unsigned long io_address) > +{ > + unsigned long iopte = get_ioptep(iost_entry, io_address); > + > + return ((iopte & 0x000000000000001f8ul) >> 3) > + ^ ((iopte & 0x00000000000020000ul) >> 17) > + ^ ((iopte & 0x00000000000010000ul) >> 15) > + ^ ((iopte & 0x00000000000008000ul) >> 13) > + ^ ((iopte & 0x00000000000004000ul) >> 11) > + ^ ((iopte & 0x00000000000002000ul) >> 9) > + ^ ((iopte & 0x00000000000001000ul) >> 7); Can't you reverse the subword by just doing one XOR instead of 6? That's what I did for the ext2 bitops on ppc64. See http://www.ussg.iu.edu/hypermail/linux/kernel/0408.2/1321.html > +static inline ioste > +get_iost_cache(void __iomem *base, unsigned long index) > +{ > + unsigned long __iomem *p = (base + IOC_ST_CACHE_DIR); > + return (ioste) { in_be64(&p[index]) }; mk_ioste() would be nice here too. > +#ifdef __KERNEL__ Are we ever not __KERNEL__? > +/* initialize the iommu to support a simple linear mapping */ > +static void bpa_map_iommu(void) > +{ [...] > + for (address = 0; address < 0x100000000ul; address += io_page_size) { This looks like way more than the 512MB DMA window you mentioned in the beginning. 
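The get_ioc_hash() body quoted in this review can be read as two 6-bit fields of the IOPTE address combined by XOR: bits 3..8 taken as they are, and bits 12..17 in reversed order (bit 17 lands in index bit 0, bit 12 in index bit 5). Reversing bit order is not something a single XOR can express, which is what Olof concludes in his follow-up. A minimal sketch, with hypothetical helper names, that checks the posted expression against an explicit reverse-then-XOR form:

```c
#include <assert.h>
#include <stdint.h>

/* The expression from the patch, copied as posted. */
static uint64_t ioc_hash_patch(uint64_t iopte)
{
	return ((iopte & 0x000000000000001f8ul) >> 3)
	     ^ ((iopte & 0x00000000000020000ul) >> 17)
	     ^ ((iopte & 0x00000000000010000ul) >> 15)
	     ^ ((iopte & 0x00000000000008000ul) >> 13)
	     ^ ((iopte & 0x00000000000004000ul) >> 11)
	     ^ ((iopte & 0x00000000000002000ul) >> 9)
	     ^ ((iopte & 0x00000000000001000ul) >> 7);
}

/* Reverse the low 6 bits of v (bit 0 <-> 5, 1 <-> 4, 2 <-> 3). */
static uint64_t reverse6(uint64_t v)
{
	uint64_t r = 0;

	for (int i = 0; i < 6; i++)
		if (v & (1ull << i))
			r |= 1ull << (5 - i);
	return r;
}

/* Same hash, written as "low field XOR bit-reversed high field". */
static uint64_t ioc_hash_explained(uint64_t iopte)
{
	uint64_t low = (iopte >> 3) & 0x3f;   /* bits 3..8 */
	uint64_t high = (iopte >> 12) & 0x3f; /* bits 12..17 */

	return low ^ reverse6(high);
}
```

Since the hash reads only bits 3..8 and 12..17, comparing the two forms over all 2^18 values of the low 18 bits is an exhaustive check of the equivalence.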
-Olof From olof at austin.ibm.com Fri Apr 29 00:30:30 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Thu, 28 Apr 2005 09:30:30 -0500 Subject: [PATCH 3/4] ppc64: Add driver for BPA iommu In-Reply-To: <20050428140558.GB1023@austin.ibm.com> References: <200504190318.32556.arnd@arndb.de> <200504280813.j3S8DNLc019256@post.webmailer.de> <20050428140558.GB1023@austin.ibm.com> Message-ID: <20050428143030.GC1023@austin.ibm.com> On Thu, Apr 28, 2005 at 09:05:58AM -0500, Olof Johansson wrote: > > +/* compute the hashed 6 bit index for the 4-way associative pte cache */ > > +static inline unsigned long > > +get_ioc_hash(ioste iost_entry, unsigned long io_address) > > +{ > > + unsigned long iopte = get_ioptep(iost_entry, io_address); > > + > > + return ((iopte & 0x000000000000001f8ul) >> 3) > > + ^ ((iopte & 0x00000000000020000ul) >> 17) > > + ^ ((iopte & 0x00000000000010000ul) >> 15) > > + ^ ((iopte & 0x00000000000008000ul) >> 13) > > + ^ ((iopte & 0x00000000000004000ul) >> 11) > > + ^ ((iopte & 0x00000000000002000ul) >> 9) > > + ^ ((iopte & 0x00000000000001000ul) >> 7); > > Can't you reverse the subword by just doing one XOR instead of 6? Ugh, I wrote that before I had coffee. No you can't, you can just negate the value by doing the XOR. Nevermind. -Olof From sonny at burdell.org Fri Apr 29 00:44:15 2005 From: sonny at burdell.org (Sonny Rao) Date: Thu, 28 Apr 2005 10:44:15 -0400 Subject: [PATCH 0/4] ppc64: Introduce BPA platform In-Reply-To: <200504190318.32556.arnd@arndb.de> References: <200504190318.32556.arnd@arndb.de> Message-ID: <20050428144415.GA28779@kevlar.burdell.org> On Thu, Apr 28, 2005 at 09:54:00AM +0200, Arnd Bergmann wrote: > This series of patches add support for a fifth platform type in the > ppc64 architecture tree. The Broadband Processor Architecture (BPA) > is currently used in a single machine from IBM, with others likely > to be added at a later point. > > I already sent preparation patches before, these need to be applied > on top of them. 
> The first three patches add the actual platform code, which should > be usable for any BPA compatible implementation. > > The final patch introduces a new file system to make use of the > SPUs inside the processors. This patch is still in a prototype stage > and not intended for merging yet. > > Arnd <>< Is BPA the same thing (architecture) as the STI Cell Processor? Sonny From arnd at arndb.de Fri Apr 29 14:35:43 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 29 Apr 2005 06:35:43 +0200 Subject: [PATCH 3/4] ppc64: Add driver for BPA iommu In-Reply-To: <20050428140558.GB1023@austin.ibm.com> References: <200504190318.32556.arnd@arndb.de> <200504280813.j3S8DNLc019256@post.webmailer.de> <20050428140558.GB1023@austin.ibm.com> Message-ID: <200504290635.44965.arnd@arndb.de> On Dunnersdag 28 April 2005 16:05, Olof Johansson wrote: > On Thu, Apr 28, 2005 at 09:59:26AM +0200, Arnd Bergmann wrote: > > +/* some constants */ > > +enum { > > + /* segment table entries */ > [...] > > +}; > > Hmm. I thought the benefit of enum was to be able to do type checking > later on if it's a typed enum. Here you mix different definitions in > the same large untyped enum declaration. Can they be moved to a > bpa_iommu.h file and #defined instead? I prefer to avoid macros altogether, and this is one of the ways to do it. We have had the discussion about how to define constants a few times before on lkml without reaching a conclusion. > > +/* cause link error for invalid use */ > > +extern unsigned long __ioc_invalid_page_size; > [...] > > + default: /* not a known compile time constant */ > > + ps = __ioc_invalid_page_size; > > + break; > > + } > > Why do we need to detect this at link time? I want to avoid doing BUG() or something similar, so I try to detect a user error as early as possible. > > + nnpt++; /* XXX is this right? */ > > Well, does it work? :-) Yes, but it seems to contradict the specs... 
> > + return (ioste) { > > + .val = IOST_VALID_MASK > > + | (iostep & IOST_PT_BASE_MASK) > > + | ((nnpt << 5) & IOST_NNPT_MASK) > > + | (ps & IOST_PS_MASK) > > + }; > > Can you create a mk_ioste() inline instead of doing this construct? ok. > > +static inline unsigned long > > +get_ioptep(ioste iost_entry, unsigned long io_address) > > +{ > > + unsigned long iopt_base; > > + unsigned long ps; > > + unsigned long iopt_offset; > > + > > + iopt_base = iost_entry.val & IOST_PT_BASE_MASK; > > + ps = iost_entry.val & IOST_PS_MASK; > > + > > + iopt_offset = ((io_address & 0x0fffffff) >> (7 + 2 * ps)) & 0x7fff8ul; > > Magic. Can we get it explained either by defines instead of constants > or by a comment? This comes from a graphical representation in the specs. I'll add a comment to point to that image. > > +static inline ioste > > +get_iost_cache(void __iomem *base, unsigned long index) > > +{ > > + unsigned long __iomem *p = (base + IOC_ST_CACHE_DIR); > > + return (ioste) { in_be64(&p[index]) }; > > mk_ioste() would be nice here too. ok. > > +#ifdef __KERNEL__ > > Are we ever not __KERNEL__? Sorry, I thought I had removed that code already. This does not belong there. > > +/* initialize the iommu to support a simple linear mapping */ > > +static void bpa_map_iommu(void) > > +{ > [...] > > + for (address = 0; address < 0x100000000ul; address += io_page_size) { > > This looks like way more than the 512MB DMA window you mentioned in the > beginning. True. This is probably a bug in the comment. The code will change as soon as the firmware provides the correct dma-window properties. 
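The size mismatch between the loop bound and the 512MB window is easy to quantify. A minimal sketch, assuming the io_page_size of 0x1000000 (16 MB) and the 256 MB segment test `(address & 0xfffffff) == 0` used by bpa_map_iommu() in the updated patch; count_iopt_writes() and struct map_cost are hypothetical names that only mirror the structure of that loop.

```c
#include <assert.h>
#include <stdint.h>

#define IO_PAGE_SIZE 0x1000000ull  /* 16MB, io_page_size in the patch */
#define SEGMENT_SIZE 0x10000000ull /* 256MB, one IOST entry each */

struct map_cost {
	uint64_t iopt_writes; /* set_iopt_cache() calls */
	uint64_t iost_writes; /* set_iost_cache() calls */
};

/* One IOPT cache entry per IO page, plus one IOST entry whenever
 * the address starts a new segment. */
static struct map_cost count_iopt_writes(uint64_t window_end)
{
	struct map_cost c = { 0, 0 };

	for (uint64_t address = 0; address < window_end; address += IO_PAGE_SIZE) {
		if ((address & (SEGMENT_SIZE - 1)) == 0)
			c.iost_writes++;
		c.iopt_writes++;
	}
	return c;
}
```

With the 4GB bound as posted the loop programs 256 IOPT entries in 16 segments; a 512MB window would need 32 entries in 2 segments.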
Arnd <>< From olof at lixom.net Fri Apr 29 15:36:15 2005 From: olof at lixom.net (Olof Johansson) Date: Fri, 29 Apr 2005 00:36:15 -0500 Subject: [PATCH 3/4] ppc64: Add driver for BPA iommu In-Reply-To: <200504290635.44965.arnd@arndb.de> References: <200504190318.32556.arnd@arndb.de> <200504280813.j3S8DNLc019256@post.webmailer.de> <20050428140558.GB1023@austin.ibm.com> <200504290635.44965.arnd@arndb.de> Message-ID: <20050429053615.GA30219@lixom.net> On Fri, Apr 29, 2005 at 06:35:43AM +0200, Arnd Bergmann wrote: > On Dunnersdag 28 April 2005 16:05, Olof Johansson wrote: > > > On Thu, Apr 28, 2005 at 09:59:26AM +0200, Arnd Bergmann wrote: > > > +/* some constants */ > > > +enum { > > > + /* segment table entries */ > > [...] > > > +}; > > > > Hmm. I thought the benefit of enum was to be able to do type checking > > later on if it's a typed enum. Here you mix different definitions in > > the same large untyped enum declaration. Can they be moved to a > > bpa_iommu.h file and #defined instead? > > I prefer to avoid macros altogether, and this is one of the ways to > do it. We have had the discussion about how to define constants > a few times before on lkml without reaching a conclusion. Most of arch/ppc64/* uses #defines, for whatever that's worth. Still, CodingStyle seems to recommend enums for "related constants", I assume it's so they can be typed. I don't care enough either way to argue it further. Anyhow, enum or #define, it should be moved to bpa_iommu.h. > > Why do we need to detect this at link time? > > I want to avoid doing BUG() or something similar, so I > try to detect a user error as early as possible. User or developer error? I thought it was a developer one, and a quite specialized one at that. Either way, there's already a primitive that can be used instead of making your own: BUILD_BUG_ON(). > > > + nnpt++; /* XXX is this right? */ > > > > Well, does it work? :-) > > Yes, but it seems to contradict the specs... A comment to that effect could be nice. 
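BUILD_BUG_ON(), as suggested here, rejects a bad constant at compile time instead of at link time. A minimal userspace sketch modelled on the negative-array-size trick of the 2.6-era kernel macro; MY_BUILD_BUG_ON and PAGE_SIZE_CODE are hypothetical names chosen to avoid clashing with the real macro, and the page-size codes are the IOST_PS_* values from the later bpa_iommu.h.

```c
#include <assert.h>

/*
 * Compile-time assertion in the style of the kernel's BUILD_BUG_ON():
 * a true constant condition gives char[-1], which the compiler must
 * reject.
 */
#define MY_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Page-size codes, matching IOST_PS_* in the later bpa_iommu.h. */
enum { PS_4K = 1, PS_64K = 3, PS_1M = 5, PS_16M = 7 };

/*
 * Translate a compile-time constant page size to its code. An
 * unsupported constant breaks the build at the offending call site.
 * With a non-constant argument the array type becomes a variable-length
 * array and the compiler no longer reports anything, so this is no
 * replacement for a runtime check.
 */
#define PAGE_SIZE_CODE(sz)						\
	(MY_BUILD_BUG_ON((sz) != 0x1000 && (sz) != 0x10000 &&		\
			 (sz) != 0x100000 && (sz) != 0x1000000),	\
	 (sz) == 0x1000 ? PS_4K : (sz) == 0x10000 ? PS_64K :		\
	 (sz) == 0x100000 ? PS_1M : PS_16M)
```

One property worth keeping in mind: MY_BUILD_BUG_ON(1) on its own, for example in a switch's default: branch, fails every build, because the compiler checks the array size whether or not that branch can ever run. The condition itself has to carry the information about which values are invalid.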
> > > +static inline unsigned long > > > +get_ioptep(ioste iost_entry, unsigned long io_address) > > > +{ > > > + unsigned long iopt_base; > > > + unsigned long ps; > > > + unsigned long iopt_offset; > > > + > > > + iopt_base = iost_entry.val & IOST_PT_BASE_MASK; > > > + ps = iost_entry.val & IOST_PS_MASK; > > > + > > > + iopt_offset = ((io_address & 0x0fffffff) >> (7 + 2 * ps)) & 0x7fff8ul; > > > > Magic. Can we get it explained either by defines instead of constants > > or by a comment? > > This comes from a graphical representation in the specs. I'll add a comment > to point to that image. I guess it'd be a bit much information to just add in a comment, but for readability that's probably the best way to go. Not many people have the specs, but on the other hand if you're messing around with this code then chances are you have them. I don't know what the status is for a release of public specifications, but if they're not available then people will be looking to learn from the implementation and the documentation around it instead. -Olof From arnd at arndb.de Fri Apr 29 15:48:27 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 29 Apr 2005 07:48:27 +0200 Subject: [PATCH 3/4] ppc64: Add driver for BPA iommu In-Reply-To: <20050429053615.GA30219@lixom.net> References: <200504190318.32556.arnd@arndb.de> <200504290635.44965.arnd@arndb.de> <20050429053615.GA30219@lixom.net> Message-ID: <200504290748.30055.arnd@arndb.de> On Freedag 29 April 2005 07:36, Olof Johansson wrote: > On Fri, Apr 29, 2005 at 06:35:43AM +0200, Arnd Bergmann wrote: >> > Anyhow, enum or #define, it should be moved to bpa_iommu.h. I don't interface with any other files, so I prefer to have everything in one file here. If there is anyone else who agrees with you on moving this into a separate file, I don't mind doing that. > User or developer error? I thought it was a developer one, and a quite > specialized one at that. 
Either way, there's already a primitive that > can be used instead of making your own: BUILD_BUG_ON(). Right. I had forgotten about that, thanks. > > Yes, but it seems to contradict the specs... > > A comment to that effect could be nice. Ok, I'll change that. > > This comes from a graphical representation in the specs. I'll add a comment > > to point to that image. > > I guess it'd be a bit much information to just add in a comment, but for > readability that's probably the best way to go. Not many people have > the specs, but on the other hand if you're messing around with this code > then chances are you have them. The picture should be in the parts that we are working on getting public. > I don't know what the status is for a release of public specifications, > but if they're not available then people will be looking to learn from > the implementation and the documentation around it instead. There will be a supplement to Book 1-3 of PowerPC AS that is going to be public. Book 4 is rarely needed and the chances of publishing that are not as good. Thanks, Arnd <>< From arnd at arndb.de Fri Apr 29 14:35:28 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 29 Apr 2005 06:35:28 +0200 Subject: [PATCH 0/4] ppc64: Introduce BPA platform In-Reply-To: <20050428144415.GA28779@kevlar.burdell.org> References: <200504190318.32556.arnd@arndb.de> <20050428144415.GA28779@kevlar.burdell.org> Message-ID: <200504290635.29604.arnd@arndb.de> On Dunnersdag 28 April 2005 16:44, Sonny Rao wrote: > Is BPA the same thing (architecture) as the STI Cell Processor? Yes, that is it. BPA is what Cell is called in the official specification. 
Arnd <><

From benh at kernel.crashing.org Fri Apr 29 16:43:04 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 29 Apr 2005 16:43:04 +1000
Subject: [PATCH 3/4] ppc64: Add driver for BPA iommu
In-Reply-To: <200504290748.30055.arnd@arndb.de>
References: <200504190318.32556.arnd@arndb.de> <200504290635.44965.arnd@arndb.de>
 <20050429053615.GA30219@lixom.net> <200504290748.30055.arnd@arndb.de>
Message-ID: <1114756984.7111.268.camel@gaston>

On Fri, 2005-04-29 at 07:48 +0200, Arnd Bergmann wrote:
> On Freedag 29 April 2005 07:36, Olof Johansson wrote:
> > On Fri, Apr 29, 2005 at 06:35:43AM +0200, Arnd Bergmann wrote:
> >>
> > Anyhow, enum or #define, it should be moved to bpa_iommu.h.
>
> I don't interface with any other files, so I prefer to have
> everything in one file here. If there is anyone else who
> agrees with you on moving this into a separate file, I don't
> mind doing that.

The habit here is for such "private" .h to be next to the .c, while a
shared one goes in include/asm-* or include/linux.

Ben.

From olof at lixom.net Fri Apr 29 17:07:52 2005
From: olof at lixom.net (Olof Johansson)
Date: Fri, 29 Apr 2005 02:07:52 -0500
Subject: [PATCH] PPC64: Remove hot busy-wait loop in __hash_page
Message-ID: <20050429070752.GA30785@lixom.net>

Hi,

It turns out that our current __hash_page code will do a very hot
busy-wait loop waiting on _PAGE_BUSY to be cleared. It even does
ldarx/stdcx in the loop, which will bounce reservations around like
crazy if there's more than one CPU spinning on the same PTE (or even
another PTE in the same reservation granule).

The end result is that each fault takes longer when there's contention,
which in turn increases the chance of another thread hitting the same
fault and also piling up. Not pretty.

There are two options here:

1. Do an out-of-line busy loop à la spinlocks with just loads (no reserves)
2. Just bail and refault if needed.
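The two options can be contrasted in portable C. This is a minimal sketch using C11 atomics rather than the ppc64 ldarx/stdcx. assembly; the PTE_BUSY and PTE_HASHED bit values and all function names are hypothetical and do not reflect the ppc64 PTE layout. Option 1 waits with plain loads only and retries the compare-and-swap once the bit clears; option 2, the approach this patch takes, makes one attempt and reports failure so the caller can return and let the access fault again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE flag bits, for illustration only. */
#define PTE_BUSY   0x1ull
#define PTE_HASHED 0x2ull

/* Option 2: one attempt to set BUSY. Returns false if another CPU
 * already holds the PTE, so the caller can bail out and refault. */
static bool try_claim_pte(_Atomic uint64_t *pte)
{
	uint64_t old = atomic_load_explicit(pte, memory_order_relaxed);

	do {
		if (old & PTE_BUSY)
			return false;
	} while (!atomic_compare_exchange_weak_explicit(pte, &old,
			old | PTE_BUSY, memory_order_acquire,
			memory_order_relaxed));
	return true;
}

/* Option 1: spin with plain loads (no atomic read-modify-write while
 * the bit is set), then retry the claim. */
static void claim_pte_spin(_Atomic uint64_t *pte)
{
	for (;;) {
		while (atomic_load_explicit(pte, memory_order_relaxed) & PTE_BUSY)
			; /* wait without exclusive-access traffic */
		if (try_claim_pte(pte))
			return;
	}
}

/* Only the BUSY owner writes the PTE, so one release store both
 * publishes its update and drops BUSY. */
static void release_pte(_Atomic uint64_t *pte, uint64_t set_bits)
{
	uint64_t cur = atomic_load_explicit(pte, memory_order_relaxed);

	atomic_store_explicit(pte, (cur | set_bits) & ~PTE_BUSY,
			      memory_order_release);
}
```

Under contention, try_claim_pte() returns immediately instead of queueing behind the owner, which matches the reasoning that a busy PTE is likely to be hashed by whoever holds it.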
(2) makes sense here: If the PTE is busy, chances are it's in flux
anyway and the other code path making a change might just be ready
to hash it.

This fixes a stampede seen on a large-ish system where a multithreaded
HPC app faults in the same text pages on several cpus at the same time.

Signed-off-by: Olof Johansson

Index: 2.6/arch/ppc64/mm/hash_low.S
===================================================================
--- 2.6.orig/arch/ppc64/mm/hash_low.S	2005-04-24 17:49:43.000000000 -0500
+++ 2.6/arch/ppc64/mm/hash_low.S	2005-04-28 17:19:04.000000000 -0500
@@ -85,7 +85,10 @@ _GLOBAL(__hash_page)
 	bne-	htab_wrong_access
 	/* Check if PTE is busy */
 	andi.	r0,r31,_PAGE_BUSY
-	bne-	1b
+	/* If so, just bail out and refault if needed. Someone else
+	 * is changing this PTE anyway and might hash it.
+	 */
+	bne-	bail_ok
 
 	/* Prepare new PTE value (turn access RW into DIRTY, then
 	 * add BUSY,HASHPTE and ACCESSED) */
@@ -215,6 +218,10 @@ _GLOBAL(htab_call_hpte_remove)
 	/* Try all again */
 	b	htab_insert_pte
 
+bail_ok:
+	li	r3,0
+	b	bail
+
 htab_pte_insert_ok:
 	/* Insert slot number & secondary bit in PTE */
 	rldimi	r30,r3,12,63-15

From arnd at arndb.de Fri Apr 29 18:31:57 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Fri, 29 Apr 2005 10:31:57 +0200
Subject: [PATCH 3/4] ppc64: Add driver for BPA iommu
In-Reply-To: <1114756984.7111.268.camel@gaston>
References: <200504190318.32556.arnd@arndb.de> <200504290748.30055.arnd@arndb.de>
 <1114756984.7111.268.camel@gaston>
Message-ID: <200504291031.59223.arnd@arndb.de>

On Freedag 29 April 2005 08:43, Benjamin Herrenschmidt wrote:
> The habit here is for such "private" .h to be next to the .c, while a
> shared one goes in include/asm-* or include/linux.

Ok, here comes an updated version, which should integrate changes for
all comments I got so far.

Implementation of software load support for the BE iommu. This is very
different from other iommu code on ppc64, since we only do a static
mapping.
The mapping is currently hardcoded but should really be read from the firmware, but they don't set up the device nodes yet. Software load is ok as long as the DMA windows all fit below 4GB. Currently, there is a single 512MB DMA window for PCI, USB and ethernet at 0x20000000 for our RAM. No machine we have has more than 512 MB anyway, but we might want to change the firmware to use separate DMA windows for each device. Signed-off-by: Arnd Bergmann Index: linus-2.5/arch/ppc64/kernel/Makefile =================================================================== --- linus-2.5.orig/arch/ppc64/kernel/Makefile 2005-04-22 07:01:07.000000000 +0200 +++ linus-2.5/arch/ppc64/kernel/Makefile 2005-04-29 10:01:44.000000000 +0200 @@ -34,7 +34,8 @@ pSeries_nvram.o rtasd.o ras.o pSeries_reconfig.o \ pSeries_setup.o pSeries_iommu.o -obj-$(CONFIG_PPC_BPA) += bpa_setup.o bpa_nvram.o bpa_iic.o spider-pic.o +obj-$(CONFIG_PPC_BPA) += bpa_setup.o bpa_iommu.o bpa_nvram.o \ + bpa_iic.o spider-pic.o obj-$(CONFIG_EEH) += eeh.o obj-$(CONFIG_PROC_FS) += proc_ppc64.o Index: linus-2.5/arch/ppc64/kernel/bpa_iommu.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linus-2.5/arch/ppc64/kernel/bpa_iommu.c 2005-04-29 10:24:03.000000000 +0200 @@ -0,0 +1,377 @@ +/* + * IOMMU implementation for Broadband Processor Architecture + * We just establish a linear mapping at boot by setting all the + * IOPT cache entries in the CPU. + * The mapping functions should be identical to pci_direct_iommu, + * except for the handling of the high order bit that is required + * by the Spider bridge. These should be split into a separate + * file at the point where we get a different bridge chip. 
+ * + * Copyright (C) 2005 IBM Deutschland Entwicklung GmbH, + * Arnd Bergmann + * + * Based on linear mapping + * Copyright (C) 2003 Benjamin Herrenschmidt (benh at kernel.crashing.org) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#undef DEBUG + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "pci.h" +#include "bpa_iommu.h" + +static inline unsigned long +get_iopt_entry(unsigned long real_address, unsigned long ioid, + unsigned long prot) +{ + return (prot & IOPT_PROT_MASK) + | (IOPT_COHERENT) + | (IOPT_ORDER_VC) + | (real_address & IOPT_RPN_MASK) + | (ioid & IOPT_IOID_MASK); +} + +typedef struct { + unsigned long val; +} ioste; + +static inline ioste +mk_ioste(unsigned long val) +{ + ioste ioste = { .val = val, }; + return ioste; +} + +static inline ioste +get_iost_entry(unsigned long iopt_base, unsigned long io_address, unsigned page_size) +{ + unsigned long ps; + unsigned long iostep; + unsigned long nnpt; + unsigned long shift; + + switch (page_size) { + case 0x1000000: + ps = IOST_PS_16M; + nnpt = 0; /* one page per segment */ + shift = 5; /* segment has 16 iopt entries */ + break; + + case 0x100000: + ps = IOST_PS_1M; + nnpt = 0; /* one page per segment */ + shift = 1; /* segment has 256 iopt entries */ + break; + + case 0x10000: + ps = IOST_PS_64K; + nnpt = 0x07; /* 8 pages per io page table */ + shift = 0; /* all entries are used */ + break; + + case 0x1000: + ps = IOST_PS_4K; + nnpt = 0x7f; /* 128 pages per io page table */ + shift = 0; /* all entries are used */ + break; + + default: /* not a known compile time constant */ + BUILD_BUG_ON(1); + break; + } + + iostep = iopt_base + + /* need 8 bytes per 
iopte */ + (((io_address / page_size * 8) + /* align io page tables on 4k page boundaries */ + << shift) + /* nnpt+1 pages go into each iopt */ + & ~(nnpt << 12)); + + nnpt++; /* this seems to work, but the documentation is not clear + about wether we put nnpt or nnpt-1 into the ioste bits. + In theory, this can't work for 4k pages. */ + return mk_ioste(IOST_VALID_MASK + | (iostep & IOST_PT_BASE_MASK) + | ((nnpt << 5) & IOST_NNPT_MASK) + | (ps & IOST_PS_MASK)); +} + +/* compute the address of an io pte */ +static inline unsigned long +get_ioptep(ioste iost_entry, unsigned long io_address) +{ + unsigned long iopt_base; + unsigned long page_size; + unsigned long page_number; + unsigned long iopt_offset; + + iopt_base = iost_entry.val & IOST_PT_BASE_MASK; + page_size = iost_entry.val & IOST_PS_MASK; + + /* decode page size to compute page number */ + page_number = (io_address & 0x0fffffff) >> (10 + 2 * page_size); + /* page number is an offset into the io page table */ + iopt_offset = (page_number << 3) & 0x7fff8ul; + return iopt_base + iopt_offset; +} + +/* compute the tag field of the iopt cache entry */ +static inline unsigned long +get_ioc_tag(ioste iost_entry, unsigned long io_address) +{ + unsigned long iopte = get_ioptep(iost_entry, io_address); + + return IOPT_VALID_MASK + | ((iopte & 0x00000000000000ff8ul) >> 3) + | ((iopte & 0x0000003fffffc0000ul) >> 9); +} + +/* compute the hashed 6 bit index for the 4-way associative pte cache */ +static inline unsigned long +get_ioc_hash(ioste iost_entry, unsigned long io_address) +{ + unsigned long iopte = get_ioptep(iost_entry, io_address); + + return ((iopte & 0x000000000000001f8ul) >> 3) + ^ ((iopte & 0x00000000000020000ul) >> 17) + ^ ((iopte & 0x00000000000010000ul) >> 15) + ^ ((iopte & 0x00000000000008000ul) >> 13) + ^ ((iopte & 0x00000000000004000ul) >> 11) + ^ ((iopte & 0x00000000000002000ul) >> 9) + ^ ((iopte & 0x00000000000001000ul) >> 7); +} + +/* same as above, but pretend that we have a simpler 1-way 
associative + pte cache with an 8 bit index */ +static inline unsigned long +get_ioc_hash_1way(ioste iost_entry, unsigned long io_address) +{ + unsigned long iopte = get_ioptep(iost_entry, io_address); + + return ((iopte & 0x000000000000001f8ul) >> 3) + ^ ((iopte & 0x00000000000020000ul) >> 17) + ^ ((iopte & 0x00000000000010000ul) >> 15) + ^ ((iopte & 0x00000000000008000ul) >> 13) + ^ ((iopte & 0x00000000000004000ul) >> 11) + ^ ((iopte & 0x00000000000002000ul) >> 9) + ^ ((iopte & 0x00000000000001000ul) >> 7) + ^ ((iopte & 0x0000000000000c000ul) >> 8); +} + +static inline ioste +get_iost_cache(void __iomem *base, unsigned long index) +{ + unsigned long __iomem *p = (base + IOC_ST_CACHE_DIR); + return mk_ioste(in_be64(&p[index])); +} + +static inline void +set_iost_cache(void __iomem *base, unsigned long index, ioste ste) +{ + unsigned long __iomem *p = (base + IOC_ST_CACHE_DIR); + pr_debug("ioste %02lx was %016lx, store %016lx", index, + get_iost_cache(base, index).val, ste.val); + out_be64(&p[index], ste.val); + pr_debug(" now %016lx\n", get_iost_cache(base, index).val); +} + +static inline unsigned long +get_iopt_cache(void __iomem *base, unsigned long index, unsigned long *tag) +{ + unsigned long __iomem *tags = (void *)(base + IOC_PT_CACHE_DIR); + unsigned long __iomem *p = (void *)(base + IOC_PT_CACHE_REG); + + *tag = tags[index]; + rmb(); + return *p; +} + +static inline void +set_iopt_cache(void __iomem *base, unsigned long index, + unsigned long tag, unsigned long val) +{ + unsigned long __iomem *tags = base + IOC_PT_CACHE_DIR; + unsigned long __iomem *p = base + IOC_PT_CACHE_REG; + pr_debug("iopt %02lx was v%016lx/t%016lx, store v%016lx/t%016lx\n", + index, get_iopt_cache(base, index, &oldtag), oldtag, val, tag); + + out_be64(p, val); + out_be64(&tags[index], tag); +} + +static inline void +set_iost_origin(void __iomem *base) +{ + unsigned long __iomem *p = base + IOC_ST_ORIGIN; + unsigned long origin = IOSTO_ENABLE | IOSTO_SW; + + pr_debug("iost_origin 
%016lx, now %016lx\n", in_be64(p), origin); + out_be64(p, origin); +} + +static inline void +set_iocmd_config(void __iomem *base) +{ + unsigned long __iomem *p = base + 0xc00; + unsigned long conf; + + conf = in_be64(p); + pr_debug("iost_conf %016lx, now %016lx\n", conf, conf | IOCMD_CONF_TE); + out_be64(p, conf | IOCMD_CONF_TE); +} + +/* FIXME: get these from the device tree */ +#define ioc_base 0x20000511000ull +#define ioc_mmio_base 0x20000510000ull +#define ioid 0x48a +#define iopt_phys_offset (- 0x20000000) /* We have a 512MB offset from the SB */ +#define io_page_size 0x1000000 + +static unsigned long map_iopt_entry(unsigned long address) +{ + switch (address >> 20) { + case 0x600: + address = 0x24020000000ull; /* spider i/o */ + break; + default: + address += iopt_phys_offset; + break; + } + + return get_iopt_entry(address, ioid, IOPT_PROT_RW); +} + +static void iommu_bus_setup_null(struct pci_bus *b) { } +static void iommu_dev_setup_null(struct pci_dev *d) { } + +/* initialize the iommu to support a simple linear mapping + * for each DMA window used by any device. For now, we + * happen to know that there is only one DMA window in use, + * starting at iopt_phys_offset. 
*/ +static void bpa_map_iommu(void) +{ + unsigned long address; + void __iomem *base; + ioste ioste; + unsigned long index; + + base = __ioremap(ioc_base, 0x1000, _PAGE_NO_CACHE); + pr_debug("%lx mapped to %p\n", ioc_base, base); + set_iocmd_config(base); + iounmap(base); + + base = __ioremap(ioc_mmio_base, 0x1000, _PAGE_NO_CACHE); + pr_debug("%lx mapped to %p\n", ioc_mmio_base, base); + + set_iost_origin(base); + + for (address = 0; address < 0x100000000ul; address += io_page_size) { + ioste = get_iost_entry(0x10000000000ul, address, io_page_size); + if ((address & 0xfffffff) == 0) /* segment start */ + set_iost_cache(base, address >> 28, ioste); + index = get_ioc_hash_1way(ioste, address); + pr_debug("addr %08lx, index %02lx, ioste %016lx\n", + address, index, ioste.val); + set_iopt_cache(base, + get_ioc_hash_1way(ioste, address), + get_ioc_tag(ioste, address), + map_iopt_entry(address)); + } + iounmap(base); +} + + +static void *bpa_alloc_coherent(struct device *hwdev, size_t size, + dma_addr_t *dma_handle, unsigned int __nocast flag) +{ + void *ret; + + ret = (void *)__get_free_pages(flag, get_order(size)); + if (ret != NULL) { + memset(ret, 0, size); + *dma_handle = virt_to_abs(ret) | BPA_DMA_VALID; + } + return ret; +} + +static void bpa_free_coherent(struct device *hwdev, size_t size, + void *vaddr, dma_addr_t dma_handle) +{ + free_pages((unsigned long)vaddr, get_order(size)); +} + +static dma_addr_t bpa_map_single(struct device *hwdev, void *ptr, + size_t size, enum dma_data_direction direction) +{ + return virt_to_abs(ptr) | BPA_DMA_VALID; +} + +static void bpa_unmap_single(struct device *hwdev, dma_addr_t dma_addr, + size_t size, enum dma_data_direction direction) +{ +} + +static int bpa_map_sg(struct device *hwdev, struct scatterlist *sg, + int nents, enum dma_data_direction direction) +{ + int i; + + for (i = 0; i < nents; i++, sg++) { + sg->dma_address = (page_to_phys(sg->page) + sg->offset) + | BPA_DMA_VALID; + sg->dma_length = sg->length; + } + + 
return nents; +} + +static void bpa_unmap_sg(struct device *hwdev, struct scatterlist *sg, + int nents, enum dma_data_direction direction) +{ +} + +static int bpa_dma_supported(struct device *dev, u64 mask) +{ + return mask < 0x100000000ull; +} + +void bpa_init_iommu(void) +{ + bpa_map_iommu(); + + /* Direct I/O, IOMMU off */ + ppc_md.iommu_dev_setup = iommu_dev_setup_null; + ppc_md.iommu_bus_setup = iommu_bus_setup_null; + + pci_dma_ops.alloc_coherent = bpa_alloc_coherent; + pci_dma_ops.free_coherent = bpa_free_coherent; + pci_dma_ops.map_single = bpa_map_single; + pci_dma_ops.unmap_single = bpa_unmap_single; + pci_dma_ops.map_sg = bpa_map_sg; + pci_dma_ops.unmap_sg = bpa_unmap_sg; + pci_dma_ops.dma_supported = bpa_dma_supported; +} Index: linus-2.5/arch/ppc64/kernel/bpa_iommu.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linus-2.5/arch/ppc64/kernel/bpa_iommu.h 2005-04-29 09:47:29.000000000 +0200 @@ -0,0 +1,65 @@ +#ifndef BPA_IOMMU_H +#define BPA_IOMMU_H + +/* some constants */ +enum { + /* segment table entries */ + IOST_VALID_MASK = 0x8000000000000000ul, + IOST_TAG_MASK = 0x3000000000000000ul, + IOST_PT_BASE_MASK = 0x000003fffffff000ul, + IOST_NNPT_MASK = 0x0000000000000fe0ul, + IOST_PS_MASK = 0x000000000000000ful, + + IOST_PS_4K = 0x1, + IOST_PS_64K = 0x3, + IOST_PS_1M = 0x5, + IOST_PS_16M = 0x7, + + /* iopt tag register */ + IOPT_VALID_MASK = 0x0000000200000000ul, + IOPT_TAG_MASK = 0x00000001fffffffful, + + /* iopt cache register */ + IOPT_PROT_MASK = 0xc000000000000000ul, + IOPT_PROT_NONE = 0x0000000000000000ul, + IOPT_PROT_READ = 0x4000000000000000ul, + IOPT_PROT_WRITE = 0x8000000000000000ul, + IOPT_PROT_RW = 0xc000000000000000ul, + IOPT_COHERENT = 0x2000000000000000ul, + + IOPT_ORDER_MASK = 0x1800000000000000ul, + /* order access to same IOID/VC on same address */ + IOPT_ORDER_ADDR = 0x0800000000000000ul, + /* similar, but only after a write access */ + IOPT_ORDER_WRITES = 
0x1000000000000000ul,
+	/* Order all accesses to same IOID/VC */
+	IOPT_ORDER_VC = 0x1800000000000000ul,
+
+	IOPT_RPN_MASK = 0x000003fffffff000ul,
+	IOPT_HINT_MASK = 0x0000000000000800ul,
+	IOPT_IOID_MASK = 0x00000000000007fful,
+
+	IOSTO_ENABLE = 0x8000000000000000ul,
+	IOSTO_ORIGIN = 0x000003fffffff000ul,
+	IOSTO_HW = 0x0000000000000800ul,
+	IOSTO_SW = 0x0000000000000400ul,
+
+	IOCMD_CONF_TE = 0x0000800000000000ul,
+
+	/* memory mapped registers */
+	IOC_PT_CACHE_DIR = 0x000,
+	IOC_ST_CACHE_DIR = 0x800,
+	IOC_PT_CACHE_REG = 0x910,
+	IOC_ST_ORIGIN = 0x918,
+	IOC_CONF = 0x930,
+
+	/* The high bit needs to be set on every DMA address,
+	   only 2GB are addressable */
+	BPA_DMA_VALID = 0x80000000,
+	BPA_DMA_MASK = 0x7fffffff,
+};
+
+
+void bpa_init_iommu(void);
+
+#endif
Index: linus-2.5/arch/ppc64/kernel/bpa_setup.c
===================================================================
--- linus-2.5.orig/arch/ppc64/kernel/bpa_setup.c	2005-04-22 06:59:58.000000000 +0200
+++ linus-2.5/arch/ppc64/kernel/bpa_setup.c	2005-04-29 10:01:12.000000000 +0200
@@ -46,6 +46,7 @@
 
 #include "pci.h"
 #include "bpa_iic.h"
+#include "bpa_iommu.h"
 
 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
@@ -179,7 +180,7 @@
 
 	hpte_init_native();
 
-	pci_direct_iommu_init();
+	bpa_init_iommu();
 
 	ppc64_interrupt_controller = IC_BPA_IIC;
 

From paulus at samba.org Fri Apr 29 23:06:54 2005
From: paulus at samba.org (Paul Mackerras)
Date: Fri, 29 Apr 2005 23:06:54 +1000
Subject: [PATCH 3/4] ppc64: Add driver for BPA iommu
In-Reply-To: <200504291031.59223.arnd@arndb.de>
References: <200504190318.32556.arnd@arndb.de> <200504290748.30055.arnd@arndb.de>
 <1114756984.7111.268.camel@gaston> <200504291031.59223.arnd@arndb.de>
Message-ID: <17010.12654.630130.252581@cargo.ozlabs.ibm.com>

Arnd Bergmann writes:

> Implementation of software load support for the BE iommu. This is very

Could you expand a bit on what "software load support" is? Your
description is a bit terse.

Thanks,
Paul.
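The field layout in the bpa_iommu.h posted earlier in this thread can be checked in isolation. A minimal userspace sketch: the constants are copied from the patch, but as #defines, because ISO C before C23 requires enumeration constants to be representable as int and values like 0xc000000000000000 are not (the patch relies on GCC accepting larger enumerators). make_iopt_entry() mirrors get_iopt_entry(), to_bpa_dma() mirrors the address rule in bpa_map_single(), and iopt_fields_disjoint() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the constants from bpa_iommu.h in the patch. */
#define IOPT_PROT_MASK  0xc000000000000000ull
#define IOPT_PROT_RW    0xc000000000000000ull
#define IOPT_COHERENT   0x2000000000000000ull
#define IOPT_ORDER_MASK 0x1800000000000000ull
#define IOPT_ORDER_VC   0x1800000000000000ull
#define IOPT_RPN_MASK   0x000003fffffff000ull
#define IOPT_HINT_MASK  0x0000000000000800ull
#define IOPT_IOID_MASK  0x00000000000007ffull
#define BPA_DMA_VALID   0x80000000ull
#define BPA_DMA_MASK    0x7fffffffull

/* Same composition as get_iopt_entry() in the patch. */
static uint64_t make_iopt_entry(uint64_t real_address, uint64_t ioid,
				uint64_t prot)
{
	return (prot & IOPT_PROT_MASK)
	     | IOPT_COHERENT
	     | IOPT_ORDER_VC
	     | (real_address & IOPT_RPN_MASK)
	     | (ioid & IOPT_IOID_MASK);
}

/* Returns 1 if no two field masks share a bit. */
static int iopt_fields_disjoint(void)
{
	const uint64_t m[] = { IOPT_PROT_MASK, IOPT_COHERENT, IOPT_ORDER_MASK,
			       IOPT_RPN_MASK, IOPT_HINT_MASK, IOPT_IOID_MASK };
	uint64_t seen = 0;

	for (unsigned i = 0; i < sizeof m / sizeof m[0]; i++) {
		if (seen & m[i])
			return 0;
		seen |= m[i];
	}
	return 1;
}

/* DMA address as bpa_map_single() forms it: the physical address with
 * the high bit set. The assert is an addition for this sketch. */
static uint64_t to_bpa_dma(uint64_t phys)
{
	assert((phys & ~BPA_DMA_MASK) == 0); /* only 2GB are addressable */
	return phys | BPA_DMA_VALID;
}
```

Because the masks are disjoint, each field of an entry built this way can be recovered by ANDing with its mask.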
From anton at samba.org Fri Apr 29 23:41:38 2005 From: anton at samba.org (Anton Blanchard) Date: Fri, 29 Apr 2005 23:41:38 +1000 Subject: [PATCH] ppc64: noexec fixes Message-ID: <20050429134137.GC19662@krispykreme> Hi, There were a few issues with the ppc64 noexec support: The 64bit ABI has a non-executable stack by default. At the moment 64bit apps require a PT_GNU_STACK section in order to have a non-executable stack. Disable the read implies exec workaround on the 64bit ABI. The 64bit toolchain has never had problems with incorrect mmap permissions (the 32bit one has; that's why we need to retain the workaround). With these fixes, plus a recently committed gcc fix from Alan Modra, 64bit apps work as expected. Signed-off-by: Anton Blanchard Index: linux-2.6.12-rc2/include/asm-ppc64/page.h =================================================================== --- linux-2.6.12-rc2.orig/include/asm-ppc64/page.h 2005-04-26 17:05:20.466826509 +1000 +++ linux-2.6.12-rc2/include/asm-ppc64/page.h 2005-04-26 17:05:56.628140917 +1000 @@ -252,10 +252,19 @@ /* * This is the default if a program doesn't have a PT_GNU_STACK - * program header entry. + * program header entry. The PPC64 ELF ABI has a non executable stack + * stack by default, so in the absense of a PT_GNU_STACK program header + * we turn execute permission off. */ -#define VM_STACK_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_EXEC | \ - VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) +#define VM_STACK_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS \ + (test_thread_flag(TIF_32BIT) ? 
\ + VM_STACK_DEFAULT_FLAGS32 : VM_STACK_DEFAULT_FLAGS64) #endif /* __KERNEL__ */ #endif /* _PPC64_PAGE_H */ Index: linux-2.6.12-rc2/include/asm-ppc64/elf.h =================================================================== --- linux-2.6.12-rc2.orig/include/asm-ppc64/elf.h 2005-04-26 17:05:20.467826175 +1000 +++ linux-2.6.12-rc2/include/asm-ppc64/elf.h 2005-04-26 17:05:56.631139914 +1000 @@ -229,9 +229,13 @@ /* * An executable for which elf_read_implies_exec() returns TRUE will - * have the READ_IMPLIES_EXEC personality flag set automatically. + * have the READ_IMPLIES_EXEC personality flag set automatically. This + * is only required to work around bugs in old 32bit toolchains. Since + * the 64bit ABI has never had these issues dont enable the workaround + * even if we have an executable stack. */ -#define elf_read_implies_exec(ex, exec_stk) (exec_stk != EXSTACK_DISABLE_X) +#define elf_read_implies_exec(ex, exec_stk) (test_thread_flag(TIF_32BIT) ? \ + (exec_stk != EXSTACK_DISABLE_X) : 0) #endif From anton at samba.org Fri Apr 29 23:44:37 2005 From: anton at samba.org (Anton Blanchard) Date: Fri, 29 Apr 2005 23:44:37 +1000 Subject: [PATCH] ppc64: remove unnecessary include Message-ID: <20050429134437.GD19662@krispykreme> Hi, We no longer use any ppcdebug stuff in a.out.h, so remove the include.
Signed-off-by: Anton Blanchard Index: linux-2.6.12-rc2/include/asm-ppc64/a.out.h =================================================================== --- linux-2.6.12-rc2.orig/include/asm-ppc64/a.out.h 2005-03-06 07:08:25.000000000 +1100 +++ linux-2.6.12-rc2/include/asm-ppc64/a.out.h 2005-04-29 17:10:27.122629165 +1000 @@ -1,8 +1,6 @@ #ifndef __PPC64_A_OUT_H__ #define __PPC64_A_OUT_H__ -#include - /* * c 2001 PPC 64 Team, IBM Corp * From anton at samba.org Fri Apr 29 23:51:51 2005 From: anton at samba.org (Anton Blanchard) Date: Fri, 29 Apr 2005 23:51:51 +1000 Subject: [PATCH] ppc64: firmware workaround Message-ID: <20050429135151.GE19662@krispykreme> Recent gcc 4.0 testing uncovered a firmware issue. Some property names are longer than 31 bytes, and with gcc 4.0's better stack allocation the resulting overflow ran over non-volatile register storage. Signed-off-by: Anton Blanchard Index: linux-2.6.12-rc2/arch/ppc64/kernel/prom_init.c =================================================================== --- linux-2.6.12-rc2.orig/arch/ppc64/kernel/prom_init.c 2005-04-29 23:45:36.975308922 +1000 +++ linux-2.6.12-rc2/arch/ppc64/kernel/prom_init.c 2005-04-29 23:48:20.497309791 +1000 @@ -1448,6 +1448,12 @@ } } +/* + * The Open Firmware 1275 specification states properties must be 31 bytes or + * less, however not all firmwares obey this. Make it 64 bytes to be safe.
+ */ +#define MAX_PROPERTY_NAME 64 + static void __init scan_dt_build_struct(phandle node, unsigned long *mem_start, unsigned long *mem_end) { @@ -1457,7 +1463,7 @@ unsigned long soff; unsigned char *valp; unsigned long offset = reloc_offset(); - char pname[32]; + char pname[MAX_PROPERTY_NAME]; char *path; path = RELOC(prom_scratch); From anton at samba.org Fri Apr 29 23:54:46 2005 From: anton at samba.org (Anton Blanchard) Date: Fri, 29 Apr 2005 23:54:46 +1000 Subject: [PATCH] ppc64: enforce medium thread priority in hypervisor calls Message-ID: <20050429135446.GF19662@krispykreme> Hi, Calls into the hypervisor do not raise the thread priority. Ensure we are running at medium priority upon entry to the hypervisor. Signed-off-by: Anton Blanchard Index: linux-2.6.12-rc2/arch/ppc64/kernel/pSeries_hvCall.S =================================================================== --- linux-2.6.12-rc2.orig/arch/ppc64/kernel/pSeries_hvCall.S 2005-04-28 18:08:25.284495131 +1000 +++ linux-2.6.12-rc2/arch/ppc64/kernel/pSeries_hvCall.S 2005-04-28 18:13:06.195016349 +1000 @@ -28,6 +28,8 @@ unsigned long *out3); R10 */ _GLOBAL(plpar_hcall) + HMT_MEDIUM + mfcr r0 std r8,STK_PARM(r8)(r1) /* Save out ptrs */ @@ -53,6 +55,8 @@ /* Simple interface with no output values (other than status) */ _GLOBAL(plpar_hcall_norets) + HMT_MEDIUM + mfcr r0 stw r0,8(r1) @@ -75,6 +79,8 @@ unsigned long *out1); 120(R1) */ _GLOBAL(plpar_hcall_8arg_2ret) + HMT_MEDIUM + mfcr r0 ld r11,STK_PARM(r11)(r1) /* put arg8 in R11 */ stw r0,8(r1) @@ -99,6 +105,8 @@ unsigned long *out4); 112(R1) */ _GLOBAL(plpar_hcall_4out) + HMT_MEDIUM + mfcr r0 stw r0,8(r1) From anton at samba.org Sat Apr 30 00:04:10 2005 From: anton at samba.org (Anton Blanchard) Date: Sat, 30 Apr 2005 00:04:10 +1000 Subject: [PATCH] ppc64: use smp_mb and smp_wmb Message-ID: <20050429140410.GG19662@krispykreme> Hi, Use smp_mb and smp_wmb. In particular smp_wmb is lighter weight than wmb. 
Signed-off-by: Anton Blanchard Index: linux-2.6.12-rc2/arch/ppc64/kernel/smp.c =================================================================== --- linux-2.6.12-rc2.orig/arch/ppc64/kernel/smp.c 2005-04-26 19:32:18.881307808 +1000 +++ linux-2.6.12-rc2/arch/ppc64/kernel/smp.c 2005-04-29 09:45:19.765332186 +1000 @@ -126,7 +126,7 @@ * the processor will continue on to secondary_start */ paca[nr].cpu_start = 1; - mb(); + smp_mb(); } #endif /* CONFIG_PPC_MULTIPLATFORM */ @@ -258,7 +258,7 @@ } call_data = &data; - wmb(); + smp_wmb(); /* Send a message to all other CPUs and wait for them to respond */ smp_ops->message_pass(MSG_ALL_BUT_SELF, PPC_MSG_CALL_FUNCTION); @@ -435,7 +435,7 @@ /* get the target out of it's holding state */ per_cpu(cpu_state, cpu) = CPU_UP_PREPARE; - wmb(); + smp_wmb(); while (!cpu_online(cpu)) cpu_relax(); @@ -451,7 +451,7 @@ int i; for (i = 0; i < 100; i++) { - rmb(); + smp_rmb(); if (per_cpu(cpu_state, cpu) == CPU_DEAD) return; msleep(100); @@ -467,7 +467,7 @@ cpu = smp_processor_id(); printk(KERN_DEBUG "CPU%d offline\n", cpu); __get_cpu_var(cpu_state) = CPU_DEAD; - wmb(); + smp_wmb(); while (__get_cpu_var(cpu_state) != CPU_UP_PREPARE) cpu_relax(); @@ -519,7 +519,7 @@ * be written out to main store before we release * the processor. 
- mb(); + smp_mb(); /* wake up cpus */ DBG("smp: kicking cpu %d\n", cpu); Index: linux-2.6.12-rc2/arch/ppc64/kernel/time.c =================================================================== --- linux-2.6.12-rc2.orig/arch/ppc64/kernel/time.c 2005-03-06 07:08:24.000000000 +1100 +++ linux-2.6.12-rc2/arch/ppc64/kernel/time.c 2005-04-29 09:45:19.769331880 +1000 @@ -221,15 +221,15 @@ temp_varp->tb_to_xs = do_gtod.varp->tb_to_xs; temp_varp->tb_orig_stamp = new_tb_orig_stamp; temp_varp->stamp_xsec = new_stamp_xsec; - mb(); + smp_mb(); do_gtod.varp = temp_varp; do_gtod.var_idx = temp_idx; ++(systemcfg->tb_update_count); - wmb(); + smp_wmb(); systemcfg->tb_orig_stamp = new_tb_orig_stamp; systemcfg->stamp_xsec = new_stamp_xsec; - wmb(); + smp_wmb(); ++(systemcfg->tb_update_count); } @@ -648,7 +648,7 @@ temp_varp->tb_to_xs = new_tb_to_xs; temp_varp->stamp_xsec = new_stamp_xsec; temp_varp->tb_orig_stamp = do_gtod.varp->tb_orig_stamp; - mb(); + smp_mb(); do_gtod.varp = temp_varp; do_gtod.var_idx = temp_idx; @@ -662,10 +662,10 @@ * loops back and reads them again until this criteria is met. */ ++(systemcfg->tb_update_count); - wmb(); + smp_wmb(); systemcfg->tb_to_xs = new_tb_to_xs; systemcfg->stamp_xsec = new_stamp_xsec; - wmb(); + smp_wmb(); ++(systemcfg->tb_update_count); write_sequnlock_irqrestore( &xtime_lock, flags ); From anton at samba.org Sat Apr 30 01:25:00 2005 From: anton at samba.org (Anton Blanchard) Date: Sat, 30 Apr 2005 01:25:00 +1000 Subject: [PATCH] ppc64: reverse prediction on spinlock busy loop code Message-ID: <20050429152500.GI19662@krispykreme> Mispredicting the spinlock busy loop also slows down the rate at which we do the loads, which can be good for heavily contended locks. Note: There are some gcc issues with our default build and branch prediction, but a CONFIG_POWER4_ONLY build should emit the branch hints correctly. I'm working with Alan Modra on it now.
Anton -- From: Jake Moilanen On our raw spinlocks, we currently have an attempt at the lock, and if we do not get it we enter a spin loop. This spinloop will likely continue for a while, and we predict likely. Shouldn't we predict that we will get out of the loop, so our next instructions are already prefetched? Even when we miss because the lock is still held, it won't matter since we are waiting anyway. I did a couple of quick benchmarks, but the results are inconclusive. 16-way 690 running specjbb with original code # ./specjbb 3000 16 1 1 19 30 120 ... Valid run, Score is 59282 16-way 690 running specjbb with unlikely code # ./specjbb 3000 16 1 1 19 30 120 ... Valid run, Score is 59541 I saw a larger increase on a JS20 (~1.6%) JS20 specjbb w/ original code # ./specjbb 400 2 1 1 19 30 120 ... Valid run, Score is 20460 JS20 specjbb w/ unlikely code # ./specjbb 400 2 1 1 19 30 120 ... Valid run, Score is 20803 Jake Signed-off-by: Jake Moilanen Signed-off-by: Anton Blanchard Index: linux-2.6.12-rc2/include/asm-ppc64/spinlock.h =================================================================== --- linux-2.6.12-rc2.orig/include/asm-ppc64/spinlock.h 2005-04-29 09:45:52.770813561 +1000 +++ linux-2.6.12-rc2/include/asm-ppc64/spinlock.h 2005-04-29 10:27:34.217207239 +1000 @@ -110,7 +110,7 @@ HMT_low(); if (SHARED_PROCESSOR) __spin_yield(lock); - } while (likely(lock->lock != 0)); + } while (unlikely(lock->lock != 0)); HMT_medium(); } } @@ -128,7 +128,7 @@ HMT_low(); if (SHARED_PROCESSOR) __spin_yield(lock); - } while (likely(lock->lock != 0)); + } while (unlikely(lock->lock != 0)); HMT_medium(); local_irq_restore(flags_dis); } @@ -194,7 +194,7 @@ HMT_low(); if (SHARED_PROCESSOR) __rw_yield(rw); - } while (likely(rw->lock < 0)); + } while (unlikely(rw->lock < 0)); HMT_medium(); } } @@ -251,7 +251,7 @@ HMT_low(); if (SHARED_PROCESSOR) __rw_yield(rw); - } while (likely(rw->lock != 0)); + } while (unlikely(rw->lock != 0)); HMT_medium(); } } From paulus at samba.org Sat Apr 30
14:31:32 2005 From: paulus at samba.org (Paul Mackerras) Date: Sat, 30 Apr 2005 14:31:32 +1000 Subject: [PATCH] ppc64: fix 32-bit signal frame back link Message-ID: <17011.2596.728274.392025@cargo.ozlabs.ibm.com> When the kernel creates a signal frame on the user stack, it puts the old stack pointer value at the beginning so that the signal frame is linked into the chain of stack frames like any other frame. Unfortunately, for 32-bit processes we are writing the old stack pointer as a 64-bit value rather than a 32-bit value, and the process sees that as a null pointer, since it only looks at the first 32 bits, which are zero since ppc is big-endian and the stack pointer is below 4GB. This bug is in SLES9 and RHEL4 too, hence the ccs. This patch fixes the bug by making the signal code write the old stack pointer as a u32 instead of an unsigned long. Signed-off-by: Paul Mackerras --- diff -urN linux-2.6/arch/ppc64/kernel/signal32.c g5-ppc64/arch/ppc64/kernel/signal32.c --- linux-2.6/arch/ppc64/kernel/signal32.c 2005-04-26 15:37:55.000000000 +1000 +++ g5-ppc64/arch/ppc64/kernel/signal32.c 2005-04-29 23:04:40.000000000 +1000 @@ -657,7 +657,7 @@ /* Save user registers on the stack */ frame = &rt_sf->uc.uc_mcontext; - if (put_user(regs->gpr[1], (unsigned long __user *)newsp)) + if (put_user(regs->gpr[1], (u32 __user *)newsp)) goto badframe; if (vdso32_rt_sigtramp && current->thread.vdso_base) { @@ -842,7 +842,7 @@ regs->link = (unsigned long) frame->mctx.tramp; } - if (put_user(regs->gpr[1], (unsigned long __user *)newsp)) + if (put_user(regs->gpr[1], (u32 __user *)newsp)) goto badframe; regs->gpr[1] = (unsigned long) newsp; regs->gpr[3] = sig; From arnd at arndb.de Sat Apr 30 22:02:02 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Sat, 30 Apr 2005 14:02:02 +0200 Subject: [PATCH 3/4] ppc64: Add driver for BPA iommu Message-ID: <26879984$111486195242737180269552.40594107@config8.schlund.de> Paul Mackerras wrote on 29.04.2005, 15:06:54: > Arnd Bergmann
writes: > > > Implementation of software load support for the BE iommu. This is very > > Could you expand a bit on what "software load support" is? Your > description is a bit terse. The Cell processor can put the I/O page table either in memory like the hashed page table (hardware load) or have the operating system write the entries into memory mapped CPU registers (software load). I use the software load mechanism because I know that all I/O page table entries for the amount of installed physical memory fit into the IO TLB cache. At the point when we get machines with more than 4GB of installed memory, we can either use hardware I/O page table access like the other platforms do or dynamically update the I/O TLB entries when a page fault occurs in the I/O subsystem. The software load can then use the macros that I have implemented for the static mapping in order to do the TLB cache updates. I'll make sure to add the description to the patches when I send them next time. Arnd <><