From sfr at canb.auug.org.au Mon Mar 1 00:50:26 2004
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Mon, 1 Mar 2004 00:50:26 +1100
Subject: 2.6 viodasd and IDE emulation
In-Reply-To: <20040227152313.GN5801@krispykreme>
References: <20040227152313.GN5801@krispykreme>
Message-ID: <20040301005026.71600d7c.sfr@canb.auug.org.au>

Thanks Anton for letting people know.

On Sat, 28 Feb 2004 02:23:14 +1100 Anton Blanchard wrote:
>
> IDE emulation on iseries virtual disks was removed in 2.6 recently (it
> had no chance of being merged upstream) so if you are having trouble
> finding the root filesystem on your iseries that is probably it.

--
Cheers,
Stephen Rothwell sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From jdewand at redhat.com Tue Mar 2 08:15:11 2004
From: jdewand at redhat.com (Julie DeWandel)
Date: Mon, 01 Mar 2004 16:15:11 -0500
Subject: lparcfg.c comments
Message-ID: <4043A7DF.2040205@redhat.com>

I was merging the SPLPAR changes for lparcfg.c into our version of the
code when I noticed a few things. Here they are:

Bug:

* parse_system_parameter_string kmallocs some buffers ("local_buffer",
  "workbuffer") but kmalloc could fail and return a NULL. The code needs
  to check for this rather than to potentially reference a null pointer.

Suggestions:

* h_get_ppp, h_pic, and h_purr are declared as returning an unsigned
  int. However, these functions always return 0 and that value is never
  checked by the caller. Suggest declaring them as returning a void.

* h_get_ppp, h_pic, h_purr, and parse_system_parameter_string aren't
  called from elsewhere. Suggest making them static.

* parse_system_parameter_string declares things like "workbuffer",
  "idx", "w_idx" in the middle of the code which isn't appreciated by
  the compiler. Suggest declaring them at the top of the function or top
  of the code block they are used within.

* both lparcfg.c and viopath.c declare the same function: e2a.
  Suggest making one of them non-static (the one in viopath.c) and
  eliminating the one from lparcfg.c.

Thanks,
Julie

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From hollisb at us.ibm.com Tue Mar 2 08:53:34 2004
From: hollisb at us.ibm.com (Hollis Blanchard)
Date: Mon, 01 Mar 2004 15:53:34 -0600
Subject: [ppc64 patch]
Message-ID: <4043B0DE.9070404@us.ibm.com>

Hi Linus, please apply these two patches.

viodev->unit_address only carries 32 bits of information anyways, so
there's no point in making it 64-bits wide.

To hotplug-remove virtual devices, we need vio_find_node() so we have a
pointer to pass to vio_unregister_device().

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: vio-unitaddr.diff
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040301/2d2dc2c0/attachment.txt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: vio-find.diff
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040301/2d2dc2c0/attachment-0001.txt

From torvalds at osdl.org Tue Mar 2 09:04:39 2004
From: torvalds at osdl.org (Linus Torvalds)
Date: Mon, 1 Mar 2004 14:04:39 -0800 (PST)
Subject: [ppc64 patch]
In-Reply-To: <4043B0DE.9070404@us.ibm.com>
References: <4043B0DE.9070404@us.ibm.com>
Message-ID:

On Mon, 1 Mar 2004, Hollis Blanchard wrote:
>
>Hi Linus, please apply these two patches.

Please send separate patches as separate emails. Now I need to save the
email twice, and then edit away the proper parts, and then apply a
single email as two separate ones. Not convenient.

Much better to just send me two emails, with two patches and two
explanations.

Linus

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From santil at us.ibm.com Tue Mar 2 10:19:34 2004
From: santil at us.ibm.com (Santiago Leon)
Date: Mon, 01 Mar 2004 18:19:34 -0500
Subject: [PATCH] broken PowerPC Virtual Ethernet
Message-ID: <4043C506.8050105@us.ibm.com>

Linus,

Please apply this patch... It fixes the PowerPC Virtual Ethernet driver
that got broken by the recent ppc64 iommu patch...

--
Santiago A. Leon
Power Linux Development
IBM Linux Technology Center

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ibmveth_iommu.patch
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040301/8269eada/attachment.txt

From linas at austin.ibm.com Tue Mar 2 11:48:44 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Mon, 1 Mar 2004 18:48:44 -0600
Subject: kdb ppc64 patches
Message-ID: <20040301184844.A33678@forte.austin.ibm.com>

Hi,

I've just started applying the latest 2.6.3 kdb patches against the
latest 2.6.4-rc1 ppc64 development tree kept in ameslab. The patch
applied more or less cleanly, except in the two places where it should
have failed. I'm now trying to compile.

In the meanwhile, I wanted to start sending back some prelim feedback,
to see if this channel is still open, i.e. that this is still the right
place to send kdb feedback.

Makefile: around line 410, the ameslab Makefile seems to have a
fancier/better construct than what comes out with the stock kdb patch:

core-y += kernel/ mm/ fs/ ipc/ security/ crypto/

ifeq ($(CONFIG_KDB),y)
  # Use ifeq for now because kdb subdirs are not in bk yet
  # Otherwise make mrproper will die because it also cleans core-n
  core-y += kdb/
endif

(The above is not a diff, because I have nothing to diff against).

--linas

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From kaos at sgi.com Tue Mar 2 11:57:37 2004
From: kaos at sgi.com (Keith Owens)
Date: Tue, 02 Mar 2004 11:57:37 +1100
Subject: kdb ppc64 patches
In-Reply-To: Your message of "Mon, 01 Mar 2004 18:48:44 MDT." <20040301184844.A33678@forte.austin.ibm.com>
Message-ID: <2347.1078189057@kao2.melbourne.sgi.com>

On Mon, 1 Mar 2004 18:48:44 -0600, linas at austin.ibm.com wrote:
>ifeq ($(CONFIG_KDB),y)
> # Use ifeq for now because kdb subdirs are not in bk yet
> # Otherwise make mrproper will die because it also cleans core-n
> core-y += kdb/
>endif

Yuck! Make it

core-$(subst n,ignore,$(CONFIG_KDB)) += kdb/ # workaround for make mrproper hitting $(core-n)

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From anton at samba.org Tue Mar 2 23:33:26 2004
From: anton at samba.org (Anton Blanchard)
Date: Tue, 2 Mar 2004 23:33:26 +1100
Subject: lparcfg.c comments
In-Reply-To: <4043A7DF.2040205@redhat.com>
References: <4043A7DF.2040205@redhat.com>
Message-ID: <20040302123326.GM5801@krispykreme>

Hi,

> I was merging the SPLPAR changes for lparcfg.c into our version of the
> code when I noticed a few things. Here they are:

Good suggestions. Bonus points for using seq_file or even the simpler
seq_single proc interfaces for the read part at least. Check out the
latest checkin to arch/ppc64/kernel/eeh.c for how simple it is.

We should really convert all the stuff we can (like rtas-proc.c) to use
it.

Anton

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From od at suse.de Wed Mar 3 02:04:02 2004
From: od at suse.de (Olaf Dabrunz)
Date: Tue, 2 Mar 2004 16:04:02 +0100
Subject: [BUG] sungem driver crashes on "cat /sys/.../config"
Message-ID: <20040302150402.GA3775@suse.de>

Hello benh,

you were right, the crash can only be reproduced when the interface is
down. After "ifconfig eth0 up" both "cat /sys/.../config" on the
ethernet device and "lspci" work.
Regards,

--
Olaf Dabrunz (od / odabrunz), SUSE Linux AG, Nürnberg

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From anton at samba.org Wed Mar 3 14:32:26 2004
From: anton at samba.org (Anton Blanchard)
Date: Wed, 3 Mar 2004 14:32:26 +1100
Subject: KDB in ameslab
In-Reply-To: <20040302195946.J74832@forte.austin.ibm.com>
References: <20040217043527.GC25491@krispykreme> <20040302195946.J74832@forte.austin.ibm.com>
Message-ID: <20040303033226.GO5801@krispykreme>

> *Please* apply this ASAP, before it bit-rots, and I have to do the
> work all over again.

Good stuff, only I can't seem to find include/linux/dis-asm.h.

Anton

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From kaos at ocs.com.au Wed Mar 3 14:45:18 2004
From: kaos at ocs.com.au (Keith Owens)
Date: Wed, 03 Mar 2004 14:45:18 +1100
Subject: KDB in ameslab
In-Reply-To: Your message of "Wed, 03 Mar 2004 14:32:26 +1100." <20040303033226.GO5801@krispykreme>
Message-ID: <5192.1078285518@kao2.melbourne.sgi.com>

On Wed, 3 Mar 2004 14:32:26 +1100, Anton Blanchard wrote:
>
>> *Please* apply this ASAP, before it bit-rots, and I have to do the
>> work all over again.
>
>Good stuff, only I can't seem to find include/linux/dis-asm.h.

Part of the kdb common (arch independent) patch.
ftp://oss.sgi.com/projects/kdb/download/v4.3/, pick your release base.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From kaos at sgi.com Wed Mar 3 21:17:57 2004
From: kaos at sgi.com (Keith Owens)
Date: Wed, 03 Mar 2004 21:17:57 +1100
Subject: KDB in ameslab
In-Reply-To: Your message of "Tue, 02 Mar 2004 19:59:46 MDT." <20040302195946.J74832@forte.austin.ibm.com>
Message-ID: <2624.1078309077@ocs3.ocs.com.au>

On Tue, 2 Mar 2004 19:59:46 -0600, linas at austin.ibm.com wrote:
>p.s.
Keith, I will try to create and send you the corresponding >ppc64 architecture patch; however, this might be ugly, because >I suspect the andrew morton kernels are still fairly out of sync >with the BK ameslab trees. That is OK, I had the same problem with ia64 in 2.4. Users had to apply the ia64 patches from David Mosberger first, then kdb for ia64. Submit kdb for ppc64 patches against the current ppc64 tree, which I guess is what your recent patch is. Tell users that they have to apply the general ppc64 patches first, before applying kdb for ppc64. Alternatively you can maintain kdb for ppc64 directly in the ppc64 tree, then I do not have to get involved. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Wed Mar 3 21:34:04 2004 From: anton at samba.org (Anton Blanchard) Date: Wed, 3 Mar 2004 21:34:04 +1100 Subject: KDB in ameslab In-Reply-To: <20040302195946.J74832@forte.austin.ibm.com> References: <20040217043527.GC25491@krispykreme> <20040302195946.J74832@forte.austin.ibm.com> Message-ID: <20040303103404.GP5801@krispykreme> > p.s. Keith, I will try to create and send you the corresponding > ppc64 architecture patch; however, this might be ugly, because > I suspect the andrew morton kernels are still fairly out of sync > with the BK ameslab trees. Its not too bad, Andrew and Linus' tree are usually a few days to a week out of date. (except a few things that we have to rework before submitting, but there arent many of them left). With some luck the patch will apply to -mm and mainline. Anton ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olh at suse.de Thu Mar 4 00:55:32 2004 From: olh at suse.de (Olaf Hering) Date: Wed, 3 Mar 2004 14:55:32 +0100 Subject: [PATCH] improve xmon symbol lookup Message-ID: <20040303135532.GA11204@suse.de> xmon will die reliable in symbol lookup, if the system is broken enough. This patch makes the lookup optional, E instead of e, T instead of t. 
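[Editorial sketch, not part of the original message.] The idea described above, making the symbol lookup opt-in per command, can be illustrated in plain user-space C. This is only a sketch under stated assumptions: `lookup_symbol` and `format_symbol` are hypothetical stand-ins invented for illustration, not functions from xmon or from the patch that follows.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical resolver standing in for the real symbol lookup
 * (an assumption for this sketch; it knows one address only). */
static const char *lookup_symbol(unsigned long addr)
{
    return addr == 0x1000UL ? "do_work+0x0/0x40" : "unknown";
}

/*
 * Format `fmt` (which must contain exactly one %s) into `buf`.
 * When use_lookup is zero the resolver is never called and an
 * empty string is substituted, so a broken symbol table cannot
 * take the caller down with it.
 */
static void format_symbol(int use_lookup, char *buf, size_t len,
                          const char *fmt, unsigned long addr)
{
    snprintf(buf, len, fmt, use_lookup ? lookup_symbol(addr) : "");
}
```

With this shape, a command such as 'E' would pass 1 for `use_lookup` and 'e' would pass 0, which mirrors the split the message proposes.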
Please apply. diff -urN /dev/shm/linuxppc64-2.5/arch/ppc64/xmon/xmon.c linuxppc64-2.5/arch/ppc64/xmon/xmon.c --- /dev/shm/linuxppc64-2.5/arch/ppc64/xmon/xmon.c 2004-03-03 00:48:17.000000000 +0100 +++ linuxppc64-2.5/arch/ppc64/xmon/xmon.c 2004-03-03 14:48:33.000000000 +0100 @@ -86,8 +86,8 @@ static void prdump(unsigned long, long); static int ppc_inst_dump(unsigned long, long); void print_address(unsigned long); static int getsp(void); -static void backtrace(struct pt_regs *); -static void excprint(struct pt_regs *); +static void backtrace(struct pt_regs *, int use_kallsyms); +static void excprint(struct pt_regs *, int use_kallsyms); static void prregs(struct pt_regs *); static void memops(int); static void memlocate(void); @@ -160,6 +160,7 @@ Commands:\n\ di dump instructions\n\ df dump float values\n\ dd dump double values\n\ + E print exception information with sym\n\ e print exception information\n\ f flush cache\n\ la lookup symbol+offset of specified address\n\ @@ -175,8 +176,8 @@ Commands:\n\ r print registers\n\ s single step\n\ S print special registers\n\ + T print backtrace with sym\n\ t print backtrace\n\ - T Enable/Disable PPCDBG flags\n\ x exit monitor\n\ u dump segment table or SLB\n\ ? 
help\n" @@ -192,10 +193,14 @@ static int xmon_trace[NR_CPUS]; static struct pt_regs *xmon_regs[NR_CPUS]; void __xmon_print_symbol(const char *fmt, unsigned long address); -#define xmon_print_symbol(fmt, addr) \ +#define xmon_print_symbol(yes,fmt, addr) \ do { \ + if (yes) { \ __check_printsym_format(fmt, ""); \ __xmon_print_symbol(fmt, addr); \ + } else { \ + printf(fmt,""); \ + } \ } while(0) /* @@ -296,7 +301,7 @@ xmon(struct pt_regs *excp) msr = get_msr(); set_msrd(msr & ~MSR_EE); /* disable interrupts */ xmon_regs[smp_processor_id()] = excp; - excprint(excp); + excprint(excp, 0); #ifdef CONFIG_SMP leaving_xmon = 0; /* possible race condition here if a CPU is held up and gets @@ -358,13 +363,13 @@ xmon_bpt(struct pt_regs *regs) if (bp->count) { --bp->count; remove_bpts(); - excprint(regs); + excprint(regs, 0); xmon_trace[smp_processor_id()] = BRSTEP; regs->msr |= MSR_SE; } else { printf("Stopped at breakpoint %x (%lx ", (bp - bpts) + 1, bp->address); - xmon_print_symbol("%s)\n", bp->address); + xmon_print_symbol(0, "%s)\n", bp->address); xmon(regs); } return 1; @@ -390,7 +395,7 @@ xmon_dabr_match(struct pt_regs *regs) if (dabr.enabled && dabr.count) { --dabr.count; remove_bpts(); - excprint(regs); + excprint(regs, 0); xmon_trace[smp_processor_id()] = BRSTEP; regs->msr |= MSR_SE; } else { @@ -406,7 +411,7 @@ xmon_iabr_match(struct pt_regs *regs) if (iabr.enabled && iabr.count) { --iabr.count; remove_bpts(); - excprint(regs); + excprint(regs, 0); xmon_trace[smp_processor_id()] = BRSTEP; regs->msr |= MSR_SE; } else { @@ -546,17 +551,26 @@ cmds(struct pt_regs *excp) if (excp != NULL) prregs(excp); /* print regs */ break; + case 'E': + if (excp == NULL) + printf("No exception information\n"); + else + excprint(excp, 1); + break; case 'e': if (excp == NULL) printf("No exception information\n"); else - excprint(excp); + excprint(excp, 0); break; case 'S': super_regs(); break; + case 'T': + backtrace(excp, 1); + break; case 't': - backtrace(excp); + backtrace(excp, 0); 
break; case 'f': cacheflush(); @@ -584,9 +598,11 @@ cmds(struct pt_regs *excp) #endif /* CONFIG_SMP */ case 'z': bootcmds(); +#if 0 case 'T': debug_trace(); break; +#endif case 'u': dump_segments(); break; @@ -803,7 +819,7 @@ bpt_cmds(void) } else { printf("Cleared breakpoint %x (%lx ", (bp - bpts) + 1, bp->address); - xmon_print_symbol("%s)\n", bp->address); + xmon_print_symbol(1, "%s)\n", bp->address); bp->enabled = 0; } } @@ -840,7 +856,7 @@ bpt_cmds(void) if (bp->enabled) { printf("%2x trap %.16lx %8x ", bpnum, bp->address, bp->count); - xmon_print_symbol("%s\n", bp->address); + xmon_print_symbol(1, "%s\n", bp->address); } break; } @@ -867,7 +883,7 @@ bpt_cmds(void) scanhex(&bp->count); printf("Set breakpoint %2x trap %.16lx %8x ", (bp-bpts) + 1, bp->address, bp->count); - xmon_print_symbol("%s\n", bp->address); + xmon_print_symbol(1, "%s\n", bp->address); break; } } @@ -898,7 +914,7 @@ const char *getvecname(unsigned long vec } static void -backtrace(struct pt_regs *excp) +backtrace(struct pt_regs *excp, int use_kallsyms) { unsigned long sp; unsigned long lr; @@ -950,7 +966,7 @@ backtrace(struct pt_regs *excp) printf("exception: %lx %s regs %lx\n", regs.trap, getvecname(regs.trap), sp+112); printf(" %.16lx", regs.nip); if (regs.nip & 0xffffffff00000000UL) - xmon_print_symbol(" %s", regs.nip); + xmon_print_symbol(use_kallsyms, " %s", regs.nip); printf("\n"); if (regs.gpr[1] < sp) { printf("\n", regs.gpr[1]); @@ -962,7 +978,7 @@ backtrace(struct pt_regs *excp) break; } else { if (stack[2]) - xmon_print_symbol(" %s", stack[2]); + xmon_print_symbol(use_kallsyms, " %s", stack[2]); printf("\n"); } if (stack[0] && stack[0] <= sp) { @@ -989,7 +1005,7 @@ getsp() spinlock_t exception_print_lock = SPIN_LOCK_UNLOCKED; void -excprint(struct pt_regs *fp) +excprint(struct pt_regs *fp, int use_kallsyms) { unsigned long flags; @@ -1001,10 +1017,10 @@ excprint(struct pt_regs *fp) printf("Vector: %lx %s at [%lx]\n", fp->trap, getvecname(fp->trap), fp); printf(" pc: %lx", 
fp->nip);
- xmon_print_symbol(" (%s)\n", fp->nip);
+ xmon_print_symbol(use_kallsyms, " (%s)\n", fp->nip);
  printf(" lr: %lx", fp->link);
- xmon_print_symbol(" (%s)\n", fp->link);
+ xmon_print_symbol(use_kallsyms, " (%s)\n", fp->link);
  printf(" sp: %lx\n", fp->gpr[1]);
  printf(" msr: %lx\n", fp->msr);
@@ -1959,7 +1975,7 @@ symbol_lookup(void)
  case 'a':
  if (scanhex(&addr)) {
  printf("%lx: ", addr);
- xmon_print_symbol("%s\n", addr);
+ xmon_print_symbol(1, "%s\n", addr);
  }
  termch = 0;
  break;
@@ -2029,6 +2045,7 @@ void __xmon_print_symbol(const char *fmt
  }
  }
+#if 0
  static void debug_trace(void)
  {
  unsigned long val, cmd, on;
@@ -2077,6 +2094,7 @@ static void debug_trace(void)
  cmd = skipbl();
  }
  }
+#endif
  static void dump_slb(void)
  {

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From anton at samba.org Thu Mar 4 01:20:32 2004
From: anton at samba.org (Anton Blanchard)
Date: Thu, 4 Mar 2004 01:20:32 +1100
Subject: [PATCH] improve xmon symbol lookup
In-Reply-To: <20040303135532.GA11204@suse.de>
References: <20040303135532.GA11204@suse.de>
Message-ID: <20040303142032.GV5801@krispykreme>

> xmon will die reliable in symbol lookup, if the system is broken enough.
>
> This patch makes the lookup optional, E instead of e, T instead of t.

I'd love to know where we are locking up. Actually I'm not sure why we
don't do the following, just wrap the entire kallsyms call with the
__debugger_fault_handler stuff. Untested patch below.
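[Editorial sketch, not part of the original message.] The wrapping suggested above follows a common setjmp/longjmp guard pattern: arm a fault handler, run the risky call, and fall back to an error message if the handler fires. Below is a minimal user-space illustration. All names here (`fault_jmp`, `fault_handler`, `risky_lookup`, `guarded_lookup`) are hypothetical stand-ins for xmon's `bus_error_jmp`, `__debugger_fault_handler` and the kallsyms call, and the "fault" is simulated by calling the handler directly rather than taking a real machine check. Clearing the handler on exit is a choice made in this sketch.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for bus_error_jmp and __debugger_fault_handler. */
static jmp_buf fault_jmp;
static void (*fault_handler)(void);

/* Invoked when a fault is detected: unwind back to the guarded region. */
static void handle_fault(void)
{
    longjmp(fault_jmp, 1);
}

/*
 * Hypothetical lookup that "faults" on address 0. Real code would
 * trap; here the armed handler is simply called to simulate that.
 */
static const char *risky_lookup(unsigned long addr)
{
    if (addr == 0 && fault_handler)
        fault_handler();        /* simulated fault, does not return */
    return "some_symbol";
}

/* Returns 0 and the symbol text on success, -1 and a fallback on fault. */
static int guarded_lookup(unsigned long addr, char *buf, size_t len)
{
    int rc = 0;

    if (setjmp(fault_jmp) == 0) {
        fault_handler = handle_fault;   /* arm the guard */
        snprintf(buf, len, "%s", risky_lookup(addr));
    } else {
        /* We arrive here via longjmp from handle_fault(). */
        snprintf(buf, len, "*** symbol lookup failed");
        rc = -1;
    }
    fault_handler = NULL;               /* disarm on every exit path */
    return rc;
}
```

Calling `guarded_lookup(0, buf, sizeof(buf))` takes the fallback path, while any other address returns the placeholder symbol.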
Anton ===== arch/ppc64/xmon/xmon.c 1.36 vs edited ===== --- 1.36/arch/ppc64/xmon/xmon.c Fri Feb 27 16:25:16 2004 +++ edited/arch/ppc64/xmon/xmon.c Thu Mar 4 01:16:49 2004 @@ -191,12 +191,7 @@ static struct pt_regs *xmon_regs[NR_CPUS]; -void __xmon_print_symbol(const char *fmt, unsigned long address); -#define xmon_print_symbol(fmt, addr) \ -do { \ - __check_printsym_format(fmt, ""); \ - __xmon_print_symbol(fmt, addr); \ -} while(0) +void xmon_print_symbol(const char *fmt, unsigned long address); /* * Stuff for reading and writing memory safely @@ -1974,6 +1969,8 @@ else printf("Symbol '%s' not found.\n", tmp); sync(); + /* wait a little while to see if we get a machine check */ + __delay(200); } __debugger_fault_handler = 0; termch = 0; @@ -1981,51 +1978,20 @@ } } - -/* xmon version of __print_symbol */ -void __xmon_print_symbol(const char *fmt, unsigned long address) +/* xmon version of print_symbol */ +void xmon_print_symbol(const char *fmt, unsigned long address) { - char *modname; - const char *name; - unsigned long offset, size; - if (setjmp(bus_error_jmp) == 0) { __debugger_fault_handler = handle_fault; sync(); - name = kallsyms_lookup(address, &size, &offset, &modname, - tmpstr); + + print_symbol(fmt, address); + sync(); /* wait a little while to see if we get a machine check */ __delay(200); } else { - name = "symbol lookup failed"; - } - - __debugger_fault_handler = 0; - - if (!name) { - char addrstr[sizeof("0x%lx") + (BITS_PER_LONG*3/10)]; - - sprintf(addrstr, "0x%lx", address); - printf(fmt, addrstr); - return; - } - - if (modname) { - /* This is pretty small. 
*/
- char buffer[sizeof("%s+%#lx/%#lx [%s]")
- + strlen(name) + 2*(BITS_PER_LONG*3/10)
- + strlen(modname)];
-
- sprintf(buffer, "%s+%#lx/%#lx [%s]",
- name, offset, size, modname);
- printf(fmt, buffer);
- } else {
- char buffer[sizeof("%s+%#lx/%#lx")
- + strlen(name) + 2*(BITS_PER_LONG*3/10)];
-
- sprintf(buffer, "%s+%#lx/%#lx", name, offset, size);
- printf(fmt, buffer);
+ printf("*** symbol lookup failed");
  }
  }

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From jean-laurent.gazelle at thalescomputers.fr Thu Mar 4 02:23:44 2004
From: jean-laurent.gazelle at thalescomputers.fr (Gazelle Jean-Laurent)
Date: Wed, 3 Mar 2004 16:23:44 +0100
Subject: kernel 2.6.3 on JS20
Message-ID: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr>

Hi all,

I'm trying to build linux kernel 2.6 on a JS20 platform.

I extracted the TAGGED version 2.6.3 of the ameslab 2.5 repository and
rebuilt it successfully, but ran into an issue at boot time (complete
log is attached):
[boot]0100 MM Init
[boot]0100 MM Init Done
register_vpa: cpu 0x0
Trap instruction interrupt : Invalid Instruction

Looks like the 'register_vpa' hcall fails...

I must have missed something. Do I need another kernel version?
Any kind of information is welcome.

regards,
--
------------------------------------------------------------
Jean-Laurent GAZELLE          150, rue M. Berthelot
Phone: +33 (0)4 98 16 34 66   Z.I. Toulon Est BP244
Fax : +33 (0)4 98 16 34 01    83078 TOULON Cedex9
e-mail : jlg at thalescomputers.fr
Thales Computers              FRANCE
_____________ http://www.thalescomputers.com _______________
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: js20_linux-2.6.3.txt Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040303/f456a8ca/attachment.txt From moilanen at austin.ibm.com Thu Mar 4 02:36:33 2004 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Wed, 03 Mar 2004 09:36:33 -0600 Subject: kernel 2.6.3 on JS20 In-Reply-To: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr> References: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr> Message-ID: <1078328193.10402.9.camel@DYN279927END.austin.ibm.com> Yeah, there are a number of FW regressions lately. This one is due to FW giving us the SPLPAR hypertas set and the kernel thinking it's on a SPLPAR machine. I have a patch that has a bunch of hacks to work around the FW issues (there's more past this one). It's not pretty, but it will get your system booted at least. The second patch is to work around an IDE HW issue where the IO space has to be in the ISA range. FW is supposed to get a fix for this one soon. Thanks, Jake On Wed, 2004-03-03 at 09:23, Gazelle Jean-Laurent wrote: > Hi all, > > I'm trying to build linux kernel 2.6 on a JS20 platform. > > I extracted the TAGGED version 2.6.3 of ameslab 2.5 repository and rebuilded it > successfully but run into an issue at boot time (complete log is attached): > [boot]0100 MM Init > [boot]0100 MM Init Done > register_vpa: cpu 0x0 > Trap instruction interrupt : Invalid Instruction > > Looks like 'register_vpa' hcall fails... > > > I should have missed something. Do I need another kernel version ? > Any kind of information welcome. > > > regards, > -- > ------------------------------------------------------------ > Jean-Laurent GAZELLE 150, rue M. Berthelot > Phone: +33 (0)4 98 16 34 66 Z.I. Toulon Est BP244 > Fax : +33 (0)4 98 16 34 01 83078 TOULON Cedex9 > e-mail : jlg at thalescomputers.fr > Thales Computers FRANCE > _____________ http://www.thalescomputers.com _______________ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: linux-2.6-js20-fw-workarounds-2.patch.bz2
Type: application/x-bzip
Size: 1354 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040303/ecbc6735/attachment.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: linux-2.6-js20-ide-workaround-1.patch.bz2
Type: application/x-bzip
Size: 965 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040303/ecbc6735/attachment-0001.bin

From olh at suse.de Thu Mar 4 02:39:07 2004
From: olh at suse.de (Olaf Hering)
Date: Wed, 3 Mar 2004 16:39:07 +0100
Subject: crashes in clear_user_page
Message-ID: <20040303153907.GC15383@suse.de>

I get crashes in clear_user_page() while building rpms on a p660.
gcc is 3.2.2, config is arch/ppc64/configs/pseries, plain ameslab tree.
There is a lot of IO: 8 processes unpack rpms in parallel on a reiserfs
filesystem. This is not a new problem, I think it also happened with
2.6.3.

Any ideas how to track it down?
--- arch/ppc64/configs/pSeries_defconfig 2004-03-02 02:49:29.000000000 +0100 +++ .config 2004-03-03 13:17:22.000000000 +0100 @@ -919,7 +923,7 @@ CONFIG_XFS_POSIX_ACL=y # CONFIG_MINIX_FS is not set # CONFIG_ROMFS_FS is not set # CONFIG_QUOTA is not set -CONFIG_AUTOFS_FS=m +CONFIG_AUTOFS_FS=y # CONFIG_AUTOFS4_FS is not set # @@ -1050,8 +1054,8 @@ CONFIG_OPROFILE=y # Kernel hacking # CONFIG_DEBUG_KERNEL=y -CONFIG_DEBUG_STACKOVERFLOW=y -CONFIG_DEBUG_STACK_USAGE=y +# CONFIG_DEBUG_STACKOVERFLOW is not set +# CONFIG_DEBUG_STACK_USAGE is not set # CONFIG_DEBUG_SLAB is not set CONFIG_MAGIC_SYSRQ=y CONFIG_DEBUGGER=y @@ -1060,7 +1064,7 @@ CONFIG_XMON=y CONFIG_XMON_DEFAULT=y # CONFIG_PPCDBG is not set # CONFIG_DEBUG_INFO is not set -CONFIG_DEBUG_SPINLOCK_SLEEP=y +# CONFIG_DEBUG_SPINLOCK_SLEEP is not set # # Security options @@ -1072,21 +1076,21 @@ CONFIG_DEBUG_SPINLOCK_SLEEP=y # CONFIG_CRYPTO=y CONFIG_CRYPTO_HMAC=y -CONFIG_CRYPTO_NULL=m -CONFIG_CRYPTO_MD4=m +# CONFIG_CRYPTO_NULL is not set +# CONFIG_CRYPTO_MD4 is not set CONFIG_CRYPTO_MD5=m CONFIG_CRYPTO_SHA1=m -CONFIG_CRYPTO_SHA256=m -CONFIG_CRYPTO_SHA512=m +# CONFIG_CRYPTO_SHA256 is not set +# CONFIG_CRYPTO_SHA512 is not set CONFIG_CRYPTO_DES=m -CONFIG_CRYPTO_BLOWFISH=m -CONFIG_CRYPTO_TWOFISH=m -CONFIG_CRYPTO_SERPENT=m -CONFIG_CRYPTO_AES=m -CONFIG_CRYPTO_CAST5=m -CONFIG_CRYPTO_CAST6=m +# CONFIG_CRYPTO_BLOWFISH is not set +# CONFIG_CRYPTO_TWOFISH is not set +# CONFIG_CRYPTO_SERPENT is not set +# CONFIG_CRYPTO_AES is not set +# CONFIG_CRYPTO_CAST5 is not set +# CONFIG_CRYPTO_CAST6 is not set CONFIG_CRYPTO_DEFLATE=m -CONFIG_CRYPTO_TEST=m +# CONFIG_CRYPTO_TEST is not set # # Library routines papaya:~ # w 15:08:49 up 24 min, 2 users, load average: 0.59, 0.40, 0.27 USER TTY LOGIN@ IDLE JCPU PCPU WHAT root ttyS0 14:45 0.00s 0.06s 0.01s w root pts/23 15:02 48.00s 13.75s 13.69s rm -rf /abuild/olh /abuild/olh6 papaya:~ # cpu 0: Vector: 300 (Data Access) at [c00000010a542f40] pc: c000000000045270 () lr: c000000000094080 () sp: 
c00000010a5431c0 msr: a000000000009032 dar: c0000000fffff000 dsisr: a000000 current = 0xc000000131519260 paca = 0xc00000000052c000 pid = 15128, comm = rpm press ? for help 0:mon> t c00000010a5431c0 c000000000094040 c00000010a5432a0 c0000000000943d0 c00000010a543390 c000000000094f0c c00000010a543440 c0000000000442a8 c00000010a543570 c00000000000aa94 c00000010a543860 c0000000005d4100 c00000010a543900 c00000000007d9a4 c00000010a5439e0 c00000000007e1ac c00000010a543ab0 c00000000007e288 c00000010a543c80 c0000000000a933c c00000010a543d20 c0000000000a9744 c00000010a543dc0 c000000000024380 c00000010a543e30 c000000000011a44 ret_from_syscall_1 exception: c00 (System Call) regs c00000010a543ea0 000000000fc62bcc 0:mon> ci stopping all cpus cpu0p moVcetctoor: 500 (Hardware Interrupt) at [c00000000b857b60] pc: c0000000000148e8 () lr: c000000000014910 () sp: c00000000b857de0 msr: a000000000009032 current = 0xc000000003f28970 paca = 0xc00000000052e000 pid = 0, comm = swapper cpu 5: Vector: 500 (Hardware Interrupt) at [c000000003f47b60] pc: c0000000000148f4 () lr: c000000000014910 () sp: c000000003f47de0 msr: a000000000009032 current = 0xc000000003f59b50 paca = 0xc000000000536000 pid = 0, comm = swapper cpu 2: Vector: 500 (Hardware Interrupt) at [c00000000b847b60] pc: c0000000000148e4 () lr: c000000000014910 () sp: c00000000b847de0 msr: a000000000009032 current = 0xc00000000b84ed30 paca = 0xc000000000530000 pid = 0, comm = swapper cpu 3: Vector: 500 (Hardware Interrupt) at [c000000003f77b60] pc: c0000000000148f4 () lr: c000000000014910 () sp: c000000003f77de0 msr: a000000000009032 current = 0xc00000000b84d260 paca = 0xc000000000532000 pid = 0, comm = swapper cpu 4: Vector: 500 (Hardware Interrupt) at [c000000003f57b60] pc: c0000000000148e0 () lr: c000000000014910 () sp: c000000003f57de0 msr: a000000000009032 current = 0xc000000003f5b620 paca = 0xc000000000534000 pid = 0, comm = swapper 0:mon> e cpu 0: Vector: 300 (Data Access) at [c00000010a542f40] pc: c000000000045270 () lr: 
c000000000094080 () sp: c00000010a5431c0 msr: a000000000009032 dar: c0000000fffff000 dsisr: a000000 current = 0xc000000131519260 paca = 0xc00000000052c000 pid = 15128, comm = rpm 0:mon> r R00 = 0000000000000020 R16 = 0000000010032f18 R01 = c00000010a5431c0 R17 = 000000004016af68 R02 = c0000000006defc8 R18 = c00000010a543e08 R03 = c0000000fffff000 R19 = c000000158a55800 R04 = 000000004013aa4b R20 = 000000004013aa4b R05 = c000000008ffffb0 R21 = c000000170f5b800 R06 = c000000158a55800 R22 = c000000174593a80 R07 = c0000000006dc008 R23 = c000000000749140 R08 = c0000000006dff28 R24 = 0000000002000000 R09 = 0000000000000080 R25 = 4000000000000000 R10 = c000000000004000 R26 = cccccccccccccccd R11 = c000000000005000 R27 = c000000149096a00 R12 = 0000000042828442 R28 = c000000008ffffb0 R13 = c00000000052c000 R29 = c000000000000000 R14 = 00000000ffffb8f8 R30 = c0000000005d4800 R15 = 0000000000000000 R31 = c000000008ffffb0 pc = c000000000045270 msr = a000000000009032 lr = c000000000094080 cr = 0000000082828442 ctr = 0000000000000020 xer = 0000000000000000 trap = 300 0:mon> T c00000010a5431c0 c000000000094040 .do_anonymous_page+0x1e4/0x4d4 c00000010a5432a0 c0000000000943d0 .do_no_page+0xa0/0x850 c00000010a543390 c000000000094f0c .handle_mm_fault+0x1f8/0x308 c00000010a543440 c0000000000442a8 .do_page_fault+0x24c/0x4a4 c00000010a543570 c00000000000aa94 stab_bolted_user_return+0x118/0x11c c00000010a543860 c0000000005d4100 0xc0000000005d4100 c00000010a543900 c00000000007d9a4 .do_generic_mapping_read+0x318/0x7e0 c00000010a5439e0 c00000000007e1ac .__generic_file_aio_read+0x1c4/0x1f8 c00000010a543ab0 c00000000007e288 .generic_file_read+0x60/0x98 c00000010a543c80 c0000000000a933c .vfs_read+0x10c/0x164 c00000010a543d20 c0000000000a9744 .sys_pread64+0x60/0xa8 c00000010a543dc0 c000000000024380 .sys32_pread64+0x1c/0x34 c00000010a543e30 c000000000011a44 ret_from_syscall_1 exception: c00 (System Call) regs c00000010a543ea0 000000000fc62bcc 0:mon> 0:mon> E cpu 0: Vector: 300 (Data Access) at 
[c00000010a542f40]
    pc: c000000000045270 (.clear_user_page+0x1c/0x50)
    lr: c000000000094080 (.do_anonymous_page+0x224/0x4d4)
    sp: c00000010a5431c0
   msr: a000000000009032
   dar: c0000000fffff000
 dsisr: a000000
  current = 0xc000000131519260
  paca    = 0xc00000000052c000
    pid   = 15128, comm = rpm
0:mon>
0:mon>
0:mon>
0:mon> di c000000000045260
c000000000045260  e9480000  ld r10,0(r8)
c000000000045264  812b0064  lwz r9,100(r11)
c000000000045268  800a0064  lwz r0,100(r10)
c00000000004526c  7c0903a6  mtctr r0
c000000000045270  7c001fec  dcbz r0,r3
c000000000045274  7c634a14  add r3,r3,r9
c000000000045278  4200fff8  bdnz 0xc000000000045270 # .clear_user_page+0x1c
c00000000004527c  39200400  li r9,1024
c000000000045280  e8050000  ld r0,0(r5)
c000000000045284  7800b7e2  rldicl r0,r0,54,63
c000000000045288  2c000000  cmpwi r0,0
c00000000004528c  4d820020  beqlr
c000000000045290  7c0028a8  ldarx r0,r0,r5
c000000000045294  7c004878  andc r0,r0,r9
c000000000045298  7c0029ad  stdcx. r0,r0,r5
c00000000004529c  40a2fff4  bne- 0xc000000000045290 # .clear_user_page+0x3c

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From linas at austin.ibm.com Thu Mar 4 04:48:28 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Wed, 3 Mar 2004 11:48:28 -0600
Subject: KDB in ameslab
In-Reply-To: <2624.1078309077@ocs3.ocs.com.au>; from kaos@sgi.com on Wed, Mar 03, 2004 at 09:17:57PM +1100
References: <20040302195946.J74832@forte.austin.ibm.com> <2624.1078309077@ocs3.ocs.com.au>
Message-ID: <20040303114828.K74832@forte.austin.ibm.com>

On Wed, Mar 03, 2004 at 09:17:57PM +1100, Keith Owens wrote:
>
> On Tue, 2 Mar 2004 19:59:46 -0600,
> linas at austin.ibm.com wrote:
> >p.s. Keith, I will try to create and send you the corresponding
> >ppc64 architecture patch; however, this might be ugly, because
> >I suspect the andrew morton kernels are still fairly out of sync
> >with the BK ameslab trees.
>
> That is OK, I had the same problem with ia64 in 2.4. Users had to
> apply the ia64 patches from David Mosberger first, then kdb for ia64.

We don't distribute ppc64 as a set of patches; instead, users are told
to bk clone a bitkeeper tree (in ameslab). Based on Anton's comments
about ameslab being almost in sync with the torvalds/morton trees, I'm
thinking that providing you with a ppc64-arch-only patch that applies
cleanly to the akpm kernels shouldn't be too hard.

Then again, if you don't really want this patch, I don't really want to
hassle with generating it. I thought it might be nice, "for the record"
to have it on your ftp site ...

--linas

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From linas at austin.ibm.com Thu Mar 4 04:55:18 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Wed, 3 Mar 2004 11:55:18 -0600
Subject: KDB in ameslab
In-Reply-To: <5192.1078285518@kao2.melbourne.sgi.com>; from kaos@ocs.com.au on Wed, Mar 03, 2004 at 02:45:18PM +1100
References: <20040303033226.GO5801@krispykreme> <5192.1078285518@kao2.melbourne.sgi.com>
Message-ID: <20040303115518.L74832@forte.austin.ibm.com>

On Wed, Mar 03, 2004 at 02:45:18PM +1100, Keith Owens wrote:
> On Wed, 3 Mar 2004 14:32:26 +1100,
> Anton Blanchard wrote:
> >
> >> *Please* apply this ASAP, before it bit-rots, and I have to do the
> >> work all over again.
> >
> >Good stuff, only I cant seem to find include/linux/dis-asm.h.
>
> Part of the kdb common (arch independent) patch.
> ftp://oss.sgi.com/projects/kdb/download/v4.3/, pick your release base.

bk new somefile.c followed by bk diffs fails to add "somefile.c" to the
diffs :-(  Maybe I'm using bk wrong. I'll try a new patch later today.

--linas

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/ From olof at austin.ibm.com Thu Mar 4 05:54:11 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Wed, 3 Mar 2004 12:54:11 -0600 (CST) Subject: [PATCH] ppc64: More IOMMU cleanups Message-ID: Linus, Below patch contains further IOMMU cleanups: * Tidying up some of the arguments to iommu_*() * Comment cleanup * Don't bump the hint to the next block for large allocs, to avoid fragmentation. * Simplify vmerge logic during SG allocations * Moving the memory barriers from the bus-specific parts into the common code. Some changes are mine, some are from Ben Herrenschmidt. arch/ppc64/kernel/iommu.c | 255 +++++++++++++++++++++++------------------- arch/ppc64/kernel/pci_iommu.c | 17 -- arch/ppc64/kernel/vio.c | 9 - include/asm-ppc64/iommu.h | 12 + 4 files changed, 155 insertions(+), 138 deletions(-) Thanks, Olof # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. # This patch includes the following deltas: # ChangeSet 1.1636 -> 1.1637 # arch/ppc64/kernel/iommu.c 1.1 -> 1.2 # arch/ppc64/kernel/vio.c 1.9 -> 1.10 # arch/ppc64/kernel/pci_iommu.c 1.1 -> 1.2 # include/asm-ppc64/iommu.h 1.1 -> 1.2 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 04/03/03 olof at olof.austin.ibm.com 1.1637 # ppc64: More IOMMU cleanups # -------------------------------------------- # diff -Nru a/arch/ppc64/kernel/iommu.c b/arch/ppc64/kernel/iommu.c --- a/arch/ppc64/kernel/iommu.c Wed Mar 3 12:38:30 2004 +++ b/arch/ppc64/kernel/iommu.c Wed Mar 3 12:38:30 2004 @@ -1,12 +1,12 @@ /* - * arch/ppc64/kernel/pci_iommu.c + * arch/ppc64/kernel/iommu.c * Copyright (C) 2001 Mike Corrigan & Dave Engebretsen, IBM Corporation * * Rewrite, cleanup, new allocation schemes, virtual merging: * Copyright (C) 2004 Olof Johansson, IBM Corporation * and Ben. 
Herrenschmidt, IBM Corporation * - * Dynamic DMA mapping support, platform-independent parts. + * Dynamic DMA mapping support, bus-independent parts. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -60,41 +60,57 @@ __setup("iommu=", setup_iommu); static unsigned long iommu_range_alloc(struct iommu_table *tbl, unsigned long npages, - unsigned long *handle) + unsigned long *handle) { unsigned long n, end, i, start; - unsigned long hint; unsigned long limit; int largealloc = npages > 15; + int pass = 0; - if (handle && *handle) - hint = *handle; - else - hint = largealloc ? tbl->it_largehint : tbl->it_hint; + /* This allocator was derived from x86_64's bit string search */ - /* Most of this is stolen from x86_64's bit string search function */ + /* Sanity check */ + if (unlikely(npages) == 0) { + if (printk_ratelimit()) + WARN_ON(1); + return NO_TCE; + } - start = hint; + if (handle && *handle) + start = *handle; + else + start = largealloc ? tbl->it_largehint : tbl->it_hint; - /* Use only half of the table for small allocs (less than 15 pages). */ + /* Use only half of the table for small allocs (15 pages or less) */ + limit = largealloc ? tbl->it_mapsize : tbl->it_halfpoint; - limit = largealloc ? tbl->it_mapsize : tbl->it_mapsize >> 1; + if (largealloc && start < tbl->it_halfpoint) + start = tbl->it_halfpoint; - if (largealloc && start < (tbl->it_mapsize >> 1)) - start = tbl->it_mapsize >> 1; + /* The case below can happen if we have a small segment appended + * to a large, or when the previous alloc was at the very end of + * the available space. If so, go back to the initial start. + */ + if (start >= limit) + start = largealloc ? tbl->it_largehint : tbl->it_hint; again: n = find_next_zero_bit(tbl->it_map, limit, start); - end = n + npages; - if (end >= limit) { - if (hint) { - start = largealloc ? 
tbl->it_mapsize >> 1 : 0; - hint = 0; + + if (unlikely(end >= limit)) { + if (likely(pass++ < 2)) { + /* First failure, just rescan the half of the table. + * Second failure, rescan the other half of the table. + */ + start = (largealloc ^ pass) ? tbl->it_halfpoint : 0; + limit = pass ? tbl->it_mapsize : limit; goto again; - } else + } else { + /* Third failure, give up */ return NO_TCE; + } } for (i = n; i < end; i++) @@ -106,16 +122,17 @@ for (i = n; i < end; i++) __set_bit(i, tbl->it_map); - /* Bump the hint to a new PHB cache line, which - * is 16 entries wide on all pSeries machines. - */ - if (largealloc) - tbl->it_largehint = (end+tbl->it_blocksize-1) & - ~(tbl->it_blocksize-1); - else - tbl->it_hint = (end+tbl->it_blocksize-1) & - ~(tbl->it_blocksize-1); + /* Bump the hint to a new block for small allocs. */ + if (largealloc) { + /* Don't bump to new block to avoid fragmentation */ + tbl->it_largehint = end; + } else { + /* Overflow will be taken care of at the next allocation */ + tbl->it_hint = (end + tbl->it_blocksize - 1) & + ~(tbl->it_blocksize - 1); + } + /* Update handle for SG allocations */ if (handle) *handle = end; @@ -123,35 +140,38 @@ } dma_addr_t iommu_alloc(struct iommu_table *tbl, void *page, - unsigned int npages, int direction, - unsigned long *handle) + unsigned int npages, int direction) { unsigned long entry, flags; - dma_addr_t retTce = NO_TCE; + dma_addr_t ret = NO_TCE; spin_lock_irqsave(&(tbl->it_lock), flags); - /* Allocate a range of entries into the table */ - entry = iommu_range_alloc(tbl, npages, handle); + entry = iommu_range_alloc(tbl, npages, NULL); + if (unlikely(entry == NO_TCE)) { spin_unlock_irqrestore(&(tbl->it_lock), flags); return NO_TCE; } - - /* We got the tces we wanted */ + entry += tbl->it_offset; /* Offset into real TCE table */ - retTce = entry << PAGE_SHIFT; /* Set the return dma address */ + ret = entry << PAGE_SHIFT; /* Set the return dma address */ /* Put the TCEs in the HW table */ - ppc_md.tce_build(tbl, 
entry, npages, (unsigned long)page & PAGE_MASK, direction); + ppc_md.tce_build(tbl, entry, npages, (unsigned long)page & PAGE_MASK, + direction); + - /* Flush/invalidate TLBs if necessary */ + /* Flush/invalidate TLB caches if necessary */ if (ppc_md.tce_flush) ppc_md.tce_flush(tbl); - + spin_unlock_irqrestore(&(tbl->it_lock), flags); - return retTce; + /* Make sure updates are seen by hardware */ + mb(); + + return ret; } static void __iommu_free(struct iommu_table *tbl, dma_addr_t dma_addr, @@ -168,7 +188,7 @@ if (printk_ratelimit()) { printk(KERN_INFO "iommu_free: invalid entry\n"); printk(KERN_INFO "\tentry = 0x%lx\n", entry); - printk(KERN_INFO "\tdma_ddr = 0x%lx\n", (u64)dma_addr); + printk(KERN_INFO "\tdma_addr = 0x%lx\n", (u64)dma_addr); printk(KERN_INFO "\tTable = 0x%lx\n", (u64)tbl); printk(KERN_INFO "\tbus# = 0x%lx\n", (u64)tbl->it_busno); printk(KERN_INFO "\tmapsize = 0x%lx\n", (u64)tbl->it_mapsize); @@ -194,68 +214,30 @@ __iommu_free(tbl, dma_addr, npages); - /* Flush/invalidate TLBs if necessary */ + /* Make sure TLB cache is flushed if the HW needs it. We do + * not do an mb() here on purpose, it is not needed on any of + * the current platforms. + */ if (ppc_md.tce_flush) ppc_md.tce_flush(tbl); spin_unlock_irqrestore(&(tbl->it_lock), flags); } -/* - * Build a iommu_table structure. This contains a bit map which - * is used to manage allocation of the tce space. 
- */ -struct iommu_table *iommu_init_table(struct iommu_table *tbl) -{ - unsigned long sz; - static int welcomed = 0; - - /* it_size is in pages, it_mapsize in number of entries */ - tbl->it_mapsize = tbl->it_size * tbl->it_entrysize; - - if (systemcfg->platform == PLATFORM_POWERMAC) - tbl->it_mapsize = tbl->it_size * (PAGE_SIZE / sizeof(unsigned int)); - else - tbl->it_mapsize = tbl->it_size * (PAGE_SIZE / sizeof(union tce_entry)); - - /* sz is the number of bytes needed for the bitmap */ - sz = (tbl->it_mapsize + 7) >> 3; - - tbl->it_map = (unsigned long *)__get_free_pages(GFP_ATOMIC, get_order(sz)); - - if (!tbl->it_map) - panic("iommu_init_table: Can't allocate memory, size %ld bytes\n", sz); - - memset(tbl->it_map, 0, sz); - - tbl->it_hint = 0; - tbl->it_largehint = 0; - spin_lock_init(&tbl->it_lock); - - if (!welcomed) { - printk(KERN_INFO "IOMMU table initialized, virtual merging %s\n", - novmerge ? "disabled" : "enabled"); - welcomed = 1; - } - - return tbl; -} - - -int iommu_alloc_sg(struct iommu_table *tbl, struct scatterlist *sglist, int nelems, - int direction, unsigned long *handle) +int iommu_alloc_sg(struct iommu_table *tbl, struct device *dev, + struct scatterlist *sglist, int nelems, int direction) { dma_addr_t dma_next, dma_addr; - unsigned long flags, vaddr, npages, entry; - struct scatterlist *s, *outs, *segstart, *ps; + unsigned long flags; + struct scatterlist *s, *outs, *segstart; int outcount; + unsigned long handle; - /* Initialize some stuffs */ outs = s = segstart = &sglist[0]; outcount = 1; - ps = NULL; + handle = 0; - /* Init first segment length for error handling */ + /* Init first segment length for backout at failure */ outs->dma_length = 0; DBG("mapping %d elements:\n", nelems); @@ -263,13 +245,21 @@ spin_lock_irqsave(&(tbl->it_lock), flags); for (s = outs; nelems; nelems--, s++) { + unsigned long vaddr, npages, entry, slen; + + slen = s->length; + /* Sanity check */ + if (slen == 0) { + dma_next = 0; + continue; + } /* Allocate 
iommu entries for that segment */ vaddr = (unsigned long)page_address(s->page) + s->offset; - npages = PAGE_ALIGN(vaddr + s->length) - (vaddr & PAGE_MASK); + npages = PAGE_ALIGN(vaddr + slen) - (vaddr & PAGE_MASK); npages >>= PAGE_SHIFT; - entry = iommu_range_alloc(tbl, npages, handle); + entry = iommu_range_alloc(tbl, npages, &handle); - DBG(" - vaddr: %lx, size: %lx\n", vaddr, s->length); + DBG(" - vaddr: %lx, size: %lx\n", vaddr, slen); /* Handle failure */ if (unlikely(entry == NO_TCE)) { @@ -293,13 +283,10 @@ /* If we are in an open segment, try merging */ if (segstart != s) { DBG(" - trying merge...\n"); - /* We cannot merge is: + /* We cannot merge if: * - allocated dma_addr isn't contiguous to previous allocation - * - current entry has an offset into the page - * - previous entry didn't end on a page boundary */ - if (novmerge || (dma_addr != dma_next) || s->offset || - (ps->offset + ps->length) % PAGE_SIZE) { + if (novmerge || (dma_addr != dma_next)) { /* Can't merge: create a new segment */ segstart = s; outcount++; outs++; @@ -310,31 +297,28 @@ } } - /* If we are beginning a new segment, fill entries */ if (segstart == s) { + /* This is a new segment, fill entries */ DBG(" - filling new segment.\n"); outs->dma_address = dma_addr; - outs->dma_length = s->length; + outs->dma_length = slen; } /* Calculate next page pointer for contiguous check */ - dma_next = (dma_addr & PAGE_MASK) + (npages << PAGE_SHIFT); + dma_next = dma_addr + slen; DBG(" - dma next is: %lx\n", dma_next); - - /* Keep a pointer to the previous entry */ - ps = s; } - /* Make sure the update is visible to hardware. 
*/ - mb(); - - /* Flush/invalidate TLBs if necessary */ + /* Flush/invalidate TLB caches if necessary */ if (ppc_md.tce_flush) ppc_md.tce_flush(tbl); spin_unlock_irqrestore(&(tbl->it_lock), flags); + /* Make sure updates are seen by hardware */ + mb(); + DBG("mapped %d elements:\n", outcount); /* For the sake of iommu_free_sg, we clear out the length in the @@ -348,25 +332,26 @@ return outcount; failure: - spin_unlock_irqrestore(&(tbl->it_lock), flags); for (s = &sglist[0]; s <= outs; s++) { if (s->dma_length != 0) { + unsigned long vaddr, npages; + vaddr = s->dma_address & PAGE_MASK; npages = (PAGE_ALIGN(s->dma_address + s->dma_length) - vaddr) >> PAGE_SHIFT; - iommu_free(tbl, vaddr, npages); + __iommu_free(tbl, vaddr, npages); } } + spin_unlock_irqrestore(&(tbl->it_lock), flags); return 0; } -void iommu_free_sg(struct iommu_table *tbl, struct scatterlist *sglist, int nelems, - int direction) +void iommu_free_sg(struct iommu_table *tbl, struct scatterlist *sglist, + int nelems) { unsigned long flags; - /* Lock the whole operation to try to free as a "chunk" */ spin_lock_irqsave(&(tbl->it_lock), flags); while (nelems--) { @@ -381,9 +366,49 @@ sglist++; } - /* Flush/invalidate TLBs if necessary */ + /* Flush/invalidate TLBs if necessary. As for iommu_free(), we + * do not do an mb() here, the affected platforms do not need it + * when freeing. + */ if (ppc_md.tce_flush) ppc_md.tce_flush(tbl); spin_unlock_irqrestore(&(tbl->it_lock), flags); +} + +/* + * Build a iommu_table structure. This contains a bit map which + * is used to manage allocation of the tce space. + */ +struct iommu_table *iommu_init_table(struct iommu_table *tbl) +{ + unsigned long sz; + static int welcomed = 0; + + /* it_size is in pages, it_mapsize in number of entries */ + tbl->it_mapsize = (tbl->it_size << PAGE_SHIFT) / tbl->it_entrysize; + + /* Set aside 1/4 of the table for large allocations. 
*/ + tbl->it_halfpoint = tbl->it_mapsize * 3 / 4; + + /* number of bytes needed for the bitmap */ + sz = (tbl->it_mapsize + 7) >> 3; + + tbl->it_map = (unsigned long *)__get_free_pages(GFP_ATOMIC, get_order(sz)); + if (!tbl->it_map) + panic("iommu_init_table: Can't allocate %ld bytes\n", sz); + + memset(tbl->it_map, 0, sz); + + tbl->it_hint = 0; + tbl->it_largehint = tbl->it_halfpoint; + spin_lock_init(&tbl->it_lock); + + if (!welcomed) { + printk(KERN_INFO "IOMMU table initialized, virtual merging %s\n", + novmerge ? "disabled" : "enabled"); + welcomed = 1; + } + + return tbl; } diff -Nru a/arch/ppc64/kernel/pci_iommu.c b/arch/ppc64/kernel/pci_iommu.c --- a/arch/ppc64/kernel/pci_iommu.c Wed Mar 3 12:38:30 2004 +++ b/arch/ppc64/kernel/pci_iommu.c Wed Mar 3 12:38:30 2004 @@ -99,10 +99,7 @@ memset(ret, 0, size); /* Set up tces to cover the allocated range */ - mapping = iommu_alloc(tbl, ret, npages, PCI_DMA_BIDIRECTIONAL, NULL); - - /* Make sure the update is visible to hardware. */ - mb(); + mapping = iommu_alloc(tbl, ret, npages, PCI_DMA_BIDIRECTIONAL); if (mapping == NO_TCE) { free_pages((unsigned long)ret, order); @@ -145,7 +142,6 @@ dma_addr_t dma_handle = NO_TCE; unsigned long uaddr; unsigned int npages; - unsigned long handle = 0; BUG_ON(direction == PCI_DMA_NONE); @@ -156,7 +152,7 @@ tbl = devnode_table(hwdev); if (tbl) { - dma_handle = iommu_alloc(tbl, vaddr, npages, direction, &handle); + dma_handle = iommu_alloc(tbl, vaddr, npages, direction); if (dma_handle == NO_TCE) { if (printk_ratelimit()) { printk(KERN_INFO "iommu_alloc failed, tbl %p vaddr %p npages %d\n", @@ -166,8 +162,6 @@ dma_handle |= (uaddr & ~PAGE_MASK); } - mb(); - return dma_handle; } @@ -194,7 +188,6 @@ int direction) { struct iommu_table * tbl; - unsigned long handle; BUG_ON(direction == PCI_DMA_NONE); @@ -205,9 +198,7 @@ if (!tbl) return 0; - handle = 0; - - return iommu_alloc_sg(tbl, sglist, nelems, direction, &handle); + return iommu_alloc_sg(tbl, &pdev->dev, sglist, nelems, 
direction); } void pci_iommu_unmap_sg(struct pci_dev *pdev, struct scatterlist *sglist, int nelems, @@ -221,7 +212,7 @@ if (!tbl) return; - iommu_free_sg(tbl, sglist, nelems, direction); + iommu_free_sg(tbl, sglist, nelems); } /* We support DMA to/from any memory page via the iommu */ diff -Nru a/arch/ppc64/kernel/vio.c b/arch/ppc64/kernel/vio.c --- a/arch/ppc64/kernel/vio.c Wed Mar 3 12:38:30 2004 +++ b/arch/ppc64/kernel/vio.c Wed Mar 3 12:38:30 2004 @@ -432,7 +432,7 @@ tbl = dev->iommu_table; if (tbl) { - dma_handle = iommu_alloc(tbl, vaddr, npages, direction, NULL); + dma_handle = iommu_alloc(tbl, vaddr, npages, direction); dma_handle |= (uaddr & ~PAGE_MASK); } @@ -461,7 +461,6 @@ int direction) { struct iommu_table *tbl; - unsigned long handle; BUG_ON(direction == PCI_DMA_NONE); @@ -472,7 +471,7 @@ if (!tbl) return 0; - return iommu_alloc_sg(tbl, sglist, nelems, direction, &handle); + return iommu_alloc_sg(tbl, &vdev->dev, sglist, nelems, direction); } EXPORT_SYMBOL(vio_map_sg); @@ -485,7 +484,7 @@ tbl = vdev->iommu_table; if (tbl) - iommu_free_sg(tbl, sglist, nelems, direction); + iommu_free_sg(tbl, sglist, nelems); } EXPORT_SYMBOL(vio_unmap_sg); @@ -517,7 +516,7 @@ /* Page allocation succeeded */ memset(ret, 0, npages << PAGE_SHIFT); /* Set up tces to cover the allocated range */ - tce = iommu_alloc(tbl, ret, npages, PCI_DMA_BIDIRECTIONAL, NULL); + tce = iommu_alloc(tbl, ret, npages, PCI_DMA_BIDIRECTIONAL); if (tce == NO_TCE) { PPCDBG(PPCDBG_TCE, "vio_alloc_consistent: iommu_alloc failed\n" ); free_pages((unsigned long)ret, order); diff -Nru a/include/asm-ppc64/iommu.h b/include/asm-ppc64/iommu.h --- a/include/asm-ppc64/iommu.h Wed Mar 3 12:38:30 2004 +++ b/include/asm-ppc64/iommu.h Wed Mar 3 12:38:30 2004 @@ -24,6 +24,7 @@ #include #include +#include /* * IOMAP_MAX_ORDER defines the largest contiguous block @@ -78,6 +79,7 @@ unsigned long it_blocksize; /* Entries in each block (cacheline) */ unsigned long it_hint; /* Hint for next alloc */ unsigned long 
it_largehint; /* Hint for large allocs */ + unsigned long it_halfpoint; /* Breaking point for small/large allocs */ spinlock_t it_lock; /* Protects it_map */ unsigned long it_mapsize; /* Size of map in # of entries (bits) */ unsigned long *it_map; /* A simple allocation bitmap for now */ @@ -132,16 +134,16 @@ /* allocates a range of tces and sets them to the pages */ extern dma_addr_t iommu_alloc(struct iommu_table *, void *page, - unsigned int numPages, int direction, - unsigned long *handle); + unsigned int numPages, int direction); extern void iommu_free(struct iommu_table *tbl, dma_addr_t dma_addr, unsigned int npages); /* same with sg lists */ -extern int iommu_alloc_sg(struct iommu_table *table, struct scatterlist *sglist, - int nelems, int direction, unsigned long *handle); +extern int iommu_alloc_sg(struct iommu_table *table, struct device *dev, + struct scatterlist *sglist, int nelems, + int direction); extern void iommu_free_sg(struct iommu_table *tbl, struct scatterlist *sglist, - int nelems, int direction); + int nelems); extern void tce_init_pSeries(void); ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Thu Mar 4 07:30:25 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Wed, 3 Mar 2004 14:30:25 -0600 Subject: [PATCH] Re: KDB in ameslab In-Reply-To: <20040302195946.J74832@forte.austin.ibm.com>; from linas@austin.ibm.com on Tue, Mar 02, 2004 at 07:59:46PM -0600 References: <20040217043527.GC25491@krispykreme> <20040302195946.J74832@forte.austin.ibm.com> Message-ID: <20040303143023.A23924@forte.austin.ibm.com> Hi, Resending because the ppc64 mail list manager bounced the message because the patch was too big. This patch brings the ameslab ppc64 tree up to the current level of KDB (v 4.3 hot from the sgi ftp site as of two days ago) Note that since ameslab currently has kdb 4.1(?) in it, that this patch has the effect of undoing the old kdb as it adds the new kdb. 
Rather than attaching, because of file size limits, I've put a new,
slightly improved patch at:

http://www-124.ibm.com/linux/patches/?patch_id=1384

(it may take 1-24 hours for the above URL to become valid).

The 'slight improvement' includes:
-- addition of missing dis-asm.h
-- fixing CONFIG_KDB_MODULES so that kdb modules work

This patch should apply cleanly to the ameslab tree.

--linas

On Tue, Mar 02, 2004 at 07:59:46PM -0600, linas at austin.ibm.com wrote:
> On Tue, Feb 17, 2004 at 03:35:27PM +1100, Anton Blanchard wrote:
> > A side effect of this is that KDB is probably broken. I started looking
> > into fixing it however I noticed it looks out of date. Does someone have
> > the urge to update it?
>
> I notice that you still haven't applied my old "append __ to the
> debug handles in arch/ppc64/kdb/kdbmain.c patch" to fix the above. :-(
>
> No matter: here's a big honking patch.
> -- It updates yesterdays ameslab ppc64 bk tree to KDB version 4.3
> -- It compiles, it runs, it seems to work.
> Caveats:
> o I have not tested on lpars yet
> o I stubbed out the TCE code, because that's all changed.
> o I haven't tried CONFIG_KDB_MODULES which are probably broken.
> o Its got a couple of other messy areas that need some tweaking,
>   which I may try to do tommorrow, or maybe IBM India might fix
>   later.
> o It seems to take about 10 seconds between typing in 'startKDB'
>   and the time you get the kdb prompt. Don't know why. It also
>   takes 10 seconds to switch cpus (with the kdb cpu command).
>
> *Please* apply this ASAP, before it bit-rots, and I have to do the
> work all over again.
>
> --linas
>
>
> p.s. Keith, I will try to create and send you the corresponding
> ppc64 architecture patch; however, this might be ugly, because
> I suspect the andrew morton kernels are still fairly out of sync
> with the BK ameslab trees.
>
>

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From linas at austin.ibm.com Thu Mar 4 07:43:04 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Wed, 3 Mar 2004 14:43:04 -0600
Subject: kdb ppc64 patches
In-Reply-To: <20040302073703.GA6623@in.ibm.com>; from ananth@in.ibm.com on Tue, Mar 02, 2004 at 12:37:03PM +0500
References: <20040301184844.A33678@forte.austin.ibm.com> <20040302073703.GA6623@in.ibm.com>
Message-ID: <20040303144304.M74832@forte.austin.ibm.com>

On Tue, Mar 02, 2004 at 12:37:03PM +0500, Ananth N Mavinakayanahalli wrote:
> Hello Linas,
>
> This is Ananth and I work with the LTC India RAS team.
>
> I just saw your mail on the kdb mailing list regarding kdb and ppc64. Just
> wondering if you are working on getting kdb v4.3 up on the latest ppc64 tree?

I just posted a new/improved patch at
http://www-124.ibm.com/linux/patches/?patch_id=1384

I'm not really planning on working much more on KDB, I just needed
a couple of features out of it, and that's what this patch provides.

> 1. The poll_funcs[] call has moved to the arch-agnostic area.

I haven't tried poll_funcs with hvc consoles, etc; it may need
fixes there. Or maybe not, I don't know.

> 4. A few other minor tweaks such as checking for pt_regs/kdb_eframe_t for
> NULL in some functions, setting KDB_STATE to SSBPT in kdb_bp_trap(),
> etc.

I haven't done any of this, I haven't even looked at it. The patch
that I mailed out "worked for me" so I didn't dig any deeper. If this
needs to be done, then do it; I'm not planning to.

Another area that needs fixing: the way that TCE's work has changed
completely in recent kernels, and so I just stubbed out the
'examine tce' routine. This needs to be fixed.

--linas

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From olh at suse.de Thu Mar 4 08:36:06 2004
From: olh at suse.de (Olaf Hering)
Date: Wed, 3 Mar 2004 22:36:06 +0100
Subject: [PATCH] improve xmon symbol lookup
In-Reply-To: <20040303142032.GV5801@krispykreme>
References: <20040303135532.GA11204@suse.de> <20040303142032.GV5801@krispykreme>
Message-ID: <20040303213606.GA6452@suse.de>

On Thu, Mar 04, Anton Blanchard wrote:

>
> > xmon will die reliable in symbol lookup, if the system is broken enough.
> >
> > This patch makes the lookup optional, E instead of e, T instead of t.
>
> Id love to know where we are locking up. Actually im not sure why we
> dont do the following, just wrap the entire kallsyms call with the
> __debugger_fault_handler stuff.

I will give it a try.

another thing that bites me all the time:

3:mon> rd
cpu 3: Vector: 300 (Data Access) at [c000000141e82940]
    pc: c00000000004a6b8 ()
    lr: c00000000004a690 ()
    sp: c000000141e82bc0
   msr: a000000000001032
   dar: d
 dsisr: 40000000
  current = 0xc000000128405b50
  paca    = 0xc000000000532000
    pid   = 26138, comm = rpm
We are already in xmon
-ETOAST

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From paulus at samba.org Thu Mar 4 11:38:18 2004
From: paulus at samba.org (Paul Mackerras)
Date: Thu, 4 Mar 2004 11:38:18 +1100
Subject: kernel 2.6.3 on JS20
In-Reply-To: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr>
References: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr>
Message-ID: <16454.31354.991602.980399@cargo.ozlabs.ibm.com>

Gazelle Jean-Laurent writes:

> I'm trying to build linux kernel 2.6 on a JS20 platform.
>
> I extracted the TAGGED version 2.6.3 of ameslab 2.5 repository and rebuilded it
> successfully but run into an issue at boot time (complete log is attached):

Note that if you do something like:

	bk export -rv2.6.3 ../build

in a clone of the ameslab linux-2.5 repository, you will get a copy of
Linus' 2.6.3 release. Because ameslab is a child of Linus' linux-2.5
tree, the whole of Linus' tree is present in the ameslab tree. The
"v2.6.3" tag that Linus puts on a changeset gets imported into the
ameslab tree, and represents the same state in both trees.

Paul.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From linas at austin.ibm.com Thu Mar 4 12:12:07 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Wed, 3 Mar 2004 19:12:07 -0600
Subject: [PATCH] add page lookup to page command
Message-ID: <20040303191207.N74832@forte.austin.ibm.com>

Hi,

The attached patch adds support for automatic struct page * lookup
for the 'page' command. By typing 'page -s <vaddr>', it will find
the mem_map[] entry that corresponds to vaddr, and display that.
(To do this manually is a bit tedious; besides chasing non-exported
symbols in System.map, one has to do some hex multiplication by
sizeof(struct page); so this patch automates that.)

It also enables the memmap command for PPC64. I think the comments
about it being arch-dependent are bogus; I think all arches use the
same code. But I took the conservative approach and just added
CONFIG_PPC64 instead.
--linas -------------- next part -------------- --- kdb/modules/kdbm_pg.c.orig 2004-03-03 17:03:39.000000000 -0600 +++ kdb/modules/kdbm_pg.c 2004-03-03 17:49:21.000000000 -0600 @@ -151,18 +151,32 @@ kdbm_page(int argc, const char **argv, c long offset=0; int nextarg; int diag; + int lookup_page = 0; - if (argc != 1) + if (argc == 2) { + if (strcmp(argv[1], "-s") != 0) { + return KDB_ARGCOUNT; + } + lookup_page = 1; + } else if (argc != 1) { return KDB_ARGCOUNT; + } - nextarg = 1; + nextarg = argc; diag = kdbgetaddrarg(argc, argv, &nextarg, &addr, &offset, NULL, regs); if (diag) return diag; + /* Assume argument is a page number, not address */ if (addr < PAGE_OFFSET) addr = (unsigned long) &mem_map[addr]; + /* Get the struct page * that corresponds to this addr */ + if (lookup_page) + { + addr = (unsigned long) virt_to_page(addr); + } + if ((diag = kdb_getarea(page, addr))) return(diag); @@ -476,9 +490,10 @@ out: -#ifdef CONFIG_X86 -/* According to Steve Lord, this code is ix86 specific. Patches to extend it to - * other architectures will be greatefully accepted. +#if defined(CONFIG_X86) | defined(CONFIG_PPC64) +/* According to Steve Lord, this code is ix86 specific. + * Patches to extend it to other architectures will be + * greatefully accepted. 
*/ static int kdbm_memmap(int argc, const char **argv, const char **envp, @@ -541,12 +556,12 @@ kdbm_memmap(int argc, const char **argv, kdb_printf(" high page count: %6d\n", page_counts[8]); return 0; } -#endif /* CONFIG_X86 */ +#endif /* CONFIG_X86 | CONFIG_PPC64 */ static int __init kdbm_pg_init(void) { #ifndef CONFIG_DISCONTIGMEM - kdb_register("page", kdbm_page, "", "Display page", 0); + kdb_register("page", kdbm_page, "[-s] ", "Display page [or page of addr]", 0); #endif kdb_register("inode", kdbm_inode, "", "Display inode", 0); kdb_register("sb", kdbm_sb, "", "Display super_block", 0); @@ -554,7 +569,7 @@ static int __init kdbm_pg_init(void) kdb_register("inode_pages", kdbm_inode_pages, "", "Display pages in an inode", 0); kdb_register("req", kdbm_request, "", "dump request struct", 0); kdb_register("rqueue", kdbm_rqueue, "", "dump request queue", 0); -#ifdef CONFIG_X86 +#if defined(CONFIG_X86) | defined(CONFIG_PPC64) kdb_register("memmap", kdbm_memmap, "", "page table summary", 0); #endif @@ -573,7 +588,7 @@ static void __exit kdbm_pg_exit(void) kdb_unregister("inode_pages"); kdb_unregister("req"); kdb_unregister("rqueue"); -#ifdef CONFIG_X86 +#if defined(CONFIG_X86) | defined(CONFIG_PPC64) kdb_unregister("memmap"); #endif } From anton at samba.org Thu Mar 4 13:03:29 2004 From: anton at samba.org (Anton Blanchard) Date: Thu, 4 Mar 2004 13:03:29 +1100 Subject: [PATCH] add page lookup to page command In-Reply-To: <20040303191207.N74832@forte.austin.ibm.com> References: <20040303191207.N74832@forte.austin.ibm.com> Message-ID: <20040304020329.GB5801@krispykreme> Hi Linas, > The attached patch adds support for automatic struct page * lookup > for the 'page' command. By typing 'page -s ', it will find > the mem_map[] entry that corresponds to vaddr, and display that. > (To do this manually is a bit tedious; besides chasing non-exported > symbols in System.map, one has to do some hex multiplication by > sizeof(struct page); so this patch automates that.) 
We can't play with mem_map directly when NUMA/DISCONTIGMEM is enabled;
check out discontigmem_pfn_to_page for the nasty details.

Anton

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From anton at samba.org Thu Mar 4 16:24:51 2004
From: anton at samba.org (Anton Blanchard)
Date: Thu, 4 Mar 2004 16:24:51 +1100
Subject: crashes in clear_user_page
In-Reply-To: <20040303153907.GC15383@suse.de>
References: <20040303153907.GC15383@suse.de>
Message-ID: <20040304052451.GC5801@krispykreme>

Hi,

> I get crashes in clear_user_page() while building rpms on a p660.
> gcc is 3.2.2, config is arch/ppc64/configs/pseries, plain ameslab tree.
>
> there is lot of IO, 8 processes do unpack rpms in parallel on a reiserfs
> filesystem.

It turns out you got a kernel segment with ks set. Shouldn't ever happen.

We weren't zeroing the old contents of the segment table entry before
inserting the new one; if we overwrote a user segment with a kernel one
the ks bit would remain set. It's a POWER3/RS64 only bug.

Give this patch a go. While I was there I modified our vsid calculation
code to match reality (we only use 13 bits of the EA).
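
The 13-bit calculation can be sketched in plain C roughly as follows.
This is a minimal userspace sketch under stated assumptions, not the
kernel source: kernel_vsid() is a hypothetical name, VSID_RANDOMIZER is
the value the li/sldi/oris/ori sequence in head.S builds
(9 << 32 | 58231 << 16 | 39831), LAST_USER_CONTEXT is taken to be
1 << 15 to match the shift in the head.S comment, and VSID_MASK is
assumed to be 36 bits wide.

```c
#include <stdint.h>

/* Built in head.S as 9 << 32 | 58231 << 16 | 39831. */
#define VSID_RANDOMIZER   0x9E3779B97ULL
/* Assumption: head.S shifts the ESID by 15, so the multiplier is 1 << 15. */
#define LAST_USER_CONTEXT 0x8000ULL
/* Assumption: a 36-bit VSID; the mask value is not shown in this thread. */
#define VSID_MASK         0xFFFFFFFFFULL

/* Hypothetical helper mirroring the patched ordinal/vsid arithmetic. */
static uint64_t kernel_vsid(uint64_t ea)
{
	/* Only 13 bits of the ESID (EA bits 28..40) feed the ordinal. */
	uint64_t ordinal = (((ea >> 28) & 0x1fff) * LAST_USER_CONTEXT)
			   | (ea >> 60);

	/* Scatter the ordinal and keep only the VSID bits. */
	return (ordinal * VSID_RANDOMIZER) & VSID_MASK;
}
```

With the old 0x1fffff mask, two effective addresses that differ only in
bits 41 to 48 produced different ordinals; with 0x1fff they give the
same VSID.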
Anton

diff -urN ameslab-2.5/arch/ppc64/kernel/head.S foobar2/arch/ppc64/kernel/head.S
--- ameslab-2.5/arch/ppc64/kernel/head.S	2004-03-02 09:06:07.935488713 +1100
+++ foobar2/arch/ppc64/kernel/head.S	2004-03-04 16:17:07.374783850 +1100
@@ -904,12 +904,13 @@
 
 	/* (((ea >> 28) & 0x1fff) << 15) | (ea >> 60) */
 	mfspr	r21,DAR
-	rldicl	r20,r21,36,32	/* Permits a full 32b of ESID */
-	rldicr	r20,r20,15,48
-	rldicl	r21,r21,4,60
-	or	r20,r20,r21
+	rldicl	r20,r21,36,51
+	sldi	r20,r20,15
+	srdi	r21,r21,60
+	or	r20,r20,r21
 
-	li	r21,9		/* VSID_RANDOMIZER */
+	/* VSID_RANDOMIZER */
+	li	r21,9
 	sldi	r21,r21,32
 	oris	r21,r21,58231
 	ori	r21,r21,39831
@@ -933,11 +934,11 @@
 	rldicl	r23,r23,57,63
 	cmpwi	r23,0
 	bne	2f
-	ld	r23,8(r21)	/* Get the current vsid part of the ste */
+	li	r23,0
 	rldimi	r23,r20,12,0	/* Insert the new vsid value */
 	std	r23,8(r21)	/* Put new entry back into the stab */
 	eieio			/* Order vsid update */
-	ld	r23,0(r21)	/* Get the esid part of the ste */
+	li	r23,0
 	mfspr	r20,DAR		/* Get the new esid */
 	rldicl	r20,r20,36,28	/* Permits a full 36b of ESID */
 	rldimi	r23,r20,28,0	/* Insert the new esid value */
@@ -971,13 +972,13 @@
 	std	r23,0(r21)
 	sync
 
-	ld	r23,8(r21)
+	li	r23,0
 	rldimi	r23,r20,12,0
 	std	r23,8(r21)
 	eieio
 
-	ld	r23,0(r21)	/* Get the esid part of the ste */
-	mr	r22,r23
+	ld	r22,0(r21)	/* Get the esid part of the ste */
+	li	r23,0
 	mfspr	r20,DAR		/* Get the new esid */
 	rldicl	r20,r20,36,28	/* Permits a full 32b of ESID */
 	rldimi	r23,r20,28,0	/* Insert the new esid value */
diff -urN ameslab-2.5/arch/ppc64/kernel/stab.c foobar2/arch/ppc64/kernel/stab.c
--- ameslab-2.5/arch/ppc64/kernel/stab.c	2004-03-04 10:17:49.350078218 +1100
+++ foobar2/arch/ppc64/kernel/stab.c	2004-03-04 12:51:58.410484312 +1100
@@ -88,6 +88,8 @@
 	for (group = 0; group < 2; group++) {
 		for (entry = 0; entry < 8; entry++, ste++) {
 			if (!(ste->dw0.dw0.v)) {
+				ste->dw0.dword0 = 0;
+				ste->dw1.dword1 = 0;
 				ste->dw1.dw1.vsid = vsid;
 				ste->dw0.dw0.esid = esid;
 				ste->dw0.dw0.kp = 1;
@@ -135,6 +137,9 @@
 
 	castout_ste->dw0.dw0.v = 0;
 	asm volatile("sync" : : : "memory");	/* Order update */
+
+	castout_ste->dw0.dword0 = 0;
+	castout_ste->dw1.dword1 = 0;
 	castout_ste->dw1.dw1.vsid = vsid;
 	old_esid = castout_ste->dw0.dw0.esid;
 	castout_ste->dw0.dw0.esid = esid;
diff -urN ameslab-2.5/include/asm-ppc64/mmu_context.h foobar2/include/asm-ppc64/mmu_context.h
--- ameslab-2.5/include/asm-ppc64/mmu_context.h	2004-02-25 15:29:53.182417225 +1100
+++ foobar2/include/asm-ppc64/mmu_context.h	2004-03-04 12:49:09.946082835 +1100
@@ -189,7 +189,7 @@
 {
 	unsigned long ordinal, vsid;
 
-	ordinal = (((ea >> 28) & 0x1fffff) * LAST_USER_CONTEXT) | (ea >> 60);
+	ordinal = (((ea >> 28) & 0x1fff) * LAST_USER_CONTEXT) | (ea >> 60);
 	vsid = (ordinal * VSID_RANDOMIZER) & VSID_MASK;
 
 	ifppcdebug(PPCDBG_HTABSTRESS) {

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From olh at suse.de Thu Mar 4 20:26:43 2004
From: olh at suse.de (Olaf Hering)
Date: Thu, 4 Mar 2004 10:26:43 +0100
Subject: crashes in clear_user_page
In-Reply-To: <20040304052451.GC5801@krispykreme>
References: <20040303153907.GC15383@suse.de> <20040304052451.GC5801@krispykreme>
Message-ID: <20040304092643.GA16646@suse.de>

On Thu, Mar 04, Anton Blanchard wrote:

>
> Hi,
>
> > I get crashes in clear_user_page() while building rpms on a p660.
> > gcc is 3.2.2, config is arch/ppc64/configs/pseries, plain ameslab tree.
> >
> > there is lot of IO, 8 processes do unpack rpms in parallel on a reiserfs
> > filesystem.
>
> It turns out you got a kernel segment with ks set. Shouldn't ever happen.
>
> We weren't zeroing the old contents of the segment table entry before
> inserting the new one, if we overwrote a user segment with a kernel one
> the ks bit would remain set. It's a POWER3/RS64 only bug.
>
> Give this patch a go. While I was there I modified our vsid calculation
> code to match reality (we only use 13 bits of the EA).

does not help, unless we can not blame reiserfs.
5:mon> d c0000000fffeb010
c0000000fffeb010 **************** **************** | |
5:mon>
papaya:~ #
cpu 3: Vector: 300 (Data Access) at [c00000012079f2d0]
    pc: c000000000087ec4 ()
    lr: c000000000087dac ()
    sp: c00000012079f550
    msr: a000000000009032
    dar: c0000000fffeb010
    dsisr: a000000
    current = 0xc0000001650bdb50
    paca = 0xc000000000532000
    pid = 27307, comm = rpm
cresprs e?sVoecrto:r :300 3(00 Da(tDaa3t:a mAcceon>s s) at [c00000012b75f390]
    pc: c000000000088494 ()
    lr: c00000000008830c ()
    sp: c00000012b75f610
    msr: a000000000001032
    dar: c0000000fff87008
    dsisr: a000000
    current = 0xc000000167cd2d30
    paca = 0xc000000000534000
    pid = 27174, comm = rpm
cpu 5: Vector: 300 (Data Access) at [c00000011f72b700]
    pc: c0000000000d88ac ()
    lr: c0000000000d8900 ()
    sp: c00000011f72b980
    msr: a000000000009032
    dar: c0000000ffd863e8
    dsisr: a000000
    current = 0xc000000174f5db50
    paca = 0xc000000000536000
    pid = 27227, comm = rpm
3:mon> E
cpu 3: Vector: 300 (Data Access) at [c00000012079f2d0]
    pc: c000000000087ec4 (.cache_grow+0x254/0x4f4)
    lr: c000000000087dac (.cache_grow+0x13c/0x4f4)
    sp: c00000012079f550
    msr: a000000000009032
    dar: c0000000fffeb010
    dsisr: a000000
    current = 0xc0000001650bdb50
    paca = 0xc000000000532000
    pid = 27307, comm = rpm
3:mon> t
c00000012079f550 c000000000087dac
c00000012079f610 c00000000008830c
c00000012079f6b0 c00000000008884c
c00000012079f730 c0000000000b1198
c00000012079f7b0 c0000000000acde8
c00000012079f860 c0000000000aded8
c00000012079f8e0 c00000000010d6e0
c00000012079fab0 c00000000010dd0c
c00000012079fcf0 c0000000000a9564
c00000012079fd90 c0000000000a96a0
c00000012079fe30 c000000000011a44 ret_from_syscall_1
exception: c00 (System Call) regs c00000012079fea0
000000800033a3fc
3:mon> T
c00000012079f550 c000000000087dac .cache_grow+0x13c/0x4f4
c00000012079f610 c00000000008830c .cache_alloc_refill+0x1a8/0x35c
c00000012079f6b0 c00000000008884c .kmem_cache_alloc+0x70/0x74
c00000012079f730 c0000000000b1198 .alloc_buffer_head+0x28/0x78
c00000012079f7b0 c0000000000acde8 .create_buffers+0x58/0x108
c00000012079f860 c0000000000aded8 .create_empty_buffers+0x24/0x15c
c00000012079f8e0 c00000000010d6e0 .reiserfs_prepare_file_region_for_write+0x984/0x9ac
c00000012079fab0 c00000000010dd0c .reiserfs_file_write+0x604/0x820
c00000012079fcf0 c0000000000a9564 .vfs_write+0x10c/0x164
c00000012079fd90 c0000000000a96a0 .sys_write+0x50/0x94
c00000012079fe30 c000000000011a44 ret_from_syscall_1
exception: c00 (System Call) regs c00000012079fea0
000000800033a3fc 0x800033a3fc
3:mon> c
cpus stopped: 3* 4 5
3:mon> c4
press ? for help
4:mon> E
cpu 4: Vector: 300 (Data Access) at [c00000012b75f390]
    pc: c000000000088494 (.cache_alloc_refill+0x330/0x35c)
    lr: c00000000008830c (.cache_alloc_refill+0x1a8/0x35c)
    sp: c00000012b75f610
    msr: a000000000001032
    dar: c0000000fff87008
    dsisr: a000000
    current = 0xc000000167cd2d30
    paca = 0xc000000000534000
    pid = 27174, comm = rpm
4:mon> T
c00000012b75f610 c00000000008830c .cache_alloc_refill+0x1a8/0x35c
c00000012b75f6b0 c00000000008884c .kmem_cache_alloc+0x70/0x74
c00000012b75f730 c0000000000b1198 .alloc_buffer_head+0x28/0x78
c00000012b75f7b0 c0000000000acde8 .create_buffers+0x58/0x108
c00000012b75f860 c0000000000aded8 .create_empty_buffers+0x24/0x15c
c00000012b75f8e0 c00000000010d6e0 .reiserfs_prepare_file_region_for_write+0x984/0x9ac
c00000012b75fab0 c00000000010dd0c .reiserfs_file_write+0x604/0x820
c00000012b75fcf0 c0000000000a9564 .vfs_write+0x10c/0x164
c00000012b75fd90 c0000000000a96a0 .sys_write+0x50/0x94
c00000012b75fe30 c000000000011a44 ret_from_syscall_1
exception: c00 (System Call) regs c00000012b75fea0
000000800033a3fc 0x800033a3fc
4:mon> c5
press ?
for help
5:mon> E
cpu 5: Vector: 300 (Data Access) at [c00000011f72b700]
    pc: c0000000000d88ac (.__mark_inode_dirty+0x148/0x1a4)
    lr: c0000000000d8900 (.__mark_inode_dirty+0x19c/0x1a4)
    sp: c00000011f72b980
    msr: a000000000009032
    dar: c0000000ffd863e8
    dsisr: a000000
    current = 0xc000000174f5db50
    paca = 0xc000000000536000
    pid = 27227, comm = rpm
5:mon> T
c00000011f72b980 c0000000000d8900 .__mark_inode_dirty+0x19c/0x1a4
c00000011f72ba10 c0000000000cfd28 .inode_update_time+0xb8/0xe4
c00000011f72bab0 c00000000010db08 .reiserfs_file_write+0x400/0x820
c00000011f72bcf0 c0000000000a9564 .vfs_write+0x10c/0x164
c00000011f72bd90 c0000000000a96a0 .sys_write+0x50/0x94
c00000011f72be30 c000000000011a44 ret_from_syscall_1
exception: c00 (System Call) regs c00000011f72bea0
000000800033a3fc 0x800033a3fc

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From olh at suse.de Thu Mar 4 23:47:59 2004
From: olh at suse.de (Olaf Hering)
Date: Thu, 4 Mar 2004 13:47:59 +0100
Subject: crashes in clear_user_page
In-Reply-To: <20040304092643.GA16646@suse.de>
References: <20040303153907.GC15383@suse.de> <20040304052451.GC5801@krispykreme> <20040304092643.GA16646@suse.de>
Message-ID: <20040304124759.GA19797@suse.de>

On Thu, Mar 04, Olaf Hering wrote:

>
> On Thu, Mar 04, Anton Blanchard wrote:
>
> >
> > Hi,
> >
> > > I get crashes in clear_user_page() while building rpms on a p660.
> > > gcc is 3.2.2, config is arch/ppc64/configs/pseries, plain ameslab tree.
> > >
> > > there is lot of IO, 8 processes do unpack rpms in parallel on a reiserfs
> > > filesystem.
> >
> > It turns out you got a kernel segment with ks set. Shouldn't ever happen.
> >
> > We weren't zeroing the old contents of the segment table entry before
> > inserting the new one, if we overwrote a user segment with a kernel one
> > the ks bit would remain set. It's a POWER3/RS64 only bug.
> >
> > Give this patch a go. While I was there I modified our vsid calculation
> > code to match reality (we only use 13 bits of the EA).
>
> does not help, unless we can not blame reiserfs.

now

> 5:mon> d c0000000fffeb010
> c0000000fffeb010 **************** **************** | |
> 5:mon>

5:mon> u
Segment table contents of cpu 5
000 c000000000000090 00006a99b4b14000
001 e000000000000090 0000a708a8242000
002 d000000000000090 000008d12e6ab000
003 00000080000000b0 0000b6c9b710a000
004 0000008000000030 0000c50de9452000
008 00000000100000b0 00008de66f10a000
024 c000000030000090 0000a12fdcb14000
032 0000000040000030 00005111b3f4b000
064 e000000080000090 00008dee68242000
120 c0000000f00000b0 00007b887cb14000
128 c000000100000090 0000386534b14000
136 c000000110000090 0000f541ecb14000
144 c000000120000090 0000b21ea4b14000
152 c000000130000090 00006efb5cb14000
160 c000000140000090 00002bd814b14000
168 c000000150000090 0000e8b4ccb14000
176 c000000160000090 0000a59184b14000
184 c000000170000090 0000626e3cb14000
248 000001fff00000b0 0000ab2cff10a000

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From anton at samba.org Fri Mar 5 02:37:34 2004
From: anton at samba.org (Anton Blanchard)
Date: Fri, 5 Mar 2004 02:37:34 +1100
Subject: crashes in clear_user_page
In-Reply-To: <20040304124759.GA19797@suse.de>
References: <20040303153907.GC15383@suse.de> <20040304052451.GC5801@krispykreme> <20040304092643.GA16646@suse.de> <20040304124759.GA19797@suse.de>
Message-ID: <20040304153734.GD5801@krispykreme>

> Segment table contents of cpu 5
> 000 c000000000000090 00006a99b4b14000
> 001 e000000000000090 0000a708a8242000
> 002 d000000000000090 000008d12e6ab000
> 003 00000080000000b0 0000b6c9b710a000
> 004 0000008000000030 0000c50de9452000
> 008 00000000100000b0 00008de66f10a000
> 024 c000000030000090 0000a12fdcb14000
> 032 0000000040000030 00005111b3f4b000
> 064 e000000080000090 00008dee68242000
> 120 c0000000f00000b0 00007b887cb14000
                    ^
ks bit set for kernel segment. This proves:

a) I am right about the cause of the oops, but also
b) I can't write kernel patches

> 128 c000000100000090 0000386534b14000
> 136 c000000110000090 0000f541ecb14000
> 144 c000000120000090 0000b21ea4b14000
> 152 c000000130000090 00006efb5cb14000
> 160 c000000140000090 00002bd814b14000
> 168 c000000150000090 0000e8b4ccb14000
> 176 c000000160000090 0000a59184b14000
> 184 c000000170000090 0000626e3cb14000
> 248 000001fff00000b0 0000ab2cff10a000

Anton

** Sent via the linuxppc64-dev mail list.
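The check Anton does by eye on the quoted dump can be sketched as a small C
helper that flags valid kernel entries with ks set. This is an illustration
only: the function names are hypothetical, ks = 0x20 is inferred from the
difference between the ...90 and ...b0 entries in the dump, and v = 0x80 is
an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STE_V   0x80ULL  /* assumed: entry valid */
#define STE_KS  0x20ULL  /* inferred: the bit that separates ...90 from ...b0 */

/* Kernel effective addresses in the dump start with 0xc, 0xd or 0xe, so the
 * top bit is enough to tell them apart from user segments. */
static bool ste_is_kernel(uint64_t dw0)
{
    return (dw0 >> 63) != 0;
}

/* True for a valid kernel entry that still carries the ks bit. */
static bool ste_kernel_with_ks(uint64_t dw0)
{
    return (dw0 & STE_V) && ste_is_kernel(dw0) && (dw0 & STE_KS);
}

/* Count suspicious entries in an array of first doublewords. */
static size_t count_bad_entries(const uint64_t *dw0, size_t n)
{
    size_t bad = 0;

    for (size_t i = 0; i < n; i++)
        if (ste_kernel_with_ks(dw0[i]))
            bad++;
    return bad;
}
```

Fed the first column of the dump, a helper like this would single out entry
120 and leave the user entries (which legitimately end in ...b0) alone.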
See http://lists.linuxppc.org/

From jean-laurent.gazelle at thalescomputers.fr Fri Mar 5 03:49:47 2004
From: jean-laurent.gazelle at thalescomputers.fr (Gazelle Jean-Laurent)
Date: Thu, 4 Mar 2004 17:49:47 +0100
Subject: kernel 2.6.3 on JS20
In-Reply-To: <1078328193.10402.9.camel@DYN279927END.austin.ibm.com>
References: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr> <1078328193.10402.9.camel@DYN279927END.austin.ibm.com>
Message-ID: <20040304174947.66eb09a6.jean-laurent.gazelle@thalescomputers.fr>

Thanks for your help,

I finally applied these 2 patches on 2.6.4-rc1 TAGGED version
(boot sequence of 2.6.3 was stopped because of NVRAM partition read issue)

Now under 2.6.4-rc1 + ide patch + fw patch , I'm almost at the end of boot sequence,
except that there's something wrong about IDE interrupts (interrupt 0x10 received and disabled at boot time).
Is it a still pending problem or is there a fix ?

AMD8111: chipset revision 3
AMD8111: 0000:00:04.1 (rev 03) UDMA133 controller
AMD8111: 100% native mode on irq 32
ide0: BM-DMA at 0x7c00-0x7c07, BIOS settings: hda:pio, hdb:pio
ide1: BM-DMA at 0x7c08-0x7c0f, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
hda: ST94011A, ATA DISK drive
Interrupt 0x10 (real) is invalid, disabling it.
ide0 at 0x7400-0x7407,0x6c02 on irq 32
Probing IDE interface ide1...
hdc: TOSHIBA MK4019GAXB, ATA DISK drive
ide1 at 0x7800-0x7807,0x7002 on irq 32
hda: max request size: 1024KiB
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt
hda: 78140160 sectors (40007 MB) w/2048KiB Cache, CHS=16383/255/63, UDMA(33)
hda: lost interrupt
hda:<4>hda: dma_timer_expiry: dma status == 0x24
hda: DMA interrupt recovery

Jean-Laurent

PS : Yes I've extracted my kernel sources using the 'bk export -rv2.6.3 ../build'
Attachment : complete log sequence.

On Wed, 03 Mar 2004 09:36:33 -0600
Jake Moilanen wrote:

R> Yeah, there are a number of FW regressions lately. This one is due to
R> FW giving us the SPLPAR hypertas set and the kernel thinking it's on a
R> SPLPAR machine. I have a patch that has a bunch of hacks to work around
R> the FW issues (there's more past this one). It's not pretty, but it
R> will get your system booted at least.
R>
R> The second patch is to work around an IDE HW issue where the IO space
R> has to be in the ISA range. FW is supposed to get a fix for this one
R> soon.
R>
R> Thanks,
R> Jake
R>
R> On Wed, 2004-03-03 at 09:23, Gazelle Jean-Laurent wrote:
R> > Hi all,
R> >
R> > I'm trying to build linux kernel 2.6 on a JS20 platform.
R> >
R> > I extracted the TAGGED version 2.6.3 of ameslab 2.5 repository and rebuilt it
R> > successfully but run into an issue at boot time (complete log is attached):
R> > [boot]0100 MM Init
R> > [boot]0100 MM Init Done
R> > register_vpa: cpu 0x0
R> > Trap instruction interrupt : Invalid Instruction
R> >
R> > Looks like 'register_vpa' hcall fails...
R> >
R> >
R> > I should have missed something. Do I need another kernel version ?
R> > Any kind of information welcome.
R> >
R> >
R> > regards,
R> > --
R> > ------------------------------------------------------------
R> > Jean-Laurent GAZELLE            150, rue M. Berthelot
R> > Phone: +33 (0)4 98 16 34 66     Z.I. Toulon Est BP244
R> > Fax : +33 (0)4 98 16 34 01      83078 TOULON Cedex9
R> > e-mail : jlg at thalescomputers.fr
R> > Thales Computers FRANCE
R> > _____________ http://www.thalescomputers.com _______________
R>

--
------------------------------------------------------------
Jean-Laurent GAZELLE            150, rue M. Berthelot
Phone: +33 (0)4 98 16 34 66     Z.I. Toulon Est BP244
Fax : +33 (0)4 98 16 34 01      83078 TOULON Cedex9
e-mail : jlg at thalescomputers.fr
Thales Computers FRANCE
_____________ http://www.thalescomputers.com _______________
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Log_264.txt.bz2
Type: application/x-bzip2
Size: 3786 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040304/a401946f/attachment.bin

From moilanen at austin.ibm.com Fri Mar 5 05:27:48 2004
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Thu, 04 Mar 2004 12:27:48 -0600
Subject: kernel 2.6.3 on JS20
In-Reply-To: <20040304174947.66eb09a6.jean-laurent.gazelle@thalescomputers.fr>
References: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr> <1078328193.10402.9.camel@DYN279927END.austin.ibm.com> <20040304174947.66eb09a6.jean-laurent.gazelle@thalescomputers.fr>
Message-ID: <1078424868.10402.102.camel@DYN279927END.austin.ibm.com>

> I finally applied these 2 patches on 2.6.4-rc1 TAGGED version
> (boot sequence of 2.6.3 was stopped because of NVRAM partition read issue)

What was the NVRAM partition read issue??

> Now under 2.6.4-rc1 + ide patch + fw patch , I'm almost at the end of boot sequence,
> except that there's something wrong about IDE interrupts (interrupt 0x10 received and disabled at boot time).
> Is it a still pending problem or is there a fix ?
>

I just ran into this one after pulling this morning. This looks like a
result of IDE probing for devices generating an interrupt as a side
effect, but not having called request_irq(). request_irq will put the
real-to-virtual mapping in the radix tree. Since that has not occurred
xics_get_irq() will not know about the early interrupt and will disable
it. This will cause all future interrupts to be missed for IDE. I've
attached a patch to fix the problem by going down the "slow path" in
finding the real-to-virtual mappings of the irq when the radix tree
takes a miss.

Thanks,
Jake
-------------- next part --------------
A non-text attachment was scrubbed...
Name: linux-2.6-get-irq-fix-1.patch
Type: text/x-patch
Size: 3121 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040304/4a0aebe0/attachment.bin

From moilanen at austin.ibm.com Fri Mar 5 09:01:05 2004
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Thu, 04 Mar 2004 16:01:05 -0600
Subject: [PATCH] pci_dev to device_node fix
Message-ID: <1078437665.10402.120.camel@DYN279927END.austin.ibm.com>

On the JS20's there was a recent FW change where it moved one of the
buses to busno 0. There is also a bridge w/ a devfn of 0 on that bus.
While probing behind that bridge we try looking up the
device_node using the busno and devfn to read config space. When we
call pci_device_to_OF_node() we'll pass in the devfn and busno only to
compare with to find the matching device_node. Instead of getting the
bridge's device_node, we get the PHB's device_node, because there is an
assumption when setting up the PHB in update_dn_pci_info() that the
PHB's devfn is 0. This will cause us to recursively go down the first
bridge and we actually die when trying to make duplicate entries in
sysfs.

This patch adds "is_phb" property to the device_node. Since we are
trying to find a device_node from a pci_dev, we shouldn't worry about
PHB's since PHB's are stored in pci_controllers. This change will allow
us to correctly get the device_node.

Thanks,
Jake
-------------- next part --------------
A non-text attachment was scrubbed...
Name: linux-2.6-pci-dev-to-ofnode-fix-1.patch
Type: text/x-patch
Size: 2787 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040304/94a7167f/attachment.bin

From benh at kernel.crashing.org Fri Mar 5 10:54:25 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 05 Mar 2004 10:54:25 +1100
Subject: kernel 2.6.3 on JS20
In-Reply-To: <1078424868.10402.102.camel@DYN279927END.austin.ibm.com>
References: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr> <1078328193.10402.9.camel@DYN279927END.austin.ibm.com> <20040304174947.66eb09a6.jean-laurent.gazelle@thalescomputers.fr> <1078424868.10402.102.camel@DYN279927END.austin.ibm.com>
Message-ID: <1078444465.5704.29.camel@gaston>

> I just ran into this one after pulling this morning. This looks like a
> result of IDE probing for devices generating an interrupt as a side
> effect, but not having called request_irq(). request_irq will put the
> real-to-virtual mapping in the radix tree. Since that has not occurred
> xics_get_irq() will not know about the early interrupt and will disable
> it. This will cause all future interrupts to be missed for IDE. I've
> attached a patch to fix the problem by going down the "slow path" in
> finding the real-to-virtual mappings of the irq when the radix tree
> takes a miss.

The interrupt is shared with something else ? If not it should have
been disabled on the controller in the first place...

Ben.

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From benh at kernel.crashing.org Fri Mar 5 10:56:35 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 05 Mar 2004 10:56:35 +1100
Subject: [PATCH] pci_dev to device_node fix
In-Reply-To: <1078437665.10402.120.camel@DYN279927END.austin.ibm.com>
References: <1078437665.10402.120.camel@DYN279927END.austin.ibm.com>
Message-ID: <1078444595.5703.31.camel@gaston>

On Fri, 2004-03-05 at 09:01, Jake Moilanen wrote:
> On the JS20's there was a recent FW change where it moved one of the
> buses to busno 0. There is also a bridge w/ a devfn of 0 on that bus.
> While probing behind that bridge we try looking up the
> device_node using the busno and devfn to read config space. When we
> call pci_device_to_OF_node() we'll pass in the devfn and busno only to
> compare with to find the matching device_node. Instead of getting the
> bridge's device_node, we get the PHB's device_node, because there is an
> assumption when setting up the PHB in update_dn_pci_info() that the
> PHB's devfn is 0. This will cause us to recursively go down the first
> bridge and we actually die when trying to make duplicate entries in
> sysfs.
>
> This patch adds "is_phb" property to the device_node. Since we are
> trying to find a device_node from a pci_dev, we shouldn't worry about
> PHB's since PHB's are stored in pci_controllers. This change will allow
> us to correctly get the device_node.

Hi Jake !

I don't like the workaround, this isn't 2.4, we are trying to do
things cleanly in 2.6 ;)

Why can't we fix the assumption that devfn 0 == phb in the first
place ? That is, what piece of code relies on it and what would it
cost to fix it ?

Ben

** Sent via the linuxppc64-dev mail list.
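For readers following the thread, the ambiguity under discussion can be
sketched in a few lines of C. This is not the attached patch (which was
scrubbed from the archive): the struct, the sample table and find_node() are
hypothetical, and only the "is_phb" name comes from the quoted description.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a device tree node. */
struct of_node_sketch {
    int busno;
    int devfn;
    int is_phb;        /* 1 for a host bridge, which has no real devfn */
    const char *name;
};

/* A PHB recorded with devfn 0 and a real bridge at bus 0, devfn 0. */
static const struct of_node_sketch sample_nodes[] = {
    { 0, 0, 1, "phb" },
    { 0, 0, 0, "bridge" },
    { 0, 8, 0, "ide" },
};

/* Return the first node matching (busno, devfn). With skip_phb set, host
 * bridges are ignored, so a device at devfn 0 is no longer shadowed. */
static const struct of_node_sketch *
find_node(const struct of_node_sketch *nodes, size_t n,
          int busno, int devfn, int skip_phb)
{
    for (size_t i = 0; i < n; i++) {
        if (skip_phb && nodes[i].is_phb)
            continue;
        if (nodes[i].busno == busno && nodes[i].devfn == devfn)
            return &nodes[i];
    }
    return NULL;
}
```

Without the flag, a lookup for bus 0, devfn 0 returns the PHB entry; with
it, the same lookup returns the bridge. Whether a flag or removing the
devfn 0 assumption is the cleaner fix is exactly what the thread debates.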
See http://lists.linuxppc.org/

From paulus at samba.org Fri Mar 5 16:14:15 2004
From: paulus at samba.org (Paul Mackerras)
Date: Fri, 5 Mar 2004 16:14:15 +1100
Subject: kernel 2.6.3 on JS20
In-Reply-To: <1078424868.10402.102.camel@DYN279927END.austin.ibm.com>
References: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr> <1078328193.10402.9.camel@DYN279927END.austin.ibm.com> <20040304174947.66eb09a6.jean-laurent.gazelle@thalescomputers.fr> <1078424868.10402.102.camel@DYN279927END.austin.ibm.com>
Message-ID: <16456.3239.326222.506814@cargo.ozlabs.ibm.com>

Jake Moilanen writes:

> I just ran into this one after pulling this morning. This looks like a
> result of IDE probing for devices generating an interrupt as a side
> effect, but not having called request_irq(). request_irq will put the
> real-to-virtual mapping in the radix tree. Since that has not occurred
> xics_get_irq() will not know about the early interrupt and will disable
> it. This will cause all future interrupts to be missed for IDE. I've
> attached a patch to fix the problem by going down the "slow path" in
> finding the real-to-virtual mappings of the irq when the radix tree
> takes a miss.

OK, I see, the thing I missed previously is that we need to set the
IRQ_DISABLED bit in irq_desc[virq].status for the virtual irq
corresponding to the real irq we got. Your patch looks OK except for
a couple of very minor points. I would call the real->virt mapping
function in irq.c real_irq_to_virt_slow() and leave the one in xics.c
as real_irq_to_virt(). And I would structure the test as:

	} else {
		irq = real_irq_to_virt(vec);
		if (irq == NO_IRQ)
			irq = real_irq_to_virt_slow(vec);
		if (irq == NO_IRQ) {
			printk(...)

etc., and save a level of indentation that way.

Regards,
Paul.

** Sent via the linuxppc64-dev mail list.
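The fast-path-then-slow-path structure suggested above can be tried out in a
self-contained C sketch. The names real_irq_to_virt(),
real_irq_to_virt_slow() and NO_IRQ are taken from the message; everything
else (the tables, fast_cache_add(), lookup_virq(), the value of NO_IRQ) is
hypothetical and only models the radix tree and the full mapping table.

```c
#include <stdio.h>

#define NO_IRQ  (-1)   /* assumed sentinel value */
#define NR_VIRQ 8      /* tiny table for illustration */

/* Hypothetical full table: virt_to_real[virq] holds the real irq number. */
static int virt_to_real[NR_VIRQ] = { -1, -1, -1, 0x10, -1, 0x23, -1, -1 };

/* Hypothetical fast cache standing in for the radix tree; entries are only
 * added when a driver registers its handler. */
static int fast_real[NR_VIRQ];
static int fast_virt[NR_VIRQ];
static int fast_len;

static void fast_cache_add(int real, int virq)
{
    if (fast_len < NR_VIRQ) {
        fast_real[fast_len] = real;
        fast_virt[fast_len] = virq;
        fast_len++;
    }
}

/* Fast path: only knows about interrupts that were registered. */
static int real_irq_to_virt(int real)
{
    for (int i = 0; i < fast_len; i++)
        if (fast_real[i] == real)
            return fast_virt[i];
    return NO_IRQ;
}

/* Slow path: linear scan of the full virtual-to-real table. */
static int real_irq_to_virt_slow(int real)
{
    if (real < 0)
        return NO_IRQ;
    for (int virq = 0; virq < NR_VIRQ; virq++)
        if (virt_to_real[virq] == real)
            return virq;
    return NO_IRQ;
}

/* Try the fast path, fall back to the slow one, complain only if both miss. */
static int lookup_virq(int vec)
{
    int irq = real_irq_to_virt(vec);

    if (irq == NO_IRQ)
        irq = real_irq_to_virt_slow(vec);
    if (irq == NO_IRQ)
        fprintf(stderr, "Interrupt 0x%x (real) is invalid\n", vec);
    return irq;
}
```

In this model an early interrupt on real irq 0x10 misses the fast cache but
is still resolved by the slow scan, which is the situation the thread
describes for the IDE probe.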
See http://lists.linuxppc.org/

From jean-laurent.gazelle at thalescomputers.fr Sat Mar 6 00:30:35 2004
From: jean-laurent.gazelle at thalescomputers.fr (Gazelle Jean-Laurent)
Date: Fri, 5 Mar 2004 14:30:35 +0100
Subject: kernel 2.6.3 on JS20
In-Reply-To: <1078424868.10402.102.camel@DYN279927END.austin.ibm.com>
References: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr> <1078328193.10402.9.camel@DYN279927END.austin.ibm.com> <20040304174947.66eb09a6.jean-laurent.gazelle@thalescomputers.fr> <1078424868.10402.102.camel@DYN279927END.austin.ibm.com>
Message-ID: <20040305143035.35459b52.jean-laurent.gazelle@thalescomputers.fr>

R> What was the NVRAM partition read issue??

Attached the 2.6.3 kernel boot log on JS20 :
NVRAM failure, and the boot stops after 'ikconfig' output.

nvram_scan_partitions: Error parsing nvram partitions
nvram_init: Failed nvram_scan_partitions
Total HugeTLB memory allocated, 0
RTAS daemon started
ikconfig 0.7 with /proc/config*

R> > Now under 2.6.4-rc1 + ide patch + fw patch , I'm almost at the end of boot sequence,
R> > except that there's something wrong about IDE interrupts (interrupt 0x10 received and disabled at boot time).
R> > Is it a still pending problem or is there a fix ?
R> >

I have a problem with this fix. I don't have the body of
'irq_offset_up' function, you've used in your patch.

regards,
Jean-Laurent
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log_263.txt
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040305/50033a41/attachment.txt

From moilanen at austin.ibm.com Sat Mar 6 00:48:20 2004
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Fri, 05 Mar 2004 07:48:20 -0600
Subject: [PATCH] pci_dev to device_node fix
In-Reply-To: <1078444595.5703.31.camel@gaston>
References: <1078437665.10402.120.camel@DYN279927END.austin.ibm.com> <1078444595.5703.31.camel@gaston>
Message-ID: <1078494500.10402.131.camel@DYN279927END.austin.ibm.com>

> I don't like the workaround, this isn't 2.4, we are trying to do
> things cleanly in 2.6 ;)
>
> Why can't we fix the assumption that devfn 0 == phb in the first
> place ? That is, what piece of code relies on it and what would it
> cost to fix it ?

The problem is, what do you set it to. A PHB doesn't have a devfn.
You can't set it to -1, because there are sections of the code that
will mask the devfn w/ 0xff and cause the same problem if there is a
bridge at devfn 0xff on a particular bus. The only clean way is to
add an extra field to note that this device_node is a PHB. That way
it can ignore the devfn.

Jake

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From moilanen at austin.ibm.com Sat Mar 6 02:05:33 2004
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Fri, 05 Mar 2004 09:05:33 -0600
Subject: kernel 2.6.3 on JS20
In-Reply-To: <1078444465.5704.29.camel@gaston>
References: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr> <1078328193.10402.9.camel@DYN279927END.austin.ibm.com> <20040304174947.66eb09a6.jean-laurent.gazelle@thalescomputers.fr> <1078424868.10402.102.camel@DYN279927END.austin.ibm.com> <1078444465.5704.29.camel@gaston>
Message-ID: <1078499133.10402.175.camel@DYN279927END.austin.ibm.com>

> The interrupt is shared with something else ? If not it should have
> been disabled on the controller in the first place...
What the IDE sequence is:

disable_irq()
probe_for_drive()	<- Generates an interrupt
enable_irq()
<< we get interrupted here >>
request_irq()

It appears that the interrupt is queued and not dropped on the floor.
It is my understanding that it is hardware dependent if it gets queued
or dropped when ibm,int-off rtas call is made.

Thanks,
Jake

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From moilanen at austin.ibm.com Sat Mar 6 02:09:38 2004
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Fri, 05 Mar 2004 09:09:38 -0600
Subject: kernel 2.6.3 on JS20
In-Reply-To: <20040305143035.35459b52.jean-laurent.gazelle@thalescomputers.fr>
References: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr> <1078328193.10402.9.camel@DYN279927END.austin.ibm.com> <20040304174947.66eb09a6.jean-laurent.gazelle@thalescomputers.fr> <1078424868.10402.102.camel@DYN279927END.austin.ibm.com> <20040305143035.35459b52.jean-laurent.gazelle@thalescomputers.fr>
Message-ID: <1078499378.10402.180.camel@DYN279927END.austin.ibm.com>

On Fri, 2004-03-05 at 07:30, Gazelle Jean-Laurent wrote:
> R> What was the NVRAM partition read issue??
> Attached the 2.6.3 kernel boot log on JS20 :
> NVRAM failure, and the boot stops after 'ikconfig' output.

I'm not sure why you are stopping after ikconfig. But for the nvram
partition read issue make sure you have the patch from Olaf to fix this
problem (posted below).

> I have a problem with this fix. I don't have the body of
> 'irq_offset_up' function, you've used in your patch.

What do you mean by this?
Thanks,
Jake

--- /tmp/linuxppc64-2.5/arch/ppc64/kernel/pSeries_nvram.c	2004-02-12 03:47:53.000000000 +0000
+++ ./arch/ppc64/kernel/pSeries_nvram.c	2004-02-16 20:45:12.000000000 +0000
@@ -29,7 +29,7 @@
 #include
 
 static unsigned int nvram_size;
-static unsigned int nvram_fetch, nvram_store;
+static int nvram_fetch, nvram_store;
 static char nvram_buf[NVRW_CNT];	/* assume this is in the first 4GB */
 static spinlock_t nvram_lock = SPIN_LOCK_UNLOCKED;
 
@@ -41,7 +41,7 @@ static ssize_t pSeries_nvram_read(char *
 	unsigned long flags;
 	char *p = buf;
 
-	if (nvram_size == 0 || nvram_fetch)
+	if (nvram_size == 0 || nvram_fetch == RTAS_UNKNOWN_SERVICE)
 		return -ENODEV;
 
 	if (*index >= nvram_size)
@@ -83,7 +83,7 @@ static ssize_t pSeries_nvram_write(char
 	unsigned long flags;
 	const char *p = buf;
 
-	if (nvram_size == 0 || nvram_store)
+	if (nvram_size == 0 || nvram_store == RTAS_UNKNOWN_SERVICE)
 		return -ENODEV;
 
 	if (*index >= nvram_size)

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From moilanen at austin.ibm.com Sat Mar 6 02:16:14 2004
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Fri, 05 Mar 2004 09:16:14 -0600
Subject: kernel 2.6.3 on JS20
In-Reply-To: <16456.3239.326222.506814@cargo.ozlabs.ibm.com>
References: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr> <1078328193.10402.9.camel@DYN279927END.austin.ibm.com> <20040304174947.66eb09a6.jean-laurent.gazelle@thalescomputers.fr> <1078424868.10402.102.camel@DYN279927END.austin.ibm.com> <16456.3239.326222.506814@cargo.ozlabs.ibm.com>
Message-ID: <1078499774.10402.188.camel@DYN279927END.austin.ibm.com>

> OK, I see, the thing I missed previously is that we need to set the
> IRQ_DISABLED bit in irq_desc[virq].status for the virtual irq
> corresponding to the real irq we got. Your patch looks OK except for
> a couple of very minor points. I would call the real->virt mapping
> function in irq.c real_irq_to_virt_slow() and leave the one in xics.c
> as real_irq_to_virt(). And I would structure the test as:
>
>	} else {
>		irq = real_irq_to_virt(vec);
>		if (irq == NO_IRQ)
>			irq = real_irq_to_virt_slow(vec);
>		if (irq == NO_IRQ) {
>			printk(...)

Another problem I do see is when you do disable the IRQ. Don't you
need to EOI the interrupt? Otherwise that processor's CPPR is going to
be stuck w/ the phantom interrupt's priority. (Of course this would
currently be fixed due to nested interrupts not being handled correctly,
and if a more favored interrupt came in, it would write the CPPR back
down to base)

Thanks,
Jake

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From jean-laurent.gazelle at thalescomputers.fr Sat Mar 6 02:17:20 2004
From: jean-laurent.gazelle at thalescomputers.fr (Gazelle Jean-Laurent)
Date: Fri, 5 Mar 2004 16:17:20 +0100
Subject: kernel 2.6.3 on JS20
In-Reply-To: <1078499378.10402.180.camel@DYN279927END.austin.ibm.com>
References: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr> <1078328193.10402.9.camel@DYN279927END.austin.ibm.com> <20040304174947.66eb09a6.jean-laurent.gazelle@thalescomputers.fr> <1078424868.10402.102.camel@DYN279927END.austin.ibm.com> <20040305143035.35459b52.jean-laurent.gazelle@thalescomputers.fr> <1078499378.10402.180.camel@DYN279927END.austin.ibm.com>
Message-ID: <20040305161720.02d8d5a6.jean-laurent.gazelle@thalescomputers.fr>

R> > I have a problem with this fix. I don't have the body of
R> > 'irq_offset_up' function, you've used in your patch.
R>
R> What do you mean by this?
R>

After update with 'linux-2.6-get-irq-fix-1.patch', the kernel compilation fails :

arch/ppc64/kernel/built-in.o(.text+0x24a70): In function `.xics_get_irq':
: undefined reference to `.irq_offset_up'

I can't find the 'irq_offset_up' function definition ?

thanks,
Jean-Laurent

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From moilanen at austin.ibm.com Sat Mar 6 02:34:00 2004
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Fri, 05 Mar 2004 09:34:00 -0600
Subject: kernel 2.6.3 on JS20
In-Reply-To: <20040305161720.02d8d5a6.jean-laurent.gazelle@thalescomputers.fr>
References: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr> <1078328193.10402.9.camel@DYN279927END.austin.ibm.com> <20040304174947.66eb09a6.jean-laurent.gazelle@thalescomputers.fr> <1078424868.10402.102.camel@DYN279927END.austin.ibm.com> <20040305143035.35459b52.jean-laurent.gazelle@thalescomputers.fr> <1078499378.10402.180.camel@DYN279927END.austin.ibm.com> <20040305161720.02d8d5a6.jean-laurent.gazelle@thalescomputers.fr>
Message-ID: <1078500840.10402.191.camel@DYN279927END.austin.ibm.com>

> After update with 'linux-2.6-get-irq-fix-1.patch', the kernel compilation fails :
>
> arch/ppc64/kernel/built-in.o(.text+0x24a70): In function `.xics_get_irq':
> : undefined reference to `.irq_offset_up'
>
> I can't find the 'irq_offset_up' function definition ?

I'm not sure why you are seeing this, irq_offset_up is used multiple
times in xics.c. I would 'make distclean' and try building again to be
safe.

Jake

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From anton at samba.org Sat Mar 6 02:38:06 2004
From: anton at samba.org (Anton Blanchard)
Date: Sat, 6 Mar 2004 02:38:06 +1100
Subject: [PATCH] Re: KDB in ameslab
In-Reply-To: <20040303143023.A23924@forte.austin.ibm.com>
References: <20040217043527.GC25491@krispykreme> <20040302195946.J74832@forte.austin.ibm.com> <20040303143023.A23924@forte.austin.ibm.com>
Message-ID: <20040305153806.GI5801@krispykreme>

> This patch brings the ameslab ppc64 tree up to the current
> level of KDB (v 4.3 hot from the sgi ftp site as of two days ago)
> Note that since ameslab currently has kdb 4.1(?) in it, that
> this patch has the effect of undoing the old kdb as it adds the
> new kdb.

Applied, but I couldnt find kdb/modules/kdbm_task.o. Can you send it on?

BTW I find dirdiff or akpm patchscripts works better than BK for this
sort of thing.

Anton

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From anton at samba.org Sat Mar 6 02:46:47 2004
From: anton at samba.org (Anton Blanchard)
Date: Sat, 6 Mar 2004 02:46:47 +1100
Subject: kernel 2.6.3 on JS20
In-Reply-To: <1078500840.10402.191.camel@DYN279927END.austin.ibm.com>
References: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr>
	<1078328193.10402.9.camel@DYN279927END.austin.ibm.com>
	<20040304174947.66eb09a6.jean-laurent.gazelle@thalescomputers.fr>
	<1078424868.10402.102.camel@DYN279927END.austin.ibm.com>
	<20040305143035.35459b52.jean-laurent.gazelle@thalescomputers.fr>
	<1078499378.10402.180.camel@DYN279927END.austin.ibm.com>
	<20040305161720.02d8d5a6.jean-laurent.gazelle@thalescomputers.fr>
	<1078500840.10402.191.camel@DYN279927END.austin.ibm.com>
Message-ID: <20040305154647.GJ5801@krispykreme>

> I'm not sure why you are seeing this, irq_offset_up is used multiple
> times in xics.c. I would 'make distclean' and try building again to be
> safe.

Note its in ameslab only, not in Linus' tree.

Anton

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From jean-laurent.gazelle at thalescomputers.fr Sat Mar 6 03:36:15 2004
From: jean-laurent.gazelle at thalescomputers.fr (Gazelle Jean-Laurent)
Date: Fri, 5 Mar 2004 17:36:15 +0100
Subject: kernel 2.6.3 on JS20
In-Reply-To: <20040305154647.GJ5801@krispykreme>
References: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr>
	<1078328193.10402.9.camel@DYN279927END.austin.ibm.com>
	<20040304174947.66eb09a6.jean-laurent.gazelle@thalescomputers.fr>
	<1078424868.10402.102.camel@DYN279927END.austin.ibm.com>
	<20040305143035.35459b52.jean-laurent.gazelle@thalescomputers.fr>
	<1078499378.10402.180.camel@DYN279927END.austin.ibm.com>
	<20040305161720.02d8d5a6.jean-laurent.gazelle@thalescomputers.fr>
	<1078500840.10402.191.camel@DYN279927END.austin.ibm.com>
	<20040305154647.GJ5801@krispykreme>
Message-ID: <20040305173615.72bf4689.jean-laurent.gazelle@thalescomputers.fr>

You're right, I've extracted the sources from Linus' tagged version, not
from ameslab branch.

thanks,
Jean-Laurent

On Sat, 6 Mar 2004 02:46:47 +1100
Anton Blanchard wrote:

R>
R>
R> > I'm not sure why you are seeing this, irq_offset_up is used multiple
R> > times in xics.c. I would 'make distclean' and try building again to be
R> > safe.
R>
R> Note its in ameslab only, not in Linus' tree.
R>
R> Anton
R>

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From linas at austin.ibm.com Sat Mar 6 05:30:40 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Fri, 5 Mar 2004 12:30:40 -0600
Subject: crashes in clear_user_page
In-Reply-To: <20040304153734.GD5801@krispykreme>; from anton@samba.org on Fri, Mar 05, 2004 at 02:37:34AM +1100
References: <20040303153907.GC15383@suse.de>
	<20040304052451.GC5801@krispykreme>
	<20040304092643.GA16646@suse.de>
	<20040304124759.GA19797@suse.de>
	<20040304153734.GD5801@krispykreme>
Message-ID: <20040305123040.R74832@forte.austin.ibm.com>

On Fri, Mar 05, 2004 at 02:37:34AM +1100, Anton Blanchard wrote:
>
> > Segment table contents of cpu 5
> > 000 c000000000000090 00006a99b4b14000
> > 001 e000000000000090 0000a708a8242000
> > 002 d000000000000090 000008d12e6ab000
> > 003 00000080000000b0 0000b6c9b710a000
> > 004 0000008000000030 0000c50de9452000
> > 008 00000000100000b0 00008de66f10a000
> > 024 c000000030000090 0000a12fdcb14000
> > 032 0000000040000030 00005111b3f4b000
> > 064 e000000080000090 00008dee68242000
> > 120 c0000000f00000b0 00007b887cb14000
>                      ^
> ks bit set for kernel segment. This proves:
>
> a) I am right about the cause of the oops, but also

I hit this too on my build machine

> b) I cant write kernel patches

So, ahh, will you be providing a patch? Should I volunteer to help?

--linas

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From igor at cs.wisc.edu Sat Mar 6 06:30:09 2004
From: igor at cs.wisc.edu (Igor Grobman)
Date: Fri, 5 Mar 2004 13:30:09 -0600 (CST)
Subject: Allocating kernel memory within a specific range.
Message-ID:

Hello ppc64 hackers.

I have a question, which might seem a bit weird at first, but bear with
me. I would like to get at kernel memory that lies within 32MB of kernel
code (i.e. within the range of the unconditional branch instruction). In
other words, I would like to get at memory between the end of kernel code
and address 0xc000000001FFFFFF.
As far as I can tell, this memory is
available to the memory allocator, but whenever I call kmalloc() I tend to
get addresses that are 180MB away or more. Is there a way to massage
kmalloc() or some other function into giving me what I want? I am not
asking for an officially published interface, just something that I could
get working in my kernel module.

An alternative would be to get memory in the
0xBFFFFFFFFE000000-0xBFFFFFFFFFFFFFFF range. From what I understand, this
is close to impossible, since the upper 21 bits are not considered in
addressing (except the high nibble).

In case you are curious why I need this. I am working on a port of
KernInst, a tool that allows dynamically splicing code into a running
kernel, thus allowing dynamic kernel modification. I need to be able to
overwrite a single instruction atomically with a jump to the modified
code. The 32MB requirement is due to range of the branch
instruction.

Thanks for any and all ideas.
-Igor

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From johnrose at austin.ibm.com Sat Mar 6 07:30:53 2004
From: johnrose at austin.ibm.com (John Rose)
Date: Fri, 05 Mar 2004 14:30:53 -0600
Subject: [PATCH] rpaphp/rpadlpar latest (support for vio and multifunction devices )
In-Reply-To: <4048B173.6060009@ltcfwd.linux.ibm.com>
References: <403FCD0A.7050109@ltcfwd.linux.ibm.com>
	<4048B173.6060009@ltcfwd.linux.ibm.com>
Message-ID: <1078518653.8681.25.camel@verve.austin.ibm.com>

Hi Linda, Greg-

From what I can tell, this doesn't build. The new RPA files call
rtas_set_power_level(), which doesn't exist in the tree and isn't created
by the patch.

John

On Fri, 2004-03-05 at 10:57, Linda Xie wrote:
> Hi Greg,
>
> The attached patch was created against
> //kernel.bkbits.net/gregkh/linux/pci-2.6:
>
> ChangeSet at 1.1627, 2004-03-05 09:43:24-06:00, lxie at threadlp13.austin.ibm.com
> [PATCH] rpaphp/rpadlpar:
> add support for VIO devices
> add support for multifunction cards
> code restructure
> Lindent cleanups
>
> If there are no objections, please apply.
>
> Thanks,
>
> Linda
>
>
> The following test scenarios have been tested with the latest rpa code
> and user-land tools:
>
> a) drslot_chrp_slot command line stress tests:
> ===============================
> dlpar_test_empty: remove-then-add back an empty slot 10 times in a
> loop. (PASSED)
> dlpar_test_e100: remove-then-add back a non-empty slot (has an e100
> adapter) 10 times in a loop. (PASSED)
> dlpar_test_4port : remove-then add back a non-empty slot (has a 4-port
> PCI card) 10 times in a loop. (PASSED)
>
>
> b). HMC dlpar I/O slots tests:
> =====================
>
> case-1: boot w/o empty slot 2-1-5:
> then HMC: ADD -> RM -> ADD ->RM -> ADD -> RM -> ADD (PASSED)
>
> case-2: boot w/o e100 ethernet slot 1-2-4:
> then HMC: ADD -> RM -> ADD ->RM -> ADD -> RM -> ADD (PASSED)
>
> case-3: boot w/o 4-port ethernet card slot 1-2-5:
> then HMC: ADD -> RM -> ADD ->RM -> ADD -> RM -> ADD (PASSED)
> then ping another partition (bigboylp4) (PASSED)
>
> case-4: boot with slot 1-2-4(has an e100 adapter):
> echo: 0-1-0-1-0-1-0-1-0-1 (PASSED)
>
> case-5: boot with slot 1-2-5(has a 4-port PCI card):
> echo: 0-1-0-1-0-1-0-1-0-1 (PASSED)
>
>
> c): PCI Hotplug e100 adapter and 4port pcnet32 card
> =====================================
> boot w/o -- >HOT insert --> ping --> HOT remove (ping stopped) ---> HOT
> insert --> ping resumed
>
> d): DLPAR VIO(command line):
> =======================
> ifup v-lan --> ping --> remove( ping stopped) -->add back(ping resumed)
> 10 times in a loop.
>

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From olh at suse.de Sat Mar 6 08:21:01 2004
From: olh at suse.de (Olaf Hering)
Date: Fri, 5 Mar 2004 22:21:01 +0100
Subject: crashes in clear_user_page
In-Reply-To: <20040305123040.R74832@forte.austin.ibm.com>
References: <20040303153907.GC15383@suse.de>
	<20040304052451.GC5801@krispykreme>
	<20040304092643.GA16646@suse.de>
	<20040304124759.GA19797@suse.de>
	<20040304153734.GD5801@krispykreme>
	<20040305123040.R74832@forte.austin.ibm.com>
Message-ID: <20040305212101.GA23467@suse.de>

On Fri, Mar 05, linas at austin.ibm.com wrote:

>
> On Fri, Mar 05, 2004 at 02:37:34AM +1100, Anton Blanchard wrote:
> >
> > > Segment table contents of cpu 5
> > > 000 c000000000000090 00006a99b4b14000
> > > 001 e000000000000090 0000a708a8242000
> > > 002 d000000000000090 000008d12e6ab000
> > > 003 00000080000000b0 0000b6c9b710a000
> > > 004 0000008000000030 0000c50de9452000
> > > 008 00000000100000b0 00008de66f10a000
> > > 024 c000000030000090 0000a12fdcb14000
> > > 032 0000000040000030 00005111b3f4b000
> > > 064 e000000080000090 00008dee68242000
> > > 120 c0000000f00000b0 00007b887cb14000
> >                      ^
> > ks bit set for kernel segment. This proves:
> >
> > a) I am right about the cause of the oops, but also
>
> I hit this too on my build machine
>
> > b) I cant write kernel patches
>
> So, ahh, will you be providing a patch? Should I volunteer to help?

I forgot to run lilo to update the bootfile ;)
everything seems to be fine now.

22:20:07 up 43 min, 2 users, load average: 12.72, 14.11, 14.31

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From benh at kernel.crashing.org Sat Mar 6 10:37:34 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 06 Mar 2004 10:37:34 +1100
Subject: [PATCH] pci_dev to device_node fix
In-Reply-To: <1078494500.10402.131.camel@DYN279927END.austin.ibm.com>
References: <1078437665.10402.120.camel@DYN279927END.austin.ibm.com>
	<1078444595.5703.31.camel@gaston>
	<1078494500.10402.131.camel@DYN279927END.austin.ibm.com>
Message-ID: <1078529854.5704.106.camel@gaston>

> The problem is, what do you set it to. A PHB doesn't have a devfn. You
> can't set it to -1, because there are sections of the code that will
> mask the devfn w/ 0xff and cause the same problem if there is a bridge
> is at devfn 0xff on a particular bus. The only clean way is to add an
> extra field in to note that this device_node is a PHB. That way it can
> ignore the devfn.

And how do you deal with device nodes that aren't PCI ?

Ben.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From benh at kernel.crashing.org Sat Mar 6 10:39:53 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 06 Mar 2004 10:39:53 +1100
Subject: kernel 2.6.3 on JS20
In-Reply-To: <1078499133.10402.175.camel@DYN279927END.austin.ibm.com>
References: <20040303162344.47794c93.jean-laurent.gazelle@thalescomputers.fr>
	<1078328193.10402.9.camel@DYN279927END.austin.ibm.com>
	<20040304174947.66eb09a6.jean-laurent.gazelle@thalescomputers.fr>
	<1078424868.10402.102.camel@DYN279927END.austin.ibm.com>
	<1078444465.5704.29.camel@gaston>
	<1078499133.10402.175.camel@DYN279927END.austin.ibm.com>
Message-ID: <1078529993.6327.108.camel@gaston>

On Sat, 2004-03-06 at 02:05, Jake Moilanen wrote:
> > The interrupt is shared with something else ? If not it should have
> > been disabled on the controller in the first place...
>
> What the IDE sequence is:
>
> disable_irq()
> probe_for_drive() <- Generates an interrupt
> enable_irq()
> << we get interrupted here >>
> request_irq()
>
> It appears that the interrupt is queued and not dropped on the floor.
> It is my understanding that it is hardware dependent if it gets queued
> or dropped when ibm,int-off rtas call is made.

The enable_irq should do nothing if there is no handler attached. I know
it's all a bit hairy, the IDE code is definitely asking for trouble, but
that's the way it should work.

Ben.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From benh at kernel.crashing.org Sat Mar 6 10:59:05 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 06 Mar 2004 10:59:05 +1100
Subject: Allocating kernel memory within a specific range.
In-Reply-To:
References:
Message-ID: <1078531145.5700.130.camel@gaston>

On Sat, 2004-03-06 at 06:30, Igor Grobman wrote:
> Hello ppc64 hackers.
>
> I have a question, which might seem a bit weird at first, but bear with
> me. I would like to get at kernel memory that lies within 32MB of kernel
> code (i.e. within the range of the unconditional branch instruction). In
> other words, I would like to get at memory between the end of kernel code
> and address 0xc000000001FFFFFF. As far as I can tell, this memory is
> available to the memory allocator, but whenever I call kmalloc() I tend to
> get addresses that are 180MB away or more. Is there a way to massage
> kmalloc() or some other function into giving me what I want? I am not
> asking for an officially published interface, just something that I could
> get working in my kernel module.

Nothing that works after kernel mm is initialized. You can reserve
memory early during boot though, using the lmb_ mechanism, which provides
some alignment & limits for allocations. (in prom.c). But that's a bit
scary, don't think about using that for more than in-house experiments.

> An alternative would be to get memory in the
> 0xBFFFFFFFFE000000-0xBFFFFFFFFFFFFFFF range. From what I understand, this
> is close to impossible, since the upper 21 bits are not considered in
> addressing (except the high nibble).
>
> In case you are curious why I need this. I am working on a port of
> KernInst, a tool that allows dynamically splicing code into a running
> kernel, thus allowing dynamic kernel modification. I need to be able to
> overwrite a single instruction atomically with a jump to the modified
> code. The 32MB requirement is due to range of the branch
> instruction.

Another possibility is to play MMU mapping tricks and map the last page
of the virtual address space & use absolute branches.

> Thanks for any and all ideas.
> -Igor
>
--
Benjamin Herrenschmidt

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From igor at cs.wisc.edu Sat Mar 6 11:27:46 2004
From: igor at cs.wisc.edu (Igor Grobman)
Date: Fri, 5 Mar 2004 18:27:46 -0600 (CST)
Subject: Allocating kernel memory within a specific range.
In-Reply-To: <1078531145.5700.130.camel@gaston>
References: <1078531145.5700.130.camel@gaston>
Message-ID:

On Sat, 6 Mar 2004, Benjamin Herrenschmidt wrote:

> Nothing that works after kernel mm is initialized. You can reserve
> memory early during boot though, using the lmb_ mechanism, which provides
> some alignment & limits for allocations. (in prom.c). But that's a bit
> scary, don't think about using that for more than in-house experiments.
>

I saw this too, but apart from being scary, what you suggest defeats the
purpose of my tool, since I want it to work on an otherwise unmodified
kernel (apart from the module itself).

> Another possibility is to play MMU mapping tricks and map the last page
> of the virtual address space & use absolute branches.

I really like this idea. I completely forgot that addresses get sign
extended even in the absolute case. Does anyone see pitfalls here?

Thanks!
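
The reach of the two branch forms discussed in this thread can be checked with a little arithmetic. What follows is a minimal sketch, assuming the standard PowerPC I-form branch layout (primary opcode 18, a 24-bit LI field, then the AA and LK bits). The helper names `encode_rel_branch` and `encode_abs_branch` are hypothetical; they come from neither KernInst nor the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* PowerPC I-form branch: opcode 18 in the top 6 bits, LI || 0b00 below it. */
#define PPC_OP_BRANCH 0x48000000u	/* 18 << 26 */
#define PPC_LI_MASK   0x03fffffcu	/* 24-bit LI field, shifted left by 2 */
#define PPC_AA        0x00000002u	/* absolute-address bit */

/*
 * Hypothetical helper: encode "b target" for an instruction located at pc.
 * The displacement is signed, so the reach is -32MB .. +32MB-4 around pc.
 */
bool encode_rel_branch(uint64_t pc, uint64_t target, uint32_t *insn)
{
	int64_t off = (int64_t)(target - pc);

	if (off < -0x2000000LL || off > 0x1fffffcLL || (off & 3))
		return false;
	*insn = PPC_OP_BRANCH | ((uint32_t)off & PPC_LI_MASK);
	return true;
}

/*
 * Hypothetical helper: encode "ba target". With AA set the target is the
 * sign-extended value of LI || 0b00, so only the lowest and the highest
 * 32MB of the effective address space are reachable, from any pc.
 */
bool encode_abs_branch(uint64_t target, uint32_t *insn)
{
	int64_t t = (int64_t)target;

	if (t < -0x2000000LL || t > 0x1fffffcLL || (t & 3))
		return false;
	*insn = PPC_OP_BRANCH | ((uint32_t)target & PPC_LI_MASK) | PPC_AA;
	return true;
}
```

Under these assumptions a buffer 180MB away from the patched instruction fails the relative check, a kernel linear address such as 0xC000000000001000 fails the absolute check, and a page mapped at the very top of the address space (0xFFFFFFFFFFFFF000) encodes as 0x4BFFF002.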
Igor

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From benh at kernel.crashing.org Sat Mar 6 11:35:30 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 06 Mar 2004 11:35:30 +1100
Subject: Allocating kernel memory within a specific range.
In-Reply-To:
References: <1078531145.5700.130.camel@gaston>
Message-ID: <1078533329.6327.143.camel@gaston>

> I really like this idea. I completely forgot that addresses get sign
> extended even in the absolute case. Does anyone see pitfalls here?

The only pitfall is that I may actually use those too in the future
for some pages of code mapped both in kernel & user space..

Ben.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From linas at austin.ibm.com Sat Mar 6 12:40:47 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Fri, 5 Mar 2004 19:40:47 -0600
Subject: [PATCH] Re: KDB in ameslab
In-Reply-To: <20040305153806.GI5801@krispykreme>; from anton@samba.org on Sat, Mar 06, 2004 at 02:38:06AM +1100
References: <20040217043527.GC25491@krispykreme>
	<20040302195946.J74832@forte.austin.ibm.com>
	<20040303143023.A23924@forte.austin.ibm.com>
	<20040305153806.GI5801@krispykreme>
Message-ID: <20040305194047.T74832@forte.austin.ibm.com>

On Sat, Mar 06, 2004 at 02:38:06AM +1100, Anton Blanchard wrote:
>
> > This patch brings the ameslab ppc64 tree up to the current
> > level of KDB (v 4.3 hot from the sgi ftp site as of two days ago)
> > Note that since ameslab currently has kdb 4.1(?) in it, that
> > this patch has the effect of undoing the old kdb as it adds the
> > new kdb.
>
> Applied, but I couldnt find kdb/modules/kdbm_task.o. Can you send it on?

My apologies. I haven't been able to do anything right for a while now.
I really really hate screwing up. Puts me in a foul mood. Oh well.

Attached.

--linas
-------------- next part --------------
/*
 * Copyright (c) 2003 Silicon Graphics, Inc. All Rights Reserved.
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms of version 2 of the GNU General Public License
 * as published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it would be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * Further, this software is distributed without any warranty that it is
 * free of the rightful claim of any third person regarding infringement
 * or the like. Any license provided herein, whether implied or
 * otherwise, applies only to this software file. Patent licenses, if
 * any, provided herein do not apply to combinations of this program with
 * other software, or any other product whatsoever.
 *
 * You should have received a copy of the GNU General Public
 * License along with this program; if not, write the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
 *
 * Contact information: Silicon Graphics, Inc., 1600 Amphitheatre Pkwy,
 * Mountain View, CA 94043, or:
 *
 * http://www.sgi.com
 *
 * For further information regarding this notice, see:
 *
 * http://oss.sgi.com/projects/GenInfo/NoticeExplan
 */

#include
#include
#include
#include
#include
#include
#include
#include
#include

MODULE_AUTHOR("SGI");
MODULE_DESCRIPTION("Debug struct task and sigset information");
MODULE_LICENSE("GPL");

static char *
kdb_cpus_allowed_string(struct task_struct *tp)
{
#ifndef CPU_ARRAY_SIZE
	static char maskbuf[BITS_PER_LONG/4+8];
	sprintf(maskbuf, "0x%0lx", tp->cpus_allowed);
#else
	int i, j;
	static char maskbuf[CPU_ARRAY_SIZE * BITS_PER_LONG / 4 + 8];

	strcpy(maskbuf, "0x");
	for (j=2, i=CPU_ARRAY_SIZE-1; i >= 0; i--) {
		j += sprintf(maskbuf + j, "%0*lx", (int)(2*sizeof(tp->cpus_allowed.mask[0])), tp->cpus_allowed.mask[i]);
	}
#endif /* CPU_ARRAY_SIZE */

	return maskbuf;
}

static int
kdbm_task(int argc, const char **argv, const char **envp, struct pt_regs *regs)
{
	unsigned long addr;
	long offset=0;
	int nextarg;
	int e = 0;
	struct task_struct *tp = NULL;

	if (argc != 1)
		return KDB_ARGCOUNT;

	nextarg = 1;
	if ((e = kdbgetaddrarg(argc, argv, &nextarg, &addr, &offset, NULL, regs)) != 0)
		return(e);

	if (!(tp = kmalloc(sizeof(*tp), GFP_ATOMIC))) {
		kdb_printf("%s: cannot kmalloc tp\n", __FUNCTION__);
		goto out;
	}
	if ((e = kdb_getarea(*tp, addr))) {
		kdb_printf("%s: invalid task address\n", __FUNCTION__);
		goto out;
	}

	kdb_printf(
		"struct task at 0x%p, pid=%d flags=0x%lx state=%ld comm=\"%s\"\n",
		tp, tp->pid, tp->flags, tp->state, tp->comm);

	kdb_printf(" cpu=%d policy=%lu ", kdb_process_cpu(tp), tp->policy);
	kdb_printf(
		"prio=%d static_prio=%d cpus_allowed=%s",
		tp->prio, tp->static_prio, kdb_cpus_allowed_string(tp));
	kdb_printf(" &thread=0x%p\n", &tp->thread);

	kdb_printf(" need_resched=%d ",
		test_tsk_thread_flag(tp, TIF_NEED_RESCHED));
	kdb_printf(
		"timestamp=%llu time_slice=%u",
		tp->timestamp, tp->time_slice);
	kdb_printf(" lock_depth=%d\n", tp->lock_depth);

	kdb_printf(
		" fs=0x%p files=0x%p mm=0x%p\n",
		tp->fs, tp->files, tp->mm);

	kdb_printf(
		" uid=%d euid=%d suid=%d fsuid=%d gid=%d egid=%d sgid=%d fsgid=%d\n",
		tp->uid, tp->euid, tp->suid, tp->fsuid,
		tp->gid, tp->egid, tp->sgid, tp->fsgid);

	kdb_printf(
		" user=0x%p\n",
		tp->user);

	if (tp->sysvsem.undo_list)
		kdb_printf(
			" sysvsem.sem_undo refcnt %d proc_list=0x%p\n",
			atomic_read(&tp->sysvsem.undo_list->refcnt),
			tp->sysvsem.undo_list->proc_list);

	kdb_printf(
		" signal=0x%p &blocked=0x%p &pending=0x%p\n",
		tp->signal, &tp->blocked, &tp->pending);

	kdb_printf(
		" utime=%ld stime=%ld cutime=%ld cstime=%ld\n",
		tp->utime, tp->stime, tp->cutime, tp->cstime);

out:
	if (tp)
		kfree(tp);
	return e;
}

static int
kdbm_sigset(int argc, const char **argv, const char **envp, struct pt_regs *regs)
{
	sigset_t *sp = NULL;
	unsigned long addr;
	long offset=0;
	int nextarg;
	int e = 0;
	int i;
	char fmt[32];

	if (argc != 1)
		return KDB_ARGCOUNT;

#ifndef _NSIG_WORDS
	kdb_printf("unavailable on this platform, _NSIG_WORDS not defined.\n");
#else
	nextarg = 1;
	if ((e = kdbgetaddrarg(argc, argv, &nextarg, &addr, &offset, NULL, regs)) != 0)
		return(e);

	if (!(sp = kmalloc(sizeof(*sp), GFP_ATOMIC))) {
		kdb_printf("%s: cannot kmalloc sp\n", __FUNCTION__);
		goto out;
	}
	if ((e = kdb_getarea(*sp, addr))) {
		kdb_printf("%s: invalid sigset address\n", __FUNCTION__);
		goto out;
	}

	sprintf(fmt, "[%%d]=0x%%0%dlx ", (int)sizeof(sp->sig[0])*2);
	kdb_printf("sigset at 0x%p : ", sp);
	for (i=_NSIG_WORDS-1; i >= 0; i--) {
		if (i == 0 || sp->sig[i]) {
			kdb_printf(fmt, i, sp->sig[i]);
		}
	}
	kdb_printf("\n");
#endif /* _NSIG_WORDS */

out:
	if (sp)
		kfree(sp);
	return e;
}

static int __init kdbm_task_init(void)
{
	kdb_register("task", kdbm_task, "", "Display task_struct", 0);
	kdb_register("sigset", kdbm_sigset, "", "Display sigset_t", 0);

	return 0;
}

static void __exit kdbm_task_exit(void)
{
	kdb_unregister("task");
	kdb_unregister("sigset");
}

kdb_module_init(kdbm_task_init)
kdb_module_exit(kdbm_task_exit)

From johnrose at austin.ibm.com Tue Mar 9 04:02:43 2004
From: johnrose at austin.ibm.com (John Rose)
Date: Mon, 08 Mar 2004 11:02:43 -0600
Subject: [PATCH] rpaphp/rpadlpar latest (support for vio and multifunction devices )
In-Reply-To:
References:
Message-ID: <1078765362.17176.18.camel@verve.austin.ibm.com>

Hi-

> I BUILT rpa modules OK except two warnings. I tested my kernel that
> was patched with your rtas_user_2_6_partial.patch
> When do you think rtas_set_power_level will be included in Greg's
> tree?

The rtas_user patch was rejected in October, and won't ever be included.
If the new patch depends on a subset of the rejected patch, such a
dependency should have been addressed before or during submission.

I can make a patch containing rtas_set_power_level, if you need me to.

John

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From moilanen at austin.ibm.com Tue Mar 9 05:01:05 2004
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Mon, 08 Mar 2004 12:01:05 -0600
Subject: [PATCH] pci_dev to device_node fix
In-Reply-To: <1078529854.5704.106.camel@gaston>
References: <1078437665.10402.120.camel@DYN279927END.austin.ibm.com>
	<1078444595.5703.31.camel@gaston>
	<1078494500.10402.131.camel@DYN279927END.austin.ibm.com>
	<1078529854.5704.106.camel@gaston>
Message-ID: <1078768865.10402.252.camel@DYN279927END.austin.ibm.com>

On Fri, 2004-03-05 at 17:37, Benjamin Herrenschmidt wrote:
> > The problem is, what do you set it to. A PHB doesn't have a devfn. You
> > can't set it to -1, because there are sections of the code that will
> > mask the devfn w/ 0xff and cause the same problem if there is a bridge
> > is at devfn 0xff on a particular bus. The only clean way is to add an
> > extra field in to note that this device_node is a PHB. That way it can
> > ignore the devfn.
>
> And how do you deal with device nodes that aren't PCI ?

Non-pci device_nodes shouldn't be the children of a PHB device_node in
the device tree.

Thanks,
Jake

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From mjanders at us.ibm.com Tue Mar 9 07:09:52 2004
From: mjanders at us.ibm.com (Michael Anderson)
Date: Mon, 8 Mar 2004 14:09:52 -0600
Subject: request_firmware_nowait question
Message-ID:

I'm hoping to gather a little info on the request_firmware_nowait
interface. I am working on implementing the request_firmware interface
into the icom serial device driver. There are a set of firmware images (3)
that are needed for the adapter to operate. Currently, the images are in
the icom.h file. The firmware images are loaded at module init time. So my
first question is, would I need to use the _nowait variant of the
request_firmware interface?
The _nowait variant appears to spawn a
thread which leaves me wondering how best to synchronize with the icom
driver when the firmware loads complete. As I mentioned there are 3
firmware images that need to be loaded, one following the other, can not
overlap. So the icom device driver will need status on when the
asynchronous _nowait finishes on the final image.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From benh at kernel.crashing.org Tue Mar 9 08:22:38 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 09 Mar 2004 08:22:38 +1100
Subject: [PATCH] pci_dev to device_node fix
In-Reply-To: <1078768865.10402.252.camel@DYN279927END.austin.ibm.com>
References: <1078437665.10402.120.camel@DYN279927END.austin.ibm.com>
	<1078444595.5703.31.camel@gaston>
	<1078494500.10402.131.camel@DYN279927END.austin.ibm.com>
	<1078529854.5704.106.camel@gaston>
	<1078768865.10402.252.camel@DYN279927END.austin.ibm.com>
Message-ID: <1078780957.14366.214.camel@gaston>

On Tue, 2004-03-09 at 05:01, Jake Moilanen wrote:
> On Fri, 2004-03-05 at 17:37, Benjamin Herrenschmidt wrote:
> > > The problem is, what do you set it to. A PHB doesn't have a devfn. You
> > > can't set it to -1, because there are sections of the code that will
> > > mask the devfn w/ 0xff and cause the same problem if there is a bridge
> > > is at devfn 0xff on a particular bus. The only clean way is to add an
> > > extra field in to note that this device_node is a PHB. That way it can
> > > ignore the devfn.
> >
> > And how do you deal with device nodes that aren't PCI ?
>
> Non-pci device_nodes shouldn't be the children of a PHB device_node in
> the device tree.

They can be grand-children at least (macio devices are an exemple, but
then devices on a southbridge).

Anyway, I still don't see the problem. The PHB proper is not part of
the PCI bus, or if it is, it will respond to a devfn normally.

(IMHO, the whole idea of adding those fields to struct device_node is
crap in the beginning, but it's a bit late to change that in 2.6)

Ben.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From greg at kroah.com Tue Mar 9 08:30:20 2004
From: greg at kroah.com (Greg KH)
Date: Mon, 8 Mar 2004 13:30:20 -0800
Subject: [PATCH] rpaphp/rpadlpar latest (support for vio and multifunction devices )
In-Reply-To: <4048B173.6060009@ltcfwd.linux.ibm.com>
References: <403FCD0A.7050109@ltcfwd.linux.ibm.com>
	<4048B173.6060009@ltcfwd.linux.ibm.com>
Message-ID: <20040308213020.GD16396@kroah.com>

On Fri, Mar 05, 2004 at 10:57:23AM -0600, Linda Xie wrote:
> Hi Greg,
>
> The attached patch was created against
> //kernel.bkbits.net/gregkh/linux/pci-2.6:
>
> ChangeSet at 1.1627, 2004-03-05 09:43:24-06:00, lxie at threadlp13.austin.ibm.com
> [PATCH] rpaphp/rpadlpar:
> add support for VIO devices
> add support for multifunction cards
> code restructure
> Lindent cleanups
>
> If there are no objections, please apply.

I'll hold off until you and John agree on this :)

Also, please don't send me compressed patches, just make it inline, or
plain text.

thanks,

greg k-h

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From hollisb at us.ibm.com Tue Mar 9 08:33:30 2004
From: hollisb at us.ibm.com (Hollis Blanchard)
Date: Mon, 08 Mar 2004 15:33:30 -0600
Subject: [ppc64 patch] export vio_find_node()
Message-ID: <1078781610.12741.16.camel@dyn95394220.austin.ibm.com>

Hi Linus, this adds vio_find_node() to vio.h in case anyone actually
wants to use it... please apply.

===== include/asm-ppc64/vio.h 1.4 vs edited =====
--- 1.4/include/asm-ppc64/vio.h	Mon Mar 1 10:04:24 2004
+++ edited/include/asm-ppc64/vio.h	Mon Mar 8 15:34:18 2004
@@ -46,6 +46,7 @@
 					const struct vio_dev *dev);
 struct vio_dev * __devinit vio_register_device(struct device_node *node_vdev);
 void __devinit vio_unregister_device(struct vio_dev *dev);
+struct vio_dev *vio_find_node(struct device_node *vnode);
 const void * vio_get_attribute(struct vio_dev *vdev, void* which, int* length);
 int vio_get_irq(struct vio_dev *dev);
 struct iommu_table * vio_build_iommu_table(struct vio_dev *dev);

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From benh at kernel.crashing.org Tue Mar 9 21:10:26 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 09 Mar 2004 21:10:26 +1100
Subject: Problem with OF entry point
Message-ID: <1078827025.5700.256.camel@gaston>

Hi !

I've been chasing a problem where the G5 would die inside OF in some
weird ways when trying to open all displays on the system. I finally
found out that OF is fairly unhappy that we clobber SPRG2 in
enter_prom() (entry.S). We do that to save our stack pointer which
is supposed to be "clobbered" by OF. We were also "restoring" more
registers than necessary in that code.

I made the assumption that since we are running in OF runtime env.,
our stack pointer is already a 32 bits value, so we don't bother if
its top 32 bits are clobbered (cleared actually) by OF. So this
saving/restoring of SP looks completely useless to me. If this was
wrong, then we would be broken anyway since we pass to OF pointers to
things on the stack...

I've cleaned up the code a bit and produced that patch, which appears
to work fine on the G5. I'd appreciate some tests on other machines
along with comments in case I overlooked something. If things are fine
by tomorrow, I'll send this patch to Andrew/Linus.

(Dan: that will fix your dual screen boot bug)

Ben.

===== arch/ppc64/kernel/entry.S 1.30 vs edited =====
--- 1.30/arch/ppc64/kernel/entry.S	Mon Jan 19 17:28:26 2004
+++ edited/arch/ppc64/kernel/entry.S	Tue Mar 9 21:00:33 2004
@@ -570,11 +570,10 @@
 	 * of all registers that it saves. We therefore save those registers
 	 * PROM might touch to the stack. (r0, r3-r13 are caller saved)
 	 */
-	SAVE_8GPRS(2, r1)		/* Save the TOC & incoming param(s) */
-	SAVE_GPR(13, r1)		/* Save paca */
-	SAVE_8GPRS(14, r1)		/* Save the non-volatiles */
-	SAVE_10GPRS(22, r1)		/* ditto */
-
+	SAVE_8GPRS(2, r1)
+	SAVE_GPR(13, r1)
+	SAVE_8GPRS(14, r1)
+	SAVE_10GPRS(22, r1)
 	mfcr	r4
 	std	r4,_CCR(r1)
 	mfctr	r5
@@ -592,20 +591,16 @@
 	mfmsr	r11
 	std	r11,_MSR(r1)
 
-	/* Unfortunatly, the stack pointer is also clobbered, so it is saved
-	 * in the SPRG2 which allows us to restore our original state after
-	 * PROM returns.
-	 */
-	mtspr	SPRG2,r1
-
-	/* put a relocation offset into r3 */
+	/* Get the PROM entrypoint */
 	bl	.reloc_offset
 	LOADADDR(r12,prom)
 	sub	r12,r12,r3
-	ld	r12,PROMENTRY(r12)	/* get the prom->entry value */
+	ld	r12,PROMENTRY(r12)
 	mtlr	r12
 
-	mfmsr	r11			/* grab the current MSR */
+	/* Switch MSR to 32 bits mode
+	 */
+	mfmsr	r11
 	li	r12,1
 	rldicr	r12,r12,MSR_SF_LG,(63-MSR_SF_LG)
 	andc	r11,r11,r12
@@ -615,22 +610,20 @@
 	mtmsrd	r11
 	isync
 
-	REST_8GPRS(2, r1)		/* Restore the TOC & param(s) */
-	REST_GPR(13, r1)		/* Restore paca */
-	REST_8GPRS(14, r1)		/* Restore the non-volatiles */
-	REST_10GPRS(22, r1)		/* ditto */
-	blrl				/* Entering PROM here... */
-
-	mfspr	r1,SPRG2		/* Restore the stack pointer */
-	ld	r6,_MSR(r1)		/* Restore the MSR */
-	mtmsrd	r6
+	/* Restore arguments & enter PROM here... */
+	ld	r3,GPR3(r1)
+	blrl
+
+	/* Restore the MSR (back to 64 bits) */
+	ld	r0,_MSR(r1)
+	mtmsrd	r0
 	isync
 
-	REST_GPR(2, r1)			/* Restore the TOC */
-	REST_GPR(13, r1)		/* Restore paca */
-	REST_8GPRS(14, r1)		/* Restore the non-volatiles */
-	REST_10GPRS(22, r1)		/* ditto */
-
+	/* Restore other registers */
+	REST_GPR(2, r1)
+	REST_GPR(13, r1)
+	REST_8GPRS(14, r1)
+	REST_10GPRS(22, r1)
 	ld	r4,_CCR(r1)
 	mtcr	r4
 	ld	r5,_CTR(r1)
@@ -645,9 +638,10 @@
 	mtsrr0	r9
 	ld	r10,_SRR1(r1)
 	mtsrr1	r10
+
 	addi	r1,r1,PROM_FRAME_SIZE
-	ld	r0,16(r1)		/* get return address */
-
+	ld	r0,16(r1)
 	mtlr	r0
-	blr				/* return to caller */
+	blr
+
 #endif /* defined(CONFIG_PPC_PSERIES) */

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From anton at samba.org Tue Mar 9 21:23:25 2004
From: anton at samba.org (Anton Blanchard)
Date: Tue, 9 Mar 2004 21:23:25 +1100
Subject: largepage BUG_ON
Message-ID: <20040309102324.GI13007@krispykreme>

With a bit of effort I managed to hit the following:

kernel BUG in hugepte_offset at arch/ppc64/mm/hugetlbpage.c:195!
NIP: C0000000000459A0 XER: 0000000020000000 LR: C00000000004682C REGS: c0000007fe62f7a0 TRAP: 0700 Not tainted MSR: 9000000000029032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 TASK: c0000012286f5260[32071] 'bustmem' THREAD: c0000007fe62c000 CPU: 14 GPR00: 0000000000000000 C0000007FE62FA20 C000000000776E20 C000000000000000 GPR04: 0000010300000000 0000000000000080 C0000000006605E0 0000000000000001 GPR08: C000000059721000 0000000000000000 C000000000000000 000000004800B6C0 GPR12: CCCCCCCCCCCCCCCD C0000000004FC000 0000000000000000 0000000000000000 GPR16: 0000000000000000 FFFFFFFFFFFFFFF4 0000000000000001 0000010000000000 GPR20: 0000010C80000000 0000000002746000 C0000007FF435000 0000000000000000 GPR24: C000000000801F00 0000002746000000 0000000000000000 0000000000000005 GPR28: C000000000854300 000000000000D400 C000000934EFAFE0 0000010300000000 NIP [c0000000000459a0] .hugepte_offset+0xa4/0xb4 LR [c00000000004682c] .unmap_hugepage_range+0x15c/0x2d0 Call Trace: [c000000000046a0c] .zap_hugepage_range+0x6c/0x9c [c000000000093d3c] .zap_page_range+0x78/0x218 [c000000000099e04] .do_mmap_pgoff+0x630/0x7cc [c000000000015a24] .sys_mmap+0x17c/0x1b0 [c00000000000f624] .ret_from_syscall_1+0x0/0xa4 Which points to the second BUG_ON here: static hugepte_t *hugepte_offset(struct mm_struct *mm, unsigned long addr) { pgd_t *pgd; pmd_t *pmd = NULL; BUG_ON(!in_hugepage_area(mm->context, addr)); pgd = pgd_offset(mm, addr); pmd = pmd_offset(pgd, addr); /* We shouldn't find a (normal) PTE page pointer here */ BUG_ON(!pmd_none(*pmd) && !pmd_hugepage(*pmd)); return (hugepte_t *)pmd; } ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/

From anton at samba.org Tue Mar 9 23:58:51 2004
From: anton at samba.org (Anton Blanchard)
Date: Tue, 9 Mar 2004 23:58:51 +1100
Subject: largepage BUG_ON
In-Reply-To: <20040309102324.GI13007@krispykreme>
References: <20040309102324.GI13007@krispykreme>
Message-ID: <20040309125851.GJ13007@krispykreme>

> With a bit of effort I managed to hit the following:

Hit it again. It seems I have a bug in my application where I'm trying
to access just past the end of my allocated largepage memory region
(16GB). Somehow I end up getting a small page pte in there:

pmd_hugepage, pmd 4800b6c0 addr 10400000000
kernel BUG in hugepte_offset at arch/ppc64/mm/hugetlbpage.c:204!

Anton

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From segher at kernel.crashing.org Wed Mar 10 02:12:10 2004
From: segher at kernel.crashing.org (Segher Boessenkool)
Date: Tue, 9 Mar 2004 16:12:10 +0100
Subject: Problem with OF entry point
In-Reply-To: <1078827025.5700.256.camel@gaston>
References: <1078827025.5700.256.camel@gaston>
Message-ID: <23115037-71DC-11D8-B4F8-000A95A4DC02@kernel.crashing.org>

> I finally found out that OF is fairly unhappy that we clobber SPRG2
> in enter_prom() (entry.S). We do that to save our stack pointer which
> is supposed to be "clobbered" by OF. We were also "restoring" more
> registers than necessary in that code.
>
> I made the assumption that since we are running in OF runtime env.,
> our stack pointer is already a 32 bits value, so we don't bother
> if it's top 32 bits are clobbered (cleared actually) by OF. So this
> saving/restoring of SP looks completely useless to me. If this was
> wrong, then we would be broken anyway since we pass to OF pointers
> to things on the stack...

The client interface is required to not change _either_ SPRG2 or R1.
It is allowed to modify only LR, CTR, XER, CR[01567], and R3-R12.
(And, when running in virtual mode, some of the stuff related to
MMU functions, of course).

So normally you don't need any special save/restore, as the normal
calling conventions will handle it Just Fine(tm).

Unfortunately, Apple's 64-bit OF only has a 32-bit Client Interface,
and seems to be a bit buggy as well, as you found out ;-)

Your changes look sane to me, though. But why do we need assembler
code for this anyway? Except for fixing up buggy firmware...

It might make a lot of sense to run the kernel-side client-interface
code (i.e., prom.c) in 32-bit mode, if there's only a 32-bit
client interface available?

Segher

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From benh at kernel.crashing.org Wed Mar 10 08:24:33 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 10 Mar 2004 08:24:33 +1100
Subject: Problem with OF entry point
In-Reply-To: <23115037-71DC-11D8-B4F8-000A95A4DC02@kernel.crashing.org>
References: <1078827025.5700.256.camel@gaston> <23115037-71DC-11D8-B4F8-000A95A4DC02@kernel.crashing.org>
Message-ID: <1078867473.9739.14.camel@gaston>

> The client interface is required to not change _either_ SPRG2 or R1.
> It is allowed to modify only LR, CTR, XER, CR[01567], and R3-R12.
> (And, when running in virtual mode, some of the stuff related to
> MMU functions, of course).

Apparently, it's not about changing them, it's about using them ;)

It seems OF sets some stuffs in these registers before calling us
and will die on the next exception if we happen to change those values.

> So normally you don't need any special save/restore, as the normal
> calling conventions will handle it Just Fine(tm).

No (see below)

> Unfortunately, Apple's 64-bit OF only has a 32-bit Client Interface,
> and seems to be a bit buggy as well, as you found out ;-)
>
> Your changes look sane to me, though. But why do we need assembler
> code for this anyway? Except for fixing up buggy firmware...

No. The firmware is 32 bits, so it will internally fail to properly
save/restore the high bits of non-volatile registers. We need the
asm wrapper to save them and switch to 32 bits for the OF call.

> It might make a lot of sense to run the kernel-side client-interface
> code (i.e., prom.c) in 32-bit mode, if there's only a 32-bit
> client interface available?

Hehe, it would, it would ... but then, we would also need to get
rid of all the shared data structures between prom.c "init" code
and the rest of the kernel ;)

Note that we do plan to do that at one point. Step #1 will be to
split prom.c, a bit like we did on ppc32. Step #2 will be to
define a cleaner "interface" between prom_init.c and the rest
of the kernel, at which point we may be able to just compile
prom_init as a completely separate (and 32 bits relocatable)
binary.

Ben.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From greg at kroah.com Wed Mar 10 10:13:09 2004
From: greg at kroah.com (Greg KH)
Date: Tue, 9 Mar 2004 15:13:09 -0800
Subject: [PATCH] rpaphp/rpadlpar latest (support for vio and multifunction devices )
In-Reply-To: <404E515C.2070508@ltcfwd.linux.ibm.com>
References: <403FCD0A.7050109@ltcfwd.linux.ibm.com> <4048B173.6060009@ltcfwd.linux.ibm.com> <404E515C.2070508@ltcfwd.linux.ibm.com>
Message-ID: <20040309231309.GE14038@kroah.com>

On Tue, Mar 09, 2004 at 05:21:00PM -0600, Linda Xie wrote:
>
> >
> >Also, please don't send me compressed patches, just make it inline, or
> >plain text.
> >
> >thanks,
> >
> >greg k-h
>
> Greg,
>
> The attached rtas_set_power.patch was created by John. Please apply it
> to your tree.

> diff -Nru a/arch/ppc64/kernel/rtas.c b/arch/ppc64/kernel/rtas.c
> --- a/arch/ppc64/kernel/rtas.c	Mon Mar 8 13:29:08 2004
> +++ b/arch/ppc64/kernel/rtas.c	Mon Mar 8 13:29:08 2004

Um, no. I'm not the ppc64 maintainer. Patches to that part of the tree
should go through them, not me.

Sorry,

greg k-h

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From greg at kroah.com Wed Mar 10 12:13:10 2004
From: greg at kroah.com (Greg KH)
Date: Tue, 9 Mar 2004 17:13:10 -0800
Subject: [PATCH] rpaphp/rpadlpar latest (support for vio and multifunction devices )
In-Reply-To: <404E5C55.6010909@ltcfwd.linux.ibm.com>
References: <404E5C55.6010909@ltcfwd.linux.ibm.com>
Message-ID: <20040310011310.GB15707@kroah.com>

On Tue, Mar 09, 2004 at 06:07:49PM -0600, Linda Xie wrote:
>
> diff -Nru a/include/asm-ppc64/vio.h b/include/asm-ppc64/vio.h
> --- a/include/asm-ppc64/vio.h	Tue Mar 9 15:38:39 2004
> +++ b/include/asm-ppc64/vio.h	Tue Mar 9 15:38:39 2004

Sorry, you are going to have to get this through the ppc64 maintainers
to Linus, not from me.

thanks,

greg k-h

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From benh at kernel.crashing.org Wed Mar 10 13:07:11 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 10 Mar 2004 13:07:11 +1100
Subject: Problem with OF entry point
In-Reply-To: <1078867473.9739.14.camel@gaston>
References: <1078827025.5700.256.camel@gaston> <23115037-71DC-11D8-B4F8-000A95A4DC02@kernel.crashing.org> <1078867473.9739.14.camel@gaston>
Message-ID: <1078884431.9745.43.camel@gaston>

Ok, here's a new version that just adds a paranoid clearing of the
high bits of r1 when going out of OF... just in case ;) I don't see
how the firmware could be broken enough to completely clobber r1
including the low bits, let me know if you ever heard about that.

I'm going to push that to Andrew/Linus later today.

===== arch/ppc64/kernel/entry.S 1.30 vs edited =====
--- 1.30/arch/ppc64/kernel/entry.S	Mon Jan 19 17:28:26 2004
+++ edited/arch/ppc64/kernel/entry.S	Wed Mar 10 12:13:08 2004
@@ -570,11 +570,10 @@
 	 * of all registers that it saves.  We therefore save those registers
 	 * PROM might touch to the stack.  (r0, r3-r13 are caller saved)
 	 */
-	SAVE_8GPRS(2, r1)		/* Save the TOC & incoming param(s) */
-	SAVE_GPR(13, r1)		/* Save paca */
-	SAVE_8GPRS(14, r1)		/* Save the non-volatiles */
-	SAVE_10GPRS(22, r1)		/* ditto */
-
+	SAVE_8GPRS(2, r1)
+	SAVE_GPR(13, r1)
+	SAVE_8GPRS(14, r1)
+	SAVE_10GPRS(22, r1)
 	mfcr	r4
 	std	r4,_CCR(r1)
 	mfctr	r5
@@ -592,20 +591,16 @@
 	mfmsr	r11
 	std	r11,_MSR(r1)
 
-	/* Unfortunatly, the stack pointer is also clobbered, so it is saved
-	 * in the SPRG2 which allows us to restore our original state after
-	 * PROM returns.
-	 */
-	mtspr	SPRG2,r1
-
-	/* put a relocation offset into r3 */
+	/* Get the PROM entrypoint */
 	bl	.reloc_offset
 	LOADADDR(r12,prom)
 	sub	r12,r12,r3
-	ld	r12,PROMENTRY(r12)	/* get the prom->entry value */
+	ld	r12,PROMENTRY(r12)
 	mtlr	r12
 
-	mfmsr	r11			/* grab the current MSR */
+	/* Switch MSR to 32 bits mode
+	 */
+	mfmsr	r11
 	li	r12,1
 	rldicr	r12,r12,MSR_SF_LG,(63-MSR_SF_LG)
 	andc	r11,r11,r12
@@ -615,22 +610,25 @@
 	mtmsrd	r11
 	isync
 
-	REST_8GPRS(2, r1)		/* Restore the TOC & param(s) */
-	REST_GPR(13, r1)		/* Restore paca */
-	REST_8GPRS(14, r1)		/* Restore the non-volatiles */
-	REST_10GPRS(22, r1)		/* ditto */
-	blrl				/* Entering PROM here... */
-
-	mfspr	r1,SPRG2		/* Restore the stack pointer */
-	ld	r6,_MSR(r1)		/* Restore the MSR */
-	mtmsrd	r6
+	/* Restore arguments & enter PROM here... */
+	ld	r3,GPR3(r1)
+	blrl
+
+	/* Just make sure that r1 top 32 bits didn't get
+	 * corrupt by OF
+	 */
+	rldicl	r1,r1,0,32
+
+	/* Restore the MSR (back to 64 bits) */
+	ld	r0,_MSR(r1)
+	mtmsrd	r0
 	isync
 
-	REST_GPR(2, r1)			/* Restore the TOC */
-	REST_GPR(13, r1)		/* Restore paca */
-	REST_8GPRS(14, r1)		/* Restore the non-volatiles */
-	REST_10GPRS(22, r1)		/* ditto */
-
+	/* Restore other registers */
+	REST_GPR(2, r1)
+	REST_GPR(13, r1)
+	REST_8GPRS(14, r1)
+	REST_10GPRS(22, r1)
 	ld	r4,_CCR(r1)
 	mtcr	r4
 	ld	r5,_CTR(r1)
@@ -645,9 +643,10 @@
 	mtsrr0	r9
 	ld	r10,_SRR1(r1)
 	mtsrr1	r10
+
 	addi	r1,r1,PROM_FRAME_SIZE
-	ld	r0,16(r1)		/* get return address */
-
+	ld	r0,16(r1)
 	mtlr	r0
-	blr				/* return to caller */
+	blr
+
 #endif	/* defined(CONFIG_PPC_PSERIES) */

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From hollisb at us.ibm.com Wed Mar 10 14:02:52 2004
From: hollisb at us.ibm.com (Hollis Blanchard)
Date: Tue, 9 Mar 2004 21:02:52 -0600
Subject: [PATCH] rpaphp/rpadlpar latest (support for vio and multifunction devices )
In-Reply-To: <404E5C55.6010909@ltcfwd.linux.ibm.com>
References: <404E5C55.6010909@ltcfwd.linux.ibm.com>
Message-ID: <6BCAB8D1-723F-11D8-A6E2-000A95A0560C@us.ibm.com>

On Mar 9, 2004, at 6:07 PM, Linda Xie wrote:
>
> The attached is vio_h.patch which added vio_find_node prototype for
> rpaphp.

Linda, I already submitted this to Linus, and will be pinging him again
in a couple days if it's not there yet. I cc'ed the ppc64 mailing list;
the mail must be in your inbox.

--
Hollis Blanchard
IBM Linux Technology Center

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From segher at kernel.crashing.org Wed Mar 10 18:20:08 2004
From: segher at kernel.crashing.org (Segher Boessenkool)
Date: Wed, 10 Mar 2004 08:20:08 +0100
Subject: Problem with OF entry point
In-Reply-To: <1078867473.9739.14.camel@gaston>
References: <1078827025.5700.256.camel@gaston> <23115037-71DC-11D8-B4F8-000A95A4DC02@kernel.crashing.org> <1078867473.9739.14.camel@gaston>
Message-ID: <5C567FDA-7263-11D8-98DF-000A95A4DC02@kernel.crashing.org>

>> The client interface is required to not change _either_ SPRG2 or R1.
>> It is allowed to modify only LR, CTR, XER, CR[01567], and R3-R12.
>> (And, when running in virtual mode, some of the stuff related to
>> MMU functions, of course).
>
> Apparently, it's not about changing them, it's about using them ;)
>
> It seems OF sets some stuffs in these registers before calling us
> and will die on the next exception if we happen to change those values.

Ah yes. That behaviour is just as broken, of course.

>> So normally you don't need any special save/restore, as the normal
>> calling conventions will handle it Just Fine(tm).
>
> No (see below)

This is also because of broken OF implementation (see below).

>> Unfortunately, Apple's 64-bit OF only has a 32-bit Client Interface,
>> and seems to be a bit buggy as well, as you found out ;-)
>>
>> Your changes look sane to me, though. But why do we need assembler
>> code for this anyway? Except for fixing up buggy firmware...
>
> No. The firmware is 32 bits, so it will internally fail to properly

Actually, it is 64 bit, just not all of it. Not saving the high
half of the registers is a bug. And it doesn't have a 64-bit
client interface. Which isn't really a bug, but is also not
implementing the firm recommendations by the standard.

> save/restore the high bits of non-volatile registers. We need the
> asm wrapper to save them and switch to 32 bits for the OF call.

Another bug; the OF entry point should switch to 32-bit mode itself.

>> It might make a lot of sense to run the kernel-side client-interface
>> code (i.e., prom.c) in 32-bit mode, if there's only a 32-bit
>> client interface available?
>
> Hehe, it would, it would ... but then, we would also need to get
> rid of all the shared data structures between prom.c "init" code
> and the rest of the kernel ;)

Which is a good idea, anyway :-)

> Note that we do plan to do that at one point. Step #1 will be to
> split prom.c, a bit like we did on ppc32. Step #2 will be to
> define a cleaner "interface" between prom_init.c and the rest
> of the kernel, at which point we may be able to just compile
> prom_init as a completely separate (and 32 bits relocatable)
> binary.

/me cheers you on :-)

Segher

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From greg at kroah.com Thu Mar 11 09:25:04 2004
From: greg at kroah.com (Greg KH)
Date: Wed, 10 Mar 2004 14:25:04 -0800
Subject: [Fwd: Re: [PATCH] rpaphp/rpadlpar latest (support for vio and multifunction devices )]
In-Reply-To: <404E5E02.4060005@ltcfwd.linux.ibm.com>
References: <404E5E02.4060005@ltcfwd.linux.ibm.com>
Message-ID: <20040310222504.GC23594@kroah.com>

On Tue, Mar 09, 2004 at 06:14:58PM -0600, Linda Xie wrote:
> Greg,
>
> Here is non-zipped rpa patch.
>
>
> Please apply.

Applied, thanks.

greg k-h

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From strosake at austin.ibm.com Thu Mar 11 11:03:56 2004
From: strosake at austin.ibm.com (Mike Strosaker)
Date: Wed, 10 Mar 2004 18:03:56 -0600
Subject: [PATCH] (2.6) improvements to os-term on panic
Message-ID: <404FACEC.5020805@austin.ibm.com>

Hello:

This patch handles the RTAS_BUSY value that ibm,os-term can return, and
also passes the panic string to the os-term call so that it can be
stored in a platform error log.

Comments welcome.

Thanks,
Mike
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: osterm.patch
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040310/8c734c74/attachment.txt

From linas at austin.ibm.com Thu Mar 11 11:19:45 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Wed, 10 Mar 2004 18:19:45 -0600
Subject: RFC: WIP, Patch: bullet-proofing the system reset exception vector
Message-ID: <20040310181945.Y74832@forte.austin.ibm.com>

Request for comment/ work in progress ...

The patch below tries to 'bullet-proof' the system-reset and
machine-check exceptions so that even if the system is fairly badly
hosed, one might still be able to invoke the kernel debugger.

It's not a perfect patch, it doesn't solve all problems; I'm mostly
trying to test the waters to see if anyone is interested, or thinks
this is a good idea or waste of time (well, all of *my* ideas are
*good* ideas :-) but maybe someone has a better idea), or if anyone
has special requests.

To be more precise, this patch attempts to handle a system-reset
exception, even if the exception pointer and/or the kernel stack
pointer is corrupted. I'm fighting a hard hang due to a corrupted
stack pointer, have to use cpuctl to debug and that's no fun.

I'll probably be doing more hacking on this over the next few days,
possibly.

--linas
-------------- next part --------------
===== head.S 1.55 vs edited =====
--- 1.55/arch/ppc64/kernel/head.S	Sun Feb 29 20:24:55 2004
+++ edited/head.S	Wed Mar 10 14:55:27 2004
@@ -200,6 +200,13 @@
 #define EX_CCR		60
 #define EX_TRAP		60
 
+#if 0
+/* the stack pointer must start with 0xc000 or else it's an error */
+	rldicl	r21,r21,16,48		/* top sixteen bits right justified */
+	cmpli	crf3,1,r21,0xc000	/* compare to 0xc000 */
+	beq	crf3,
+#endif
+
 #define EXCEPTION_PROLOG_PSERIES(n,label)				\
 	mtspr	SPRG2,r20;	/* use SPRG2 as scratch reg */		\
 	mtspr	SPRG1,r21;	/* save r21 */				\
@@ -234,6 +241,48 @@
 	mfcr	r23;		/* save CR in r23 */			\
 	rfid
 
+/* This exception prolog is almost identical to that above, except
+ * that it makes the paranoid assumption that the paca exception
+ * stack pointer may be corrupted.  It is meant to be used for the
+ * SystemReset Exception, so that the 'little yellow button' will
+ * succeed in starting the debugger even if low-level subsystems
+ * are corrupted. */
+#define EXCEPTION_PROLOG_NMI_PSERIES(n,label)				\
+	mtspr	SPRG2,r20;	/* use SPRG2 as scratch reg */		\
+	mtspr	SPRG1,r21;	/* save r21 */				\
+	mfspr	r20,SPRG3;	/* get paca virt addr */		\
+	LOADADDR(r21,emergency_stack); /* get exception stack ptr */	\
+	addi	r21,r21,8;						\
+	std	r22,EX_R22(r21);	/* Save r22 in exc. frame */	\
+	ld	r22,PACAEXCSP(r20);	/* get old exception stack ptr */ \
+	std	r22,-8(r21);	/* Save old paca stack pointer where we can find it */ \
+	li	r22,n;		/* Save the ex # in exc. frame*/	\
+	stw	r22,EX_TRAP(r21);	/* */				\
+	std	r23,EX_R23(r21);	/* Save r23 in exc. frame */	\
+	mfspr	r22,SRR0;	/* EA of interrupted instr */		\
+	std	r22,EX_SRR0(r21);	/* Save SRR0 in exc. frame */	\
+	mfspr	r23,SRR1;	/* machine state at interrupt */	\
+	std	r23,EX_SRR1(r21);	/* Save SRR1 in exc. frame */	\
+									\
+	mfspr	r23,DAR;	/* Save DAR in exc. frame */		\
+	std	r23,EX_DAR(r21);					\
+	mfspr	r23,DSISR;	/* Save DSISR in exc. frame */		\
+	stw	r23,EX_DSISR(r21);					\
+	mfspr	r23,SPRG2;	/* Save r20 in exc. frame */		\
+	std	r23,EX_R20(r21);					\
+									\
+	clrrdi	r22,r20,60;	/* Get 0xc part of the vaddr */		\
+	ori	r22,r22,(label)@l; /* add in the vaddr offset */	\
+				/* assumes *_common < 16b */		\
+	mfmsr	r23;							\
+	rotldi	r23,r23,4;						\
+	ori	r23,r23,0x32B;	/* Set IR, DR, RI, SF, ISF, HV*/	\
+	rotldi	r23,r23,60;	/* for generic handlers */		\
+	mtspr	SRR0,r22;						\
+	mtspr	SRR1,r23;						\
+	mfcr	r23;		/* save CR in r23 */			\
+	rfid
+
 /*
  * This is the start of the interrupt handlers for iSeries
  * This code runs with relocation on.
@@ -315,6 +364,54 @@
 	ld	r2,PACATOC(r20);					\
 	mr	r13,r20
 
+/* More or less the same as "EXCEPTION_PROLOG_COMMON" except that
+ * we assume that r1 is corrupt, and that PACAKSAVE(r20) is
+ * corrupt, and so we use the known-good 'emergency stack' r21
+ * as the kernel stack.  Also sets paca to use 'paranoid_exception_stack'
+ * for future exceptions.
+ */
+#define EXCEPTION_PROLOG_BULLETPROOF					\
+	mfspr	r22,SPRG1;	/* Save r21 in exc. frame */		\
+	std	r22,EX_R21(r21);					\
+	LOADADDR(r22,paranoid_exception_stack); /* start a new except stack */ \
+	std	r22,PACAEXCSP(r20);	/* use this new stack in future */ \
+	mr	r22,r1;		/* Save r1 */				\
+	mr	r1,r21;		/* use the emergency stack */		\
+	subi	r1,r1,INT_FRAME_SIZE;	/* alloc frame on emergency stack */ \
+	std	r22,GPR1(r1);	/* save r1 in stackframe */		\
+	std	0,0(r1);	/* dead-end stack chain pointer */	\
+	std	r23,_CCR(r1);	/* save CR in stackframe */		\
+	ld	r22,EX_R20(r21);	/* move r20 to stackframe */	\
+	std	r22,GPR20(r1);						\
+	ld	r23,EX_R21(r21);	/* move r21 to stackframe */	\
+	std	r23,GPR21(r1);						\
+	ld	r22,EX_R22(r21);	/* move r22 to stackframe */	\
+	std	r22,GPR22(r1);						\
+	ld	r23,EX_R23(r21);	/* move r23 to stackframe */	\
+	std	r23,GPR23(r1);						\
+	mflr	r22;		/* save LR in stackframe */		\
+	std	r22,_LINK(r1);						\
+	mfctr	r23;		/* save CTR in stackframe */		\
+	std	r23,_CTR(r1);						\
+	mfspr	r22,XER;	/* save XER in stackframe */		\
+	std	r22,_XER(r1);						\
+	ld	r23,EX_DAR(r21);	/* move DAR to stackframe */	\
+	std	r23,_DAR(r1);						\
+	lwz	r22,EX_DSISR(r21);	/* move DSISR to stackframe */	\
+	std	r22,_DSISR(r1);						\
+	lbz	r22,PACAPROCENABLED(r20);				\
+	std	r22,SOFTE(r1);						\
+	ld	r22,EX_SRR0(r21);	/* get SRR0 from exc. frame */	\
+	ld	r23,EX_SRR1(r21);	/* get SRR1 from exc. frame */	\
+	LOADADDR(r21,paranoid_exception_stack); /* reset a new exception stack ptr */ \
+	std	r21,PACAEXCSP(r20);					\
+	SAVE_GPR(0, r1);	/* save r0 in stackframe */		\
+	SAVE_8GPRS(2, r1);	/* save r2 - r13 in stackframe */	\
+	SAVE_4GPRS(10, r1);						\
+	/* XXX TOC may be corrupted, fixme/workaround.... */		\
+	ld	r2,PACATOC(r20);					\
+	mr	r13,r20
+
 /*
  * Note: code which follows this uses cr0.eq (set if from kernel),
  * r1, r22 (SRR0), and r23 (SRR1).
@@ -329,6 +426,12 @@
 label##_Pseries:					\
 	EXCEPTION_PROLOG_PSERIES( n, label##_common )
 
+#define NMI_EXCEPTION_PSERIES(n, label )		\
+	. = n;						\
+	.globl label##_Pseries;				\
+label##_Pseries:					\
+	EXCEPTION_PROLOG_NMI_PSERIES( n, label##_common )
+
 #define STD_EXCEPTION_ISERIES( n, label )		\
 	.globl label##_Iseries;				\
 label##_Iseries:					\
@@ -368,6 +471,17 @@
 	bl	hdlr;					\
 	b	.ret_from_except
 
+#define BULLETPROOF_EXCEPTION_COMMON( trap, label, hdlr )	\
+	.globl label##_common;				\
+label##_common:						\
+	EXCEPTION_PROLOG_BULLETPROOF;			\
+	addi	r3,r1,STACK_FRAME_OVERHEAD;		\
+	li	r20,0;					\
+	li	r6,trap;				\
+	bl	.save_remaining_regs;			\
+	bl	hdlr;					\
+	b	.ret_from_except
+
 /*
  * Start of pSeries system interrupt routines
  */
@@ -375,8 +489,8 @@
 	.globl __start_interrupts
 __start_interrupts:
 
-	STD_EXCEPTION_PSERIES( 0x100, SystemReset )
-	STD_EXCEPTION_PSERIES( 0x200, MachineCheck )
+	NMI_EXCEPTION_PSERIES( 0x100, SystemReset )
+	NMI_EXCEPTION_PSERIES( 0x200, MachineCheck )
 	STD_EXCEPTION_PSERIES( 0x300, DataAccess )
 	STD_EXCEPTION_PSERIES( 0x380, DataAccessSLB )
 	STD_EXCEPTION_PSERIES( 0x400, InstructionAccess )
@@ -575,10 +689,10 @@
 	. = 0x8000
 	.globl SystemReset_FWNMI
 SystemReset_FWNMI:
-	EXCEPTION_PROLOG_PSERIES(0x100, SystemReset_common)
+	EXCEPTION_PROLOG_NMI_PSERIES(0x100, SystemReset_common)
 	.globl MachineCheck_FWNMI
 MachineCheck_FWNMI:
-	EXCEPTION_PROLOG_PSERIES(0x200, MachineCheck_common)
+	EXCEPTION_PROLOG_NMI_PSERIES(0x200, MachineCheck_common)
 
 	/*
 	 * Space for the initial segment table
@@ -596,8 +710,8 @@
 
 /*** Common interrupt handlers ***/
 
-	STD_EXCEPTION_COMMON( 0x100, SystemReset, .SystemResetException )
-	STD_EXCEPTION_COMMON( 0x200, MachineCheck, .MachineCheckException )
+	BULLETPROOF_EXCEPTION_COMMON( 0x100, SystemReset, .SystemResetException )
+	BULLETPROOF_EXCEPTION_COMMON( 0x200, MachineCheck, .MachineCheckException )
 	STD_EXCEPTION_COMMON( 0x900, Decrementer, .timer_interrupt )
 	STD_EXCEPTION_COMMON( 0xa00, Trap_0a, .UnknownException )
 	STD_EXCEPTION_COMMON( 0xb00, Trap_0b, .UnknownException )
@@ -2178,6 +2292,19 @@
 stab_array:
 	.space	4096 * 48
 
+/* System Reset Exception (emergency) stack */
+	.globl emergency_stack
+emergency_stack:
+	.space 8192
+
+/* Alternate exception stack to use after emergency exception.
+ * We use this so that we can take additional exceptions without
+ * having to assume that the paca exception stack is uncorrupted.
+ */
+	.globl paranoid_exception_stack
+paranoid_exception_stack:
+	.space 4096
+
 /*
  * This space gets a copy of optional info passed to us by the bootstrap
  * Used to pass parameters into the kernel like root=/dev/sda1, etc.
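
The sanity check sketched in the disabled `#if 0` block of the patch above (right-justify the top sixteen bits of the stack pointer and compare them with 0xc000) could be modelled in C roughly as follows. The function name is hypothetical and this is only a model of the intended test, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: treat a kernel stack pointer as plausible only if
 * its top sixteen bits are 0xc000.  The shift mirrors
 * "rldicl r21,r21,16,48", which rotates the value left by 16 and keeps
 * the low 16 bits, i.e. the original top sixteen bits right justified. */
static int sp_looks_like_kernel_address(uint64_t sp)
{
	uint64_t top16 = sp >> 48;

	return top16 == 0xc000;
}
```

A handler could use such a test to decide whether the interrupted r1 can be trusted, or whether it has to fall back to a known-good emergency stack as the patch does unconditionally.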

From greg at kroah.com Thu Mar 11 12:19:50 2004
From: greg at kroah.com (Greg KH)
Date: Wed, 10 Mar 2004 17:19:50 -0800
Subject: [PATCH] rpaphp/rpadlpar latest (support for vio and multifunction devices )
In-Reply-To: <404F4A77.9040702@ltcfwd.linux.ibm.com>
References: <404E5C55.6010909@ltcfwd.linux.ibm.com> <6BCAB8D1-723F-11D8-A6E2-000A95A0560C@us.ibm.com> <404F4A77.9040702@ltcfwd.linux.ibm.com>
Message-ID: <20040311011950.GB11828@kroah.com>

On Wed, Mar 10, 2004 at 11:03:51AM -0600, Linda Xie wrote:
> Hollis Blanchard wrote:
>
> >On Mar 9, 2004, at 6:07 PM, Linda Xie wrote:
> >
> >>
> >>The attached is vio_h.patch which added vio_find_node prototype for
> >>rpaphp.
> >
> >
> >Linda, I already submitted this to Linus, and will be pinging him
> >again in a couple days if it's not there yet.
> >
> Sounds good.
>
> Greg,
> It seems that rpa patch won't be applied until all its dependencies are
> in your tree. Am I right?

As I can't build your driver due to a lack of ppc64 hardware, I'll
gladly add your driver to my tree :)

Just did it in fact. I'll send it off to Linus after 2.6.4 is out.

thanks,

greg k-h

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From anton at samba.org Mon Mar 15 01:59:05 2004
From: anton at samba.org (Anton Blanchard)
Date: Mon, 15 Mar 2004 01:59:05 +1100
Subject: POWER3 TCE updates
Message-ID: <20040314145905.GE19737@krispykreme>

Hi,

We currently always allocate a TCE table that maps 2GB. Unfortunately
there is at least one pSeries machine that has a 3GB IO hole, the
nighthawk2 node. The following patch checks where the IO hole starts
and will only use 1GB of the TCE table if it starts below 2GB.

While I was in the area, I upped the maximum number of PHBs to give us
some breathing room (it would be nice to make of_tce_table[] initdata
but it's currently referenced in IOMMU functions not marked __init).
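
As a rough illustration of the sizing rule described above, the following C sketch picks the DMA window from the start of the IO hole and derives the table size from it. The 8-byte TCE entry and 4KB IO page size are assumptions, chosen because they reproduce the figures in the patch's own comments (a 4MB table for 2GB, a 2MB table for 1GB); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TCE_ENTRY_SIZE	8ULL	/* assumption: one 64-bit entry per page */
#define TCE_PAGE_SIZE	4096ULL	/* assumption: 4KB IO pages */

/* Map 2GB by default, but only 1GB if the IO hole starts below 2GB. */
static uint64_t tce_window_bytes(uint64_t io_hole_start)
{
	return io_hole_start < 0x80000000ULL ? 1ULL << 30 : 1ULL << 31;
}

/* Size of the TCE table needed to map a window of the given size. */
static uint64_t tce_table_bytes(uint64_t window_bytes)
{
	return window_bytes / TCE_PAGE_SIZE * TCE_ENTRY_SIZE;
}
```

Under these assumptions a machine whose IO hole starts at 1GB gets a 1GB window backed by a 2MB table, while the usual case keeps the 2GB window and 4MB table.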

We also now only allocate a 4MB TCE table on POWER3/RS64; we were only
allocating 8MB because of a firmware/hardware interaction on POWER4.

Does it look OK? I haven't tested it yet but I hope to tomorrow.

Anton

---

 forakpm-anton/arch/ppc64/kernel/pSeries_iommu.c |   13 ++++++++++++-
 forakpm-anton/arch/ppc64/kernel/prom.c          |   17 +++++++++--------
 2 files changed, 21 insertions(+), 9 deletions(-)

diff -puN arch/ppc64/kernel/prom.c~power3_pci arch/ppc64/kernel/prom.c
--- forakpm/arch/ppc64/kernel/prom.c~power3_pci	2004-03-15 01:44:39.173340849 +1100
+++ forakpm-anton/arch/ppc64/kernel/prom.c	2004-03-15 01:46:13.348758313 +1100
@@ -138,8 +138,8 @@ extern struct rtas_t rtas;
 extern unsigned long klimit;
 extern struct lmb lmb;
 
-#define MAX_PHB 16 * 3	// 16 Towers * 3 PHBs/tower
-struct _of_tce_table of_tce_table[MAX_PHB + 1] = {{0, 0, 0}};
+#define MAX_PHB 24 * 6	/* 24 drawers * 6 PHBs/drawer */
+struct _of_tce_table of_tce_table[MAX_PHB + 1];
 
 char *bootpath = 0;
 char *bootdevice = 0;
@@ -848,20 +848,21 @@ prom_initialize_tce_table(void)
 			minsize = 4UL << 20;
 		}
 
-		/* Even though we read what OF wants, we just set the table
+		/*
+		 * Even though we read what OF wants, we just set the table
 		 * size to 4 MB.  This is enough to map 2GB of PCI DMA space.
 		 * By doing this, we avoid the pitfalls of trying to DMA to
 		 * MMIO space and the DMA alias hole.
-		 */
-		/*
+		 *
 		 * On POWER4, firmware sets the TCE region by assuming
 		 * each TCE table is 8MB. Using this memory for anything
 		 * else will impact performance, so we always allocate 8MB.
 		 * Anton
-		 *
-		 * XXX FIXME use a cpu feature here
 		 */
-		minsize = 8UL << 20;
+		if (__is_processor(PV_POWER4) || __is_processor(PV_POWER4p))
+			minsize = 8UL << 20;
+		else
+			minsize = 4UL << 20;
 
 		/* Align to the greater of the align or size */
 		align = max(minalign, minsize);
diff -puN arch/ppc64/kernel/pSeries_iommu.c~power3_pci arch/ppc64/kernel/pSeries_iommu.c
--- forakpm/arch/ppc64/kernel/pSeries_iommu.c~power3_pci	2004-03-15 01:36:23.063897211 +1100
+++ forakpm-anton/arch/ppc64/kernel/pSeries_iommu.c	2004-03-15 01:41:39.493396173 +1100
@@ -104,6 +104,17 @@ static void iommu_buses_init(void)
 	struct device_node *dn, *first_dn;
 	int num_slots, num_slots_ilog2;
 	int first_phb = 1;
+	unsigned long tcetable_ilog2;
+
+	/*
+	 * We default to a TCE table that maps 2GB (4MB table, 22 bits),
+	 * however some machines have a 3GB IO hole and for these we
+	 * create a table that maps 1GB (2MB table, 21 bits)
+	 */
+	if (io_hole_start < 0x80000000UL)
+		tcetable_ilog2 = 21;
+	else
+		tcetable_ilog2 = 22;
 
 	/* XXX Should we be using pci_root_buses instead?  -ojn
 	 */
@@ -119,7 +130,7 @@ static void iommu_buses_init(void)
 		if ((1<dma_window_size = 1 << (22 - num_slots_ilog2);
+		phb->dma_window_size = 1 << (tcetable_ilog2 - num_slots_ilog2);
 
 		/* Reserve 16MB of DMA space on the first PHB.
 		 * We should probably be more careful and use firmware props.
_

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From anton at samba.org Mon Mar 15 02:11:31 2004
From: anton at samba.org (Anton Blanchard)
Date: Mon, 15 Mar 2004 02:11:31 +1100
Subject: pSeries iommu=off
Message-ID: <20040314151131.GF19737@krispykreme>

Hi,

Here is a patch to implement iommu=off on pSeries similar to how we do
it on g5. I moved the cmdline parsing into a common early_cmdline_parse
function and also made the memory limiting a little more robust.

It should get some g5 testing to prove I didn't break things before
merging.
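
The option handling that this patch moves into early_cmdline_parse can be sketched in user-space C as follows. The enum and the function name are hypothetical, and the sketch leaves out the RELOC() wrappers that the real early boot code needs; it is only meant to show the matching logic.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
enum iommu_opt { IOMMU_DEFAULT, IOMMU_OFF, IOMMU_FORCE };

/* Find "iommu=" on the command line and classify the value after it. */
static enum iommu_opt parse_iommu_opt(const char *cmd_line)
{
	const char *opt = strstr(cmd_line, "iommu=");

	if (!opt)
		return IOMMU_DEFAULT;

	opt += strlen("iommu=");
	while (*opt == ' ')		/* skip blanks after the '=' */
		opt++;

	if (!strncmp(opt, "off", 3))
		return IOMMU_OFF;
	if (!strncmp(opt, "force", 5))
		return IOMMU_FORCE;
	return IOMMU_DEFAULT;
}
```

In the patch the two outcomes set ppc64_iommu_off and iommu_force_on respectively, and pSeries_init_early() then chooses between pci_dma_init_direct() and tce_init_pSeries().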

Anton

===== arch/ppc64/kernel/chrp_setup.c 1.58 vs edited =====
--- 1.58/arch/ppc64/kernel/chrp_setup.c	Tue Mar 2 00:40:28 2004
+++ edited/arch/ppc64/kernel/chrp_setup.c	Wed Mar 10 15:33:44 2004
@@ -205,15 +205,17 @@
 	fwnmi_active = 1;
 }
 
-
 /* Early initialization.  Relocation is on but do not reference unbolted pages */
 void __init pSeries_init_early(void)
 {
-#ifdef CONFIG_PPC_PSERIES	/* This ifdef should go away */
 	void *comport;
 
 	hpte_init_pSeries();
-	tce_init_pSeries();
+
+	if (ppc64_iommu_off)
+		pci_dma_init_direct();
+	else
+		tce_init_pSeries();
 
 #ifdef CONFIG_SMP
 	smp_init_pSeries();
@@ -226,7 +228,6 @@
 	ppc_md.udbg_putc = udbg_putc;
 	ppc_md.udbg_getc = udbg_getc;
 	ppc_md.udbg_getc_poll = udbg_getc_poll;
-#endif
 }
 
 void __init
===== arch/ppc64/kernel/pSeries_pci.c 1.38 vs edited =====
--- 1.38/arch/ppc64/kernel/pSeries_pci.c	Mon Mar 1 13:24:57 2004
+++ edited/arch/ppc64/kernel/pSeries_pci.c	Wed Mar 10 15:33:33 2004
@@ -699,7 +699,8 @@
 	phbs_fixup_io();
 	chrp_request_regions();
 	pci_fix_bus_sysdata();
-	iommu_setup_pSeries();
+	if (!ppc64_iommu_off)
+		iommu_setup_pSeries();
 }
 
 /***********************************************************************
===== arch/ppc64/kernel/prom.c 1.68 vs edited =====
--- 1.68/arch/ppc64/kernel/prom.c	Mon Mar 1 13:24:58 2004
+++ edited/arch/ppc64/kernel/prom.c	Wed Mar 10 17:19:53 2004
@@ -295,7 +295,6 @@
 	prom_print(RELOC("\n"));
 }
 
-
 static unsigned long
 prom_initialize_naca(unsigned long mem)
 {
@@ -311,7 +310,7 @@
 #ifdef DEBUG_PROM
 	prom_print(RELOC("prom_initialize_naca: start...\n"));
 #endif
-	
+
 	_naca->pftSize = 0;	/* ilog2 of htab size.  computed below. */
 
 	for (node = 0; prom_next_node(&node); ) {
@@ -516,25 +515,14 @@
 	return mem;
 }
 
-#ifdef CONFIG_PMAC_DART
-static int dart_force_on;
-#endif
+static int iommu_force_on;
+int ppc64_iommu_off;
 
-static unsigned long __init
-prom_initialize_lmb(unsigned long mem)
+static void early_cmdline_parse(void)
 {
-	phandle node;
-	char type[64];
-	unsigned long i, offset = reloc_offset();
-	struct prom_t *_prom = PTRRELOC(&prom);
-	struct systemcfg *_systemcfg = RELOC(systemcfg);
-	union lmb_reg_property reg;
-	unsigned long lmb_base, lmb_size;
-	unsigned long num_regs, bytes_per_reg = (_prom->encode_phys_size*2)/8;
-	int nodart = 0;
-
-#ifdef CONFIG_PMAC_DART
+	unsigned long offset = reloc_offset();
 	char *opt;
+	struct systemcfg *_systemcfg = RELOC(systemcfg);
 
 	opt = strstr(RELOC(cmd_line), RELOC("iommu="));
 	if (opt) {
@@ -545,16 +533,30 @@
 		while (*opt && *opt == ' ')
 			opt++;
 		if (!strncmp(opt, RELOC("off"), 3))
-			nodart = 1;
+			RELOC(ppc64_iommu_off) = 1;
 		else if (!strncmp(opt, RELOC("force"), 5))
-			RELOC(dart_force_on) = 1;
+			RELOC(iommu_force_on) = 1;
 	}
-#else
-	nodart = 1;
-#endif /* CONFIG_PMAC_DART */
 
-	if (nodart)
+#ifndef CONFIG_PMAC_DART
+	if (_systemcfg->platform == PLATFORM_POWERMAC) {
+		RELOC(ppc64_iommu_off) = 1;
 		prom_print(RELOC("DART disabled on PowerMac !\n"));
+	}
+#endif
+}
+
+static unsigned long __init
+prom_initialize_lmb(unsigned long mem)
+{
+	phandle node;
+	char type[64];
+	unsigned long i, offset = reloc_offset();
+	struct prom_t *_prom = PTRRELOC(&prom);
+	struct systemcfg *_systemcfg = RELOC(systemcfg);
+	union lmb_reg_property reg;
+	unsigned long lmb_base, lmb_size;
+	unsigned long num_regs, bytes_per_reg = (_prom->encode_phys_size*2)/8;
 
 	lmb_init();
 
@@ -580,11 +582,6 @@
 			lmb_base = ((unsigned long)reg.addrPM[i].address_hi) << 32;
 			lmb_base |= (unsigned long)reg.addrPM[i].address_lo;
 			lmb_size = reg.addrPM[i].size;
-			if (nodart && lmb_base > 0x80000000ull) {
-				prom_print(RELOC("Skipping memory above 2Gb for "
-						 "now, DART support disabled\n"));
-				continue;
- } } else if (_prom->encode_phys_size == 32) { lmb_base = reg.addr32[i].address; lmb_size = reg.addr32[i].size; @@ -593,7 +590,16 @@ lmb_size = reg.addr64[i].size; } - if ( lmb_add(lmb_base, lmb_size) < 0 ) + /* We limit memory to 2GB if the IOMMU is off */ + if (RELOC(ppc64_iommu_off)) { + if (lmb_base >= 0x80000000UL) + continue; + + if ((lmb_base + lmb_size) > 0x80000000UL) + lmb_size = 0x80000000UL - lmb_base; + } + + if (lmb_add(lmb_base, lmb_size) < 0) prom_print(RELOC("Too many LMB's, discarding this one...\n")); } @@ -788,7 +794,6 @@ } #endif /* CONFIG_PMAC_DART */ - void prom_initialize_tce_table(void) { @@ -802,6 +807,9 @@ struct _of_tce_table *prom_tce_table = RELOC(of_tce_table); unsigned long tce_entry, *tce_entryp; + if (RELOC(ppc64_iommu_off)) + return; + #ifdef DEBUG_PROM prom_print(RELOC("starting prom_initialize_tce_table\n")); #endif @@ -1563,6 +1571,8 @@ if (p != NULL && p[0] != 0) strlcpy(RELOC(cmd_line), p, sizeof(cmd_line)); } + + early_cmdline_parse(); mem = prom_initialize_lmb(mem); ===== include/asm-ppc64/iommu.h 1.2 vs edited ===== --- 1.2/include/asm-ppc64/iommu.h Thu Mar 4 00:26:24 2004 +++ edited/include/asm-ppc64/iommu.h Wed Mar 10 15:37:25 2004 @@ -152,4 +152,6 @@ extern void pci_iommu_init(void); extern void pci_dma_init_direct(void); +extern int ppc64_iommu_off; + #endif ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From hollisb at us.ibm.com Tue Mar 16 05:47:12 2004 From: hollisb at us.ibm.com (Hollis Blanchard) Date: Mon, 15 Mar 2004 12:47:12 -0600 Subject: [PATCH] (2.6) improvements to os-term on panic In-Reply-To: <404FACEC.5020805@austin.ibm.com> References: <404FACEC.5020805@austin.ibm.com> Message-ID: <2C0A387E-76B1-11D8-894A-000A95A0560C@us.ibm.com> On Mar 10, 2004, at 6:03 PM, Mike Strosaker wrote: > > This patch handles the RTAS_BUSY value that ibm,os-term can return, > and also passes the panic string to the os-term call so that it can be > stored in a platform error log. 
Hey Mike, how would I access the platform error log after that? I recently had a problem where the panic message never appeared on console, but I guess the "ibm,os-term" call caused the partition to reboot instantly (i.e. no waiting for 180 seconds or whatever). So the result was boot kernel -> instant reboot, without any error message. -- Hollis Blanchard IBM Linux Technology Center ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From moilanen at austin.ibm.com Tue Mar 16 09:00:27 2004 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Mon, 15 Mar 2004 16:00:27 -0600 Subject: [PATCH] pci_dev to device_node fix In-Reply-To: <1078780957.14366.214.camel@gaston> References: <1078437665.10402.120.camel@DYN279927END.austin.ibm.com> <1078444595.5703.31.camel@gaston> <1078494500.10402.131.camel@DYN279927END.austin.ibm.com> <1078529854.5704.106.camel@gaston> <1078768865.10402.252.camel@DYN279927END.austin.ibm.com> <1078780957.14366.214.camel@gaston> Message-ID: <1079388026.10402.2953.camel@DYN279927END.austin.ibm.com> > Anyway, I still don't see the problem. The PHB proper is not part of > the PCI bus, or if it is, it will respond to a devfn normally. Let me see if I can reiterate the problem a little clearer. On a normal pSeries box, they will have a "real" PHB. When trying to read the PHB's config space, the actual RTAS call would use devfn 0 for the PHB. The config cycles are done from the PHB. So the assumption that the PHB will have devfn 0 holds. On a JS20, the PHB is a U3. This is not a real PHB in terms of how we think of it. The config cycles are done directly from the bridges. So when we use devfn 0, we really want the first bridge. So what I am trying to say, is that it is possible/legal for a device to have a devfn of 0. We can not make the assumption that the PHB will be the one using devfn 0. Since there is this overlap, we need another variable to tell us if the device is a PHB or not to differentiate.
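To make the overlap concrete, here is a rough sketch (illustrative only; struct node, is_phb and find_node() are hypothetical names, not the kernel's pci_dev or device_node code). Two entries share bus 0, devfn 0, and only the extra flag tells them apart:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a device tree node. */
struct node {
	unsigned int busno;
	unsigned int devfn;
	int is_phb;	/* assumed extra flag: nonzero if this is the host bridge */
};

/*
 * Find the node for (busno, devfn). Because a PHB and an ordinary device
 * may both sit at devfn 0, the caller also says which kind it wants.
 */
static const struct node *find_node(const struct node *nodes, size_t n,
				    unsigned int busno, unsigned int devfn,
				    int want_phb)
{
	size_t i;

	assert(nodes != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (nodes[i].busno == busno && nodes[i].devfn == devfn &&
		    !nodes[i].is_phb == !want_phb)
			return &nodes[i];
	}
	return NULL;
}
```

A lookup keyed on (busno, devfn) alone would return whichever of the two devfn 0 entries happens to come first.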
Thanks, Jake ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From moilanen at austin.ibm.com Tue Mar 16 09:35:16 2004 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Mon, 15 Mar 2004 16:35:16 -0600 Subject: _IO_IS_ISA question Message-ID: <1079390116.10402.2981.camel@DYN279927END.austin.ibm.com> In eeh.h, on all of our eeh_in/outb calls there is a check to see if the port is under 64k, and if it is we assume it's an ISA address. Isn't it legal for a port to be under 64k and be in PCI space? For instance: static inline u8 eeh_inb(unsigned long port) { u8 val; if (_IO_IS_ISA(port) && !_IO_HAS_ISA_BUS) return ~0; val = in_8((u8 *)(port+pci_io_base)); if (!_IO_IS_ISA(port) && EEH_POSSIBLE_IO_ERROR(val, u8)) return eeh_check_failure((void*)(port), val); return val; } Why do we have: if (_IO_IS_ISA(port) && !_IO_HAS_ISA_BUS) return ~0; Thanks, Jake ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From strosake at austin.ibm.com Tue Mar 16 10:33:36 2004 From: strosake at austin.ibm.com (Mike Strosaker) Date: Mon, 15 Mar 2004 17:33:36 -0600 Subject: [PATCH] (2.6) improvements to os-term on panic In-Reply-To: <2C0A387E-76B1-11D8-894A-000A95A0560C@us.ibm.com> References: <404FACEC.5020805@austin.ibm.com> <2C0A387E-76B1-11D8-894A-000A95A0560C@us.ibm.com> Message-ID: <40563D50.3090002@austin.ibm.com> Hollis Blanchard wrote: > > Hey Mike, how would I access the platform error log after that? I > recently had a problem where the panic message never appeared on > console, but I guess the "ibm,os-term" call caused the partition to > reboot instantly (i.e. no waiting for 180 seconds or whatever). So the > result was boot kernel -> instant reboot, without any error message. > The behavior after an ibm,os-term RTAS call is platform dependent. The panic message is printed to the op panel on HMC connected systems. Systems equipped with a service processor will call home the error if the SP is properly configured. 
The message may also be stored to NVRAM, but I'm not certain what platforms do that, or where it may be stored. I'm looking into that now. Since your partition rebooted immediately, I suspect that the RTAS call returned a "hardware busy, try again" condition, which my initial patch failed to handle. The patch that was attached to the initial message in this thread should prevent that behavior from happening again. Thanks, Mike ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From benh at kernel.crashing.org Tue Mar 16 10:37:44 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 16 Mar 2004 10:37:44 +1100 Subject: [PATCH] pci_dev to device_node fix In-Reply-To: <1079388026.10402.2953.camel@DYN279927END.austin.ibm.com> References: <1078437665.10402.120.camel@DYN279927END.austin.ibm.com> <1078444595.5703.31.camel@gaston> <1078494500.10402.131.camel@DYN279927END.austin.ibm.com> <1078529854.5704.106.camel@gaston> <1078768865.10402.252.camel@DYN279927END.austin.ibm.com> <1078780957.14366.214.camel@gaston> <1079388026.10402.2953.camel@DYN279927END.austin.ibm.com> Message-ID: <1079393863.1968.186.camel@gaston> On Tue, 2004-03-16 at 09:00, Jake Moilanen wrote: > > Anyway, I still don't see the problem. The PHB proper is not part of > > the PCI bus, or if it is, it will respond to a devfn normally. > > Let me see if I can reiterate the problem a little clearer. On a normal > pSeries box, they will have a "real" PHB. When trying to read the PHB's > config space, the actual RTAS call would use devfn 0 for the PHB. The > config cycles are done from the PHB. So the assumption that the PHB > will have devfn 0 holds. Well, as I told you, a PHB may ... or may not have a config space, and that config space may or may not be at devfn 0. For example, on Macs, the PHB usually has a non-0 devfn. (devnum 10 iirc) > On a JS20, the PHB is a U3. This is not a real PHB in terms of how we > think of it. 
The config cycles are done directly from the bridges. So > when we use devfn 0, we really want the first bridge. Not exactly. What I did on pmac was just to have each bridge visible, just no PHB, there is absolutely no _need_ to have the PHB itself be visible as a PCI device. I don't see why we would get the first HT<->PCI bridge on devfn 0. They have their own devnums (1 to 7 on pmacs). > So what I am trying to say, is that it is possible/legal for a device to > have a devfn of 0. We can not make the assumption that the PHB will be > the one using devfn 0. Since there is this overlap, we need a way need > another variable to tell us if the device is a PHB or not to > differentiate. > > Thanks, > Jake -- Benjamin Herrenschmidt ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From benh at kernel.crashing.org Tue Mar 16 17:50:59 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 16 Mar 2004 17:50:59 +1100 Subject: _IO_IS_ISA question In-Reply-To: <1079390116.10402.2981.camel@DYN279927END.austin.ibm.com> References: <1079390116.10402.2981.camel@DYN279927END.austin.ibm.com> Message-ID: <1079419858.1967.237.camel@gaston> On Tue, 2004-03-16 at 09:35, Jake Moilanen wrote: > In eeh.h, on all of our eeh_in/outb calls there is a check to see if the > port is under 64k, and if it is we assume it's an ISA address. Isn't it > legal for a port to be under 64k and be in PCI space? Yes. The "ISA" IO space is just a subset of the PCI space. If this is not the case, then the code is bogus. > For instance: > > static inline u8 eeh_inb(unsigned long port) { > u8 val; > if (_IO_IS_ISA(port) && !_IO_HAS_ISA_BUS) > return ~0; > val = in_8((u8 *)(port+pci_io_base)); > if (!_IO_IS_ISA(port) && EEH_POSSIBLE_IO_ERROR(val, u8)) > return eeh_check_failure((void*)(port), val); > return val; > } > > Why do we have: > > if (_IO_IS_ISA(port) && !_IO_HAS_ISA_BUS) > return ~0; It's a bug. Ben. ** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/ From anton at samba.org Tue Mar 16 18:06:50 2004 From: anton at samba.org (Anton Blanchard) Date: Tue, 16 Mar 2004 18:06:50 +1100 Subject: [PATCH] ignore huge OF properties Message-ID: <20040316070649.GP19737@krispykreme> Hi, Im just about to commit this patch. We have some versions of firmware out there that have huge OF properties. So huge that we end up overwriting our initrd. Place a 1MB limit and warn bitterly if its over this. Also fix a use of package-to-path where the variable was 64bytes but we would pass in a length of 255. Anton forakpm-anton/arch/ppc64/kernel/prom.c | 36 ++++++++++++++++++++++++++++++--- 1 files changed, 33 insertions(+), 3 deletions(-) diff -puN arch/ppc64/kernel/prom.c~ppc64-large_properties arch/ppc64/kernel/prom.c --- forakpm/arch/ppc64/kernel/prom.c~ppc64-large_properties 2004-03-16 14:51:57.969341778 +1100 +++ forakpm-anton/arch/ppc64/kernel/prom.c 2004-03-16 16:58:57.881228283 +1100 @@ -61,6 +61,14 @@ extern const struct linux_logo logo_linu #endif /* + * Properties whose value is longer than this get excluded from our + * copy of the device tree. This value does need to be big enough to + * ensure that we don't lose things like the interrupt-map property + * on a PCI-PCI bridge. + */ +#define MAX_PROPERTY_LENGTH (1UL * 1024 * 1024) + +/* * prom_init() is called very early on, before the kernel text * and data have been mapped to KERNELBASE. 
At this point the code * is running at whatever address it has been loaded at, so @@ -908,9 +916,11 @@ prom_initialize_tce_table(void) *tce_entryp = tce_entry; } + /* It seems OF doesn't null-terminate the path :-( */ + memset(path, 0, sizeof(path)); /* Call OF to setup the TCE hardware */ if (call_prom(RELOC("package-to-path"), 3, 1, node, - path, 255) <= 0) { + path, sizeof(path)-1) <= 0) { prom_print(RELOC("package-to-path failed\n")); } else { prom_print(RELOC("opened ")); @@ -1687,6 +1697,11 @@ check_display(unsigned long mem) /* It seems OF doesn't null-terminate the path :-( */ path = (char *) mem; memset(path, 0, 256); + + /* + * leave some room at the end of the path for appending extra + * arguments + */ if ((long) call_prom(RELOC("package-to-path"), 3, 1, node, path, 250) < 0) continue; @@ -1794,8 +1809,7 @@ copy_device_tree(unsigned long mem_start return new_start; } -__init -static unsigned long +static unsigned long __init inspect_node(phandle node, struct device_node *dad, unsigned long mem_start, unsigned long mem_end, struct device_node ***allnextpp) @@ -1843,6 +1857,22 @@ inspect_node(phandle node, struct device valp, mem_end - mem_start); if (pp->length < 0) continue; + if (pp->length > MAX_PROPERTY_LENGTH) { + char path[128]; + + prom_print(RELOC("WARNING: ignoring large property ")); + /* It seems OF doesn't null-terminate the path :-( */ + memset(path, 0, sizeof(path)); + if (call_prom(RELOC("package-to-path"), 3, 1, node, + path, sizeof(path)-1) > 0) + prom_print(path); + prom_print(namep); + prom_print(RELOC(" length 0x")); + prom_print_hex(pp->length); + prom_print_nl(); + + continue; + } mem_start = DOUBLEWORD_ALIGN(mem_start + pp->length); *prev_propp = PTRUNRELOC(pp); prev_propp = &pp->next; ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From linas at austin.ibm.com Wed Mar 17 03:54:59 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Tue, 16 Mar 2004 10:54:59 -0600 Subject: [bryan@staidm.org: [PATCH] ppc32 copy_to_user dcbt fixup] Message-ID: <20040316105459.A33924@forte.austin.ibm.com> So, Does the problem described below affect the ppc64 cpu's? --linas ----- Forwarded message from Bryan Rittmeyer ----- Date: Fri, 12 Mar 2004 20:15:47 -0800 From: Bryan Rittmeyer To: linux-kernel at vger.kernel.org Cc: linuxppc-dev list , Paul Mackerras , Benjamin Herrenschmidt Subject: [PATCH] ppc32 copy_to_user dcbt fixup X-Loop: linuxppc-dev at lists.linuxppc.org copy_tofrom_user and copy_page use dcbt to prefetch source data [1]. Since at least 2.4.17, these functions have been prefetching beyond the end of the source buffer, leading to two problems: 1. Subtly broken software cache coherency. If the area following src was invalidate_dcache_range'd prior to submitting for DMA, an out-of-bounds dcbt from copy_to_user of a separate slab object may read in the area before DMA completion. When the DMA does complete, data will not be loaded from RAM because stale data is already in cache. Thus you get a corrupt network packet, bogus audio capture, etc. This problem probably does not affect hardware coherent systems (all Apple machines?). However: 2. The extra 'dcbt' wastes bus bandwidth. Worst case: on a 128 byte copy, we currently dcbt 256 bytes. These extra loads trash cache, potentially causing writeback of more useful data. The attached patch attempts to reign in dcbt prefetching at the end of copies such that we do not read beyond the src area. This change fixes DMA data corruption on software coherent systems and improves performance slightly in my lame microbenchmark [2]. [1] csum_partial_copy_generic does not use dcbt/dcbz despite being scorching hot in TCP workloads. I'm cooking up another patch to dcb?ize it. 
[2] http://staidm.org/linux/ppc/copy_dcbt/copyuser-microbench.tar.bz2 Comments? -Bryan ----- End forwarded message ----- ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olof at austin.ibm.com Wed Mar 17 04:06:05 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Tue, 16 Mar 2004 11:06:05 -0600 Subject: [bryan@staidm.org: [PATCH] ppc32 copy_to_user dcbt fixup] In-Reply-To: <20040316105459.A33924@forte.austin.ibm.com> References: <20040316105459.A33924@forte.austin.ibm.com> Message-ID: <405733FD.9010001@austin.ibm.com> linas at austin.ibm.com wrote: > So, > > Does the problem described below affect the ppc64 cpu's? (I assume you mean the ppc64 kernel, not cpus; some of our 64-bit machines can run 32-bit kernels and would be affected there :-) Besides that: No, there's no cache prefetch in the loops on ppc64. The only places where dcbt is used is to fetch the first block in memcpy and __copy_tofrom_user. -Olof -- Olof Johansson Office: 4F005/905 Linux on Power Development IBM Systems Group Email: olof at austin.ibm.com Phone: 512-838-9858 All opinions are my own and not those of IBM ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From greg at kroah.com Wed Mar 17 05:21:15 2004 From: greg at kroah.com (Greg KH) Date: Tue, 16 Mar 2004 10:21:15 -0800 Subject: eeh patch Message-ID: <20040316182115.GB19290@kroah.com> > In an attempt to ease the number of EEH events some people have > seen, it was decided to add the ability to hot-unplug certain > devices when they received an eeh event instead of calling panic(). > > This is done via an user-space daemon that does a read on a /proc > file that blocks in the kernel until an eeh event occurs. The > kernel then returns the device name to the daemon who handles > the actual unplugging of the device. > > The (2.6) patch and user-space daemon code have been tested, so I am > hoping to get some feedback before pushing this on to Ameslab. 
How about "what the hell are you doing?" as feedback. :) What's wrong with sysfs and the hotplug interface for doing this kind of stuff? We need another user-space daemon reading from a /proc file like we need another implementation of devfs... Please, stop reinventing the wheel and use the subsystems and interfaces that are already present in the kernel to do this kind of stuff. thanks, greg k-h ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olof at austin.ibm.com Wed Mar 17 06:58:04 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Tue, 16 Mar 2004 13:58:04 -0600 Subject: eeh patch In-Reply-To: <1079371123.8840.23.camel@mudbug.austin.ibm.com> References: <1079371123.8840.23.camel@mudbug.austin.ibm.com> Message-ID: <40575C4C.6090905@austin.ibm.com> Nathan Fontenot wrote: > The (2.6) patch and user-space daemon code have been tested, so I am > hoping to get some feedback before pushing this on to Ameslab. See comments/questions below. > diff -Nru a/arch/ppc64/kernel/eeh.c b/arch/ppc64/kernel/eeh.c > --- a/arch/ppc64/kernel/eeh.c Mon Mar 15 09:41:21 2004 > +++ b/arch/ppc64/kernel/eeh.c Mon Mar 15 09:41:21 2004 > @@ -50,6 +50,13 @@ > static char *eeh_opts; > static int eeh_opts_last; > > +static spinlock_t slot_errbuf_lock = SPIN_LOCK_UNLOCKED; > +static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX]; > + > +DECLARE_WAIT_QUEUE_HEAD(eeh_hotplug_wait); > +static unsigned char eeh_hotplug_buf[128]; /* should be long enough */ Maybe use a define plus snprintf with the limit, just to play it safe and be a good kernel citizen? We've seen weird/overly large data from OF in other areas lately. > + if (strcmp(dn->name, "ethernet") == 0) { > + dn->eeh_mode |= EEH_MODE_NOCHECK; > + sprintf(eeh_hotplug_buf, "%s", dn->full_name); Why is ethernet treated separately here? Is it the only device type we support hotplugging at the moment?
> + > +static int proc_eeh_hotplug_show(struct seq_file *m, void *v) > +{ > + int rc; > + > + rc = wait_event_interruptible(eeh_hotplug_wait, eeh_hotplug_event); > + if (rc) > + return 0; > + > + seq_printf(m, "%s", eeh_hotplug_buf); > + memset(eeh_hotplug_buf, 0, 128); > + eeh_hotplug_event = 0; > + > + return 0; > +} > + > +static int proc_eeh_hotplug_open(struct inode *inode, struct file *file) > +{ > + return single_open(file, proc_eeh_hotplug_show, NULL); > +} Besides Greg's opinions about the userspace daemon design, there's one drawback with this: doing a simple cat of the proc file in question will hang. It might not be a desired default behaviour, since it'll surprise people. Especially since permissions are 444 on the file. -Olof -- Olof Johansson Office: 4F005/905 Linux on Power Development IBM Systems Group Email: olof at austin.ibm.com Phone: 512-838-9858 All opinions are my own and not those of IBM ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From greg at kroah.com Wed Mar 17 08:39:08 2004 From: greg at kroah.com (Greg KH) Date: Tue, 16 Mar 2004 13:39:08 -0800 Subject: eeh patch In-Reply-To: <20040316182115.GB19290@kroah.com> References: <20040316182115.GB19290@kroah.com> Message-ID: <20040316213908.GA29738@kroah.com> On Tue, Mar 16, 2004 at 10:21:15AM -0800, Greg KH wrote: > > In an attempt to ease the number of EEH events some people have > > seen, it was decided to add the ability to hot-unplug certain > > devices when they received an eeh event instead of calling panic(). > > > > This is done via an user-space daemon that does a read on a /proc > > file that blocks in the kernel until an eeh event occurs. The > > kernel then returns the device name to the daemon who handles > > the actual unplugging of the device. > > > > The (2.6) patch and user-space daemon code have been tested, so I am > > hoping to get some feedback before pushing this on to Ameslab. > > How about "what the hell are you doing?" as feedback. 
:) Ok, sorry if anyone got offended by this comment, I didn't mean to be so snippy. I apologize for this. But my main point stands. If you are going to try to invent a new way to talk to the kernel from userspace, please take about 10 deep breaths and back away from the keyboard. Then go see how the kernel does things today and please please please use that interface instead. This is only about the 240th time this same topic has come up in the past on any one of a zillion different kernel mailing lists (linux-kernel, linux-hotplug-devel, etc.) Feel free to read the archives of those lists to see what Linus thinks of interfaces like you just created, and what I think of /proc. So again, if you are going to be writing kernel code, please do your homework first before adding new stuff. And if you think you do want to create a new interface, please, bring it up in linux-kernel where it belongs. There is no reason the ppc64 tree should be doing things on their own for such an important interaction. We should spread the goodness to all other arches too. Again, sorry for any hurt feelings that my post might have caused, I was only critiquing the code that was published. greg k-h ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From nfont at austin.ibm.com Wed Mar 17 08:40:12 2004 From: nfont at austin.ibm.com (Nathan Fontenot) Date: Tue, 16 Mar 2004 15:40:12 -0600 Subject: eeh patch In-Reply-To: <20040316182115.GB19290@kroah.com> References: <20040316182115.GB19290@kroah.com> Message-ID: <1079473212.8840.59.camel@mudbug.austin.ibm.com> On Tue, 2004-03-16 at 12:21, Greg KH wrote: > How about "what the hell are you doing?" as feedback. :) > > What's wrong with sysfs and the hotplug interface for doing this kind of > stuff? We need another user-space daemon reading from a /proc file like > we need another implementation of devfs... 
> > Please, stop reinventing the wheel and use the subsytems and interfaces > that are already present in the kernel to do this kind of stuff. > > thanks, > > greg k-h Yes, I agree that containing all of this behavior in the kernel would be the best way to go. There's nothing wrong with the hotplug interfaces, I just wasn't able to find anything that worked for me. Any help you could provide as to which interfaces are the proper ones to use to hotplug/unplug devices from the kernel would be very helpful. thanks. -Nathan F. -- ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From greg at kroah.com Wed Mar 17 08:49:32 2004 From: greg at kroah.com (Greg KH) Date: Tue, 16 Mar 2004 13:49:32 -0800 Subject: eeh patch In-Reply-To: <1079473212.8840.59.camel@mudbug.austin.ibm.com> References: <20040316182115.GB19290@kroah.com> <1079473212.8840.59.camel@mudbug.austin.ibm.com> Message-ID: <20040316214932.GA30202@kroah.com> On Tue, Mar 16, 2004 at 03:40:12PM -0600, Nathan Fontenot wrote: > On Tue, 2004-03-16 at 12:21, Greg KH wrote: > > How about "what the hell are you doing?" as feedback. :) > > > > What's wrong with sysfs and the hotplug interface for doing this kind of > > stuff? We need another user-space daemon reading from a /proc file like > > we need another implementation of devfs... > > > > Please, stop reinventing the wheel and use the subsytems and interfaces > > that are already present in the kernel to do this kind of stuff. > > > > thanks, > > > > greg k-h > > Yes, I agree that containing all of this behavior in the kernel > would be the best way to go. There's nothing wrong with the > hotplug interfaces, I just wasn't able to find anything that > worked for me. Any help you could provide as to which interfaces > are the proper ones to use to hotplug/unplug devices from the > kernel would be very helpful. thanks. First off, what are you "hotplugging"? What type of devices are these? 
And why are they not covered by the current hotplug interface in the kernel? thanks, greg k-h ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From haveblue at us.ibm.com Wed Mar 17 09:00:07 2004 From: haveblue at us.ibm.com (Dave Hansen) Date: Tue, 16 Mar 2004 14:00:07 -0800 Subject: OpenFirmware devices and hotplug events In-Reply-To: <20040316214932.GA30202@kroah.com> References: <20040316182115.GB19290@kroah.com> <1079473212.8840.59.camel@mudbug.austin.ibm.com> <20040316214932.GA30202@kroah.com> Message-ID: <1079474407.20826.1088.camel@nighthawk> On Tue, 2004-03-16 at 13:49, Greg KH wrote: > First off, what are you "hotplugging"? What type of devices are these? > And why are they not covered by the current hotplug interface in the > kernel? In general, do any of the devices in the OpenFirmware tree generate linux hotplug events? When something happens to the tree, does /sbin/hotplug get called? Greg, I think a large portion of the problem lies in the current way that the device tree is exported to userspace. Take one look at /proc/device-tree, and I think you'll see what I mean. -- dave ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From greg at kroah.com Wed Mar 17 09:09:28 2004 From: greg at kroah.com (Greg KH) Date: Tue, 16 Mar 2004 14:09:28 -0800 Subject: OpenFirmware devices and hotplug events In-Reply-To: <1079474407.20826.1088.camel@nighthawk> References: <20040316182115.GB19290@kroah.com> <1079473212.8840.59.camel@mudbug.austin.ibm.com> <20040316214932.GA30202@kroah.com> <1079474407.20826.1088.camel@nighthawk> Message-ID: <20040316220928.GA30672@kroah.com> On Tue, Mar 16, 2004 at 02:00:07PM -0800, Dave Hansen wrote: > On Tue, 2004-03-16 at 13:49, Greg KH wrote: > > First off, what are you "hotplugging"? What type of devices are these? > > And why are they not covered by the current hotplug interface in the > > kernel? 
> > In general, do any of the devices in the OpenFirmware tree generate > linux hotplug events? When something happens to the tree, does > /sbin/hotplug get called? If that tree is in sysfs, yes. > Greg, I think a large portion of the problem lies in the current way > that the device tree is exported to userspace. Take one look at > /proc/device-tree, and I think you'll see what I mean. Eeek! Hm, I thought this tree was moved under /sys/firmware which is where it rightly belongs. If that is done, then you get all the hotplug events you could ever ask for, for free :) Is anyone working on moving this tree to sysfs? thanks, greg k-h ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From nfont at austin.ibm.com Wed Mar 17 09:19:16 2004 From: nfont at austin.ibm.com (Nathan Fontenot) Date: Tue, 16 Mar 2004 16:19:16 -0600 Subject: eeh patch In-Reply-To: <20040316213908.GA29738@kroah.com> References: <20040316182115.GB19290@kroah.com> <20040316213908.GA29738@kroah.com> Message-ID: <1079475555.8840.84.camel@mudbug.austin.ibm.com> On Tue, 2004-03-16 at 15:39, Greg KH wrote: > Ok, sorry if anyone got offended by this comment, I didn't mean to be so > snippy. I apologize for this. No offense taken. EEH always seems to be a hot topic. > > But my main point stands. If you are going to try to invent a new way > to talk to the kernel from userspace, please take about 10 deep breaths > and back away from the keyboard. Then go see how the kernel does things > today and please please please use that interface instead. > This wasn't an attempt to create a new way to talk to the kernel. I was simply following another model I had found that is currently in the ppc64 kernel. AFAIK, procfs, sysfs and syscalls are the only ways to communicate with the kernel. How would you suggest we notify a user-space application that an EEH event has occured? The goal of the code is to reduce the number of panics resulting from EEH events. 
To do this we wanted to hot-unplug network devices when they received an EEH event. I figure doing this for network devices is safe, other devices this would be a big no-no for. If there is a way to unplug a device from within the kernel please let me know? I tried to do this using some of the routines I found in the kernel, but not with any good results. > This is only about the 240th time this same topic has come up in the > past on any one of a zillion different kernel mailing lists > (linux-kernel, linux-hotplug-devel, etc.) Feel free to read the > archives of those lists to see what Linus thinks of interfaces like you > just created, and what I think of /proc. So again, if you are going to > be writing kernel code, please do your homework first before adding new > stuff. > > And if you think you do want to create a new interface, please, bring it > up in linux-kernel where it belongs. There is no reason the ppc64 tree > should be doing things on their own for such an important interaction. > We should spread the goodness to all other arches too. > I brought this up here because this deals with EEH, something that is definitely ppc64 specific. No reason to inflict EEH on all the other arches :) > Again, sorry for any hurt feelings that my post might have caused, I was > only critiquing the code that was published. > > greg k-h -- ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Wed Mar 17 09:25:44 2004 From: anton at samba.org (Anton Blanchard) Date: Wed, 17 Mar 2004 09:25:44 +1100 Subject: OpenFirmware devices and hotplug events In-Reply-To: <20040316220928.GA30672@kroah.com> References: <20040316182115.GB19290@kroah.com> <1079473212.8840.59.camel@mudbug.austin.ibm.com> <20040316214932.GA30202@kroah.com> <1079474407.20826.1088.camel@nighthawk> <20040316220928.GA30672@kroah.com> Message-ID: <20040316222544.GW19737@krispykreme> > Eeek! 
> Hm, I thought this tree was moved under /sys/firmware which is
> where it rightly belongs. If that is done, then you get all the hotplug
> events you could ever ask for, for free :)

Basically we want a mechanism to receive EEH errors in userspace and
decide whether we can recover from them by hotplugging the device.

EEH errors are PCI errors. eg we get a target abort on a slot, the
hardware locks the slot out and we get notification. It's possible this
is a transient fault and we should reset the card (ie hotplug it out and
in). If it happens too much we want to deconfigure the card.

This policy obviously lives in userspace. How do you suggest we hook it
up?

> Is anyone working on moving this tree to sysfs?

Not at the moment. Too many things refer to it. We'll remove
/proc/device-tree the day you get rid of /proc :)

But seriously it would be nice to merge sysfs and the device-tree, we
can look at this in 2.7.

Anton

From greg at kroah.com Wed Mar 17 09:33:37 2004
From: greg at kroah.com (Greg KH)
Date: Tue, 16 Mar 2004 14:33:37 -0800
Subject: eeh patch
In-Reply-To: <1079475555.8840.84.camel@mudbug.austin.ibm.com>
References: <20040316182115.GB19290@kroah.com> <20040316213908.GA29738@kroah.com> <1079475555.8840.84.camel@mudbug.austin.ibm.com>
Message-ID: <20040316223337.GA31162@kroah.com>

On Tue, Mar 16, 2004 at 04:19:16PM -0600, Nathan Fontenot wrote:
> On Tue, 2004-03-16 at 15:39, Greg KH wrote:
> > Ok, sorry if anyone got offended by this comment, I didn't mean to be so
> > snippy. I apologize for this.
>
> No offense taken. EEH always seems to be a hot topic.

Well, other people at your employer seemed to have taken offense to my
response. I view it as a learning curve of the open source development
process for people to get used to :)

> >
> > But my main point stands.
> > If you are going to try to invent a new way
> > to talk to the kernel from userspace, please take about 10 deep breaths
> > and back away from the keyboard. Then go see how the kernel does things
> > today and please please please use that interface instead.
> >
> This wasn't an attempt to create a new way to talk to the kernel. I was
> simply following another model I had found that is currently in the
> ppc64 kernel. AFAIK, procfs, sysfs and syscalls are the only ways to
> communicate with the kernel. How would you suggest we notify a
> user-space application that an EEH event has occurred?

Um, the hotplug event mechanism? Like the way the rest of the kernel
does such a thing?

> The goal of the code is to reduce the number of panics resulting from
> EEH events. To do this we wanted to hot-unplug network devices when
> they received an EEH event. I figure doing this for network devices
> is safe, other devices this would be a big no-no for.
>
> If there is a way to unplug a device from within the kernel please let
> me know?

There's loads of ways to do this. It all depends on the type of the
device. See the USB, IEEE1394, PCI, PCMCIA, and other bus interfaces
that all handle this kind of action all the time.

But wait, if you tell the kernel to disconnect such a device, you will
generate a hotplug event automatically. Haven't you seen that in your
testing?

> I tried to do this using some of the routines I found in the kernel, but
> not with any good results.

What routines did you use?

> I brought this up here because this deals with EEH, something that is
> definitely ppc64 specific. No reason to inflict EEH on all the other
> arches :)

True, but no reason to invent a new mechanism which has explicitly been
forbidden to add to the kernel either (see the archives of lkml where
this has been discussed many times...)

This is what the hotplug interface is for, please use it to provide a
consistent interface for all types of kernel events.

thanks,

greg k-h

From greg at kroah.com Wed Mar 17 09:37:12 2004
From: greg at kroah.com (Greg KH)
Date: Tue, 16 Mar 2004 14:37:12 -0800
Subject: OpenFirmware devices and hotplug events
In-Reply-To: <20040316222544.GW19737@krispykreme>
References: <20040316182115.GB19290@kroah.com> <1079473212.8840.59.camel@mudbug.austin.ibm.com> <20040316214932.GA30202@kroah.com> <1079474407.20826.1088.camel@nighthawk> <20040316220928.GA30672@kroah.com> <20040316222544.GW19737@krispykreme>
Message-ID: <20040316223712.GB31162@kroah.com>

On Wed, Mar 17, 2004 at 09:25:44AM +1100, Anton Blanchard wrote:
>
> > Eeek! Hm, I thought this tree was moved under /sys/firmware which is
> > where it rightly belongs. If that is done, then you get all the hotplug
> > events you could ever ask for, for free :)
>
> Basically we want a mechanism to receive EEH errors in userspace and
> decide whether we can recover from them by hotplugging the device.
>
> EEH errors are PCI errors. eg we get a target abort on a slot, the
> hardware locks the slot out and we get notification. It's possible this
> is a transient fault and we should reset the card (ie hotplug it out and
> in). If it happens too much we want to deconfigure the card.
>
> This policy obviously lives in userspace. How do you suggest we hook it
> up?

As a hotplug event? That should work, right?

> > Is anyone working on moving this tree to sysfs?
>
> Not at the moment. Too many things refer to it. We'll remove
> /proc/device-tree the day you get rid of /proc :)

Heh, don't tempt me :)

Remember, /proc is only for "processes", not device stuff.

> But seriously it would be nice to merge sysfs and the device-tree, we
> can look at this in 2.7.

It looks like a portion of the device-tree is in sysfs today, with the
different busses, right? It shouldn't be that big of a leap.

thanks,

greg k-h

From ahuja at austin.ibm.com Wed Mar 17 10:38:06 2004
From: ahuja at austin.ibm.com (ahuja at austin.ibm.com)
Date: Tue, 16 Mar 2004 17:38:06 -0600 (CST)
Subject: Patch: cpu utilization monitor.
Message-ID:

This patch adds the framework required by performance team and on demand
computing. At this point only the important bits/framework are covered.

All the kobjects/calculations are yet to be written.

We are still continuing to discuss methods for performance monitoring.

Purr is a vcpu performance counter. This collects purr/tb
periodically to be used later in computations from any given
cpu without needing to acquire a cpu exclusively.

Thanks,
Manish

p.s My first patch, so go easy guys...
-------------- next part --------------
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	ChangeSet	1.1506 -> 1.1507
#	arch/ppc64/kernel/smp.c	1.68 -> 1.69
#	arch/ppc64/kernel/Makefile	1.40 -> 1.41
#	(new)	-> 1.1 arch/ppc64/kernel/profile.h
#	(new)	-> 1.1 arch/ppc64/kernel/profile.c
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 04/03/16	ahuja at threadlp13.austin.ibm.com	1.1507
# Added new functionality for performance monitoring mahuja at us.ibm.com
# --------------------------------------------
#
diff -Nru a/arch/ppc64/kernel/Makefile b/arch/ppc64/kernel/Makefile
--- a/arch/ppc64/kernel/Makefile	Tue Mar 16 16:46:34 2004
+++ b/arch/ppc64/kernel/Makefile	Tue Mar 16 16:46:34 2004
@@ -43,7 +43,7 @@
 obj-$(CONFIG_PPC_RTAS) += rtas-proc.o
 obj-$(CONFIG_SCANLOG) += scanlog.o
 obj-$(CONFIG_VIOPATH) += viopath.o
-obj-$(CONFIG_LPARCFG) += lparcfg.o
+obj-$(CONFIG_LPARCFG) += lparcfg.o profile.o
 obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o
 obj-$(CONFIG_BOOTX_TEXT) += btext.o
 
diff -Nru a/arch/ppc64/kernel/profile.c b/arch/ppc64/kernel/profile.c
--- /dev/null	Wed Dec 31 16:00:00 1969
+++ b/arch/ppc64/kernel/profile.c	Tue Mar 16 16:46:34 2004
@@ -0,0 +1,81 @@
+/*
+ * PPC64 Cpu util performance monitoring.
+ *
+ * Manish Ahuja mahuja at us.ibm.com
+ * Copyright (c) 2004 Manish Ahuja IBM CORP.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This file will also report many of the perf values for 2.6
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "profile.h"
+
+#define SAMPLE_TICK HZ
+
+DEFINE_PER_CPU(struct cpu_util_store, cpu_util_sampler);
+
+/*
+ * This is a timer handler. There is one per CPU. It gets scheduled
+ * every SAMPLE_TICK ticks.
+ */
+
+static void util_timer_func(unsigned long data)
+{
+	struct cpu_util_store * cus = &__get_cpu_var(cpu_util_sampler);
+	struct timer_list *tl = &cus->my_timer;
+
+	if (PVR_VER(systemcfg->processor) == PV_POWER5) {
+		cus->current_purr = mfspr(PURR);
+		cus->tb = mftb();
+	}
+	/*printk(KERN_INFO "PURR VAL %ld %lld %lld\n", data, cus->current_purr, cus->tb);*/
+
+	mod_timer(tl, jiffies + SAMPLE_TICK);
+}
+
+/*
+ * One time function that gets called when all the cpu's are online
+ * to start collection. It adds the timer to each cpu on the system.
+ * start_purr is collected during smp_init time in __cpu_up code
+ */
+
+static void start_util_timer(int cpu)
+{
+	struct cpu_util_store * cus = &per_cpu(cpu_util_sampler, cpu);
+	struct timer_list *tl = &cus->my_timer;
+
+	if (tl->function == NULL) {
+		init_timer(tl);
+		tl->expires = jiffies + SAMPLE_TICK;
+		tl->data = cpu;
+		tl->function = util_timer_func;
+		add_timer_on(tl, cpu);
+	}
+}
+
+int __init cpu_util_init(void)
+{
+	int cpu;
+
+	for (cpu = 0; cpu < NR_CPUS; cpu++) {
+		if (cpu_online(cpu))
+			start_util_timer(cpu);
+	}
+
+	return 0;
+}
+
+__initcall(cpu_util_init);
diff -Nru a/arch/ppc64/kernel/profile.h b/arch/ppc64/kernel/profile.h
--- /dev/null	Wed Dec 31 16:00:00 1969
+++ b/arch/ppc64/kernel/profile.h	Tue Mar 16 16:46:34 2004
@@ -0,0 +1,27 @@
+/*
+ * Copyright (c) 2004 Manish Ahuja
+ *
+ * Module name: profile.h
+ *
+ * Description:
+ *   Architecture- / platform-specific boot-time initialization code for
+ *   tracking purr utilization and other performance features in coming
+ *   releases for splpar/smt machines.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define PURR 309
+
+DECLARE_PER_CPU(struct cpu_util_store, cpu_util_sampler);
+
+struct cpu_util_store {
+	struct timer_list my_timer;
+	unsigned long long start_purr;
+	unsigned long long current_purr;
+	unsigned long long tb;
+};
+
diff -Nru a/arch/ppc64/kernel/smp.c b/arch/ppc64/kernel/smp.c
--- a/arch/ppc64/kernel/smp.c	Tue Mar 16 16:46:34 2004
+++ b/arch/ppc64/kernel/smp.c	Tue Mar 16 16:46:34 2004
@@ -52,6 +52,7 @@
 #include
 #include
 #include
+#include "profile.h"
 
 #ifdef CONFIG_KDB
 #include
@@ -967,6 +968,7 @@
 int __devinit __cpu_up(unsigned int cpu)
 {
 	int c;
+	struct cpu_util_store * cus = &per_cpu(cpu_util_sampler, cpu);
 
 	/* At boot, don't bother with non-present cpus -JSCHOPP */
 	if (!system_running && !cpu_present_at_boot(cpu))
@@ -1001,6 +1003,12 @@
 
 	/* wake up cpus */
 	smp_ops->kick_cpu(cpu);
+
+	/* Collect starting purr */
+	if (PVR_VER(systemcfg->processor) == PV_POWER5) {
+		cus->start_purr = mfspr(PURR);
+		cus->tb = mftb();
+	}
 
 	/*
 	 * wait to see if the cpu made a callin (is actually up).

From nfont at austin.ibm.com Wed Mar 17 11:40:53 2004
From: nfont at austin.ibm.com (Nathan Fontenot)
Date: Tue, 16 Mar 2004 18:40:53 -0600
Subject: more eeh
Message-ID: <1079484053.8840.99.camel@mudbug.austin.ibm.com>

Hope I don't spoil anyone's dinner with another dose of eeh :)

I have attached a patch that generates the hotplug event in the
kernel. At least it's supposed to do that. This eliminates the
need for any kind of an eeh daemon and any /proc usage (two good
things).

The code works by calling disable_slot() in the rpaphp driver code
(rpaphp_core.c) to have a pci device removed. Looking through the
kernel it appears that this is the routine that drives hotplug
remove requests that come in through sysfs.

The issue I kept running into with this code is that it would either
DSI in schedule() or the partition would quit responding when an eeh
event is injected. Unfortunately, I have not been able to reproduce
the DSI in schedule to give you any debugging info.

Any comments I could get on the code would be greatly appreciated.
Namely, is this the correct way to generate a hotplug event, or
is there another interface I should use. And, is this safe? This
causes the removal of a slot in the middle of a read on the slot.

thanks.

Nathan F.

--
-------------- next part --------------
A non-text attachment was scrubbed...
Name: eeh_khotplug.patch
Type: text/x-patch
Size: 4888 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040316/73eb434e/attachment.bin

From haveblue at us.ibm.com Wed Mar 17 12:30:03 2004
From: haveblue at us.ibm.com (Dave Hansen)
Date: Tue, 16 Mar 2004 17:30:03 -0800
Subject: Patch: cpu utilization monitor.
In-Reply-To:
References:
Message-ID: <1079487002.20826.1151.camel@nighthawk>

On Tue, 2004-03-16 at 15:38, ahuja at austin.ibm.com wrote:
> This patch adds the framework required by performance team and on demand
> computing. At this point only the important bits/framework are covered.
>
> All the kobjects/calculations are yet to be written.
>
> We are still continuing to discuss methods for performance monitoring.
>
> Purr is a vcpu performance counter. This collects purr/tb
> periodically to be used later in computations from any given
> cpu without needing to acquire a cpu exclusively.

Your code is very tidy, but I do have a few questions.

How does this complement the existing oprofile performance counter
interface which is already included in the kernel?

You probably want to have your own config symbol for these

	obj-$(CONFIG_LPARCFG) += lparcfg.o profile.o

For instance, since your code appears to only work on certain CPUs, you
might want to do this in arch/ppc64/Kconfig:

config POWER5_PROFILE...
	bool
	depends on LPARCFG && YOUR_CHIP_MODEL_HERE...

Also, do you really want to call your files profile.[ch]? Isn't it just
for specific CPU models?

Is the built-in timer functionality in the kernel accurate enough for
what you want?
Are you sure you really want a new timer, instead of just
plugging into the existing timer interrupt like ppc64_do_profile()?

Instead of having the entire functional part of a function body
indented, like in start_util_timer(), I like seeing something like this:

	if (tl->function != NULL)
		return;

It saves a lot of indenting mess, and I think it looks better. Also,
how can start_util_timer() get called twice? Shouldn't there simply be
a BUG() if that happens? Should you really be compensating for that?

I think this code really needs to go out into another function

int __devinit __cpu_up(unsigned int cpu)
...
	/* Collect starting purr */
	if (PVR_VER(systemcfg->processor) == PV_POWER5) {
		cus->start_purr = mfspr(PURR);
		cus->tb = mftb();
	}

There's no need to put performance counter initialization inline with
the rest of the cpu bringup code. I'd at least put that code into a
function that goes into your profile.c, and is called from the CPU
bringup code.

Are these

struct cpu_util_store {
	struct timer_list my_timer;
	unsigned long long start_purr;
	unsigned long long current_purr;
	unsigned long long tb;
};

really "unsigned long long", or are they always a particular size? If
they're a constant size in the hardware, it's probably best to use the
u64 notation. Also, my_timer is probably a bad variable name. You
might try something like "timer" or cpu_util_timer. The 'my_'
definitely needs to go.

Also, some other questions to think about as you develop this further:
How easy will it be to expand this code to sample other performance
counters?
Can this code be used with other CPU types?
Is this code compatible with hotplug CPUs?
Can you get the CPU to generate interrupts when the counter overflows,
instead of sampling?

-- dave

From haveblue at us.ibm.com Wed Mar 17 12:33:38 2004
From: haveblue at us.ibm.com (Dave Hansen)
Date: Tue, 16 Mar 2004 17:33:38 -0800
Subject: more eeh
In-Reply-To: <1079484053.8840.99.camel@mudbug.austin.ibm.com>
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com>
Message-ID: <1079487217.20826.1154.camel@nighthawk>

> BUID_LO(dn->phb->buid), NULL, 0,
> - __pa(slot_err_buf),
> + __pa(slot_errbuf),
> RTAS_ERROR_LOG_MAX,

Are you sure you want __pa there? Isn't virt_to_phys() more applicable?

-- dave

From anton at samba.org Wed Mar 17 12:43:28 2004
From: anton at samba.org (Anton Blanchard)
Date: Wed, 17 Mar 2004 12:43:28 +1100
Subject: more eeh
In-Reply-To: <1079487217.20826.1154.camel@nighthawk>
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com> <1079487217.20826.1154.camel@nighthawk>
Message-ID: <20040317014328.GB19737@krispykreme>

> Are you sure you want __pa there? Isn't virt_to_phys() more applicable?

I'm not convinced :)

static inline unsigned long virt_to_phys(volatile void * address)
{
#ifdef __IO_DEBUG
	printk("virt_to_phys: 0x%08lx -> 0x%08lx\n",
			(unsigned long) address,
			__pa((unsigned long)address));
#endif
	return __pa((unsigned long)address);
}

From haveblue at us.ibm.com Wed Mar 17 12:58:41 2004
From: haveblue at us.ibm.com (Dave Hansen)
Date: Tue, 16 Mar 2004 17:58:41 -0800
Subject: __pa() vs. virt_to_phys()
In-Reply-To: <20040317014328.GB19737@krispykreme>
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com> <1079487217.20826.1154.camel@nighthawk> <20040317014328.GB19737@krispykreme>
Message-ID: <1079488720.20826.1163.camel@nighthawk>

On Tue, 2004-03-16 at 17:43, Anton Blanchard wrote:
> > Are you sure you want __pa there? Isn't virt_to_phys() more applicable?
>
> I'm not convinced :)
>
> static inline unsigned long virt_to_phys(volatile void * address)
> {
> #ifdef __IO_DEBUG
> 	printk("virt_to_phys: 0x%08lx -> 0x%08lx\n",
> 			(unsigned long) address,
> 			__pa((unsigned long)address));
> #endif
> 	return __pa((unsigned long)address);
> }

__pa() is simply supposed to be the addr-PAGE_OFFSET calculation.
virt_to_phys() will be guaranteed to take care of any layout changes if
kernel addresses ever fail to be mapped flat, and 1:1 with the physical
address layout.

So, let's say that someone is working on ... say ... memory hotplug.
They will be modifying the virt_to_phys() function to make up for any
weird mappings that are going on. But, they'll leave __{v,p}a alone,
because those are used for stuff that occurs very early, even at compile
time. More virt_to_phys() and less __pa() will save me lots of auditing
later on :)

If you're not in early boot, or really know what you're doing, use
virt_to_phys() and cousins. Plus, it's more type safe.

-- dave

From nathanl at austin.ibm.com Wed Mar 17 17:46:03 2004
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Wed, 17 Mar 2004 00:46:03 -0600
Subject: OpenFirmware devices and hotplug events
In-Reply-To: <20040316223712.GB31162@kroah.com>
References: <20040316182115.GB19290@kroah.com> <1079473212.8840.59.camel@mudbug.austin.ibm.com> <20040316214932.GA30202@kroah.com> <1079474407.20826.1088.camel@nighthawk> <20040316220928.GA30672@kroah.com> <20040316222544.GW19737@krispykreme> <20040316223712.GB31162@kroah.com>
Message-ID: <4057F42B.7030706@austin.ibm.com>

Greg KH wrote:
> It looks like a portion of the device-tree is in sysfs today, with the
> different busses, right? It shouldn't be that big of a leap.

One sticking point is that sysfs attribute "show" methods are restricted
to returning no more than PAGE_SIZE bytes (according to
Documentation/filesystems/sysfs.txt).
Assuming that we map Open
Firmware properties to sysfs attributes, the device tree representation
can easily have attributes in excess of PAGE_SIZE. Some of these are
actually important -- the "ibm,drc-indexes" property enumerates
resources (e.g. CPUs, memory) that may be acquired from the hypervisor.
I believe Anton had one of these that was over 600K the other day...

Another possible issue: how stringent is the requirement that
attributes' show methods return only ascii text? Many (if not most) of
the properties that exist in an OF device tree are just binary blobs,
and that's how they are exported to userspace through /proc/device-tree.

Nathan

From paulus at samba.org Wed Mar 17 22:07:55 2004
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 17 Mar 2004 22:07:55 +1100
Subject: more eeh
In-Reply-To: <1079484053.8840.99.camel@mudbug.austin.ibm.com>
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com>
Message-ID: <16472.12683.301713.2497@cargo.ozlabs.ibm.com>

Nathan Fontenot writes:

> Any comments I could get on the code would be greatly appreciated.
> Namely, is this the correct way to generate a hotplug event, or
> is there another interface I should use. And, is this safe? This
> causes the removal of a slot in the middle of a read on the slot.

Is this potentially calling disable_slot() at interrupt level
(e.g. if the driver is doing a readl() at interrupt level)? If so, I
think that's probably a bad idea. I think it would be better to do it
at process level, using a workqueue or something similar.

Paul.

From moilanen at austin.ibm.com Thu Mar 18 01:57:18 2004
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Wed, 17 Mar 2004 14:57:18 +0000
Subject: _IO_IS_ISA question
In-Reply-To: <1079419858.1967.237.camel@gaston>
References: <1079390116.10402.2981.camel@DYN279927END.austin.ibm.com> <1079419858.1967.237.camel@gaston>
Message-ID: <1079535438.28366.1511.camel@DYN279927END.austin.ibm.com>

> Yes. The "ISA" IO space is just a subset of the PCI space. If this is
> not the case, then the code is bogus.

After further investigation, I realize why this was done.

What will happen is that an ISA device like the 8250 serial will try
doing inb/outb's to the ISA space to detect the device. This will cause
a page fault and the kernel will attempt to create a PTE for this page.
On a hypervisor system, if the ISA device is not assigned to that
partition, the address will cause the H_ENTER to fail with an H_PARM.
This will cause the kernel to panic.

So this code was blocking ISA IO range accesses when there wasn't an ISA
bus.

I wrote a patch to create a valid mask of pages that a PCI device can
access inside the ISA IO range. It will punch holes into this mask
while probing the PCI devices.

FW must either give us full access to a page, or not. So we do not have
to worry about a valid PCI device IO range and an invalid ISA device IO
range overlapping within the same page.

Thanks,
Jake
-------------- next part --------------
A non-text attachment was scrubbed...
Name: linux-2.6-io-page-mask-1.patch
Type: text/x-patch
Size: 6344 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040317/2e340882/attachment.bin

From brking at us.ibm.com Thu Mar 18 03:11:47 2004
From: brking at us.ibm.com (Brian King)
Date: Wed, 17 Mar 2004 10:11:47 -0600
Subject: OpenFirmware devices and hotplug events
References: <20040316182115.GB19290@kroah.com> <1079473212.8840.59.camel@mudbug.austin.ibm.com> <20040316214932.GA30202@kroah.com> <1079474407.20826.1088.camel@nighthawk> <20040316220928.GA30672@kroah.com> <20040316222544.GW19737@krispykreme> <20040316223712.GB31162@kroah.com> <4057F42B.7030706@austin.ibm.com>
Message-ID: <405878C3.3030407@us.ibm.com>

Nathan Lynch wrote:
> One sticking point is that sysfs attribute "show" methods are restricted
> to returning no more than PAGE_SIZE bytes (according to
> Documentation/filesystems/sysfs.txt). Assuming that we map Open
> Firmware properties to sysfs attributes, the device tree representation
> can easily have attributes in excess of PAGE_SIZE. Some of these are
> actually important -- the "ibm,drc-indexes" property enumerates
> resources (e.g. CPUs, memory) that may be acquired from the hypervisor.
> I believe Anton had one of these that was over 600K the other day...
>
> Another possible issue: how stringent is the requirement that
> attributes' show methods return only ascii text? Many (if not most) of
> the properties that exist in an OF device tree are just binary blobs,
> and that's how they are exported to userspace through /proc/device-tree.

It is possible to create binary sysfs attributes. Look at
request_firmware. These also do not have the PAGE_SIZE limitation on
them, however, they get read a page at a time, so this only works for
fairly static data and not for data that is in constant flux.

--
Brian King
eServer Storage I/O
IBM Linux Technology Center

From haveblue at us.ibm.com Thu Mar 18 03:16:28 2004
From: haveblue at us.ibm.com (Dave Hansen)
Date: Wed, 17 Mar 2004 08:16:28 -0800
Subject: OpenFirmware devices and hotplug events
In-Reply-To: <4057F42B.7030706@austin.ibm.com>
References: <20040316182115.GB19290@kroah.com> <1079473212.8840.59.camel@mudbug.austin.ibm.com> <20040316214932.GA30202@kroah.com> <1079474407.20826.1088.camel@nighthawk> <20040316220928.GA30672@kroah.com> <20040316222544.GW19737@krispykreme> <20040316223712.GB31162@kroah.com> <4057F42B.7030706@austin.ibm.com>
Message-ID: <1079540187.5789.26.camel@nighthawk>

On Tue, 2004-03-16 at 22:46, Nathan Lynch wrote:
> Greg KH wrote:
> > It looks like a portion of the device-tree is in sysfs today, with the
> > different busses, right? It shouldn't be that big of a leap.
>
> One sticking point is that sysfs attribute "show" methods are restricted
> to returning no more than PAGE_SIZE bytes (according to
> Documentation/filesystems/sysfs.txt). Assuming that we map Open
> Firmware properties to sysfs attributes, the device tree representation
> can easily have attributes in excess of PAGE_SIZE. Some of these are
> actually important -- the "ibm,drc-indexes" property enumerates
> resources (e.g. CPUs, memory) that may be acquired from the hypervisor.
> I believe Anton had one of these that was over 600K the other day...

The thing to remember here is that we're not suggesting that the entire
/proc/device-tree be copied, verbatim, to sysfs. The "ibm,drc-indexes"
is a single node in the OpenFirmware tree, but why does it need to be
one in the sysfs tree? If it enumerates CPUs and memory, then each unit
of enumeration would be a likely candidate for a sysfs entry.

> Another possible issue: how stringent is the requirement that
> attributes' show methods return only ascii text?
> Many (if not most) of
> the properties that exist in an OF device tree are just binary blobs,
> and that's how they are exported to userspace through /proc/device-tree.

I think that the ASCII requirement is much more stringent for things
that the kernel itself produces. For instance, since the kernel is
generating the device number itself, the device number is exported as
ASCII.

If a device has a binary property that doesn't make sense for a human to
read, and userspace wants to deal with it in binary, then I'd say that
it's a pretty strong case to have the kernel leave it the hell alone and
simply pass it along. However, beware of files that aggregate different
kinds of data like "ibm,drc-indexes" that you mentioned above. Those
need to be broken up.

Another idea would be to have sysfs enumerate the structure of the
OpenFirmware tree, but have a device driver or separate filesystem
actually deal with getting data in and out. Sysfs would describe where
and what to look for, but some other mechanism would be responsible for
actually dishing the data out. Remember, sysfs is generally there to
*describe*.

-- dave

From greg at kroah.com Thu Mar 18 04:08:45 2004
From: greg at kroah.com (Greg KH)
Date: Wed, 17 Mar 2004 09:08:45 -0800
Subject: OpenFirmware devices and hotplug events
In-Reply-To: <4057F42B.7030706@austin.ibm.com>
References: <20040316182115.GB19290@kroah.com> <1079473212.8840.59.camel@mudbug.austin.ibm.com> <20040316214932.GA30202@kroah.com> <1079474407.20826.1088.camel@nighthawk> <20040316220928.GA30672@kroah.com> <20040316222544.GW19737@krispykreme> <20040316223712.GB31162@kroah.com> <4057F42B.7030706@austin.ibm.com>
Message-ID: <20040317170845.GD17740@kroah.com>

On Wed, Mar 17, 2004 at 12:46:03AM -0600, Nathan Lynch wrote:
> Greg KH wrote:
> >It looks like a portion of the device-tree is in sysfs today, with the
> >different busses, right? It shouldn't be that big of a leap.
>
> One sticking point is that sysfs attribute "show" methods are restricted
> to returning no more than PAGE_SIZE bytes (according to
> Documentation/filesystems/sysfs.txt). Assuming that we map Open
> Firmware properties to sysfs attributes, the device tree representation
> can easily have attributes in excess of PAGE_SIZE. Some of these are
> actually important -- the "ibm,drc-indexes" property enumerates
> resources (e.g. CPUs, memory) that may be acquired from the hypervisor.
> I believe Anton had one of these that was over 600K the other day...

Then use the binary file type for sysfs. That is allowed for binary
blobs which the kernel does not translate at all, it is merely a conduit
to the raw data. Firmware devices do this, as well as some EEPROM i2c
devices, and some BIOS data on i386 boxes.

So it would be very simple to move to sysfs...

thanks,

greg k-h

From greg at kroah.com Thu Mar 18 04:10:27 2004
From: greg at kroah.com (Greg KH)
Date: Wed, 17 Mar 2004 09:10:27 -0800
Subject: OpenFirmware devices and hotplug events
In-Reply-To: <1079540187.5789.26.camel@nighthawk>
References: <20040316182115.GB19290@kroah.com> <1079473212.8840.59.camel@mudbug.austin.ibm.com> <20040316214932.GA30202@kroah.com> <1079474407.20826.1088.camel@nighthawk> <20040316220928.GA30672@kroah.com> <20040316222544.GW19737@krispykreme> <20040316223712.GB31162@kroah.com> <4057F42B.7030706@austin.ibm.com> <1079540187.5789.26.camel@nighthawk>
Message-ID: <20040317171027.GE17740@kroah.com>

On Wed, Mar 17, 2004 at 08:16:28AM -0800, Dave Hansen wrote:
> On Tue, 2004-03-16 at 22:46, Nathan Lynch wrote:
> > Greg KH wrote:
> > > It looks like a portion of the device-tree is in sysfs today, with the
> > > different busses, right? It shouldn't be that big of a leap.
> >
> > One sticking point is that sysfs attribute "show" methods are restricted
> > to returning no more than PAGE_SIZE bytes (according to
> > Documentation/filesystems/sysfs.txt). Assuming that we map Open
> > Firmware properties to sysfs attributes, the device tree representation
> > can easily have attributes in excess of PAGE_SIZE. Some of these are
> > actually important -- the "ibm,drc-indexes" property enumerates
> > resources (e.g. CPUs, memory) that may be acquired from the hypervisor.
> > I believe Anton had one of these that was over 600K the other day...
>
> The thing to remember here is that we're not suggesting that the entire
> /proc/device-tree be copied, verbatim, to sysfs.

Why not? That's the proper place for it, if you really need it.

> The "ibm,drc-indexes" is a single node in the OpenFirmware tree, but
> why does it need to be one in the sysfs tree? If it enumerates CPUs
> and memory, then each unit of enumeration would be a likely candidate
> for a sysfs entry.

They probably already are, just like all of the pci devices and other
system info that is already in the /sys/bus tree.

thanks,

greg k-h

From greg at kroah.com Thu Mar 18 04:23:27 2004
From: greg at kroah.com (Greg KH)
Date: Wed, 17 Mar 2004 09:23:27 -0800
Subject: more eeh
In-Reply-To: <1079484053.8840.99.camel@mudbug.austin.ibm.com>
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com>
Message-ID: <20040317172327.GA18810@kroah.com>

On Tue, Mar 16, 2004 at 06:40:53PM -0600, Nathan Fontenot wrote:
> Hope I don't spoil anyone's dinner with another dose of eeh :)
>
> I have attached a patch that generates the hotplug event in the
> kernel. At least it's supposed to do that. This eliminates the
> need for any kind of an eeh daemon and any /proc usage (two good
> things).
No, you aren't generating a hotplug event here, you are instantly
shutting the power to the device off, after telling the driver bound to
the device to disconnect.

Is that what you really want to do? It's quite severe, and is a pretty
harsh policy. Think scsi devices with lots of filesystems mounted.
boom. Think multiport ethernet devices with loads of network traffic
going over the other ethernet devices. boom. Both of those are not
very "robust" solutions I might point out :)

It's also not going to work, as you are doing this from interrupt
context, and the pci disconnect sequence is expecting to have a task
context and will sleep.

Why not do this (as this is what I think Anton was suggesting you do):
	- get eeh event
	- determine which pci_dev this happened to.
	- switch back to a task context
	- call kobject_hotplug for the pci_dev with the action="fault"
	- put a script in /etc/hotplug.d/pci/ that catches all
	  ACTION=fault events and decides what to do with them. You
	  have a full pointer to the sysfs directory of the pci device
	  at this moment in time, so you can see what driver is bound to
	  the device, and if you really want to, you can turn the device
	  off (after bringing down the network connection or unmounting
	  any attached filesystems.)

This pushes all of your policy to userspace, allows you to fit into the
proper kernel event notifier, and allows you to write a shell script if
you want to do so. And it makes the kernel code a whole lot smaller and
simpler.

Sound good?

thanks,

greg k-h

** Sent via the linuxppc64-dev mail list.
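For illustration only, the userspace half of the steps above could look roughly like the sketch below. It is written in C to match the other code on this list, though a shell script in /etc/hotplug.d/pci/ would do just as well. Assumptions: "fault" is only the action name proposed above, not something the kernel emits today; decide_fault_policy(), sysfs_dir() and the driver names are hypothetical; sysfs is assumed to be mounted on /sys, with DEVPATH relative to that mount point.

```c
#include <stdio.h>
#include <string.h>

/* Possible outcomes of the (hypothetical) userspace fault policy. */
enum fault_policy {
    POLICY_IGNORE,    /* not a fault event, nothing to do */
    POLICY_QUIESCE,   /* stop users of the device first, then power it off */
    POLICY_POWER_OFF  /* nothing depends on the device, power it off now */
};

/*
 * Build the sysfs directory of the device from the DEVPATH value the
 * hotplug helper receives in its environment.
 * Assumption: sysfs is mounted on /sys.
 * Returns 0 on success, -1 on bad input or if buf is too small.
 */
int sysfs_dir(const char *devpath, char *buf, size_t len)
{
    int n;

    if (devpath == NULL || devpath[0] != '/')
        return -1;
    n = snprintf(buf, len, "/sys%s", devpath);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Decide what to do for one hotplug event.
 * action: value of the ACTION environment variable.
 * driver: name of the driver bound to the device, or NULL if none.
 * The driver names in the table are placeholders, not real drivers.
 */
enum fault_policy decide_fault_policy(const char *action, const char *driver)
{
    static const char *const needs_quiesce[] = { "example_scsi", "example_net" };
    size_t i;

    if (action == NULL || strcmp(action, "fault") != 0)
        return POLICY_IGNORE;
    if (driver == NULL)
        return POLICY_POWER_OFF;
    for (i = 0; i < sizeof(needs_quiesce) / sizeof(needs_quiesce[0]); i++)
        if (strcmp(driver, needs_quiesce[i]) == 0)
            return POLICY_QUIESCE;
    return POLICY_POWER_OFF;
}
```

A real agent would call getenv("ACTION") and getenv("DEVPATH"), look up the bound driver under the directory returned by sysfs_dir(), and only then bring down interfaces or unmount filesystems before touching the slot.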
See http://lists.linuxppc.org/ From haveblue at us.ibm.com Thu Mar 18 04:23:33 2004 From: haveblue at us.ibm.com (Dave Hansen) Date: Wed, 17 Mar 2004 09:23:33 -0800 Subject: OpenFirmware devices and hotplug events In-Reply-To: <20040317171027.GE17740@kroah.com> References: <20040316182115.GB19290@kroah.com> <1079473212.8840.59.camel@mudbug.austin.ibm.com> <20040316214932.GA30202@kroah.com> <1079474407.20826.1088.camel@nighthawk> <20040316220928.GA30672@kroah.com> <20040316222544.GW19737@krispykreme> <20040316223712.GB31162@kroah.com> <4057F42B.7030706@austin.ibm.com> <1079540187.5789.26.camel@nighthawk> <20040317171027.GE17740@kroah.com> Message-ID: <1079544213.5789.190.camel@nighthawk> On Wed, 2004-03-17 at 09:10, Greg KH wrote: > On Wed, Mar 17, 2004 at 08:16:28AM -0800, Dave Hansen wrote: > > The thing to remember here is that we're not suggesting that the entire > > /proc/device-tree be copied, verbatim, to sysfs. > > Why not? That's the proper place for it, if you really need it. Could someone elaborate a little more about "ibm,drc-indexes"? I was imagining that it is just metadata, and likely to be best represented as properties for other devices, not a device itself. But, I obviously don't know what I'm talking about. :) But, I guess the OF tree could be considered a system device, all by itself. Then, it's free to use whatever layout it desires, right? -- dave ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From greg at kroah.com Thu Mar 18 04:27:13 2004 From: greg at kroah.com (Greg KH) Date: Wed, 17 Mar 2004 09:27:13 -0800 Subject: Patch: cpu utilization monitor. In-Reply-To: References: Message-ID: <20040317172713.GB18810@kroah.com> On Tue, Mar 16, 2004 at 05:38:06PM -0600, ahuja at austin.ibm.com wrote: > > > This patch adds the framework required by performace team and on demand > computing. At this point only the important bits/framework are covered. Is this what the ckrm people need? 
If not, that is what the kernel is going to be supporting soon, so I would suggest you look into their solution. thanks, greg k-h ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From haveblue at us.ibm.com Thu Mar 18 04:32:16 2004 From: haveblue at us.ibm.com (Dave Hansen) Date: Wed, 17 Mar 2004 09:32:16 -0800 Subject: Patch: cpu utilization monitor. In-Reply-To: <20040317172713.GB18810@kroah.com> References: <20040317172713.GB18810@kroah.com> Message-ID: <1079544736.5789.208.camel@nighthawk> On Wed, 2004-03-17 at 09:27, Greg KH wrote: > On Tue, Mar 16, 2004 at 05:38:06PM -0600, ahuja at austin.ibm.com wrote: > > This patch adds the framework required by performace team and on demand > > computing. At this point only the important bits/framework are covered. > > Is this what the ckrm people need? If not, that is what the kernel is > going to be supporting soon, so I would suggest you look into their > solution. What are the CKRM people doing? Scheduling classes based on usage of certain performance counters? -- dave ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From greg at kroah.com Thu Mar 18 04:34:28 2004 From: greg at kroah.com (Greg KH) Date: Wed, 17 Mar 2004 09:34:28 -0800 Subject: OpenFirmware devices and hotplug events In-Reply-To: <1079544213.5789.190.camel@nighthawk> References: <1079473212.8840.59.camel@mudbug.austin.ibm.com> <20040316214932.GA30202@kroah.com> <1079474407.20826.1088.camel@nighthawk> <20040316220928.GA30672@kroah.com> <20040316222544.GW19737@krispykreme> <20040316223712.GB31162@kroah.com> <4057F42B.7030706@austin.ibm.com> <1079540187.5789.26.camel@nighthawk> <20040317171027.GE17740@kroah.com> <1079544213.5789.190.camel@nighthawk> Message-ID: <20040317173428.GB19060@kroah.com> On Wed, Mar 17, 2004 at 09:23:33AM -0800, Dave Hansen wrote: > On Wed, 2004-03-17 at 09:10, Greg KH wrote: > > On Wed, Mar 17, 2004 at 08:16:28AM -0800, Dave Hansen wrote: > > > The thing to remember here is that we're not suggesting that the entire > > > /proc/device-tree be copied, verbatim, to sysfs. > > > > Why not? That's the proper place for it, if you really need it. > > Could someone elaborate a little more about "ibm,drc-indexes"? I was > imagining that it is just metadata, and likely to be best represented as > properties for other devices, not a device itself. But, I obviously > don't know what I'm talking about. :) > > But, I guess the OF tree could be considered a system device, all by > itself. Then, it's free to use whatever layout it desires, right? Yes, that's exactly what /sys/firmware is for. A userspace representation of the machine's firmware. Look at a i386 box running ACPI for an example of just that. thanks, greg k-h ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From greg at kroah.com Thu Mar 18 04:36:23 2004 From: greg at kroah.com (Greg KH) Date: Wed, 17 Mar 2004 09:36:23 -0800 Subject: Patch: cpu utilization monitor. 
In-Reply-To: <1079544736.5789.208.camel@nighthawk> References: <20040317172713.GB18810@kroah.com> <1079544736.5789.208.camel@nighthawk> Message-ID: <20040317173623.GC19060@kroah.com> On Wed, Mar 17, 2004 at 09:32:16AM -0800, Dave Hansen wrote: > On Wed, 2004-03-17 at 09:27, Greg KH wrote: > > On Tue, Mar 16, 2004 at 05:38:06PM -0600, ahuja at austin.ibm.com wrote: > > > This patch adds the framework required by performace team and on demand > > > computing. At this point only the important bits/framework are covered. > > > > Is this what the ckrm people need? If not, that is what the kernel is > > going to be supporting soon, so I would suggest you look into their > > solution. > > What are the CKRM people doing? Scheduling classes based on usage of > certain performance counters? They are doing "resource management" type stuff for almost everything in the kernel, including scheduling and performance monitoring. See their documentation for more details, but it sounds like it is exactly what this author is looking for. thanks, greg k-h ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From johnrose at us.ibm.com Thu Mar 18 05:33:19 2004 From: johnrose at us.ibm.com (John H Rose) Date: Wed, 17 Mar 2004 12:33:19 -0600 Subject: OpenFirmware devices and hotplug events In-Reply-To: <1079540187.5789.26.camel@nighthawk> Message-ID: > The "ibm,drc-indexes" is a single node in the OpenFirmware tree, but > why does it need to be one in the sysfs tree? Currently, /proc/device-tree lays things out in a way that is consistent w/ device tree structure. One directory per node, one file per property. What you're describing would obscure this. IMHO, there's something to be said for having a faithful representation of the kernel's device tree in userspace, even if some consider procfs passe. ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From nathanl at austin.ibm.com Thu Mar 18 05:35:46 2004 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Wed, 17 Mar 2004 12:35:46 -0600 Subject: OpenFirmware devices and hotplug events In-Reply-To: <1079544213.5789.190.camel@nighthawk> References: <20040316182115.GB19290@kroah.com> <1079473212.8840.59.camel@mudbug.austin.ibm.com> <20040316214932.GA30202@kroah.com> <1079474407.20826.1088.camel@nighthawk> <20040316220928.GA30672@kroah.com> <20040316222544.GW19737@krispykreme> <20040316223712.GB31162@kroah.com> <4057F42B.7030706@austin.ibm.com> <1079540187.5789.26.camel@nighthawk> <20040317171027.GE17740@kroah.com> <1079544213.5789.190.camel@nighthawk> Message-ID: <40589A82.2050204@austin.ibm.com> Dave Hansen wrote: > Could someone elaborate a little more about "ibm,drc-indexes"? I was > imagining that it is just metadata, and likely to be best represented as > properties for other devices, not a device itself. A "drc index" is a token (a 32-bit int, iirc) that identifies a resource that can be acquired from or released to the hypervisor. An ibm,drc-indexes property is just a list of such tokens. There is an ibm,drc-indexes property at each level of the device tree where resources can be added or removed. For instance, there is one in /proc/device-tree/cpus. There is also an ibm,drc-indexes property at the root of the device tree; this one is for memory nodes. I believe PCI host bridges also have the property for logical slots. Each OF node (or "device") that can be released to the hypervisor has an ibm,my-drc-index property. The value of this property should correspond to an element in the ibm,drc-indexes property of the device's parent node. 
So, for example, on my 2-way partition on a 32 cpu p690, we have:

# xxd /proc/device-tree/cpus/PowerPC,POWER4@2/ibm,my-drc-index
0000000: 0000 1000
# xxd /proc/device-tree/cpus/PowerPC,POWER4@b/ibm,my-drc-index
0000000: 0000 1001
# xxd /proc/device-tree/cpus/ibm,drc-indexes
0000000: 0000 0020 0000 1000 0000 1001 0000 1002  ... ............
0000010: 0000 1003 0000 1004 0000 1005 0000 1006  ................
0000020: 0000 1007 0000 1008 0000 1009 0000 100a  ................
0000030: 0000 100b 0000 100c 0000 100d 0000 100e  ................
0000040: 0000 100f 0000 1010 0000 1011 0000 1012  ................
0000050: 0000 1013 0000 1014 0000 1015 0000 1016  ................
0000060: 0000 1017 0000 1018 0000 1019 0000 101a  ................
0000070: 0000 101b 0000 101c 0000 101d 0000 101e  ................
0000080: 0000 101f                                ....

The first 32-bit quantity (0x20) in ibm,drc-indexes is the total number
of indexes, btw.

So when we want to add a cpu to the kernel's resource pool, we go down
the list of drc indexes, making a query to the hv for each index that we
don't already own to see if it's available.

Does this clear things up?

Nathan

** Sent via the linuxppc64-dev mail list.
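As a minimal sketch of the layout described above (a big-endian count cell followed by that many 32-bit drc indexes), the property could be walked from userspace roughly as follows. The helper names be32_at() and drc_index_position() are made up for this example, and the buffer is assumed to hold the raw bytes read from an ibm,drc-indexes file.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit cell from a raw property buffer. */
static uint32_t be32_at(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Look up drc_index in the raw contents of an ibm,drc-indexes property.
 * prop/len: the bytes of the property and their length.
 * Returns the 0-based position of the index in the list, or -1 if the
 * index is not listed (or the buffer is too short to hold a count).
 */
long drc_index_position(const unsigned char *prop, size_t len,
                        uint32_t drc_index)
{
    uint32_t count, i;

    if (prop == NULL || len < 4)
        return -1;
    count = be32_at(prop);
    /* Never trust the count beyond what the buffer actually holds. */
    if ((len - 4) / 4 < count)
        count = (uint32_t)((len - 4) / 4);
    for (i = 0; i < count; i++)
        if (be32_at(prop + 4 + 4 * (size_t)i) == drc_index)
            return (long)i;
    return -1;
}
```

With the dump above, the ibm,my-drc-index value 0x1001 of the second cpu node would be found at position 1 of the parent's list.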
See http://lists.linuxppc.org/ From haveblue at us.ibm.com Thu Mar 18 05:53:15 2004 From: haveblue at us.ibm.com (Dave Hansen) Date: Wed, 17 Mar 2004 10:53:15 -0800 Subject: OpenFirmware devices and hotplug events In-Reply-To: <40589A82.2050204@austin.ibm.com> References: <20040316182115.GB19290@kroah.com> <1079473212.8840.59.camel@mudbug.austin.ibm.com> <20040316214932.GA30202@kroah.com> <1079474407.20826.1088.camel@nighthawk> <20040316220928.GA30672@kroah.com> <20040316222544.GW19737@krispykreme> <20040316223712.GB31162@kroah.com> <4057F42B.7030706@austin.ibm.com> <1079540187.5789.26.camel@nighthawk> <20040317171027.GE17740@kroah.com> <1079544213.5789.190.camel@nighthawk> <40589A82.2050204@austin.ibm.com> Message-ID: <1079549594.5789.414.camel@nighthawk> On Wed, 2004-03-17 at 10:35, Nathan Lynch wrote: > Each OF node (or "device") that can be released to the hypervisor has an > ibm,my-drc-index property. The value of this property should correspond > to an element in the ibm,drc-indexes property of the device's parent node. ... > So when we want to add a cpu to the kernel's resource pool, we go down > the list of drc indexes, making a query to the hv for each index that we > don't already own to see if it's available. > > Does this clear things up? Yes, it gives me a much clearer picture of what is going on. Thanks. I see what the data are for, but not why they are exposed to userspace. Do applications make the decisions about which drc index to use, or is that all done inside the kernel? If each ibm,my-drc-index were suddenly removed today, what applications would break? -- dave ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Thu Mar 18 05:56:03 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Wed, 17 Mar 2004 12:56:03 -0600 Subject: Patch: cpu utilization monitor. 
In-Reply-To: <20040317173623.GC19060@kroah.com>; from greg@kroah.com on Wed, Mar 17, 2004 at 09:36:23AM -0800
References: <20040317172713.GB18810@kroah.com> <1079544736.5789.208.camel@nighthawk> <20040317173623.GC19060@kroah.com>
Message-ID: <20040317125603.C33924@forte.austin.ibm.com>

On Wed, Mar 17, 2004 at 09:36:23AM -0800, Greg KH wrote:
>
> On Wed, Mar 17, 2004 at 09:32:16AM -0800, Dave Hansen wrote:
> > On Wed, 2004-03-17 at 09:27, Greg KH wrote:
> > > On Tue, Mar 16, 2004 at 05:38:06PM -0600, ahuja at austin.ibm.com wrote:
> > > > This patch adds the framework required by performance team and on demand
> > > > computing. At this point only the important bits/framework are covered.
> > >
> > > Is this what the ckrm people need? If not, that is what the kernel is
> > > going to be supporting soon, so I would suggest you look into their
> > > solution.
> >
> > What are the CKRM people doing? Scheduling classes based on usage of
> > certain performance counters?
>
> They are doing "resource management" type stuff for almost everything in
> the kernel, including scheduling and performance monitoring. See their
> documentation for more details, but it sounds like it is exactly what
> this author is looking for.

This patch differs from other efforts in that it gets data directly from
the hypervisor. Think multiple virtual cpus running on one physical cpu.
The traditional tools, whether CKRM or top or vmstat, are blind to the
fact that any given 'virtual cpu' might be getting only 10% of the physical
cycles in one hypervisor time-slice, and 90% in another.

Very crudely, it's sort-of like VM on the 390/zSeries. Your kernel may
think it's 100% busy, but in fact it might be getting only 1% of the actual
physical hardware cycles. The goal here is to be able to report the
fraction of the total physical cycles, and do so on a HZ or even sub-HZ
level of granularity.

--linas

** Sent via the linuxppc64-dev mail list.
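To make the arithmetic behind "100% busy but only 1% of the physical cycles" concrete, here is a rough sketch. It is not taken from the patch: the function names are invented, and it simply assumes that some hypervisor-provided counter gives the physical cycles a virtual cpu received during a sampling interval, alongside the cycles that elapsed on the physical cpu in the same interval.

```c
#include <stdint.h>

/*
 * Percentage of the physical cpu that a virtual cpu actually received
 * during one sampling interval.
 * cycles_received: physical cycles granted to this virtual cpu
 *                  (assumed to come from a hypervisor-maintained counter).
 * cycles_elapsed:  physical cycles that passed during the interval.
 * Both are per-interval deltas and are assumed to be far below 2^64/100.
 */
unsigned physical_share_pct(uint64_t cycles_received, uint64_t cycles_elapsed)
{
    if (cycles_elapsed == 0)
        return 0;
    if (cycles_received > cycles_elapsed)
        cycles_received = cycles_elapsed;   /* clamp bad samples to 100% */
    return (unsigned)(cycles_received * 100 / cycles_elapsed);
}

/*
 * Scale the busy percentage the OS believes it has (0..100) by the
 * share of the physical cpu it really got. This is one possible
 * reading of "normalizing" the OS-reported value.
 */
unsigned normalized_busy_pct(unsigned os_busy_pct, unsigned share_pct)
{
    return os_busy_pct * share_pct / 100;
}
```

Sampling both counters once per tick (or more often) and feeding the deltas through physical_share_pct() would give the HZ-level granularity mentioned above; how the counters are actually read is left open here.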
See http://lists.linuxppc.org/ From haveblue at us.ibm.com Thu Mar 18 05:56:27 2004 From: haveblue at us.ibm.com (Dave Hansen) Date: Wed, 17 Mar 2004 10:56:27 -0800 Subject: OpenFirmware devices and hotplug events In-Reply-To: References: Message-ID: <1079549787.5789.425.camel@nighthawk> On Wed, 2004-03-17 at 10:33, John H Rose wrote: > > The "ibm,drc-indexes" is a single node in the OpenFirmware tree, > > but why does it need to be one in the sysfs tree? > > Currently, /proc/device-tree lays things out in a way that is > consistent w/ device tree structure. One directory per node, one file > per property. What you're describing would obscure this. IMHO, there's > something to be said for having a faithful representation of the > kernel's device tree in userspace, even if some consider procfs passe. OK, got it. See Greg's comment about the firmware interfaces and sysfs. BTW, there is a relatively well placed set of people that also consider /proc passe, especially new interfaces that use or encourage its use. Those people stand a good chance of keeping any unjustified expanding use of /proc out of the kernel. Is this a concern, or is it more important to get features into distribution kernels, and leave mainline until later? -- dave ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From haveblue at us.ibm.com Thu Mar 18 06:13:59 2004 From: haveblue at us.ibm.com (Dave Hansen) Date: Wed, 17 Mar 2004 11:13:59 -0800 Subject: Patch: cpu utilization monitor. In-Reply-To: <20040317125603.C33924@forte.austin.ibm.com> References: <20040317172713.GB18810@kroah.com> <1079544736.5789.208.camel@nighthawk> <20040317173623.GC19060@kroah.com> <20040317125603.C33924@forte.austin.ibm.com> Message-ID: <1079550838.5789.461.camel@nighthawk> On Wed, 2004-03-17 at 10:56, linas at austin.ibm.com wrote: > This patch differs from other efforts in that it gets data directly from > the hypervisor. Think multiple virtual cpus running on one physical cpu. 
> The traditional tools, whether CKRM or top or vmstat, are blind to the > fact that any given 'virtual cpu' might be getting only 10% of the physical > cycles in one hypervisor time-slice, and 90% in another. > > Very crudely, its sort-of like VM on the 390/zSeries. Your kernel may > think its 100% busy, but in fact it might be getting only 1% of the actual > physical hardware cycles. The goal here is to be able to report the > fraction of the total physical cycles, and do so on a HZ or even sub-HZ > level of granularity. But, the number is still just another performance counter, right? Is the interface to fetch it the same as the other CPU performance counters? I think what Greg was getting at is that CKRM aims to be able to make resource decisions based on data it gets from all kinds of sources, including performance counters. If you export this 'virtual cpu' slice in the same way that other CKRM-handled data are, then you can probably access it in whatever way you wanted, and you get the code reuse benefit of using the rest of the CKRM work. Shailabh, am I on the right track here? I'm kinda guessing at what the CKRM goals are here. What is the planned use of this counter? Will it simply be exported to userspace, or will the kernel need it internally for something? -- dave ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From kravetz at us.ibm.com Thu Mar 18 06:28:55 2004 From: kravetz at us.ibm.com (Mike Kravetz) Date: Wed, 17 Mar 2004 11:28:55 -0800 Subject: Patch: cpu utilization monitor. 
In-Reply-To: <1079550838.5789.461.camel@nighthawk> References: <20040317172713.GB18810@kroah.com> <1079544736.5789.208.camel@nighthawk> <20040317173623.GC19060@kroah.com> <20040317125603.C33924@forte.austin.ibm.com> <1079550838.5789.461.camel@nighthawk> Message-ID: <20040317192855.GD5538@w-mikek2.beaverton.ibm.com> On Wed, Mar 17, 2004 at 11:13:59AM -0800, Dave Hansen wrote: > > On Wed, 2004-03-17 at 10:56, linas at austin.ibm.com wrote: > > This patch differs from other efforts in that it gets data directly from > > the hypervisor. Think multiple virtual cpus running on one physical cpu. > > The traditional tools, whether CKRM or top or vmstat, are blind to the > > fact that any given 'virtual cpu' might be getting only 10% of the physical > > cycles in one hypervisor time-slice, and 90% in another. > > > > Very crudely, its sort-of like VM on the 390/zSeries. Your kernel may > > think its 100% busy, but in fact it might be getting only 1% of the actual > > physical hardware cycles. The goal here is to be able to report the > > fraction of the total physical cycles, and do so on a HZ or even sub-HZ > > level of granularity. > > But, the number is still just another performance counter, right? Is > the interface to fetch it the same as the other CPU performance > counters? > > I think what Greg was getting at is that CKRM aims to be able to make > resource decisions based on data it gets from all kinds of sources, > including performance counters. If you export this 'virtual cpu' slice > in the same way that other CKRM-handled data are, then you can probably > access it in whatever way you wanted, and you get the code reuse benefit > of using the rest of the CKRM work. Shailabh, am I on the right track > here? I'm kinda guessing at what the CKRM goals are here. > > What is the planned use of this counter? Will it simply be exported to > userspace, or will the kernel need it internally for something? 
> Actually, this type of data sounds like something that (forgive me for mentioning this!!!) the IBM eWLM product would want to know. I don't think CKRM, or the OS can do much with this type of data except report it for further analysis. More interesting is what something that let's say 'controls the entire machine' can do with this data. For example, one OS isn't getting enough CPU cycles and another OS has excess cycles. Let's turn the knobs to balance things out at the machine/hypervisor level. Perhaps this is what was meant by Linas's original reference to 'on demand'? -- Mike ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From nathanl at austin.ibm.com Thu Mar 18 06:38:24 2004 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Wed, 17 Mar 2004 13:38:24 -0600 Subject: OpenFirmware devices and hotplug events In-Reply-To: <1079549594.5789.414.camel@nighthawk> References: <20040316182115.GB19290@kroah.com> <1079473212.8840.59.camel@mudbug.austin.ibm.com> <20040316214932.GA30202@kroah.com> <1079474407.20826.1088.camel@nighthawk> <20040316220928.GA30672@kroah.com> <20040316222544.GW19737@krispykreme> <20040316223712.GB31162@kroah.com> <4057F42B.7030706@austin.ibm.com> <1079540187.5789.26.camel@nighthawk> <20040317171027.GE17740@kroah.com> <1079544213.5789.190.camel@nighthawk> <40589A82.2050204@austin.ibm.com> <1079549594.5789.414.camel@nighthawk> Message-ID: <4058A930.40408@austin.ibm.com> Dave Hansen wrote: > I see what the data are for, but not why they are exposed to userspace. Well, everything that firmware places in the device tree is exposed, for better or worse. > Do applications make the decisions about which drc index to use, or is > that all done inside the kernel? If each ibm,my-drc-index were > suddenly removed today, what applications would break? If not explicitly specified, the decision about which drc index to add or remove is made by a userspace program which is invoked by the machine's management console (the HMC). 
So that would break. Nathan ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From haveblue at us.ibm.com Thu Mar 18 06:46:33 2004 From: haveblue at us.ibm.com (Dave Hansen) Date: Wed, 17 Mar 2004 11:46:33 -0800 Subject: OpenFirmware devices and hotplug events In-Reply-To: <4058A930.40408@austin.ibm.com> References: <20040316182115.GB19290@kroah.com> <1079473212.8840.59.camel@mudbug.austin.ibm.com> <20040316214932.GA30202@kroah.com> <1079474407.20826.1088.camel@nighthawk> <20040316220928.GA30672@kroah.com> <20040316222544.GW19737@krispykreme> <20040316223712.GB31162@kroah.com> <4057F42B.7030706@austin.ibm.com> <1079540187.5789.26.camel@nighthawk> <20040317171027.GE17740@kroah.com> <1079544213.5789.190.camel@nighthawk> <40589A82.2050204@austin.ibm.com> <1079549594.5789.414.camel@nighthawk> <4058A930.40408@austin.ibm.com> Message-ID: <1079552793.5789.502.camel@nighthawk> On Wed, 2004-03-17 at 11:38, Nathan Lynch wrote: > Dave Hansen wrote: > > I see what the data are for, but not why they are exposed to userspace. > > Well, everything that firmware places in the device tree is exposed, for > better or worse. OK, I was just wondering what it was for. According to Greg, it doesn't look like exporting large binary blobs like that is really a problem, so I'm in search of a solution without anything to solve :) > > Do applications make the decisions about which drc index to use, or is > > that all done inside the kernel? If each ibm,my-drc-index were > > suddenly removed today, what applications would break? > > If not explicitly specified, the decision about which drc index to add > or remove is made by a userspace program which is invoked by the > machine's management console (the HMC). So that would break. Really? I didn't know that the HMC had any access to the OS on the machines that it is controlling, other than through hypervisor events. Can you give me a better description of what this interface is, or how it works? 
We are talking about a userspace program running on the LPAR or full system, not on the HMC, right? -- dave ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From ahuja at austin.ibm.com Thu Mar 18 06:51:58 2004 From: ahuja at austin.ibm.com (ahuja at austin.ibm.com) Date: Wed, 17 Mar 2004 13:51:58 -0600 (CST) Subject: Patch: cpu utilization monitor. In-Reply-To: <20040317192855.GD5538@w-mikek2.beaverton.ibm.com> Message-ID: Thanks for the comments everyone. Like linas said earlier, the value getting reported by OS whether the cpu is 100% busy or 50% busy does not hold any relation to the actual physical CPU allocated to it anymore. I am attempting to normalize the value that the OS reports to the actual cpu use and give a more accurate picture to other tools/user space. Now there are couple of different requirements and I hope to get to all of them as this progresses. I will try and rectify the code from the comments I have received so far. I did give CKRM a cursory glance, not sure that I am duplicating effort here. But let me look further on that. Thanks, Manish On Wed, 17 Mar 2004, Mike Kravetz wrote: > > On Wed, Mar 17, 2004 at 11:13:59AM -0800, Dave Hansen wrote: > > > > On Wed, 2004-03-17 at 10:56, linas at austin.ibm.com wrote: > > > This patch differs from other efforts in that it gets data directly from > > > the hypervisor. Think multiple virtual cpus running on one physical cpu. > > > The traditional tools, whether CKRM or top or vmstat, are blind to the > > > fact that any given 'virtual cpu' might be getting only 10% of the physical > > > cycles in one hypervisor time-slice, and 90% in another. > > > > > > Very crudely, its sort-of like VM on the 390/zSeries. Your kernel may > > > think its 100% busy, but in fact it might be getting only 1% of the actual > > > physical hardware cycles. 
The goal here is to be able to report the > > > fraction of the total physical cycles, and do so on a HZ or even sub-HZ > > > level of granularity. > > > > But, the number is still just another performance counter, right? Is > > the interface to fetch it the same as the other CPU performance > > counters? > > > > I think what Greg was getting at is that CKRM aims to be able to make > > resource decisions based on data it gets from all kinds of sources, > > including performance counters. If you export this 'virtual cpu' slice > > in the same way that other CKRM-handled data are, then you can probably > > access it in whatever way you wanted, and you get the code reuse benefit > > of using the rest of the CKRM work. Shailabh, am I on the right track > > here? I'm kinda guessing at what the CKRM goals are here. > > > > What is the planned use of this counter? Will it simply be exported to > > userspace, or will the kernel need it internally for something? > > > > Actually, this type of data sounds like something that (forgive me > for mentioning this!!!) the IBM eWLM product would want to know. > I don't think CKRM, or the OS can do much with this type of data > except report it for further analysis. More interesting is what > something that let's say 'controls the entire machine' can do with > this data. For example, one OS isn't getting enough CPU cycles > and another OS has excess cycles. Let's turn the knobs to balance > things out at the machine/hypervisor level. > > Perhaps this is what was meant by Linas's original reference to > 'on demand'? > > -- > Mike > > > ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From jdewand at redhat.com Thu Mar 18 07:32:14 2004 From: jdewand at redhat.com (Julie DeWandel) Date: Wed, 17 Mar 2004 15:32:14 -0500 Subject: [PATCH] ignore huge OF properties References: <20040316070649.GP19737@krispykreme> Message-ID: <4058B5CE.1010503@redhat.com> Hi, In addition to the patch you provided, it is also necessary to ensure that the initrd image cannot be overwritten by calls into prom such as: pp->length = (int)(long) call_prom(RELOC("getprop"), 4, 1, node, namep,valp, mem_end - mem_start); Here, mem_end needs to have been carefully chosen so that it doesn't start somewhere in the middle of the initrd image or past it. However, mem_end is arbitrarily chosen by copy_device_node to be 8MB beyond the starting mem_start value. In code I have been working with, mem_end has landed well into the initrd memory image. The attached patch corrects this problem for the 2.6 ameslab tree. Please consider pushing it to ameslab, as I don't know how to do that. Julie Anton Blanchard wrote: >Hi, > >Im just about to commit this patch. We have some versions of firmware >out there that have huge OF properties. So huge that we end up overwriting >our initrd. > >Place a 1MB limit and warn bitterly if its over this. Also fix a use >of package-to-path where the variable was 64bytes but we would pass in >a length of 255. > >Anton > > > -- Julie DeWandel Red Hat, Inc. Tel (978) 692-3113 x23251 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: initrd_overwrite_fix
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040317/2c097333/attachment.txt

From davis at skink.net Thu Mar 18 07:34:50 2004
From: davis at skink.net (davis)
Date: Wed, 17 Mar 2004 15:34:50 -0500
Subject: poor performance between a dual ppc970 and a p3 laptop
Message-ID: <20040317203450.GC19820@skink.net>

Hello

I am trying to understand why the code below runs so poorly on a
Dual 970 1.6Ghz compared to a P3 700Mhz Laptop. I have tried using
a different compiler and I get similar results. I am beginning to
think the 970 is somehow configured wrong in some way.

Summary:

computer          -O1 time   gcc version
p3 700 Mhz        30 secs    3.3.3 (stock debian unstable)
dual 970 1.6Ghz   26 secs    3.2 (stock suse 8)
dual 970 1.6Ghz   27 secs    3.4 (i built using Janis's script)

Kernel Levels:
Linux gpul-01 2.4.21-147-pseries64 #1 SMP
Linux tadpole 2.4.22-pre2 #14

Details:

p3 700Mhz 256 KB cache
----
l3$ gcc -v
Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.3/specs
Configured with: ../src/configure -v --enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared --with-system-zlib --enable-nls --without-included-gettext --enable-__cxa_atexit --enable-clocale=gnu --enable-debug --enable-java-gc=boehm --enable-java-awt=xlib --enable-objc-gc i486-linux
Thread model: posix
gcc version 3.3.3 20040125 (prerelease) (Debian)
---
real 0m31.737s
user 0m30.760s
sys 0m0.070s

processor : 0
cpu : PPC970
clock : 1600MHz
revision : 2.2

processor : 1
cpu : PPC970
clock : 1600MHz
revision : 2.2

timebase : 199836808
machine : CHRP IBM,884221X
-----
Reading specs from /home/davis/testy/lib/gcc/powerpc-linux/3.4.0/specs
Configured with: /home/davis/gcc/gcc-3.4-20040310/configure --prefix=/home/davis/testy --build=powerpc-linux --host=powerpc-linux --target=powerpc-linux --enable-languages=c --disable-nls --disable-multilib --disable-checking
Thread model: posix
gcc version 3.4.0 20040310 (prerelease)
---
real 0m27.219s
user 0m25.560s
sys 0m0.060s

and using a different compiler...
----
Reading specs from /usr/lib/gcc-lib/powerpc-suse-linux/3.2/specs
Configured with: ../configure --enable-threads=posix --prefix=/usr --with-local-prefix=/usr/local --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib --enable-languages=c,c++,f77,objc,java,ada --enable-libgcj --with-gxx-include-dir=/usr/include/g++ --with-slibdir=/lib --with-system-zlib --enable-shared --enable-__cxa_atexit powerpc-suse-linux
Thread model: posix
gcc version 3.2
---
real 0m26.353s
user 0m25.670s
sys 0m0.040s

----------------------------------------------
#include
#include
#include
#include

float foo = 5.0;
float goo = 2.3;
float cig = 7.3;
float gar = 4.2;
float car = 9.8;

#define array_size 14000

float big_array[array_size];
float small_array[array_size];

void someFun(float var1, float var2, float fY, float fX)
{
    int iY, iX, iN;
    float temp;

    for (iY=0; iY<500 ;iY++) {
        fY = iY*goo + fY - foo/2.0;
        temp = (fY*var1+cig)/gar + car;
        for(iX=0; iXarray_size)? array_size : iN;
            iN = (iN<0)? 0 : iN;
            big_array[iX] = big_array[iX] + iN*small_array[iN];
        }
    }
}

void main(void)
{
    int i;
    float fY = 2.3;
    float fX = 1/2.3;

    for (i=0; i of "Wed, 17 Mar 2004 15:34:50 EST." <20040317203450.GC19820@skink.net>
References: <20040317203450.GC19820@skink.net>
Message-ID: <200403172103.i2HL3vT35248@makai.watson.ibm.com>

You do not mention what options you used to compile the application.
Did you ask the compiler to tune for PPC970 processor?

David

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From nfont at austin.ibm.com Thu Mar 18 08:28:07 2004
From: nfont at austin.ibm.com (Nathan Fontenot)
Date: Wed, 17 Mar 2004 15:28:07 -0600
Subject: more eeh
In-Reply-To: <16472.12683.301713.2497@cargo.ozlabs.ibm.com>
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com> <16472.12683.301713.2497@cargo.ozlabs.ibm.com>
Message-ID: <1079558886.19192.18.camel@mudbug.austin.ibm.com>

Well, let's see if I can fill everyone's mailbox with some more
lively eeh discussions again today. :)

I decided to follow Paul's suggestion and use a workqueue to handle
the hotplug remove of the device. This was before I saw Greg's reply
about using kobject_hotplug with a user-space script. The relevant
piece is in eeh.c in the attached patch.

Greg, you're right that this is a harsh policy. In this case I
think it's the right thing. When an EEH event happens the
slot is basically, device stores will fail and all reads will
return all F's. The idea with this code is to avoid a panic() call and
hopefully allow some time to do any cleanup before shutting down. Also,
this is why we want to limit this to network devices. This kind of
policy for a hard disk would just be begging for data corruption.

So, bring on the comments. At least I now know you guys aren't shy.

thanks. again.

-Nathan F.

On Wed, 2004-03-17 at 05:07, Paul Mackerras wrote:

> Is this potentially calling disable_slot() at interrupt level (e.g. if
> the driver is doing a readl() at interrupt level)? If so, I think
> that's probably a bad idea. I think it would be better to do it at
> process level, using a workqueue or something similar.
>
> Paul.
--
-------------- next part --------------
A non-text attachment was scrubbed...
Name: eeh_khp.patch
Type: text/x-patch
Size: 7859 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040317/9d868079/attachment.bin

From nagar at watson.ibm.com Thu Mar 18 08:41:24 2004
From: nagar at watson.ibm.com (Shailabh Nagar)
Date: Wed, 17 Mar 2004 16:41:24 -0500
Subject: Patch: cpu utilization monitor.
In-Reply-To:
References:
Message-ID: <4058C604.3060904@watson.ibm.com>

Greg is right and so is Mike...

CKRM's province is a single Linux kernel image. So it controls
whatever the image gets from the hypervisor and cannot, by itself, be
used by users to achieve "true" control. For that you need some
workload management middleware (like eWLM) whose control spans LPARS.
However, it can report whatever a kernel entity is willing to provide.

In CKRM's current design, we have a number that represents the "100%"
of a resource (it's not 100 for reasons that are not relevant here).
This number is currently irrelevant to the user (except as an upper
limit for what can be distributed amongst the classes defined - all of
them must, obviously, consume less than 100% of the available
resource). It would be easy enough to dynamically modify that number
to represent the real fraction being served to an OS image, provided
CKRM and the consumer of that number (middleware or sysadmin) agree on
the units.

Manish/Linas, if you're writing the entity to determine the real
fraction, there's no duplication of effort. If you're getting into
reporting it to higher level users (human or software), you might be -
we currently have two kernel-user paths for sending such
info up to the user (one for manual users of CKRM, one for middleware).
We'll be doing a code drop on lkml in the next day or
two so you'll be able to determine for yourself.

Up in user space, CKRM's tooling is rudimentary. With the new filesystem
API that we're using, it's even more likely we'll be leaning towards
scripts initially.

Naturally, we'd be happy to discuss all this further.
The CKRM project has quite a few high-priority stuff on its plate that integration with other projects (such as cpumemsets for NUMA, or yours for LPAR) isn't important yet but if we keep in sync at a high level, it may be possible to avoid duplication/incompatible design choices. Hope this helps, Shailabh ahuja at austin.ibm.com wrote: >Thanks for the comments everyone. > >Like linas said earlier, the value getting reported by OS whether the cpu >is 100% busy or 50% busy does not hold any relation to the actual physical >CPU allocated to it anymore. > >I am attempting to normalize the value that the OS reports to the actual >cpu use and give a more accurate picture to other tools/user space. Now >there are couple of different requirements and I hope to get to all of >them as this progresses. > >I will try and rectify the code from the comments I have received so far. >I did give CKRM a cursory glance, not sure that I am duplicating effort >here. But let me look further on that. > >Thanks, >Manish > > >On Wed, 17 Mar 2004, Mike Kravetz wrote: > > > >>On Wed, Mar 17, 2004 at 11:13:59AM -0800, Dave Hansen wrote: >> >> >>>On Wed, 2004-03-17 at 10:56, linas at austin.ibm.com wrote: >>> >>> >>>>This patch differs from other efforts in that it gets data directly from >>>>the hypervisor. Think multiple virtual cpus running on one physical cpu. >>>>The traditional tools, whether CKRM or top or vmstat, are blind to the >>>>fact that any given 'virtual cpu' might be getting only 10% of the physical >>>>cycles in one hypervisor time-slice, and 90% in another. >>>> >>>>Very crudely, its sort-of like VM on the 390/zSeries. Your kernel may >>>>think its 100% busy, but in fact it might be getting only 1% of the actual >>>>physical hardware cycles. The goal here is to be able to report the >>>>fraction of the total physical cycles, and do so on a HZ or even sub-HZ >>>>level of granularity. >>>> >>>> >>>But, the number is still just another performance counter, right? 
Is >>>the interface to fetch it the same as the other CPU performance >>>counters? >>> >>>I think what Greg was getting at is that CKRM aims to be able to make >>>resource decisions based on data it gets from all kinds of sources, >>>including performance counters. If you export this 'virtual cpu' slice >>>in the same way that other CKRM-handled data are, then you can probably >>>access it in whatever way you wanted, and you get the code reuse benefit >>>of using the rest of the CKRM work. Shailabh, am I on the right track >>>here? I'm kinda guessing at what the CKRM goals are here. >>> >>>What is the planned use of this counter? Will it simply be exported to >>>userspace, or will the kernel need it internally for something? >>> >>> >>> >>Actually, this type of data sounds like something that (forgive me >>for mentioning this!!!) the IBM eWLM product would want to know. >>I don't think CKRM, or the OS can do much with this type of data >>except report it for further analysis. More interesting is what >>something that let's say 'controls the entire machine' can do with >>this data. For example, one OS isn't getting enough CPU cycles >>and another OS has excess cycles. Let's turn the knobs to balance >>things out at the machine/hypervisor level. >> >>Perhaps this is what was meant by Linas's original reference to >>'on demand'? >> >>-- >>Mike >> >> >> >> >> > > > > ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Thu Mar 18 08:54:02 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Wed, 17 Mar 2004 15:54:02 -0600 Subject: Patch: cpu utilization monitor. 
In-Reply-To: <1079550838.5789.461.camel@nighthawk>; from haveblue@us.ibm.com on Wed, Mar 17, 2004 at 11:13:59AM -0800
References: <20040317172713.GB18810@kroah.com> <1079544736.5789.208.camel@nighthawk> <20040317173623.GC19060@kroah.com> <20040317125603.C33924@forte.austin.ibm.com> <1079550838.5789.461.camel@nighthawk>
Message-ID: <20040317155402.D33924@forte.austin.ibm.com>

On Wed, Mar 17, 2004 at 11:13:59AM -0800, Dave Hansen wrote:
> On Wed, 2004-03-17 at 10:56, linas at austin.ibm.com wrote:
> > This patch differs from other efforts in that it gets data directly from
> > the hypervisor. Think multiple virtual cpus running on one physical cpu.
> > The traditional tools, whether CKRM or top or vmstat, are blind to the
> > fact that any given 'virtual cpu' might be getting only 10% of the physical
> > cycles in one hypervisor time-slice, and 90% in another.
> >
> > Very crudely, its sort-of like VM on the 390/zSeries. Your kernel may
> > think its 100% busy, but in fact it might be getting only 1% of the actual
> > physical hardware cycles. The goal here is to be able to report the
> > fraction of the total physical cycles, and do so on a HZ or even sub-HZ
> > level of granularity.
>
> But, the number is still just another performance counter, right? Is

Yes.

> the interface to fetch it the same as the other CPU performance
> counters?

Maybe. I don't know where/how top, vmstat, ckrm get their data.
I don't know if the top/vmstat/ckrm authors are even interested in
having this data. The current patch merely gathers the data in
a 'correct' fashion; there are some users for it, but the last word
has not yet been written as to how to present the data. I think
Manish is open to suggestion.

I for one, don't know how to 'give' the data to ckrm, but maybe that
will become clear in a later thread?

Yes, eWLM is one of the users; the local performance team is
interested too.

--linas

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/ From ahuja at austin.ibm.com Thu Mar 18 08:57:42 2004 From: ahuja at austin.ibm.com (ahuja at austin.ibm.com) Date: Wed, 17 Mar 2004 15:57:42 -0600 (CST) Subject: Patch: cpu utilization monitor. In-Reply-To: <4058C604.3060904@watson.ibm.com> Message-ID: Shailab, This patch only collects. Does no reporting at all. We can work out stuff offline so that we dont duplicate any efforts at all for reporting any data. Cheers, Manish > Manish/Linas, if you're writing the entity to determine the real > fraction, there's no duplication of effort. If you're getting into > reporting it to higher level users (human or software), you might be - > we currently have two kernel-user paths for sending such > info up to the user (one for manual users of CKRM, one for middleware). > We'll be doing a code drop on lkml in the next day or > two so you'll be able to determine for yourself. > > Up in user space, CKRM's tooling is rudimentary. With the new filesystem > API that we're using, its even more likely we'll be leaning towards > scripts initially. > > Naturally, we'd be happy to discuss all this further. The CKRM project > has quite a few high-priority stuff on its plate that integration with > other projects (such as cpumemsets for NUMA, or yours for LPAR) isn't > important yet but if we keep in sync at a high level, it may be > possible to avoid duplication/incompatible design choices. > > Hope this helps, > Shailabh > > ahuja at austin.ibm.com wrote: > > >Thanks for the comments everyone. > > > >Like linas said earlier, the value getting reported by OS whether the cpu > >is 100% busy or 50% busy does not hold any relation to the actual physical > >CPU allocated to it anymore. > > > >I am attempting to normalize the value that the OS reports to the actual > >cpu use and give a more accurate picture to other tools/user space. Now > >there are couple of different requirements and I hope to get to all of > >them as this progresses. 
> > > >I will try and rectify the code from the comments I have received so far. > >I did give CKRM a cursory glance, not sure that I am duplicating effort > >here. But let me look further on that. > > > >Thanks, > >Manish > > > > > >On Wed, 17 Mar 2004, Mike Kravetz wrote: > > > > > > > >>On Wed, Mar 17, 2004 at 11:13:59AM -0800, Dave Hansen wrote: > >> > >> > >>>On Wed, 2004-03-17 at 10:56, linas at austin.ibm.com wrote: > >>> > >>> > >>>>This patch differs from other efforts in that it gets data directly from > >>>>the hypervisor. Think multiple virtual cpus running on one physical cpu. > >>>>The traditional tools, whether CKRM or top or vmstat, are blind to the > >>>>fact that any given 'virtual cpu' might be getting only 10% of the physical > >>>>cycles in one hypervisor time-slice, and 90% in another. > >>>> > >>>>Very crudely, its sort-of like VM on the 390/zSeries. Your kernel may > >>>>think its 100% busy, but in fact it might be getting only 1% of the actual > >>>>physical hardware cycles. The goal here is to be able to report the > >>>>fraction of the total physical cycles, and do so on a HZ or even sub-HZ > >>>>level of granularity. > >>>> > >>>> > >>>But, the number is still just another performance counter, right? Is > >>>the interface to fetch it the same as the other CPU performance > >>>counters? > >>> > >>>I think what Greg was getting at is that CKRM aims to be able to make > >>>resource decisions based on data it gets from all kinds of sources, > >>>including performance counters. If you export this 'virtual cpu' slice > >>>in the same way that other CKRM-handled data are, then you can probably > >>>access it in whatever way you wanted, and you get the code reuse benefit > >>>of using the rest of the CKRM work. Shailabh, am I on the right track > >>>here? I'm kinda guessing at what the CKRM goals are here. > >>> > >>>What is the planned use of this counter? 
Will it simply be exported to > >>>userspace, or will the kernel need it internally for something? > >>> > >>> > >>> > >>Actually, this type of data sounds like something that (forgive me > >>for mentioning this!!!) the IBM eWLM product would want to know. > >>I don't think CKRM, or the OS can do much with this type of data > >>except report it for further analysis. More interesting is what > >>something that let's say 'controls the entire machine' can do with > >>this data. For example, one OS isn't getting enough CPU cycles > >>and another OS has excess cycles. Let's turn the knobs to balance > >>things out at the machine/hypervisor level. > >> > >>Perhaps this is what was meant by Linas's original reference to > >>'on demand'? > >> > >>-- > >>Mike > >> > >> > >> > >> > >> > > > > > > > > > > > > ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Thu Mar 18 09:09:02 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Wed, 17 Mar 2004 16:09:02 -0600 Subject: more eeh In-Reply-To: <16472.12683.301713.2497@cargo.ozlabs.ibm.com>; from paulus@samba.org on Wed, Mar 17, 2004 at 10:07:55PM +1100 References: <1079484053.8840.99.camel@mudbug.austin.ibm.com> <16472.12683.301713.2497@cargo.ozlabs.ibm.com> Message-ID: <20040317160902.E33924@forte.austin.ibm.com> On Wed, Mar 17, 2004 at 10:07:55PM +1100, Paul Mackerras wrote: > > Nathan Fontenot writes: > > > Any comments I could get on the code would be greatly appreciated. > > Namely, is this the correct way to generate a hotplug event, or > > is there another interface I should use. And, is this safe? This > > causes the removal of a slot in the middle of a read on the slot. > > Is this potentially calling disable_slot() at interrupt level (e.g. if > the driver is doing a readl() at interrupt level)? If so, I think > that's probably a bad idea. I think it would be better to do it at > process level, using a workqueue or something similar. Yes, right. 
FWIW, at some 'future' date, the idea was to have an eeh recovery kernel
daemon that would be able to coordinate the recovery of multi-function
devices as well as individual devices.

I lost time on the blasted superbug, I hope to get reinvolved w/ eeh
starting about now. After reading my stale email. :-/

--linas

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From nathanl at austin.ibm.com Thu Mar 18 09:24:07 2004
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Wed, 17 Mar 2004 16:24:07 -0600
Subject: OpenFirmware devices and hotplug events
In-Reply-To: <1079552793.5789.502.camel@nighthawk>
References: <20040316182115.GB19290@kroah.com> <1079473212.8840.59.camel@mudbug.austin.ibm.com> <20040316214932.GA30202@kroah.com> <1079474407.20826.1088.camel@nighthawk> <20040316220928.GA30672@kroah.com> <20040316222544.GW19737@krispykreme> <20040316223712.GB31162@kroah.com> <4057F42B.7030706@austin.ibm.com> <1079540187.5789.26.camel@nighthawk> <20040317171027.GE17740@kroah.com> <1079544213.5789.190.camel@nighthawk> <40589A82.2050204@austin.ibm.com> <1079549594.5789.414.camel@nighthawk> <4058A930.40408@austin.ibm.com> <1079552793.5789.502.camel@nighthawk>
Message-ID: <4058D007.7080203@austin.ibm.com>

Dave Hansen wrote:

>>If not explicitly specified, the decision about which drc index to add
>>or remove is made by a userspace program which is invoked by the
>>machine's management console (the HMC). So that would break.
>
> Really? I didn't know that the HMC had any access to the OS on the
> machines that it is controlling, other than through hypervisor events.
> Can you give me a better description of what this interface is, or how
> it works?

I'll do my best, hopefully someone will jump in and correct me if I get
anything wrong. I believe the HMC communicates with both the hypervisor
and the LPAR in question for such an operation. There is a daemon which
runs on the LPAR, and the HMC communicates commands to it via the local
network.
That daemon invokes the program which I was talking about, which
goes about acquiring or releasing drc indexes as well as hotplugging the
corresponding devices. The daemon reports the results back to the HMC.
Hopefully this description isn't too vague; the software I'm describing
is not released yet for Linux.

> We are talking about a userspace program running on the LPAR or full
> system, not on the HMC, right?

Right.

Nathan

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From greg at kroah.com Thu Mar 18 09:31:02 2004
From: greg at kroah.com (Greg KH)
Date: Wed, 17 Mar 2004 14:31:02 -0800
Subject: more eeh
In-Reply-To: <1079558886.19192.18.camel@mudbug.austin.ibm.com>
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com> <16472.12683.301713.2497@cargo.ozlabs.ibm.com> <1079558886.19192.18.camel@mudbug.austin.ibm.com>
Message-ID: <20040317223102.GA3627@kroah.com>

On Wed, Mar 17, 2004 at 03:28:07PM -0600, Nathan Fontenot wrote:
> Well, lets see if I can fill everyones mailbox with some more
> lively eeh discussions again today. :)
>
> I decided to follow Paul's suggestion and use a workqueue to handle
> the hotplug remove of the device. This was before I saw Greg's reply
> about using kobject_hotplug with a user-space script. The relevant
> piece is in eeh.c in the attached patch.
>
> Greg, you're right that this is a harsh policy. In this case I
> think it's the right thing. When an EEH event happens the
> slot is basically, device stores will fail and all reads will
> return all F's. The idea with this code is to avoid a panic() call and
> hopefully allow some time to do any cleanup before shutting down. Also,
> this is why we want to limit this to network devices. This kind of
> policy for a hard disk would just be begging for data corruption.
>
> So, bring on the comments. at least I now know you guys aren't shy.

I still say drop this to userspace, and have it do the power down of the
slot.
Otherwise this will not work with any other PCI hotplug driver. It
also requires that the PPC64 pci hotplug driver be built into your
kernel, which will not be true for any vendor kernel. Also, I would
never allow the generic "disable_slot" symbol to become global in the
kernel, that's just bad form :)

You can also do the "is this an ethernet device or not" type of checking
in userspace, which is the proper place for it too.

Oh, what happens if that ethernet device contained some NFS mounts or
an iSCSI device?

thanks,

greg k-h

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From linas at austin.ibm.com Thu Mar 18 09:51:47 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Wed, 17 Mar 2004 16:51:47 -0600
Subject: more eeh
In-Reply-To: <20040317172327.GA18810@kroah.com>; from greg@kroah.com on Wed, Mar 17, 2004 at 09:23:27AM -0800
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com> <20040317172327.GA18810@kroah.com>
Message-ID: <20040317165147.F33924@forte.austin.ibm.com>

On Wed, Mar 17, 2004 at 09:23:27AM -0800, Greg KH wrote:
>
> On Tue, Mar 16, 2004 at 06:40:53PM -0600, Nathan Fontenot wrote:
> > Hope I don't spoil anyones dinner with another dose of eeh :)
> >
> > I have attached a patch that generates the hotplug event in the
> > kernel. At least it's supposed to do that. This eliminates the
> > need for any kind of an eeh daemon and any /proc usage (two good
> > things).
>
> No, you aren't generating a hotplug event here, you are instantly
> shutting the power to the device off, after telling the driver bound to
> the device to disconnect. Is that what you really want to do? It's
> quite severe, and is a pretty harsh policy.

Yes, it's harsh. This is the 'short-term' solution. Hope to have
something better (a lot better) later.

> Think scsi devices with lots of filesystems mounted. boom.
> Think multiport ethernet devices with loads of network traffic going
> over the other ethernet devices. boom.

Yes, boom.
Currently, it's a kernel panic, which is even worse.
Nathan was trying to at least get rid of the kernel panic, so
that at least the system can limp for just long enough for the
sysadmin to do something.

> Why not do this
> - get eeh event
> - determine which pci_dev this happened to.
> - switch back to a task context
> - call kobject_hotplug for the pci_dev with the action="fault"
> - put a script in /etc/hotplug.d/pci/ that catches all
>   ACTION=fault events and decides what to do with them. You

Well, there are some subtle points that make this complicated.

1) They're not 'events' in the sense of being interrupts or messages
   or something like that. By the time the linux kernel finds out
   about it, in interrupt or task context, the eeh hardware has already
   off-lined the adapter. An adapter that is offlined by eeh hardware
   returns -1 on reads and ignores all writes. An adapter that has
   the power turned off returns -1 on reads and ignores all writes.
   So, in this certain narrow sense, turning off the power is a no-op
   as far as hardware behaviour is concerned.

2) you're right, paulus is right, most of the recovery and etc. needs
   to happen in a task context. For the 'ultimate' solution, I was
   thinking a kernel daemon; but maybe something else is possible.

3) We know that some fraction of EEH events are perma-failures
   (hardware is busted), and these need to trickle up to user scripts,
   presumably exactly with the scenario you describe.

   We also know that some are one-shot parity errors that can be
   transparently recovered from. For the latter, I was really hoping
   for a design that reset/restarted in the device driver, and the
   higher layers (block device/sockets) aren't even aware that
   there was a momentary interruption of service.

But at this time, that's not in this current patch.

--linas

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/ From paulus at samba.org Thu Mar 18 09:54:21 2004 From: paulus at samba.org (Paul Mackerras) Date: Thu, 18 Mar 2004 09:54:21 +1100 Subject: more eeh In-Reply-To: <20040317172327.GA18810@kroah.com> References: <1079484053.8840.99.camel@mudbug.austin.ibm.com> <20040317172327.GA18810@kroah.com> Message-ID: <16472.55069.619464.519643@cargo.ozlabs.ibm.com> Greg KH writes: > No, you aren't generating a hotplug event here, you are instantly > shutting the power to the device off, after telling the driver bound to > the device to disconnect. Is that what you really want to do? It's > quite severe, and is a pretty harsh policy. It's the hardware and firmware designers that have taken this policy. The question is: what do you do with the sort of PCI errors that would normally result in assertion of the SERR# (system error) line, such as an address parity error? On a desktop system you can make the SERR# signal cause a machine check, but you don't want to do that on a partitioned system since that would stop all the partitions, not just the one that was using the device in question. So the scheme that the hardware designers came up with was to add logic to the PCI-PCI bridges (we have one per slot, to support hotplug) to allow a slot to be electrically isolated from the rest of the system. Then, if the system detects an address parity error on a DMA transaction initiated by a particular device, it can just abort that transaction and isolate that device immediately, and thus stop the error from affecting any other part of the system. When the slot is in this state, any writes to the device get thrown away and any reads return all 1's. There are in fact ways to get through the bridge, via firmware calls, which the driver can use to (for example) dump the state of the device. There are also firmware calls to reset the device and to restore the normal connection through the PCI-PCI bridge. 
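As an illustration of the all-1's behaviour described above, here is a minimal user-space sketch (not taken from any patch in this thread) of a polling loop that gives up instead of spinning when a read comes back as all 1's. Everything in it is an assumption made for the example: fake_readl() stands in for a real MMIO read such as readl(), and slot_isolated, READY_BIT, MAX_POLLS and the return codes are hypothetical names.

```c
#include <stdint.h>

#define ALL_ONES   0xFFFFFFFFu  /* what a read returns once the slot is isolated */
#define READY_BIT  0x1u         /* hypothetical "device ready" status bit */
#define MAX_POLLS  1000         /* hypothetical upper bound on polling */

/* Hypothetical flag simulating the bridge isolating the slot. */
int slot_isolated;

/* Hypothetical stand-in for an MMIO status read (readl() in a real driver). */
uint32_t fake_readl(void)
{
	return slot_isolated ? ALL_ONES : READY_BIT;
}

/*
 * Poll for the ready bit with a bounded loop.
 * Returns 0 when ready, -1 if the read looks like an isolated slot,
 * and -2 if the device never became ready within MAX_POLLS reads.
 */
int wait_ready(void)
{
	int i;

	for (i = 0; i < MAX_POLLS; i++) {
		uint32_t v = fake_readl();

		if (v == ALL_ONES)
			return -1;	/* possibly isolated: stop, do not spin */
		if (v & READY_BIT)
			return 0;
	}
	return -2;
}
```

With slot_isolated set to 0 the call returns 0; with it set to 1 the call returns -1 on the first read. A real driver would also have to tell a legitimate all-1's register value apart from isolation, which this sketch does not attempt.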
The idea of presenting this to drivers as a hot-unplug event followed by a hot-plug event (after the device has been reset and reconnected) was my suggestion as the best way to present to the drivers what the hardware is doing. I envisaged three classes of drivers: (a) those that were very pSeries-specific and could use a pSeries-specific API to cope with all this; (b) drivers that could cope with asynchronous plug and unplug events, to which the EEH shenanigans could be presented as plug/unplug events, and (c) drivers which couldn't cope at all. My hope was that a lot of drivers could be in class (b). I was hoping that most hot-plug aware drivers could be hardened sufficiently to be in class (b) without too much effort, and that that hardening would be acceptable to the driver maintainers (whereas the changes to put a driver in class (a) would, I expect, not be acceptable). The main things are that the device can be unplugged without prior notification and that the driver has to not do anything silly (like spinning forever) if reads from the device start returning all 1's. I was thinking that the unplug event generation, resetting and reconnecting of the device, and plug event generation would be done by a kernel thread. I don't think we want to rely on userspace for that, because userspace may get blocked while the device is gone. > Think scsi devices with lots of filesystems mounted. boom. > Think multiport ethernet devices with loads of network traffic going > over the other ethernet devices. boom. Well yes. At least with network devices, if they get unplugged, reset and replugged, we have the chance for the hotplug scripts to restore the correct addresses and routes, based on the device's MAC address. For scsi host adaptors, it's less pretty. There might be an argument for writing a class (a) driver for the scsi HBA for your root disk. Such a driver could present the whole EEH disconnect/reset/reconnect thing to the SCSI subsystem as a bus reset, for example. 
> It's also not going to work, as you are doing this from interrupt > context, and the pci disconnect sequence is expecting to have a task > context and will sleep. > > Why not do this (as this is what I think Anton was suggesting you do): > - get eeh event > - determine which pci_dev this happened to. > - switch back to a task context > - call kobject_hotplug for the pci_dev with the action="fault" > - put a script in /etc/hotplug.d/pci/ that catches all > ACTION=fault events and decides what to do with them. You > have a full pointer to the sysfs directory of the pci device > at this moment in time, so you can see what driver is bound to > the device, and if you really want to, you can turn the device > off (after bringing down the network connection or unmounting > any attached filesystems.) > > This pushes all of your policy to userspace, allows you to fit into the > proper kernel event notifier, and allows you to write a shell script if > you want to do so. > > And it makes the kernel code a whole lot smaller and simpler. > > Sound good? I would rather get the notification to the driver quickly without relying on userspace (but of course from task context not interrupt context). What happens after that could be driven by userspace, except that I worry about what happens if userspace gets blocked by the device being unavailable. Greg, I would really value your considered thoughts about how to handle this stuff properly. EEH is a fact of life for us - I don't want to defend the approach, but it is in hardware today and we have to deal with it. Thanks, Paul. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From ahuja at austin.ibm.com Thu Mar 18 11:01:22 2004 From: ahuja at austin.ibm.com (ahuja at austin.ibm.com) Date: Wed, 17 Mar 2004 18:01:22 -0600 (CST) Subject: cpu utilization monitor. In-Reply-To: Message-ID: I made some of the changes requested... Will try and incorporate other requests in future versions of this file. 
Cheers,
Manish

On Tue, 16 Mar 2004 ahuja at forte.austin.ibm.com wrote:

>
>
> This patch adds the framework required by performace team and on demand
> computing. At this point only the important bits/framework are covered.
>
> All the kobjects/calculations are yet to be written.
>
> We are still contiunuing to disscuss methods for performace monitoring.
>
> Purr is a vcpu performance counter. This collects purr/tb
> periodically to be used later in computations from any given
> cpu without needing to acquire a cpu exclusively.
>
> Thanks,
> Manish
>
> p.s My first patch, so go easy guys...
>
-------------- next part --------------
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	ChangeSet			1.1506 -> 1.1508
#	arch/ppc64/kernel/smp.c		1.68 -> 1.70
#	arch/ppc64/kernel/Makefile	1.40 -> 1.41
#	(new)				-> 1.2 arch/ppc64/kernel/profile.h
#	(new)				-> 1.2 arch/ppc64/kernel/profile.c
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 04/03/16	ahuja at threadlp13.austin.ibm.com	1.1507
# Added new functionality for performance monitoring mahuja at us.ibm.com
# --------------------------------------------
# 04/03/17	ahuja at threadlp13.austin.ibm.com	1.1508
# smp.c:
#   Made it a function and invoking that only...
# profile.h:
#   Changed my timer & made unbsigned long long to u64
# profile.c:
#   Removed all refs to my_timer
# --------------------------------------------
#
diff -Nru a/arch/ppc64/kernel/Makefile b/arch/ppc64/kernel/Makefile
--- a/arch/ppc64/kernel/Makefile	Wed Mar 17 17:49:43 2004
+++ b/arch/ppc64/kernel/Makefile	Wed Mar 17 17:49:43 2004
@@ -43,7 +43,7 @@
 obj-$(CONFIG_PPC_RTAS)		+= rtas-proc.o
 obj-$(CONFIG_SCANLOG)		+= scanlog.o
 obj-$(CONFIG_VIOPATH)		+= viopath.o
-obj-$(CONFIG_LPARCFG)		+= lparcfg.o
+obj-$(CONFIG_LPARCFG)		+= lparcfg.o profile.o
 obj-$(CONFIG_HVC_CONSOLE)	+= hvconsole.o
 obj-$(CONFIG_BOOTX_TEXT)	+= btext.o
diff -Nru a/arch/ppc64/kernel/profile.c b/arch/ppc64/kernel/profile.c
--- /dev/null	Wed Dec 31 16:00:00 1969
+++ b/arch/ppc64/kernel/profile.c	Wed Mar 17 17:49:43 2004
@@ -0,0 +1,95 @@
+/*
+ * PPC64 Cpu util performace monitoring.
+ *
+ * Manish Ahuja mahuja at us.ibm.com
+ * Copyright (c) 2004 Manish Ahuja IBM CORP.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This file will also report many of the perf values for 2.6
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "profile.h"
+
+#define SAMPLE_TICK HZ
+
+DEFINE_PER_CPU(struct cpu_util_store, cpu_util_sampler);
+
+/*
+ * This is a timer handler. There is on per CPU. It gets scheduled
+ * every SAMPLE_TICK ticks.
+ */
+
+static void util_timer_func(unsigned long data)
+{
+	struct cpu_util_store * cus = &__get_cpu_var(cpu_util_sampler);
+	struct timer_list *tl = &cus->cpu_util_timer;
+
+	if (PVR_VER(systemcfg->processor) == PV_POWER5) {
+		cus->current_purr = mfspr(PURR);
+		cus->tb = mftb();
+	}
+	/*printk(KERN_INFO "PURR VAL %ld %lld %lld\n", data, cus->current_purr, cus->tb);*/
+
+	mod_timer(tl, jiffies + SAMPLE_TICK);
+}
+
+/*
+ * One time function that gets called when all the cpu's are online
+ * to start collection. It adds the timer to each cpu on the system.
+ * start_purr is collected during smp_init time in __cpu_up code
+ */
+
+static void start_util_timer(int cpu)
+{
+	struct cpu_util_store * cus = &per_cpu(cpu_util_sampler, cpu);
+	struct timer_list *tl = &cus->cpu_util_timer;
+
+	if (tl->function != NULL)
+		return;
+
+	init_timer(tl);
+	tl->expires = jiffies + SAMPLE_TICK;
+	tl->data = cpu;
+	tl->function = util_timer_func;
+	add_timer_on(tl, cpu);
+}
+
+int __init cpu_util_init(void)
+{
+	int cpu;
+
+	for (cpu = 0; cpu < NR_CPUS; cpu++) {
+		if (cpu_online(cpu))
+			start_util_timer(cpu);
+	}
+
+	return 0;
+}
+
+__initcall(cpu_util_init);
+
+/* Collect starting purr */
+
+void collect_startpurr(int cpu)
+{
+	struct cpu_util_store * cus = &per_cpu(cpu_util_sampler, cpu);
+
+	if (PVR_VER(systemcfg->processor) == PV_POWER5) {
+		cus->start_purr = mfspr(PURR);
+		cus->tb = mftb();
+	}
+}
+
diff -Nru a/arch/ppc64/kernel/profile.h b/arch/ppc64/kernel/profile.h
--- /dev/null	Wed Dec 31 16:00:00 1969
+++ b/arch/ppc64/kernel/profile.h	Wed Mar 17 17:49:43 2004
@@ -0,0 +1,28 @@
+/*
+ * Copyright (c) 2004 Manish Ahuja
+ *
+ * Module name: profile.h
+ *
+ * Description:
+ *   Architecture- / platform-specific boot-time initialization code for
+ *   tracking purr utilization and other performace features in coming
+ *   releases for splpar/smt machines.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define PURR 309
+
+DECLARE_PER_CPU(struct cpu_util_store, cpu_util_sampler);
+
+struct cpu_util_store {
+	struct timer_list cpu_util_timer;
+	u64 start_purr;
+	u64 current_purr;
+	u64 tb;
+};
+
+void collect_startpurr(int cpu);
diff -Nru a/arch/ppc64/kernel/smp.c b/arch/ppc64/kernel/smp.c
--- a/arch/ppc64/kernel/smp.c	Wed Mar 17 17:49:43 2004
+++ b/arch/ppc64/kernel/smp.c	Wed Mar 17 17:49:43 2004
@@ -52,6 +52,7 @@
 #include
 #include
 #include
+#include "profile.h"
 #ifdef CONFIG_KDB
 #include
@@ -1001,6 +1002,9 @@
 	/* wake up cpus */
 	smp_ops->kick_cpu(cpu);
+
+	/* Collect starting purr */
+	collect_startpurr(cpu);
 	/*
 	 * wait to see if the cpu made a callin (is actually up).

From benh at kernel.crashing.org Thu Mar 18 11:43:34 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Thu, 18 Mar 2004 11:43:34 +1100
Subject: OpenFirmware devices and hotplug events
In-Reply-To: <20040317173428.GB19060@kroah.com>
References: <1079473212.8840.59.camel@mudbug.austin.ibm.com> <20040316214932.GA30202@kroah.com> <1079474407.20826.1088.camel@nighthawk> <20040316220928.GA30672@kroah.com> <20040316222544.GW19737@krispykreme> <20040316223712.GB31162@kroah.com> <4057F42B.7030706@austin.ibm.com> <1079540187.5789.26.camel@nighthawk> <20040317171027.GE17740@kroah.com> <1079544213.5789.190.camel@nighthawk> <20040317173428.GB19060@kroah.com>
Message-ID: <1079570613.909.14.camel@gaston>

> Yes, that's exactly what /sys/firmware is for. A userspace
> representation of the machine's firmware. Look at a i386 box running
> ACPI for an example of just that.

I'd prefer on-the-fly creation of dentries & inodes rather than
having it all in sysfs statically. BTW.
Doesn't sparc have a filesystem for exposing the firmware device-tree? Maybe that's worth looking at. Ben. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From greg at kroah.com Thu Mar 18 12:03:58 2004 From: greg at kroah.com (Greg KH) Date: Wed, 17 Mar 2004 17:03:58 -0800 Subject: OpenFirmware devices and hotplug events In-Reply-To: <1079570613.909.14.camel@gaston> References: <1079474407.20826.1088.camel@nighthawk> <20040316220928.GA30672@kroah.com> <20040316222544.GW19737@krispykreme> <20040316223712.GB31162@kroah.com> <4057F42B.7030706@austin.ibm.com> <1079540187.5789.26.camel@nighthawk> <20040317171027.GE17740@kroah.com> <1079544213.5789.190.camel@nighthawk> <20040317173428.GB19060@kroah.com> <1079570613.909.14.camel@gaston> Message-ID: <20040318010358.GA26162@kroah.com> On Thu, Mar 18, 2004 at 11:43:34AM +1100, Benjamin Herrenschmidt wrote: > > > Yes, that's exactly what /sys/firmware is for. A userspace > > representation of the machine's firmware. Look at a i386 box running > > ACPI for an example of just that. > > I'd prefer on-the-fly creation of dentries & inodes rather than > having it all in sysfs statically. BTW. Doesn't sparc have a > filesystem for exposing the firmware device-tree? Maybe that's > worth looking at. There are some patches out there that do just that for sysfs, but they still need work before making it into the mainline kernel. thanks, greg k-h ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Thu Mar 18 18:40:17 2004 From: anton at samba.org (Anton Blanchard) Date: Thu, 18 Mar 2004 18:40:17 +1100 Subject: [anton@samba.org: Recent input patch broke my keyboard] Message-ID: <20040318074017.GI28212@krispykreme> FYI. You may need to back this patch out if you are running a recent BK on a ppc64 box with a PC keyboard.
Anton ----- Forwarded message from Anton Blanchard ----- From: Anton Blanchard To: vojtech at suse.cz Cc: torvalds at osdl.org, linux-kernel at vger.kernel.org Subject: Recent input patch broke my keyboard Hi, The patch below breaks my ppc64 box. None of the keys behave as expected :) I also get a bunch of stuff in the dmesg: atkbd.c: Use 'setkeycodes 66 ' to make it known. atkbd.c: Unknown key pressed (translated set 2, code 0x66 on isa0060/serio0). The boot messages show: serio: i8042 AUX port at 0x60,0x64 irq 12 input: PS/2 Logitech Mouse on isa0060/serio1 serio: i8042 KBD port at 0x60,0x64 irq 1 input: AT Translated Set 2 keyboard on isa0060/serio0 If I back the patch out, things work again and I get: serio: i8042 AUX port at 0x60,0x64 irq 12 input: PS/2 Logitech Mouse on isa0060/serio1 serio: i8042 KBD port at 0x60,0x64 irq 1 input: AT Raw Set 2 keyboard on isa0060/serio0 Sounds like assuming we are always in translate mode is bad for me. Anton # This is a BitKeeper generated diff -Nru style patch. # # ChangeSet # 2004/03/03 15:14:01+01:00 vojtech at suse.cz # input: i8042.c: # Assume the chip always is in XLATE mode, even when it doesn't # have the XLATE bit set - apparently IBM PS/2 model 70 behaves # this way. # # drivers/input/serio/i8042.c # 2004/03/03 15:13:56+01:00 vojtech at suse.cz +0 -8 # input: i8042.c: # Assume the chip always is in XLATE mode, even when it doesn't # have the XLATE bit set - apparently IBM PS/2 model 70 behaves # this way. # diff -Nru a/drivers/input/serio/i8042.c b/drivers/input/serio/i8042.c --- a/drivers/input/serio/i8042.c Thu Mar 18 15:06:59 2004 +++ b/drivers/input/serio/i8042.c Thu Mar 18 15:06:59 2004 @@ -722,14 +722,6 @@ } /* - * If the chip is configured into nontranslated mode by the BIOS, don't - * bother enabling translating and be happy. - */ - - if (~i8042_ctr & I8042_CTR_XLATE) - i8042_direct = 1; - -/* * Set nontranslated mode for the kbd interface if requested by an option. 
* After this the kbd interface becomes a simple serial in/out, like the aux * interface is. We don't do this by default, since it can confuse notebook ----- End forwarded message ----- ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From haveblue at us.ibm.com Fri Mar 19 04:35:03 2004 From: haveblue at us.ibm.com (Dave Hansen) Date: Thu, 18 Mar 2004 09:35:03 -0800 Subject: cpu utilization monitor. In-Reply-To: References: Message-ID: <1079631303.21552.15.camel@nighthawk> I've thought about this a little bit more. Do you really need to be calling collect_startpurr() from __cpu_up()? Why can't that be done from the initcall before you start the timer? I don't see start_purr actually used anywhere, so I'm not quite sure what it will be used for. Although it is __init, should cpu_util_init() be static too? -- dave ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Fri Mar 19 08:35:41 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Thu, 18 Mar 2004 15:35:41 -0600 Subject: KDB updates In-Reply-To: <20040318141647.GA6238@in.ibm.com>; from ananth@in.ibm.com on Thu, Mar 18, 2004 at 07:16:47PM +0500 References: <20040318141647.GA6238@in.ibm.com> Message-ID: <20040318153540.I33924@forte.austin.ibm.com> On Thu, Mar 18, 2004 at 07:16:47PM +0500, Ananth N Mavinakayanahalli wrote: > Hello Linas, > > We are working with your patch (now in ameslab) and found an issue with > stack backtrace for the "current" process. Inlined is a patch that fixes > it (uses a method similar to the i386 code) and it also contains a few > other checks. Please comment.... > > (Oh, I had to define irq_affinity[NR_IRQS] as extern in xics.c for a > successful UNI build) > > We have been working on a Power3 box all this while and KDB works fine. > We are now trying it on a p630 but the machine locks up solid immediately > upon entry; we have to reset it through the hmc console. 
> > Just curious if you had the patch working on SMP? Are there any other > issues to take care of in this case? Any pointers would be helpful. > > > Thanks, > Ananth > -- > Ananth Narayan > Linux Technology Center, > IBM Software Lab, INDIA > > > diff -Naur --exclude=BitKeeper --exclude=SCCS temp/ameslab/arch/ppc64/kdb/kdba_bp.c ameslab/arch/ppc64/kdb/kdba_bp.c > --- temp/ameslab/arch/ppc64/kdb/kdba_bp.c 2004-03-15 10:03:56.000000000 +0530 > +++ ameslab/arch/ppc64/kdb/kdba_bp.c 2004-03-16 14:46:47.000000000 +0530 > @@ -92,6 +92,9 @@ > unsigned long primary; > unsigned long extended; > > + if (KDB_NULL_REGS(ef)) > + return KDB_DB_NOBPT; > + > msr = get_msr(); > trap = ef->trap; > if (KDB_DEBUG(BP)) > @@ -187,9 +190,6 @@ > if (rv > 0) > goto handled; > > - goto handle; > - > - > handle: > > /* > @@ -271,6 +271,8 @@ > kdb_dbtrap_t rv; > kdb_bp_t *bp; > > + if (KDB_NULL_REGS(ef)) > + return KDB_DB_NOBPT; > /* > * Determine which breakpoint was encountered. > */ > @@ -294,6 +296,13 @@ > kdb_id1(ef->nip); > rv = KDB_DB_BPT; > bp->bp_delay = 1; > + /* SSBPT is set when the kernel debugger must single > + * step a task in order to re-establish an instruction > + * breakpoint which uses the instruction replacement > + * mechanism. It is cleared by any action that removes > + * the need to single-step the breakpoint > + */ > + KDB_STATE_SET(SSBPT); > break; > } > } > @@ -324,7 +333,7 @@ > static void > kdba_handle_bp(kdb_eframe_t ef, kdb_bp_t *bp) > { > - if (!ef) { > + if (KDB_NULL_REGS(ef)) { > kdb_printf("kdba_handle_bp: ef == NULL\n"); > return; > } > @@ -337,12 +346,6 @@ > */ > kdba_setsinglestep(ef); > > - /* KDB_STATE_SSBPT is set when the kernel debugger must single step > - * a task in order to re-establish an instruction breakpoint which > - * uses the instruction replacement mechanism. 
> - */ > - KDB_STATE_SET(SSBPT); > - > /* > * Reset delay attribute > */ > @@ -665,7 +668,9 @@ > * > * For instruction replacement breakpoints, we must single-step > * over the replaced instruction at this point so we can re-install > - * the breakpoint instruction after the single-step. > + * the breakpoint instruction after the single-step. SSBPT is set > + * when the breakpoint is initially hit and is cleared by any action > + * that removes the need for single-step over the breakpoint. > */ > > int > @@ -679,6 +684,8 @@ > if (KDB_DEBUG(BP)) { > kdb_printf("kdba_installbp bp_installed %d\n", bp->bp_installed); > } > + if (!KDB_STATE(SSBPT)) > + bp->bp_delay = 0; > if (!bp->bp_installed) { > if (bp->bp_hardtype) { > kdba_installdbreg(bp); > @@ -695,14 +702,15 @@ > if (KDB_DEBUG(BP)) > kdb_printf("0x%lx 0x%lx 0x%lx\n",bp->bp_inst,bp->bp_addr,sizeof(bp->bp_addr)); > rc = kdb_getword(&bp->bp_inst, bp->bp_addr,sizeof(bp->bp_addr)); > - kdb_putword(bp->bp_addr, PPC64_BREAKPOINT_INSTRUCTION,sizeof(PPC64_BREAKPOINT_INSTRUCTION)); > + if (kdb_putword(bp->bp_addr, PPC64_BREAKPOINT_INSTRUCTION,sizeof(PPC64_BREAKPOINT_INSTRUCTION))) > + return (1); > if (KDB_DEBUG(BP)) > kdb_printf("kdba_installbp instruction 0x%x at " kdb_bfd_vma_fmt "\n", > PPC64_BREAKPOINT_INSTRUCTION, bp->bp_addr); > bp->bp_installed = 1; > } > } > -return 0; > + return 0; > } > > /* > diff -Naur --exclude=BitKeeper --exclude=SCCS temp/ameslab/arch/ppc64/kdb/kdba_id.c ameslab/arch/ppc64/kdb/kdba_id.c > --- temp/ameslab/arch/ppc64/kdb/kdba_id.c 2004-03-15 10:03:56.000000000 +0530 > +++ ameslab/arch/ppc64/kdb/kdba_id.c 2004-03-16 14:33:30.000000000 +0530 > @@ -194,8 +194,6 @@ > int > kdba_id_parsemode(const char *mode, disassemble_info *dip) > { > - > - > return 0; > } > > diff -Naur --exclude=BitKeeper --exclude=SCCS temp/ameslab/arch/ppc64/kdb/kdbasupport.c ameslab/arch/ppc64/kdb/kdbasupport.c > --- temp/ameslab/arch/ppc64/kdb/kdbasupport.c 2004-03-15 10:03:56.000000000 +0530 > +++ 
ameslab/arch/ppc64/kdb/kdbasupport.c 2004-03-16 10:17:26.000000000 +0530 > @@ -516,7 +516,9 @@ > ef = &regs; > } > cpus_in_kdb++; > + kdb_save_running(ef); > rv = kdb_main_loop(reason, reason2, error, db_result, ef); > + kdb_unsave_running(ef); > cpus_in_kdb--; > return rv; > } > diff -Naur --exclude=BitKeeper --exclude=SCCS temp/ameslab/arch/ppc64/kernel/xics.c ameslab/arch/ppc64/kernel/xics.c > --- temp/ameslab/arch/ppc64/kernel/xics.c 2004-03-15 10:03:58.000000000 +0530 > +++ ameslab/arch/ppc64/kernel/xics.c 2004-03-16 09:40:21.000000000 +0530 > @@ -238,6 +238,7 @@ > > /* XXX Fix this when we clean up large irq support */ > extern cpumask_t get_irq_affinity(unsigned int irq); > +extern cpumask_t irq_affinity[NR_IRQS]; > > static int get_irq_server(unsigned int irq) > { > --- temp/ameslab/arch/ppc64/kdb/kdba_bt.c 2004-03-15 10:03:56.000000000 +0530 > +++ ameslab/arch/ppc64/kdb/kdba_bt.c 2004-03-17 13:39:53.000000000 +0530 > @@ -113,12 +113,6 @@ > unsigned long symsize,symoffset; > char *symmodname; > > - if (!regs && !addr) > - { > - kdb_printf(" invalid regs pointer \n"); > - return 0; > - } > - > /* > * The caller may have supplied an address at which the > * stack traceback operation should begin. This address > @@ -129,12 +123,29 @@ > * entitled '' was called from the function which > * contains return-address.
> */ > - if (addr) { > + if (!addr) > + addr = (kdb_machreg_t *)p->thread.ksp; > + > + if (addr && !task_curr(p)) { > eip = 0; > esp = *addr; > - ebp=0; > + ebp = 0; > } else { > - ebp=regs->link; > + if (task_curr(p)) { > + struct kdb_running_process *krp = kdb_running_process + task_cpu(p); > + if (!krp->seqno) { > + kdb_printf("Process did not save state, cannot backtrace \n"); > + kdb_ps1(p); > + return 0; > + } > + regs = krp->regs; > + } else { > + if (!regs) > + regs = p->thread.regs; > + } > + if (KDB_NULL_REGS(regs)) > + return KDB_BADREG; > + ebp = regs->link; > eip = regs->nip; > if (regs_esp) > esp = regs->gpr[1]; > @@ -318,5 +329,5 @@ > int > kdba_bt_process(struct task_struct *p, int argcount) > { > - return (kdba_bt_stack_ppc(p->thread.regs, (kdb_machreg_t *) p->thread.ksp, argcount, p, 0)); > + return (kdba_bt_stack_ppc(NULL, NULL, argcount, p, 0)); > } ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Fri Mar 19 08:55:27 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Thu, 18 Mar 2004 15:55:27 -0600 Subject: KDB updates In-Reply-To: <20040318141647.GA6238@in.ibm.com>; from ananth@in.ibm.com on Thu, Mar 18, 2004 at 07:16:47PM +0500 References: <20040318141647.GA6238@in.ibm.com> Message-ID: <20040318155527.J33924@forte.austin.ibm.com> Hi, I am cc'ing the public mailing list since I figure this might be of general interest. On Thu, Mar 18, 2004 at 07:16:47PM +0500, Ananth N Mavinakayanahalli wrote: > > We are working with your patch (now in ameslab) and found an issue with > stack backtrace for the "current" process. Inlined is a patch that fixes > it (uses a method similar to the i386 code) and it also contains a few > other checks. Please comment.... > We have been working on a Power3 box all this while and KDB works fine. > We are now trying it on a p630 but the machine locks up solid immediately > upon entry; we have to reset it through the hmc console. 
> > Just curious if you had the patch working on SMP? Are there any other > issues to take care of in this case? Any pointers would be helpful. Works on power3 smp for me; I will try power4 lpar when I get the chance. ------ I'm about to try your patch. In the meantime, there are several other bugs that should be fixed. -- It takes 10-15 seconds between 'startKDB' and getting the prompt. I think there is something wrong about the way the IPI to stop the other cpus is handled. I tried reading the code but couldn't find the problem. -- When started with 'startKDB' it seems to work, but when started via 'little yellow button', the 'go' command doesn't seem to be handled. It seems like the linux kernel oops handler also wants to run, and that handler doesn't correctly handle the system reset interrupt, at which point the machine powers off. (That's wrong, the system reset interrupt is fully recoverable, and the debugger should allow system to resume where it was before the system reset.) I think the bug is related to arch/ppc64/kernel/traps.c:173 if (!debugger(regs)) die("System Reset", regs, 0); Either those lines are wrong, or kdb should be returning a non-zero return code, I'm not sure which. Note also some print garbage should be cleaned up.
panic:~ # panic:~ # panic:~ # O:[ aStytsetnetmi oRne]s32et00, KsDiBg :C a0l l[ # 1 ] S M NR_CPUS=32 NIP: C0000000000145B0 XER: 0000000020000000 LR: C0000000000145E0 REGS: c00000003ff93b60 TRAP: 0100 Not tainted (2.6.5-rc1-ames) MSR: a000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 TASK: c00000003ff96e00[0] 'swapper' THREAD: c00000003ff90000 CPU: 1 GPR00: 0000000000000000 C00000003FF93DE0 C0000000006D5230 C00000003F5A27D8 GPR04: C00000003FF972B0 0000000000000001 0000000022014852 0000000000000000 GPR08: 0000000002B15480 C00000003FF90000 C0000000006D3008 C00000003FF90000 GPR12: 0000000024022822 C0000000004B2000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 FFFFFFFF8AC00300 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001 GPR24: 0000000000000001 0000000000000010 0000000000000010 C00000003FF90010 GPR28: C00000003FF90000 0000000000000008 C00000003FF90000 C00000003FF90010 NIP [c0000000000145b0] .default_idle+0x78/0xc4 LR [c0000000000145e0] .default_idle+0xa8/0xc4 Call Trace: [c000000000014860] .cpu_idle+0x2c/0x44 [c000000000043070] .start_secondary+0xf0/0x130 [c00000000000bb94] .enable_64b_mode+0x0/0x28 2 cpus are not in kdb, their state is unknown Entering kdb (current=0xc00000000053ef30, pid 0) on processor 0 due to KDB_ENTE)[0]kdb> [0]kdb> [0]kdb> go [attention]3300 KDB Done Oops: System Reset, sig: 0 [#2] SMP NR_CPUS=32 NIP: C0000000000145BC XER: 0000000000000000 LR: C0000000000145E0 REGS: c0000000004a0K erTRnAelP :p a01n0ic0: A Ntotetm pttaeindt etdo s! 
kMSRIn: ai0d0l0e0 0t0a0s0k 0-0 0n9o0t3 2 sEyEn:c i1n gP : 0 FP: 0 ME: 1 IR/DR: 11 R TASK: c00000000053ef30[0] 'swapper' THREAD: c0000000004ac000 CPU: 0 GPR00: 0000000000000000 C0000000004AFD50 C0000000006D5230 0000000000000000 GPR04: C00000000053F3E0 0000000000000001 0000000028000022 0000000000000000 GPR08: 0000000002B0D480 C0000000004AC000 C0000000006D3008 C0000000004AC000 GPR12: 0000000028004488 C0000000004B0000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000001410000 GPR24: C0000000004B0000 0000000000000010 0000000000000010 C0000000004AC010 GPR28: C0000000004AC000 0000000000000008 C0000000004AC000 C0000000004AC010 NIP [c0000000000145bc] .default_idle+0x84/0xc4 LR [c0000000000145e0] .default_idle+0xa8/0xc4 Call Trace: [c000000000014860] .cpu_idle+0x2c/0x44 [c00000000000bf78] .rest_init+0x74/0x8c [c00000000045fabc] .start_kernel+0x274/0x2ec [c00000000000beec] .__setup_cpu_power3+0x0/0x4 Badness in do_unblank_screen at drivers/char/vt.c:2822 Call Trace: [c0000000001e5610] .bust_spinlocks+0x58/0x84 [c000000000011f54] .die+0xf4/0x184 [c0000000000121f8] .SystemResetException+0x74/0xb4 [c00000000000a0e0] SystemReset_common+0xe0/0x0 [c0000000000145e0] .default_idle+0xa8/0xc4 [c000000000014860] .cpu_idle+0x2c/0x44 [c00000000000bf78] .rest_init+0x74/0x8c [c00000000045fabc] .start_kernel+0x274/0x2ec [c00000000000beec] .__setup_cpu_power3+0x0/0x4 smp_call_function on cpu 1: other cpus not responding (0) kdb: Debugger re-entered on cpu 1, new reason = 12 Not executing a kdb command No longjmp available for recovery Cannot recover, allowing event to proceed 2 cpus are not in kdb, their state is unknown Entering kdb (current=0xc00000003ff96e00, pid 0) on processor 1 [1]kdb> [1]kdb> go Catastrophic error detected kdb_continue_catastrophic=0, type go a second time if you really want to contine[1]kdb> go Catastrophic error detected 
kdb_continue_catastrophic=0, attempting to continue ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Fri Mar 19 09:31:07 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Thu, 18 Mar 2004 16:31:07 -0600 Subject: [PATCH]: Re: KDB updates In-Reply-To: <20040318141647.GA6238@in.ibm.com>; from ananth@in.ibm.com on Thu, Mar 18, 2004 at 07:16:47PM +0500 References: <20040318141647.GA6238@in.ibm.com> Message-ID: <20040318163107.K33924@forte.austin.ibm.com> Paul, I don't have bk push authority to ameslab, can you commit the attached patch from Ananth? (unless Ananth writes back and says 'stop don't do it') I tested it on power3, it handles user-space stacks much better, and also fixes my '10 seconds to enter kdb' complaint. --linas On Thu, Mar 18, 2004 at 07:16:47PM +0500, Ananth N Mavinakayanahalli wrote: > > We are working with your patch (now in ameslab) and found an issue with > stack backtrace for the "current" process. Inlined is a patch that fixes > it (uses a method similar to the i386 code) and it also contains a few > other checks. Please comment.... > > (Oh, I had to define irq_affinity[NR_IRQS] as extern in xics.c for a > successful UNI build) > > We have been working on a Power3 box all this while and KDB works fine. > We are now trying it on a p630 but the machine locks up solid immediately > upon entry; we have to reset it through the hmc console. > > Just curious if you had the patch working on SMP? Are there any other > issues to take care of in this case? Any pointers would be helpful. > > > Thanks, > Ananth > -- > Ananth Narayan > Linux Technology Center, > IBM Software Lab, INDIA ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From nfont at austin.ibm.com Fri Mar 19 10:49:21 2004 From: nfont at austin.ibm.com (Nathan Fontenot) Date: Thu, 18 Mar 2004 17:49:21 -0600 Subject: eeh Message-ID: <1079653760.23533.9.camel@mudbug.austin.ibm.com> Just wanted to put out an updated version of the eeh patch, one that I can push to Ameslab, I hope. Thanks to everyone for their help in working on this code. This patch updates a few things. - The need for rpaphp to be built into the kernel is gone. Code was added to let rpaphp register a mechanism with eeh to do the disabling of the slot. - The disable_slot routine is back to static. As Greg pointed out, having a routine called disable_slot become a global symbol is not a good idea, especially since this is a very arch-specific routine. - Instances of __pa() are replaced with virt_to_phys(). - The slot_errbuf and slot_errbuf_lock are moved off of the stack. As always, all comments are welcome. Thanks. -Nathan F. -- -------------- next part -------------- A non-text attachment was scrubbed... Name: eeh.patch Type: text/x-patch Size: 8408 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040318/c06d638f/attachment.bin From anton at samba.org Fri Mar 19 10:52:59 2004 From: anton at samba.org (Anton Blanchard) Date: Fri, 19 Mar 2004 10:52:59 +1100 Subject: KDB updates In-Reply-To: <20040318155527.J33924@forte.austin.ibm.com> References: <20040318141647.GA6238@in.ibm.com> <20040318155527.J33924@forte.austin.ibm.com> Message-ID: <20040318235259.GO28212@krispykreme> > -- When started with 'startKDB' it seems to work, but when started via > 'little yellow button', the 'go' command doesn't seem to be handled. > It seems like the linux kernel oops handler also wants to run, and > that handler doesn't correctly handle the system reset interrupt, > at which point the machine powers off.
(That's wrong, the system > reset interrupt is fully recoverable, and the debugger should allow > system to resume where it was before the system reset.) > > I think the bug is related to arch/ppc64/kernel/traps.c:173 > if (!debugger(regs)) > die("System Reset", regs, 0); > > Either those lines are wrong, or kdb should be returning a non-zero > return code, I'm not sure which. It's a feature. xmon now has the ability to recover from any exception, e.g. if you get a NULL pointer deref you can fix the instruction and retry it. x will exit and retry, X will exit and pass the error on (i.e. it will oops). It might be useful if kdb got this technology. Anton ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From greg at kroah.com Fri Mar 19 11:01:16 2004 From: greg at kroah.com (Greg KH) Date: Thu, 18 Mar 2004 16:01:16 -0800 Subject: more eeh In-Reply-To: <16472.55069.619464.519643@cargo.ozlabs.ibm.com> References: <1079484053.8840.99.camel@mudbug.austin.ibm.com> <20040317172327.GA18810@kroah.com> <16472.55069.619464.519643@cargo.ozlabs.ibm.com> Message-ID: <20040319000116.GC17586@kroah.com> On Thu, Mar 18, 2004 at 09:54:21AM +1100, Paul Mackerras wrote: > > So the scheme that the hardware designers came up with was to add > logic to the PCI-PCI bridges (we have one per slot, to support > hotplug) to allow a slot to be electrically isolated from the rest of > the system. Then, if the system detects an address parity error on a > DMA transaction initiated by a particular device, it can just abort > that transaction and isolate that device immediately, and thus stop > the error from affecting any other part of the system. > > When the slot is in this state, any writes to the device get thrown > away and any reads return all 1's. Which is the same as PCMCIA sees when the device is disconnected, right?
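[Editorial sketch, not part of the original message.] The "reads return all 1's" behaviour described in the quoted text can be illustrated with a small stand-alone C fragment. Everything here is an assumption made for illustration: the helper names read_looks_isolated() and slot_suspect() are hypothetical and are not part of any kernel or EEH API, and the "N consecutive all-ones reads" policy is invented for the sketch. A real implementation would presumably confirm the slot state with firmware rather than rely on the read value alone, since all-ones can also be legitimate data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: an isolated (or absent) slot reads back as all 1's. */
static bool read_looks_isolated(uint32_t value)
{
    return value == UINT32_MAX;
}

/*
 * Hypothetical policy for this sketch only: a single all-ones read may be
 * valid data, so flag the slot as suspect only after `threshold`
 * consecutive all-ones reads.
 */
static bool slot_suspect(const uint32_t *reads, int count, int threshold)
{
    int run = 0;

    assert(threshold > 0);
    for (int i = 0; i < count; i++) {
        /* Extend the run on an all-ones read, otherwise reset it. */
        run = read_looks_isolated(reads[i]) ? run + 1 : 0;
        if (run >= threshold)
            return true;
    }
    return false;
}
```

The same check would apply equally to the PCMCIA-style "card pulled" case mentioned above, which is why the two situations look alike from the driver's side.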
> The idea of presenting this to drivers as a hot-unplug event followed > by a hot-plug event (after the device has been reset and reconnected) > was my suggestion as the best way to present to the drivers what the > hardware is doing. I envisaged three classes of drivers: (a) those > that were very pSeries-specific and could use a pSeries-specific API > to cope with all this; (b) drivers that could cope with asynchronous > plug and unplug events, to which the EEH shenanigans could be > presented as plug/unplug events, and (c) drivers which couldn't cope > at all. > > My hope was that a lot of drivers could be in class (b). They should be, if they work with PCI hotplug systems. Unfortunately a lot of SCSI drivers are still not there, but with 2.6 it's gotten a lot better. > I was hoping that most hot-plug aware drivers could be hardened > sufficiently to be in class (b) without too much effort, and that that > hardening would be acceptable to the driver maintainers I don't think anyone would disagree with this. > (whereas the changes to put a driver in class (a) would, I expect, not > be acceptable). Agreed. > I was thinking that the unplug event generation, resetting and > reconnecting of the device, and plug event generation would be done by > a kernel thread. I don't think we want to rely on userspace for that, > because userspace may get blocked while the device is gone. But you want userspace to do this. There are systems with a few different PCI Hotplug controller drivers on them. The different controller drivers control different slots. Userspace is the only place that can reliably handle this. And if you are a kernel thread, you would have the same issues that dropping to userspace and doing the disconnect there causes. So I still think that my userspace proposal is the proper way to do this. 
It works with all pci hotplug drivers, and allows userspace to implement any type of policy that it wishes to (disconnecting filesystems, bringing down network connections, logging the event to the proper place, etc.) > I would rather get the notification to the driver quickly without > relying on userspace (but of course from task context not interrupt > context). What happens after that could be driven by userspace, > except that I worry about what happens if userspace gets blocked by > the device being unavailable. You've never actually timed a hotplug event have you :) They are blindingly fast. So bloody fast that I had to put a lot of dumb logic in the hotplug and udev code to sit and spin and wait for the kernel to catch up. Now the issue of putting the hotplug script on a disk that just got a error would indicate that you really need a type (a) driver for that kind of thing. Hope this helps, greg k-h ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From benh at kernel.crashing.org Fri Mar 19 11:21:29 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 19 Mar 2004 11:21:29 +1100 Subject: more eeh In-Reply-To: <20040319000116.GC17586@kroah.com> References: <1079484053.8840.99.camel@mudbug.austin.ibm.com> <20040317172327.GA18810@kroah.com> <16472.55069.619464.519643@cargo.ozlabs.ibm.com> <20040319000116.GC17586@kroah.com> Message-ID: <1079655688.1947.76.camel@gaston> > > When the slot is in this state, any writes to the device get thrown > > away and any reads return all 1's. > > Which is the same as PCMCIA sees when the device is disconnected, right? Well, that's the "natural" thing to do with a disconnected/dead device indeed. ** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/ From paulus at samba.org Fri Mar 19 11:32:29 2004 From: paulus at samba.org (Paul Mackerras) Date: Fri, 19 Mar 2004 11:32:29 +1100 Subject: more eeh In-Reply-To: <20040319000116.GC17586@kroah.com> References: <1079484053.8840.99.camel@mudbug.austin.ibm.com> <20040317172327.GA18810@kroah.com> <16472.55069.619464.519643@cargo.ozlabs.ibm.com> <20040319000116.GC17586@kroah.com> Message-ID: <16474.16285.55962.349029@cargo.ozlabs.ibm.com> Greg KH writes: > > When the slot is in this state, any writes to the device get thrown > > away and any reads return all 1's. > > Which is the same as PCMCIA sees when the device is disconnected, right? Yes, exactly. > > I was thinking that the unplug event generation, resetting and > > reconnecting of the device, and plug event generation would be done by > > a kernel thread. I don't think we want to rely on userspace for that, > > because userspace may get blocked while the device is gone. > > But you want userspace to do this. There are systems with a few > different PCI Hotplug controller drivers on them. The different > controller drivers control different slots. Userspace is the only place > that can reliably handle this. I don't understand this; surely if you get a pointer to a pci_dev you can get to pci_dev->driver->remove and call that, can't you? Or are you saying that there is no consistent API to the drivers for the hot-plug capable PCI bridges? > And if you are a kernel thread, you would have the same issues that > dropping to userspace and doing the disconnect there causes. Not all of the same issues; a kernel thread doesn't have to worry about its code or data being paged out, for instance. > So I still think that my userspace proposal is the proper way to do > this. It works with all pci hotplug drivers, and allows userspace to > implement any type of policy that it wishes to (disconnecting > filesystems, bringing down network connections, logging the event to the > proper place, etc.) 
> > > I would rather get the notification to the driver quickly without > > relying on userspace (but of course from task context not interrupt > > context). What happens after that could be driven by userspace, > > except that I worry about what happens if userspace gets blocked by > > the device being unavailable. > > You've never actually timed a hotplug event have you :) Well, I would be concerned about the maximum latency, not the average latency. I accept that the average would be milliseconds, but the maximum could be tens of seconds on a heavily loaded system, couldn't it? Especially if it involves execing a new process and that requires disk I/O. > Now the issue of putting the hotplug script on a disk that just got a > error would indicate that you really need a type (a) driver for that > kind of thing. Part of my thinking is that I would like the API for type (a) drivers to be an extension of the PCI hotplug API rather than being completely disjoint. In other words, I would like the type (a) driver to get the unplug event, and then determine (via a special call, or a parameter to the remove() function) that this is an EEH event and therefore the absence of the device is likely to be transient. The driver would then not report the removal immediately, but would wait (with a timeout) for the device to come back. When it came back it would recognize that this is the same device, reinitialize it and carry on. If the device didn't come back shortly, then it would do the normal device removal things. In any case, whatever the API, we are going to have to have the infrastructure in the kernel to do the slot reset and reconnect, for type (a) drivers to use. Type (a) drivers need to be able to recover without relying on userspace, obviously. It doesn't make sense to me to have the same logic in two places, in the kernel and in userspace, and use one or the other depending on what sort of driver we have. Thoughts? Paul. ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/

From greg at kroah.com Fri Mar 19 11:50:26 2004
From: greg at kroah.com (Greg KH)
Date: Thu, 18 Mar 2004 16:50:26 -0800
Subject: more eeh
In-Reply-To: <16474.16285.55962.349029@cargo.ozlabs.ibm.com>
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com>
 <20040317172327.GA18810@kroah.com>
 <16472.55069.619464.519643@cargo.ozlabs.ibm.com>
 <20040319000116.GC17586@kroah.com>
 <16474.16285.55962.349029@cargo.ozlabs.ibm.com>
Message-ID: <20040319005026.GD19053@kroah.com>

On Fri, Mar 19, 2004 at 11:32:29AM +1100, Paul Mackerras wrote:
> > > I was thinking that the unplug event generation, resetting and
> > > reconnecting of the device, and plug event generation would be done by
> > > a kernel thread. I don't think we want to rely on userspace for that,
> > > because userspace may get blocked while the device is gone.
> >
> > But you want userspace to do this. There are systems with a few
> > different PCI Hotplug controller drivers on them. The different
> > controller drivers control different slots. Userspace is the only place
> > that can reliably handle this.
>
> I don't understand this; surely if you get a pointer to a pci_dev you
> can get to pci_dev->driver->remove and call that, can't you?

No, you need to call pci_remove_bus_device() for that device so that the
pci core properly cleans up after your device is gone. That's what the
pci hotplug controller drivers eventually call, after handling a bunch
of other housekeeping that their hardware requires (putting resources
back into a pool, etc.)

> Or are you saying that there is no consistent API to the drivers for
> the hot-plug capable PCI bridges?

There is. The drivers see the remove() call. The pci core needs to be
called with pci_remove_bus_device(). It's the pci hotplug controller
drivers that don't have a "consistent" interface yet, as I've never
thought to expose one. It wouldn't be that hard to do that in the pci
hotplug core if you really think it's necessary.
But realize that there is no universal way to get from a pci device to
the pci slot that it is associated with in order to turn it off through
the pci hotplug core. That's probably going to be your main stumbling
block.

> > And if you are a kernel thread, you would have the same issues that
> > dropping to userspace and doing the disconnect there causes.
>
> Not all of the same issues; a kernel thread doesn't have to worry
> about its code or data being paged out, for instance.

Good point, I didn't think of that.

> Well, I would be concerned about the maximum latency, not the average
> latency. I accept that the average would be milliseconds, but the
> maximum could be tens of seconds on a heavily loaded system, couldn't
> it? Especially if it involves execing a new process and that requires
> disk I/O.

Yes it could. But so could execution of a kernel thread, right? This
isn't a "real time" system by any means :)

> > Now the issue of putting the hotplug script on a disk that just got an
> > error would indicate that you really need a type (a) driver for that
> > kind of thing.
>
> Part of my thinking is that I would like the API for type (a) drivers
> to be an extension of the PCI hotplug API rather than being completely
> disjoint. In other words, I would like the type (a) driver to get the
> unplug event, and then determine (via a special call, or a parameter
> to the remove() function) that this is an EEH event and therefore the
> absence of the device is likely to be transient. The driver would
> then not report the removal immediately, but would wait (with a
> timeout) for the device to come back. When it came back it would
> recognize that this is the same device, reinitialize it and carry on.
> If the device didn't come back shortly, then it would do the normal
> device removal things.

Ok, that would work. It would take a bunch of driver tweaking, but I'll
leave that to you.
> In any case, whatever the API, we are going to have to have the
> infrastructure in the kernel to do the slot reset and reconnect, for
> type (a) drivers to use. Type (a) drivers need to be able to recover
> without relying on userspace, obviously. It doesn't make sense to me
> to have the same logic in two places, in the kernel and in userspace,
> and use one or the other depending on what sort of driver we have.

No, use the logic only in userspace, don't duplicate it in the kernel
too.

Remember, you are asking to put policy about EEH events into the kernel.
Do you really want that? If you can say that you will _always_ just
want to call release() on the driver, then I'll buy it and it will
remain a PPC64 specific "feature". If, on the other hand, you want to do
something that will work for all platforms, I suggest the userspace
hotplug method. That will work for everyone.

thanks,

greg k-h

From paulus at samba.org Fri Mar 19 13:05:11 2004
From: paulus at samba.org (Paul Mackerras)
Date: Fri, 19 Mar 2004 13:05:11 +1100
Subject: [PATCH]: Re: KDB updates
In-Reply-To: <20040318163107.K33924@forte.austin.ibm.com>
References: <20040318141647.GA6238@in.ibm.com>
 <20040318163107.K33924@forte.austin.ibm.com>
Message-ID: <16474.21847.546643.969930@cargo.ozlabs.ibm.com>

Linas,

> I don't have bk push authority to ameslab, can you commit the attached
> patch from Ananth? (unless Ananth writes back and says 'stop don't do it')
> I tested it on power3, it handles user-space stacks much better, and also
> fixes my '10 seconds to enter kdb' complaint.

-ENOPATCH :)

Paul.

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From ananth at in.ibm.com Fri Mar 19 20:12:11 2004
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Fri, 19 Mar 2004 14:12:11 +0500
Subject: [PATCH]: Re: KDB updates
In-Reply-To: <16474.21847.546643.969930@cargo.ozlabs.ibm.com>
References: <20040318141647.GA6238@in.ibm.com>
 <20040318163107.K33924@forte.austin.ibm.com>
 <16474.21847.546643.969930@cargo.ozlabs.ibm.com>
Message-ID: <20040319091211.GA7776@in.ibm.com>

On Fri, Mar 19, 2004 at 01:05:11PM +1100, Paul Mackerras wrote:
> Linas,
>
> > I don't have bk push authority to ameslab, can you commit the attached
> > patch from Ananth? (unless Ananth writes back and says 'stop don't do it')
> > I tested it on power3, it handles user-space stacks much better, and also
> > fixes my '10 seconds to enter kdb' complaint.
>
> -ENOPATCH :)

Inlined is the patch..

Anton, I needed to declare

	extern cpumask_t irq_affinity[NR_IRQS];

in xics.c for a successful "uni" build with ameslab on a Power3 system.
The inlined patch contains this addition too.
-- Ananth Narayan Linux Technology Center, IBM Software Lab, INDIA diff -Naur --exclude=BitKeeper --exclude=SCCS temp/ameslab/arch/ppc64/kernel/xics.c ameslab/arch/ppc64/kernel/xics.c --- temp/ameslab/arch/ppc64/kernel/xics.c 2004-03-15 10:03:58.000000000 +0530 +++ ameslab/arch/ppc64/kernel/xics.c 2004-03-16 09:40:21.000000000 +0530 @@ -238,6 +238,7 @@ /* XXX Fix this when we clean up large irq support */ extern cpumask_t get_irq_affinity(unsigned int irq); +extern cpumask_t irq_affinity[NR_IRQS]; static int get_irq_server(unsigned int irq) { diff -Naur --exclude=BitKeeper --exclude=SCCS temp/ameslab/arch/ppc64/kdb/kdba_bp.c ameslab/arch/ppc64/kdb/kdba_bp.c --- temp/ameslab/arch/ppc64/kdb/kdba_bp.c 2004-03-15 10:03:56.000000000 +0530 +++ ameslab/arch/ppc64/kdb/kdba_bp.c 2004-03-16 14:46:47.000000000 +0530 @@ -92,6 +92,9 @@ unsigned long primary; unsigned long extended; + if (KDB_NULL_REGS(ef)) + return KDB_DB_NOBPT; + msr = get_msr(); trap = ef->trap; if (KDB_DEBUG(BP)) @@ -187,9 +190,6 @@ if (rv > 0) goto handled; - goto handle; - - handle: /* @@ -271,6 +271,8 @@ kdb_dbtrap_t rv; kdb_bp_t *bp; + if (KDB_NULL_REGS(ef)) + return KDB_DB_NOBPT; /* * Determine which breakpoint was encountered. */ @@ -294,6 +296,13 @@ kdb_id1(ef->nip); rv = KDB_DB_BPT; bp->bp_delay = 1; + /* SSBPT is set when the kernel debugger must single + * step a task in order to re-establish an instruction + * breakpoint which uses the instruction replacement + * mechanism. It is cleared by any action that removes + * the need to single-step the breakpoint + */ + KDB_STATE_SET(SSBPT); break; } } @@ -324,7 +333,7 @@ static void kdba_handle_bp(kdb_eframe_t ef, kdb_bp_t *bp) { - if (!ef) { + if (KDB_NULL_REGS(ef)) { kdb_printf("kdba_handle_bp: ef == NULL\n"); return; } @@ -337,12 +346,6 @@ */ kdba_setsinglestep(ef); - /* KDB_STATE_SSBPT is set when the kernel debugger must single step - * a task in order to re-establish an instruction breakpoint which - * uses the instruction replacement mechanism. 
- */ - KDB_STATE_SET(SSBPT); - /* * Reset delay attribute */ @@ -665,7 +668,9 @@ * * For instruction replacement breakpoints, we must single-step * over the replaced instruction at this point so we can re-install - * the breakpoint instruction after the single-step. + * the breakpoint instruction after the single-step. SSBPT is set + * when the breakpoint is initially hit and is cleared by any action + * that removes the need for single-step over the breakpoint. */ int @@ -679,6 +684,8 @@ if (KDB_DEBUG(BP)) { kdb_printf("kdba_installbp bp_installed %d\n", bp->bp_installed); } + if (!KDB_STATE(SSBPT)) + bp->bp_delay = 0; if (!bp->bp_installed) { if (bp->bp_hardtype) { kdba_installdbreg(bp); @@ -695,14 +702,15 @@ if (KDB_DEBUG(BP)) kdb_printf("0x%lx 0x%lx 0x%lx\n",bp->bp_inst,bp->bp_addr,sizeof(bp->bp_addr)); rc = kdb_getword(&bp->bp_inst, bp->bp_addr,sizeof(bp->bp_addr)); - kdb_putword(bp->bp_addr, PPC64_BREAKPOINT_INSTRUCTION,sizeof(PPC64_BREAKPOINT_INSTRUCTION)); + if (kdb_putword(bp->bp_addr, PPC64_BREAKPOINT_INSTRUCTION,sizeof(PPC64_BREAKPOINT_INSTRUCTION))) + return (1); if (KDB_DEBUG(BP)) kdb_printf("kdba_installbp instruction 0x%x at " kdb_bfd_vma_fmt "\n", PPC64_BREAKPOINT_INSTRUCTION, bp->bp_addr); bp->bp_installed = 1; } } -return 0; + return 0; } /* diff -Naur --exclude=BitKeeper --exclude=SCCS temp/ameslab/arch/ppc64/kdb/kdba_id.c ameslab/arch/ppc64/kdb/kdba_id.c --- temp/ameslab/arch/ppc64/kdb/kdba_id.c 2004-03-15 10:03:56.000000000 +0530 +++ ameslab/arch/ppc64/kdb/kdba_id.c 2004-03-16 14:33:30.000000000 +0530 @@ -194,8 +194,6 @@ int kdba_id_parsemode(const char *mode, disassemble_info *dip) { - - return 0; } diff -Naur --exclude=BitKeeper --exclude=SCCS temp/ameslab/arch/ppc64/kdb/kdbasupport.c ameslab/arch/ppc64/kdb/kdbasupport.c --- temp/ameslab/arch/ppc64/kdb/kdbasupport.c 2004-03-15 10:03:56.000000000 +0530 +++ ameslab/arch/ppc64/kdb/kdbasupport.c 2004-03-16 10:17:26.000000000 +0530 @@ -516,7 +516,9 @@ ef = &regs; } cpus_in_kdb++; +
kdb_save_running(ef); rv = kdb_main_loop(reason, reason2, error, db_result, ef); + kdb_unsave_running(ef); cpus_in_kdb--; return rv; } --- temp/ameslab/arch/ppc64/kdb/kdba_bt.c 2004-03-15 10:03:56.000000000 +0530 +++ ameslab/arch/ppc64/kdb/kdba_bt.c 2004-03-17 13:39:53.000000000 +0530 @@ -113,12 +113,6 @@ unsigned long symsize,symoffset; char *symmodname; - if (!regs && !addr) - { - kdb_printf(" invalid regs pointer \n"); - return 0; - } - /* * The caller may have supplied an address at which the * stack traceback operation should begin. This address @@ -129,12 +123,29 @@ * entitled '' was called from the function which * contains return-address. */ - if (addr) { + if (!addr) + addr = (kdb_machreg_t *)p->thread.ksp; + + if (addr && !task_curr(p)) { eip = 0; esp = *addr; - ebp=0; + ebp = 0; } else { - ebp=regs->link; + if (task_curr(p)) { + struct kdb_running_process *krp = kdb_running_process + task_cpu(p); + if (!krp->seqno) { + kdb_printf("Process did not save state, cannot backtrace \n"); + kdb_ps1(p); + return 0; + } + regs = krp->regs; + } else { + if (!regs) + regs = p->thread.regs; + } + if (KDB_NULL_REGS(regs)) + return KDB_BADREG; + ebp = regs->link; eip = regs->nip; if (regs_esp) esp = regs->gpr[1]; @@ -318,5 +329,5 @@ int kdba_bt_process(struct task_struct *p, int argcount) { - return (kdba_bt_stack_ppc(p->thread.regs, (kdb_machreg_t *) p->thread.ksp, argcount, p, 0)); + return (kdba_bt_stack_ppc(NULL, NULL, argcount, p, 0)); } ** Sent via the linuxppc64-dev mail list. 
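
[Archive note] One change in the patch above makes kdba_installbp()
return early when kdb_putword() fails, instead of marking the
breakpoint installed regardless. The pattern can be sketched outside
the kernel as follows. Everything here is a hypothetical stand-in:
fake_mem, fake_putword(), fake_installbp() and FAKE_BREAKPOINT are
invented for illustration and are not the KDB interfaces, and the
instruction value is only an assumed example encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Illustrative breakpoint word; an assumption, not taken from KDB. */
#define FAKE_BREAKPOINT 0x7fe00008u

/* Simulated instruction slot that may refuse writes. */
struct fake_mem {
    uint32_t word;
    int      read_only;
};

/* Hypothetical breakpoint record, loosely modelled on the patch. */
struct fake_bp {
    struct fake_mem *mem;   /* location of the breakpoint */
    uint32_t saved;         /* original instruction, restored on removal */
    int      installed;
};

/* Stand-in for kdb_putword(): returns nonzero when the write fails. */
static int fake_putword(struct fake_mem *mem, uint32_t word)
{
    if (mem->read_only)
        return 1;
    mem->word = word;
    return 0;
}

/*
 * Instruction-replacement install: save the original word, write the
 * breakpoint, and only mark it installed if the write succeeded.
 * Returns 0 on success, 1 on failure.
 */
int fake_installbp(struct fake_bp *bp)
{
    assert(bp != NULL && bp->mem != NULL);
    if (bp->installed)
        return 0;
    bp->saved = bp->mem->word;
    if (fake_putword(bp->mem, FAKE_BREAKPOINT))
        return 1;           /* leave bp->installed clear */
    bp->installed = 1;
    return 0;
}
```

The point of the early return is that a failed write never leaves the
record claiming a breakpoint that is not actually in memory.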
See http://lists.linuxppc.org/

From linas at austin.ibm.com Sat Mar 20 04:45:30 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Fri, 19 Mar 2004 11:45:30 -0600
Subject: more eeh
In-Reply-To: <20040319000116.GC17586@kroah.com>;
 from greg@kroah.com on Thu, Mar 18, 2004 at 04:01:16PM -0800
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com>
 <20040317172327.GA18810@kroah.com>
 <16472.55069.619464.519643@cargo.ozlabs.ibm.com>
 <20040319000116.GC17586@kroah.com>
Message-ID: <20040319114529.M33924@forte.austin.ibm.com>

On Thu, Mar 18, 2004 at 04:01:16PM -0800, Greg KH wrote:
>
> But you want userspace to do this. There are systems with a few
> different PCI Hotplug controller drivers on them. The different
> controller drivers control different slots. Userspace is the only place
> that can reliably handle this.

There are several issues here. First, the only controllers that
have this feature are the ppc64 phb controllers. Which is not an
excuse for sloppy coding, since, yes, there might be other hardware
in the future that does this.

More importantly, you've got to recognize that many (most?) EEH
events are going to be 'transient' i.e. single-shot parity errors
and the like. If the error occurred e.g. on a scsi controller, this
type of error can be recovered without any need to unmount the file
system that sits above the block device that sits on the scsi driver.

In particular, if the EEH error hit the scsi controller that has
the root volume, there would be no way to actually call user-space
code (since this code is probably not paged into the kernel, and
there can't be any disk access till the error is cleared.)

To reiterate: if there is a *permanent* hardware failure that EEH
cannot recover from, then, yes, the right thing is to bounce it
back up to the user-space scripts that can then deal with the event.
Else, for transient events, it is far more elegant to handle these in
a layer that hides them from the affected block devices/socket layer.

--linas

From linas at austin.ibm.com Sat Mar 20 05:01:36 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Fri, 19 Mar 2004 12:01:36 -0600
Subject: more eeh
In-Reply-To: <20040319005026.GD19053@kroah.com>;
 from greg@kroah.com on Thu, Mar 18, 2004 at 04:50:26PM -0800
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com>
 <20040317172327.GA18810@kroah.com>
 <16472.55069.619464.519643@cargo.ozlabs.ibm.com>
 <20040319000116.GC17586@kroah.com>
 <16474.16285.55962.349029@cargo.ozlabs.ibm.com>
 <20040319005026.GD19053@kroah.com>
Message-ID: <20040319120136.N33924@forte.austin.ibm.com>

On Thu, Mar 18, 2004 at 04:50:26PM -0800, Greg KH wrote:
>
> No, you need to call pci_remove_bus_device() for that device so that the
> pci core properly cleans up after your device is gone. That's what the
> pci hotplug controller drivers eventually call, after handling a bunch
> of other housekeeping that their hardware requires (putting resources
> back into a pool, etc.)

If the error is transient and can be cleared, does one still need to call
pci_remove_bus_device()? Or is it possible to reset the device state
without making this call? Remember, the goal is to reset the device
into a working state asap, with minimal disturbance of higher layers.

> Remember, you are asking to put policy about EEH events into the kernel.

If the EEH event takes down the disk on which the user-land scripts reside,
then the event can't be recovered without putting some kind of policy
in the kernel.

> remain a PPC64 specific "feature". If, on the other hand, you want to do
> something that will work for all platforms, I suggest the userspace
> hotplug method. That will work for everyone.

As far as I know, there are no other pci controllers that support this
function.
If you know of any plans by Dell or Sun or HP to ship something
like this in the future, I would welcome the contact, the introduction
to said parties.

--linas

From frowand at mvista.com Sat Mar 20 05:28:40 2004
From: frowand at mvista.com (Frank Rowand)
Date: Fri, 19 Mar 2004 10:28:40 -0800
Subject: more eeh
In-Reply-To: <20040319120136.N33924@forte.austin.ibm.com>
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com>
 <20040317172327.GA18810@kroah.com>
 <16472.55069.619464.519643@cargo.ozlabs.ibm.com>
 <20040319000116.GC17586@kroah.com>
 <16474.16285.55962.349029@cargo.ozlabs.ibm.com>
 <20040319005026.GD19053@kroah.com>
 <20040319120136.N33924@forte.austin.ibm.com>
Message-ID: <405B3BD8.4090909@mvista.com>

linas at austin.ibm.com wrote:
> On Thu, Mar 18, 2004 at 04:50:26PM -0800, Greg KH wrote:
>
>>No, you need to call pci_remove_bus_device() for that device so that the
>>pci core properly cleans up after your device is gone. That's what the
>>pci hotplug controller drivers eventually call, after handling a bunch
>>of other housekeeping that their hardware requires (putting resources
>>back into a pool, etc.)
>
>
> If the error is transient and can be cleared, does one still need to call
> pci_remove_bus_device()? Or is it possible to reset the device state
> without making this call? Remember, the goal is to reset the device
> into a working state asap, with minimal disturbance of higher layers.
>
>
>>Remember, you are asking to put policy about EEH events into the kernel.
>
>
> If the EEH event takes down the disk on which the user-land scripts reside,
> then the event can't be recovered without putting some kind of policy
> in the kernel.
>
>
>>remain a PPC64 specific "feature". If, on the other hand, you want to do
>>something that will work for all platforms, I suggest the userspace
>>hotplug method. That will work for everyone.
>
>
> As far as I know, there are no other pci controllers that support this
> function. If you know of any plans by Dell or Sun or HP to ship something
> like this in the future, I would welcome the contact, the introduction
> to said parties.
>
> --linas

HP servers have had similar functionality since around 1999. I don't
know the official names for customers, but internally they were
Prelude, Rhapsody, Staccato, Superdome. They were PA-RISC systems.
I'm not sure if the same PCI controllers were used for the IA-64
systems. I also don't know if there is any Linux support for these
systems. You could try the pa-risc linux email list for more info.
If no one responds, let me know and I'll call some of my old contacts.

-Frank
--
Frank Rowand
MontaVista Software, Inc

From greg at kroah.com Sat Mar 20 05:37:11 2004
From: greg at kroah.com (Greg KH)
Date: Fri, 19 Mar 2004 10:37:11 -0800
Subject: more eeh
In-Reply-To: <20040319120136.N33924@forte.austin.ibm.com>
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com>
 <20040317172327.GA18810@kroah.com>
 <16472.55069.619464.519643@cargo.ozlabs.ibm.com>
 <20040319000116.GC17586@kroah.com>
 <16474.16285.55962.349029@cargo.ozlabs.ibm.com>
 <20040319005026.GD19053@kroah.com>
 <20040319120136.N33924@forte.austin.ibm.com>
Message-ID: <20040319183707.GB11135@kroah.com>

On Fri, Mar 19, 2004 at 12:01:36PM -0600, linas at austin.ibm.com wrote:
> On Thu, Mar 18, 2004 at 04:50:26PM -0800, Greg KH wrote:
> >
> > No, you need to call pci_remove_bus_device() for that device so that the
> > pci core properly cleans up after your device is gone. That's what the
> > pci hotplug controller drivers eventually call, after handling a bunch
> > of other housekeeping that their hardware requires (putting resources
> > back into a pool, etc.)
>
> If the error is transient and can be cleared, does one still need to call
> pci_remove_bus_device()?
Depends, if you want the pci device to be removed from the kernel
entirely, you need that call. If not, then you do not. But you better
not be calling release() on the device on your own either...

> Or is it possible to reset the device state without making this call?

Depends on the device and the driver.

> Remember, the goal is to reset the device into a working state asap,
> with minimal disturbance of higher layers.

See Paul's discussion of the different ways of doing this. You are
talking about a device controlled by type (a) of a driver. For the rest
of the world, you will have a hard time doing this.

> > Remember, you are asking to put policy about EEH events into the kernel.
>
> If the EEH event takes down the disk on which the user-land scripts reside,
> then the event can't be recovered without putting some kind of policy
> in the kernel.

True, but it all depends on the level of "robustness" you want to be
able to handle. Odds are you do not need this kind all of the time.

> > remain a PPC64 specific "feature". If, on the other hand, you want to do
> > something that will work for all platforms, I suggest the userspace
> > hotplug method. That will work for everyone.
>
> As far as I know, there are no other pci controllers that support this
> function. If you know of any plans by Dell or Sun or HP to ship something
> like this in the future, I would welcome the contact, the introduction
> to said parties.

PCI Express handles this kind of functionality. And as I already have a
PCI Express box sitting next to me right now, this kind of functionality
is not limited to the PPC64 platform anymore.

thanks,

greg k-h

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From greg at kroah.com Sat Mar 20 05:42:53 2004
From: greg at kroah.com (Greg KH)
Date: Fri, 19 Mar 2004 10:42:53 -0800
Subject: more eeh
In-Reply-To: <20040319114529.M33924@forte.austin.ibm.com>
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com>
 <20040317172327.GA18810@kroah.com>
 <16472.55069.619464.519643@cargo.ozlabs.ibm.com>
 <20040319000116.GC17586@kroah.com>
 <20040319114529.M33924@forte.austin.ibm.com>
Message-ID: <20040319184253.GC11135@kroah.com>

On Fri, Mar 19, 2004 at 11:45:30AM -0600, linas at austin.ibm.com wrote:
> On Thu, Mar 18, 2004 at 04:01:16PM -0800, Greg KH wrote:
> >
> > But you want userspace to do this. There are systems with a few
> > different PCI Hotplug controller drivers on them. The different
> > controller drivers control different slots. Userspace is the only place
> > that can reliably handle this.
>
> There are several issues here. First, the only controllers that
> have this feature are the ppc64 phb controllers.

For the PPC64 platform today. I guarantee that this will not be true
this time next year.

> Which is not an excuse for sloppy coding, since, yes, there might be
> other hardware in the future that does this.

There is other hardware shipping next month that does this (or whenever
PCI Express finally makes it out into the real world, should be any day
now...) So, you are correct, there is no excuse for sloppy coding, or
special casing this kind of stuff.

> More importantly, you've got to recognize that many (most?) EEH
> events are going to be 'transient' i.e. single-shot parity errors
> and the like.

I don't know, is this really true? Do you have any research showing
this? I've seen flaky pci cards die horrible deaths all the time in my
testing.

> If the error occurred e.g. on a scsi controller, this type of error
> can be recovered without any need to unmount the file system that sits
> above the block device that sits on the scsi driver.

"transient", yes.
But what determines if this is such an error and not a
more serious one? Do you have that level of "seriousness" detection in
your hardware controller?

> In particular, if the EEH error hit the scsi controller that has
> the root volume, there would be no way to actually call user-space
> code (since this code is probably not paged into the kernel, and
> there can't be any disk access till the error is cleared.)

True, but again, it's a rare case, right? If you are really worried
about this kind of stuff, put your hotplug scripts (and bash) on a ramfs
partition. I've heard of embedded people doing this all the time to
allow disks to spin down and yet still have a system with good response
times to different events.

thanks,

greg k-h

From jschopp at austin.ibm.com Sat Mar 20 07:28:35 2004
From: jschopp at austin.ibm.com (jschopp at austin.ibm.com)
Date: Fri, 19 Mar 2004 14:28:35 -0600 (CST)
Subject: [PATCH] xmon compiler warning
Message-ID:

I've been getting the compiler warning:

arch/ppc64/xmon/start.c: In function `xmon_readchar':
arch/ppc64/xmon/start.c:104: warning: implicit declaration of function `xmon_printf'

for awhile now. I've attached a very simple patch to make it go away.
I'll push to ameslab if there are no objections.

diff -Nru a/arch/ppc64/xmon/start.c b/arch/ppc64/xmon/start.c
--- a/arch/ppc64/xmon/start.c Fri Mar 19 14:25:12 2004
+++ b/arch/ppc64/xmon/start.c Fri Mar 19 14:25:12 2004
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include "nonstdio.h"

 #ifdef CONFIG_MAGIC_SYSRQ

@@ -63,10 +64,6 @@
 	return udbg_getc_poll();
 }

-void *xmon_stdin;
-void *xmon_stdout;
-void *xmon_stderr;
-
 int xmon_putc(int c, void *f)
 {

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From linas at austin.ibm.com Sat Mar 20 08:03:47 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Fri, 19 Mar 2004 15:03:47 -0600
Subject: more eeh
In-Reply-To: <20040319183707.GB11135@kroah.com>;
 from greg@kroah.com on Fri, Mar 19, 2004 at 10:37:11AM -0800
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com>
 <20040317172327.GA18810@kroah.com>
 <16472.55069.619464.519643@cargo.ozlabs.ibm.com>
 <20040319000116.GC17586@kroah.com>
 <16474.16285.55962.349029@cargo.ozlabs.ibm.com>
 <20040319005026.GD19053@kroah.com>
 <20040319120136.N33924@forte.austin.ibm.com>
 <20040319183707.GB11135@kroah.com>
Message-ID: <20040319150347.Q33924@forte.austin.ibm.com>

On Fri, Mar 19, 2004 at 10:37:11AM -0800, Greg KH wrote:
> Depends on the device and the driver.

Until I do some prototyping and studying, I don't want to
(won't be able to) speculate on the next level of detail.

> PCI Express handles this kind of functionality. And as I already have a
> PCI Express box sitting next to me right now, this kind of functionality
> is not limited to the PPC64 platform anymore.

Hmm. OK. So... is there anyone else that you know of who is interested
in adding this function to the Linux kernel? I'm hoping to prototype
the full-fledged recovery mechanism over the upcoming weeks/months,
and will try hard to think in a generic/abstract way.

How does one go about getting the pci-X spec? I went to the website,
discovered that I had to be a member of the consortium to get my hands
on the goodies ...

-- linas

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From greg at kroah.com Sat Mar 20 08:13:01 2004
From: greg at kroah.com (Greg KH)
Date: Fri, 19 Mar 2004 13:13:01 -0800
Subject: more eeh
In-Reply-To: <20040319150347.Q33924@forte.austin.ibm.com>
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com>
 <20040317172327.GA18810@kroah.com>
 <16472.55069.619464.519643@cargo.ozlabs.ibm.com>
 <20040319000116.GC17586@kroah.com>
 <16474.16285.55962.349029@cargo.ozlabs.ibm.com>
 <20040319005026.GD19053@kroah.com>
 <20040319120136.N33924@forte.austin.ibm.com>
 <20040319183707.GB11135@kroah.com>
 <20040319150347.Q33924@forte.austin.ibm.com>
Message-ID: <20040319211300.GA6674@kroah.com>

On Fri, Mar 19, 2004 at 03:03:47PM -0600, linas at austin.ibm.com wrote:
> > PCI Express handles this kind of functionality. And as I already have a
> > PCI Express box sitting next to me right now, this kind of functionality
> > is not limited to the PPC64 platform anymore.
>
> Hmm. OK. So... is there anyone else that you know of who is interested
> in adding this function to the Linux kernel?

There are a number of Intel people working on the PCI Express support
for this. Try asking on linux-kernel

> How does one go about getting the pci-X spec?

You mean PCI Express, right? It's very different from PCI X :)

> I went to the website, discovered that I had to be a member of the
> consortium to get my hands on the goodies ...

You are a member, due to your employer. I'll send you a copy off-list.

thanks,

greg k-h

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From linas at austin.ibm.com Sat Mar 20 08:24:07 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Fri, 19 Mar 2004 15:24:07 -0600
Subject: more eeh
In-Reply-To: <20040319184253.GC11135@kroah.com>;
 from greg@kroah.com on Fri, Mar 19, 2004 at 10:42:53AM -0800
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com>
 <20040317172327.GA18810@kroah.com>
 <16472.55069.619464.519643@cargo.ozlabs.ibm.com>
 <20040319000116.GC17586@kroah.com>
 <20040319114529.M33924@forte.austin.ibm.com>
 <20040319184253.GC11135@kroah.com>
Message-ID: <20040319152407.R33924@forte.austin.ibm.com>

On Fri, Mar 19, 2004 at 10:42:53AM -0800, Greg KH wrote:
>
> > More importantly, you've got to recognize that many (most?) EEH
> > events are going to be 'transient' i.e. single-shot parity errors
> > and the like.
>
> I don't know, is this really true? Do you have any research showing
> this? I've seen flaky pci cards die horrible deaths all the time in my
> testing.

Yes, I wish I had good data on this stuff; I haven't yet found anyone
who has it, and at the moment, I'm just getting anecdotal feedback.
Basically, there are complaints from the field that the Linux kernel
panics on error, and after reboot, the hardware's fine, so please fix
the panic. It would take some sort of data mining of customer support
calls to get good science out of this; I don't know if that's been done
or who has done this. Who knows, IBM research journal might have an
article on this.

> > If the error occurred e.g. on a scsi controller, this type of error
> > can be recovered without any need to unmount the file system that sits
> > above the block device that sits on the scsi driver.
>
> "transient", yes. But what determines if this is such an error and not a
> more serious one? Do you have that level of "seriousness" detection in
> your hardware controller?

I'm expecting the one-shot errors to be parity errors, the hardware
detects parity errors.
I don't know PCI well enough to know the other likely
scenarios. I do remember from my childhood that on the scsi bus, if
you short the bus attn line to ground, it will take down the whole scsi
chain. I assume there are similar scenarios on pci.

> > In particular, if the EEH error hit the scsi controller that has
> > the root volume, there would be no way to actually call user-space
> > code (since this code is probably not paged into the kernel, and
> > there can't be any disk access till the error is cleared.)
>
> True, but again, it's a rare case, right? If you are really worried
> about this kind of stuff, put your hotplug scripts (and bash) on a ramfs
> partition. I've heard of embedded people doing this all the time to
> allow disks to spin down and yet still have a system with good response
> times to different events.

Hmm. That's an idea. Can I use the scripts to recover the device without
having to unmount the filesystem above? I was thinking of a recovery
mechanism similar to the current scsi-generic mechanism of resetting the
adapter first, then, if that doesn't work, then the bus, if that doesn't
work the hba, and if it still fails, only then report the error to
higher layers.

The scsi error resets aren't scripted (or weren't last time I looked :).
I'm not sure if they should be; I suppose in some wild SAN fabric one
would need to but I don't know that level of stuff.

--linas

** Sent via the linuxppc64-dev mail list.
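
[Archive note] The escalating recovery described above (try the mildest
reset first, report to higher layers only when every step has failed)
can be sketched as a table of steps walked in order. This is a
user-space illustration under stated assumptions: escalate_reset(),
struct reset_step, reset_ladder and the fake_dev simulation are
hypothetical names invented here and do not correspond to the SCSI
mid-layer or any EEH code; the adapter, bus, hba ordering simply
follows the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reset step: returns true if the device works afterwards. */
typedef bool (*reset_fn)(void *dev);

struct reset_step {
    const char *name;       /* label for logging, e.g. "adapter" */
    reset_fn    try_reset;
};

/*
 * Try each step in order, mildest first. Returns the index of the step
 * that revived the device, or -1 if every step failed and the error
 * has to be reported to higher layers.
 */
int escalate_reset(const struct reset_step *steps, size_t nsteps, void *dev)
{
    assert(steps != NULL || nsteps == 0);
    for (size_t i = 0; i < nsteps; i++) {
        if (steps[i].try_reset(dev))
            return (int)i;
    }
    return -1;
}

/* Simulated device: recovers only at or above a given reset level. */
struct fake_dev {
    int needed_level;
};

static bool reset_at(void *dev, int level)
{
    return ((struct fake_dev *)dev)->needed_level <= level;
}

bool reset_adapter(void *dev) { return reset_at(dev, 0); }
bool reset_bus(void *dev)     { return reset_at(dev, 1); }
bool reset_hba(void *dev)     { return reset_at(dev, 2); }

/* The ladder, in the order given in the message. */
const struct reset_step reset_ladder[] = {
    { "adapter", reset_adapter },
    { "bus",     reset_bus },
    { "hba",     reset_hba },
};
```

Keeping the steps in a table means the policy question raised in this
thread (how far to escalate before giving up) becomes a matter of which
entries are in the table, wherever that table ends up living.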
See http://lists.linuxppc.org/

From greg at kroah.com Sat Mar 20 08:29:30 2004
From: greg at kroah.com (Greg KH)
Date: Fri, 19 Mar 2004 13:29:30 -0800
Subject: more eeh
In-Reply-To: <20040319152407.R33924@forte.austin.ibm.com>
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com>
 <20040317172327.GA18810@kroah.com>
 <16472.55069.619464.519643@cargo.ozlabs.ibm.com>
 <20040319000116.GC17586@kroah.com>
 <20040319114529.M33924@forte.austin.ibm.com>
 <20040319184253.GC11135@kroah.com>
 <20040319152407.R33924@forte.austin.ibm.com>
Message-ID: <20040319212930.GA13226@kroah.com>

On Fri, Mar 19, 2004 at 03:24:07PM -0600, linas at austin.ibm.com wrote:
> > True, but again, it's a rare case, right? If you are really worried
> > about this kind of stuff, put your hotplug scripts (and bash) on a ramfs
> > partition. I've heard of embedded people doing this all the time to
> > allow disks to spin down and yet still have a system with good response
> > times to different events.
>
> Hmm. That's an idea. Can I use the scripts to recover the device without
> having to unmount the filesystem above?

I don't know. How are you supposed to "recover the device"?

thanks,

greg k-h

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From linas at austin.ibm.com Sat Mar 20 08:39:38 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Fri, 19 Mar 2004 15:39:38 -0600
Subject: more eeh
In-Reply-To: <20040319212930.GA13226@kroah.com>; from greg@kroah.com on Fri, Mar 19, 2004 at 01:29:30PM -0800
References: <1079484053.8840.99.camel@mudbug.austin.ibm.com> <20040317172327.GA18810@kroah.com> <16472.55069.619464.519643@cargo.ozlabs.ibm.com> <20040319000116.GC17586@kroah.com> <20040319114529.M33924@forte.austin.ibm.com> <20040319184253.GC11135@kroah.com> <20040319152407.R33924@forte.austin.ibm.com> <20040319212930.GA13226@kroah.com>
Message-ID: <20040319153938.S33924@forte.austin.ibm.com>

On Fri, Mar 19, 2004 at 01:29:30PM -0800, Greg KH wrote:
> On Fri, Mar 19, 2004 at 03:24:07PM -0600, linas at austin.ibm.com wrote:
> > > True, but again, it's a rare case, right? If you are really worried
> > > about this kind of stuff, put your hotplug scripts (and bash) on a ramfs
> > > partition. I've heard of embedded people doing this all the time to
> > > allow disks to spin down and yet still have a system with good response
> > > times to different events.
> >
> > Hmm. That's an idea. Can I use the scripts to recover the device without
> > having to unmount the filesystem above?
>
> I don't know. How are you supposed to "recover the device"?

Don't know yet. TBD.

--linas

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From linas at austin.ibm.com Sat Mar 20 08:42:21 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Fri, 19 Mar 2004 15:42:21 -0600
Subject: KDB updates
In-Reply-To: <20040318155527.J33924@forte.austin.ibm.com>; from linas@austin.ibm.com on Thu, Mar 18, 2004 at 03:55:27PM -0600
References: <20040318141647.GA6238@in.ibm.com> <20040318155527.J33924@forte.austin.ibm.com>
Message-ID: <20040319154220.T33924@forte.austin.ibm.com>

On Thu, Mar 18, 2004 at 03:55:27PM -0600, linas at austin.ibm.com wrote:
> > Just curious if you had the patch working on SMP? Are there any other
>
> works on power3 smp for me, I will try power4 lpar when I get the chance.

works on a power-4 lpar with 2 cpu's in it, too. Did not test in
non-LPAR mode.

From linas at austin.ibm.com Sat Mar 20 11:24:31 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Fri, 19 Mar 2004 18:24:31 -0600
Subject: EEH, hotplug, bash scripts, error reporting.
Message-ID: <20040319182431.X33924@forte.austin.ibm.com>

Greg,

Is there any sort of architecture or policy w.r.t. hot-plug scripts
and how they relate to syslog and to RAS?

On the pSeries platform, hardware errors are supposed to go through
RTAS into NVRAM, which then somehow gets into the syslog. Then some
sort of "Error Log Analysis" (ELA) tool parses the syslog entries, and
reports the error up to the "Remote Management Control", which then
sends an event to the Service Focal Point (SFP) on the Hardware
Management Console (HMC). This is the current official pSeries Linux
RAS architecture.

But I'm sitting here thinking "gee, here's my EEH code, it's detected
a permanent failure on some PCI slot. It can/it should invoke some
hotplug script". So it occurred to me that the whole /sbin/hotplug
architecture is completely orthogonal to the whole syslog architecture.
And the syslog architecture is orthogonal to the pSeries RAS architecture.

I believe there's some work on a next-gen syslogd (in sourceforge,
I forgot the name) which will allow the sysadmin to write shell scripts
to deal with various system events, such as hotplug events as well
as demonic icmp redirect packets. But those events seem to be orthogonal
to the subsystem of hotplug events.

So at this point my mind boggled. Any words of insight on this
state of affairs?

--linas

From greg at kroah.com Sat Mar 20 11:31:01 2004
From: greg at kroah.com (Greg KH)
Date: Fri, 19 Mar 2004 16:31:01 -0800
Subject: EEH, hotplug, bash scripts, error reporting.
In-Reply-To: <20040319182431.X33924@forte.austin.ibm.com>
References: <20040319182431.X33924@forte.austin.ibm.com>
Message-ID: <20040320003101.GA17972@kroah.com>

On Fri, Mar 19, 2004 at 06:24:31PM -0600, linas at austin.ibm.com wrote:
>
> Is there any sort of architecture or policy w.r.t. hot-plug scripts
> and how they relate to syslog and to RAS?

In one word, no. The hotplug framework allows you to run any program
you want based on the type of event that happened. It's up to you to do
what you want to with that event.

> But I'm sitting here thinking "gee, here's my EEH code, it's detected
> a permanent failure on some PCI slot. It can/it should invoke some
> hotplug script". So it occurred to me that the whole /sbin/hotplug
> architecture is completely orthogonal to the whole syslog architecture.
> And the syslog architecture is orthogonal to the pSeries RAS architecture.

The two things are separate, correct. But it is quite easy to have a
hotplug script/program write to the syslog if that is what your platform
requires.

thanks,

greg k-h

** Sent via the linuxppc64-dev mail list.
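As an illustration of the last point, a hotplug helper that writes to the syslog could look roughly like the C sketch below. It assumes the helper receives the subsystem name as its first argument and the event in the ACTION and DEVPATH environment variables; `format_event` and `log_hotplug_event` are hypothetical names chosen for this sketch, and the log format is made up.

```c
#include <stdio.h>
#include <stdlib.h>
#include <syslog.h>

/* Build one log line from the event; missing values are shown as "?". */
static int format_event(char *buf, size_t len, const char *subsystem,
                        const char *action, const char *devpath)
{
    return snprintf(buf, len,
                    "hotplug: subsystem=%s action=%s devpath=%s",
                    subsystem ? subsystem : "?",
                    action ? action : "?",
                    devpath ? devpath : "?");
}

/*
 * Read ACTION and DEVPATH from the environment (assumed to be set by
 * whoever invokes the helper) and send one line to the syslog.
 */
static void log_hotplug_event(const char *subsystem)
{
    char line[256];

    format_event(line, sizeof(line), subsystem,
                 getenv("ACTION"), getenv("DEVPATH"));
    openlog("hotplug-helper", LOG_PID, LOG_DAEMON);
    syslog(LOG_NOTICE, "%s", line);
    closelog();
}
```

A `main()` for such a helper would just call `log_hotplug_event(argc > 1 ? argv[1] : NULL)`; whatever parses the syslog for RAS purposes would then see the event like any other log entry.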
See http://lists.linuxppc.org/ From nathanl at austin.ibm.com Sun Mar 21 10:27:04 2004 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Sat, 20 Mar 2004 17:27:04 -0600 Subject: ameslab build break w/CONFIG_HOTPLUG_CPU=y Message-ID: <405CD348.5060600@austin.ibm.com> Hi- The last hand-merge of kernel/cpu.c broke builds with cpu hotplug enabled. Looks like we duplicated the #ifdef CONFIG_HOTPLUG_CPU section in the file, and failed to include notifier.h. Nathan -------------- next part -------------- A non-text attachment was scrubbed... Name: cpu_hotplug_build_break.patch Type: text/x-patch Size: 3151 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040320/07298b59/attachment.bin From anton at samba.org Mon Mar 22 01:02:52 2004 From: anton at samba.org (Anton Blanchard) Date: Mon, 22 Mar 2004 01:02:52 +1100 Subject: [PATCH]: Re: KDB updates In-Reply-To: <20040319091211.GA7776@in.ibm.com> References: <20040318141647.GA6238@in.ibm.com> <20040318163107.K33924@forte.austin.ibm.com> <16474.21847.546643.969930@cargo.ozlabs.ibm.com> <20040319091211.GA7776@in.ibm.com> Message-ID: <20040321140252.GP1153@krispykreme> > Inlined is the patch.. Thanks Ananth, patch applied to ameslab. > I needed to declare > > extern cpumask_t irq_affinity[NR_IRQS]; > > in xics.c for a successful "uni" build with ameslab on a Power3 system. > The inlined patch contains this addition too. I had a look over that code, we should really compile out all the get_irq_server() tricks in xics.c. Anton ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Mon Mar 22 01:33:50 2004 From: anton at samba.org (Anton Blanchard) Date: Mon, 22 Mar 2004 01:33:50 +1100 Subject: pci_dma_error Message-ID: <20040321143349.GT1153@krispykreme> Hi, pci_dma_error() just went into 2.6. In order to catch the drivers that have used NO_TCE directly, Im proposing the following patch. 
Anton ===== iommu.c 1.2 vs edited ===== --- 1.2/arch/ppc64/kernel/iommu.c Thu Mar 4 00:26:23 2004 +++ edited/iommu.c Mon Mar 22 01:30:18 2004 @@ -73,7 +73,7 @@ if (unlikely(npages) == 0) { if (printk_ratelimit()) WARN_ON(1); - return NO_TCE; + return PCI_DMA_ERROR_CODE; } if (handle && *handle) @@ -109,7 +109,7 @@ goto again; } else { /* Third failure, give up */ - return NO_TCE; + return PCI_DMA_ERROR_CODE; } } @@ -143,15 +143,15 @@ unsigned int npages, int direction) { unsigned long entry, flags; - dma_addr_t ret = NO_TCE; + dma_addr_t ret = PCI_DMA_ERROR_CODE; spin_lock_irqsave(&(tbl->it_lock), flags); entry = iommu_range_alloc(tbl, npages, NULL); - if (unlikely(entry == NO_TCE)) { + if (unlikely(entry == PCI_DMA_ERROR_CODE)) { spin_unlock_irqrestore(&(tbl->it_lock), flags); - return NO_TCE; + return PCI_DMA_ERROR_CODE; } entry += tbl->it_offset; /* Offset into real TCE table */ @@ -262,7 +262,7 @@ DBG(" - vaddr: %lx, size: %lx\n", vaddr, slen); /* Handle failure */ - if (unlikely(entry == NO_TCE)) { + if (unlikely(entry == PCI_DMA_ERROR_CODE)) { if (printk_ratelimit()) printk(KERN_INFO "iommu_alloc failed, tbl %p vaddr %lx" " npages %lx\n", tbl, vaddr, npages); @@ -326,7 +326,7 @@ */ if (outcount < nelems) { outs++; - outs->dma_address = NO_TCE; + outs->dma_address = PCI_DMA_ERROR_CODE; outs->dma_length = 0; } return outcount; ===== pci_iommu.c 1.2 vs edited ===== --- 1.2/arch/ppc64/kernel/pci_iommu.c Thu Mar 4 00:26:23 2004 +++ edited/pci_iommu.c Mon Mar 22 01:30:23 2004 @@ -82,7 +82,7 @@ if (order >= IOMAP_MAX_ORDER) { printk("PCI_DMA: pci_alloc_consistent size too large: 0x%lx\n", size); - return (void *)NO_TCE; + return (void *)PCI_DMA_ERROR_CODE; } tbl = devnode_table(hwdev); @@ -101,7 +101,7 @@ /* Set up tces to cover the allocated range */ mapping = iommu_alloc(tbl, ret, npages, PCI_DMA_BIDIRECTIONAL); - if (mapping == NO_TCE) { + if (mapping == PCI_DMA_ERROR_CODE) { free_pages((unsigned long)ret, order); ret = NULL; } else @@ -139,7 +139,7 @@ size_t 
size, int direction) { struct iommu_table * tbl; - dma_addr_t dma_handle = NO_TCE; + dma_addr_t dma_handle = PCI_DMA_ERROR_CODE; unsigned long uaddr; unsigned int npages; @@ -153,7 +153,7 @@ if (tbl) { dma_handle = iommu_alloc(tbl, vaddr, npages, direction); - if (dma_handle == NO_TCE) { + if (dma_handle == PCI_DMA_ERROR_CODE) { if (printk_ratelimit()) { printk(KERN_INFO "iommu_alloc failed, tbl %p vaddr %p npages %d\n", tbl, vaddr, npages); ===== vio.c 1.11 vs edited ===== --- 1.11/arch/ppc64/kernel/vio.c Tue Mar 16 22:30:36 2004 +++ edited/vio.c Mon Mar 22 01:30:27 2004 @@ -419,7 +419,7 @@ size_t size, int direction ) { struct iommu_table *tbl; - dma_addr_t dma_handle = NO_TCE; + dma_addr_t dma_handle = PCI_DMA_ERROR_CODE; unsigned long uaddr; unsigned int npages; @@ -504,7 +504,7 @@ /* It is easier to debug here for the drivers than in the tce tables.*/ if(order >= IOMAP_MAX_ORDER) { printk("VIO_DMA: vio_alloc_consistent size to large: 0x%lx \n", size); - return (void *)NO_TCE; + return (void *)PCI_DMA_ERROR_CODE; } tbl = dev->iommu_table; @@ -517,7 +517,7 @@ memset(ret, 0, npages << PAGE_SHIFT); /* Set up tces to cover the allocated range */ tce = iommu_alloc(tbl, ret, npages, PCI_DMA_BIDIRECTIONAL); - if (tce == NO_TCE) { + if (tce == PCI_DMA_ERROR_CODE) { PPCDBG(PPCDBG_TCE, "vio_alloc_consistent: iommu_alloc failed\n" ); free_pages((unsigned long)ret, order); ret = NULL; ===== iommu.h 1.2 vs edited ===== --- 1.2/include/asm-ppc64/iommu.h Thu Mar 4 00:26:24 2004 +++ edited/iommu.h Mon Mar 22 01:29:27 2004 @@ -33,8 +33,6 @@ */ #define IOMAP_MAX_ORDER 10 -#define NO_TCE ((dma_addr_t)-1) - /* * Tces come in two formats, one for the virtual bus and a different * format for PCI ===== pci.h 1.20 vs edited ===== --- 1.20/include/asm-ppc64/pci.h Sun Mar 14 17:54:58 2004 +++ edited/pci.h Mon Mar 22 01:29:12 2004 @@ -169,6 +169,12 @@ return 0; } +#define PCI_DMA_ERROR_CODE (~(dma_addr_t)0x0) +static inline int pci_dma_error(dma_addr_t dma_addr) +{ + return (dma_addr == 
PCI_DMA_ERROR_CODE); +} + extern int pci_domain_nr(struct pci_bus *bus); /* Set the name of the bus as it appears in /proc/bus/pci */ ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Mon Mar 22 21:07:16 2004 From: anton at samba.org (Anton Blanchard) Date: Mon, 22 Mar 2004 21:07:16 +1100 Subject: [PATCH] ignore huge OF properties In-Reply-To: <4058B5CE.1010503@redhat.com> References: <20040316070649.GP19737@krispykreme> <4058B5CE.1010503@redhat.com> Message-ID: <20040322100716.GY1153@krispykreme> Hi Julie, > In addition to the patch you provided, it is also necessary to ensure > that the initrd image cannot be overwritten by calls into prom such as: > > pp->length = (int)(long) call_prom(RELOC("getprop"), 4, 1, node, > namep,valp, mem_end - mem_start); > > > Here, mem_end needs to have been carefully chosen so that it doesn't > start somewhere in the middle of the initrd image or past it. However, > mem_end is arbitrarily chosen by copy_device_node to be 8MB beyond the > starting mem_start value. In code I have been working with, mem_end has > landed well into the initrd memory image. > > The attached patch corrects this problem for the 2.6 ameslab tree. > Please consider pushing it to ameslab, as I don't know how to do that. I think we should be checking further down in inspect_node as well. Also we should rethink that 8MB limit, on our big machines we might have device trees bigger than that. Anton ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From linas at austin.ibm.com Tue Mar 23 04:31:26 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Mon, 22 Mar 2004 11:31:26 -0600 Subject: pci_dma_error In-Reply-To: <20040321143349.GT1153@krispykreme>; from anton@samba.org on Mon, Mar 22, 2004 at 01:33:50AM +1100 References: <20040321143349.GT1153@krispykreme> Message-ID: <20040322113126.A33924@forte.austin.ibm.com> On Mon, Mar 22, 2004 at 01:33:50AM +1100, Anton Blanchard wrote: > > pci_dma_error() just went into 2.6. In order to catch the drivers that > have used NO_TCE directly, Im proposing the following patch. Go for it. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From nfont at austin.ibm.com Tue Mar 23 04:51:23 2004 From: nfont at austin.ibm.com (Nathan Fontenot) Date: Mon, 22 Mar 2004 11:51:23 -0600 Subject: eeh In-Reply-To: <1079653760.23533.9.camel@mudbug.austin.ibm.com> References: <1079653760.23533.9.camel@mudbug.austin.ibm.com> Message-ID: <1079977883.3629.28.camel@mudbug.austin.ibm.com> I haven't received any comments on this patch. Does anyone have objections to it going into Ameslab? thanks. -Nathan On Thu, 2004-03-18 at 17:49, Nathan Fontenot wrote: > Just wanted to put out an updated version the eeh patch, one that > I can push to Ameslab. I hope. Thanks to everyone for their > help in working on this code. > > This patch updates a few things. > > - The need for rpaphp to be built into the kernel is gone. Code > was added to let rpaphp register a mechanism with eeh to do the > disabling of the slot. > > - The disable_slot routine is back to static. As Greg pointed > out, having a routine called disable_slot become a global symbol > become a global symbol is not a good idea. Especially since this > is a very arch specific routine. > > -Instances of __pa() are replaced with virt_to_phys() > > -Moves the slot_errbuf and slot_errbuf_lock off of the stack. > > As always, all comments are welcome. thanks. > > -Nathan F. 
> -- -- -------------- next part -------------- A non-text attachment was scrubbed... Name: eeh.patch Type: text/x-patch Size: 8408 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040322/c0e36954/attachment.bin From greg at kroah.com Tue Mar 23 06:59:55 2004 From: greg at kroah.com (Greg KH) Date: Mon, 22 Mar 2004 11:59:55 -0800 Subject: eeh In-Reply-To: <1079977883.3629.28.camel@mudbug.austin.ibm.com> References: <1079653760.23533.9.camel@mudbug.austin.ibm.com> <1079977883.3629.28.camel@mudbug.austin.ibm.com> Message-ID: <20040322195954.GA27589@kroah.com> On Mon, Mar 22, 2004 at 11:51:23AM -0600, Nathan Fontenot wrote: > I haven't received any comments on this patch. Does anyone have > objections to it going into Ameslab? I still don't like the drivers/pci/hotplug changes, as they are very platform specific. You should at least call /sbin/hotplug too to let userspace know something odd just happened. thanks, greg k-h ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From nfont at austin.ibm.com Tue Mar 23 08:22:31 2004 From: nfont at austin.ibm.com (Nathan Fontenot) Date: Mon, 22 Mar 2004 15:22:31 -0600 Subject: eeh In-Reply-To: <20040322195954.GA27589@kroah.com> References: <1079653760.23533.9.camel@mudbug.austin.ibm.com> <1079977883.3629.28.camel@mudbug.austin.ibm.com> <20040322195954.GA27589@kroah.com> Message-ID: <1079990551.3629.42.camel@mudbug.austin.ibm.com> Yes, the driver/pci/hotplug changes are platform specific, but so are all the changes for this patch. There is a call to /sbin/hotplug. I had to trace it through the code, but the path is ... rpaphp_disable_slot disable_slot rpaphp_unconfig_pci_adapter pci_remove_bus_device pci_destroy_dev device_unregister device_del kobject_del kobject_hotplug kset_hotplug call_usermodehelper("/sbin/hotplug") Linas found some other issues with the patch, so there will be an updated patch later today (i hope). thanks. -Nathan F. 
On Mon, 2004-03-22 at 13:59, Greg KH wrote: > On Mon, Mar 22, 2004 at 11:51:23AM -0600, Nathan Fontenot wrote: > > I haven't received any comments on this patch. Does anyone have > > objections to it going into Ameslab? > > I still don't like the drivers/pci/hotplug changes, as they are very > platform specific. You should at least call /sbin/hotplug too to let > userspace know something odd just happened. > > thanks, > > greg k-h -- ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From greg at kroah.com Tue Mar 23 08:40:40 2004 From: greg at kroah.com (Greg KH) Date: Mon, 22 Mar 2004 13:40:40 -0800 Subject: eeh In-Reply-To: <1079990551.3629.42.camel@mudbug.austin.ibm.com> References: <1079653760.23533.9.camel@mudbug.austin.ibm.com> <1079977883.3629.28.camel@mudbug.austin.ibm.com> <20040322195954.GA27589@kroah.com> <1079990551.3629.42.camel@mudbug.austin.ibm.com> Message-ID: <20040322214039.GA7889@kroah.com> On Mon, Mar 22, 2004 at 03:22:31PM -0600, Nathan Fontenot wrote: > Yes, the driver/pci/hotplug changes are platform specific, but so are > all the changes for this patch. My point remains. Try to do this in a non-platform specific way, as you will have to do it eventually, might as well be now. > There is a call to /sbin/hotplug. I had to trace it through the code, > but the path is ... > > rpaphp_disable_slot > disable_slot > rpaphp_unconfig_pci_adapter > pci_remove_bus_device > pci_destroy_dev > device_unregister > device_del > kobject_del > kobject_hotplug > kset_hotplug > call_usermodehelper("/sbin/hotplug") Yeah, fun isn't it :) I mean a "fault" ACTION for the hotplug event before the "this device is now gone" event that your above call chain causes. thanks, greg k-h ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/

From johnrose at us.ibm.com Tue Mar 23 08:50:44 2004
From: johnrose at us.ibm.com (John H Rose)
Date: Mon, 22 Mar 2004 15:50:44 -0600
Subject: eeh
In-Reply-To: <20040322195954.GA27589@kroah.com>
Message-ID:

> I still don't like the drivers/pci/hotplug changes, as they are very
> platform specific.

Are you commenting on the changes to the rpaphp files? The affected
files are entirely platform specific, so I don't understand.

Regardless, the hotplug script does get called as Nathan points out.
So if you guys decide to go the userspace route, you may be able to
start there.

John

-----------------------
John Rose
pSeries Linux Development
johnrose at austin.ibm.com
Office: 512-838-0298
Tieline: 678-0298

From greg at kroah.com Tue Mar 23 09:00:17 2004
From: greg at kroah.com (Greg KH)
Date: Mon, 22 Mar 2004 14:00:17 -0800
Subject: eeh
In-Reply-To:
References: <20040322195954.GA27589@kroah.com>
Message-ID: <20040322220017.GA8978@kroah.com>

On Mon, Mar 22, 2004 at 03:50:44PM -0600, John H Rose wrote:
>
> > I still don't like the drivers/pci/hotplug changes, as they are very
> > platform specific.
>
> Are you commenting on the changes to the rpaphp files? The affected
> files are entirely platform specific, so I don't understand.

He has implemented a PPC64 specific way to power off a pci slot. I
object to that.

> Regardless, the hotplug script does get called as Nathan points out.

But it gets called for an ACTION=remove event, after the card is gone.
I want an ACTION=fault event before anything touches the card to give
userspace an option of what it wants to do about the fault. See the
archives of this thread for details...

thanks,

greg k-h

** Sent via the linuxppc64-dev mail list.
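To make the distinction concrete, here is a small C sketch of how a userspace helper might tell the two events apart. ACTION=fault is only the event proposed in this thread, not something the hotplug code is known to emit; `policy_for_action` and the `fault_policy` values are hypothetical names for this sketch.

```c
#include <string.h>

/* What a userspace helper might decide to do for a given event. */
enum fault_policy {
    POLICY_IGNORE,        /* event is not interesting to this helper */
    POLICY_TRY_RECOVERY,  /* device still present: attempt recovery */
    POLICY_CLEAN_UP       /* device already gone: only tidy up */
};

/*
 * Map the ACTION string to a policy. "fault" is the event proposed in
 * the thread (assumed to arrive before anything touches the card);
 * "remove" arrives after the card is gone.
 */
static enum fault_policy policy_for_action(const char *action)
{
    if (action == NULL)
        return POLICY_IGNORE;
    if (strcmp(action, "fault") == 0)
        return POLICY_TRY_RECOVERY;
    if (strcmp(action, "remove") == 0)
        return POLICY_CLEAN_UP;
    return POLICY_IGNORE;
}
```

The point of the sketch is the ordering: only a "fault" event leaves userspace any choice, because by the time "remove" arrives there is nothing left to recover.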
See http://lists.linuxppc.org/

From linas at austin.ibm.com Tue Mar 23 09:04:06 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Mon, 22 Mar 2004 16:04:06 -0600
Subject: eeh
In-Reply-To: <20040322214039.GA7889@kroah.com>; from greg@kroah.com on Mon, Mar 22, 2004 at 01:40:40PM -0800
References: <1079653760.23533.9.camel@mudbug.austin.ibm.com> <1079977883.3629.28.camel@mudbug.austin.ibm.com> <20040322195954.GA27589@kroah.com> <1079990551.3629.42.camel@mudbug.austin.ibm.com> <20040322214039.GA7889@kroah.com>
Message-ID: <20040322160406.D33924@forte.austin.ibm.com>

On Mon, Mar 22, 2004 at 01:40:40PM -0800, Greg KH wrote:
>
> On Mon, Mar 22, 2004 at 03:22:31PM -0600, Nathan Fontenot wrote:
> > Yes, the driver/pci/hotplug changes are platform specific, but so are
> > all the changes for this patch.
>
> My point remains. Try to do this in a non-platform specific way, as you
> will have to do it eventually, might as well be now.

re: now vs. eventually: we are working hard to remove the kernel panic
before the SUSE SLES9 deadlines, (ditto RH) i.e. today, so that there
can be some semblance of testing before it gets widespread use.

> > There is a call to /sbin/hotplug. I had to trace it through the code,
> > but the path is ...
> >
> > rpaphp_disable_slot
> > disable_slot
> > rpaphp_unconfig_pci_adapter
> > pci_remove_bus_device
> > pci_destroy_dev
> > device_unregister
> > device_del
> > kobject_del
> > kobject_hotplug
> > kset_hotplug
> > call_usermodehelper("/sbin/hotplug")
>
> Yeah, fun isn't it :)
>
> I mean a "fault" ACTION for the hotplug event before the "this device is
> now gone" event that your above call chain causes.

The EEH hardware makes the device 'gone' before any software anywhere
is able to do anything. From the kernel point of view, the 'gone-ness'
of the device is completely asynchronous, it's as if the sysadmin yanked
out the card, without telling the kernel in advance.
The card is gone, the best we can do is to try to clean up after it. Actually, I'm not clear on this, I might be wrong, but: if the sysadmin really does unplug a hot card, I think it will take this EEH error path. I haven't tried this (yet). --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From greg at kroah.com Tue Mar 23 09:11:27 2004 From: greg at kroah.com (Greg KH) Date: Mon, 22 Mar 2004 14:11:27 -0800 Subject: eeh In-Reply-To: <20040322160406.D33924@forte.austin.ibm.com> References: <1079653760.23533.9.camel@mudbug.austin.ibm.com> <1079977883.3629.28.camel@mudbug.austin.ibm.com> <20040322195954.GA27589@kroah.com> <1079990551.3629.42.camel@mudbug.austin.ibm.com> <20040322214039.GA7889@kroah.com> <20040322160406.D33924@forte.austin.ibm.com> Message-ID: <20040322221127.GB9201@kroah.com> On Mon, Mar 22, 2004 at 04:04:06PM -0600, linas at austin.ibm.com wrote: > On Mon, Mar 22, 2004 at 01:40:40PM -0800, Greg KH wrote: > > > > On Mon, Mar 22, 2004 at 03:22:31PM -0600, Nathan Fontenot wrote: > > > Yes, the driver/pci/hotplug changes are platform specific, but so are > > > all the changes for this patch. > > > > My point remains. Try to do this in a non-platform specific way, as you > > will have to do it eventually, might as well be now. > > re: now vs. eventually: we are working hard to remove the kernel panic > before the SUSE SLES9 deadlines, (ditto RH) i.e. today, so that there > can be some semblance of testing before it gets widespread use. Fine, if you want to fix the panic stuff, I don't mind. It's just that I'm not going to accept your drivers/pci/hotplug patch to the mainline kernel as is. That means nothing when it comes to the distros, so we should be fine with this. thanks, greg k-h ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/

From linas at austin.ibm.com Tue Mar 23 10:13:42 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Mon, 22 Mar 2004 17:13:42 -0600
Subject: PATCH: [olh@suse.de: kdb and softirqs]
Message-ID: <20040322171342.A38070@forte.austin.ibm.com>

Anton,

Can you apply the following patch to ameslab? It removes some
leftover crud from the earlier versions of KDB ...

--linas

----- Forwarded message from Olaf Hering -----

Date: Mon, 22 Mar 2004 13:17:02 +0100
From: Olaf Hering
To: Linas Vepstas, Chris Mason
Subject: kdb and softirqs

I assume softirqs should not run when kdb is active?
Then we need this patch in our cvs.

--- linuxppc64-2.5/kernel/softirq.c	2004-03-20 11:56:19.000000000 +0000
+++ /usr/src/linux-2.6.4-15/kernel/softirq.c	2004-03-22 10:13:04.000000000 +0000
@@ -16,9 +16,6 @@
 #include
 #include
-#ifdef CONFIG_KDB
-#include
-#endif
 /*
    - No shared variables, all the data are CPU local.
    - If a softirq needs serialization, let it serialize itself
@@ -80,11 +77,6 @@ asmlinkage void do_softirq(void)
 	if (in_interrupt())
 		return;
-#ifdef CONFIG_KDB
-	if (KDB_IS_RUNNING())
-		return;
-#endif /*CONFIG_KDB */
-
 	local_irq_save(flags);

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

----- End forwarded message -----

From anton at samba.org Tue Mar 23 10:34:03 2004
From: anton at samba.org (Anton Blanchard)
Date: Tue, 23 Mar 2004 10:34:03 +1100
Subject: PATCH: [olh@suse.de: kdb and softirqs]
In-Reply-To: <20040322171342.A38070@forte.austin.ibm.com>
References: <20040322171342.A38070@forte.austin.ibm.com>
Message-ID: <20040322233403.GB27747@krispykreme>

Hi Linas,

> Can you apply the following patch to ameslab? It removes some
> leftover crud from the earlier versions of KDB ...

Applied.

Anton

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/ From paulus at samba.org Tue Mar 23 11:32:07 2004 From: paulus at samba.org (Paul Mackerras) Date: Tue, 23 Mar 2004 11:32:07 +1100 Subject: eeh In-Reply-To: <20040322220017.GA8978@kroah.com> References: <20040322195954.GA27589@kroah.com> <20040322220017.GA8978@kroah.com> Message-ID: <16479.34183.44513.719945@cargo.ozlabs.ibm.com> Greg KH writes: > But it gets called for a ACTION=remove event, after the card is gone. I > want a ACTION=fault event before anything touches the card to give I agree it would be good to give an ACTION=fault event to userspace early on, using kobject_hotplug(). I'm wondering what you mean by "before anything touches the card" though. The driver will be in the middle of doing things with the card when the EEH event happens, so it will be touching the card until we get the notification to it that the event has happened. We also still need to work out some way for type (a) drivers to be informed about EEH events early on. Such a driver will want to be able to say "I can handle this". If the driver can handle it then userspace, although it may be informed about the EEH fault, doesn't need to do anything, at least until the driver throws up its hands and says it can't cope any more. Something like an ACTION=fault-but-driver-is-handling-it code. Paul. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olh at suse.de Wed Mar 24 01:19:05 2004 From: olh at suse.de (Olaf Hering) Date: Tue, 23 Mar 2004 15:19:05 +0100 Subject: PATCH: [olh@suse.de: kdb and softirqs] In-Reply-To: <20040322233403.GB27747@krispykreme> References: <20040322171342.A38070@forte.austin.ibm.com> <20040322233403.GB27747@krispykreme> Message-ID: <20040323141905.GC17536@suse.de> On Tue, Mar 23, Anton Blanchard wrote: > > Hi Linas, > > > Can you apply the following patch to ameslab? It removes some > > leftover crud from the earlier versions of KDB ... > > Applied. 
Here is another one:

#if defined(CONFIG_X86) | defined(CONFIG_PPC64)

linuxppc64-2.5/kdb/modules/kdbm_pg.c, around line 500.

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

From meissner at suse.de Wed Mar 24 02:32:23 2004
From: meissner at suse.de (Marcus Meissner)
Date: Tue, 23 Mar 2004 16:32:23 +0100
Subject: getdents patch for 32 -> 64 converter
Message-ID: <20040323153223.GC23529@suse.de>

Hi,

With 2.6.4 we now have the glorious hidden d_type passing in getdents.

glibc CVS expects this to be passed if we have a kernel version after
2.6.4, so we have to also handle it in the 32bit syscall converter.

The kernel at least booted here and my testcase (bootstrap gcc) works
fine now.

Can someone please review / apply?

Ciao, Marcus

--- linux/arch/ppc64/kernel/sys_ppc32.c.dtype	2004-03-23 14:10:10.000000000 +0100
+++ linux/arch/ppc64/kernel/sys_ppc32.c	2004-03-23 14:11:02.000000000 +0100
@@ -534,7 +534,7 @@
 {
 	struct linux_dirent32 * dirent;
 	struct getdents_callback32 * buf = (struct getdents_callback32 *) __buf;
-	int reclen = ROUND_UP(NAME_OFFSET(dirent) + namlen + 1);
+	int reclen = ROUND_UP(NAME_OFFSET(dirent) + namlen + 2);
 	buf->error = -EINVAL;	/* only used if we fail.. */
 	if (reclen > buf->count)
@@ -548,6 +548,7 @@
 	put_user(reclen, &dirent->d_reclen);
 	copy_to_user(dirent->d_name, name, namlen);
 	put_user(0, dirent->d_name + namlen);
+	put_user(d_type, (char *) dirent + reclen - 1);
 	((char *) dirent) += reclen;
 	buf->current_dir = dirent;
 	buf->count -= reclen;

** Sent via the linuxppc64-dev mail list.
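The reason the patch changes "+ 1" to "+ 2" can be seen in a small user-space sketch of the record layout: the name needs its NUL terminator, and one more byte must be reserved so that the d_type stored in the last byte of the record never lands on that terminator. The struct below is a stand-in with an assumed field layout, and `ROUND_UP4`, `record_len` and `fill_record` are hypothetical helpers, not the kernel's macros; the 4-byte rounding is also an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* User-space stand-in for the 32-bit dirent; field layout is assumed. */
struct dirent32 {
    uint32_t d_ino;
    uint32_t d_off;
    uint16_t d_reclen;
    char     d_name[1];
};

#define NAME_OFFSET32 offsetof(struct dirent32, d_name)
/* Round up to a 4-byte boundary (assumed alignment for the 32-bit ABI). */
#define ROUND_UP4(x) (((x) + 3u) & ~(size_t)3u)

/* Record length: header + name + NUL terminator + one byte for d_type. */
static size_t record_len(size_t namlen)
{
    return ROUND_UP4(NAME_OFFSET32 + namlen + 2);
}

/*
 * Fill one record in a caller-supplied buffer: name, NUL terminator, and
 * d_type in the very last byte. The header fields are left zeroed here.
 */
static size_t fill_record(unsigned char *rec, const char *name,
                          size_t namlen, unsigned char d_type)
{
    size_t reclen = record_len(namlen);

    memset(rec, 0, reclen);
    memcpy(rec + NAME_OFFSET32, name, namlen);
    rec[NAME_OFFSET32 + namlen] = '\0';
    rec[reclen - 1] = d_type;
    return reclen;
}
```

With only "+ 1", a one-character name in this layout would give a 12-byte record whose last byte is the NUL terminator itself, so writing d_type there would corrupt the name.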
See http://lists.linuxppc.org/

From linas at austin.ibm.com Wed Mar 24 05:07:35 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Tue, 23 Mar 2004 12:07:35 -0600
Subject: PATCH for displaying pages [was: Re: PATCH: [olh@suse.de: kdb and softirqs]]
In-Reply-To: <20040323141905.GC17536@suse.de>; from olh@suse.de on Tue, Mar 23, 2004 at 03:19:05PM +0100
References: <20040322171342.A38070@forte.austin.ibm.com> <20040322233403.GB27747@krispykreme> <20040323141905.GC17536@suse.de>
Message-ID: <20040323120735.A50148@forte.austin.ibm.com>

Hi Olaf,

On Tue, Mar 23, 2004 at 03:19:05PM +0100, Olaf Hering wrote:
>
> Here is another one:
>
> #if defined(CONFIG_X86) | defined(CONFIG_PPC64)
>
> linuxppc64-2.5/kdb/modules/kdbm_pg.c, around line 500.

You want to leave that one in. It enables a kdb command
that enables the display of memmap pages. I added that;
I think I sent a patch for it to the kdb list, but if not,
below is a pseudo-patch that should be applied to KDB.

I tried to make the patch be 'minimal'; I think this command
works fine for all arches, and so the CONFIG_X86 is a red herring.

--linas

diff -up kdbm_pg.c.orig kdbm_pg.c
--- kdbm_pg.c.orig	2004-03-23 11:58:14.000000000 -0600
+++ kdbm_pg.c	2004-03-23 11:59:26.000000000 -0600
@@ -490,10 +490,12 @@ out:
-#ifdef CONFIG_X86
+#if defined(CONFIG_X86) | defined(CONFIG_PPC64)
 /* According to Steve Lord, this code is ix86 specific.
  * Patches to extend it to other architectures will be
  * greatefully accepted.
+ * I think the above comments are crazy. I think this
+ * works fine for all arches. At least for ppc64.
  */
 static int
 kdbm_memmap(int argc, const char **argv, const char **envp,
@@ -556,7 +558,7 @@ kdbm_memmap(int argc, const char **argv,
 	kdb_printf(" high page count: %6d\n", page_counts[8]);
 	return 0;
 }
-#endif /* CONFIG_X86 */
+#endif /* CONFIG_X86 | CONFIG_PPC64 */
 static int __init
 kdbm_pg_init(void)
 {

** Sent via the linuxppc64-dev mail list.
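On the single "|" in the #if above: in a preprocessor condition, defined(X) evaluates to 1 or 0, so the bitwise and logical forms select the same code, although "||" is the conventional spelling. The C sketch below shows this; the two CONFIG symbols are local stand-ins for the kernel configuration, and the macro and function names are hypothetical.

```c
/* Stand-in for the kernel config: only the ppc64 symbol is defined here. */
#define CONFIG_PPC64 1
/* CONFIG_X86 is deliberately left undefined. */

/* Bitwise OR of two 0/1 values, as written in the patch. */
#if defined(CONFIG_X86) | defined(CONFIG_PPC64)
#define BITWISE_OR_ENABLED 1
#else
#define BITWISE_OR_ENABLED 0
#endif

/* Logical OR, the more usual way to write the same test. */
#if defined(CONFIG_X86) || defined(CONFIG_PPC64)
#define LOGICAL_OR_ENABLED 1
#else
#define LOGICAL_OR_ENABLED 0
#endif

/* Report which branch each form selected. */
static int memmap_cmd_enabled_bitwise(void) { return BITWISE_OR_ENABLED; }
static int memmap_cmd_enabled_logical(void) { return LOGICAL_OR_ENABLED; }
```

Both functions report the same answer for any combination of the two symbols, so the choice between the forms is a matter of style rather than behaviour.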
See http://lists.linuxppc.org/ From olh at suse.de Wed Mar 24 05:10:41 2004 From: olh at suse.de (Olaf Hering) Date: Tue, 23 Mar 2004 19:10:41 +0100 Subject: PATCH for displaying pages [was: Re: PATCH: [olh@suse.de: kdb and softirqs]] In-Reply-To: <20040323120735.A50148@forte.austin.ibm.com> References: <20040322171342.A38070@forte.austin.ibm.com> <20040322233403.GB27747@krispykreme> <20040323141905.GC17536@suse.de> <20040323120735.A50148@forte.austin.ibm.com> Message-ID: <20040323181041.GA6747@suse.de> On Tue, Mar 23, linas at austin.ibm.com wrote: > Hi Olaf, > > On Tue, Mar 23, 2004 at 03:19:05PM +0100, Olaf Hering wrote: > > > > Here is another one: > > > > #if defined(CONFIG_X86) | defined(CONFIG_PPC64) > > > > linuxppc64-2.5/kdb/modules/kdbm_pg.c, around line 500. > > You want to leave that one in. It enables a kdb command sure, but with 2 | please. unless cpp can handle both :) -- USB is for mice, FireWire is for men! sUse lINUX ag, n?RNBERG ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From hch at infradead.org Wed Mar 24 05:22:15 2004 From: hch at infradead.org (Christoph Hellwig) Date: Tue, 23 Mar 2004 18:22:15 +0000 Subject: PATCH for displaying pages [was: Re: PATCH: [olh@suse.de: kdb and softirqs]] In-Reply-To: <20040323120735.A50148@forte.austin.ibm.com>; from linas@austin.ibm.com on Tue, Mar 23, 2004 at 12:07:35PM -0600 References: <20040322171342.A38070@forte.austin.ibm.com> <20040322233403.GB27747@krispykreme> <20040323141905.GC17536@suse.de> <20040323120735.A50148@forte.austin.ibm.com> Message-ID: <20040323182215.A3277@infradead.org> On Tue, Mar 23, 2004 at 12:07:35PM -0600, linas at austin.ibm.com wrote: > > linuxppc64-2.5/kdb/modules/kdbm_pg.c, around line 500. > > You want to leave that one in. It enables a kdb command > that enables the display of of memmap pages. I added that; > I think I sent a patch for it to the kdb list, but if not, > below is a pseudo-patch that should be applied to KDB. 
>
> I tried to make the patch be 'minimal'; I think this command
> works fine for all arches, and so the CONFIG_X86 is a red herring.

In Linux 2.6 it works for all arches _if_ CONFIG_DISCONTIGMEM is not set.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From kaos at sgi.com Wed Mar 24 08:01:52 2004
From: kaos at sgi.com (Keith Owens)
Date: Wed, 24 Mar 2004 08:01:52 +1100
Subject: PATCH for displaying pages [was: Re: PATCH: [olh@suse.de: kdb and softirqs]]
In-Reply-To: Your message of "Tue, 23 Mar 2004 12:07:35 MDT." <20040323120735.A50148@forte.austin.ibm.com>
Message-ID: <6536.1080075712@ocs3.ocs.com.au>

On Tue, 23 Mar 2004 12:07:35 -0600, linas at austin.ibm.com wrote:
>+++ kdbm_pg.c 2004-03-23 11:59:26.000000000 -0600
>@@ -490,10 +490,12 @@ out:
>
>
>
>-#ifdef CONFIG_X86
>+#if defined(CONFIG_X86) | defined(CONFIG_PPC64)
> /* According to Steve Lord, this code is ix86 specific.
>  * Patches to extend it to other architectures will be
>  * greatefully accepted.
>+ * I think the above comments are crazy. I think this
>+ * works fine for all arches. At least for ppc64.

kdbm_memmap does not work on ia64, it oopses. When I discussed this
with Steve Lord, he said he wrote the code for i386 only. The problem
may be the lack of DISCONTIG support, I do not know the vm subsystem
well enough to change it.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From hch at infradead.org Wed Mar 24 08:21:44 2004
From: hch at infradead.org (Christoph Hellwig)
Date: Tue, 23 Mar 2004 21:21:44 +0000
Subject: PATCH for displaying pages [was: Re: PATCH: [olh@suse.de: kdb and softirqs]]
In-Reply-To: <6536.1080075712@ocs3.ocs.com.au>; from kaos@sgi.com on Wed, Mar 24, 2004 at 08:01:52AM +1100
References: <20040323120735.A50148@forte.austin.ibm.com> <6536.1080075712@ocs3.ocs.com.au>
Message-ID: <20040323212144.A4808@infradead.org>

On Wed, Mar 24, 2004 at 08:01:52AM +1100, Keith Owens wrote:
> kdbm_memmap does not work on ia64, it oopses.
When I discussed this
> with Steve Lord, he said he wrote the code for i386 only. The problem
> may be the lack of DISCONTIG support, I do not know the vm subsystem
> well enough to change it.

It certainly does not work on DISCONTIG systems. It should work on all
machines without discontig, maybe except for the very strange virtual mem
map hack that can optionally be used on ia64.

Was your ia64 an Altix or some whitebox?

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From kaos at sgi.com Wed Mar 24 08:39:05 2004
From: kaos at sgi.com (Keith Owens)
Date: Wed, 24 Mar 2004 08:39:05 +1100
Subject: PATCH for displaying pages [was: Re: PATCH: [olh@suse.de: kdb and softirqs]]
In-Reply-To: Your message of "Tue, 23 Mar 2004 21:21:44 -0000." <20040323212144.A4808@infradead.org>
Message-ID: <7399.1080077945@ocs3.ocs.com.au>

On Tue, 23 Mar 2004 21:21:44 +0000, Christoph Hellwig wrote:
>It certainly does not work on DISCONTIG systems. It should work on all
>machines without discontig, maybe except for the very strange virtual mem
>map hack that can optionally be used on ia64.
>
>Was your ia64 an Altix or some whitebox?

From a dim, distant memory, it was an Altix.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From hch at infradead.org Wed Mar 24 08:45:32 2004
From: hch at infradead.org (Christoph Hellwig)
Date: Tue, 23 Mar 2004 21:45:32 +0000
Subject: PATCH for displaying pages [was: Re: PATCH: [olh@suse.de: kdb and softirqs]]
In-Reply-To: <7399.1080077945@ocs3.ocs.com.au>; from kaos@sgi.com on Wed, Mar 24, 2004 at 08:39:05AM +1100
References: <20040323212144.A4808@infradead.org> <7399.1080077945@ocs3.ocs.com.au>
Message-ID: <20040323214532.A5083@infradead.org>

On Wed, Mar 24, 2004 at 08:39:05AM +1100, Keith Owens wrote:
> >It certainly does not work on DISCONTIG systems.
It should work on all
> >machines without discontig, maybe except for the very strange virtual mem
> >map hack that can optionally be used on ia64.
> >
> >Was your ia64 an Altix or some whitebox?
>
> From a dim, distant memory, it was an Altix.

Ok. I think a !defined(CONFIG_DISCONTIGMEM) && !defined(CONFIG_VIRTUAL_MEM_MAP)
should put you on the safe side.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From nfont at austin.ibm.com Wed Mar 24 09:22:12 2004
From: nfont at austin.ibm.com (Nathan Fontenot)
Date: Tue, 23 Mar 2004 16:22:12 -0600
Subject: eeh
In-Reply-To: <1079990551.3629.42.camel@mudbug.austin.ibm.com>
References: <1079653760.23533.9.camel@mudbug.austin.ibm.com> <1079977883.3629.28.camel@mudbug.austin.ibm.com> <20040322195954.GA27589@kroah.com> <1079990551.3629.42.camel@mudbug.austin.ibm.com>
Message-ID: <1080080532.3629.49.camel@mudbug.austin.ibm.com>

Here's the updated patch I hoped to get out yesterday. This fixes a
couple things Linas found and cleans things up a bit. Please let me know
if you have any comments. thanks.

-Nathan

>
> Linas found some other issues with the patch, so there will be an
> updated patch later today (i hope). thanks.
>
> -Nathan F.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: eeh.patch
Type: text/x-patch
Size: 8961 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040323/cf0be23f/attachment.bin

From linas at austin.ibm.com Wed Mar 24 09:48:02 2004
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Tue, 23 Mar 2004 16:48:02 -0600
Subject: eeh
In-Reply-To: <1080080532.3629.49.camel@mudbug.austin.ibm.com>; from nfont@austin.ibm.com on Tue, Mar 23, 2004 at 04:22:12PM -0600
References: <1079653760.23533.9.camel@mudbug.austin.ibm.com> <1079977883.3629.28.camel@mudbug.austin.ibm.com> <20040322195954.GA27589@kroah.com> <1079990551.3629.42.camel@mudbug.austin.ibm.com> <1080080532.3629.49.camel@mudbug.austin.ibm.com>
Message-ID: <20040323164802.B50148@forte.austin.ibm.com>

Hi Paul,

If you can review/commit this patch, that would be good.

On Tue, Mar 23, 2004 at 04:22:12PM -0600, Nathan Fontenot wrote:
>
> Here's the updated patch I hoped to get out yesterday. This fixes a
> couple things Linas found and cleans things up a bit. Please let me know
> if you have any comments. thanks.

I looked over Nathan's shoulder while he wrote this & so it looks good to me.

For the short-term I've no problem with this not making it into
the torvalds/akpm kernels, due e.g. to the controversial/unfinished
nature of the eeh/hotplug interactions. I just want it in ameslab
so that the RH/SUSE distros can pick this up for their product
delivery schedules.

(I hope to work with nathan over the upcoming weeks/months to address
all of the different issues that have been raised, etc. etc. )

--linas

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From jschopp at austin.ibm.com Wed Mar 24 11:41:21 2004
From: jschopp at austin.ibm.com (jschopp at austin.ibm.com)
Date: Tue, 23 Mar 2004 18:41:21 -0600 (CST)
Subject: [PATCH] ppc64 cpu hotplug
Message-ID:

I found a couple bugs that affect cpu DLPAR/hotplug on ppc64.
The first bug I have never actually produced but am pretty sure this is the correct fix. The second bug only affects certain hardware that I do not have access to test this patch on. So two untested bugfixes are below. Thought I'd share them anyway since somebody with hardware different than mine will likely need them and can tell me if they work. As always feedback and flames welcome. # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. # This patch includes the following deltas: # ChangeSet 1.1518 -> 1.1519 # arch/ppc64/kernel/smp.c 1.68 -> 1.69 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 04/03/23 jschopp at threadlp13.austin.ibm.com 1.1519 # Fix two bugs related to cpu DLPAR/hotplug. The first is to properly handle extended busy. The second is to properly handle dynamically adding # secondary SMT threads. # -------------------------------------------- # diff -Nru a/arch/ppc64/kernel/smp.c b/arch/ppc64/kernel/smp.c --- a/arch/ppc64/kernel/smp.c Tue Mar 23 18:33:17 2004 +++ b/arch/ppc64/kernel/smp.c Tue Mar 23 18:33:17 2004 @@ -371,6 +371,7 @@ wait_time = rtas_extended_busy_delay_time(status); set_current_state(TASK_UNINTERRUPTIBLE); schedule_timeout(wait_time); + break; } default: /* shouldn't happen */ return -ENOMSG; @@ -382,13 +383,13 @@ /* Search all cpu DR entities, looking for one which is present. If * the same hw index as before is available, grab that in preference.a - * Match the dr-index to a cpu node in the device tree. Use the reg - * (hw index) from the node to query rtas if the cpu is in a stopped - * state. + * Match the dr-index to a cpu node in the device tree. Use the + * ibm,ppc-interrupt-server#s (hw index) from the node to query rtas + * if the cpu is in a stopped state. 
*/ static unsigned int find_physical_cpu_to_start(unsigned int old_hwindex) { - int i, idx; + int i, j, idx; int count = 0; int num_addr_cell, num_size_cell, len; struct device_node *np; @@ -426,18 +427,23 @@ if (!ireg || ireg[0] != idx) continue; - ireg = (unsigned int *)get_property(np, "reg", &len); - if (!ireg) - continue; + ireg = (unsigned int*) get_property(np, "ibm,ppc-interrupt-server#s", &len); - status = query_cpu_stopped(*ireg); - if (status == 0) { - best = *ireg; + if (unlikely(!ireg)){ + ireg = (unsigned int*) get_property(np, "reg", &len);/* fake it with phys id */ + if(!ireg) + continue; + } + for(j = 0; j < (len / sizeof(u32)); j++){ + status = query_cpu_stopped(ireg[j]); + if (status == 0) { + best = ireg[j]; if (best == old_hwindex) { - of_node_put(np); - goto out; + of_node_put(np); + goto out; } + } } } } ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From haveblue at us.ibm.com Wed Mar 24 11:53:49 2004 From: haveblue at us.ibm.com (Dave Hansen) Date: Tue, 23 Mar 2004 16:53:49 -0800 Subject: [PATCH] ppc64 cpu hotplug In-Reply-To: References: Message-ID: <1080089628.29545.22.camel@nighthawk> On Tue, 2004-03-23 at 16:41, jschopp at austin.ibm.com wrote: > As always feedback and flames welcome. > + if (unlikely(!ireg)){ > + ireg = (unsigned int*) get_property(np, "reg", &len);/* fake it with phys id */ > + if(!ireg) > + continue; > + } See Documentation/CodingStyle. Should be if (unlikely(!ireg)) { not if (unlikely(!ireg)){ And should have tabs instead of 4 spaces for indenting. Please don't use unlikely() except in real hot paths. I don't think find_physical_cpu_to_start() is in a hot path. That for loop is getting a little bit too indented. You might want to wrap it up in another function or break it up somehow. Or, instead of having the entire for loop's contents in an if(), use a continue: for(...) { if (status != 0) continue; ... } ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/

From anton at samba.org Wed Mar 24 12:16:17 2004
From: anton at samba.org (Anton Blanchard)
Date: Wed, 24 Mar 2004 12:16:17 +1100
Subject: getdents patch for 32 -> 64 converter
In-Reply-To: <20040323153223.GC23529@suse.de>
References: <20040323153223.GC23529@suse.de>
Message-ID: <20040324011617.GE27747@krispykreme>

Hi,

> With 2.6.4 we now have the glorious hidden d_type passing in
> getdents.
>
> glibc CVS expects this to be passed if we have a kernel version after 2.6.4,
> so we have to also handle it in the 32bit syscall converter.

Thanks, looks good. I notice we aren't checking error returns and are
missing a gcc-3.5 compile fix, both of which are fixed in the non compat
getdents code. I'll fix that too.

Anton

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From greg at kroah.com Wed Mar 24 12:46:31 2004
From: greg at kroah.com (Greg KH)
Date: Tue, 23 Mar 2004 17:46:31 -0800
Subject: eeh
In-Reply-To: <16479.34183.44513.719945@cargo.ozlabs.ibm.com>
References: <20040322195954.GA27589@kroah.com> <20040322220017.GA8978@kroah.com> <16479.34183.44513.719945@cargo.ozlabs.ibm.com>
Message-ID: <20040324014631.GA306@kroah.com>

On Tue, Mar 23, 2004 at 11:32:07AM +1100, Paul Mackerras wrote:
> Greg KH writes:
>
> > But it gets called for a ACTION=remove event, after the card is gone. I
> > want a ACTION=fault event before anything touches the card to give
>
> I agree it would be good to give an ACTION=fault event to userspace
> early on, using kobject_hotplug().
>
> I'm wondering what you mean by "before anything touches the card"
> though. The driver will be in the middle of doing things with the
> card when the EEH event happens, so it will be touching the card until
> we get the notification to it that the event has happened.

I mean, "before the power is yanked away from the card", which is what
you all are doing today (disable the power, and as that happens the
hotplug event happens.)

> We also still need to work out some way for type (a) drivers to be
> informed about EEH events early on. Such a driver will want to be
> able to say "I can handle this". If the driver can handle it then
> userspace, although it may be informed about the EEH fault, doesn't
> need to do anything, at least until the driver throws up its hands and
> says it can't cope any more. Something like an
> ACTION=fault-but-driver-is-handling-it code.

Agreed.

thanks,

greg k-h

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From paulus at samba.org Wed Mar 24 13:23:47 2004
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 24 Mar 2004 13:23:47 +1100
Subject: eeh
In-Reply-To: <20040324014631.GA306@kroah.com>
References: <20040322195954.GA27589@kroah.com> <20040322220017.GA8978@kroah.com> <16479.34183.44513.719945@cargo.ozlabs.ibm.com> <20040324014631.GA306@kroah.com>
Message-ID: <16480.61747.129106.678017@cargo.ozlabs.ibm.com>

Greg KH writes:

> On Tue, Mar 23, 2004 at 11:32:07AM +1100, Paul Mackerras wrote:
> > I'm wondering what you mean by "before anything touches the card"
> > though. The driver will be in the middle of doing things with the
> > card when the EEH event happens, so it will be touching the card until
> > we get the notification to it that the event has happened.
>
> I mean, "before the power is yanked away from the card", which is what
> you all are doing today (disable the power, and as that happens the
> hotplug event happens.)

Well, power doesn't get yanked away from the card. The card is cut off
from the processor by the PCI-PCI bridge, but that has already
happened -- the hardware did that some small number of nanoseconds
after it detected an error. We have no control over that.

The card remains powered on, just inaccessible. We can reset the slot
and then reconnect it, using firmware (RTAS) calls. Whether or not we
do that, or how many times we do that before giving up, is a policy
we can decide.
But the initial disconnection isn't policy -- or at least isn't policy
that we can decide. By the time we know that anything has gone wrong,
the disconnection has already happened.

Paul.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From rusty at rustcorp.com.au Wed Mar 24 17:30:48 2004
From: rusty at rustcorp.com.au (Rusty Russell)
Date: Wed, 24 Mar 2004 17:30:48 +1100
Subject: [PATCH] Clean Up Absolute Address Macros
Message-ID: <1080109847.6596.320.camel@bach>

[ Boots in iSeries, thanks Stephen ]

Name: Clean Up Absolute Address Macros
Author: Rusty Russell
Status: Compiled

The iSeries has an arch-specific mapping from physical <-> absolute
addresses. Fortunately this is only used in a few places.

However, the following arch-specific macros/functions are provided in
addition to the standard macros:
	__a2p()
	__a2v()
	__p2a()
	__p2v()
	__v2a()
	__v2p()
	absolute_to_phys()
	phys_to_absolute()
	virt_to_absolute()
	absolute_to_virt()

Reduce them to these, with slightly shorter names, and taking either
pointers or unsigned long (as per __va and __pa) rather than making
the caller cast:
	abs_to_phys()
	phys_to_abs()

And helper macros:
	virt_to_abs()
	abs_to_virt()

As is standard, virtual addresses are returned as void *, physical and
absolute as unsigned long.

Note that the change to iSeries_setup is a little subtle: ea is set to
__va(pa) above, so "phys_to_abs(pa)" is the same as "virt_to_abs(ea)".

Also, REALADDR is renamed to ISERIES_HV_ADDR and used in a couple of
places where appropriate.
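
As a rough illustration of the "pointers or unsigned long, no cast at
the call site" convention described above, here is a minimal userspace
sketch. It is not the kernel code: every toy_* name, the four-entry
chunk table, the 256KB chunk size and the page offset are assumptions
invented for the example, standing in for __pa/__va and the msChunks
lookup.

```c
#include <assert.h>

/* Toy stand-ins (assumptions for illustration, not kernel definitions). */
#define TOY_PAGE_OFFSET	0x10000000UL	/* pretend kernel virtual base */
#define TOY_CHUNK_SHIFT	18		/* pretend 256KB chunks */
#define TOY_CHUNK_SIZE	(1UL << TOY_CHUNK_SHIFT)
#define TOY_NCHUNKS	4UL

/* Hypothetical table: physical chunk index -> absolute chunk index. */
static const unsigned long toy_abs_chunk[TOY_NCHUNKS] = { 7, 2, 5, 0 };

/* Analogues of __pa/__va: accept a pointer or an unsigned long. */
#define toy_pa(va)	((unsigned long)(va) - TOY_PAGE_OFFSET)
#define toy_va(pa)	((void *)((unsigned long)(pa) + TOY_PAGE_OFFSET))

/* Physical -> absolute: swap the chunk number, keep the offset. */
static unsigned long toy_phys_to_abs_ul(unsigned long pa)
{
	unsigned long chunk = pa >> TOY_CHUNK_SHIFT;

	if (chunk >= TOY_NCHUNKS)
		return ~0UL;	/* outside the toy table */
	return (toy_abs_chunk[chunk] << TOY_CHUNK_SHIFT)
		+ (pa & (TOY_CHUNK_SIZE - 1));
}

/* Absolute -> physical: reverse lookup in the same table. */
static unsigned long toy_abs_to_phys_ul(unsigned long aa)
{
	unsigned long i;

	for (i = 0; i < TOY_NCHUNKS; i++)
		if (toy_abs_chunk[i] == (aa >> TOY_CHUNK_SHIFT))
			return (i << TOY_CHUNK_SHIFT)
				+ (aa & (TOY_CHUNK_SIZE - 1));
	return ~0UL;	/* no such absolute chunk */
}

/* The cast lives inside the macro, so callers never write one. */
#define toy_phys_to_abs(pa)	toy_phys_to_abs_ul((unsigned long)(pa))
#define toy_abs_to_phys(aa)	toy_abs_to_phys_ul((unsigned long)(aa))

/* Convenience macros: absolute is unsigned long, virtual is void *. */
#define toy_virt_to_abs(va)	toy_phys_to_abs(toy_pa(va))
#define toy_abs_to_virt(aa)	toy_va(toy_abs_to_phys(aa))
```

With these definitions, toy_virt_to_abs(ptr) and
toy_virt_to_abs((unsigned long)ptr) give the same result, which is the
property that lets the call sites in the diff drop their explicit
(unsigned long) casts.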
diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/arch/ppc64/kernel/HvCall.c ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/HvCall.c --- ppc64-linux-2.5/arch/ppc64/kernel/HvCall.c 2004-01-20 16:05:53.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/HvCall.c 2004-03-24 15:48:18.914384613 +1100 @@ -19,7 +19,7 @@ void HvCall_writeLogBuffer(const void *b { struct HvLpBufferList hv_buf; u64 left_this_page; - u64 cur = virt_to_absolute((unsigned long)buffer); + u64 cur = virt_to_abs(buffer); while (len) { hv_buf.addr = cur; @@ -29,7 +29,7 @@ void HvCall_writeLogBuffer(const void *b hv_buf.len = left_this_page; len -= left_this_page; HvCall2(HvCallBaseWriteLogBuffer, - virt_to_absolute((unsigned long)&hv_buf), + virt_to_abs(&hv_buf), left_this_page); cur = (cur & PAGE_MASK) + PAGE_SIZE; } diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/arch/ppc64/kernel/iSeries_VpdInfo.c ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/iSeries_VpdInfo.c --- ppc64-linux-2.5/arch/ppc64/kernel/iSeries_VpdInfo.c 2004-03-17 14:12:38.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/iSeries_VpdInfo.c 2004-03-24 15:48:18.916384461 +1100 @@ -293,7 +293,8 @@ void iSeries_Get_Location_Code(struct iS return; } BusVpdLen = HvCallPci_getBusVpd(ISERIES_BUS(DevNode), - REALADDR(BusVpdPtr), BUS_VPDSIZE); + ISERIES_HV_ADDR(BusVpdPtr), + BUS_VPDSIZE); if (BusVpdLen == 0) { kfree(BusVpdPtr); printk("PCI: Bus VPD Buffer zero length.\n"); diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/arch/ppc64/kernel/iSeries_iommu.c ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/iSeries_iommu.c --- ppc64-linux-2.5/arch/ppc64/kernel/iSeries_iommu.c 2004-03-08 12:04:43.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/iSeries_iommu.c 2004-03-24 15:48:18.917384384 +1100 @@ -76,7 +76,7 @@ static void tce_build_iSeries(struct iom while (npages--) { 
tce.te_word = 0; - tce.te_bits.tb_rpn = (virt_to_absolute(uaddr)) >> PAGE_SHIFT; + tce.te_bits.tb_rpn = virt_to_abs(uaddr) >> PAGE_SHIFT; if (tbl->it_type == TCE_VB) { /* Virtual Bus */ @@ -130,7 +130,7 @@ void __init iommu_vio_init(void) cb.itc_busno = 255; /* Bus 255 is the virtual bus */ cb.itc_virtbus = 0xff; /* Ask for virtual bus */ - cbp = virt_to_absolute((unsigned long)&cb); + cbp = virt_to_abs(&cb); HvCallXm_getTceTableParms(cbp); veth_iommu_table.it_size = cb.itc_size / 2; @@ -209,7 +209,7 @@ static void iommu_table_getparms(struct parms->itc_slotno = dn->LogicalSlot; parms->itc_virtbus = 0; - HvCallXm_getTceTableParms(REALADDR(parms)); + HvCallXm_getTceTableParms(ISERIES_HV_ADDR(parms)); if (parms->itc_size == 0) panic("PCI_DMA: parms->size is zero, parms is 0x%p", parms); diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/arch/ppc64/kernel/iSeries_pci.c ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/iSeries_pci.c --- ppc64-linux-2.5/arch/ppc64/kernel/iSeries_pci.c 2004-03-23 17:07:56.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/iSeries_pci.c 2004-03-24 15:48:18.921384079 +1100 @@ -283,7 +283,7 @@ static void iSeries_Scan_PHBs_Slots(stru */ for (IdSel = 1; IdSel < MaxAgents; ++IdSel) { HvRc = HvCallPci_getDeviceInfo(bus, SubBus, IdSel, - REALADDR(DevInfo), + ISERIES_HV_ADDR(DevInfo), sizeof(struct HvCallPci_DeviceInfo)); if (HvRc == 0) { if (DevInfo->deviceType == HvCallPci_NodeDevice) @@ -324,7 +324,7 @@ static void iSeries_Scan_EADs_Bridge(HvB "PCI:Connect EADs: 0x%02X.%02X.%02X\n", bus, SubBus, AgentId); HvRc = HvCallPci_getBusUnitInfo(bus, SubBus, AgentId, - REALADDR(BridgeInfo), + ISERIES_HV_ADDR(BridgeInfo), sizeof(struct HvCallPci_BridgeInfo)); if (HvRc == 0) { printk("bridge info: type %x subbus %x maxAgents %x maxsubbus %x logslot %x\n", diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/arch/ppc64/kernel/iSeries_setup.c 
ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/iSeries_setup.c --- ppc64-linux-2.5/arch/ppc64/kernel/iSeries_setup.c 2004-03-23 17:07:56.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/iSeries_setup.c 2004-03-24 15:48:18.925383774 +1100 @@ -657,8 +657,7 @@ static void __init iSeries_bolt_kernel(u HvCallHpt_setPp(slot, PP_RWXX); } else /* No HPTE exists, so create a new bolted one */ - iSeries_make_pte(va, (unsigned long)__v2a(ea), - mode_rw); + iSeries_make_pte(va, phys_to_abs(pa), mode_rw); } } diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/arch/ppc64/kernel/mf.c ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/mf.c --- ppc64-linux-2.5/arch/ppc64/kernel/mf.c 2004-03-17 14:12:38.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/mf.c 2004-03-24 15:48:18.928383546 +1100 @@ -763,14 +763,10 @@ void mf_getSrcHistory(char *buffer, int ev->event.data.vsp_cmd.lp_index = HvLpConfig_getLpIndex(); ev->event.data.vsp_cmd.result_code = 0xFF; ev->event.data.vsp_cmd.reserved = 0; - ev->event.data.vsp_cmd.sub_data.page[0] = - (0x8000000000000000ULL | virt_to_absolute((unsigned long)pages[0])); - ev->event.data.vsp_cmd.sub_data.page[1] = - (0x8000000000000000ULL | virt_to_absolute((unsigned long)pages[1])); - ev->event.data.vsp_cmd.sub_data.page[2] = - (0x8000000000000000ULL | virt_to_absolute((unsigned long)pages[2])); - ev->event.data.vsp_cmd.sub_data.page[3] = - (0x8000000000000000ULL | virt_to_absolute((unsigned long)pages[3])); + ev->event.data.vsp_cmd.sub_data.page[0] = ISERIES_HV_ADDR(pages[0]); + ev->event.data.vsp_cmd.sub_data.page[1] = ISERIES_HV_ADDR(pages[1]); + ev->event.data.vsp_cmd.sub_data.page[2] = ISERIES_HV_ADDR(pages[2]); + ev->event.data.vsp_cmd.sub_data.page[3] = ISERIES_HV_ADDR(pages[3]); mb(); if (signal_event(ev) != 0) return; diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/arch/ppc64/kernel/pSeries_iommu.c 
ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/pSeries_iommu.c --- ppc64-linux-2.5/arch/ppc64/kernel/pSeries_iommu.c 2004-03-21 00:04:50.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/pSeries_iommu.c 2004-03-24 15:48:18.929383469 +1100 @@ -61,7 +61,7 @@ static void tce_build_pSeries(struct iom while (npages--) { /* can't move this out since we might cross LMB boundary */ - t.te_rpn = (virt_to_absolute(uaddr)) >> PAGE_SHIFT; + t.te_rpn = (virt_to_abs(uaddr)) >> PAGE_SHIFT; tp->te_word = t.te_word; diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/arch/ppc64/kernel/pSeries_lpar.c ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/pSeries_lpar.c --- ppc64-linux-2.5/arch/ppc64/kernel/pSeries_lpar.c 2004-03-17 14:12:38.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/pSeries_lpar.c 2004-03-24 15:48:18.931383317 +1100 @@ -36,6 +36,7 @@ #include #include #include +#include /* in pSeries_hvCall.S */ EXPORT_SYMBOL(plpar_hcall); @@ -135,7 +136,7 @@ static void tce_build_pSeriesLP(struct i union tce_entry tce; tce.te_word = 0; - tce.te_rpn = (virt_to_absolute(uaddr)) >> PAGE_SHIFT; + tce.te_rpn = (virt_to_abs(uaddr)) >> PAGE_SHIFT; tce.te_rdwr = 1; if (direction != PCI_DMA_TODEVICE) tce.te_pciwr = 1; diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/arch/ppc64/kernel/pci_dma_direct.c ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/pci_dma_direct.c --- ppc64-linux-2.5/arch/ppc64/kernel/pci_dma_direct.c 2004-02-12 14:47:54.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/pci_dma_direct.c 2004-03-24 15:48:18.933383164 +1100 @@ -37,7 +37,7 @@ static void *pci_direct_alloc_consistent ret = (void *)__get_free_pages(GFP_ATOMIC, get_order(size)); if (ret != NULL) { memset(ret, 0, size); - *dma_handle = virt_to_absolute((unsigned long)ret); + *dma_handle = virt_to_abs(ret); } return ret; } @@ -51,7 +51,7 @@ static void 
pci_direct_free_consistent(s static dma_addr_t pci_direct_map_single(struct pci_dev *hwdev, void *ptr, size_t size, int direction) { - return virt_to_absolute((unsigned long)ptr); + return virt_to_abs(ptr); } static void pci_direct_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr, diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/arch/ppc64/kernel/pmac_iommu.c ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/pmac_iommu.c --- ppc64-linux-2.5/arch/ppc64/kernel/pmac_iommu.c 2004-03-17 14:12:38.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/pmac_iommu.c 2004-03-24 15:48:18.934383088 +1100 @@ -154,7 +154,7 @@ static void dart_build_pmac(struct iommu * out of the loop. */ while (npages--) { - rpn = (virt_to_absolute(uaddr)) >> PAGE_SHIFT; + rpn = virt_to_abs(uaddr) >> PAGE_SHIFT; *(dp++) = DARTMAP_VALID | (rpn & DARTMAP_RPNMASK); @@ -210,7 +210,7 @@ static int dart_init(struct device_node if (tmp == 0) panic("U3-DART: Cannot allocate spare page !"); dart_emptyval = DARTMAP_VALID | - ((virt_to_absolute(tmp) >> PAGE_SHIFT) & DARTMAP_RPNMASK); + ((virt_to_abs(tmp) >> PAGE_SHIFT) & DARTMAP_RPNMASK); /* Map in DART registers. 
FIXME: Use device node to get base address */ dart = ioremap(DART_BASE, 0x7000); @@ -225,7 +225,7 @@ static int dart_init(struct device_node (((dart_tablesize >> PAGE_SHIFT) & DARTCNTL_SIZE_MASK) << DARTCNTL_SIZE_SHIFT); p = virt_to_page(dart_tablebase); - dart_vbase = ioremap(virt_to_absolute(dart_tablebase), dart_tablesize); + dart_vbase = ioremap(virt_to_abs(dart_tablebase), dart_tablesize); /* Fill initial table */ for (i = 0; i < dart_tablesize/4; i++) diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/arch/ppc64/kernel/prom.c ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/prom.c --- ppc64-linux-2.5/arch/ppc64/kernel/prom.c 2004-03-23 17:07:56.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/prom.c 2004-03-24 15:48:18.942382478 +1100 @@ -900,7 +900,7 @@ prom_initialize_tce_table(void) prom_panic(RELOC("ERROR, cannot find space for TCE table.\n")); } - vbase = absolute_to_virt(base); + vbase = (unsigned long)abs_to_virt(base); /* Save away the TCE table attributes for later use. 
*/ prom_tce_table[table].node = node; @@ -1007,9 +1007,12 @@ prom_hold_cpus(unsigned long mem) extern void __secondary_hold(void); extern unsigned long __secondary_hold_spinloop; extern unsigned long __secondary_hold_acknowledge; - unsigned long *spinloop = __v2a(&__secondary_hold_spinloop); - unsigned long *acknowledge = __v2a(&__secondary_hold_acknowledge); - unsigned long secondary_hold = (unsigned long)__v2a(*PTRRELOC((unsigned long *)__secondary_hold)); + unsigned long *spinloop + = (void *)virt_to_abs(&__secondary_hold_spinloop); + unsigned long *acknowledge + = (void *)virt_to_abs(&__secondary_hold_acknowledge); + unsigned long secondary_hold + = virt_to_abs(*PTRRELOC((unsigned long *)__secondary_hold)); struct systemcfg *_systemcfg = RELOC(systemcfg); struct paca_struct *_xPaca = PTRRELOC(&paca[0]); struct prom_t *_prom = PTRRELOC(&prom); diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/arch/ppc64/kernel/rtas.c ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/rtas.c --- ppc64-linux-2.5/arch/ppc64/kernel/rtas.c 2004-03-23 17:07:56.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/rtas.c 2004-03-24 15:48:18.945382249 +1100 @@ -362,7 +362,7 @@ rtas_flash_firmware(void) */ rtas_firmware_flash_list.num_blocks = 0; flist = (struct flash_block_list *)&rtas_firmware_flash_list; - rtas_block_list = virt_to_absolute((unsigned long)flist); + rtas_block_list = virt_to_abs(flist); if (rtas_block_list >= (4UL << 20)) { printk(KERN_ALERT "FLASH: kernel bug...flash list header addr above 4GB\n"); return; @@ -374,13 +374,13 @@ rtas_flash_firmware(void) for (f = flist; f; f = next) { /* Translate data addrs to absolute */ for (i = 0; i < f->num_blocks; i++) { - f->blocks[i].data = (char *)virt_to_absolute((unsigned long)f->blocks[i].data); + f->blocks[i].data = (char *)virt_to_abs(f->blocks[i].data); image_size += f->blocks[i].length; } next = f->next; /* Don't translate NULL pointer for last entry */ - if(f->next) - 
f->next = (struct flash_block_list *)virt_to_absolute((unsigned long)f->next); + if (f->next) + f->next = (struct flash_block_list *)virt_to_abs(f->next); else f->next = 0LL; /* make num_blocks into the version/length field */ diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/arch/ppc64/kernel/smp.c ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/smp.c --- ppc64-linux-2.5/arch/ppc64/kernel/smp.c 2004-03-11 07:04:34.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/arch/ppc64/kernel/smp.c 2004-03-24 15:48:18.949381944 +1100 @@ -986,7 +986,7 @@ int __devinit __cpu_up(unsigned int cpu) tmp = &stab_array[PAGE_SIZE * cpu]; memset(tmp, 0, PAGE_SIZE); paca[cpu].xStab_data.virt = (unsigned long)tmp; - paca[cpu].xStab_data.real = (unsigned long)__v2a(tmp); + paca[cpu].xStab_data.real = virt_to_abs(tmp); } /* The information for processor bringup must be written out diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/arch/ppc64/mm/hash_utils.c ppc64-linux-2.5-absolute_cleanup/arch/ppc64/mm/hash_utils.c --- ppc64-linux-2.5/arch/ppc64/mm/hash_utils.c 2004-03-17 14:12:38.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/arch/ppc64/mm/hash_utils.c 2004-03-24 15:48:18.951381792 +1100 @@ -48,6 +48,8 @@ #include #include #include +#include + /* * Note: pte --> Linux PTE * HPTE --> PowerPC Hashed Page Table Entry @@ -107,11 +109,11 @@ static inline void create_pte_mapping(un if (systemcfg->platform == PLATFORM_PSERIES_LPAR) ret = pSeries_lpar_hpte_insert(hpteg, va, - (unsigned long)__v2a(addr) >> PAGE_SHIFT, + virt_to_abs(addr) >> PAGE_SHIFT, 0, mode, 1, large); else ret = pSeries_hpte_insert(hpteg, va, - (unsigned long)__v2a(addr) >> PAGE_SHIFT, + virt_to_abs(addr) >> PAGE_SHIFT, 0, mode, 1, large); if (ret == -1) { @@ -154,7 +156,7 @@ void __init htab_initialize(void) ppc64_terminate_msg(0x20, "hpt space"); loop_forever(); } - htab_data.htab = (HPTE *)__a2v(table); + htab_data.htab = abs_to_virt(table); /* 
htab absolute addr + encoded htabsize */ _SDR1 = table + __ilog2(pteg_count) - 11; diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/arch/ppc64/mm/init.c ppc64-linux-2.5-absolute_cleanup/arch/ppc64/mm/init.c --- ppc64-linux-2.5/arch/ppc64/mm/init.c 2004-03-17 14:12:38.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/arch/ppc64/mm/init.c 2004-03-24 15:48:18.954381563 +1100 @@ -60,6 +60,7 @@ #include #include #include +#include struct mmu_context_queue_t mmu_context_queue; @@ -153,7 +154,7 @@ static void map_io_page(unsigned long ea pmdp = pmd_alloc(&ioremap_mm, pgdp, ea); ptep = pte_alloc_kernel(&ioremap_mm, pmdp, ea); - pa = absolute_to_phys(pa); + pa = abs_to_phys(pa); set_pte(ptep, pfn_pte(pa >> PAGE_SHIFT, __pgprot(flags))); spin_unlock(&ioremap_mm.page_table_lock); } else { @@ -539,7 +540,7 @@ void __init do_init_bootmem(void) */ bootmap_pages = bootmem_bootmap_pages(total_pages); - start = (unsigned long)__a2p(lmb_alloc(bootmap_pages<> PAGE_SHIFT, total_pages); diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/drivers/net/iseries_veth.c ppc64-linux-2.5-absolute_cleanup/drivers/net/iseries_veth.c --- ppc64-linux-2.5/drivers/net/iseries_veth.c 2004-03-23 17:07:56.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/drivers/net/iseries_veth.c 2004-03-24 17:25:15.583723109 +1100 @@ -71,6 +71,7 @@ #include #include #include +#include #include #include @@ -137,11 +138,6 @@ MODULE_LICENSE("GPL"); static int VethModuleReopen = 1; -static inline u64 veth_dma_addr(void *p) -{ - return 0x8000000000000000LL | virt_to_absolute((unsigned long) p); -} - static inline HvLpEvent_Rc veth_signalevent(struct veth_lpar_connection *cnx, u16 subtype, HvLpEvent_AckInd ackind, HvLpEvent_AckType acktype, @@ -1178,13 +1174,13 @@ static inline void veth_build_dma_list(s * really need to break it into PAGE_SIZE chunks, or can we do * it just at the granularity of iSeries real->absolute * mapping? 
*/ - list[0].addr = veth_dma_addr(p); + list[0].addr = ISERIES_HV_ADDR(p); list[0].size = min(length, PAGE_SIZE - ((unsigned long)p & ~PAGE_MASK)); done = list[0].size; while (done < length) { - list[i].addr = veth_dma_addr(p + done); + list[i].addr = ISERIES_HV_ADDR(p + done); list[i].size = min(done, PAGE_SIZE); done += list[i].size; i++; @@ -1255,8 +1251,8 @@ static void veth_receive(struct veth_lpa cnx->dst_inst, HvLpDma_AddressType_RealAddress, HvLpDma_AddressType_TceIndex, - veth_dma_addr(&local_list), - veth_dma_addr(&remote_list), + ISERIES_HV_ADDR(&local_list), + ISERIES_HV_ADDR(&remote_list), length); if (rc != HvLpDma_Rc_Good) { dev_kfree_skb_irq(skb); diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/include/asm-ppc64/abs_addr.h ppc64-linux-2.5-absolute_cleanup/include/asm-ppc64/abs_addr.h --- ppc64-linux-2.5/include/asm-ppc64/abs_addr.h 2002-03-25 21:14:52.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/include/asm-ppc64/abs_addr.h 2004-03-24 15:51:13.905358578 +1100 @@ -71,26 +71,22 @@ abs_chunk(unsigned long pchunk) return PTRRELOC(_msChunks->abs)[pchunk]; } - -static inline unsigned long -phys_to_absolute(unsigned long pa) -{ - return chunk_to_addr(abs_chunk(addr_to_chunk(pa))) + chunk_offset(pa); -} +/* A macro so it can take pointers or unsigned long. */ +#define phys_to_abs(pa) \ + ({ unsigned long _pa = (unsigned long)(pa); \ + chunk_to_addr(abs_chunk(addr_to_chunk(_pa))) + chunk_offset(_pa); \ + }) static inline unsigned long physRpn_to_absRpn(unsigned long rpn) { unsigned long pa = rpn << PAGE_SHIFT; - unsigned long aa = phys_to_absolute(pa); + unsigned long aa = phys_to_abs(pa); return (aa >> PAGE_SHIFT); } -static inline unsigned long -absolute_to_phys(unsigned long aa) -{ - return lmb_abs_to_phys(aa); -} +/* A macro so it can take pointers or unsigned long. 
*/ +#define abs_to_phys(aa) lmb_abs_to_phys((unsigned long)(aa)) #else /* !CONFIG_MSCHUNKS */ @@ -99,23 +95,14 @@ absolute_to_phys(unsigned long aa) #define chunk_offset(addr) (0) #define abs_chunk(pchunk) (pchunk) -#define phys_to_absolute(pa) (pa) +#define phys_to_abs(pa) (pa) #define physRpn_to_absRpn(rpn) (rpn) -#define absolute_to_phys(aa) (aa) +#define abs_to_phys(aa) (aa) #endif /* !CONFIG_MSCHUNKS */ - -static inline unsigned long -virt_to_absolute(unsigned long ea) -{ - return phys_to_absolute(__pa(ea)); -} - -static inline unsigned long -absolute_to_virt(unsigned long aa) -{ - return (unsigned long)__va(absolute_to_phys(aa)); -} +/* Convenience macros */ +#define virt_to_abs(va) phys_to_abs(__pa(va)) +#define abs_to_virt(aa) __va(abs_to_phys(aa)) #endif /* _ABS_ADDR_H */ diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/include/asm-ppc64/iSeries/HvCallEvent.h ppc64-linux-2.5-absolute_cleanup/include/asm-ppc64/iSeries/HvCallEvent.h --- ppc64-linux-2.5/include/asm-ppc64/iSeries/HvCallEvent.h 2004-01-20 16:05:55.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/include/asm-ppc64/iSeries/HvCallEvent.h 2004-03-24 15:48:18.958381258 +1100 @@ -100,7 +100,7 @@ static inline void HvCallEvent_setLpEven { u64 abs_addr; - abs_addr = virt_to_absolute((unsigned long)eventStackAddr); + abs_addr = virt_to_abs(eventStackAddr); HvCall3(HvCallEventSetLpEventStack, queueIndex, abs_addr, eventStackSize); // getPaca()->adjustHmtForNoOfSpinLocksHeld(); @@ -123,7 +123,7 @@ static inline HvLpEvent_Rc HvCallEvent_s printk("HvCallEvent_signalLpEvent: *event = %016lx\n ", (unsigned long)event); #endif - abs_addr = virt_to_absolute((unsigned long)event); + abs_addr = virt_to_abs(event); retVal = (HvLpEvent_Rc)HvCall1(HvCallEventSignalLpEvent, abs_addr); // getPaca()->adjustHmtForNoOfSpinLocksHeld(); return retVal; @@ -164,7 +164,7 @@ static inline HvLpEvent_Rc HvCallEvent_a u64 abs_addr; HvLpEvent_Rc retVal; - abs_addr = virt_to_absolute((unsigned 
long)event); + abs_addr = virt_to_abs(event); retVal = (HvLpEvent_Rc)HvCall1(HvCallEventAckLpEvent, abs_addr); // getPaca()->adjustHmtForNoOfSpinLocksHeld(); return retVal; @@ -175,7 +175,7 @@ static inline HvLpEvent_Rc HvCallEvent_c u64 abs_addr; HvLpEvent_Rc retVal; - abs_addr = virt_to_absolute((unsigned long)event); + abs_addr = virt_to_abs(event); retVal = (HvLpEvent_Rc)HvCall1(HvCallEventCancelLpEvent, abs_addr); // getPaca()->adjustHmtForNoOfSpinLocksHeld(); return retVal; @@ -286,7 +286,7 @@ static inline HvLpDma_Rc HvCallEvent_dma u64 abs_addr; HvLpDma_Rc retVal; - abs_addr = virt_to_absolute((unsigned long)local); + abs_addr = virt_to_abs(local); retVal = (HvLpDma_Rc)HvCall4(HvCallEventDmaToSp, abs_addr, remote, length, dir); // getPaca()->adjustHmtForNoOfSpinLocksHeld(); diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/include/asm-ppc64/iSeries/iSeries_pci.h ppc64-linux-2.5-absolute_cleanup/include/asm-ppc64/iSeries/iSeries_pci.h --- ppc64-linux-2.5/include/asm-ppc64/iSeries/iSeries_pci.h 2004-03-01 15:13:25.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/include/asm-ppc64/iSeries/iSeries_pci.h 2004-03-24 15:48:18.960381105 +1100 @@ -31,6 +31,7 @@ /************************************************************************/ #include +#include struct pci_dev; /* For Forward Reference */ struct iSeries_Device_Node; @@ -71,7 +72,7 @@ struct iSeries_Device_Node; /* Converts Virtual Address to Real Address for Hypervisor calls */ /************************************************************************/ -#define REALADDR(virtaddr) (0x8000000000000000 | (virt_to_absolute((u64)virtaddr) )) +#define ISERIES_HV_ADDR(virtaddr) (0x8000000000000000 | virt_to_abs(virtaddr)) /************************************************************************/ /* iSeries Device Information */ diff -urNp --exclude TAGS -X /home/rusty/current-dontdiff --minimal ppc64-linux-2.5/include/asm-ppc64/page.h 
ppc64-linux-2.5-absolute_cleanup/include/asm-ppc64/page.h --- ppc64-linux-2.5/include/asm-ppc64/page.h 2004-03-08 12:04:44.000000000 +1100 +++ ppc64-linux-2.5-absolute_cleanup/include/asm-ppc64/page.h 2004-03-24 15:48:18.962380953 +1100 @@ -212,19 +212,6 @@ extern int page_is_ram(unsigned long phy #define __va(x) ((void *)((unsigned long)(x) + KERNELBASE)) -/* Given that physical addresses do not map 1-1 to absolute addresses, we - * use these macros to better specify exactly what we want to do. - * The only restriction on their use is that the absolute address - * macros cannot be used until after the LMB structure has been - * initialized in prom.c. -Peter - */ -#define __v2p(x) ((void *) __pa(x)) -#define __v2a(x) ((void *) phys_to_absolute(__pa(x))) -#define __p2a(x) ((void *) phys_to_absolute(x)) -#define __p2v(x) ((void *) __va(x)) -#define __a2p(x) ((void *) absolute_to_phys(x)) -#define __a2v(x) ((void *) __va(absolute_to_phys(x))) - #ifdef CONFIG_DISCONTIGMEM #define page_to_pfn(page) discontigmem_page_to_pfn(page) #define pfn_to_page(pfn) discontigmem_pfn_to_page(pfn) -- Anyone who quotes me in their signature is an idiot -- Rusty Russell ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From nathanl at austin.ibm.com Wed Mar 24 19:15:40 2004 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Wed, 24 Mar 2004 02:15:40 -0600 Subject: [PATCH] ppc64 cpu hotplug In-Reply-To: References: Message-ID: <406143AC.5080409@austin.ibm.com> Hi- > @@ -371,6 +371,7 @@ > wait_time = rtas_extended_busy_delay_time(status); > set_current_state(TASK_UNINTERRUPTIBLE); > schedule_timeout(wait_time); > + break; > } > default: /* shouldn't happen */ > return -ENOMSG; While I think this is technically correct, I don't understand why check_dr_state() is there at all. There is a similar rtas_get_sensor() function which basically does the same thing, except that it uses udelay instead of schedule_timeout. Why don't we use that? 
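For context on the one-line hunk quoted above: in a C switch, a case that does not end in break runs on into the next label, so the wait-and-retry case would drop into default and return -ENOMSG instead of retrying. A minimal standalone sketch of that behaviour follows. The status values, fake_query() and poll_state() are hypothetical stand-ins invented for illustration; this is not the real check_dr_state() and it makes no RTAS calls.

```c
#include <errno.h>

/* Hypothetical status values, chosen for this sketch only. */
#define STATUS_DONE      0
#define STATUS_EXT_BUSY  9900

/* Scripted stand-in for the firmware query: busy twice, then done. */
static int fake_query(int attempt)
{
	return attempt < 2 ? STATUS_EXT_BUSY : STATUS_DONE;
}

/*
 * Poll until the query reports completion.
 * with_break != 0 models the patched code (busy case ends in break);
 * with_break == 0 models the original code (busy case falls through).
 */
static int poll_state(int with_break)
{
	int attempt;

	for (attempt = 0; attempt < 10; attempt++) {
		switch (fake_query(attempt)) {
		case STATUS_DONE:
			return 0;
		case STATUS_EXT_BUSY:
			/* the real code sleeps here before retrying */
			if (with_break)
				break;
			/* fall through - this is the bug */
		default: /* shouldn't happen */
			return -ENOMSG;
		}
	}
	return -ETIMEDOUT;	/* gave up after a bounded number of tries */
}
```

With the scripted query above, poll_state(1) retries past the two busy answers and succeeds, while poll_state(0) gives up with -ENOMSG on the first busy answer.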
> @@ -382,13 +383,13 @@ > > /* Search all cpu DR entities, looking for one which is present. If > * the same hw index as before is available, grab that in preference.a > - * Match the dr-index to a cpu node in the device tree. Use the reg > - * (hw index) from the node to query rtas if the cpu is in a stopped > - * state. > + * Match the dr-index to a cpu node in the device tree. Use the > + * ibm,ppc-interrupt-server#s (hw index) from the node to query rtas > + * if the cpu is in a stopped state. > */ > static unsigned int find_physical_cpu_to_start(unsigned int old_hwindex) > { > - int i, idx; > + int i, j, idx; > int count = 0; > int num_addr_cell, num_size_cell, len; > struct device_node *np; > @@ -426,18 +427,23 @@ > > if (!ireg || ireg[0] != idx) > continue; > - ireg = (unsigned int *)get_property(np, "reg", &len); > > - if (!ireg) > - continue; > + ireg = (unsigned int*) get_property(np, "ibm,ppc-interrupt-server#s", &len); > > - status = query_cpu_stopped(*ireg); > - if (status == 0) { > - best = *ireg; > + if (unlikely(!ireg)){ > + ireg = (unsigned int*) get_property(np, "reg", &len);/* fake it with phys id */ > + if(!ireg) > + continue; > + } > + for(j = 0; j < (len / sizeof(u32)); j++){ > + status = query_cpu_stopped(ireg[j]); > + if (status == 0) { > + best = ireg[j]; > if (best == old_hwindex) { > - of_node_put(np); > - goto out; > + of_node_put(np); > + goto out; > } > + } > } > } > } I do not have any problem (save coding style) with this part of the patch; however, I do not understand why find_physical_cpu_to_start is potentially polling every possible drc-index. If a cpu is present in the device tree, it belongs to the partition and it is configured (except when we've *just* released a cpu to the hypervisor - we currently rely on userspace to remove the node from the device tree - yuck). Seems to me that we should simply query each logical cpu for each cpu device node. 
To be safe, we could check the dr state of each cpu node using its ibm,my-drc-index property. I am including a patch which implements my suggestions and adapts the second part of Joel's patch to it: o use rtas_get_sensor, ditch check_dr_state o poll a drc-index only when it is attached to a known cpu o handle SMT cpus properly by using the "ibm,ppc-interrupt-server#s" property when available instead of the "reg" property I have tested this on a 2-way Power4 lpar, but I am not able to test the SMT case at this time. Patch is against a current-ish Ameslab tree (2.6.5-rc2). Nathan -------------- next part -------------- A non-text attachment was scrubbed... Name: find_phys_cpu_rework.patch Type: text/x-patch Size: 4303 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040324/bed168cd/attachment.bin From jdewand at redhat.com Thu Mar 25 01:20:27 2004 From: jdewand at redhat.com (Julie DeWandel) Date: Wed, 24 Mar 2004 09:20:27 -0500 Subject: [PATCH] ignore huge OF properties References: <20040316070649.GP19737@krispykreme> <4058B5CE.1010503@redhat.com> <20040322100716.GY1153@krispykreme> Message-ID: <4061992B.4030406@redhat.com> Hello Anton, Anton Blanchard wrote: >>In addition to the patch you provided, it is also necessary to ensure >>that the initrd image cannot be overwritten by calls into prom such as: >> >> pp->length = (int)(long) call_prom(RELOC("getprop"), 4, 1, node, >> namep,valp, mem_end - mem_start); >> >> >>Here, mem_end needs to have been carefully chosen so that it doesn't >>start somewhere in the middle of the initrd image or past it. However, >>mem_end is arbitrarily chosen by copy_device_node to be 8MB beyond the >>starting mem_start value. In code I have been working with, mem_end has >>landed well into the initrd memory image. >> >>The attached patch corrects this problem for the 2.6 ameslab tree. >>Please consider pushing it to ameslab, as I don't know how to do that. 
>> >> > >I think we should be checking further down in inspect_node as well. Also >we should rethink that 8MB limit, on our big machines we might have >device trees bigger than that. > >Anton > I agree that we may want to reconsider the 8 MB limit, but please push the patch I suggested as that is required for linux to boot on one of the machines we have. Without it, the system crashes because the initrd image gets overwritten with zeros. The real problem on our machine wasn't the 8 MB limit since the resultant device tree (once the errant properties are discarded) is only about 1 MB. The 8 MB limit was an issue only because it was beyond the start of the initrd image in memory. And when this size was specified to the firmware as being legitimate memory to use, the firmware used it all because it encountered a property whose length was nearly 17MB. The code in inspect_node() doesn't know ahead of time how big a property is -- it only finds out after it has asked for and received it. By then it is too late if the memory size passed to the firmware initially was incorrect. I have attached an updated patch against 2.6 which checks for an overflow after the device tree is built. I added the check in copy_device_tree(), rather than in inspect_node(), since the end result of an overflow is to abandon ship. Hopefully this adequately addresses the issue you raised. Comments are welcome. Regards, Julie -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: initrd_overwrite_fix Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20040324/7b516ab0/attachment.txt From jschopp at austin.ibm.com Thu Mar 25 04:59:54 2004 From: jschopp at austin.ibm.com (jschopp at austin.ibm.com) Date: Wed, 24 Mar 2004 11:59:54 -0600 (CST) Subject: [PATCH] ppc64 cpu hotplug In-Reply-To: <406143AC.5080409@austin.ibm.com> Message-ID: I was aiming for the simple fix. Comments on your more serious rewrite below. 
On Wed, 24 Mar 2004, Nathan Lynch wrote: > Hi- > > > @@ -371,6 +371,7 @@ > > wait_time = rtas_extended_busy_delay_time(status); > > set_current_state(TASK_UNINTERRUPTIBLE); > > schedule_timeout(wait_time); > > + break; > > } > > default: /* shouldn't happen */ > > return -ENOMSG; > > While I think this is technically correct, I don't understand why > check_dr_state() is there at all. There is a similar rtas_get_sensor() > function which basically does the same thing, except that it uses udelay > instead of schedule_timeout. Why don't we use that? check_dr_state has a built-in 5 second timeout so it doesn't keep retrying forever. rtas_get_sensor will keep trying forever. I am not opposed to using rtas_get_sensor (though we should change it to check 9900..9905 instead of 9900..9909 for extended delay). > patch; however, I do not understand why find_physical_cpu_to_start is > potentially polling every possible drc-index. If a cpu is present in > the device tree, it belongs to the partition and it is configured > (except when we've *just* released a cpu to the hypervisor - we > currently rely on userspace to remove the node from the device tree - > yuck). Seems to me that we should simply query each logical cpu for > each cpu device node. To be safe, we could check the dr state of each > cpu node using its ibm,my-drc-index property. Sure, that would work. > > I am including a patch which implements my suggestions and adapts the > second part of Joel's patch to it: > > o use rtas_get_sensor, ditch check_dr_state > o poll a drc-index only when it is attached to a known cpu > o handle SMT cpus properly by using the "ibm,ppc-interrupt-server#s" > property when available instead of the "reg" property > > I have tested this on a 2-way Power4 lpar, but I am not able to test the > SMT case at this time. Patch is against a current-ish Ameslab tree > (2.6.5-rc2).
> > Nathan > + while (nr_threads--) { + if (0 == query_cpu_stopped(tid[nr_threads])) { + best = tid[nr_threads]; + if (best == old_hwindex) goto out; This part is a little hard to read. I actually had to think about nr_threads being 1 for the while and 0 for the indexing tid. If you clean this bit up to be more readable I will love the patch. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Thu Mar 25 05:22:00 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Wed, 24 Mar 2004 12:22:00 -0600 Subject: _IO_IS_ISA question In-Reply-To: <1079535438.28366.1511.camel@DYN279927END.austin.ibm.com>; from moilanen@austin.ibm.com on Wed, Mar 17, 2004 at 02:57:18PM +0000 References: <1079390116.10402.2981.camel@DYN279927END.austin.ibm.com> <1079419858.1967.237.camel@gaston> <1079535438.28366.1511.camel@DYN279927END.austin.ibm.com> Message-ID: <20040324122200.F50148@forte.austin.ibm.com> Hi Jake, Reviewed the patch... that patch looked reasonable from what I could tell; I have one request: could you cut&paste some portion of this email into the comments for that code? e.g. the second paragraph could go into the include/asm/eeh.h file right before the eeh_in* macros .... --linas On Wed, Mar 17, 2004 at 02:57:18PM +0000, Jake Moilanen wrote: > > After further investigation, I realize why this was done. What will > happen is that an ISA device like the 8250 serial will try doing > inb/outb's to the ISA space to detect the device. This will cause a > page fault and the kernel will attempt to create a PTE for this page. > On a hypervisor system, if the ISA device is not assigned to that > partition, the address will cause the H_ENTER to fail with an H_PARM. > This will cause the kernel to panic. So this code was blocking ISA IO > range accesses when there wasn't an ISA bus. > > I wrote a patch to create a valid mask of pages that a PCI device can > access inside the ISA IO range.
It will punch holes into this mask > while probing the PCI devices. FW must either give us full access to a > page, or not. So we do not have to worry about a valid PCI device IO > range and a invalid ISA device IO range overlapping within the same > page. > > Thanks, > Jake > > ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Thu Mar 25 05:29:09 2004 From: anton at samba.org (Anton Blanchard) Date: Thu, 25 Mar 2004 05:29:09 +1100 Subject: _IO_IS_ISA question In-Reply-To: <20040324122200.F50148@forte.austin.ibm.com> References: <1079390116.10402.2981.camel@DYN279927END.austin.ibm.com> <1079419858.1967.237.camel@gaston> <1079535438.28366.1511.camel@DYN279927END.austin.ibm.com> <20040324122200.F50148@forte.austin.ibm.com> Message-ID: <20040324182909.GK27747@krispykreme> > Reviewed the patch... that patch looked reasonable from what I > could tell; I have one request: could you cut&paste some portion > of this email into the comments for that code? e.g. the second > paragaph could go into the include/asm/eeh.h file right before > the eeh_in* macros .... Paul, Ben and I were discussing it, and it would seem we can use the existing exception stuff (aka get_user etc). This would remove all the ISA_* checks we currently have and not require the added complexity that this patch introduces. Anton ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From linas at austin.ibm.com Thu Mar 25 06:25:43 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Wed, 24 Mar 2004 13:25:43 -0600 Subject: _IO_IS_ISA question In-Reply-To: <20040324182909.GK27747@krispykreme>; from anton@samba.org on Thu, Mar 25, 2004 at 05:29:09AM +1100 References: <1079390116.10402.2981.camel@DYN279927END.austin.ibm.com> <1079419858.1967.237.camel@gaston> <1079535438.28366.1511.camel@DYN279927END.austin.ibm.com> <20040324122200.F50148@forte.austin.ibm.com> <20040324182909.GK27747@krispykreme> Message-ID: <20040324132543.G50148@forte.austin.ibm.com> On Thu, Mar 25, 2004 at 05:29:09AM +1100, Anton Blanchard wrote: > > > Reviewed the patch... that patch looked reasonable from what I > > could tell; I have one request: could you cut&paste some portion > > of this email into the comments for that code? e.g. the second > > paragaph could go into the include/asm/eeh.h file right before > > the eeh_in* macros .... > > Paul, Ben and I were discussing it, and it would seem we can use the > existing exception stuff (aka get_user etc). This would remove all the > ISA_* checks we currently have and not require the added complexity that > this patch introduces. Ohh, yes, I like that, much better idea. I'm guessing a little bit how one might do that; I'm thinking a new 'copy_to_io' type function with the isa_port<0x10000 check in the ex_table code? I'm somewhat confused, since bad_page_fault usually runs at the end, not at the start of the exception handling. --linas ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From linas at austin.ibm.com Thu Mar 25 08:14:19 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Wed, 24 Mar 2004 15:14:19 -0600 Subject: more eeh In-Reply-To: <20040319183707.GB11135@kroah.com>; from greg@kroah.com on Fri, Mar 19, 2004 at 10:37:11AM -0800 References: <1079484053.8840.99.camel@mudbug.austin.ibm.com> <20040317172327.GA18810@kroah.com> <16472.55069.619464.519643@cargo.ozlabs.ibm.com> <20040319000116.GC17586@kroah.com> <16474.16285.55962.349029@cargo.ozlabs.ibm.com> <20040319005026.GD19053@kroah.com> <20040319120136.N33924@forte.austin.ibm.com> <20040319183707.GB11135@kroah.com> Message-ID: <20040324151419.A51948@forte.austin.ibm.com> On Fri, Mar 19, 2004 at 10:37:11AM -0800, Greg KH wrote: > > As far as I know, there are no other pci controllers that support this > > PCI Express handles this kind of functionality. And as I already have a > PCI Express box sitting next to me right now, this kind of functionality > is not limited to the PPC64 platform anymore. I'm reading the pci express spec and it's far from clear if/how they handle EEH-like functionality. In fact, it almost seems to disclaim support: page 266 of the base spec, paragraph 6.2.3.2.1, second sentence: "Uncorrectable errors are not recoverable using defined PCI Express mechanisms". The goal of the pSeries EEH is to deal with "unreportable" errors (errors which the older PCI didn't define any mechanism for reporting back to the cpu, other than with a check-stop.) It does seem that PCI express now provides a reporting mechanism: it will 'interrupt' the CEC/aka 'root complex', and will report various fatal errors in various registers. It doesn't state what it will do if a device driver attempts any further i/o after a fatal error occurred.
(The EEH hardware explicitly cuts off all i/o after a ""fatal"" error occurs), and it doesn't deal with how software can recover from an 'unrecoverable' error (the RTAS provides a defined way of recovering from those errors, although the tool is blunt: for all practical purposes, one 'power cycles' the slot). I'll try to see what the folks on the LKML might think... --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From nathanl at austin.ibm.com Thu Mar 25 09:28:24 2004 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Wed, 24 Mar 2004 16:28:24 -0600 Subject: [PATCH] ppc64 cpu hotplug In-Reply-To: References: Message-ID: <40620B88.2050709@austin.ibm.com> jschopp at austin.ibm.com wrote: > check_dr_state has a built it 5 second timeout so it doesn't keep retrying > forever. rtas_get_sensor will keep trying forever. I am not opposed to > using rtas_get_sensor (though we should change it to check 9900..9905 > instead of 9900..9909 for extended delay). Ah, that is an important difference. Looks as if we should implement an rtas_get_sensor_timeout() interface that allows the caller to specify a timeout value. And I believe you're right about the range for extended delay. I think we could move check_dr_state to rtas.c (along with your fix) and tweak it for this purpose. > + while (nr_threads--) { > + if (0 == query_cpu_stopped(tid[nr_threads])) { > + best = tid[nr_threads]; > + if (best == old_hwindex) > goto out; > > This part is a little hard to read. I actually had to think about > nr_threads being 1 for the while and 0 for the indexing tid. If you clean > this bit up to be more readable I will love the patch. Yeah I guess a traditional "for" loop would be nicer. I'll resubmit once we're able to test that bit. Nathan ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From benh at kernel.crashing.org Thu Mar 25 10:32:29 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 25 Mar 2004 10:32:29 +1100 Subject: Dcbz vs dcbzl In-Reply-To: <566378770CC92042B17AB4A586ACC582049052@zrtpd0u3.corp.nortel.com> References: <566378770CC92042B17AB4A586ACC582049052@zrtpd0u3.corp.nortel.com> Message-ID: <1080171148.1115.20.camel@gaston> On Thu, 2004-03-25 at 05:32, David Ober wrote: > Hi > > Is anyone working on the compile/assembler to support the dcbzl > instruction. Since the dcbz operates on a 32 byte line and the dcbzl > is on the 128 bytes there is a warning on the Apple developers page > about not using the dcbz instruction on the G5. There are two > references in the 32 bit kernel that are causing problems in some > imbedded application. When attempting to update to use the dcbzl I > find that the assembler does not yet understand it. Well, to be clear, "dcbz" only operates on 32 bytes when the special HID5 compatibility bit that Apple added to the 970 is set. This is _NOT_ the normal case and this bit isn't set in linux unless you explicitly set it by modifying the kernel, as discussed for a while now by Chris Friesen. Your mail seems to imply there is a kernel bug of some sort or whatever, which is _NOT_ the case. That whole dcbz vs. dcbzl thing was added for Apple to cope with broken existing code that makes assumptions about the cache line size. Previous ppc64 CPUs already had a 128-byte cache line and by setting the HID5 bit, keep in mind that you are taking the risk of breaking existing 64-bit binaries that make the same (broken) assumption on the cache line size. I'll try to put my hand on some binutils folks, but in general, such a request should be posted to the binutils mailing lists.
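As an illustration of code that does not bake in a line size before using dcbz: on Linux the kernel hands each process its data cache block size in the ELF auxiliary vector under the AT_DCACHEBSIZE tag. The sketch below is a hedged example, not something taken from this thread. It assumes glibc's getauxval(), which appeared in glibc 2.16 and so postdates these messages; older code would walk the auxiliary vector that follows envp instead. The helper names and the 128-byte fallback are assumptions made for this example only (the tag reads as 0 on platforms that do not supply it).

```c
#include <elf.h>       /* AT_DCACHEBSIZE */
#include <stddef.h>    /* size_t */
#include <sys/auxv.h>  /* getauxval(), glibc 2.16 or later */

/* Assumed fallback for this sketch when the kernel supplies no value. */
#define FALLBACK_LINE_SIZE 128UL

/* Data cache block size as reported by the kernel, or the fallback. */
static unsigned long dcache_line_size(void)
{
	unsigned long size = getauxval(AT_DCACHEBSIZE);

	return size ? size : FALLBACK_LINE_SIZE;
}

/* Round a byte count up to a whole number of cache lines. */
static size_t round_up_to_line(size_t nbytes)
{
	size_t line = dcache_line_size();

	return ((nbytes + line - 1) / line) * line;
}
```

A routine that clears memory one cache line at a time would step by dcache_line_size() rather than by a constant 32 or 128.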
Note that the proper fix is for code to use the cache line size as provided by the ELF AUX tables by the kernel to any launched executable, though if you set this magic HID5 bit, you break perfectly valid code that correctly reads the cache line size and gets a value of 128 passed down by the kernel. To properly handle this bit, we would have to amend the ABI to add a way for the kernel to inform userland that a different cache line size applies to dcbz, or to only set this HID5 bit for special processes that have some magic ELF flag set or whatever (& context switch it). Ben. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From benh at kernel.crashing.org Thu Mar 25 10:34:18 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 25 Mar 2004 10:34:18 +1100 Subject: _IO_IS_ISA question In-Reply-To: <20040324132543.G50148@forte.austin.ibm.com> References: <1079390116.10402.2981.camel@DYN279927END.austin.ibm.com> <1079419858.1967.237.camel@gaston> <1079535438.28366.1511.camel@DYN279927END.austin.ibm.com> <20040324122200.F50148@forte.austin.ibm.com> <20040324182909.GK27747@krispykreme> <20040324132543.G50148@forte.austin.ibm.com> Message-ID: <1080171257.1148.22.camel@gaston> > > Paul, Ben and I were discussing it, and it would seem we can use the > > existing exception stuff (aka get_user etc). This would remove all the > > ISA_* checks we currently have and not require the added complexity that > > this patch introduces. > > Ohh, yes, I like that, much better idea. I'm guessing a little bit > how one might do that; I'm thinking a new 'copy_to_io' type function > with the isa_port<0x10000 check in the ex_table code? I'm somewhat > confused, since bad_page_fault usually runs at the end, not at the start > of the exception handling. Look at what ppc32 does. Ben. ** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/ From olof at austin.ibm.com Thu Mar 25 15:18:03 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Wed, 24 Mar 2004 22:18:03 -0600 (CST) Subject: [PATCH] ppc64: SMT snooze fix in idle loop Message-ID: Hi, A smt_snooze_delay of 0 is supposed to mean "disabled", but current idle loop logic doesn't take that into account and snoozes immediately instead. Below patch fixes the logic in the idle loop, as well as cleans up the test a bit. An idling processor might no longer see a snooze change immediately, but that's not needed anyway. -Olof Olof Johansson Office: 4F005/905 Linux on Power Development IBM Systems Group Email: olof at austin.ibm.com Phone: 512-838-9858 All opinions are my own and not those of IBM ===== arch/ppc64/kernel/idle.c 1.20 vs edited ===== --- 1.20/arch/ppc64/kernel/idle.c Tue Mar 9 17:39:55 2004 +++ edited/arch/ppc64/kernel/idle.c Wed Mar 24 21:52:07 2004 @@ -175,16 +175,16 @@ oldval = test_and_clear_thread_flag(TIF_NEED_RESCHED); if (!oldval) { set_thread_flag(TIF_POLLING_NRFLAG); - start_snooze = __get_tb(); + start_snooze = __get_tb() + + naca->smt_snooze_delay*tb_ticks_per_usec; while (!need_resched()) { /* need_resched could be 1 or 0 at this * point. If it is 0, set it to 0, so * an IPI/Prod is sent. If it is 1, keep * it that way & schedule work. */ - if (__get_tb() < - (start_snooze + - naca->smt_snooze_delay*tb_ticks_per_usec)) { + if (naca->smt_snooze_delay == 0 || + __get_tb() < start_snooze) { HMT_low(); /* Low thread priority */ continue; } ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From segher at kernel.crashing.org Thu Mar 25 19:00:06 2004 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Thu, 25 Mar 2004 09:00:06 +0100 Subject: Dcbz vs dcbzl In-Reply-To: <1080171148.1115.20.camel@gaston> References: <566378770CC92042B17AB4A586ACC582049052@zrtpd0u3.corp.nortel.com> <1080171148.1115.20.camel@gaston> Message-ID: <6DF79F2B-7E32-11D8-9199-000A95A4DC02@kernel.crashing.org> > On Thu, 2004-03-25 at 05:32, David Ober wrote: >> Is anyone working on the compile/assembler to support the dcbzl >> instruction. On 25-mrt-04, at 0:32, Benjamin Herrenschmidt wrote: > I'll try to put my hand on some binutils folks, but in general, such a > request should be posted to binutils mailing lists. That has already been done, and I wrote a patch: http://sources.redhat.com/ml/binutils/2004-03/msg00502.html > Note that the proper fix is for code to use the cache line size as > provided by the ELF AUX tables by the kernel to any launched > executable, Yep, all that, etc. For Linux, we should just fix broken code, we have access to all source code after all. We don't need this weird "combatibility to incompatible code" mode. Of course, if you only have binary code... well we pity you ;-) Segher ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From linas at austin.ibm.com Fri Mar 26 04:15:50 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Thu, 25 Mar 2004 11:15:50 -0600 Subject: _IO_IS_ISA question In-Reply-To: <1080171257.1148.22.camel@gaston>; from benh@kernel.crashing.org on Thu, Mar 25, 2004 at 10:34:18AM +1100 References: <1079390116.10402.2981.camel@DYN279927END.austin.ibm.com> <1079419858.1967.237.camel@gaston> <1079535438.28366.1511.camel@DYN279927END.austin.ibm.com> <20040324122200.F50148@forte.austin.ibm.com> <20040324182909.GK27747@krispykreme> <20040324132543.G50148@forte.austin.ibm.com> <1080171257.1148.22.camel@gaston> Message-ID: <20040325111549.A26076@forte.austin.ibm.com> On Thu, Mar 25, 2004 at 10:34:18AM +1100, Benjamin Herrenschmidt wrote: > > > > Paul, Ben and I were discussing it, and it would seem we can use the > > > existing exception stuff (aka get_user etc). This would remove all the > > > ISA_* checks we currently have and not require the added complexity that > > > this patch introduces. > > > > Ohh, yes, I like that, much better idea. I'm guessing a little bit > > how one might do that; I'm thinking a new 'copy_to_io' type function > > with the isa_port<0x10000 check in the ex_table code? I'm somewhat > > confused, since bad_page_fault usually runs at the end, not at the start > > of the exception handling. > > Look what ppc32 does Hmmm. include/asm-ppc/io.h ... twi ... isync ... I hope that we can do something that doesn't require either twi or isync, or sync for that matter, as the cure seems worse than the disease. I was assuming (and still kinda hoping) that a load/store to a bad ISA address would generate an exception that is synchronous with the load/store. Jake mentioned that exceptions are generated, but implied that the kernel tried to page-fault these in; my biggest concern was to figure out how to elegantly pass these to the ex_table lookup table instead. 
By 'cure worse than disease' I mean: performance-wise, Jake's if-test ((port<10000) && maskbits) (and the half dozen insns that this generates) sounds to be faster than sync, or so I would think ... --linas p.s. isync? I can't imagine why isync would have anything to do with io ... ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Fri Mar 26 04:24:47 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Thu, 25 Mar 2004 11:24:47 -0600 Subject: _IO_IS_ISA question In-Reply-To: <20040325111549.A26076@forte.austin.ibm.com>; from linas@austin.ibm.com on Thu, Mar 25, 2004 at 11:15:50AM -0600 References: <1079390116.10402.2981.camel@DYN279927END.austin.ibm.com> <1079419858.1967.237.camel@gaston> <1079535438.28366.1511.camel@DYN279927END.austin.ibm.com> <20040324122200.F50148@forte.austin.ibm.com> <20040324182909.GK27747@krispykreme> <20040324132543.G50148@forte.austin.ibm.com> <1080171257.1148.22.camel@gaston> <20040325111549.A26076@forte.austin.ibm.com> Message-ID: <20040325112447.B26076@forte.austin.ibm.com> On Thu, Mar 25, 2004 at 11:15:50AM -0600, linas at austin.ibm.com wrote: > > On Thu, Mar 25, 2004 at 10:34:18AM +1100, Benjamin Herrenschmidt wrote: > > > > > > Paul, Ben and I were discussing it, and it would seem we can use the > > > > existing exception stuff (aka get_user etc). This would remove all the > > > > ISA_* checks we currently have and not require the added complexity that > > > > this patch introduces. > > > > > > Ohh, yes, I like that, much better idea. I'm guessing a little bit > > > how one might do that; I'm thinking a new 'copy_to_io' type function > > > with the isa_port<0x10000 check in the ex_table code? I'm somewhat > > > confused, since bad_page_fault usually runs at the end, not at the start > > > of the exception handling. > > > > Look what ppc32 does > > Hmmm. include/asm-ppc/io.h ... twi ... isync ... 
> > I hope that we can do something that doesn't require either twi or isync, > or sync for that matter, as the cure seems worse than the disease. > I was assuming (and still kinda hoping) that a load/store to a bad > ISA address would generate an exception that is synchronous with > the load/store. I take that all back, my comments are nonsense. I just actually read the page for isync. Duhh. I thought it did something completely different. > By 'cure worse than disease' I mean: performance-wise, Jake's > if-test ((port<10000) && maskbits) (and the half dozen insns that this > generates) sounds to be faster than sync, or so I would think ... Well, I suppose that's still a concern; how many cycles delay for io-to-valid-addr+isync on power3/4/5 ? --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Fri Mar 26 04:30:47 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Thu, 25 Mar 2004 11:30:47 -0600 Subject: _IO_IS_ISA question In-Reply-To: <20040325112447.B26076@forte.austin.ibm.com>; from linas@austin.ibm.com on Thu, Mar 25, 2004 at 11:24:47AM -0600 References: <1079390116.10402.2981.camel@DYN279927END.austin.ibm.com> <1079419858.1967.237.camel@gaston> <1079535438.28366.1511.camel@DYN279927END.austin.ibm.com> <20040324122200.F50148@forte.austin.ibm.com> <20040324182909.GK27747@krispykreme> <20040324132543.G50148@forte.austin.ibm.com> <1080171257.1148.22.camel@gaston> <20040325111549.A26076@forte.austin.ibm.com> <20040325112447.B26076@forte.austin.ibm.com> Message-ID: <20040325113047.C26076@forte.austin.ibm.com> On Thu, Mar 25, 2004 at 11:24:47AM -0600, linas at austin.ibm.com wrote: > > On Thu, Mar 25, 2004 at 11:15:50AM -0600, linas at austin.ibm.com wrote: > > > > On Thu, Mar 25, 2004 at 10:34:18AM +1100, Benjamin Herrenschmidt wrote: > > > > > > > > Paul, Ben and I were discussing it, and it would seem we can use the > > > > > existing exception stuff (aka get_user etc). 
This would remove all the > > > > > ISA_* checks we currently have and not require the added complexity that > > > > > this patch introduces. > > > > > > > > Ohh, yes, I like that, much better idea. I'm guessing a little bit > > > > how one might do that; I'm thinking a new 'copy_to_io' type function > > > > with the isa_port<0x10000 check in the ex_table code? I'm somewhat > > > > confused, since bad_page_fault usually runs at the end, not at the start > > > > of the exception handling. > > > > > > Look what ppc32 does > > > > Hmmm. include/asm-ppc/io.h ... twi ... isync ... > > > > I hope that we can do something that doesn't require either twi or isync, > > or sync for that matter, as the cure seems worse than the disease. > > I was assuming (and still kinda hoping) that a load/store to a bad > > ISA address would generate an exception that is synchronous with > > the load/store. > > I take that all back, my comments are nonsense. I just actually read > the page for isync. Duhh. I thought it did something completely different. > > > By 'cure worse than disease' I mean: performance-wise, Jake's > > if-test ((port<10000) && maskbits) (and the half dozen insns that this > > generates) sounds to be faster than sync, or so I would think ... > > Well, I suppose that's still a concern; how many cycles delay for > io-to-valid-addr+isync on power3/4/5 ? I take back what I took back and put forth my initial argument: isync does do what I thought it did. It waits for completion (which is what we want) but it also causes a refetch (which we don't want). The refetch can be something like 200 or 300 clock cycles on power5 I think (antonb knows for sure) which is one hell of a price to pay. --linas ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From paulus at samba.org Fri Mar 26 08:22:09 2004 From: paulus at samba.org (Paul Mackerras) Date: Fri, 26 Mar 2004 08:22:09 +1100 Subject: _IO_IS_ISA question In-Reply-To: <20040325113047.C26076@forte.austin.ibm.com> References: <1079390116.10402.2981.camel@DYN279927END.austin.ibm.com> <1079419858.1967.237.camel@gaston> <1079535438.28366.1511.camel@DYN279927END.austin.ibm.com> <20040324122200.F50148@forte.austin.ibm.com> <20040324182909.GK27747@krispykreme> <20040324132543.G50148@forte.austin.ibm.com> <1080171257.1148.22.camel@gaston> <20040325111549.A26076@forte.austin.ibm.com> <20040325112447.B26076@forte.austin.ibm.com> <20040325113047.C26076@forte.austin.ibm.com> Message-ID: <16483.19841.874059.307067@cargo.ozlabs.ibm.com> linas at austin.ibm.com writes: > I take back what I took back and put forth my initial argument: isync does > do what I thought it did. It waits for completion (which is what we want) > but it also causes a refetch (which we don't want). The refetch can be > something like 200 or 300 clock cycles on power5 I think (antonb knows for > sure) which is one hell of a price to pay. POWER4 and later cpus use a scoreboard to record whether any changes are made that would cause a refetch to be necessary, and then only do the refetch on isync if necessary. Paul. ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From paulus at samba.org Fri Mar 26 08:25:17 2004 From: paulus at samba.org (Paul Mackerras) Date: Fri, 26 Mar 2004 08:25:17 +1100 Subject: _IO_IS_ISA question In-Reply-To: <20040325111549.A26076@forte.austin.ibm.com> References: <1079390116.10402.2981.camel@DYN279927END.austin.ibm.com> <1079419858.1967.237.camel@gaston> <1079535438.28366.1511.camel@DYN279927END.austin.ibm.com> <20040324122200.F50148@forte.austin.ibm.com> <20040324182909.GK27747@krispykreme> <20040324132543.G50148@forte.austin.ibm.com> <1080171257.1148.22.camel@gaston> <20040325111549.A26076@forte.austin.ibm.com> Message-ID: <16483.20029.188188.811275@cargo.ozlabs.ibm.com> > Hmmm. include/asm-ppc/io.h ... twi ... isync ... > > I hope that we can do something that doesn't require either twi or isync, > or sync for that matter, as the cure seems worse than the disease. The twi/isync sequence has nothing to do with exceptions. It is there to ensure that subsequent instructions are not executed until the data has come back from the device. It would be nice not to have to do that, but there are quite a few drivers in the kernel that assume that I/O operations are synchronous, and we avoid a lot of subtle bugs by doing this. Paul. ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From linas at austin.ibm.com Fri Mar 26 09:00:54 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Thu, 25 Mar 2004 16:00:54 -0600 Subject: _IO_IS_ISA question In-Reply-To: <16483.20029.188188.811275@cargo.ozlabs.ibm.com>; from paulus@samba.org on Fri, Mar 26, 2004 at 08:25:17AM +1100 References: <1079390116.10402.2981.camel@DYN279927END.austin.ibm.com> <1079419858.1967.237.camel@gaston> <1079535438.28366.1511.camel@DYN279927END.austin.ibm.com> <20040324122200.F50148@forte.austin.ibm.com> <20040324182909.GK27747@krispykreme> <20040324132543.G50148@forte.austin.ibm.com> <1080171257.1148.22.camel@gaston> <20040325111549.A26076@forte.austin.ibm.com> <16483.20029.188188.811275@cargo.ozlabs.ibm.com> Message-ID: <20040325160054.D26076@forte.austin.ibm.com> On Fri, Mar 26, 2004 at 08:25:17AM +1100, Paul Mackerras wrote: > > Hmmm. include/asm-ppc/io.h ... twi ... isync ... > > > > I hope that we can do something that doesn't require either twi or isync, > > or sync for that matter, as the cure seems worse than the disease. > > The twi/isync sequence has nothing to do with exceptions. It is there > to ensure that subsequent instructions are not executed until the data > has come back from the device. It would be nice not to have to do > that, but there are quite a few drivers in the kernel that assume that > I/O operations are synchronous, and we avoid a lot of subtle bugs by > doing this. Dang, I could have sworn that the ppc64 macros didn't do that, but now that I really did look at them, I see that indeed they do, so objections withdrawn & I am feeling slightly foolish. Moving off-topic: anyone ever experiment with removing these isyncs from the ppc64 code, and seeing if at least our most common adapters/controllers still work? Are these broken drivers for things that no one would ever plug into ppc64? --linas ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From benh at kernel.crashing.org Fri Mar 26 09:26:58 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 26 Mar 2004 09:26:58 +1100 Subject: _IO_IS_ISA question In-Reply-To: <20040325111549.A26076@forte.austin.ibm.com> References: <1079390116.10402.2981.camel@DYN279927END.austin.ibm.com> <1079419858.1967.237.camel@gaston> <1079535438.28366.1511.camel@DYN279927END.austin.ibm.com> <20040324122200.F50148@forte.austin.ibm.com> <20040324182909.GK27747@krispykreme> <20040324132543.G50148@forte.austin.ibm.com> <1080171257.1148.22.camel@gaston> <20040325111549.A26076@forte.austin.ibm.com> Message-ID: <1080253617.1217.23.camel@gaston> > Hmmm. include/asm-ppc/io.h ... twi ... isync ... > > I hope that we can do something that doesn't require either twi or isync, > or sync for that matter, as the cure seems worse than the disease. > I was assuming (and still kinda hoping) that a load/store to a bad > ISA address would generate an exception that is synchronous with > the load/store. I added the twi,isync to all normal IOs recently for ppc64. This is necessary. The problem is that you simply cannot expect driver writers to understand how to deal with IO reads not being synchronous. Also, an IO read is so slow in the first place that I don't think the impact of doing a twi/isync will add any critical performance loss, but then I may be wrong. > Jake mentioned that exceptions are generated, but implied that the > kernel tried to page-fault these in; my biggest concern was to figure > out how to elegantly pass these to the ex_table lookup table instead. It depends how things are implemented. If it's a machine check, then those are essentially asynchronous, though ppc32 manages +/- to catch them on the right instruction. If you don't map the IOs in the MMU, then you get faults. There are 2 solutions at this point. 
Either keep the fact they aren't mapped in the hash table while keeping the linux page table mapping (ioremap), in which case you need some way to catch errors in __hash_page without a panic, or change the PCI host initialization to do mappings differently so that those holes aren't mapped in the PTE (but the virtual space is still reserved), thus causing normal kernel page faults. The latter would work in non-HV configs as well and already knows how to recover on ex_table. > By 'cure worse than disease' I mean: performance-wise, Jake's > if-test ((port<10000) && maskbits) (and the half dozen insns that this > generates) sounds to be faster than sync, or so I would think ... > > --linas > > p.s. isync? I can't imagine why isync would have anything to do with io ... isync ensures the twi got executed, which ensures the data loaded was considered as "used" by the CPU, thus making the load effective. At least this is my understanding ;) Ben. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From benh at kernel.crashing.org Fri Mar 26 09:28:17 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 26 Mar 2004 09:28:17 +1100 Subject: _IO_IS_ISA question In-Reply-To: <20040325113047.C26076@forte.austin.ibm.com> References: <1079390116.10402.2981.camel@DYN279927END.austin.ibm.com> <1079419858.1967.237.camel@gaston> <1079535438.28366.1511.camel@DYN279927END.austin.ibm.com> <20040324122200.F50148@forte.austin.ibm.com> <20040324182909.GK27747@krispykreme> <20040324132543.G50148@forte.austin.ibm.com> <1080171257.1148.22.camel@gaston> <20040325111549.A26076@forte.austin.ibm.com> <20040325112447.B26076@forte.austin.ibm.com> <20040325113047.C26076@forte.austin.ibm.com> Message-ID: <1080253696.1195.26.camel@gaston> > I take back what I took back and put forth my initial argument: isync does > do what I thought it did. It waits for completion (which is what we want) > but it also causes a refetch (which we don't want). 
The refetch can be > something like 200 or 300 clock cycles on power5 I think (antonb knows for > sure) which is one hell of a price to pay. That long ? Even when re-fetching from a hot cache line ? Also, think how long the actual IO is ... Ben. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From benh at kernel.crashing.org Fri Mar 26 09:36:36 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 26 Mar 2004 09:36:36 +1100 Subject: _IO_IS_ISA question In-Reply-To: <20040325160054.D26076@forte.austin.ibm.com> References: <1079390116.10402.2981.camel@DYN279927END.austin.ibm.com> <1079419858.1967.237.camel@gaston> <1079535438.28366.1511.camel@DYN279927END.austin.ibm.com> <20040324122200.F50148@forte.austin.ibm.com> <20040324182909.GK27747@krispykreme> <20040324132543.G50148@forte.austin.ibm.com> <1080171257.1148.22.camel@gaston> <20040325111549.A26076@forte.austin.ibm.com> <16483.20029.188188.811275@cargo.ozlabs.ibm.com> <20040325160054.D26076@forte.austin.ibm.com> Message-ID: <1080254196.1201.35.camel@gaston> > Dang, I could have sworn that the ppc64 macros didn't do that, but now > that I really did look at them, I see that indeed they do, so objections > withdrawn & I am feeling slightly foolish. > > Moving off-topic: anyone ever experiment with removing these isyncs > from the ppc64 code, and seeing if at least our most common > adapters/controllers still work? Are these broken drivers for things > that no one would ever plug into ppc64? Why the heck would we want to take such a risk ? I just added them actually ;) The cost of the twi/isync is, I believe, small compared to the overall cost of doing an IO read, and I'm not prepared to go back to some "unsafe" situation just because the 3 1/2 adapters used on pSeries are happy with it (and I'm sure they aren't anyway). There is more to ppc64 than just pSeries anyway ;) Ben. ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From linas at austin.ibm.com Fri Mar 26 10:08:16 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Thu, 25 Mar 2004 17:08:16 -0600 Subject: _IO_IS_ISA question In-Reply-To: <1080253696.1195.26.camel@gaston>; from benh@kernel.crashing.org on Fri, Mar 26, 2004 at 09:28:17AM +1100 References: <1079419858.1967.237.camel@gaston> <1079535438.28366.1511.camel@DYN279927END.austin.ibm.com> <20040324122200.F50148@forte.austin.ibm.com> <20040324182909.GK27747@krispykreme> <20040324132543.G50148@forte.austin.ibm.com> <1080171257.1148.22.camel@gaston> <20040325111549.A26076@forte.austin.ibm.com> <20040325112447.B26076@forte.austin.ibm.com> <20040325113047.C26076@forte.austin.ibm.com> <1080253696.1195.26.camel@gaston> Message-ID: <20040325170816.E26076@forte.austin.ibm.com> On Fri, Mar 26, 2004 at 09:28:17AM +1100, Benjamin Herrenschmidt wrote: > > That long ? Even when re-fetching from a hot cache line ? If it's out of cache, it would have been fast, but I had the impression that it didn't pull from there. I dunno. Don't let me add to the confusion. > Also, think how long the actual IO is ... In ye olden days, we used to build devices with blisteringly fast PIO. Not only were we often faster than DMA, but so fast that the bottleneck was in software, not the bus. But that's a whole nuther thing. --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From benh at kernel.crashing.org Fri Mar 26 17:58:21 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 26 Mar 2004 17:58:21 +1100 Subject: Bug on syscall error test for 64 bits apps Message-ID: <1080284301.1389.11.camel@gaston> Hi ! It seems that we have a bug in the syscall exit code path, where we use a 32-bit compare to check if the result is an error, thus potentially returning spurious error SO bits in CR to userspace for 64-bit applications. If anybody found a problem that could be caused by that, test the enclosed patch. 
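[Editorial sketch, not part of the original message or patch.] The compare problem described above can be illustrated in plain C, as a rough model only. It assumes the exit path treats a return value as an error when it lies in the top range [-LAST_ERRNO, -1] under an unsigned compare; the constant SKETCH_LAST_ERRNO and both helper names are hypothetical stand-ins, and the value 516 is an assumption rather than something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-in for the kernel's _LAST_ERRNO; the value is a guess. */
#define SKETCH_LAST_ERRNO 516

/* Models a 32-bit unsigned compare: only the low word of the result is used. */
static int is_error_32bit(uint64_t ret)
{
	return (uint32_t)ret >= (uint32_t)-SKETCH_LAST_ERRNO;
}

/* Models a 64-bit unsigned compare: the whole register is used. */
static int is_error_64bit(uint64_t ret)
{
	return ret >= (uint64_t)-(int64_t)SKETCH_LAST_ERRNO;
}

/* Hypothetical self-check: a large but valid 64-bit result whose low word
 * happens to look like a small negative number. */
static void sketch_selfcheck(void)
{
	uint64_t ok_result = 0x00000001FFFFFFF0ULL;

	assert(is_error_32bit(ok_result));        /* wrongly flagged as an error */
	assert(!is_error_64bit(ok_result));       /* correctly treated as success */
	assert(is_error_64bit((uint64_t)-22));    /* a real errno is still caught */
}
```

Under these assumptions, only results whose low 32 bits fall in the errno range are misclassified by the narrow compare, which would explain why the problem shows up only for 64-bit applications.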
I'd appreciate some regression testing for people without a problem as well. I intend to push to Linus in a couple of days at most. ===== arch/ppc64/kernel/entry.S 1.32 vs edited ===== --- 1.32/arch/ppc64/kernel/entry.S Fri Mar 19 16:59:29 2004 +++ edited/arch/ppc64/kernel/entry.S Fri Mar 26 17:56:07 2004 @@ -139,7 +139,7 @@ 91: #endif li r10,-_LAST_ERRNO - cmpl 0,r3,r10 + cmpld 0,r3,r10 blt 30f neg r3,r3 22: ld r10,_CCR(r1) /* Set SO bit in CR */ ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From ananth at in.ibm.com Fri Mar 26 22:03:41 2004 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Fri, 26 Mar 2004 16:03:41 +0500 Subject: [PATCH] TCE fix for KDB Message-ID: <20040326110341.GA9256@in.ibm.com> Hello, Here is a patch to fix the handling of TCEs in KDB. I also noticed that the iommu->it_type was not set to TCE_PCI - patch sets that too. Thanks, Ananth -- Ananth Narayan Linux Technology Center, IBM Software Lab, INDIA diff -Naur temp/ameslab/arch/ppc64/kdb/kdba_bp.c ameslab/arch/ppc64/kdb/kdba_bp.c --- temp/ameslab/arch/ppc64/kdb/kdba_bp.c 2004-03-22 14:17:52.000000000 +0530 +++ ameslab/arch/ppc64/kdb/kdba_bp.c 2004-03-25 18:23:57.000000000 +0530 @@ -190,8 +190,6 @@ if (rv > 0) goto handled; -handle: - /* * Determine which breakpoint was encountered. 
*/ diff -Naur temp/ameslab/arch/ppc64/kdb/kdbasupport.c ameslab/arch/ppc64/kdb/kdbasupport.c --- temp/ameslab/arch/ppc64/kdb/kdbasupport.c 2004-03-22 14:17:52.000000000 +0530 +++ ameslab/arch/ppc64/kdb/kdbasupport.c 2004-03-26 12:02:08.000000000 +0530 @@ -37,6 +37,7 @@ #include #include #include +#include #include "../kernel/pci.h" // for traverse_all_pci_devices() extern const char *kdb_diemsg; @@ -1617,21 +1618,13 @@ } } - -#ifdef TCE_HAS_CHANGED_FIX_LATER - int kdba_dump_tce_table(int argc, const char **argv, const char **envp, struct pt_regs *regs) { - struct TceTable kt; - long tce_table_address; - int nr; - int i,j,k; - int full,empty; - int fulldump=0; - u64 mapentry; - int totalpages; - int levelpages; + struct iommu_table it; + unsigned long tce_table_address; + unsigned long bitmap, alloced = 0; + int nr, i, j; if (argc == 0) { kdb_printf("need address\n"); @@ -1639,75 +1632,52 @@ } else kdbgetularg(argv[1], &tce_table_address); - - if (argc==2) - if (strcmp(argv[2], "full") == 0) - fulldump=1; - + /* with address, read contents of memory and dump tce table. 
*/ - /* possibly making some assumptions on the depth and size of table..*/ + nr = kdba_readarea_size(tce_table_address + 0, &it.it_busno, 8); + if (nr == 0) { + kdb_printf("Invalid address\n"); + return 0; + } - nr = kdba_readarea_size(tce_table_address+0 ,&kt.busNumber,8); - nr = kdba_readarea_size(tce_table_address+8 ,&kt.size,8); - nr = kdba_readarea_size(tce_table_address+16,&kt.startOffset,8); - nr = kdba_readarea_size(tce_table_address+24,&kt.base,8); - nr = kdba_readarea_size(tce_table_address+32,&kt.index,8); - nr = kdba_readarea_size(tce_table_address+40,&kt.tceType,8); - nr = kdba_readarea_size(tce_table_address+48,&kt.lock,8); + nr = kdba_readarea_size(tce_table_address, &it, sizeof(struct iommu_table)); kdb_printf("\n"); - kdb_printf("TceTable at address %s:\n",argv[1]); - kdb_printf("BusNumber: 0x%x \n",(uint)kt.busNumber); - kdb_printf("size: 0x%x \n",(uint)kt.size); - kdb_printf("startOffset: 0x%x \n",(uint)kt.startOffset); - kdb_printf("base: 0x%x \n",(uint)kt.base); - kdb_printf("index: 0x%x \n",(uint)kt.index); - kdb_printf("tceType: 0x%x \n",(uint)kt.tceType); + kdb_printf("tce_table at address %s:\n",argv[1]); + kdb_printf("it_busno: 0x%lx\n", (unsigned long)it.it_busno); + kdb_printf("it_size: 0x%lx\n", (unsigned long)it.it_size); + kdb_printf("it_offset: 0x%lx\n", (unsigned long)it.it_offset); + kdb_printf("it_base: 0x%lx\n", (unsigned long)it.it_base); + kdb_printf("it_index: 0x%lx\n", (unsigned long)it.it_index); + kdb_printf("it_type: 0x%lx\n", (unsigned long)it.it_type); + kdb_printf("it_entrysize: 0x%lx\n", (unsigned long)it.it_entrysize); + kdb_printf("it_blocksize: 0x%lx\n", (unsigned long)it.it_blocksize); + kdb_printf("it_hint: 0x%lx\n", (unsigned long)it.it_hint); + kdb_printf("it_largehint: 0x%lx\n", (unsigned long)it.it_largehint); + kdb_printf("it_halfpoint: 0x%lx\n", (unsigned long)it.it_halfpoint); #ifdef CONFIG_SMP - kdb_printf("lock: 0x%x \n",(uint)kt.lock.lock); + kdb_printf("it_lock: 0x%lx\n", (unsigned 
long)it.it_lock.lock); #endif + kdb_printf("it_mapsize: 0x%lx\n", (unsigned long)it.it_mapsize); + kdb_printf("it_map: 0x%lx\n", (unsigned long)it.it_map); - nr = kdba_readarea_size(tce_table_address+56,&kt.mlbm.maxLevel,8); - kdb_printf(" maxLevel: 0x%x \n",(uint)kt.mlbm.maxLevel); - totalpages=0; - for (i=0;i> j) & 0x01) + alloced++; + } + } } - kdb_printf(" full:0x%x empty:0x%x pages:0x%x\n",full,empty,levelpages); - } else { - kdb_printf(" numBits/numBytes mismatch..? \n"); - } - totalpages+=levelpages; } - kdb_printf(" Total pages:0x%x\n",totalpages); + kdb_printf("TCE entries alloced = %ld\n", alloced); + kdb_printf("TCE entries free = %ld\n", it.it_mapsize - alloced); + kdb_printf("\n"); return 0; } -#endif int kdba_kernelversion(int argc, const char **argv, const char **envp, struct pt_regs *regs){ @@ -1732,13 +1702,12 @@ dn->phb = phb; kdb_printf("dn: %p \n",dn); - kdb_printf(" phb : %p\n",dn->phb); - kdb_printf(" name : %s\n",dn->name); - kdb_printf(" full_name: %s\n",dn->full_name); - kdb_printf(" busno : 0x%x\n",dn->busno); - kdb_printf(" devfn : 0x%x\n",dn->devfn); - // XXX fix me later, bring up to date - // kdb_printf(" tce_table: %p\n",dn->tce_table); + kdb_printf(" phb : %p\n", dn->phb); + kdb_printf(" name : %s\n", dn->name); + kdb_printf(" full_name : %s\n", dn->full_name); + kdb_printf(" busno : 0x%x\n", dn->busno); + kdb_printf(" devfn : 0x%x\n", dn->devfn); + kdb_printf(" iommu_table : %p\n", dn->iommu_table); return NULL; } @@ -2043,8 +2012,7 @@ kdb_register("superreg", kdba_super_regs, "superreg", "display super_regs", 0); kdb_register("msr", kdba_dissect_msr, "msr", "dissect msr", 0); kdb_register("halt", kdba_halt, "halt", "halt machine", 0); - // XXX fix me later, tce has changed radically - // kdb_register("tce_table", kdba_dump_tce_table, "tce_table [full]", "dump the tce table located at ", 0); + kdb_register("tce_table", kdba_dump_tce_table, "tce_table ", "dump the tce table located at ", 0); kdb_register("kernel", kdba_kernelversion, 
"version", "display running kernel version", 0); kdb_register("pci_info", kdba_dump_pci_info, "dump_pci_info", "dump pci device info", 0); kdb_register("dump", kdba_dump, "dump (all|basic)", "dump all info", 0); diff -Naur temp/ameslab/arch/ppc64/kernel/pSeries_iommu.c ameslab/arch/ppc64/kernel/pSeries_iommu.c --- temp/ameslab/arch/ppc64/kernel/pSeries_iommu.c 2004-03-22 14:17:52.000000000 +0530 +++ ameslab/arch/ppc64/kernel/pSeries_iommu.c 2004-03-24 15:58:14.000000000 +0530 @@ -210,6 +210,7 @@ tbl->it_index = 0; tbl->it_entrysize = sizeof(union tce_entry); tbl->it_blocksize = 16; + tbl->it_type = TCE_PCI; } /* @@ -245,6 +246,7 @@ tbl->it_index = dma_window[0]; tbl->it_entrysize = sizeof(union tce_entry); tbl->it_blocksize = 16; + tbl->it_type = TCE_PCI; } ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From johnrose at austin.ibm.com Sat Mar 27 03:31:38 2004 From: johnrose at austin.ibm.com (John Rose) Date: Fri, 26 Mar 2004 10:31:38 -0600 Subject: PCI Hotplug Slot Naming Scheme Message-ID: <1080318698.17590.0.camel@verve.austin.ibm.com> Opinions requested. The PCI Hotplug module currently uses IBM-style bus names when naming the PCI Hotplug slots. These are registered as kobjects, and represented by sysfs directories. For a particular new hardware platform, to remain unnamed :), a firmware guy tells us these names can be up to 48 chars long. We have been dealing with 10 character slot names up to this point. Our options moving forward are 1) Continue to use ibm slot names, for which the hotplug directory could look like: # ls /sys/bus/pci/slots .. U7311.D11.104CE9A-P1-C1 U7879.001.DQD0027-P1-C2 U7311.D11.104CE9A-P1-C2 U7879.001.DQD0027-P1-C3 U7311.D11.104CE9A-P1-C3 U7879.001.DQD0027-P1-C4 U7311.D11.104CE9A-P1-C5 U7879.001.DQD0027-P1-C5 U7311.D11.104CE9A-P1-C6 U7879.001.DQD0027-P1-C6 U7311.D11.104CE9A-P1-C7 U9117.570.104F3DC-V1-C0 These are just 23 chars long, imagine 48. 2) Use linux-style bus names, as in xxxx:xx:xx:x. 
This is more consistent with other PCI Hotplug implementations, and the names are always 12 chars long. Although it's late in the game to be asking such questions, we'd rather change things now if necessary than after this functionality ships. Thoughts? Thanks- John ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From greg at kroah.com Sat Mar 27 03:42:26 2004 From: greg at kroah.com (Greg KH) Date: Fri, 26 Mar 2004 08:42:26 -0800 Subject: PCI Hotplug Slot Naming Scheme In-Reply-To: <1080318698.17590.0.camel@verve.austin.ibm.com> References: <1080318698.17590.0.camel@verve.austin.ibm.com> Message-ID: <20040326164226.GA21259@kroah.com> On Fri, Mar 26, 2004 at 10:31:38AM -0600, John Rose wrote: > > # ls /sys/bus/pci/slots > .. U7311.D11.104CE9A-P1-C1 U7879.001.DQD0027-P1-C2 > U7311.D11.104CE9A-P1-C2 U7879.001.DQD0027-P1-C3 > U7311.D11.104CE9A-P1-C3 U7879.001.DQD0027-P1-C4 > U7311.D11.104CE9A-P1-C5 U7879.001.DQD0027-P1-C5 > U7311.D11.104CE9A-P1-C6 U7879.001.DQD0027-P1-C6 > U7311.D11.104CE9A-P1-C7 U9117.570.104F3DC-V1-C0 > > These are just 23 chars long, imagine 48. Ick, ick, ick. I say no. Those names mean nothing to anyone familiar with Linux. > 2) Use linux-style bus names, as in xxxx:xx:xx:x. This is more consistent > with other PCI Hotplug implementations, and the names are always 12 chars > long. Yes. It's time to bring ppc64 kicking and screaming into the Linux fold :) > Although it's late in the game to be asking such questions, we'd rather change > things now if necessary than after this functionality ships. Thoughts? It's not too late. It's just a name change. The tools out there don't care what the name of the slots are, right? I know my tools don't, and they work across all Linux platforms that have pci hotplug slots :) thanks, greg k-h ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From olh at suse.de Sat Mar 27 03:59:55 2004 From: olh at suse.de (Olaf Hering) Date: Fri, 26 Mar 2004 17:59:55 +0100 Subject: [PATCH] pci_dma_mapping_error in 2.6.5 Message-ID: <20040326165955.GA15785@suse.de> I'm not sure if this patch is correct, at least the compiler is now happy. diff -purN /tmp/linux-2.6.4/drivers/net/iseries_veth.c ./drivers/net/iseries_veth.c --- /tmp/linux-2.6.4/drivers/net/iseries_veth.c 2004-03-26 16:28:22.000000000 +0000 +++ ./drivers/net/iseries_veth.c 2004-03-26 16:46:26.000000000 +0000 @@ -455,7 +455,7 @@ static int veth_transmit_to_one(struct s /* Is it really necessary to check the length and address * fields of the first entry here? */ - if (!pci_dma_error(dma_address)) { + if (!pci_dma_mapping_error(dma_address)) { msg->skb = skb; msg->data.addr[0] = dma_address; msg->data.len[0] = dma_length; diff -purN /tmp/linux-2.6.4/drivers/scsi/ibmvscsi/ibmvscsis.c ./drivers/scsi/ibmvscsi/ibmvscsis.c --- /tmp/linux-2.6.4/drivers/scsi/ibmvscsi/ibmvscsis.c 2004-03-26 16:28:22.000000000 +0000 +++ ./drivers/scsi/ibmvscsi/ibmvscsis.c 2004-03-26 16:46:22.000000000 +0000 @@ -1863,7 +1863,7 @@ static int initialize_crq_queue(struct c queue->size * sizeof(*queue->msgs), PCI_DMA_BIDIRECTIONAL); - if (pci_dma_error(queue->msg_token)) + if (pci_dma_mapping_error(queue->msg_token)) goto map_failed; rc = plpar_hcall_norets(H_REG_CRQ, adapter->dma_dev->unit_address, queue->msg_token, PAGE_SIZE); diff -purN /tmp/linux-2.6.4/drivers/scsi/ibmvscsi/rpa_vscsi.c ./drivers/scsi/ibmvscsi/rpa_vscsi.c --- /tmp/linux-2.6.4/drivers/scsi/ibmvscsi/rpa_vscsi.c 2004-03-26 16:28:22.000000000 +0000 +++ ./drivers/scsi/ibmvscsi/rpa_vscsi.c 2004-03-26 16:47:44.000000000 +0000 @@ -169,7 +169,7 @@ int ibmvscsi_init_crq_queue(struct crq_q queue->size * sizeof(*queue->msgs), PCI_DMA_BIDIRECTIONAL); - if (pci_dma_error(queue->msg_token)) + if (pci_dma_mapping_error(queue->msg_token)) goto map_failed; rc = plpar_hcall_norets(H_REG_CRQ, -- USB is for 
mice, FireWire is for men! sUse lINUX ag, nÜRNBERG ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From garyhade at us.ibm.com Sat Mar 27 04:10:39 2004 From: garyhade at us.ibm.com (Gary Hade) Date: Fri, 26 Mar 2004 09:10:39 -0800 Subject: PCI Hotplug Slot Naming Scheme In-Reply-To: <1080318698.17590.0.camel@verve.austin.ibm.com>; from johnrose@austin.ibm.com on Fri, Mar 26, 2004 at 10:31:38AM -0600 References: <1080318698.17590.0.camel@verve.austin.ibm.com> Message-ID: <20040326091039.B1294@us.ibm.com> John, The 'acpiphp' ACPI PCI hotplug driver uses a simple 1,2,3,... slot directory naming scheme with the longer linux-style names stored in an 'address' file within each slot directory. If you changed to this simplified slot directory naming scheme you could store both styles in separate files within each slot directory. Gary -- Gary Hade IBM Linux Technology Center 503-578-4503 IBM T/L: 775-4503 garyhade at us.ibm.com http://www.ibm.com/linux/ltc On Fri, Mar 26, 2004 at 10:31:38AM -0600, John Rose wrote: > > Opinions requested. The PCI Hotplug module currently uses IBM-style bus > names when naming the PCI Hotplug slots. These are registered as kobjects, > and represented by sysfs directories. > > For a particular new hardware platform, to remain unnamed :), a firmware guy > tells us these names can be up to 48 chars long. We have been dealing with > 10 character slot names up to this point. Our options moving forward are > > 1) Continue to use ibm slot names, for which the hotplug directory could > look like: > > # ls /sys/bus/pci/slots > .. U7311.D11.104CE9A-P1-C1 U7879.001.DQD0027-P1-C2 > U7311.D11.104CE9A-P1-C2 U7879.001.DQD0027-P1-C3 > U7311.D11.104CE9A-P1-C3 U7879.001.DQD0027-P1-C4 > U7311.D11.104CE9A-P1-C5 U7879.001.DQD0027-P1-C5 > U7311.D11.104CE9A-P1-C6 U7879.001.DQD0027-P1-C6 > U7311.D11.104CE9A-P1-C7 U9117.570.104F3DC-V1-C0 > > These are just 23 chars long, imagine 48. > > 2) Use linux-style bus names, as in xxxx:xx:xx:x. 
This is more consistent > with other PCI Hotplug implementations, and the names are always 12 chars > long. > > Although it's late in the game to be asking such questions, we'd rather change > things now if necessary than after this functionality ships. Thoughts? > > Thanks- > John > > > ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From johnrose at austin.ibm.com Sat Mar 27 04:22:12 2004 From: johnrose at austin.ibm.com (John Rose) Date: Fri, 26 Mar 2004 11:22:12 -0600 Subject: Fw: PCI Hotplug Slot Naming Scheme In-Reply-To: References: Message-ID: <1080321732.17585.4.camel@verve.austin.ibm.com> Interesting idea. Given that we're already required to keep up with two identifiers (linu-style and ibm-style), I'm kinda hesitant to introduce another arbitrary one. But along these lines, we could store the 48-char IBM-style name in an "address" or similar attribute. Thanks- John > > John, > > The 'acpiphp' ACPI PCI hotplug driver uses a simple 1,2,3,... > > slot directory naming scheme with the longer linux-style > > names stored in an 'address' files within each slot directory. > > > If you changed to this simplified slot directory naming > > scheme you could store both styles in separate files within > > each slot directory. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From johnrose at austin.ibm.com Sat Mar 27 04:32:14 2004 From: johnrose at austin.ibm.com (John Rose) Date: Fri, 26 Mar 2004 11:32:14 -0600 Subject: PCI Hotplug Slot Naming Scheme In-Reply-To: References: Message-ID: <1080322334.17590.6.camel@verve.austin.ibm.com> Don't we already do that? The problem is directories with 48 character names. Thanks- John On Fri, 2004-03-26 at 11:29, Linda Xie wrote: > We can keep symlinks between "bus ids" and "loc-codes" and create > symlinks look like, > 0001:00:02.6 --> U7311.D11.104CE9A-P1-C1, So our tools can still use > "loc-codes". > > Thoughts? 
> > Thanks, > > Linda > > Inactive hide details for > garyhade at us.ltcfwd.linux.ibm.comgaryhade@us.ltcfwd.linux.ibm.com > > > > > garyhade at us.ltcfwd.linux.ibm.com > Sent by: owner-linuxppc64-dev at lists.linuxppc.org > > 03/26/04 11:10 AM > > > > > > To: John Rose > > cc: External List > > Subject: Re: PCI > Hotplug Slot Naming > Scheme > > > John, > The 'acpiphp' ACPI PCI hotplug driver uses a simple 1,2,3,... > slot directory naming scheme with the longer linux-style > names stored in an 'address' files within each slot directory. > > If you changed to this simplified slot directory naming > scheme you could store both styles in separate files within > each slot directory. > > Gary > > -- > Gary Hade > IBM Linux Technology Center > 503-578-4503 IBM T/L: 775-4503 > garyhade at us.ibm.com > http://www.ibm.com/linux/ltc > > On Fri, Mar 26, 2004 at 10:31:38AM -0600, John Rose wrote: > > > > Opinions requested. The PCI Hotplug module currently uses IBM-style > bus > > names when naming the PCI Hotplug slots. These are registered as > kobjects, > > and represented by sysfs directories. > > > > For a particular new hardware platform, to remain unnamed :), a > firmware guy > > tells us these names can be up to 48 chars long. We have been > dealing with > > 10 character slot names up to this point. Our options moving > forward are > > > > 1) Continue to use ibm slot names, for which the hotplug directory > could > > look like: > > > > # ls /sys/bus/pci/slots > > .. U7311.D11.104CE9A-P1-C1 U7879.001.DQD0027-P1-C2 > > U7311.D11.104CE9A-P1-C2 U7879.001.DQD0027-P1-C3 > > U7311.D11.104CE9A-P1-C3 U7879.001.DQD0027-P1-C4 > > U7311.D11.104CE9A-P1-C5 U7879.001.DQD0027-P1-C5 > > U7311.D11.104CE9A-P1-C6 U7879.001.DQD0027-P1-C6 > > U7311.D11.104CE9A-P1-C7 U9117.570.104F3DC-V1-C0 > > > > These are just 23 chars long, imagine 48. > > > > 2) Use linux-style bus names, as in xxxx:xx:xx:x. 
This is more > consistent > > with other PCI Hotplug implementations, and the names are always > 12 chars > > long. > > > > Although it's late in the game to be asking such questions, we'd > rather change > > things now if necessary than after this functionality ships. > Thoughts? > > > > Thanks- > > John > > > > > > > > ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From greg at kroah.com Sat Mar 27 04:44:55 2004 From: greg at kroah.com (Greg KH) Date: Fri, 26 Mar 2004 09:44:55 -0800 Subject: PCI Hotplug Slot Naming Scheme In-Reply-To: <1080322334.17590.6.camel@verve.austin.ibm.com> References: <1080322334.17590.6.camel@verve.austin.ibm.com> Message-ID: <20040326174455.GA22722@kroah.com> On Fri, Mar 26, 2004 at 11:32:14AM -0600, John Rose wrote: > > Don't we already do that? The problem is directories with 48 character > names. What problem? It should work just fine. I thought we detailed that yesterday. thanks, greg k-h ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Sat Mar 27 05:50:13 2004 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Fri, 26 Mar 2004 12:50:13 -0600 Subject: PCI Hotplug Slot Naming Scheme In-Reply-To: <20040326164226.GA21259@kroah.com>; from greg@kroah.com on Fri, Mar 26, 2004 at 08:42:26AM -0800 References: <1080318698.17590.0.camel@verve.austin.ibm.com> <20040326164226.GA21259@kroah.com> Message-ID: <20040326125013.G26076@forte.austin.ibm.com> On Fri, Mar 26, 2004 at 08:42:26AM -0800, Greg KH wrote: > > On Fri, Mar 26, 2004 at 10:31:38AM -0600, John Rose wrote: > > > > # ls /sys/bus/pci/slots > > .. 
U7311.D11.104CE9A-P1-C1 U7879.001.DQD0027-P1-C2 > > U7311.D11.104CE9A-P1-C2 U7879.001.DQD0027-P1-C3 > > U7311.D11.104CE9A-P1-C3 U7879.001.DQD0027-P1-C4 > > U7311.D11.104CE9A-P1-C5 U7879.001.DQD0027-P1-C5 > > U7311.D11.104CE9A-P1-C6 U7879.001.DQD0027-P1-C6 > > U7311.D11.104CE9A-P1-C7 U9117.570.104F3DC-V1-C0 > > > > These are just 23 chars long, imagine 48. > > Ick, ick, ick. I say no. Those names mean nothing to anyone familiar > with Linux. Yes, but they do mean something to the sysadmin who has to figure out the physical location of some device. In particular, the HMC does not use the Linux naming scheme. There's need for both until more sophisticated management tools can be developed. --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From greg at kroah.com Sat Mar 27 05:56:50 2004 From: greg at kroah.com (Greg KH) Date: Fri, 26 Mar 2004 10:56:50 -0800 Subject: PCI Hotplug Slot Naming Scheme In-Reply-To: <20040326125013.G26076@forte.austin.ibm.com> References: <1080318698.17590.0.camel@verve.austin.ibm.com> <20040326164226.GA21259@kroah.com> <20040326125013.G26076@forte.austin.ibm.com> Message-ID: <20040326185650.GA24407@kroah.com> On Fri, Mar 26, 2004 at 12:50:13PM -0600, linas at austin.ibm.com wrote: > On Fri, Mar 26, 2004 at 08:42:26AM -0800, Greg KH wrote: > > On Fri, Mar 26, 2004 at 10:31:38AM -0600, John Rose wrote: > > > > > > # ls /sys/bus/pci/slots > > > .. U7311.D11.104CE9A-P1-C1 U7879.001.DQD0027-P1-C2 > > > U7311.D11.104CE9A-P1-C2 U7879.001.DQD0027-P1-C3 > > > U7311.D11.104CE9A-P1-C3 U7879.001.DQD0027-P1-C4 > > > U7311.D11.104CE9A-P1-C5 U7879.001.DQD0027-P1-C5 > > > U7311.D11.104CE9A-P1-C6 U7879.001.DQD0027-P1-C6 > > > U7311.D11.104CE9A-P1-C7 U9117.570.104F3DC-V1-C0 > > > > > > These are just 23 chars long, imagine 48. > > > > Ick, ick, ick. I say no. Those names mean nothing to anyone familiar > > with Linux. 
> > Yes, but they do mean something to the sysadmin who has to figure out > the physical location of some device. Then don't reinvent the wheel and do what the acpi pci hotplug driver does today and put that system specific information into a file in the slot directory. > In particular, the HMC does not use the Linux naming scheme. There's > need for both until more sophisticated management tools can be > developed. Hm, pcihpview handles the acpi stuff just fine today. It would also handle the ppc64 driver just as well. thanks, greg k-h ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From johnrose at austin.ibm.com Sat Mar 27 05:57:26 2004 From: johnrose at austin.ibm.com (John Rose) Date: Fri, 26 Mar 2004 12:57:26 -0600 Subject: PCI Hotplug Slot Naming Scheme In-Reply-To: <20040326174455.GA22722@kroah.com> References: <1080322334.17590.6.camel@verve.austin.ibm.com> <20040326174455.GA22722@kroah.com> Message-ID: <1080327445.17590.10.camel@verve.austin.ibm.com> It's not that I doubt that sysfs can handle 48 character kobjects, I just think it's ugly. Whether it's a directory or symlink. And I'd like to avoid such ugliness when it's possible to do so. Thanks- John On Fri, 2004-03-26 at 11:44, Greg KH wrote: > On Fri, Mar 26, 2004 at 11:32:14AM -0600, John Rose wrote: > > > > Don't we already do that? The problem is directories with 48 character > > names. > > What problem? It should work just fine. I thought we detailed that > yesterday. ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From johnrose at austin.ibm.com Sat Mar 27 06:01:45 2004 From: johnrose at austin.ibm.com (John Rose) Date: Fri, 26 Mar 2004 13:01:45 -0600 Subject: PCI Hotplug Slot Naming Scheme In-Reply-To: <20040326125013.G26076@forte.austin.ibm.com> References: <1080318698.17590.0.camel@verve.austin.ibm.com> <20040326164226.GA21259@kroah.com> <20040326125013.G26076@forte.austin.ibm.com> Message-ID: <1080327705.17590.16.camel@verve.austin.ibm.com> > > Ick, ick, ick. I say no. Those names mean nothing to anyone familiar > > with Linux. > > Yes, but they do mean something to the sysadmin who has to figure out > the physical location of some device. In particular, the HMC does not > use the Linux naming scheme. A sysadmin performing a hotplug/DLPAR operation uses either the HMC or a command-line tool to do so, and the command-line tools will still accept the ibm-style name no matter what interface is decided upon. So keeping around ibm-style directory names doesn't particularly ease the job of a sysadmin, IMHO. Thanks- John ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From meissner at suse.de Sat Mar 27 06:02:53 2004 From: meissner at suse.de (Marcus Meissner) Date: Fri, 26 Mar 2004 20:02:53 +0100 Subject: PCI Hotplug Slot Naming Scheme In-Reply-To: <20040326185650.GA24407@kroah.com> References: <1080318698.17590.0.camel@verve.austin.ibm.com> <20040326164226.GA21259@kroah.com> <20040326125013.G26076@forte.austin.ibm.com> <20040326185650.GA24407@kroah.com> Message-ID: <20040326190253.GB29512@suse.de> On Fri, Mar 26, 2004 at 10:56:50AM -0800, Greg KH wrote: > > On Fri, Mar 26, 2004 at 12:50:13PM -0600, linas at austin.ibm.com wrote: > > On Fri, Mar 26, 2004 at 08:42:26AM -0800, Greg KH wrote: > > > On Fri, Mar 26, 2004 at 10:31:38AM -0600, John Rose wrote: > > > > > > > > # ls /sys/bus/pci/slots > > > > .. 
U7311.D11.104CE9A-P1-C1 U7879.001.DQD0027-P1-C2 > > > > U7311.D11.104CE9A-P1-C2 U7879.001.DQD0027-P1-C3 > > > > U7311.D11.104CE9A-P1-C3 U7879.001.DQD0027-P1-C4 > > > > U7311.D11.104CE9A-P1-C5 U7879.001.DQD0027-P1-C5 > > > > U7311.D11.104CE9A-P1-C6 U7879.001.DQD0027-P1-C6 > > > > U7311.D11.104CE9A-P1-C7 U9117.570.104F3DC-V1-C0 > > > > > > > > These are just 23 chars long, imagine 48. > > > > > > Ick, ick, ick. I say no. Those names mean nothing to anyone > > > familiar with Linux. Or at least do a seperate directory tree with those names... CIao, Marcus ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From greg at kroah.com Sat Mar 27 07:47:38 2004 From: greg at kroah.com (Greg KH) Date: Fri, 26 Mar 2004 12:47:38 -0800 Subject: PCI Hotplug Slot Naming Scheme In-Reply-To: <20040326125013.G26076@forte.austin.ibm.com> References: <1080318698.17590.0.camel@verve.austin.ibm.com> <20040326164226.GA21259@kroah.com> <20040326125013.G26076@forte.austin.ibm.com> Message-ID: <20040326204737.GA27861@kroah.com> On Fri, Mar 26, 2004 at 12:50:13PM -0600, linas at austin.ibm.com wrote: > On Fri, Mar 26, 2004 at 08:42:26AM -0800, Greg KH wrote: > > > > On Fri, Mar 26, 2004 at 10:31:38AM -0600, John Rose wrote: > > > > > > # ls /sys/bus/pci/slots > > > .. U7311.D11.104CE9A-P1-C1 U7879.001.DQD0027-P1-C2 > > > U7311.D11.104CE9A-P1-C2 U7879.001.DQD0027-P1-C3 > > > U7311.D11.104CE9A-P1-C3 U7879.001.DQD0027-P1-C4 > > > U7311.D11.104CE9A-P1-C5 U7879.001.DQD0027-P1-C5 > > > U7311.D11.104CE9A-P1-C6 U7879.001.DQD0027-P1-C6 > > > U7311.D11.104CE9A-P1-C7 U9117.570.104F3DC-V1-C0 > > > > > > These are just 23 chars long, imagine 48. > > > > Ick, ick, ick. I say no. Those names mean nothing to anyone familiar > > with Linux. > > Yes, but they do mean something to the sysadmin who has to figure out > the physical location of some device. In particular, the HMC does not > use the Linux naming scheme. 
There's need for both until more sophisticated
> management tools can be developed.

Again, follow the way of the other pci hotplug drivers, like acpi, and
put that physical location stuff into a file in the directory per
device.

Consistency, people, consistency...

thanks,

greg k-h

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From justinb at constantdata.com Sat Mar 27 13:01:11 2004
From: justinb at constantdata.com (Justin Banks)
Date: Fri, 26 Mar 2004 20:01:11 -0600
Subject: ppc64 cross-compiler on iSeries and stat()
Message-ID: <20040327020111.GC18611@homesrv.constantdata.com>

Hello -

Consider the following program :

#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    struct stat sb;

    if (stat(argv[1], &sb) == 0) {
        printf("File '%s' ", argv[1]);
        if (S_ISDIR(sb.st_mode))
            printf("is a directory\n");
        else if (S_ISREG(sb.st_mode))
            printf("is a regular file\n");
        else
            printf("has mode %ld\n", (long)sb.st_mode);
    }
    return (0);
}

[root@PW840L02 root]# gcc foo.c -o foo
[root@PW840L02 root]# ./foo Mail
File 'Mail' is a directory
[root@PW840L02 root]# ./foo hwlist.txt
File 'hwlist.txt' is a regular file

All is well. However,

[root@PW840L02 root]# powerpc64-linux-gcc foo.c -o foo
[root@PW840L02 root]# ./foo Mail
File 'Mail' has mode 0
[root@PW840L02 root]# ./foo hwlist.txt
File 'hwlist.txt' has mode 0
[root@PW840L02 root]# ls -ld Mail hwlist.txt
drwx------ 7 root root 4096 Feb 3 06:07 Mail
-rw-r--r-- 1 root root 18250 Apr 7 2003 hwlist.txt

gdb confirms this odd (to me) behaviour. For reasons I won't go into here,
I must compile with the cross compiler. Unfortunately, I also need to
call stat(). Any ideas?

-justinb

--
Justin Banks
Constant Data, Inc.
http://www.constantdata.com

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/ From sjmunroe at us.ibm.com Sat Mar 27 14:49:38 2004 From: sjmunroe at us.ibm.com (Steve Munroe) Date: Fri, 26 Mar 2004 21:49:38 -0600 Subject: ppc64 cross-compiler on iSeries and stat() In-Reply-To: <20040327020111.GC18611@homesrv.constantdata.com> Message-ID: Which Distro? Which Service Pack? Steven J. Munroe Linux on Power Toolchain Architect IBM Corporation, Linux Technology Center ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From justinb at constantdata.com Sun Mar 28 02:22:44 2004 From: justinb at constantdata.com (Justin Banks) Date: Sat, 27 Mar 2004 09:22:44 -0600 Subject: ppc64 cross-compiler on iSeries and stat() In-Reply-To: References: <20040327020111.GC18611@homesrv.constantdata.com> Message-ID: <20040327152244.GG18611@homesrv.constantdata.com> Steve Munroe wrote > > Which Distro? Which Service Pack? SuSE SLES 8, Kernel 2.4.21-111-iseries64 -justinb -- Justin Banks Constant Data, Inc. http://www.constantdata.com ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olof at austin.ibm.com Mon Mar 29 04:27:43 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Sun, 28 Mar 2004 12:27:43 -0600 (CST) Subject: [PATCH] ppc64: Use full DART table on G5 Message-ID: Hi, Below patch increases the DART table to use the full size. We allocate a full 16MB page anyway, so there's no difference in memory consumption. Thanks to Ben for spotting this, it was left over from debugging... ===== arch/ppc64/kernel/prom.c 1.64 vs edited ===== --- 1.64/arch/ppc64/kernel/prom.c Mon Mar 22 04:17:13 2004 +++ edited/arch/ppc64/kernel/prom.c Sun Mar 28 11:54:59 2004 @@ -792,8 +792,8 @@ if (lmb_end_of_DRAM() <= 0x80000000ull && !RELOC(iommu_force_on)) return; - /* 512 pages is max DART tablesize. */ - RELOC(dart_tablesize) = 1UL << 19; + /* 512 pages (2MB) is max DART tablesize. */ + RELOC(dart_tablesize) = 1UL << 21; /* 16MB (1 << 24) alignment. 
We allocate a full 16Mb chuck since we * will blow up an entire large page anyway in the kernel mapping */ ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From meissner at suse.de Mon Mar 29 04:54:17 2004 From: meissner at suse.de (Marcus Meissner) Date: Sun, 28 Mar 2004 20:54:17 +0200 Subject: ppc64 cross-compiler on iSeries and stat() In-Reply-To: <20040327152244.GG18611@homesrv.constantdata.com> References: <20040327020111.GC18611@homesrv.constantdata.com> <20040327152244.GG18611@homesrv.constantdata.com> Message-ID: <20040328185417.GA25515@suse.de> On Sat, Mar 27, 2004 at 09:22:44AM -0600, Justin Banks wrote: > > Steve Munroe wrote > > > > Which Distro? Which Service Pack? > > SuSE SLES 8, Kernel 2.4.21-111-iseries64 Versions of: cross-ppc64-glibc glibc ? Ciao, Marcus ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From justinb at constantdata.com Mon Mar 29 06:10:28 2004 From: justinb at constantdata.com (Justin Banks) Date: Sun, 28 Mar 2004 14:10:28 -0600 Subject: ppc64 cross-compiler on iSeries and stat() In-Reply-To: <20040328185417.GA25515@suse.de> References: <20040327020111.GC18611@homesrv.constantdata.com> <20040327152244.GG18611@homesrv.constantdata.com> <20040328185417.GA25515@suse.de> Message-ID: <20040328201028.GC11277@homesrv.constantdata.com> Marcus Meissner wrote > On Sat, Mar 27, 2004 at 09:22:44AM -0600, Justin Banks wrote: > > > > Steve Munroe wrote > > > > > > Which Distro? Which Service Pack? 
> >
> > SuSE SLES 8, Kernel 2.4.21-111-iseries64
>
> Versions of:
> cross-ppc64-glibc
> glibc

Hopefully this is what you're after :

[root@PW840L02 root]# rpm -qa | grep glibc
glibc-2.2.5-139
glibc-64bit-8.1-69
glibc-devel-2.2.5-139
glibc-locale-2.2.5-139
cross-ppc64-glibc-2.2.5-82

Additionally, here's something else I discovered :

[root@PW840L02 root]# cat foo3.c
#include <stdio.h>
#include <pthread.h>

void *foo(void *x)
{
    while(1) {
        printf("here I am\n");
        sleep(5);
    }
    return NULL;
}

int main(void)
{
    pthread_attr_t a;
    pthread_t t;

    pthread_attr_init(&a);
    pthread_attr_setstacksize(&a, 65536);
    pthread_attr_setdetachstate(&a, PTHREAD_CREATE_DETACHED);
    pthread_create(&t, &a, foo, NULL);
    while (1) {
        printf("do de da\n");
        sleep(3);
    }
    exit(0);
}

[root@PW840L02 root]# gcc foo3.c -o foo3 -lpthread
[root@PW840L02 root]# ./foo3
here I am
do de da
[root@PW840L02 root]# gcc foo3.c -o foo3 -static -lpthread
[root@PW840L02 root]# ./foo3
here I am
do de da
[root@PW840L02 root]# powerpc64-linux-gcc foo3.c -o foo3 -lpthread
[root@PW840L02 root]# ./foo3
here I am
do de da
[root@PW840L02 root]# powerpc64-linux-gcc foo3.c -o foo3 -static -lpthread
[root@PW840L02 root]# ./foo3
Segmentation fault

Any application statically linked against libpthread and compiled with the
cross-compiler SEGVs in libc_internal_tsd_set()

-justinb

--
Justin Banks
Constant Data, Inc.
http://www.constantdata.com

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/ From meissner at suse.de Mon Mar 29 06:29:34 2004 From: meissner at suse.de (Marcus Meissner) Date: Sun, 28 Mar 2004 22:29:34 +0200 Subject: ppc64 cross-compiler on iSeries and stat() In-Reply-To: <20040328201028.GC11277@homesrv.constantdata.com> References: <20040327020111.GC18611@homesrv.constantdata.com> <20040327152244.GG18611@homesrv.constantdata.com> <20040328185417.GA25515@suse.de> <20040328201028.GC11277@homesrv.constantdata.com> Message-ID: <20040328202934.GA29837@suse.de> On Sun, Mar 28, 2004 at 02:10:28PM -0600, Justin Banks wrote: > Marcus Meissner wrote > > On Sat, Mar 27, 2004 at 09:22:44AM -0600, Justin Banks wrote: > > > > > > Steve Munroe wrote > > > > > > > > Which Distro? Which Service Pack? > > > > > > SuSE SLES 8, Kernel 2.4.21-111-iseries64 > > > > Versions of: > > cross-ppc64-glibc > > glibc > > Hopefully this is what you're after : > > [root at PW840L02 root]# rpm -qa | grep glibc > glibc-2.2.5-139 > glibc-64bit-8.1-69 > glibc-devel-2.2.5-139 > glibc-locale-2.2.5-139 > cross-ppc64-glibc-2.2.5-82 Thanks. Please upgrade to the current maintenance update level from our maintenance web, which is: cross-ppc64-glibc-2.2.5-140 glibc-2.2.5-143 glibc-devel-2.2.5-143 This fixes at least the static pthread linking issue. Ciao, Marcus ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From segher at kernel.crashing.org Mon Mar 29 19:50:10 2004 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Mon, 29 Mar 2004 11:50:10 +0200 Subject: Bug on syscall error test for 64 bits apps In-Reply-To: <1080284301.1389.11.camel@gaston> References: <1080284301.1389.11.camel@gaston> Message-ID: <780A91EC-8166-11D8-BC8C-000A95A4DC02@kernel.crashing.org> > It seems that we have a bug in the syscall exit code path, where > @@ -139,7 +139,7 @@ > 91: > #endif > li r10,-_LAST_ERRNO > - cmpl 0,r3,r10 > + cmpld 0,r3,r10 > blt 30f > neg r3,r3 > 22: ld r10,_CCR(r1) /* Set SO bit in CR */ Patch looks fine to me. 
Also, cmpl with three arguments is not a legal PowerPC instruction, only a POWER instruction. I can't find any GAS flag to stop it assembling POWER-only instructions/mnemonics, though. binutils bug. Segher ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From amodra at bigpond.net.au Tue Mar 30 10:50:37 2004 From: amodra at bigpond.net.au (Alan Modra) Date: Tue, 30 Mar 2004 10:20:37 +0930 Subject: Bug on syscall error test for 64 bits apps In-Reply-To: <780A91EC-8166-11D8-BC8C-000A95A4DC02@kernel.crashing.org> References: <1080284301.1389.11.camel@gaston> <780A91EC-8166-11D8-BC8C-000A95A4DC02@kernel.crashing.org> Message-ID: <20040330005037.GD23980@bubble.modra.org> On Mon, Mar 29, 2004 at 11:50:10AM +0200, Segher Boessenkool wrote: > Also, cmpl with three arguments is not a legal PowerPC instruction, > only a POWER instruction. I can't find any GAS flag to stop it > assembling POWER-only instructions/mnemonics, though. > > binutils bug. feature :) -- Alan Modra IBM OzLabs - Linux Technology Centre ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From sfr at canb.auug.org.au Tue Mar 30 15:54:31 2004 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 30 Mar 2004 15:54:31 +1000 Subject: [PATCH] Make 2.4 boot when built with a newer compiler Message-ID: <20040330155431.175e9333.sfr@canb.auug.org.au> Hi Anton, This is a backport of the __attribute_used__ stuff from 2.6 so that 2.4 will build with the GCC hammer branch and 3.4 etc. Built and booted for iSeries - without this patch, it does not boot when built with gcc 3.3.3-hammer. Please apply to AmesLab. 
--
Cheers,
Stephen Rothwell sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruN ppc64-linux-2.4/include/linux/compiler.h ppc64-linux-2.4.used/include/linux/compiler.h
--- ppc64-linux-2.4/include/linux/compiler.h 2003-08-22 12:30:56.000000000 +1000
+++ ppc64-linux-2.4.used/include/linux/compiler.h 2004-03-30 15:15:18.000000000 +1000
@@ -13,4 +13,18 @@
 #define likely(x) __builtin_expect((x),1)
 #define unlikely(x) __builtin_expect((x),0)
 
+#if __GNUC__ > 3
+#define __attribute_used__ __attribute((__used__))
+#elif __GNUC__ == 3
+#if __GNUC_MINOR__ >= 3
+# define __attribute_used__ __attribute__((__used__))
+#else
+# define __attribute_used__ __attribute__((__unused__))
+#endif /* __GNUC_MINOR__ >= 3 */
+#elif __GNUC__ == 2
+#define __attribute_used__ __attribute__((__unused__))
+#else
+#define __attribute_used__ /* not implemented */
+#endif /* __GNUC__ */
+
 #endif /* __LINUX_COMPILER_H */
diff -ruN ppc64-linux-2.4/include/linux/init.h ppc64-linux-2.4.used/include/linux/init.h
--- ppc64-linux-2.4/include/linux/init.h 2003-08-22 12:30:56.000000000 +1000
+++ ppc64-linux-2.4.used/include/linux/init.h 2004-03-30 15:22:50.000000000 +1000
@@ -2,6 +2,7 @@
 #define _LINUX_INIT_H
 
 #include <linux/config.h>
+#include <linux/compiler.h>
 
 /* These macros are used to mark some functions or
  * initialized data (doesn't apply to uninitialized data)
@@ -51,7 +52,7 @@
 extern initcall_t __initcall_start, __initcall_end;
 
 #define __initcall(fn) \
- static initcall_t __initcall_##fn __init_call = fn
+ static initcall_t __initcall_##fn __attribute_used__ __init_call = fn
 
 #define __exitcall(fn) \
  static exitcall_t __exitcall_##fn __exit_call = fn
@@ -67,7 +68,7 @@
 
 #define __setup(str, fn) \
  static char __setup_str_##fn[] __initdata = str; \
- static struct kernel_param __setup_##fn __attribute__((unused)) __initsetup = { __setup_str_##fn, fn }
+ static struct kernel_param __setup_##fn __attribute_used__ __initsetup = { __setup_str_##fn, fn }
 
 #endif /* __ASSEMBLY__ */
 
@@ -76,12 +77,12 @@
  * or exit time.
*/ #define __init __attribute__ ((__section__ (".text.init"))) -#define __exit __attribute__ ((unused, __section__(".text.exit"))) +#define __exit __attribute_used__ __attribute__ (( __section__(".text.exit"))) #define __initdata __attribute__ ((__section__ (".data.init"))) -#define __exitdata __attribute__ ((unused, __section__ (".data.exit"))) -#define __initsetup __attribute__ ((unused,__section__ (".setup.init"))) -#define __init_call __attribute__ ((unused,__section__ (".initcall.init"))) -#define __exit_call __attribute__ ((unused,__section__ (".exitcall.exit"))) +#define __exitdata __attribute_used__ __attribute__ ((__section__ (".data.exit"))) +#define __initsetup __attribute_used__ __attribute__ ((__section__ (".setup.init"))) +#define __init_call __attribute_used__ __attribute__ ((__section__ (".initcall.init"))) +#define __exit_call __attribute_used__ __attribute__ ((__section__ (".exitcall.exit"))) /* For assembly routines */ #define __INIT .section ".text.init","ax" diff -ruN ppc64-linux-2.4/include/linux/module.h ppc64-linux-2.4.used/include/linux/module.h --- ppc64-linux-2.4/include/linux/module.h 2003-08-22 12:30:56.000000000 +1000 +++ ppc64-linux-2.4.used/include/linux/module.h 2004-03-30 15:24:25.000000000 +1000 @@ -254,9 +254,9 @@ */ #define MODULE_GENERIC_TABLE(gtype,name) \ static const unsigned long __module_##gtype##_size \ - __attribute__ ((unused)) = sizeof(struct gtype##_id); \ + __attribute_used__ = sizeof(struct gtype##_id); \ static const struct gtype##_id * __module_##gtype##_table \ - __attribute__ ((unused)) = name + __attribute_used__ = name /* * The following license idents are currently accepted as indicating free @@ -319,7 +319,7 @@ */ #define MODULE_GENERIC_TABLE(gtype,name) \ static const struct gtype##_id * __module_##gtype##_table \ - __attribute__ ((unused, __section__(".data.exit"))) = name + __attribute_used__ __attribute__ ((__section__(".data.exit"))) = name #ifndef __GENKSYMS__ ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From segher at kernel.crashing.org Tue Mar 30 16:54:45 2004 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Tue, 30 Mar 2004 08:54:45 +0200 Subject: Bug on syscall error test for 64 bits apps In-Reply-To: <20040330005037.GD23980@bubble.modra.org> References: <1080284301.1389.11.camel@gaston> <780A91EC-8166-11D8-BC8C-000A95A4DC02@kernel.crashing.org> <20040330005037.GD23980@bubble.modra.org> Message-ID: <20C5AFFE-8217-11D8-B279-000A95A4DC02@kernel.crashing.org> On 30-mrt-04, at 2:50, Alan Modra wrote: > On Mon, Mar 29, 2004 at 11:50:10AM +0200, Segher Boessenkool wrote: >> Also, cmpl with three arguments is not a legal PowerPC instruction, >> only a POWER instruction. I can't find any GAS flag to stop it >> assembling POWER-only instructions/mnemonics, though. >> >> binutils bug. > > feature :) Bug, actually. It refuses all other POWER insns (I used -mppc64, but anything else that refuses POWER insns will do). Looking deeper into it, the cause is that the L field is defined as optional. But it is not. (Ref.: Book I and/or the PEM v2). The L field (in binutils) is only used for one other insn: tlbie. But tlbie always has two operands (says Book III). But tlbie always has one operand (says the PEM). Furthermore, tlbiel is defined as having only one operand. But tlbiel always has two operands (says Book III). But tlbiel doesn't exist (says the PEM). I suggest for now, we fix the cmp[l] thing, and make both tlbie and tlbiel take either one or two operands. If that's okay with you, I'll send some patches shortly. Segher ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From benh at kernel.crashing.org Tue Mar 30 16:56:46 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 30 Mar 2004 16:56:46 +1000 Subject: Bug on syscall error test for 64 bits apps In-Reply-To: <20C5AFFE-8217-11D8-B279-000A95A4DC02@kernel.crashing.org> References: <1080284301.1389.11.camel@gaston> <780A91EC-8166-11D8-BC8C-000A95A4DC02@kernel.crashing.org> <20040330005037.GD23980@bubble.modra.org> <20C5AFFE-8217-11D8-B279-000A95A4DC02@kernel.crashing.org> Message-ID: <1080629806.1216.26.camel@gaston> > Bug, actually. > > It refuses all other POWER insns (I used -mppc64, but anything else > that refuses POWER insns will do). > > Looking deeper into it, the cause is that the L field is defined as > optional. But it is not. (Ref.: Book I and/or the PEM v2). > > The L field (in binutils) is only used for one other insn: tlbie. But > tlbie always has two operands (says Book III). But tlbie always has > one operand (says the PEM). > > Furthermore, tlbiel is defined as having only one operand. But tlbiel > always has two operands (says Book III). But tlbiel doesn't exist > (says the PEM). > > I suggest for now, we fix the cmp[l] thing, and make both tlbie and > tlbiel take either one or two operands. > > If that's okay with you, I'll send some patches shortly. And break kernel compile ... Hrm, that will not be fun Ben. ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From segher at kernel.crashing.org Tue Mar 30 17:01:06 2004 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Tue, 30 Mar 2004 09:01:06 +0200 Subject: Bug on syscall error test for 64 bits apps In-Reply-To: <1080629806.1216.26.camel@gaston> References: <1080284301.1389.11.camel@gaston> <780A91EC-8166-11D8-BC8C-000A95A4DC02@kernel.crashing.org> <20040330005037.GD23980@bubble.modra.org> <20C5AFFE-8217-11D8-B279-000A95A4DC02@kernel.crashing.org> <1080629806.1216.26.camel@gaston> Message-ID: <042D11E8-8218-11D8-B279-000A95A4DC02@kernel.crashing.org> > And break kernel compile ... Hrm, that will not be fun So fix the kernel. Hey, all of this was triggered by a kernel bug you found; if binutils would have worked correctly, you would have trivially found this otherwise not-all-that-easy to find, potentially devastating kernel bug. :-) Segher ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/
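
The effect of the cmpl/cmpld one-liner discussed in this thread can be modelled in plain C. This is only a sketch, not kernel code. It assumes, as the thread suggests, that the three-operand cmpl ends up as a 32-bit compare while cmpld compares all 64 bits. The function names are made up for illustration, the value 516 for _LAST_ERRNO is an assumption (the thread gives only the symbol), and the sample result value is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, for illustration only; the thread gives just the symbol. */
#define LAST_ERRNO_ASSUMED 516

/* Models the old test: only the low 32 bits of r3 and r10 are compared. */
int looks_like_error_cmpl(uint64_t r3)
{
    uint32_t r10 = (uint32_t)-LAST_ERRNO_ASSUMED;

    return (uint32_t)r3 >= r10;   /* "blt" not taken: error path */
}

/* Models the patched test (cmpld): all 64 bits are compared. */
int looks_like_error_cmpld(uint64_t r3)
{
    uint64_t r10 = (uint64_t)-(int64_t)LAST_ERRNO_ASSUMED;

    return r3 >= r10;             /* "blt" not taken: error path */
}

/* Hypothetical sample values; call this from a main() of your own. */
void check_examples(void)
{
    uint64_t enoent = (uint64_t)-(int64_t)2;   /* a genuine error, -ENOENT */
    uint64_t result = 0x00000400fffffff0ULL;   /* made-up valid 64-bit result */

    assert(looks_like_error_cmpl(enoent));
    assert(looks_like_error_cmpld(enoent));
    assert(looks_like_error_cmpl(result));     /* misread as an error */
    assert(!looks_like_error_cmpld(result));   /* correctly a success */
}
```

Under these assumptions, a 64-bit return value whose low word happens to fall in the errno range would be negated and reported as a failure by the 32-bit compare, which matches the kind of 64-bit-only breakage named in the subject line.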