From jdewand at redhat.com Sat Nov 1 00:42:14 2003
From: jdewand at redhat.com (Julie DeWandel)
Date: Fri, 31 Oct 2003 08:42:14 -0500
Subject: [PATCH][2.4] Fix to allow ethtool to work with large bus #s
References: <3F9EB327.4090300@redhat.com> <3FA1CEFF.8010504@austin.ibm.com>
Message-ID: <3FA266B6.50208@redhat.com>

Good suggestions -- I changed to use snprintf and increased the
slot_name field to 12 chars (although 10 should do it -- 4 hex + ':' +
2 hex + '.' + 1 dec + '\0'). Updated patch attached.

Thanks for the comments!

Olof Johansson wrote:

> On some very large configs, we allocate (sparse) PCI bus numbers,
> sometimes larger than 256. There's still a little room in the string,
> since the other parts can take up at most 6 characters (':' + 2 hex +
> '.' + 1 dec + '\0'). snprintf() would protect against writing over the
> following fields at least.
>
> Why not allocate 12 characters while you're at it, since that's what
> the field will be rounded up to in the struct?
>
> Besides that, the patch looks good to me.
>
>
> -Olof
>

--
Julie DeWandel
Red Hat, Inc.
Tel (978) 692-3113 x23251
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pci_large_bus_fix.patch
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031031/3d4f7a74/attachment.txt

From engebret at vnet.ibm.com Sat Nov 1 02:46:33 2003
From: engebret at vnet.ibm.com (Dave Engebretsen)
Date: Fri, 31 Oct 2003 09:46:33 -0600
Subject: Logical CPU Numbering for DLPAR
In-Reply-To: <3F9F0525.6000907@us.ibm.com>
References: <3F9F0525.6000907@us.ibm.com>
Message-ID: <3FA283D9.90902@vnet.ibm.com>

Hi,

In 2.4 we implemented a logical cpu numbering design which was changed
to use a physical numbering in 2.5. This creates a problem when a
partition is configured for DLPAR of processors, so we have been moving
the DLPAR code base back to the original design point.
The fundamental problem with using physical numbers is that the kernel
must assume the platform maximum number of processors and allocate and
configure all data structures to allow for this. Even though we do not
do all the right things here yet, I would expect that one upcoming
focus in the kernel will be to free up unused statically allocated
per-cpu data in order to reduce the memory footprint for small
partitions while maintaining single binary compatibility on large
systems. The problem is made worse when N-1 support is taken into
account. Note that the platform architecture does not require that
firmware create a logical collection of processors, and in fact legacy
firmware explicitly does not logically number processors.

By logically numbering the processors, we only need to account for the
max partition configuration as defined at partition boot time. In
support of this, a new property has been added to cpu nodes in the
device tree: linux,logical-name (similar to what we have for phbs).
This property allows a binding to be made between a physical processor
(as represented by the reg property of the node) and the logical name
used by Linux. This is required when the platform specifies a hardware
processor to remove (for example in a gard operation) but the software
stack must select a logical cpu to offline.

It turns out this code we have in the DLPAR tree gets wound up a bit in
the SMT support code, which I'm in the middle of re-porting to 2.6, so
I'd like to move to this model in the general tree ASAP.

Comments?

Dave.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From hollisb at us.ibm.com Sat Nov 1 03:24:10 2003
From: hollisb at us.ibm.com (Hollis Blanchard)
Date: Fri, 31 Oct 2003 10:24:10 -0600
Subject: [RFC][PATCH] discontiguous vterms
In-Reply-To: <3FA1D677.7060008@austin.ibm.com>
Message-ID:

Hey Olof, thanks for the comments...
On Thursday, Oct 30, 2003, at 21:26 US/Central, Olof Johansson wrote:
>
> hvc_get_vterms(): "u32 array[]" will never really be an array: array
> arguments always get cast to pointers instead. Defining it as one
> would be less misleading (u32 *array).

Personally I think it would be more misleading. u32 *array is a pointer
to a u32 (variable name aside). u32 array[] is a pointer to a whole
array of u32's. Obviously there's no functional difference, but...

> hvc_console_setup(): How about a variable that the return value from
> hvc_get_vterms() is saved in and compared to instead? Makes it a
> little more readable.

Sure.

> Other cleanup: There's a struct called hvc_struct, and an
> instantiation of the same struct called hvc_struct. Maybe the
> instantiation should be renamed to "hvcs" or something similar?

Agreed. I've noticed it, but wanted to stick with functional changes
for now.

> Also, one final suggestion: Right now there's hvc_* functions both in
> pSeries_lpar.c and hvc.c. Would it be better to split the namespace to
> be vterm_* or vty_* in pSeries_lpar instead? This would also make a
> distinction between the hvc driver and the vterm hypervisor interface.
> vterm_get_vterms() would obviously need to be renamed. :)

Yup, it would be a good distinction to make. I'll think about the
naming... :)

--
Hollis Blanchard
IBM Linux Technology Center

From santil at us.ibm.com Sat Nov 1 03:42:46 2003
From: santil at us.ibm.com (Santiago Leon)
Date: Fri, 31 Oct 2003 11:42:46 -0500
Subject: [PATCH][2.4] VIO infrastructure update - review request
In-Reply-To: <1067530214.22506.4.camel@verve>
References: <1067530214.22506.4.camel@verve>
Message-ID: <3FA29106.1020108@us.ibm.com>

Hey John.. thanks for the comments...

> - vio_register_device(): Might it be more clear to accept a struct
>   device_node * here rather than a void *? Is this done for
>   portability?
>   Currently you have to cast the device node * to void *
>   when calling, and then cast the void * back to device node once
>   inside the function.

Yes, the idea of using a void * was to make it more portable. Now that
I think about it, it doesn't make too much sense because if someone
else wants to use the interface, they can write their own vio_dev
struct.

I will apply the suggestions,

Thanks,
--
********************************************************************
Santiago A. Leon
Power Linux Development
IBM Linux Technology Center
Off: (919) 254-6048 T/L: 444-6048
Fax: (919) 543-7378

From brking at us.ibm.com Sat Nov 1 10:17:03 2003
From: brking at us.ibm.com (Brian King)
Date: Fri, 31 Oct 2003 17:17:03 -0600
Subject: [PATCH] pci_alloc_consistent memory conservation
Message-ID: <3FA2ED6F.8080609@us.ibm.com>

Currently pci_alloc_consistent calls __get_free_pages to allocate
memory. This results in a lot of wasted memory if people call
pci_alloc_consistent a lot for odd sized allocations (which is what I
wanted to do). The attached patch changes pci_alloc_consistent to use
kmalloc and pci_map_single to accomplish the same function, with the
added benefit that kmalloc reduces the amount of wasted memory.

Comments?

--
Brian King
eServer Storage I/O
IBM Linux Technology Center
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: alloc_consistent.patch
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031031/bf8297ca/attachment.txt

From agl at us.ibm.com Sat Nov 1 10:22:42 2003
From: agl at us.ibm.com (Adam Litke)
Date: 31 Oct 2003 15:22:42 -0800
Subject: Kernel call graphing
Message-ID: <1067642564.545.34.camel@agtpad>

Hello,

I have written a tool to generate deterministic function call graphs.
For each function profiled, its callers are listed above (along with
call counts) and its callees (and counts) below.
This is useful for things like lightweight code coverage analysis and
finding out exactly who's calling expensive functions, etc.

The tool has both a kernel patch and a user-space utility. Both are
located in the files section of the LSE project at Sourceforge
www.sf.net/projects/lse. Direct links to the latest releases follow:

http://osdn.dl.sourceforge.net/sourceforge/lse/kcg-2.6.0-test9-2.patch
http://osdn.dl.sourceforge.net/sourceforge/lse/readcg-0.3.tar.gz

I have tried this on an 8-way p650 and it works great. I have not tried
on iSeries (the usual reason, no hw :) so it almost certainly will not
work. Please give it a try. Comments, suggestions, flames welcomed :)

Here is some sample output taken during a kernel build for the curious:

==================================================
    2675523 .do_anonymous_page
     103048 .do_wp_page
      68635 .do_no_page
      20233 .generic_file_aio_write_nolock
      12209 .do_page_cache_readahead
      10328 .__get_free_pages
       6892 .copy_strings
       2194 .find_or_create_page
        849 .copy_strings32
        194 .do_generic_mapping_read
.__alloc_pages
    2900105 .buffered_rmqueue
==================================================
      21436 do_work
      19769 .cpu_idle
      11371 .pipe_wait
       6283 .io_schedule
       6046 .do_exit
       5187 .sys_wait4
       2266 .worker_thread
       1186 .schedule_timeout
        419 .unmap_vmas
        272 .__down
        134 .do_generic_mapping_read
        131 .schedule_timeout
        127 .interruptible_sleep_on
        100 .do_get_write_access
         97 .ksoftirqd
         65 .kjournald
         45 .generic_file_aio_write_nolock
         25 .__pdflush
         18 .truncate_inode_pages
         17 .journal_commit_transaction
         15 .link_path_walk
          8 .link_path_walk
          5 .wait_for_completion
          4 .migration_thread
          2 .journal_commit_transaction
          2 .journal_stop
          2 .log_wait_commit
.schedule
      75292 .sched_clock
      67071 .__switch_to
      23200 .load_balance
      20292 .recalc_task_prio
       6002 .__mmdrop
          6 .__put_task_struct
          6 .free_task
==================================================

--
Adam Litke (agl at us.ibm.com)
IBM Linux Technology Center
(503) 578 - 3283 t/l 775 - 3283

From haveblue at us.ibm.com Sat Nov 1 10:38:01 2003
From: haveblue at us.ibm.com (Dave Hansen)
Date: 31 Oct 2003 15:38:01 -0800
Subject: Kernel call graphing
In-Reply-To: <1067642564.545.34.camel@agtpad>
References: <1067642564.545.34.camel@agtpad>
Message-ID: <1067643480.18867.4.camel@nighthawk>

On Fri, 2003-10-31 at 15:22, Adam Litke wrote:
> I have written a tool to generate deterministic function call graphs.

And it's based on the old code from kernprof, right?

> ==================================================
>     2675523 .do_anonymous_page
>      103048 .do_wp_page
>       68635 .do_no_page
>       20233 .generic_file_aio_write_nolock
>       12209 .do_page_cache_readahead
>       10328 .__get_free_pages
>        6892 .copy_strings
>        2194 .find_or_create_page
>         849 .copy_strings32
>         194 .do_generic_mapping_read
> .__alloc_pages
>     2900105 .buffered_rmqueue
> ==================================================

One thing that kernprof did was sort the callers descending and the
called functions ascending. That way, the most important data were near
the function name that you were looking at.

...
>       20233 .generic_file_aio_write_nolock
>       68635 .do_no_page
>      103048 .do_wp_page
>     2675523 .do_anonymous_page
> .__alloc_pages
>     2900105 .buffered_rmqueue
> ==================================================

--
Dave Hansen
haveblue at us.ibm.com

From paulus at samba.org Mon Nov 3 16:19:11 2003
From: paulus at samba.org (Paul Mackerras)
Date: Mon, 3 Nov 2003 16:19:11 +1100
Subject: [RFC][PATCH] discontiguous vterms
In-Reply-To:
References:
Message-ID: <16293.58703.997620.496640@cargo.ozlabs.ibm.com>

Hollis Blanchard writes:

> In practice, for us, they probably are contiguous.
> However, transforming a list of vty@N nodes (from find_devices())
> into [base, number] in hvc_count() is a bit awkward at best, and could
> end up making some assumptions about the order in which OF presents
> device nodes to us. Also, this discontiguous functionality might be
> useful in other environments (e.g. PPC simulators). So
> hvc_get_vterms() replaces hvc_count() to communicate the vterm
> numbers.

Any chance of these things appearing and disappearing at runtime?
Perhaps we should have the arch code calling some hvc_console routine
to tell it about consoles, rather than having hvc_console calling arch
code to discover the consoles as at present. The only thing is that
getting the `add_console' routine called at the right time in the boot
sequence might be interesting.

Thoughts?

Paul.

From olh at suse.de Mon Nov 3 19:14:40 2003
From: olh at suse.de (Olaf Hering)
Date: Mon, 3 Nov 2003 09:14:40 +0100
Subject: biarch support for 2.6, missing kdb pieces
In-Reply-To: <20031014230853.GC610@krispykreme>
References: <20031014181622.GA22599@suse.de> <20031014230853.GC610@krispykreme>
Message-ID: <20031103081440.GA6828@suse.de>

On Wed, Oct 15, Anton Blanchard wrote:

>
> Hi,
>
> > This patch adds biarch support, removes the hardcoded power4 (which will
> > result in a missing tlbiel handling in pSeries_flush_hash_range()) and
> > adds a missing kdb file (copied from 2.4).
>
> It makes sense to optimise for POWER4 in the default kernel. Up until
> recently -mcpu=POWER4 would tune for it but would still run on RS64 and
> POWER3.

current bk doesn't boot unless I remove the -mcpu=power4 and #if 0 the
tlbiel calls. system is p610, hangs in instantiating rtas.

we need plan B now. soon.

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

From engebret at vnet.ibm.com Tue Nov 4 01:57:31 2003
From: engebret at vnet.ibm.com (Dave Engebretsen)
Date: Mon, 03 Nov 2003 08:57:31 -0600
Subject: /proc/rtas/* vs. /proc/ppc64/rtas/*
References: <1067470244.19752.38.camel@verve> <1067522664.32232.1306.camel@tin.ibm.com > <16289.33252.209893.357707@cargo.ozlabs.ibm.com>
Message-ID: <3FA66CDB.6F1634C5@vnet.ibm.com>

Paul Mackerras wrote:
>
> Jake Moilanen writes:
>
> > We went from /proc/rtas to /proc/ppc64/rtas in 2.4.
>
> Does anyone remember the reasoning behind this? At first blush I
> don't see any good reason for having the extra ppc64/ in there, but I
> could be missing something.
>

I believe the reason was simply to be consistent with where the rest of
the ppc64 /proc interfaces have been put.

Dave.

From engebret at vnet.ibm.com Tue Nov 4 02:22:07 2003
From: engebret at vnet.ibm.com (Dave Engebretsen)
Date: Mon, 03 Nov 2003 09:22:07 -0600
Subject: [RFC][PATCH] discontiguous vterms
References: <16293.58703.997620.496640@cargo.ozlabs.ibm.com>
Message-ID: <3FA6729F.DA1A5AD9@vnet.ibm.com>

Paul Mackerras wrote:
>
> Hollis Blanchard writes:
>
> > In practice, for us, they probably are contiguous. However,
> > transforming a list of vty@N nodes (from find_devices()) into [base,
> > number] in hvc_count() is a bit awkward at best, and could end up
> > making some assumptions about the order in which OF presents device
> > nodes to us. Also, this discontiguous functionality might be useful in
> > other environments (e.g. PPC simulators). So hvc_get_vterms() replaces
> > hvc_count() to communicate the vterm numbers.
>
> Any chance of these things appearing and disappearing at runtime?
> Perhaps we should have the arch code calling some hvc_console routine
> to tell it about consoles, rather than having hvc_console calling arch
> code to discover the consoles as at present.
> The only thing is that getting the `add_console' routine called at
> the right time in the boot sequence might be interesting.
>
> Thoughts?

We definitely need to support runtime add/remove of vty's, and vty
should ultimately end up fitting in the general hot plug
infrastructure.

Dave.

From hollisb at us.ibm.com Tue Nov 4 02:51:04 2003
From: hollisb at us.ibm.com (Hollis Blanchard)
Date: Mon, 3 Nov 2003 09:51:04 -0600
Subject: [RFC][PATCH] discontiguous vterms
In-Reply-To: <16293.58703.997620.496640@cargo.ozlabs.ibm.com>
Message-ID: <880F036A-0E15-11D8-8049-000A95A0560C@us.ibm.com>

On Sunday, Nov 2, 2003, at 23:19 US/Central, Paul Mackerras wrote:
>
> Any chance of these things appearing and disappearing at runtime?

It's ambiguous. There are a few factors:

From my reading of the Logical Resource Dynamic Reconfiguration
documentation, it's possible.

Currently Linux's vterms are always "hosted by" the hypervisor. In the
future this is not necessarily true -- you can have one LPAR hosting
the consoles of others. (This is called "vty-server".) It may not make
much sense to dynamically add another console session to the
hypervisor/HMC. However, once that vty-server functionality is in
place, it may make sense to dynamically add vty-server consoles (e.g.
you have one console LPAR and as you boot more LPARs you want to add
more vterm sessions to the console LPAR).

However, the vty-server description seems to have something else in
mind: it describes a system in which you have N console sessions at
boot, and at runtime you instruct the hypervisor to connect those
sessions to different clients, as long as you were only talking to N at
a time. Perhaps this is intended to avoid having to dynamically add
dozens of consoles for dozens of LPARs.

In conclusion, I have no idea. But if DR vterms is done, I don't expect
that functionality to show up for a while (for whatever that's worth ;) .
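The registration model Paul describes above, where arch code tells the
console driver about each vterm instead of the driver asking arch code
for them, might look roughly like the following. This is a user-space
sketch for illustration only: MAX_CONSOLES, struct console_slot, and
the signatures of add_console() and remove_console() are assumptions,
not the hvc driver's actual interface.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CONSOLES 8  /* hypothetical table size for this sketch */

/* One entry per registered vterm; numbers need not be contiguous. */
struct console_slot {
	uint32_t vtermno;
	int in_use;
};

static struct console_slot console_table[MAX_CONSOLES];

/*
 * Hypothetical hook called by arch code for each vterm it finds,
 * at boot or later. Returns the slot index, or -1 if the table is full.
 */
int add_console(uint32_t vtermno)
{
	size_t i;

	/* Registering the same vterm twice returns the existing slot. */
	for (i = 0; i < MAX_CONSOLES; i++)
		if (console_table[i].in_use &&
		    console_table[i].vtermno == vtermno)
			return (int)i;

	/* Otherwise take the first free slot. */
	for (i = 0; i < MAX_CONSOLES; i++) {
		if (!console_table[i].in_use) {
			console_table[i].vtermno = vtermno;
			console_table[i].in_use = 1;
			return (int)i;
		}
	}
	return -1;
}

/* Hypothetical hook for runtime removal. Returns 0, or -1 if unknown. */
int remove_console(uint32_t vtermno)
{
	size_t i;

	for (i = 0; i < MAX_CONSOLES; i++) {
		if (console_table[i].in_use &&
		    console_table[i].vtermno == vtermno) {
			console_table[i].in_use = 0;
			return 0;
		}
	}
	return -1;
}
```

A real driver would also need locking around the table and would have
to deal with the boot-ordering question raised above; neither is shown
here.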
> Perhaps we should have the arch code calling some hvc_console routine
> to tell it about consoles, rather than having hvc_console calling arch
> code to discover the consoles as at present. The only thing is that
> getting the `add_console' routine called at the right time in the boot
> sequence might be interesting.

Yup, that's just like the (hotpluggable) PCI driver model which
presents a table of interesting devices and then has a "probe" function
called when such devices show up.

To make the driver properly support removable consoles (via that
hotplug mechanism) would take significantly more work. I started going
down that path, but the diff to the driver became large, and since all
that code would go unused right now (and maybe forever) I couldn't
justify it...

--
Hollis Blanchard
IBM Linux Technology Center

From hollisb at us.ibm.com Tue Nov 4 02:51:04 2003
From: hollisb at us.ibm.com (Hollis Blanchard)
Date: Mon, 3 Nov 2003 09:51:04 -0600
Subject: [RFC][PATCH] discontiguous vterms
Message-ID: <00A613FB-0E33-11D8-9FBF-000A95A0560C@us.ibm.com>

[resend; had some bounces the first time]

On Sunday, Nov 2, 2003, at 23:19 US/Central, Paul Mackerras wrote:
>
> Any chance of these things appearing and disappearing at runtime?

It's ambiguous. There are a few factors:

From my reading of the Logical Resource Dynamic Reconfiguration
documentation, it's possible.

Currently Linux's vterms are always "hosted by" the hypervisor. In the
future this is not necessarily true -- you can have one LPAR hosting
the consoles of others. (This is called "vty-server".) It may not make
much sense to dynamically add another console session to the
hypervisor/HMC. However, once that vty-server functionality is in
place, it may make sense to dynamically add vty-server consoles (e.g.
you have one console LPAR and as you boot more LPARs you want to add
more vterm sessions to the console LPAR).
However, the vty-server description seems to have something else in
mind: it describes a system in which you have N console sessions at
boot, and at runtime you instruct the hypervisor to connect those
sessions to different clients, as long as you were only talking to N at
a time. Perhaps this is intended to avoid having to dynamically add
dozens of consoles for dozens of LPARs.

In conclusion, I have no idea. But if DR vterms is done, I don't expect
that functionality to show up for a while (for whatever that's worth ;)

> Perhaps we should have the arch code calling some hvc_console routine
> to tell it about consoles, rather than having hvc_console calling arch
> code to discover the consoles as at present. The only thing is that
> getting the `add_console' routine called at the right time in the boot
> sequence might be interesting.

Yup, that's just like the (hotpluggable) PCI driver model which
presents a table of interesting devices and then has a "probe" function
called when such devices show up.

To make the driver properly support removable consoles (via that
hotplug mechanism) would take significantly more work. I started going
down that path, but the diff to the driver became large, and since all
that code would go unused right now (and maybe forever) I couldn't
justify it...

--
Hollis Blanchard
IBM Linux Technology Center

From hollisb at us.ibm.com Tue Nov 4 03:09:30 2003
From: hollisb at us.ibm.com (Hollis Blanchard)
Date: Mon, 3 Nov 2003 10:09:30 -0600
Subject: power4 compiling
In-Reply-To: <20031103081440.GA6828@suse.de>
Message-ID: <1B509FAF-0E18-11D8-8049-000A95A0560C@us.ibm.com>

On Monday, Nov 3, 2003, at 02:14 US/Central, Olaf Hering wrote:
>
> On Wed, Oct 15, Anton Blanchard wrote:
>>
>>> This patch adds biarch support, removes the hardcoded power4 (which
>>> will result in a missing tlbiel handling in
>>> pSeries_flush_hash_range()) and adds a missing kdb file (copied
>>> from 2.4).
>>
>> It makes sense to optimise for POWER4 in the default kernel. Up until
>> recently -mcpu=POWER4 would tune for it but would still run on RS64
>> and POWER3.
>
> current bk doesnt boot unless I remove the -mcpu=power4 and #if 0 the
> tlbiel calls. system is p610, hangs in instantiating rtas.
>
> we need plan B now. soon.

Mike Wolf has a final power4 patch that I believe he just needs to
test, but other things may have distracted him.

You shouldn't need to comment out tlbiel. Instead do this:
	CFLAGS = -mtune=power4 -Wa,-mcpu=power4
That will tell the compiler not to use POWER4-specific instructions,
but also tell the assembler it's ok when we use inline asm statements.

If you look closely at the use of tlbiel, you'll see we don't in fact
use it on CPUs that don't support it. As for your p610, I don't know.

--
Hollis Blanchard
IBM Linux Technology Center

From jdewand at redhat.com Tue Nov 4 07:41:51 2003
From: jdewand at redhat.com (Julie DeWandel)
Date: Mon, 03 Nov 2003 15:41:51 -0500
Subject: [2.4 Patch] Fix to get all sysrq-T output on iSeries virtual console
Message-ID: <3FA6BD8F.8000106@redhat.com>

Hi,

When debugging iSeries problems, I've been frustrated sometimes when I
find I cannot get all of the sysrq-t output on my console.
Usually data for only about 5 or 6 tasks prints out and that is it. I
investigated this problem and found the cause.

When a person types sysrq-t, the viocons interrupt handler intercepts
it and calls directly into handle_sysrq. This routine, in turn, formats
data for the requested sysrq key and sends it to the viocons_write
routine. The viocons_write routine can only write up to the limit of
the internal overflow buffers in the viocons driver and nothing further
until it gets an ACK back from OS/400. But it doesn't look for the ACK.

So I modified the code to look for ACKs coming back once the
viocons_write routine fills up the overflow buffers. However, this
*still* wasn't enough to fix the problem. As it turns out, even if we
can detect an awaiting ACK, calling process_iSeries_events() will
refuse to work since we are still inside an interrupt handler and there
is a guard to protect you from servicing interrupts from within an
interrupt handler.

My final idea was to not service sysrq interrupts from within the
critical area of the interrupt service routine. Instead, I borrowed a
byte from an unused field within the paca and I cache the requested
sysrq key there. Then, on the way out of the ISR, this field is
checked. If it contains a non-zero character, handle_sysrq is called
with the character to service the request.

I'm sure this patch is controversial and perhaps too much of a hack,
but it does get the job done in those critical situations where you
absolutely need the output of sysrq-t to debug a soft hang.

What do you think?

--
Julie DeWandel
Red Hat, Inc.
Tel (978) 692-3113 x23251
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sysrq_patch
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031103/e338605e/attachment.txt

From johnrose at austin.ibm.com Tue Nov 4 08:32:15 2003
From: johnrose at austin.ibm.com (John Rose)
Date: Mon, 03 Nov 2003 15:32:15 -0600
Subject: lock file question
Message-ID: <1067895134.13400.26.camel@verve>

As part of a new RTAS interface, I'm writing a library that needs to
manage mmap() use of a phys mem region in /dev/mem/. If library user B
requires an mmap()'ed buffer of a subset of this memory, then the
library should ensure that it does not hand out a region being used by
library user A.

The current plan is to create a lock file "/var/lock/LCK..librtas", and
use byte-range locks to manage the use of buffers that fall within this
region. When such a buffer is needed, the lock file will be probed to
select an appropriate region for use. The client function can then mmap
/dev/mem with the confidence that it has exclusive control over the
region.

Does this seem like the right direction to go? I'm under the impression
that I shouldn't use byte-range locks directly on /dev/mem.

Thoughts?

Thanks-
John "Rtas Interface" Rose :)

From amodra at bigpond.net.au Tue Nov 4 11:23:44 2003
From: amodra at bigpond.net.au (Alan Modra)
Date: Tue, 4 Nov 2003 10:53:44 +1030
Subject: power4 compiling
In-Reply-To: <1B509FAF-0E18-11D8-8049-000A95A0560C@us.ibm.com>
References: <20031103081440.GA6828@suse.de> <1B509FAF-0E18-11D8-8049-000A95A0560C@us.ibm.com>
Message-ID: <20031104002344.GG2506@bubble.sa.bigpond.net.au>

On Mon, Nov 03, 2003 at 10:09:30AM -0600, Hollis Blanchard wrote:
> You shouldn't need to comment out tlbiel. Instead do this:
> 	CFLAGS = -mtune=power4 -Wa,-mcpu=power4
> That will tell the compiler not to use POWER4-specific instructions,
> but also tell the assembler it's ok when we use inline asm statements.

No, this is dangerous.
-mcpu=power4 tells the assembler to use the power4 form of mtcrf if it
so happens that only one field of cr is being set. gcc may generate
such an mtcrf instruction.

--
Alan Modra
IBM OzLabs - Linux Technology Centre

From nathanl at austin.ibm.com Tue Nov 4 11:49:26 2003
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Mon, 03 Nov 2003 18:49:26 -0600
Subject: [PATCH] minor fixes for proc_ppc64.c
Message-ID: <3FA6F796.1010408@austin.ibm.com>

Hi-

This fixes the following problems with the device node addition code:

- An off-by-one bug that causes the /proc/ppc64/ofdt command "parser"
  to get confused.
- A failure to nul-terminate property values which are ascii strings.
  This would cause oopses in routines such as of_find_by_name. Thanks
  to Dave Engebretsen for tracking this down.

Nathan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: proc_ppc64_cleanup.patch
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031103/8bb54d56/attachment.txt

From linas at austin.ibm.com Tue Nov 4 11:57:05 2003
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Mon, 3 Nov 2003 18:57:05 -0600
Subject: lock file question
In-Reply-To: <1067895134.13400.26.camel@verve>; from johnrose@austin.ibm.com on Mon, Nov 03, 2003 at 03:32:15PM -0600
References: <1067895134.13400.26.camel@verve>
Message-ID: <20031103185705.A30762@forte.austin.ibm.com>

On Mon, Nov 03, 2003 at 03:32:15PM -0600, John Rose wrote:
>
> As part of a new RTAS interface, I'm writing a library that needs to
> manage mmap() use of a phys mem region in /dev/mem/. If library user B
> requires an mmap()'ed buffer of a subset of this memory, then the
> library should ensure that it does not hand out a region being used by
> library user A.
>
> The current plan is to create a lock file "/var/lock/LCK..librtas", and
> use byte-range locks to manage the use of buffers that fall within this
> region. When such a buffer is needed, the lock file will be probed to
> select an appropriate region for use. The client function can then mmap
> /dev/mem with the confidence that it has exclusive control over the
> region.

It's not clear to me if security is a concern here. A cracker could
write code that does this access and ignores the contents of
/var/lock... (or, more easily, starts client A, mv
/var/lock/LCK..librtas /var/lock/bogus, and then starts client B...)

Note also, if librtas is needed for 'day-to-day' sysadmin purposes,
then this interface might make it hard for any sysadmins who like to
run chrooted environments, virtual servers, etc. or systems that use
anything other than plain-unix-root-user perms to get access to things
(e.g. SELinux, RSBAC, etc.). It would be painful for these people to
set up systems where only the trusted users would get access to
/dev/mem rtas but not others...

> Does this seem like to right direction to go? I'm under the impression

--linas

From olof at austin.ibm.com Tue Nov 4 12:11:33 2003
From: olof at austin.ibm.com (Olof Johansson)
Date: Mon, 03 Nov 2003 19:11:33 -0600
Subject: lock file question
In-Reply-To: <20031103185705.A30762@forte.austin.ibm.com>
References: <1067895134.13400.26.camel@verve> <20031103185705.A30762@forte.austin.ibm.com>
Message-ID: <3FA6FCC5.8060309@austin.ibm.com>

linas at austin.ibm.com wrote:

> Its not clear to me if security is a concern here. A cracker could write
> code that does this access and ignores the contents of /var/lock...
> (or, more easily, starts client A, mv /var/lock/LCK..librtas /var/lock/bogus,
> and then starts client b...)

Anyone who has access to /dev/mem can do much worse damage than that
whichever way you look at it.
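The byte-range locking in John's plan can be sketched with fcntl().
This is a minimal sketch, assuming each byte range in the lock file
stands for the same offsets of the shared buffer region; the helper
names try_lock_range() and unlock_range() are made up for illustration
and are not part of any existing library. fcntl() locks are advisory,
so they only coordinate processes that actually take them.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to take an exclusive lock on bytes [offset, offset + len) of the
 * lock file open on fd (fd must be open for writing). Does not block.
 * Returns 0 on success, -1 if another process holds a conflicting lock
 * or on any other error.
 */
int try_lock_range(int fd, off_t offset, off_t len)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;     /* exclusive (write) lock */
	fl.l_whence = SEEK_SET;  /* l_start is an absolute offset */
	fl.l_start = offset;
	fl.l_len = len;

	return fcntl(fd, F_SETLK, &fl);
}

/* Release a range previously locked by this process. */
int unlock_range(int fd, off_t offset, off_t len)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_UNLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = offset;
	fl.l_len = len;

	return fcntl(fd, F_SETLK, &fl);
}
```

A client would open the lock file, call try_lock_range() for candidate
ranges until one succeeds, and only then mmap the matching part of the
region. The locks are dropped automatically when the process exits or
closes the file, which covers a client that dies while holding a
buffer.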
John's solution seems to help serialize access to parts of the RTAS
buffers between the legitimate and well-behaved users of it. Right?

-Olof

--
Olof Johansson                                        Office: 4E002/905
pSeries Linux Development                             IBM Systems Group
Email: olof at austin.ibm.com                          Phone: 512-838-9858
All opinions are my own and not those of IBM

From nathanl at austin.ibm.com Tue Nov 4 12:20:11 2003
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Mon, 03 Nov 2003 19:20:11 -0600
Subject: hvc console sleep under spinlock
In-Reply-To: <3FA1D997.9000707@austin.ibm.com>
References: <20031028114307.GD20717@krispykreme> <3F9EFB36.6010206@austin.ibm.com> <3F9EFF93.7040502@austin.ibm.com> <3FA1D997.9000707@austin.ibm.com>
Message-ID: <3FA6FECB.7080503@austin.ibm.com>

Olof Johansson wrote:
> The return value of kmalloc() is never verified to be non-NULL.

Thanks, patch has been updated accordingly.

Could this be pushed to mainline through Ameslab, or should it be taken
to lkml? I ask because the hvc code is not in an arch-specific location
in the source tree.

Nathan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: hvc_write.patch
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031103/d48f09da/attachment.txt

From shangg at dspchina.com Tue Nov 4 12:28:10 2003
From: shangg at dspchina.com (shanggao)
Date: Tue, 4 Nov 2003 09:28:10 +0800
Subject: excellent plan
References: <1067895134.13400.26.camel@verve> <20031103185705.A30762@forte.austin.ibm.com> <3FA6FCC5.8060309@austin.ibm.com>
Message-ID: <008101c3a274$0ed545f0$4b01a8c0@shang>

I am planning to design the smallest (15cm*8cm) computer in the world.
It has only USB, 1394 and DVI interfaces, with no slow PS/2, VGA, IDE,
LPT... But a powerful GPU (such as ATI7500) is needed. It must be a
low-power, silent, fanless device. A PowerPC 750FX (800MHz~1GHz) is the
system CPU.

Is it a very powerful PDA? A thin client? Or a notebook?
Maybe not all of above but it should be everthing. Harddisk,Keyboard,Mouse,Audio,Network..........all connected by USB or 1394 port. If anyone would like to play it with me,please contact me freely. midholy at yahoo.com or shangg at dspchina.com ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From rusty at au1.ibm.com Tue Nov 4 12:39:27 2003 From: rusty at au1.ibm.com (Rusty Russell) Date: Tue, 04 Nov 2003 12:39:27 +1100 Subject: lock file question In-Reply-To: Your message of "Mon, 03 Nov 2003 15:32:15 MDT." <1067895134.13400.26.camel@verve> Message-ID: <20031104021537.B554F189DD@ozlabs.au.ibm.com> In message <1067895134.13400.26.camel at verve> you write: > > As part of a new RTAS interface, I'm writing a library that needs to > manage mmap() use of a phys mem region in /dev/mem/. If library user B > requires an mmap()'ed buffer of a subset of this memory, then the > library should ensure that it does not hand out a region being used by > library user A. > > The current plan is to create a lock file "/var/lock/LCK..librtas", and > use byte-range locks to manage the use of buffers that fall within this > region. Yep, that's the most sane idea. You don't need to use ranges at all though, since there's little point in having more than one user at a time, even when it's possible. Cheers, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Tue Nov 4 16:32:45 2003 From: anton at samba.org (Anton Blanchard) Date: Tue, 4 Nov 2003 16:32:45 +1100 Subject: hvc console sleep under spinlock In-Reply-To: <3FA6FECB.7080503@austin.ibm.com> References: <20031028114307.GD20717@krispykreme> <3F9EFB36.6010206@austin.ibm.com> <3F9EFF93.7040502@austin.ibm.com> <3FA1D997.9000707@austin.ibm.com> <3FA6FECB.7080503@austin.ibm.com> Message-ID: <20031104053245.GC1843@krispykreme> Hi Nathan, > Thanks, patch has been updated accordingly. Looks good. 
I just had a thought, is there an uppoer limit on count? (eg what happens when someone does a 1MB write()?) > Could this be pushed to mainline through Ameslab, or should it be taken > to lkml? I ask because the hvc code is not in an arch-specific location > in the source tree. These sort of things usually get pushed separately but with the 2.6 lockdown we may not get it in until after 2.6.0. Anton > diff -Nru a/drivers/char/hvc_console.c b/drivers/char/hvc_console.c > --- a/drivers/char/hvc_console.c Mon Nov 3 19:10:06 2003 > +++ b/drivers/char/hvc_console.c Mon Nov 3 19:10:06 2003 > @@ -130,24 +130,26 @@ > const unsigned char *buf, int count) > { > struct hvc_struct *hp = tty->driver_data; > - char *p; > + char *p, *kbuf = NULL; > int todo, written = 0; > unsigned long flags; > > + if (from_user) { > + kbuf = kmalloc(count, GFP_KERNEL); > + if (!kbuf) > + return -ENOMEM; > + if (copy_from_user(kbuf, buf, count)) { > + kfree(kbuf); > + return -EFAULT; > + } > + buf = kbuf; > + } > spin_lock_irqsave(&hp->lock, flags); > while (count > 0 && (todo = N_OUTBUF - hp->n_outbuf) > 0) { > if (todo > count) > todo = count; > p = hp->outbuf + hp->n_outbuf; > - if (from_user) { > - todo -= copy_from_user(p, buf, todo); > - if (todo == 0) { > - if (written == 0) > - written = -EFAULT; > - break; > - } > - } else > - memcpy(p, buf, todo); > + memcpy(p, buf, todo); > count -= todo; > buf += todo; > hp->n_outbuf += todo; > @@ -155,7 +157,7 @@ > hvc_push(hp); > } > spin_unlock_irqrestore(&hp->lock, flags); > - > + kfree(kbuf); > return written; > } > ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From hollisb at us.ibm.com Wed Nov 5 01:47:49 2003 From: hollisb at us.ibm.com (Hollis Blanchard) Date: Tue, 4 Nov 2003 08:47:49 -0600 Subject: power4 compiling In-Reply-To: <20031104002344.GG2506@bubble.sa.bigpond.net.au> Message-ID: On Monday, Nov 3, 2003, at 18:23 US/Central, Alan Modra wrote: > > On Mon, Nov 03, 2003 at 10:09:30AM -0600, Hollis Blanchard wrote: >> You shouldn't need to comment out tlbiel. Instead do this: >> CFLAGS = -mtune=power4 -Wa,-mcpu=power4 >> That will tell the compiler not to use POWER4-specific instructions, >> but also tell the assembler it's ok when we use inline asm statements. > > No, this is dangerous. -mcpu=power4 tells the assembler to use the > power4 form of mtcrf if it so happens that only one field of cr is > being set. gcc may generate such mtcrf instruction. Oh. That's too bad. What we have is code like this (from pSeries_htab.c): if ((cur_cpu_spec->cpu_features & CPU_FTR_TLBIEL) && !large && local) { _tlbiel(va); } else { ... _tlbie(va); ... } Without CONFIG_POWER4_ONLY (i.e. without -mcpu=power4), the assembler will refuse 'tlbiel'. Any suggestions? -- Hollis Blanchard IBM Linux Technology Center ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olh at suse.de Wed Nov 5 01:54:14 2003 From: olh at suse.de (Olaf Hering) Date: Tue, 4 Nov 2003 15:54:14 +0100 Subject: power4 compiling In-Reply-To: References: <20031104002344.GG2506@bubble.sa.bigpond.net.au> Message-ID: <20031104145414.GA28687@suse.de> On Tue, Nov 04, Hollis Blanchard wrote: > Without CONFIG_POWER4_ONLY (i.e. without -mcpu=power4), the assembler > will refuse 'tlbiel'. Any suggestions? move it to some .c or .S file and use extra CFLAGS for that one or a bunch of files? -- USB is for mice, FireWire is for men! sUse lINUX ag, n?RNBERG ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From segher at kernel.crashing.org Wed Nov 5 02:00:55 2003 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Tue, 4 Nov 2003 16:00:55 +0100 (CET) Subject: power4 compiling In-Reply-To: Message-ID: Hollis Blanchard wrote: > Oh. That's too bad. What we have is code like this (from > pSeries_htab.c): > > if ((cur_cpu_spec->cpu_features & CPU_FTR_TLBIEL) && !large && local) { > _tlbiel(va); > } else { > ... _tlbie(va); ... > } > > Without CONFIG_POWER4_ONLY (i.e. without -mcpu=power4), the assembler > will refuse 'tlbiel'. Any suggestions? Use something like static inline void _tlbiel(unsigned long va) { __asm__ __volatile__( ".long 0x7c000224 | (%0 << 11)" : : "r"(va) ); } (I hope this compiles :-) ) Segher ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From hollisb at us.ibm.com Wed Nov 5 02:21:44 2003 From: hollisb at us.ibm.com (Hollis Blanchard) Date: Tue, 4 Nov 2003 09:21:44 -0600 Subject: power4 compiling In-Reply-To: Message-ID: <994FD0D6-0EDA-11D8-BF79-000A95A0560C@us.ibm.com> On Tuesday, Nov 4, 2003, at 09:00 US/Central, Segher Boessenkool wrote: > > Use something like > > static inline void _tlbiel(unsigned long va) > { > __asm__ __volatile__( > ".long 0x7c000224 | (%0 << 11)" : : "r"(va) > ); > } If only we had some sort of automated tool to convert from human-readable assembly language into binary instructions... then this hack would be unnecessary. ;) -- Hollis Blanchard IBM Linux Technology Center ** Sent via the linuxppc64-dev mail list. 
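For reference, the constant in the `.long` hack above follows from the X-form layout of tlbiel: primary opcode 31, extended opcode 274, and the RB register number shifted left by 11. A minimal user-space sketch of that arithmetic follows; the helper names are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a kernel API: build the 32-bit word for
 * "tlbiel rB" from primary opcode 31 and extended opcode 274. */
static uint32_t encode_tlbiel(unsigned int rb)
{
	uint32_t base = (31u << 26) | (274u << 1);	/* == 0x7c000224 */

	assert(rb < 32);				/* only r0..r31 exist */
	return base | ((uint32_t)rb << 11);
}

/* Hypothetical helper: recover the RB register number from the word. */
static unsigned int tlbiel_rb(uint32_t inst)
{
	return (inst >> 11) & 0x1f;
}
```

With rB = r3 this gives 0x7c001a24, which is the word the assembler would be asked to emit by `.long 0x7c000224 | (3 << 11)`.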
See http://lists.linuxppc.org/ From segher at kernel.crashing.org Wed Nov 5 02:29:03 2003 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Tue, 4 Nov 2003 16:29:03 +0100 (CET) Subject: power4 compiling In-Reply-To: <994FD0D6-0EDA-11D8-BF79-000A95A0560C@us.ibm.com> Message-ID: Hollis Blanchard wrote: > > static inline void _tlbiel(unsigned long va) > > { > > __asm__ __volatile__( > > ".long 0x7c000224 | (%0 << 11)" : : "r"(va) > > ); > > } > > If only we had some sort of automated tool to convert from > human-readable assembly language into binary instructions... then this > hack would be unnecessary. ;) Yeah yeah yeah. Just use the hack until the majority of people have a fixed assembler ;-) Segher ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From johnrose at austin.ibm.com Wed Nov 5 03:03:02 2003 From: johnrose at austin.ibm.com (John Rose) Date: Tue, 04 Nov 2003 10:03:02 -0600 Subject: lock file question In-Reply-To: <20031104021537.B554F189DD@ozlabs.au.ibm.com> References: <20031104021537.B554F189DD@ozlabs.au.ibm.com> Message-ID: <1067961781.16458.15.camel@verve> > Yep, that's the most sane idea. You don't need to use ranges at all > though, since there's little point in having more than one user at a > time, even when it's possible. Agreed. The more I think about this, the more it seems that setting up the "kernel workarea" part of the library to handle more than one user is overkill. There aren't many RTAS calls that require low mem buffers like these. If we do decide to enforce one user at a time, we have probably overdone the size of the reserved kernel buffer (64 * 1024 iirc). Of the calls I've seen, the biggest requirement of a single call is configure-connector. It needs 2 pages. Thoughts? John ** Sent via the linuxppc64-dev mail list.
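A minimal sketch of the one-user-at-a-time variant discussed above: a single exclusive fcntl() lock over the whole lock file instead of byte ranges. The function names are hypothetical (they are not librtas API), and the path is passed in by the caller; the thread proposes /var/lock/LCK..librtas. The caller would mmap /dev/mem only while holding the returned descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open (creating if needed) the lock file and take an
 * exclusive advisory lock on the whole file.
 * Returns the open fd on success, -1 on failure. */
static int workarea_lock(const char *path)
{
	struct flock fl;
	int fd = open(path, O_RDWR | O_CREAT, 0600);

	if (fd < 0)
		return -1;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;		/* exclusive lock */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;			/* 0 = through end of file, i.e. everything */

	/* F_SETLK does not block: it fails if another process holds the lock */
	if (fcntl(fd, F_SETLK, &fl) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: release the work area. */
static void workarea_unlock(int fd)
{
	assert(fd >= 0);
	close(fd);			/* closing drops this process's locks on the file */
}
```

As noted earlier in the thread, a lock like this is advisory: it only serializes well-behaved clients and does nothing against a process that opens /dev/mem directly.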
See http://lists.linuxppc.org/ From hollisb at us.ibm.com Wed Nov 5 07:46:02 2003 From: hollisb at us.ibm.com (Hollis Blanchard) Date: Tue, 04 Nov 2003 14:46:02 -0600 Subject: [RFC][PATCH] discontiguous vterms In-Reply-To: <16293.58703.997620.496640@cargo.ozlabs.ibm.com> References: <16293.58703.997620.496640@cargo.ozlabs.ibm.com> Message-ID: <3FA8100A.8090409@us.ibm.com> Paul Mackerras wrote: > Hollis Blanchard writes: > >>In practice, for us, they probably are contiguous. However, >>transforming a list of vty at N nodes (from find_devices()) into [base, >>number] in hvc_count() is a bit awkward at best, and could end up >>making some assumptions about the order in which OF presents device >>nodes to us. Also, this discontiguous functionality might be useful in >>other environments (e.g. PPC simulators). So hvc_get_vterms() replaces >>hvc_count() to communicate the vterm numbers. > > Any chance of these things appearing and disappearing at runtime? > Perhaps we should have the arch code calling some hvc_console routine > to tell it about consoles, rather than having hvc_console calling arch > code to discover the consoles as at present. The only thing is that > getting the `add_console' routine called at the right time in the boot > sequence might be interesting. This patch (vs 2.4 bk) does what you suggest. Overall description: - Breaks out struct hvc_struct into include/linux/hvconsole.h . It can be include/asm-ppc64/hvconsole.h if you like, but there's nothing arch-specific about it. - Breaks out the arch-specific code from arch/ppc64/kernel/pSeries_lpar.c to arch/ppc64/kernel/hvconsole.c (Note that I'm not quite sure what to do about the copyrights for those new files; the code is mostly Anton's and Paul's, and I'm just moving it.) - Have arch code call hvc_instantiate to notify the driver of a new console. Right now this discovery is triggered from hvc_console_init(), which is called from drivers/char/tty_io.c relatively early at boot. 
(Right now hvc_instantiate is pretty dumb, but right now we're not doing any hotplugging either.) - Make the transport code (hvc_put/get_chars) per-hvc rather than per-arch. This is to handle the case where one console uses the Host Virtual Serial Interface (which I'm about to implement) but others use the normal vty console. An ioctl call has also been added, which will be needed for HVSI support. - Supports discontiguous vterm numbers. This code has been tested. I'd like to get this committed to the 2.4 bk very soon so that I can implement HVSI while developers in other organizations are working on it too (which is now). -Hollis -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: hvc.diff Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031104/083ea55e/attachment.txt From santil at us.ibm.com Wed Nov 5 08:57:46 2003 From: santil at us.ibm.com (Santiago Leon) Date: 04 Nov 2003 16:57:46 -0500 Subject: [PATCH] Removing inline asm from ibmveth.c - commit In-Reply-To: <3F9F0525.6000907@us.ibm.com> References: <3F9F0525.6000907@us.ibm.com> Message-ID: <1067983063.18722.63.camel@santit30> Hi everybody... If there are no objections, I'm going to commit this to 2.4 bk on Thursday... -- ******************************************************************** Santiago A. Leon Power Linux Development IBM Linux Technology Center Off: (919) 254-6048 T/L: 444-6048 Fax: (919) 543-7378 On Tue, 2003-10-28 at 19:09, Santiago Leon wrote: > There was a discussion in another list because someone added inline > assembly to ibmveth.c... This is my fix for plpar_hcall_8arg_2ret() so > that the inline assembly won't be necesary... I just added two lines of > assembly to pull the 9th and 10th parameters from the stack into R11 and > R12... Which is what the rest of the code expects. > > The patch includes the whole plpar_hcall_8arg_2ret() because it was > deleted by a previous patch... > > The patch is against the 2.4 bk tree... 
> > Let me know what you guys think... > -- > ******************************************************************** > Santiago A. Leon > Power Linux Development > IBM Linux Technology Center > Off: (919) 254-6048 T/L: 444-6048 Fax: (919) 543-7378 > > ______________________________________________________________________ -------------- next part -------------- A non-text attachment was scrubbed... Name: hcall_8arg_2ret-2.4.patch Type: text/x-patch Size: 3077 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031104/b95b2012/attachment.bin From amodra at bigpond.net.au Wed Nov 5 09:57:56 2003 From: amodra at bigpond.net.au (Alan Modra) Date: Wed, 5 Nov 2003 09:27:56 +1030 Subject: power4 compiling In-Reply-To: References: <994FD0D6-0EDA-11D8-BF79-000A95A0560C@us.ibm.com> Message-ID: <20031104225756.GU2506@bubble.sa.bigpond.net.au> On Tue, Nov 04, 2003 at 04:29:03PM +0100, Segher Boessenkool wrote: > people have a fixed assembler ;-) It's not too hard to fix either, just not at the top of my priority list at the moment. I envisage implementing assembler support for: .cpu power4 some power4 insns .cpu previous -- Alan Modra IBM OzLabs - Linux Technology Centre ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From aprasad at in.ibm.com Wed Nov 5 10:16:29 2003 From: aprasad at in.ibm.com (Anil K Prasad) Date: Wed, 5 Nov 2003 04:46:29 +0530 Subject: stack size limit on ppc/ppc64 Message-ID: Hi All, I have code which shouldn't work, working under ppc64 linux. int main() { char *p = (char*)0x50000000; while(p < 0xfffff000){ *p = 'a'; p++; } while(1); return 0; } Theoretically, above code should cause segmentation violation. But on ppc64 linux, it puts variable 'p ' in stack segment, I looked at /proc/pid/maps, and there was huge memory area from 0x50000000 ---> 0xFFFFF000. It behaving perfectly as expected on Intel Linux i.e. causing segmentation violation. 
Even on ppc64 If I make ptr to point to un-used segment between text and data, then it get signal 11. Is there any reason for increasing stack range from upper end address to any arbitrary address?? Thanks, Anil. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From santil at us.ibm.com Wed Nov 5 10:34:35 2003 From: santil at us.ibm.com (Santiago Leon) Date: 04 Nov 2003 18:34:35 -0500 Subject: [BUG][PATCH][2.4] Virtual Ethernet stops responding Message-ID: <1067988875.18722.221.camel@santit30> Hi, This trivial-looking patch fixes a race condition that permanently disabled the virtual ethernet interrupts... Engebretsen found this bug while running netperf for 20+ hours... The code deleted by this patch was supposed to check for an error condition where an interrupt happened while interrupts were disabled. In reality, it's checking if an interrupt happened while the __LINK_STATE_RX_SCHED bit is set in dev->state. The latter is normal behavior for the driver (see "rotting packet race-window avoidance scheme" in Documentation/net/NAPI_HOWTO.txt)... However, because this code re-disables the interrupts, there's small window where another processor or thread (that is executing the ibmveth_poll()), enables the interrupts right before they get disabled by this code fragment. I have tested this patch running netperf for 50+ hours... This patch is against the latest 2.4 bk tree, but it also works with a little fuzz in 2.5... Opinions, comments and suggestions are welcome... -- ******************************************************************** Santiago A. Leon Power Linux Development IBM Linux Technology Center Off: (919) 254-6048 T/L: 444-6048 Fax: (919) 543-7378 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ibmveth.patch Type: text/x-patch Size: 564 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031104/bba6a37d/attachment.bin From santil at us.ibm.com Wed Nov 5 10:37:38 2003 From: santil at us.ibm.com (Santiago Leon) Date: 04 Nov 2003 18:37:38 -0500 Subject: [PATCH][2.4] VIO infrastructure - commit In-Reply-To: <1067530214.22506.4.camel@verve> References: <1067530214.22506.4.camel@verve> Message-ID: <1067989058.18722.226.camel@santit30> Hi everybody, Here's the patch again with the following changes due to suggestions from last post: - Moved the functionality of vio_get_irq() into vio_register_device() and call vio_build_tce_table() from vio_register_device(). - Use prom_n_addr_cells() instead of calling get_property("#address-cells); - vio_register_device now accepts a device_node* instead of a void*. - Use list_for_each_entry() instead of list_for_each() plus list_entry(). - Fixed spelling errors. Unless there is an objection, I'm planning to commit it by Friday... -- ******************************************************************** Santiago A. Leon Power Linux Development IBM Linux Technology Center Off: (919) 254-6048 T/L: 444-6048 Fax: (919) 543-7378 -------------- next part -------------- A non-text attachment was scrubbed... Name: vio.patch Type: text/x-patch Size: 30758 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031104/604282ab/attachment.bin From sjmunroe at us.ibm.com Wed Nov 5 11:05:21 2003 From: sjmunroe at us.ibm.com (Steve Munroe) Date: Tue, 4 Nov 2003 18:05:21 -0600 Subject: stack size limit on ppc/ppc64 Message-ID: Anil K Prasad writes: > > I have code which shouldn't work, working under ppc64 linux. > >Theoretically, above code should cause segmentation violation. But >on ppc64 linux, it puts variable 'p ' in stack segment, I looked at >/proc/pid/maps, and there was huge memory area from 0x50000000 ---> >0xFFFFF000. I assume this a 32-bit application? 
This looks like it just auto-extending the stack segment. On Powerpc64 the stack can grow from 0xFFFFF000 down until it runs into another allocation or TASK_UNMAPPED_BASE (0x40000000 which is normally where ld.so loads). But it should check the ulimit first. What does ulimit -a say? Steven J. Munroe Power Linux Toolchain Architect IBM Corporation, Linux Technology Center ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From agl at us.ibm.com Wed Nov 5 11:24:23 2003 From: agl at us.ibm.com (Adam Litke) Date: 04 Nov 2003 16:24:23 -0800 Subject: Trivial build cleanup Message-ID: <1067991864.1770.43.camel@agtpad> I was getting annoyed with all the junk printed when generating the zImage so I converted the noisy commands over to the kbuild quiet_cmd framework. I was able to suppress the output of the imagesize.c target but not in the normal way due to shell escaping issues. Comments? diff -purN linux-anton2/arch/ppc64/boot/Makefile linux-kbuild/arch/ppc64/boot/Makefile --- linux-anton2/arch/ppc64/boot/Makefile 2003-09-01 17:57:59.000000000 -0700 +++ linux-kbuild/arch/ppc64/boot/Makefile 2003-11-04 16:02:24.000000000 -0800 @@ -98,12 +98,16 @@ $(call gz-sec, $(required)): $(obj)/kern $(obj)/kernel-initrd.gz: $(obj)/ramdisk.image.gz cp -f $(obj)/ramdisk.image.gz $@ +quiet_cmd_touch = TOUCH $@ + cmd_touch = touch $@ $(call src-sec, $(required) $(initrd)): $(obj)/kernel-%.c: $(obj)/kernel-%.gz FORCE - touch $@ + $(call cmd,touch) +quiet_cmd_addsect = OBJCOPY $@ + cmd_addsect = $(call addsection, $@) $(call obj-sec, $(required) $(initrd)): $(obj)/kernel-%.o: $(obj)/kernel-%.c FORCE $(call if_changed_dep,bootcc) - $(call addsection, $@) + $(call cmd,addsect) $(obj)/zImage: obj-boot += $(call obj-sec, $(required)) $(obj)/zImage: $(call obj-sec, $(required)) $(obj-boot) $(obj)/addnote FORCE @@ -113,14 +117,17 @@ $(obj)/zImage.initrd: obj-boot += $(call $(obj)/zImage.initrd: $(call obj-sec, $(required) $(initrd)) $(obj-boot) $(obj)/addnote FORCE $(call 
if_changed,addnote) -$(obj)/imagesize.c: vmlinux - @echo Generating $@ +define mkimgsize ls -l vmlinux | \ awk '{printf "/* generated -- do not edit! */\n" \ - "unsigned long vmlinux_filesize = %d;\n", $$5}' > $(obj)/imagesize.c + "unsigned long vmlinux_filesize = %d;\n", $$5}' > $(obj)/imagesize.c && \ $(CROSS_COMPILE)nm -n vmlinux | tail -n 1 | \ awk '{printf "unsigned long vmlinux_memsize = 0x%s;\n", substr($$1,8)}' \ >> $(obj)/imagesize.c +endef +$(obj)/imagesize.c: vmlinux + @echo ' GEN $@' + $(shell $(mkimgsize)) install: $(CONFIGURE) $(obj)/$(BOOTIMAGE) sh -x $(src)/install.sh "$(KERNELRELEASE)" "$(obj)/$(BOOTIMAGE)" "$(TOPDIR)/System.map" "$(INSTALL_PATH)" -- Adam Litke (agl at us.ibm.com) IBM Linux Technology Center (503) 578 - 3283 t/l 775 - 3283 ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Wed Nov 5 13:24:43 2003 From: anton at samba.org (Anton Blanchard) Date: Wed, 5 Nov 2003 13:24:43 +1100 Subject: stack size limit on ppc/ppc64 In-Reply-To: References: Message-ID: <20031105022443.GG1843@krispykreme> Hi Anil, > I have code which shouldn't work, working under ppc64 linux. Sounds like a feature to me :) > Theoretically, above code should cause segmentation violation. But on ppc64 > linux, it puts variable 'p ' in stack segment, I looked at /proc/pid/maps, > and there was huge memory area from 0x50000000 ---> 0xFFFFF000. I guess technically its a bug, Paul has fixed it in ppc32. I attached the relevant parts of his fix (it comes from arch/ppc/kernel/fault.c) There are a few things to change for ppc64: - add the 64bit versions of the store with update instructions - keep in mind the 64bit ABI allows space below the sp to be used (it looks like the 2048 constant will cover this) Does anyone feel in the mood to work on this? Anton /* * Check whether the instruction at regs->nip is a store using * an update addressing form which will update r1. 
*/ static int store_updates_sp(struct pt_regs *regs) { unsigned int inst; if (get_user(inst, (unsigned int *)regs->nip)) return 0; /* check for 1 in the rA field */ if (((inst >> 16) & 0x1f) != 1) return 0; /* check major opcode */ switch (inst >> 26) { case 37: /* stwu */ case 39: /* stbu */ case 45: /* sthu */ case 53: /* stfsu */ case 55: /* stfdu */ return 1; case 31: /* check minor opcode */ switch ((inst >> 1) & 0x3ff) { case 183: /* stwux */ case 247: /* stbux */ case 439: /* sthux */ case 695: /* stfsux */ case 759: /* stfdux */ return 1; } } return 0; } void do_page_fault(struct pt_regs *regs, unsigned long address, unsigned long error_code) { ... /* * A user-mode access to an address a long way below * the stack pointer is only valid if the instruction * is one which would update the stack pointer to the * address accessed if the instruction completed, * i.e. either stwu rs,n(r1) or stwux rs,r1,rb * (or the byte, halfword, float or double forms). * * If we don't check this then any write to the area * between the last mapped region and the stack will * expand the stack rather than segfaulting. */ if (address + 2048 < uregs->gpr[1] && (!user_mode(regs) || !store_updates_sp(regs))) goto bad_area; } ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From aprasad at in.ibm.com Wed Nov 5 14:43:30 2003 From: aprasad at in.ibm.com (Anil K Prasad) Date: Wed, 5 Nov 2003 09:13:30 +0530 Subject: stack size limit on ppc/ppc64 In-Reply-To: <20031105022443.GG1843@krispykreme> Message-ID: >There are a few things to change for ppc64: >- add the 64bit versions of the store with update instructions >- keep in mind the 64bit ABI allows space below the sp to be used (it > looks like the 2048 constant will cover this) >Does anyone feel in the mood to work on this? I can work your code fragment and fine tune it.. Thanks, Anil. ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From amodra at bigpond.net.au Wed Nov 5 18:27:23 2003 From: amodra at bigpond.net.au (Alan Modra) Date: Wed, 5 Nov 2003 17:57:23 +1030 Subject: stack size limit on ppc/ppc64 In-Reply-To: References: <20031105022443.GG1843@krispykreme> Message-ID: <20031105072723.GB2506@bubble.sa.bigpond.net.au> On Wed, Nov 05, 2003 at 09:13:30AM +0530, Anil K Prasad wrote: > > >There are a few things to change for ppc64: > >- add the 64bit versions of the store with update instructions > >- keep in mind the 64bit ABI allows space below the sp to be used (it > > looks like the 2048 constant will cover this) > > >Does anyone feel in the mood to work on this? > > I can work your code fragment and fine tune it.. ppc64 stack frames are required to be set up using stdu or stdux (set back chain and update sp atomically), so I can't see that it's necessary to check all the byte and word insns. -- Alan Modra IBM OzLabs - Linux Technology Centre ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From paulus at samba.org Wed Nov 5 20:23:18 2003 From: paulus at samba.org (Paul Mackerras) Date: Wed, 5 Nov 2003 20:23:18 +1100 Subject: stack size limit on ppc/ppc64 In-Reply-To: <20031105072723.GB2506@bubble.sa.bigpond.net.au> References: <20031105022443.GG1843@krispykreme> <20031105072723.GB2506@bubble.sa.bigpond.net.au> Message-ID: <16296.49542.747948.827256@cargo.ozlabs.ibm.com> Alan Modra writes: > ppc64 stack frames are required to be set up using stdu or stdux (set > back chain and update sp atomically), so I can't see that it's necessary > to check all the byte and word insns. And on ppc32 the ABI says that they are set up with stwu or stwux. 
The reason for allowing the other forms is that I was making the kernel enforce the rule "no accesses allowed between the top of the heap and the stack pointer" (actually some constant offset below the stack pointer), where the "stack pointer" is interpreted to mean the *final* stack pointer value for st*u[x] instructions. I think it is more appropriate for the kernel to enforce that rule than for the kernel to require that programs follow the ELF ABI internally. Regards, Paul. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From amodra at bigpond.net.au Wed Nov 5 22:08:59 2003 From: amodra at bigpond.net.au (Alan Modra) Date: Wed, 5 Nov 2003 21:38:59 +1030 Subject: stack size limit on ppc/ppc64 In-Reply-To: <16296.49542.747948.827256@cargo.ozlabs.ibm.com> References: <20031105022443.GG1843@krispykreme> <20031105072723.GB2506@bubble.sa.bigpond.net.au> <16296.49542.747948.827256@cargo.ozlabs.ibm.com> Message-ID: <20031105110859.GE2506@bubble.sa.bigpond.net.au> On Wed, Nov 05, 2003 at 08:23:18PM +1100, Paul Mackerras wrote: > Alan Modra writes: > > > ppc64 stack frames are required to be set up using stdu or stdux (set > > back chain and update sp atomically), so I can't see that it's necessary > > to check all the byte and word insns. > > And on ppc32 the ABI says that they are set up with stwu or stwux. > The reason for allowing the other forms is that I was making the > kernel enforce the rule "no accesses allowed between the top of the > heap and the stack pointer" (actually some constant offset below the > stack pointer), I agree that this is a worthy aim. > where the "stack pointer" is interpreted to mean the > *final* stack pointer value for st*u[x] instructions. I think it is > more appropriate for the kernel to enforce that rule than for the > kernel to require that programs follow the ELF ABI internally. But I fail to see how allowing something like the following to extend the stack helps meet that aim. 
lis 9,-10 stbux 3,1,9 That stbux is just a wild write that also happens to fiddle with r1. It's _not_ a valid stack frame allocation, which must store the old value of r1, hence must use stwu or stwux (*) on ppc32 and stdu or stdux on ppc64. A byte or half-word write is just too small. (*) I suppose you could construct a sequence to store the old r1 using a float reg, but it's stretching credibility to believe that anyone would actually try that! -- Alan Modra IBM OzLabs - Linux Technology Centre ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From segher at kernel.crashing.org Wed Nov 5 23:09:07 2003 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Wed, 5 Nov 2003 13:09:07 +0100 (CET) Subject: stack size limit on ppc/ppc64 In-Reply-To: <20031105110859.GE2506@bubble.sa.bigpond.net.au> Message-ID: Alan Modra wrote: > But I fail to see how allowing something like the following to > extend the stack helps meet that aim. > > lis 9,-10 > stbux 3,1,9 > > That stbux is just a wild write that also happens to fiddle with r1. > It's _not_ a valid stack frame allocation, which must store the old > value of r1, hence must use stwu or stwux (*) on ppc32 and stdu or stdux > on ppc64. A byte or half-word write is just too small. That is if you are assuming one of the current ABIs; there are other ways to restore the stack pointer, so it's possible to define an ABI for which this would work just fine (or an assembler program that doesn't care about ABIs at all, or whatnot). It's not the kernel's job to enforce an ABI. Or does the kernel actually *need* stack back-links? Segher ** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/ From amodra at bigpond.net.au Thu Nov 6 00:05:53 2003 From: amodra at bigpond.net.au (Alan Modra) Date: Wed, 5 Nov 2003 23:35:53 +1030 Subject: stack size limit on ppc/ppc64 In-Reply-To: References: <20031105110859.GE2506@bubble.sa.bigpond.net.au> Message-ID: <20031105130553.GG2506@bubble.sa.bigpond.net.au> On Wed, Nov 05, 2003 at 01:09:07PM +0100, Segher Boessenkool wrote: > That is if you are assuming one of the current ABIs; Of course. I thought the idea was to distinguish stack frame allocation from other wild pointer writes. > It's not the kernel's job to enforce an ABI. There are a lot of things that a binary file needs to get right according to the ABI before it will run. Try running stuff from /dev/mouse. :) -- Alan Modra IBM OzLabs - Linux Technology Centre ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Thu Nov 6 03:34:58 2003 From: anton at samba.org (Anton Blanchard) Date: Thu, 6 Nov 2003 03:34:58 +1100 Subject: [PATCH] pci_alloc_consistent memory conservation In-Reply-To: <3FA2ED6F.8080609@us.ibm.com> References: <3FA2ED6F.8080609@us.ibm.com> Message-ID: <20031105163458.GB19104@krispykreme> Hi Brian, > Currently pci_alloc_consistent calls __get_free_pages to allocate > memory. This results in a lot of wasted memory if people call > pci_alloc_consistent a lot for odd sized allocations (which is what I > wanted to do). The attached patch changes pci_alloc_consistent to use > kmalloc and pci_map_single to accomplish the same function, with the > added benefit of kmalloc to reduce the amount of wasted memory. Keep in mind we need to use an entire page of TCE space for each allocation and TCE space is severely restricted by LPAR. Check out the pci pool code in drivers/pci/pool.c, will that help your problem? Anton ** Sent via the linuxppc64-dev mail list. 
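The rule under discussion can be written as a small predicate. This is a user-space sketch under stated assumptions, not the kernel code: the function name and the `slack` parameter are invented here (2048 is the constant in the ppc32 fragment quoted earlier in the thread), and the user-mode test from the kernel version is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical predicate: may a faulting access at addr grow the stack?
 *   sp          user stack pointer (r1) at the time of the fault
 *   slack       bytes below sp that are always acceptable
 *   updates_sp  nonzero if the faulting instruction is a store-with-update
 *               that would leave r1 pointing at addr
 */
static int stack_growth_allowed(uint64_t addr, uint64_t sp,
				uint64_t slack, int updates_sp)
{
	if (addr + slack >= sp)
		return 1;		/* close enough to sp: always fine */
	return updates_sp != 0;		/* far below sp: only via st*u[x] on r1 */
}
```

Whether `updates_sp` should be set for the byte and halfword forms, or only for stwu/stwux and stdu/stdux, is exactly the policy question being argued above; the predicate itself does not care.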
See http://lists.linuxppc.org/ From aprasad at in.ibm.com Thu Nov 6 05:55:38 2003 From: aprasad at in.ibm.com (Anil K Prasad) Date: Thu, 6 Nov 2003 00:25:38 +0530 Subject: stack size limit on ppc64 Message-ID: Hi, I worked on Anton's skelton code, and accomodated 64bit instructions.. Here is the patch. Thanks, Anil. *************************************************************************************** diff -uNr linux-official/arch/ppc64/mm/fault.c linux-stack/arch/ppc64/mm/fault.c --- linux-official/arch/ppc64/mm/fault.c 2003-11-04 23:40:06.000000000 -0600 +++ linux-stack/arch/ppc64/mm/fault.c 2003-11-05 11:43:55.000000000 -0600 @@ -52,6 +52,7 @@ extern void die_if_kernel(char *, struct pt_regs *, long); void bad_page_fault(struct pt_regs *, unsigned long); void do_page_fault(struct pt_regs *, unsigned long, unsigned long); +static int store_updates_sp(struct pt_regs *); #ifdef CONFIG_PPCDBG extern unsigned long get_srr0(void); @@ -119,6 +120,41 @@ goto bad_area; } vma = find_vma_prev(mm, address, &prev_vma); + /* + * N.B. The rs6000/xcoff ABI allows programs to access up to + * a few hundred bytes below the stack pointer. + * The kernel signal delivery code writes up to about 1.5kB + * below the stack pointer (r1) before decrementing it. + * The exec code can write slightly over 640kB to the stack + * before setting the user r1. Thus we allow the stack to + * expand to 1MB without further checks. + */ + if(address + 0x100000 < vma->vm_end){ + /* get user regs even if this fault is in kernel mode */ + struct pt_regs *uregs = current->thread.regs; + if (uregs == NULL) + goto bad_area; + + /* + * A user-mode access to an address a long way below + * the stack pointer is only valid if the instruction + * is one which would update the stack pointer to the + * address accessed if the instruction completed, + * i.e. either stwu rs,n(r1) or stwux rs,r1,rb + * (or the byte, halfword, float or double forms). 
+ * + * If we don't check this then any write to the area + * between the last mapped region and the stack will + * expand the stack rather than segfaulting. + * Also, the lowest valid address is 288 bytes below the + * value in the stack pointer (according to the ppc64 ABI). + */ + if (address + 288 < uregs->gpr[1] && + (!user_mode(regs) || !store_updates_sp(regs))) { + PPCDBG(PPCDBG_MM, "\tdo_page_fault: huge stack\n"); + goto bad_area; + } + } if (expand_stack(vma, address, prev_vma)) { PPCDBG(PPCDBG_MM, "\tdo_page_fault: expand_stack\n"); goto bad_area; @@ -242,3 +278,66 @@ panic("kernel access of bad area pc %lx lr %lx address %lX tsk %s/%d", regs->nip,regs->link,address,current->comm,current->pid); } +/* + * Check whether the instruction at regs->nip is a store using an + * update addressing form which will update sp + */ + +static int +store_updates_sp(struct pt_regs *regs) +{ + union { + unsigned int inst; + union { + struct { + unsigned int opcode:6; + unsigned int rs:5; + unsigned int ra:5; + unsigned int d:16; + }d_form; + struct { + unsigned int opcode:6; + unsigned int rs:5; + unsigned int ra:5; + unsigned int rb:5; + unsigned int xcode:10; + unsigned int resv:1; + }index_form; + }inst_form; + }inst; + if(get_user(inst.inst, (unsigned int*)regs->nip)) + return 0; + /* + * if RA != sp + */ + if(inst.inst_form.d_form.ra != 1) + return 0; + /* + * check for store update instructions + */ + switch(inst.inst_form.d_form.opcode) { + case 37: /* stwu */ + case 39: /* stbu */ + case 45: /* sthu */ + case 53: /* stfsu */ + case 55: /* stfdu */ + case 62: /* stdu (opcode shared with std) */ + return 1; + case 31: + switch(inst.inst_form.index_form.xcode) { + case 181: /* stdux */ + case 183: /* stwux */ + case 247: /* stbux */ + case 439: /* sthux */ + case 695: /* stfsux */ + case 759: /* stfdux */ + return 1; + default: + return 0; + } + default: + return 0; + + } + return 0; +} ** Sent via the linuxppc64-dev mail list.
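A minimal userspace sketch of the same instruction check, using shifts and masks on the raw 32-bit word instead of bitfields, so it does not depend on how the compiler lays out bitfields. The function name insn_updates_sp is hypothetical. One deliberate difference from the patch is labelled in the code: primary opcode 62 is shared by std and stdu, so the sketch also looks at the low two bits of the DS-form word, on the assumption that only stdu should count as a stack-pointer update.

```c
#include <stdint.h>

/*
 * Returns 1 if the 32-bit PowerPC instruction word is a store-with-update
 * whose base register RA is r1 (the stack pointer), 0 otherwise.
 * Illustrative only; not the code from the patch above.
 */
static int insn_updates_sp(uint32_t inst)
{
    uint32_t opcode = inst >> 26;          /* primary opcode, top 6 bits */
    uint32_t ra     = (inst >> 16) & 0x1f; /* base register field */

    if (ra != 1)
        return 0;

    switch (opcode) {
    case 37: /* stwu  */
    case 39: /* stbu  */
    case 45: /* sthu  */
    case 53: /* stfsu */
    case 55: /* stfdu */
        return 1;
    case 62: /* DS-form: low two bits are 0 for std, 1 for stdu */
        return (inst & 0x3) == 1;
    case 31: /* X-form: 10-bit extended opcode above the Rc bit */
        switch ((inst >> 1) & 0x3ff) {
        case 181: /* stdux  */
        case 183: /* stwux  */
        case 247: /* stbux  */
        case 439: /* sthux  */
        case 695: /* stfsux */
        case 759: /* stfdux */
            return 1;
        }
        return 0;
    default:
        return 0;
    }
}
```

For example, the usual prologue stores stwu r1,-16(r1) (0x9421fff0) and stdu r1,-112(r1) (0xf821ff91) are accepted, while a plain std r0,16(r1) (0xf8010010) is not.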
See http://lists.linuxppc.org/ From hollisb at us.ibm.com Thu Nov 6 06:26:26 2003 From: hollisb at us.ibm.com (Hollis Blanchard) Date: Wed, 5 Nov 2003 13:26:26 -0600 Subject: [Bk-commit] bk commit URLs In-Reply-To: <20031104204720.14CD824064@source.scl.ameslab.gov> Message-ID: On Tuesday, Nov 4, 2003, at 14:47 US/Central, ppc64 at source.scl.ameslab.gov wrote: > full patch URL: > http://source.scl.ameslab.gov:14690//linux-2.5/patch at 1.1360 > > ChangeSet > 1.1360 03/11/04 14:37:06 engebret at brule.rchland.ibm.com +6 -0 > Initial round of code to add shared processor support into 2.6. > This adds h_call interfaces, paca/VPA fields, and vpa register. The "full URL" in the bk commit messages is incorrect. From http://source.scl.ameslab.gov:14690//linux-2.5/ChangeSet at - 1d?nav=index.html we can see that the changeset described has been (re)numbered 1.1359.1.1, rather than 1.1360. -- Hollis Blanchard IBM Linux Technology Center ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From brking at us.ibm.com Thu Nov 6 06:42:32 2003 From: brking at us.ibm.com (Brian King) Date: Wed, 05 Nov 2003 13:42:32 -0600 Subject: [PATCH] pci_alloc_consistent memory conservation References: <3FA2ED6F.8080609@us.ibm.com> <20031105163458.GB19104@krispykreme> Message-ID: <3FA952A8.2000703@us.ibm.com> > >>Currently pci_alloc_consistent calls __get_free_pages to allocate >>memory. This results in a lot of wasted memory if people call >>pci_alloc_consistent a lot for odd sized allocations (which is what I >>wanted to do). The attached patch changes pci_alloc_consistent to use >>kmalloc and pci_map_single to accomplish the same function, with the >>added benefit of kmalloc to reduce the amount of wasted memory. > > > Keep in mind we need to use an entire page of TCE space for each > allocation and TCE space is severely restricted by LPAR. Good point. I didn't think about that. > Check out the pci pool code in drivers/pci/pool.c, will that help > your problem? 
I think it will. Thanks -- Brian King eServer Storage I/O IBM Linux Technology Center ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From boutcher at us.ibm.com Thu Nov 6 13:59:39 2003 From: boutcher at us.ibm.com (David Boutcher) Date: Wed, 5 Nov 2003 20:59:39 -0600 Subject: [2.4 Patch] Fix to get all sysrq-T output on iSeries virtual console In-Reply-To: <3FA6BD8F.8000106@redhat.com> Message-ID: owner-linuxppc64-dev at lists.linuxppc.org wrote on 11/03/2003 02:41:51 PM: > When debugging iSeries problems, I've been frustrated sometimes when I > find I cannot get all of the sysrq-t output on my console. Usually data > for only about 5 or 6 tasks prints out and that is it. Yes, there is a trick where if you type CTRL-X CTRL-X on the console you can frequently see more output. > My final idea was to not service sysrq interrupts from within the > critical area of the interrupt service routine. Instead, I borrowed a > byte from an unused field within the paca and I cache the requested > sysrq key there. Then, on the way out of the ISR, this field is checked. > If it contains a non-zero character, handle_sysrq is called with the > character to service the request. > > I'm sure this patch is controversial and perhaps too much of a hack, but > it does get the job done in those critical situations where you > absolutely need the output of sysrq-t to debug a soft hang. I think this should be OK...one of the issues this handles is the case where there are other events (disk ops, etc.) that need to be handled. Trying to handle the acks inside just the sysreq code would have got you really messed up. One issue with the PACA is that it is architected by the underlying system, so you may get bitten by using an unused byte, either because it gets used one day, or because there is some undefined use :-) But you can probably get away with it. Dave Boutcher IBM Linux Technology Center ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From anton at samba.org Thu Nov 6 14:54:14 2003 From: anton at samba.org (Anton Blanchard) Date: Thu, 6 Nov 2003 14:54:14 +1100 Subject: [2.4 Patch] Fix to get all sysrq-T output on iSeries virtual console In-Reply-To: References: <3FA6BD8F.8000106@redhat.com> Message-ID: <20031106035414.GB6910@krispykreme> > One issue with the PACA is that it is architected by the underlying system, > so you may get bitten by using an unused byte, either because it gets used > one day, or because there is some undefined use :-) But you can probably > get away with it. Also in 2.6 it would be nice to move all fields in the paca that arent required by the system into per cpu data. Hopefully one day we get more intelligent and allocate per cpu data using node local memory. There are some things that cant go into per cpu data, mainly stuff that needs to be accessed in real mode (exception stack, rtas_args struct etc). It would be good to clearly differentiate between stuff in the paca that we need for the hypervisor vs stuff just there so we can get at it in real mode. Anton ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From benh at kernel.crashing.org Thu Nov 6 16:11:32 2003 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 06 Nov 2003 16:11:32 +1100 Subject: stack size limit on ppc/ppc64 In-Reply-To: <20031105022443.GG1843@krispykreme> References: <20031105022443.GG1843@krispykreme> Message-ID: <1068095491.692.188.camel@gaston> On Wed, 2003-11-05 at 13:24, Anton Blanchard wrote: > Hi Anil, > > > I have code which shouldn't work, working under ppc64 linux. > > Sounds like a feature to me :) > > > Theoretically, above code should cause segmentation violation. But on ppc64 > > linux, it puts variable 'p ' in stack segment, I looked at /proc/pid/maps, > > and there was huge memory area from 0x50000000 ---> 0xFFFFF000. > > I guess technically its a bug, Paul has fixed it in ppc32. 
I attached > the relevant parts of his fix (it comes from arch/ppc/kernel/fault.c) It is a bug and not for the reason you may think ;) I fixed it in ppc32 a while ago by simply preventing reads from growing the stack; paulus further improved my fix by doing the writes case with the insn checking. The reason why this "feature" can really bite you is things like sys_mount() which will blindly copy_from_user a whole page starting at the pointer you pass for the arguments, whatever the actual string length is. If that string you pass happens to be near the top of your highest non-stack VMA, the kernel will happily fault in the byte next to that limit within the copy_from_user and so grow down your stack all the way down to your heap, effectively preventing any further memory allocations. This has caused me plenty of random mount() failures, on embedded platforms typically, using small-sized mount() implementations or things like busybox. Ben. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From benh at kernel.crashing.org Thu Nov 6 16:13:01 2003 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 06 Nov 2003 16:13:01 +1100 Subject: stack size limit on ppc/ppc64 In-Reply-To: <20031105130553.GG2506@bubble.sa.bigpond.net.au> References: <20031105110859.GE2506@bubble.sa.bigpond.net.au> <20031105130553.GG2506@bubble.sa.bigpond.net.au> Message-ID: <1068095581.680.191.camel@gaston> On Thu, 2003-11-06 at 00:05, Alan Modra wrote: > On Wed, Nov 05, 2003 at 01:09:07PM +0100, Segher Boessenkool wrote: > > That is if you are assuming one of the current ABIs; > > Of course. I thought the idea was to distinguish stack frame allocation > from other wild pointer writes. > > > It's not the kernel's job to enforce an ABI. > > There are a lot of things that a binary file needs to get right > according to the ABI before it will run. Try running stuff from > /dev/mouse.
:) Heh, well ;) I think that Segher's comment may actually make sense with things like JITs, DR emulators etc... those may end up doing wild things of that kind... But we'll figure that out soon enough ;) Ben. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From lxiep at us.ibm.com Fri Nov 7 08:31:20 2003 From: lxiep at us.ibm.com (Linda Xie) Date: 06 Nov 2003 15:31:20 -0600 Subject: [PATCH] RPA PCI Hot Plug Controller Driver--rpaphp.patch Message-ID: <1068154287.1031.60.camel@ibm-yvb14cujqjl-udp14382399uds.austin.ibm.com> Hi All, Here is the patch again with some changes I made based on comments/suggestions from the last post. It is against 2.6.0-test9. Unless there is an objection, I am planning to send it to Mike Wolf by next Wednesday (to be included into Ameslab 2.6). Thanks, Linda -------------- next part -------------- A non-text attachment was scrubbed... Name: rpaphp.patch Type: text/x-patch Size: 34534 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031106/59d99e8e/attachment.bin From moilanen at austin.ibm.com Sat Nov 8 00:05:43 2003 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Fri, 07 Nov 2003 07:05:43 -0600 Subject: [PATCH] nvram buffering/error logging port to 2.6 In-Reply-To: References: Message-ID: <1068210343.21219.17.camel@tin.ibm.com > This is a port of the nvram buffering/error logging code from 2.4 to 2.6. I should also note that I included moving /proc/rtas to /proc/ppc64/rtas. It's a larger patch, but the 2.4 code has been well tested by the test teams. I would like to push this to ameslab as soon as possible, so please send comments. Thanks, Jake -------------- next part -------------- A non-text attachment was scrubbed...
Name: linux-2.6-nvram-errlog-1.patch.bz2 Type: application/x-bzip Size: 11941 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031107/67ccfb62/attachment.bin From leigh at solinno.co.uk Sat Nov 8 02:10:59 2003 From: leigh at solinno.co.uk (Leigh Brown) Date: Fri, 7 Nov 2003 15:10:59 -0000 (GMT) Subject: [PATCH] nvram buffering/error logging port to 2.6 In-Reply-To: <1068210343.21219.17.camel@tin.ibm.com > References: <1068210343.21219.17.camel@tin.ibm.com > Message-ID: <32864.80.7.99.14.1068217859.squirrel@www.solinno.co.uk> Jake Moilanen said: > This is a port of the nvram buffering/error logging code from 2.4 to > 2.6. I should also note that I included moving /proc/rtas to > /proc/ppc64/rtas. [...] When I asked about this last week nobody could come up with a sensible reason why rtas should not remain as /proc/rtas, for compatibility with ppc32. Regards, Leigh. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Sat Nov 8 03:36:13 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Fri, 7 Nov 2003 10:36:13 -0600 Subject: syscall table patch Message-ID: <20031107103613.A28940@forte.austin.ibm.com> Can someone apply the patch below, harmonizing the syscall table in misc.S with the #define syscalls in unistd.h ? Its a 'trivial' patch; what it really does is to make it easier for other non-mainstream kernel extensions to add new system calls 'cleanly' to include/asm/unistd.h and arch/ppc64/kernel/misc.S without making an ugly hash of things. Since several people commented about having a syscall table written in C, I also append a 'sample' implementation in C. There are two or three things to note about this: -- table initialization is now done at runtime, rather than at compile time. -- C compiler wants function prototypes to be really happy, and this patch doesn't provide them. Does anybody want them? 
The C-code patch is incomplete right now, its just for flavour, if the maintainers want such a thing & will include it, let me know & I can finish work on it. --linas p.s. the patch is against stock marcello-2.4.22 although I think it should apply cleanly to just about any recent kernel. -------------- next part -------------- Index: arch/ppc64/kernel/misc.S =================================================================== RCS file: /home/linas/cvsroot/linux24/arch/ppc64/kernel/misc.S,v retrieving revision 1.1.1.2 diff -u -p -u -p -r1.1.1.2 misc.S --- arch/ppc64/kernel/misc.S 23 Oct 2003 20:07:43 -0000 1.1.1.2 +++ arch/ppc64/kernel/misc.S 7 Nov 2003 16:14:01 -0000 @@ -593,6 +593,7 @@ _GLOBAL(arch_kernel_thread) #ifdef CONFIG_BINFMT_ELF32 /* Why isn't this a) automatic, b) written in 'C'? */ +/* because a) its not regular enough, b) C function protoypes needed */ .balign 8 _GLOBAL(sys_call_table32) .llong .sys_ni_syscall /* 0 - old "setup()" system call */ @@ -819,8 +820,35 @@ _GLOBAL(sys_call_table32) .llong .sys_fremovexattr /* 220 */ .llong .sys_futex #endif + .llong .sys_ni_syscall /* 208 */ + .llong .sys_ni_syscall + .llong .sys_ni_syscall /* 210 */ + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall /* 215 */ + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall /* 220 */ + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall /* 225 */ + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall /* 230 */ + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall .llong .sys_perfmonctl /* Put this here for now ... 
*/ - .rept NR_syscalls-222 + .rept NR_syscalls-235 .llong .sys_ni_syscall .endr #endif @@ -1050,7 +1078,34 @@ _GLOBAL(sys_call_table) .llong .sys_fremovexattr /* 220 */ .llong .sys_futex #endif + .llong .sys_ni_syscall /* 208 */ + .llong .sys_ni_syscall + .llong .sys_ni_syscall /* 210 */ + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall /* 215 */ + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall /* 220 */ + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall /* 225 */ + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall /* 230 */ + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall + .llong .sys_ni_syscall .llong .sys_perfmonctl /* Put this here for now ... */ - .rept NR_syscalls-222 + .rept NR_syscalls-235 .llong .sys_ni_syscall .endr -------------- next part -------------- /* * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. 
*/ #include void * sys_call_table[x_NR_num_syscalls]; #ifdef CONFIG_BINFMT_ELF32 void * sys_call_table32[x_NR_num_syscalls]; #endif /* System Call Table Entry */ #ifdef CONFIG_BINFMT_ELF32 #define SCTE(name,handler32,handler64) \ extern void handler32(void); \ extern void handler64(void); \ sys_call_table32[__NR_##name] = handler32; \ sys_call_table [__NR_##name] = handler64; #else #define SCTE(name,handler32,handler64) \ extern void handler64(void); \ sys_call_table [__NR_##name] = handler64; #endif void setup_syscall_table (void) { #define __NR_noop 0 SCTE(noop, sys_ni_syscall, sys_ni_syscall) SCTE(exit, sys32_exit, sys_exit) SCTE(fork, sys32_fork, sys_fork) SCTE(read, sys_read, sys_read) SCTE(write, sys_write, sys_write) SCTE(open, sys32_open, sys_open) SCTE(close, sys_close, sys_close) SCTE(waitpid, sys32_waitpid, sys_waitpid) SCTE(creat, sys32_creat, sys_creat) SCTE(link, sys_link, sys_link) SCTE(unlink, sys_unlink, sys_unlink) SCTE(execve, sys32_execve, sys_execve) SCTE(chdir, sys_chdir, sys_chdir) SCTE(time, sys32_time, sys64_time) SCTE(mknod, sys_mknod, sys_mknod) SCTE(chmod, sys_chmod, sys_chmod) SCTE(lchown, sys_lchown, sys_lchown) SCTE(break, sys_ni_syscall, sys_ni_syscall) SCTE(oldstat, sys32_stat, sys_stat) SCTE(lseek, sys32_lseek, sys_lseek) SCTE(getpid, sys_getpid , sys_getpid) SCTE(mount, sys32_mount, sys_mount) SCTE(umount, sys_oldumount, sys_ni_syscall) SCTE(setuid, sys_setuid, sys_setuid) SCTE(getuid, sys_getuid, sys_getuid) SCTE(stime, ppc64_sys32_stime, ppc64_sys_stime) SCTE(ptrace, sys32_ptrace, sys_ptrace) SCTE(alarm, sys_alarm, sys_alarm) SCTE(oldfstat, sys32_fstat, sys_fstat) SCTE(pause, sys32_pause, sys_pause) SCTE(utime, sys32_utime, sys_utime) SCTE(stty, sys_ni_syscall, sys_ni_syscall) SCTE(gtty, sys_ni_syscall, sys_ni_syscall) SCTE(access, sys32_access, sys_access) SCTE(nice, sys32_nice, sys_nice) SCTE(ftime, sys_ni_syscall, sys_ni_syscall) SCTE(sync, sys_sync, sys_sync) SCTE(kill, sys32_kill, sys_kill) SCTE(rename, sys_rename,
sys_rename) SCTE(mkdir, sys32_mkdir, sys_mkdir) SCTE(rmdir, sys_rmdir, sys_rmdir) SCTE(dup, sys_dup, sys_dup) SCTE(pipe, sys_pipe, sys_pipe) SCTE(times, sys32_times, sys_times) SCTE(prof, sys_ni_syscall, sys_ni_syscall) SCTE(brk, sys_brk, sys_brk) SCTE(setgid, sys_setgid, sys_setgid) SCTE(getgid, sys_getgid, sys_getgid) SCTE(signal, sys_signal, sys_signal) SCTE(geteuid, sys_geteuid, sys_geteuid) SCTE(getegid, sys_getegid, sys_getegid) SCTE(acct, sys_acct, sys_acct) SCTE(umount2, sys32_umount, sys_umount) SCTE(lock, sys_ni_syscall, sys_ni_syscall) SCTE(ioctl, sys32_ioctl, sys_ioctl) SCTE(fcntl, sys32_fcntl, sys_fcntl) SCTE(mpx, sys_ni_syscall, sys_ni_syscall) SCTE(setpgid, sys32_setpgid, sys_setpgid) SCTE(ulimit, sys_ni_syscall, sys_ni_syscall) SCTE(oldolduname, sys_olduname, sys_ni_syscall) SCTE(umask, sys32_umask, sys_umask) SCTE(chroot, sys_chroot, sys_chroot) SCTE(ustat, sys_ustat, sys_ustat) SCTE(dup2, sys_dup2, sys_dup2) SCTE(getppid, sys_getppid, sys_getppid) SCTE(getpgrp, sys_getpgrp, sys_getpgrp) SCTE(setsid, sys_setsid, sys_setsid) SCTE(sigaction, sys32_sigaction, sys_sigaction) SCTE(sgetmask, sys_sgetmask, sys_sgetmask) SCTE(ssetmask, sys32_ssetmask, sys_ssetmask) SCTE(setreuid, sys_setreuid, sys_setreuid) SCTE(setregid, sys_setregid, sys_setregid) SCTE(sigsuspend, sys_sigsuspend, sys_sigsuspend) SCTE(sigpending, sys32_sigpending, sys_sigpending) SCTE(sethostname, sys32_sethostname, sys_sethostname) SCTE(setrlimit, sys32_setrlimit, sys_setrlimit) SCTE(getrlimit, sys32_old_getrlimit, sys_ni_syscall) SCTE(getrusage, sys32_getrusage, sys_getrusage) SCTE(gettimeofday, sys32_gettimeofday, sys_gettimeofday) SCTE(settimeofday, sys32_settimeofday, sys_settimeofday) SCTE(getgroups, sys32_getgroups, sys_getgroups) SCTE(setgroups, sys32_setgroups, sys_setgroups) SCTE(select, sys_ni_syscall, sys_ni_syscall) SCTE(symlink, sys_symlink, sys_symlink) SCTE(oldlstat, sys32_lstat, sys_lstat) SCTE(readlink, sys32_readlink, sys_readlink) SCTE(uselib, sys_uselib, sys_uselib) 
SCTE(swapon, sys32_swapon, sys_swapon) SCTE(reboot, sys32_reboot, sys_reboot) SCTE(readdir, old32_readdir, sys_ni_syscall) SCTE(mmap, sys32_mmap, sys_mmap) SCTE(munmap, sys_munmap, sys_munmap) SCTE(truncate, sys_truncate, sys_truncate) SCTE(ftruncate, sys_ftruncate, sys_ftruncate) SCTE(fchmod, sys_fchmod, sys_fchmod) SCTE(fchown, sys_fchown, sys_fchown) SCTE(getpriority, sys32_getpriority, sys_getpriority) SCTE(setpriority, sys32_setpriority, sys_setpriority) SCTE(profil, sys_ni_syscall, sys_ni_syscall) SCTE(statfs, sys32_statfs, sys_statfs) SCTE(fstatfs, sys32_fstatfs, sys_fstatfs) SCTE(ioperm, sys_ioperm, sys_ioperm) SCTE(socketcall, sys32_socketcall, sys_socketcall) SCTE(syslog, sys32_syslog, sys_syslog) SCTE(setitimer, sys32_setitimer, sys_setitimer) SCTE(getitimer, sys32_getitimer, sys_getitimer) SCTE(stat, sys32_newstat, sys_newstat) SCTE(lstat, sys32_newlstat, sys_newlstat) SCTE(fstat, sys32_newfstat, sys_newfstat) SCTE(olduname, sys_uname, sys_uname) SCTE(iopl, sys_ni_syscall, sys_ni_syscall) SCTE(vhangup, sys_vhangup, sys_vhangup) SCTE(idle, sys_ni_syscall, sys_ni_syscall) SCTE(vm86, sys_ni_syscall, sys_ni_syscall) SCTE(wait4, sys32_wait4, sys_wait4) SCTE(swapoff, sys_swapoff, sys_swapoff) SCTE(sysinfo, sys32_sysinfo, sys_sysinfo) SCTE(ipc, sys32_ipc, sys_ipc) SCTE(fsync, sys_fsync, sys_fsync) SCTE(sigreturn, ppc32_sigreturn, ppc64_sigreturn) SCTE(clone, sys32_clone, sys_clone) SCTE(setdomainname, sys32_setdomainname, sys_setdomainname) SCTE(uname, ppc64_newuname, ppc64_newuname) SCTE(modify_ldt, sys_ni_syscall, sys_ni_syscall) SCTE(adjtimex, sys32_adjtimex, sys_adjtimex) SCTE(mprotect, sys_mprotect, sys_mprotect) SCTE(sigprocmask, sys32_sigprocmask, sys_sigprocmask) SCTE(create_module, sys32_create_module, sys_create_module) SCTE(init_module, sys32_init_module, sys_init_module) SCTE(delete_module, sys32_delete_module, sys_delete_module) SCTE(get_kernel_syms, sys32_get_kernel_syms, sys_get_kernel_syms) SCTE(quotactl, sys32_quotactl, sys_quotactl) 
SCTE(getpgid, sys32_getpgid, sys_getpgid) SCTE(fchdir, sys_fchdir, sys_fchdir) SCTE(bdflush, sys32_bdflush, sys_bdflush) SCTE(sysfs, sys32_sysfs, sys_sysfs) SCTE(personality, sys32_personality, sys_personality) SCTE(afs_syscall, sys_ni_syscall, sys_ni_syscall) SCTE(setfsuid, sys_setfsuid, sys_setfsuid) SCTE(setfsgid, sys_setfsgid, sys_setfsgid) SCTE(_llseek, sys_llseek, sys_llseek) SCTE(getdents, sys32_getdents, sys_getdents) SCTE(_newselect, ppc32_select, sys_select) SCTE(flock, sys_flock, sys_flock) SCTE(msync, sys32_msync, sys_msync) SCTE(readv, sys32_readv, sys_readv) SCTE(writev, sys32_writev, sys_writev) SCTE(getsid, sys32_getsid, sys_getsid) SCTE(fdatasync, sys_fdatasync, sys_fdatasync) SCTE(_sysctl, sys32_sysctl, sys_sysctl) SCTE(mlock, sys_mlock, sys_mlock) SCTE(munlock, sys_munlock, sys_munlock) SCTE(mlockall, sys32_mlockall, sys_mlockall) SCTE(munlockall, sys_munlockall, sys_munlockall) SCTE(sched_setparam, sys32_sched_setparam, sys_sched_setparam) SCTE(sched_getparam, sys32_sched_getparam, sys_sched_getparam) SCTE(sched_setscheduler, sys32_sched_setscheduler, sys_sched_setscheduler) SCTE(sched_getscheduler, sys32_sched_getscheduler, sys_sched_getscheduler) SCTE(sched_yield, sys_sched_yield, sys_sched_yield) SCTE(sched_get_priority_max, sys32_sched_get_priority_max, sys_sched_get_priority_max) SCTE(sched_get_priority_min, sys32_sched_get_priority_min, sys_sched_get_priority_min) SCTE(sched_rr_get_interval, sys32_sched_rr_get_interval, sys_sched_rr_get_interval) SCTE(nanosleep, sys32_nanosleep, sys_nanosleep) SCTE(mremap, sys32_mremap, sys_mremap) SCTE(setresuid, sys_setresuid, sys_setresuid) SCTE(getresuid, sys_getresuid, sys_getresuid) SCTE(query_module, sys32_query_module, sys_query_module) SCTE(poll, sys_poll, sys_poll) SCTE(nfsservctl, sys32_nfsservctl, sys_nfsservctl) SCTE(setresgid, sys_setresgid, sys_setresgid) SCTE(getresgid, sys_getresgid, sys_getresgid) SCTE(prctl, sys32_prctl, sys_prctl) SCTE(rt_sigreturn, ppc32_rt_sigreturn, 
ppc64_rt_sigreturn) SCTE(rt_sigaction, sys32_rt_sigaction, sys_rt_sigaction) SCTE(rt_sigprocmask, sys32_rt_sigprocmask, sys_rt_sigprocmask) SCTE(rt_sigpending, sys32_rt_sigpending, sys_rt_sigpending) SCTE(rt_sigtimedwait, sys32_rt_sigtimedwait, sys_rt_sigtimedwait) SCTE(rt_sigqueueinfo, sys32_rt_sigqueueinfo, sys_rt_sigqueueinfo) SCTE(rt_sigsuspend, sys32_rt_sigsuspend, sys_rt_sigsuspend) SCTE(pread, sys32_pread, sys_pread) SCTE(pwrite, sys32_pwrite, sys_pwrite) SCTE(chown, sys_chown, sys_chown) SCTE(getcwd, sys_getcwd, sys_getcwd) SCTE(capget, sys_capget, sys_capget) SCTE(capset, sys_capset, sys_capset) SCTE(sigaltstack, sys32_sigaltstack, sys_sigaltstack) SCTE(sendfile, sys32_sendfile, sys_sendfile) SCTE(getpmsg, sys_ni_syscall, sys_ni_syscall) SCTE(putpmsg, sys_ni_syscall, sys_ni_syscall) SCTE(vfork, sys32_vfork, sys_vfork) SCTE(ugetrlimit, sys32_getrlimit, sys_getrlimit) SCTE(readahead, sys32_readahead, sys_readahead) SCTE(mmap2, ppc32_mmap2, sys_ni_syscall) SCTE(truncate64, sys32_truncate64, sys_ni_syscall) SCTE(ftruncate64, sys32_ftruncate64, sys_ni_syscall) SCTE(stat64, sys_stat64, sys_ni_syscall) SCTE(lstat64, sys_lstat64, sys_ni_syscall) SCTE(fstat64, sys_fstat64, sys_ni_syscall) SCTE(pciconfig_read, sys32_pciconfig_read, sys_pciconfig_read) SCTE(pciconfig_write, sys32_pciconfig_write, sys_pciconfig_write) SCTE(pciconfig_iobase, sys_pciconfig_iobase, sys_pciconfig_iobase) SCTE(multiplexer, sys_ni_syscall, sys_ni_syscall) SCTE(getdents64, sys_getdents64, sys_getdents64) SCTE(pivot_root, sys_pivot_root, sys_pivot_root) SCTE(fcntl64, sys32_fcntl64, sys_ni_syscall) SCTE(madvise, sys_madvise, sys_madvise) SCTE(mincore, sys_mincore, sys_mincore) SCTE(gettid, sys_gettid, sys_gettid) SCTE(tkill, sys_ni_syscall, sys_ni_syscall) SCTE(setxattr, sys_ni_syscall, sys_ni_syscall) SCTE(lsetxattr, sys_ni_syscall, sys_ni_syscall) SCTE(fsetxattr, sys_ni_syscall, sys_ni_syscall) SCTE(getxattr, sys_ni_syscall, sys_ni_syscall) SCTE(lgetxattr, sys_ni_syscall, sys_ni_syscall) 
SCTE(fgetxattr, sys_ni_syscall, sys_ni_syscall) SCTE(listxattr, sys_ni_syscall, sys_ni_syscall) SCTE(llistxattr, sys_ni_syscall, sys_ni_syscall) SCTE(flistxattr, sys_ni_syscall, sys_ni_syscall) SCTE(removexattr, sys_ni_syscall, sys_ni_syscall) SCTE(lremovexattr, sys_ni_syscall, sys_ni_syscall) SCTE(fremovexattr, sys_ni_syscall, sys_ni_syscall) SCTE(futex, sys_ni_syscall, sys_ni_syscall) SCTE(sched_setaffinity, sys_ni_syscall, sys_ni_syscall) SCTE(sched_getaffinity, sys_ni_syscall, sys_ni_syscall) SCTE(security, sys_ni_syscall, sys_ni_syscall) SCTE(tuxcall, sys_ni_syscall, sys_ni_syscall) SCTE(sendfile64, sys_ni_syscall, sys_ni_syscall) SCTE(io_setup, sys_ni_syscall, sys_ni_syscall) SCTE(io_destroy, sys_ni_syscall, sys_ni_syscall) SCTE(io_getevents, sys_ni_syscall, sys_ni_syscall) SCTE(io_submit, sys_ni_syscall, sys_ni_syscall) SCTE(io_cancel, sys_ni_syscall, sys_ni_syscall) SCTE(alloc_hugepages, sys_ni_syscall, sys_ni_syscall) SCTE(free_hugepages, sys_ni_syscall, sys_ni_syscall) SCTE(exit_group, sys_ni_syscall, sys_ni_syscall) SCTE(perfmonctl, sys_perfmonctl, sys_perfmonctl) } From hollisb at us.ibm.com Sat Nov 8 03:50:39 2003 From: hollisb at us.ibm.com (Hollis Blanchard) Date: Fri, 7 Nov 2003 10:50:39 -0600 Subject: syscall table patch In-Reply-To: <20031107103613.A28940@forte.austin.ibm.com> Message-ID: <844F4B02-1142-11D8-A79D-000A95A0560C@us.ibm.com> On Friday, Nov 7, 2003, at 10:36 US/Central, linas at austin.ibm.com wrote: > > -- table initialization is now done at runtime, rather than at > compile time. Why? -- Hollis Blanchard IBM Linux Technology Center ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From moilanen at austin.ibm.com Sat Nov 8 04:27:22 2003 From: moilanen at austin.ibm.com (moilanen at austin.ibm.com) Date: Fri, 7 Nov 2003 11:27:22 -0600 (CST) Subject: [PATCH] nvram buffering/error loggin port to 2.6 Message-ID: I'm sorry if you receive this email more then once. I've been having email issues the past couple days. 
This is a port of the nvram buffering/error logging code from 2.4 to 2.6. I should also note that it includes moving /proc/rtas to /proc/ppc64/rtas. It's a large patch, but the 2.4 code has been well tested by the test teams. I would like to push this to ameslab as soon as possible, so please send comments. Thanks, Jake -------------- next part -------------- A non-text attachment was scrubbed... Name: linux-2.6-nvram-errlog-1.patch.gz Type: application/octet-stream Size: 12955 bytes Desc: Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031107/7c757f48/attachment.obj From linas at austin.ibm.com Sat Nov 8 05:48:17 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Fri, 7 Nov 2003 12:48:17 -0600 Subject: syscall table patch In-Reply-To: <844F4B02-1142-11D8-A79D-000A95A0560C@us.ibm.com>; from hollisb@us.ibm.com on Fri, Nov 07, 2003 at 10:50:39AM -0600 References: <20031107103613.A28940@forte.austin.ibm.com> <844F4B02-1142-11D8-A79D-000A95A0560C@us.ibm.com> Message-ID: <20031107124817.A29028@forte.austin.ibm.com> On Fri, Nov 07, 2003 at 10:50:39AM -0600, Hollis Blanchard wrote: > On Friday, Nov 7, 2003, at 10:36 US/Central, linas at austin.ibm.com wrote: > > > > -- table initialization is now done at runtime, rather than at > > compile time. > > Why? two reasons: First reason: Because I wanted to define a macro to initialize the 32-bit and 64-bit tables at the same time, and couldn't figure out how to do this at compile time. #define SCTE(slot,handler32,handler64) ... SCTE(execve, sys32_execve, sys_execve) To do a compile time table, one needs to assemble something like (void (*)()) sys_call_table[NR_syscalls] = { ..., sys_execve, ... } and so one can't initialize 32 and 64 bit tables with one macro. Second reason: If one does a simple C array such as the following: (void (*)()) sys_call_table[NR_syscalls] = { ..., sys_execve, ... } Then its real easy to make off-by-one errors by accidentally misplacing an initializer. 
This makes it no better (and in some ways worse) than the current assembler implementation (which is just fine by me, except for the occasional off-by-one error that one stupidly commits.) The only way to avoid this in C starts getting complicated: struct { ... int (*execve_entry) (char *, char **, char **); ... } syscall_table = { ... execve_entry: sys32_execve, ... }; and even that is prone to misalignment with the __NR_execve value stored in unistd.h. Which is why the safest thing seems to be a runtime syscall_table[__NR_execve] = sys32_execve; So yuck, too long a reply ... --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From moilanen at austin.ibm.com Sat Nov 8 06:02:54 2003 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Fri, 07 Nov 2003 13:02:54 -0600 Subject: [PATCH] nvram buffering/error logging port to 2.6 In-Reply-To: <32864.80.7.99.14.1068217859.squirrel@www.solinno.co.uk> References: <1068210343.21219.17.camel@tin.ibm.com > <32864.80.7.99.14.1068217859.squirrel@www.solinno.co.uk> Message-ID: <1068231774.1314.22.camel@tin.ibm.com > > When I asked about this last week nobody could come up with a sensible > reason why rtas should not remain as /proc/rtas, for compatibility with > ppc32. > As Dave Engebretsen stated, it was originally done to "be consistent with where the rest of the ppc64 /proc interfaces have been put". Many of the ppc64-utils are currently expecting these interfaces in /proc/ppc64/rtas. It would be nice not to break utilities when a customer or user moves up to 2.6. I have not looked, but what are the utilities in ppc32 that use /proc/rtas? I personally would rather stay compatible between releases than architectures. Thanks, Jake ** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From hollisb at us.ibm.com Sat Nov 8 06:06:18 2003
From: hollisb at us.ibm.com (Hollis Blanchard)
Date: Fri, 7 Nov 2003 13:06:18 -0600
Subject: syscall table patch
In-Reply-To: <20031107124817.A29028@forte.austin.ibm.com>
Message-ID: <7782ECF8-1155-11D8-A79D-000A95A0560C@us.ibm.com>

On Friday, Nov 7, 2003, at 12:48 US/Central, linas at austin.ibm.com wrote:
>
> First reason:
> Because I wanted to define a macro to initialize the 32-bit and 64-bit
> tables at the same time, and couldn't figure out how to do this at
> compile time.
[snip]
> and so one can't initialize 32 and 64 bit tables with one macro.

I don't see why that's important. They are different tables, after
all...

> Second reason:
> If one does a simple C array such as the following:
> void (*sys_call_table[NR_syscalls])() = { ..., sys_execve, ... };
>
> Then it's real easy to make off-by-one errors by accidentally
> misplacing an initializer.
[snip]

This may be a C99 thing, but:

#include <stdio.h>
int main(void) {
        int array[] = { [12] = 1, };
        printf("%i\n", array[0]);
        printf("%i\n", array[12]);
        return 0;
}

--
Hollis Blanchard
IBM Linux Technology Center

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From hollisb at us.ibm.com Sat Nov 8 06:10:25 2003
From: hollisb at us.ibm.com (Hollis Blanchard)
Date: Fri, 7 Nov 2003 13:10:25 -0600
Subject: [PATCH] nvram buffering/error logging port to 2.6
In-Reply-To: <1068231774.1314.22.camel@tin.ibm.com >
Message-ID: <0B0CFE56-1156-11D8-A79D-000A95A0560C@us.ibm.com>

On Friday, Nov 7, 2003, at 13:02 US/Central, Jake Moilanen wrote:
>
>> When I asked about this last week nobody could come up with a sensible
>> reason why rtas should not remain as /proc/rtas, for compatibility
>> with ppc32.
>
> As Dave Engebretsen stated it was originally done to "be consistent
> with where the rest of the ppc64 /proc interfaces have been put".
>
> Many of the ppc64-utils are currently expecting these interfaces in
> /proc/ppc64/rtas.
> It would be nice not to break utilities when a
> customer or user moves up to 2.6.

I remember someone suggesting a symlink...

> I have not looked, but what are the utilities in ppc32 that use
> /proc/rtas?

You mean the ones we're writing won't work on ppc32 RS/6000? Only
slightly joking...

> I personally would rather stay compatible between releases than
> architectures.

Let's do both with a symlink. Architectures because it's "right", and
releases because we need to cover not-quite-right decisions in the
past. ;)

--
Hollis Blanchard
IBM Linux Technology Center

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From linas at austin.ibm.com Sat Nov 8 06:14:00 2003
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Fri, 7 Nov 2003 13:14:00 -0600
Subject: syscall table patch
In-Reply-To: <7782ECF8-1155-11D8-A79D-000A95A0560C@us.ibm.com>; from hollisb@us.ibm.com on Fri, Nov 07, 2003 at 01:06:18PM -0600
References: <20031107124817.A29028@forte.austin.ibm.com> <7782ECF8-1155-11D8-A79D-000A95A0560C@us.ibm.com>
Message-ID: <20031107131400.C29030@forte.austin.ibm.com>

On Fri, Nov 07, 2003 at 01:06:18PM -0600, Hollis Blanchard wrote:
> On Friday, Nov 7, 2003, at 12:48 US/Central, linas at austin.ibm.com wrote:
> >
> > First reason:
> > Because I wanted to define a macro to initialize the 32-bit and 64-bit
> > tables at the same time, and couldn't figure out how to do this at
> > compile time.
> [snip]
> > and so one can't initialize 32 and 64 bit tables with one macro.
>
> I don't see why that's important. They are different tables, after
> all...

Hmm, well, I thought it might be handy to be able to read the
two entries, side-by-side.

> This may be a C99 thing, but:
>
> #include <stdio.h>
> int main(void) {
>         int array[] = { [12] = 1, };
>         printf("%i\n", array[0]);
>         printf("%i\n", array[12]);
>         return 0;
> }

Ohh, good one, I didn't know that one.

--linas

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From moilanen at austin.ibm.com Sat Nov 8 06:28:55 2003
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Fri, 07 Nov 2003 13:28:55 -0600
Subject: [PATCH] nvram buffering/error logging port to 2.6
In-Reply-To: <0B0CFE56-1156-11D8-A79D-000A95A0560C@us.ibm.com>
References: <0B0CFE56-1156-11D8-A79D-000A95A0560C@us.ibm.com>
Message-ID: <1068233335.1319.38.camel@tin.ibm.com >

> > Many of the ppc64-utils are currently expecting these interfaces in
> > /proc/ppc64/rtas.  It would be nice not to break utilities when a
> > customer or user moves up to 2.6.
>
> I remember someone suggesting a symlink...

> Let's do both with a symlink. Architectures because it's "right", and
> releases because we need to cover not-quite-right decisions in the
> past. ;)

I think you're right.  Nathan had a good idea to do the symlink.  I will
add this into the patch.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From anton at samba.org Sat Nov 8 07:27:36 2003
From: anton at samba.org (Anton Blanchard)
Date: Sat, 8 Nov 2003 07:27:36 +1100
Subject: [PATCH] nvram buffering/error logging port to 2.6
In-Reply-To: <1068231774.1314.22.camel@tin.ibm.com >
References: <1068210343.21219.17.camel@tin.ibm.com> <32864.80.7.99.14.1068217859.squirrel@www.solinno.co.uk> <1068231774.1314.22.camel@tin.ibm.com >
Message-ID: <20031107202736.GA3440@krispykreme>

> I personally would rather stay compatible between releases than
> architectures.

Having said that, maintaining compatibility between ppc32 and ppc64 _is_
important.

I got a message from one of the java guys that they will be using
/proc/cpuinfo not /proc/ppc64/systemcfg for cpu probing. They need
compatibility between ppc32 and ppc64 and writing two versions of cpu
probing etc doesn't make sense.

Anton

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From hollisb at us.ibm.com Sat Nov 8 08:35:31 2003
From: hollisb at us.ibm.com (Hollis Blanchard)
Date: Fri, 7 Nov 2003 15:35:31 -0600
Subject: [PATCH] nvram buffering/error logging port to 2.6
In-Reply-To: <20031107202736.GA3440@krispykreme>
Message-ID: <4FD0A59F-116A-11D8-A79D-000A95A0560C@us.ibm.com>

On Friday, Nov 7, 2003, at 14:27 US/Central, Anton Blanchard wrote:
>
> Having said that, maintaining compatibility between ppc32 and ppc64
> _is_ important.

Yeah, I think that's a very cool thing about our architecture, both
from a technology and an end-user point of view.

> I got a message from one of the java guys that they will be using
> /proc/cpuinfo not /proc/ppc64/systemcfg for cpu probing. They need
> compatibility between ppc32 and ppc64 and writing two versions of cpu
> probing etc doesn't make sense.

Good for them! I'm pleasantly surprised to hear it. :) Sometimes it's
very easy for us to focus only on ppc64, when only trivial
modifications would extend the functionality of ppc32 as well.

--
Hollis Blanchard
IBM Linux Technology Center

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From roland at topspin.com Sat Nov 8 10:34:41 2003
From: roland at topspin.com (Roland Dreier)
Date: 07 Nov 2003 15:34:41 -0800
Subject: syscall table patch
In-Reply-To: <7782ECF8-1155-11D8-A79D-000A95A0560C@us.ibm.com>
References: <7782ECF8-1155-11D8-A79D-000A95A0560C@us.ibm.com>
Message-ID: <52vfpvoir2.fsf@topspin.com>

    Hollis> This may be a C99 thing, but:

    Hollis> #include <stdio.h>
    Hollis> int main(void) {
    Hollis>         int array[] = { [12] = 1, };
    Hollis>         printf("%i\n", array[0]);
    Hollis>         printf("%i\n", array[12]);
    Hollis>         return 0;
    Hollis> }

I'm not sure which version of the C spec added the "[XXX] = YYY" style
of array initializer, but it is supported by gcc versions at least as
far back as 2.95.  So it is definitely OK to use in kernel code.

 - Roland

** Sent via the linuxppc64-dev mail list.
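
A minimal sketch of how the "[XXX] = YYY" initializer discussed above
might be applied to a table of function pointers rather than ints.
Designated initializers of this form are part of C99. Everything named
demo_* or NR_DEMO_* is a hypothetical stand-in, not the kernel's
handlers and not the __NR_* values from unistd.h. Slots that are not
named are zero-initialized, so the dispatcher here treats NULL as "not
implemented".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical syscall numbers standing in for the __NR_* values. */
enum {
    NR_DEMO_EXIT     = 1,
    NR_DEMO_READ     = 3,
    NR_DEMO_EXECVE   = 11,
    NR_DEMO_SYSCALLS = 16
};

typedef long (*demo_syscall_fn)(void);

/* Hypothetical handlers; each returns its own number so a slot is easy to verify. */
static long demo_exit(void)   { return NR_DEMO_EXIT; }
static long demo_read(void)   { return NR_DEMO_READ; }
static long demo_execve(void) { return NR_DEMO_EXECVE; }

/*
 * Each handler is tied to its number by the designator, so the order of
 * the entries does not matter and an entry cannot slide by one slot.
 * Slots that are not named are zero-initialized (NULL).
 */
static const demo_syscall_fn demo_call_table[NR_DEMO_SYSCALLS] = {
    [NR_DEMO_EXECVE] = demo_execve,
    [NR_DEMO_EXIT]   = demo_exit,
    [NR_DEMO_READ]   = demo_read,
};

/* Dispatch by number; empty or out-of-range slots report -ENOSYS. */
long demo_dispatch(unsigned int nr)
{
    if (nr >= NR_DEMO_SYSCALLS || demo_call_table[nr] == NULL)
        return -ENOSYS;
    return demo_call_table[nr]();
}

/* Self-check: every named slot holds the handler its number names. */
void demo_selfcheck(void)
{
    assert(demo_dispatch(NR_DEMO_EXIT) == NR_DEMO_EXIT);
    assert(demo_dispatch(NR_DEMO_READ) == NR_DEMO_READ);
    assert(demo_dispatch(NR_DEMO_EXECVE) == NR_DEMO_EXECVE);
}
```

Because the table is built at compile time, this form keeps the
initialization out of the runtime path while still removing the
off-by-one risk of a purely positional list.
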
See http://lists.linuxppc.org/ From anton at samba.org Sat Nov 8 11:23:30 2003 From: anton at samba.org (Anton Blanchard) Date: Sat, 8 Nov 2003 11:23:30 +1100 Subject: 2.6 POWER3/POWER4 compilation fix Message-ID: <20031108002330.GC3440@krispykreme> Hi, Heres a quick fix I threw together from patches from Hollis and Segher. I think this is worthy of sending to Linus now, considering all the build problems we've been having lately. Thoughts? Anton ===== arch/ppc64/Kconfig 1.31 vs edited ===== --- 1.31/arch/ppc64/Kconfig Fri Sep 26 14:04:08 2003 +++ edited/arch/ppc64/Kconfig Fri Nov 7 05:29:29 2003 @@ -72,6 +72,14 @@ bool default y +config POWER4_ONLY + bool "Optimize for POWER4" + default n + ---help--- + Cause the compiler to optimize for POWER4 processors. The resulting + binary will not work on POWER3 or RS64 processors when compiled with + binutils 2.15 or later. + config SMP bool "Symmetric multi-processing support" ---help--- ===== arch/ppc64/Makefile 1.34 vs edited ===== --- 1.34/arch/ppc64/Makefile Mon Oct 20 14:32:12 2003 +++ edited/arch/ppc64/Makefile Fri Nov 7 05:29:29 2003 @@ -17,8 +17,13 @@ LDFLAGS := -m elf64ppc LDFLAGS_vmlinux := -Bstatic -e $(KERNELLOAD) -Ttext $(KERNELLOAD) -CFLAGS += -msoft-float -pipe -Wno-uninitialized -mminimal-toc \ - -mcpu=power4 +CFLAGS += -msoft-float -pipe -Wno-uninitialized -mminimal-toc + +ifeq ($(CONFIG_POWER4_ONLY),y) +CFLAGS += -mcpu=power4 +else +CFLAGS += -mtune=power4 +endif have_zero_bss := $(shell if $(CC) -fno-zero-initialized-in-bss -S -o /dev/null -xc /dev/null > /dev/null 2>&1; then echo y; else echo n; fi) ===== arch/ppc64/kernel/pSeries_htab.c 1.9 vs edited ===== --- 1.9/arch/ppc64/kernel/pSeries_htab.c Sat Jun 7 11:19:27 2003 +++ edited/arch/ppc64/kernel/pSeries_htab.c Fri Nov 7 06:14:29 2003 @@ -350,12 +350,8 @@ if ((cur_cpu_spec->cpu_features & CPU_FTR_TLBIEL) && !large && local) { asm volatile("ptesync":::"memory"); - for (i = 0; i < j; i++) { - asm volatile("\n\ - clrldi %0,%0,16\n\ - tlbiel %0" - : : 
"r" (batch->vaddr[i]) : "memory" ); - } + for (i = 0; i < j; i++) + __tlbiel(batch->vaddr[i]); asm volatile("ptesync":::"memory"); } else { @@ -364,12 +360,8 @@ asm volatile("ptesync":::"memory"); - for (i = 0; i < j; i++) { - asm volatile("\n\ - clrldi %0,%0,16\n\ - tlbie %0" - : : "r" (batch->vaddr[i]) : "memory" ); - } + for (i = 0; i < j; i++) + __tlbie(batch->vaddr[i]); asm volatile("eieio; tlbsync; ptesync":::"memory"); ===== include/asm-ppc64/mmu.h 1.8 vs edited ===== --- 1.8/include/asm-ppc64/mmu.h Sun Sep 7 11:24:09 2003 +++ edited/include/asm-ppc64/mmu.h Fri Nov 7 06:24:26 2003 @@ -202,26 +202,41 @@ return (vsid & 0x7fffffffff) ^ page; } -static inline void _tlbie(unsigned long va, int large) +static inline void __tlbie(unsigned long va, int large) { - asm volatile("ptesync": : :"memory"); + /* clear top 16 bits, non SLS segment */ + va &= ~(0xffffULL << 48); - if (large) { - asm volatile("clrldi %0,%0,16\n\ - tlbie %0,1" : : "r"(va) : "memory"); - } else { - asm volatile("clrldi %0,%0,16\n\ - tlbie %0,0" : : "r"(va) : "memory"); - } + if (large) + asm volatile("tlbie %0,1" : : "r"(va) : "memory"); + else + asm volatile("tlbie %0,0" : : "r"(va) : "memory"); +} +static inline void tlbie(unsigned long va, int large) +{ + asm volatile("ptesync": : :"memory"); + __tlbie(va, large); asm volatile("eieio; tlbsync; ptesync": : :"memory"); } -static inline void _tlbiel(unsigned long va) +static inline void __tlbiel(unsigned long va) +{ + /* clear top 16 bits, non SLS segment */ + va &= ~(0xffffULL << 48); + + /* one day Alan Modra will give us a way to do this cleanly :) */ +#ifdef WAITING_FOR_ALANM + asm volatile("tlbiel %0" : : "r"(va) : "memory"); +#else + asm volatile(".long 0x7c000224 | (%0 << 11)" : : "r"(va) : "memory"); +#endif +} + +static inline void tlbiel(unsigned long va) { asm volatile("ptesync": : :"memory"); - asm volatile("clrldi %0,%0,16\n\ - tlbiel %0" : : "r"(va) : "memory"); + __tlbiel(va); asm volatile("ptesync": : :"memory"); } ** Sent via 
the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From benh at kernel.crashing.org Sun Nov 9 17:46:07 2003
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sun, 09 Nov 2003 17:46:07 +1100
Subject: syscall table patch
In-Reply-To: <20031107124817.A29028@forte.austin.ibm.com>
References: <20031107103613.A28940@forte.austin.ibm.com> <844F4B02-1142-11D8-A79D-000A95A0560C@us.ibm.com> <20031107124817.A29028@forte.austin.ibm.com>
Message-ID: <1068360366.6807.0.camel@gaston>

On Sat, 2003-11-08 at 05:48, linas at austin.ibm.com wrote:
> On Fri, Nov 07, 2003 at 10:50:39AM -0600, Hollis Blanchard wrote:
> > On Friday, Nov 7, 2003, at 10:36 US/Central, linas at austin.ibm.com wrote:
> > >
> > > -- table initialization is now done at runtime, rather than at
> > >    compile time.
> >
> > Why?
>
> two reasons:
>
> First reason:
> Because I wanted to define a macro to initialize the 32-bit and 64-bit
> tables at the same time, and couldn't figure out how to do this at
> compile time.

Ugly preprocessor trick: #include the table definition twice, once
with a macro defined only for the 64-bit table, once with a macro
defined only for the 32-bit table.

Ben.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From paulus at samba.org Mon Nov 10 21:03:42 2003
From: paulus at samba.org (Paul Mackerras)
Date: Mon, 10 Nov 2003 21:03:42 +1100
Subject: 2.6 POWER3/POWER4 compilation fix
In-Reply-To: <20031108002330.GC3440@krispykreme>
References: <20031108002330.GC3440@krispykreme>
Message-ID: <16303.25214.272601.224661@cargo.ozlabs.ibm.com>

Anton Blanchard writes:

> Here's a quick fix I threw together from patches from Hollis and Segher.
> I think this is worthy of sending to Linus now, considering all the
> build problems we've been having lately.

Looks ok.  What version of gcc or binutils do you have to have to see
problems?

Paul.

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From amodra at bigpond.net.au Mon Nov 10 21:21:08 2003
From: amodra at bigpond.net.au (Alan Modra)
Date: Mon, 10 Nov 2003 20:51:08 +1030
Subject: 2.6 POWER3/POWER4 compilation fix
In-Reply-To: <16303.25214.272601.224661@cargo.ozlabs.ibm.com>
References: <20031108002330.GC3440@krispykreme> <16303.25214.272601.224661@cargo.ozlabs.ibm.com>
Message-ID: <20031110102108.GG2506@bubble.sa.bigpond.net.au>

On Mon, Nov 10, 2003 at 09:03:42PM +1100, Paul Mackerras wrote:
> Looks ok.  What version of gcc or binutils do you have to have to see
> problems?

gas later than 2003-07-04 will use power4 forms of mtcrf if possible
when given -mpower4.

--
Alan Modra
IBM OzLabs - Linux Technology Centre

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From paulus at samba.org Mon Nov 10 21:49:15 2003
From: paulus at samba.org (Paul Mackerras)
Date: Mon, 10 Nov 2003 21:49:15 +1100
Subject: syscall table patch
In-Reply-To: <20031107103613.A28940@forte.austin.ibm.com>
References: <20031107103613.A28940@forte.austin.ibm.com>
Message-ID: <16303.27947.273573.622813@cargo.ozlabs.ibm.com>

linas at austin.ibm.com writes:

> Can someone apply the patch below, harmonizing the syscall table
> in misc.S with the #define syscalls in unistd.h ?  It's a 'trivial'
> patch; what it really does is to make it easier for other
> non-mainstream kernel extensions to add new system calls
> 'cleanly' to include/asm/unistd.h and arch/ppc64/kernel/misc.S
> without making an ugly hash of things.

Hmmm, I don't like how the patch ends up with a string of lines inside
#if 0 and then another set of lines (for the same series of syscall
numbers) saying sys_ni_syscall.  I would prefer that the #if 0 goes
away and the lines inside that section be changed to look like:

	.llong .sys_ni_syscall		/* 208, reserved for tkill */

so that we don't end up with the numbers in the comments going from
220 to 208.

We also need to resync the list in 2.4 with 2.5.

> Since several people commented about having a syscall table written
> in C, I also append a 'sample' implementation in C.  There are two
> or three things to note about this:
>
> -- table initialization is now done at runtime, rather than at
>    compile time.

Hmmm this will bloat the size of the kernel image, won't it?  And it
doesn't save us any space at runtime.

> -- C compiler wants function prototypes to be really happy,
>    and this patch doesn't provide them.  Does anybody want them?

That's the trouble with doing it in C.

Paul.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From johnrose at austin.ibm.com Tue Nov 11 04:51:21 2003
From: johnrose at austin.ibm.com (John Rose)
Date: Mon, 10 Nov 2003 11:51:21 -0600
Subject: [PATCH] RTAS syscall - review request
Message-ID: <1068486681.6301.15.camel@verve>

This patch implements a generic RTAS interface to userspace through a
system call.  It was originally written by Rusty Russell and modified by
me.  There are two main parts:

- A new "rtas" syscall, which allows a user app to make any RTAS call
  available to the system.

- A special RMO buffer is reserved at boot time for user-space apps that
  require low memory.  Some RTAS calls require such "workarea" buffers,
  so the kernel needs to somehow export low memory regions for
  user-space use.  This implementation exports the physical address and
  size of the reserved region through a simple /proc file.  The user app
  is then free to mmap() /dev/mem at the address specified.

Please respond with any comments by EOB Thursday, Nov 13th.
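
A rough user-space sketch of the first step described above: turning
the text exported by the /proc file into an address and a size. It
assumes the "%p %x\n" layout used by the read handler in the patch
below, that is, two hexadecimal fields separated by a space. The
function name demo_parse_rmo_line is hypothetical, and the sketch
parses a string rather than opening the file, since the full /proc
path is not spelled out in this message. The mmap() of /dev/mem is not
shown.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse one line in the "<address> <size>\n" layout, both fields in hex.
 * Returns 0 on success, -1 if either field is missing or out of range.
 */
int demo_parse_rmo_line(const char *line,
                        unsigned long long *addr,
                        unsigned long long *size)
{
    char *end;

    assert(line != NULL && addr != NULL && size != NULL);

    errno = 0;
    *addr = strtoull(line, &end, 16);   /* base 16 also accepts a 0x prefix */
    if (end == line || errno != 0)
        return -1;

    line = end;
    *size = strtoull(line, &end, 16);   /* leading whitespace is skipped */
    if (end == line || errno != 0)
        return -1;

    return 0;
}
```

A caller would then be expected to check that the size is non-zero and
page-aligned before passing the pair on to mmap().
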
Thanks- John diff -Nru a/arch/ppc/kernel/misc.S b/arch/ppc/kernel/misc.S --- a/arch/ppc/kernel/misc.S Mon Nov 10 11:34:58 2003 +++ b/arch/ppc/kernel/misc.S Mon Nov 10 11:34:58 2003 @@ -1385,3 +1385,4 @@ .long sys_statfs64 .long sys_fstatfs64 .long ppc_fadvise64_64 + .long sys_ni_syscall /* 255 - rtas (used on ppc64) */ diff -Nru a/arch/ppc64/kernel/misc.S b/arch/ppc64/kernel/misc.S --- a/arch/ppc64/kernel/misc.S Mon Nov 10 11:34:58 2003 +++ b/arch/ppc64/kernel/misc.S Mon Nov 10 11:34:58 2003 @@ -852,6 +852,8 @@ .llong .sys32_utimes .llong .sys_statfs64 .llong .sys_fstatfs64 + .llong .sys_ni_syscall /* 32bit only fadvise64 */ + .llong .ppc_rtas /* 255 */ .balign 8 _GLOBAL(sys_call_table) @@ -1109,3 +1111,5 @@ .llong .sys_utimes .llong .sys_statfs64 .llong .sys_fstatfs64 + .llong .sys_ni_syscall /* 32bit only fadvise64 */ + .llong .ppc_rtas /* 255 */ diff -Nru a/arch/ppc64/kernel/prom.c b/arch/ppc64/kernel/prom.c --- a/arch/ppc64/kernel/prom.c Mon Nov 10 11:34:58 2003 +++ b/arch/ppc64/kernel/prom.c Mon Nov 10 11:34:58 2003 @@ -611,6 +611,10 @@ _rtas->base) >= 0) { _rtas->entry = (long)_prom->args.rets[1]; } + RELOC(rtas_rmo_buf) + = (void *)lmb_alloc_base(RTAS_SYSCALL_MAX, + PAGE_SIZE, + rtas_region); } if (_rtas->entry <= 0) { diff -Nru a/arch/ppc64/kernel/rtas-proc.c b/arch/ppc64/kernel/rtas-proc.c --- a/arch/ppc64/kernel/rtas-proc.c Mon Nov 10 11:34:58 2003 +++ b/arch/ppc64/kernel/rtas-proc.c Mon Nov 10 11:34:58 2003 @@ -161,6 +161,8 @@ size_t count, loff_t *ppos); static ssize_t ppc_rtas_tone_volume_read(struct file * file, char * buf, size_t count, loff_t *ppos); +static ssize_t ppc_rtas_rmo_buf_read(struct file *file, char *buf, + size_t count, loff_t *ppos); struct file_operations ppc_rtas_poweron_operations = { .read = ppc_rtas_poweron_read, @@ -185,6 +187,10 @@ .write = ppc_rtas_tone_volume_write }; +static struct file_operations ppc_rtas_rmo_buf_ops = { + .read = ppc_rtas_rmo_buf_read, +}; + int ppc_rtas_find_all_sensors (void); int 
ppc_rtas_process_sensor(struct individual_sensor s, int state, int error, char * buf); @@ -233,6 +239,9 @@ entry = create_proc_entry("volume", S_IWUSR|S_IRUGO, proc_rtas); if (entry) entry->proc_fops = &ppc_rtas_tone_volume_operations; + + entry = create_proc_entry("rmo_buffer", S_IRUSR, proc_rtas); + if (entry) entry->proc_fops = &ppc_rtas_rmo_buf_ops; } /* ****************************************************************** */ @@ -842,6 +851,23 @@ int n; n = sprintf(buf, "%lu\n", rtas_tone_volume); + if (*ppos >= strlen(buf)) + return 0; + if (n > strlen(buf) - *ppos) + n = strlen(buf) - *ppos; + if (n > count) + n = count; + *ppos += n; + return n; +} + +/* RTAS Userspace access */ +static ssize_t ppc_rtas_rmo_buf_read(struct file *file, char *buf, + size_t count, loff_t *ppos) +{ + int n; + + n = sprintf(buf, "%p %x\n", rtas_rmo_buf, RTAS_SYSCALL_MAX); if (*ppos >= strlen(buf)) return 0; if (n > strlen(buf) - *ppos) diff -Nru a/arch/ppc64/kernel/rtas.c b/arch/ppc64/kernel/rtas.c --- a/arch/ppc64/kernel/rtas.c Mon Nov 10 11:34:58 2003 +++ b/arch/ppc64/kernel/rtas.c Mon Nov 10 11:34:58 2003 @@ -29,6 +29,7 @@ #include #include #include +#include struct flash_block_list_header rtas_firmware_flash_list = {0, 0}; @@ -381,6 +382,44 @@ if (rtas_firmware_flash_list.next) rtas_flash_bypass_warning(); rtas_power_off(); +} + +void *rtas_rmo_buf = NULL; + +asmlinkage int ppc_rtas(struct rtas_args __user *uargs) +{ + struct rtas_args args; + unsigned long flags; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (copy_from_user(&args, uargs, 3 * sizeof(u32)) != 0) + return -EFAULT; + + if (args.nargs > ARRAY_SIZE(args.args) + || args.nret > ARRAY_SIZE(args.args) + || args.nargs + args.nret > ARRAY_SIZE(args.args)) + return -EINVAL; + + /* Copy in args. 
*/ + if (copy_from_user(args.args, uargs->args, + args.nargs * sizeof(rtas_arg_t)) != 0) + return -EFAULT; + + spin_lock_irqsave(&rtas.lock, flags); + get_paca()->xRtas = args; + enter_rtas((void *)__pa((unsigned long)&get_paca()->xRtas)); + args = get_paca()->xRtas; + spin_unlock_irqrestore(&rtas.lock, flags); + + /* Copy out args. */ + if (copy_to_user(uargs->args + args.nargs, + args.args + args.nargs, + args.nret * sizeof(rtas_arg_t)) != 0) + return -EFAULT; + + return 0; } EXPORT_SYMBOL(proc_ppc64); diff -Nru a/arch/ppc64/kernel/syscalls.c b/arch/ppc64/kernel/syscalls.c --- a/arch/ppc64/kernel/syscalls.c Mon Nov 10 11:34:58 2003 +++ b/arch/ppc64/kernel/syscalls.c Mon Nov 10 11:34:58 2003 @@ -41,6 +41,7 @@ #include #include #include +#include extern unsigned long wall_jiffies; @@ -234,3 +235,6 @@ return secs; } + +/* Only exists on P-series. */ +cond_syscall(ppc_rtas); diff -Nru a/include/asm-ppc/unistd.h b/include/asm-ppc/unistd.h --- a/include/asm-ppc/unistd.h Mon Nov 10 11:34:58 2003 +++ b/include/asm-ppc/unistd.h Mon Nov 10 11:34:58 2003 @@ -259,8 +259,9 @@ #define __NR_statfs64 252 #define __NR_fstatfs64 253 #define __NR_fadvise64_64 254 +#define __NR_rtas 255 -#define __NR_syscalls 255 +#define __NR_syscalls 256 #define __NR(n) #n diff -Nru a/include/asm-ppc64/rtas.h b/include/asm-ppc64/rtas.h --- a/include/asm-ppc64/rtas.h Mon Nov 10 11:34:58 2003 +++ b/include/asm-ppc64/rtas.h Mon Nov 10 11:34:58 2003 @@ -19,6 +19,9 @@ #define RTAS_UNKNOWN_SERVICE (-1) #define RTAS_INSTANTIATE_MAX (1UL<<30) /* Don't instantiate rtas at/above this value */ +/* Buffer size for ppc_rtas system call. */ +#define RTAS_SYSCALL_MAX (64 * 1024) + /* * In general to call RTAS use rtas_token("string") to lookup * an RTAS token for the given string (e.g. "event-scan"). 
@@ -188,5 +191,8 @@ extern spinlock_t rtas_data_buf_lock; extern char rtas_data_buf[RTAS_DATA_BUF_SIZE]; + +/* Buffer used for ppc_rtas system call */ +extern void *rtas_rmo_buf; #endif /* _PPC64_RTAS_H */ diff -Nru a/include/asm-ppc64/unistd.h b/include/asm-ppc64/unistd.h --- a/include/asm-ppc64/unistd.h Mon Nov 10 11:34:58 2003 +++ b/include/asm-ppc64/unistd.h Mon Nov 10 11:34:58 2003 @@ -264,8 +264,10 @@ #define __NR_utimes 251 #define __NR_statfs64 252 #define __NR_fstatfs64 253 +#define __NR_fadvise64_64 254 +#define __NR_rtas 255 -#define __NR_syscalls 254 +#define __NR_syscalls 256 #ifdef __KERNEL__ #define NR_syscalls __NR_syscalls #endif ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Tue Nov 11 04:54:19 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Mon, 10 Nov 2003 11:54:19 -0600 Subject: syscall table patch In-Reply-To: <16303.27947.273573.622813@cargo.ozlabs.ibm.com>; from paulus@samba.org on Mon, Nov 10, 2003 at 09:49:15PM +1100 References: <20031107103613.A28940@forte.austin.ibm.com> <16303.27947.273573.622813@cargo.ozlabs.ibm.com> Message-ID: <20031110115418.A22020@forte.austin.ibm.com> On Mon, Nov 10, 2003 at 09:49:15PM +1100, Paul Mackerras wrote: > > linas at austin.ibm.com writes: > > > Can someone apply the patch below, harmonizing the syscall table > > in misc.S with the #define syscalls in unistd.h ? Its a 'trivial' > > patch; what it really does is to make it easier for other > > non-mainstream kernel extensions to add new system calls > > 'cleanly' to include/asm/unistd.h and arch/ppc64/kernel/misc.S > > without making an ugly hash of things. > > Hmmm, I don't like how the patch ends up with a string of lines inside > #if 0 and then another set of lines (for the same series of syscall > numbers) saying sys_ni_syscall. 
> I would prefer that the #if 0 goes
> away and the lines inside that section be changed to look like:
>
>       .llong .sys_ni_syscall          /* 208, reserved for tkill */
>
> so that we don't end up with the numbers in the comments going from
> 220 to 208.

The attached patch does this.

> We also need to resync the list in 2.4 with 2.5.

The attached patch does this too, except for one change:

-#define __NR_pread              179
-#define __NR_pwrite             180
+#define __NR_pread64            179
+#define __NR_pwrite64           180

which I was nervous doing ... as these are already defined and used ...

> > Since several people commented about having a syscall table written
> > in C, I also append a 'sample' implementation in C.  There are two
> > or three things to note about this:
> >
> > -- table initialization is now done at runtime, rather than at
> >    compile time.
>
> Hmmm this will bloat the size of the kernel image, won't it?  And it
> doesn't save us any space at runtime.

Table initialization can be done at compile time, at the cost of
sacrificing the two-tables-at-once initialization (or by doing the
ugly preprocessor trick mentioned in this thread).

> > -- C compiler wants function prototypes to be really happy,
> >    and this patch doesn't provide them.  Does anybody want them?
>
> That's the trouble with doing it in C.

Well, let me know if you'd want the table-in-C patch, & I'll finish it.
I'm wishy-washy, I won't argue strongly for or against it.

--linas

p.s. is there an ETA for when the Marcelo kernel will sync up with the
current ppc64 code?
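
For reference, a minimal sketch of the preprocessor trick mentioned in
this thread, showing one way both tables could be built at compile
time from a single list. It uses a list macro expanded twice instead
of a file that is #included twice; the effect is the same. All demo_*
and DEMO_* names, the slot numbers, and the return values are
hypothetical and only illustrate the shape of the idea.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_SYSCALLS 16   /* hypothetical table size */

typedef long (*demo_fn)(void);

/* Hypothetical 32-bit and 64-bit handlers. */
static long demo32_exit(void)   { return 3201; }
static long demo64_exit(void)   { return 6401; }
static long demo32_execve(void) { return 3211; }
static long demo64_execve(void) { return 6411; }

/*
 * Single list of (slot, 32-bit handler, 64-bit handler), so the two
 * entries for a syscall can be read side by side. In a kernel tree this
 * list could equally sit in its own file and be #included twice.
 */
#define DEMO_SYSCALL_LIST(SCTE)              \
    SCTE(1,  demo32_exit,   demo64_exit)     \
    SCTE(11, demo32_execve, demo64_execve)

/* First expansion: keep only the 32-bit column. */
#define DEMO_SCTE32(slot, h32, h64) [slot] = h32,
static const demo_fn demo_table32[DEMO_NR_SYSCALLS] = {
    DEMO_SYSCALL_LIST(DEMO_SCTE32)
};

/* Second expansion: keep only the 64-bit column. */
#define DEMO_SCTE64(slot, h32, h64) [slot] = h64,
static const demo_fn demo_table64[DEMO_NR_SYSCALLS] = {
    DEMO_SYSCALL_LIST(DEMO_SCTE64)
};

/* Both tables come from one list, so they populate exactly the same slots. */
void demo_check_tables(void)
{
    size_t i;

    for (i = 0; i < DEMO_NR_SYSCALLS; i++)
        assert((demo_table32[i] == NULL) == (demo_table64[i] == NULL));
}
```

The designators keep each handler pinned to its slot, and nothing is
initialized at runtime, which addresses the image-size concern raised
above.
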
-------------- next part -------------- Index: arch/ppc64/kernel/misc.S =================================================================== RCS file: /home/linas/cvsroot/linux24/arch/ppc64/kernel/misc.S,v retrieving revision 1.1.1.2 diff -u -p -u -p -r1.1.1.2 misc.S --- arch/ppc64/kernel/misc.S 23 Oct 2003 20:07:43 -0000 1.1.1.2 +++ arch/ppc64/kernel/misc.S 10 Nov 2003 17:32:24 -0000 @@ -593,9 +593,10 @@ _GLOBAL(arch_kernel_thread) #ifdef CONFIG_BINFMT_ELF32 /* Why isn't this a) automatic, b) written in 'C'? */ +/* because a) its not regular enough, b) C function protoypes needed */ .balign 8 _GLOBAL(sys_call_table32) - .llong .sys_ni_syscall /* 0 - old "setup()" system call */ + .llong .sys_ni_syscall /* 0 - reserved - restart_syscall */ .llong .sys32_exit .llong .sys32_fork .llong .sys_read @@ -803,30 +804,55 @@ _GLOBAL(sys_call_table32) .llong .sys_madvise /* 205 */ .llong .sys_mincore /* 206 */ .llong .sys_gettid /* 207 */ -#if 0 /* Reserved syscalls */ - .llong .sys_tkill /* 208 */ - .llong .sys_setxattr - .llong .sys_lsetxattr /* 210 */ - .llong .sys_fsetxattr - .llong .sys_getxattr - .llong .sys_lgetxattr - .llong .sys_fgetxattr - .llong .sys_listxattr /* 215 */ - .llong .sys_llistxattr - .llong .sys_flistxattr - .llong .sys_removexattr - .llong .sys_lremovexattr - .llong .sys_fremovexattr /* 220 */ - .llong .sys_futex -#endif + .llong .sys_ni_syscall /* 208 - reserved - .sys_tkill */ + .llong .sys_ni_syscall /* 209 - reserved - .sys_setxattr */ + .llong .sys_ni_syscall /* 210 - reserved - .sys_lsetxattr */ + .llong .sys_ni_syscall /* 211 - reserved - .sys_lsetxattr */ + .llong .sys_ni_syscall /* 212 - reserved - .sys_getxattr */ + .llong .sys_ni_syscall /* 213 - reserved - .sys_lgetxattr */ + .llong .sys_ni_syscall /* 214 - reserved - .sys_fgetxattr */ + .llong .sys_ni_syscall /* 215 - reserved - .sys_listxattr */ + .llong .sys_ni_syscall /* 216 - reserved - .sys_llistxattr */ + .llong .sys_ni_syscall /* 217 - reserved - .sys_flistxattr */ + .llong 
.sys_ni_syscall /* 218 - reserved - .sys_removexattr */ + .llong .sys_ni_syscall /* 219 - reserved - .sys_lremovexattr */ + .llong .sys_ni_syscall /* 220 - resreved - .sys_fremovexattr */ + .llong .sys_ni_syscall /* 221 - reserved - .sys_futex */ + .llong .sys_ni_syscall /* 222 - reserved - sched_setaffinity */ + .llong .sys_ni_syscall /* 223 - reserved - sched_getaffinity */ + .llong .sys_ni_syscall /* 224 - currently unused */ + .llong .sys_ni_syscall /* 225 - reserved - tuxcall */ + .llong .sys_ni_syscall /* 226 - reserved - sendfile64 */ + .llong .sys_ni_syscall /* 227 - reserved - io_setup */ + .llong .sys_ni_syscall /* 228 - reserved - io_destroy */ + .llong .sys_ni_syscall /* 229 - reserved - io_getevents */ + .llong .sys_ni_syscall /* 230 - reserved - io_submit */ + .llong .sys_ni_syscall /* 231 - reserved - io_cancel */ + .llong .sys_ni_syscall /* 232 - reserved - set_tid_address */ + .llong .sys_ni_syscall /* 233 - reserved - fadvise64 */ + .llong .sys_ni_syscall /* 234 - reserved - exit_group */ + .llong .sys_ni_syscall /* 235 - reserved - lookup_dcookie */ + .llong .sys_ni_syscall /* 236 - reserved - sys_epoll_create */ + .llong .sys_ni_syscall /* 237 - reserved - sys_epoll_ctl */ + .llong .sys_ni_syscall /* 238 - reserved - sys_epoll_wait */ + .llong .sys_ni_syscall /* 239 - reserved - remap_file_pages */ + .llong .sys_ni_syscall /* 240 - reserved - timer_create */ + .llong .sys_ni_syscall /* 241 - reserved - timer_settime */ + .llong .sys_ni_syscall /* 242 - reserved - timer_gettime */ + .llong .sys_ni_syscall /* 243 - reserved - timer_getoverrun */ + .llong .sys_ni_syscall /* 244 - reserved - timer_delete */ + .llong .sys_ni_syscall /* 245 - reserved - clock_settime */ + .llong .sys_ni_syscall /* 246 - reserved - clock_gettime */ + .llong .sys_ni_syscall /* 247 - reserved - clock_getres */ + .llong .sys_ni_syscall /* 248 - reserved - clock_nanosleep */ .llong .sys_perfmonctl /* Put this here for now ... 
*/ - .rept NR_syscalls-222 + .rept NR_syscalls-250 .llong .sys_ni_syscall .endr #endif .balign 8 _GLOBAL(sys_call_table) - .llong .sys_ni_syscall /* 0 - old "setup()" system call */ + .llong .sys_ni_syscall /* 0 - reserved - restart_syscall */ .llong .sys_exit .llong .sys_fork .llong .sys_read @@ -1034,23 +1060,48 @@ _GLOBAL(sys_call_table) .llong .sys_madvise /* 205 */ .llong .sys_mincore /* 206 */ .llong .sys_gettid /* 207 */ -#if 0 /* Reserved syscalls */ - .llong .sys_tkill /* 208 */ - .llong .sys_setxattr - .llong .sys_lsetxattr /* 210 */ - .llong .sys_fsetxattr - .llong .sys_getxattr - .llong .sys_lgetxattr - .llong .sys_fgetxattr - .llong .sys_listxattr /* 215 */ - .llong .sys_llistxattr - .llong .sys_flistxattr - .llong .sys_removexattr - .llong .sys_lremovexattr - .llong .sys_fremovexattr /* 220 */ - .llong .sys_futex -#endif + .llong .sys_ni_syscall /* 208 - reserved - .sys_tkill */ + .llong .sys_ni_syscall /* 209 - reserved - .sys_setxattr */ + .llong .sys_ni_syscall /* 210 - reserved - .sys_lsetxattr */ + .llong .sys_ni_syscall /* 211 - reserved - .sys_lsetxattr */ + .llong .sys_ni_syscall /* 212 - reserved - .sys_getxattr */ + .llong .sys_ni_syscall /* 213 - reserved - .sys_lgetxattr */ + .llong .sys_ni_syscall /* 214 - reserved - .sys_fgetxattr */ + .llong .sys_ni_syscall /* 215 - reserved - .sys_listxattr */ + .llong .sys_ni_syscall /* 216 - reserved - .sys_llistxattr */ + .llong .sys_ni_syscall /* 217 - reserved - .sys_flistxattr */ + .llong .sys_ni_syscall /* 218 - reserved - .sys_removexattr */ + .llong .sys_ni_syscall /* 219 - reserved - .sys_lremovexattr */ + .llong .sys_ni_syscall /* 220 - resreved - .sys_fremovexattr */ + .llong .sys_ni_syscall /* 221 - reserved - .sys_futex */ + .llong .sys_ni_syscall /* 222 - reserved - sched_setaffinity */ + .llong .sys_ni_syscall /* 223 - reserved - sched_getaffinity */ + .llong .sys_ni_syscall /* 224 - currently unused */ + .llong .sys_ni_syscall /* 225 - reserved - tuxcall */ + .llong .sys_ni_syscall /* 
226 - reserved - sendfile64 */ + .llong .sys_ni_syscall /* 227 - reserved - io_setup */ + .llong .sys_ni_syscall /* 228 - reserved - io_destroy */ + .llong .sys_ni_syscall /* 229 - reserved - io_getevents */ + .llong .sys_ni_syscall /* 230 - reserved - io_submit */ + .llong .sys_ni_syscall /* 231 - reserved - io_cancel */ + .llong .sys_ni_syscall /* 232 - reserved - set_tid_address */ + .llong .sys_ni_syscall /* 233 - reserved - fadvise64 */ + .llong .sys_ni_syscall /* 234 - reserved - exit_group */ + .llong .sys_ni_syscall /* 235 - reserved - lookup_dcookie */ + .llong .sys_ni_syscall /* 236 - reserved - sys_epoll_create */ + .llong .sys_ni_syscall /* 237 - reserved - sys_epoll_ctl */ + .llong .sys_ni_syscall /* 238 - reserved - sys_epoll_wait */ + .llong .sys_ni_syscall /* 239 - reserved - remap_file_pages */ + .llong .sys_ni_syscall /* 240 - reserved - timer_create */ + .llong .sys_ni_syscall /* 241 - reserved - timer_settime */ + .llong .sys_ni_syscall /* 242 - reserved - timer_gettime */ + .llong .sys_ni_syscall /* 243 - reserved - timer_getoverrun */ + .llong .sys_ni_syscall /* 244 - reserved - timer_delete */ + .llong .sys_ni_syscall /* 245 - reserved - clock_settime */ + .llong .sys_ni_syscall /* 246 - reserved - clock_gettime */ + .llong .sys_ni_syscall /* 247 - reserved - clock_getres */ + .llong .sys_ni_syscall /* 248 - reserved - clock_nanosleep */ .llong .sys_perfmonctl /* Put this here for now ... 
*/ - .rept NR_syscalls-222 + .rept NR_syscalls-250 .llong .sys_ni_syscall .endr Index: include/asm-ppc64/unistd.h =================================================================== RCS file: /home/linas/cvsroot/linux24/include/asm-ppc64/unistd.h,v retrieving revision 1.1.1.2 diff -u -p -u -p -r1.1.1.2 unistd.h --- include/asm-ppc64/unistd.h 23 Oct 2003 20:11:14 -0000 1.1.1.2 +++ include/asm-ppc64/unistd.h 10 Nov 2003 17:21:07 -0000 @@ -9,7 +9,7 @@ * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. */ - +#define __NR_restart_syscall 0 #define __NR_exit 1 #define __NR_fork 2 #define __NR_read 3 @@ -233,7 +233,7 @@ #define __NR_futex 221 #define __NR_sched_setaffinity 222 #define __NR_sched_getaffinity 223 -#define __NR_security 224 +/* 224 currently unused */ #define __NR_tuxcall 225 #define __NR_sendfile64 226 #define __NR_io_setup 227 @@ -241,9 +241,28 @@ #define __NR_io_getevents 229 #define __NR_io_submit 230 #define __NR_io_cancel 231 -#define __NR_alloc_hugepages 232 -#define __NR_free_hugepages 233 +#define __NR_set_tid_address 232 +#define __NR_fadvise64 233 #define __NR_exit_group 234 +#define __NR_lookup_dcookie 235 +#define __NR_sys_epoll_create 236 +#define __NR_sys_epoll_ctl 237 +#define __NR_sys_epoll_wait 238 +#define __NR_remap_file_pages 239 +#define __NR_timer_create 240 +#define __NR_timer_settime 241 +#define __NR_timer_gettime 242 +#define __NR_timer_getoverrun 243 +#define __NR_timer_delete 244 +#define __NR_clock_settime 245 +#define __NR_clock_gettime 246 +#define __NR_clock_getres 247 +#define __NR_clock_nanosleep 248 + +#define __NR_syscalls 249 +#ifdef __KERNEL__ +#define NR_syscalls __NR_syscalls +#endif #define __NR(n) #n From olh at suse.de Tue Nov 11 08:35:44 2003 From: olh at suse.de (Olaf Hering) Date: Mon, 10 Nov 2003 22:35:44 +0100 Subject: [PATCH] __sysrq_put_key_op undeclared Message-ID: <20031110213544.GA28333@suse.de> kdb doesnt compile if sysrq is not 
active in the .config. --- ../linuxppc64-2.5/arch/ppc64/kdb/kdbasupport.c 2003-10-15 04:46:20.000000000 -0700 +++ ./arch/ppc64/kdb/kdbasupport.c 2003-11-10 13:31:19.000000000 -0800 @@ -74,6 +74,7 @@ extern int kdb_parse(const char *cmdstr, #include #include +#ifdef CONFIG_MAGIC_SYSRQ static void sysrq_handle_kdb(int key, struct pt_regs *pt_regs, struct kbd_struct *kbd, struct tty_struct *tty) { @@ -93,6 +94,7 @@ kdb_map_scc(void) /* register sysrq 'x' */ __sysrq_put_key_op('x', &sysrq_kdb_op); } +#endif /* @@ -2089,7 +2091,9 @@ functionname(int argc, const char **argv void __init kdba_init(void) { +#ifdef CONFIG_MAGIC_SYSRQ kdb_map_scc(); /* map sysrq key */ +#endif debugger = kdb_debugger; debugger_bpt = kdb_debugger_bpt; -- USB is for mice, FireWire is for men! sUse lINUX ag, nÜRNBERG ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From paulus at samba.org Tue Nov 11 08:49:57 2003 From: paulus at samba.org (Paul Mackerras) Date: Tue, 11 Nov 2003 08:49:57 +1100 Subject: syscall table patch In-Reply-To: <20031110115418.A22020@forte.austin.ibm.com> References: <20031107103613.A28940@forte.austin.ibm.com> <16303.27947.273573.622813@cargo.ozlabs.ibm.com> <20031110115418.A22020@forte.austin.ibm.com> Message-ID: <16304.2053.123830.139919@cargo.ozlabs.ibm.com> linas at austin.ibm.com writes: > p.s. is there an ETA for when the Marcelo kernel will sync up with > the current ppc64 code? If there is anything critical I'll send it off immediately. Non-critical things can wait until Marcelo releases the final 2.4.23. Paul. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olh at suse.de Tue Nov 11 08:54:28 2003 From: olh at suse.de (Olaf Hering) Date: Mon, 10 Nov 2003 22:54:28 +0100 Subject: gcc33 creates a zImage which doesnt boot Message-ID: <20031110215428.GA21424@suse.de> Good morning, the attached .config creates a zImage which boots on a p610 when compiled with SLES8 gcc3.2.2, but not with gcc 3.3.2.
olh at mandarine:~> powerpc64-linux-ld -v GNU ld version 2.14.90 20031019 olh at mandarine:~> powerpc64-linux-gcc -v Reading specs from /opt/cross/lib/gcc-lib/powerpc64-linux/3.3.2/specs Configured with: /usr/src/packages/BUILD/cross-ppc64-gcc-3.3.2/gcc-3.3.2/configure --enable-languages=c,c++,f77 --prefix=/opt/cross --host=powerpc-suse-linux --target=powerpc64-linux --enable-threads=posix --disable-nls --enable-shared --with-headers=/usr/src/packages/BUILD/cross-ppc64-gcc-3.3.2/include-ppc64-glibc-2.2.5 Thread model: posix gcc version 3.3.2 I remember that this was built with gcc1.diff and gcc3.diff from ftp.linuxppc64.org/pub/people/amodra/gcc-3.3/20031017 0 > boot network console=ttyS0,9600 kdb=on root=/dev/md0 init=/bin/bash BOOTP S = 1 FILE: elderberry Load Addr=0x4000 Max Size=0xbfc000 FINAL Packet Count = 4562 FINAL File Size = 2335348 bytes. zImage starting: loaded at 0x400000 gunzipping (0x1400000 <- 0x408000:0x5f5a8d)...done 6293299 bytes 57996 bytes of heap consumed, max in use 41896 opening display /pci at fee00000/pci at b/display at 0... ok instantiating rtas -- USB is for mice, FireWire is for men! 
sUse lINUX ag, n?RNBERG -------------- next part -------------- # # Automatically generated make config: don't edit # CONFIG_64BIT=y CONFIG_MMU=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_ISA_DMA=y CONFIG_HAVE_DEC_LOCK=y CONFIG_EARLY_PRINTK=y CONFIG_COMPAT=y CONFIG_FRAME_POINTER=y CONFIG_FORCE_MAX_ZONEORDER=13 # # Code maturity level options # CONFIG_EXPERIMENTAL=y CONFIG_CLEAN_COMPILE=y CONFIG_STANDALONE=y # # General setup # CONFIG_SWAP=y CONFIG_SYSVIPC=y # CONFIG_BSD_PROCESS_ACCT is not set CONFIG_SYSCTL=y CONFIG_LOG_BUF_SHIFT=18 CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y # CONFIG_EMBEDDED is not set CONFIG_KALLSYMS=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y # # Loadable module support # # CONFIG_MODULES is not set # # Platform support # # CONFIG_PPC_ISERIES is not set CONFIG_PPC_PSERIES=y CONFIG_PPC=y CONFIG_PPC64=y # CONFIG_POWER4_ONLY is not set CONFIG_SMP=y CONFIG_IRQ_ALL_CPUS=y CONFIG_NR_CPUS=32 # CONFIG_HMT is not set # CONFIG_DISCONTIGMEM is not set # CONFIG_RTAS_FLASH is not set CONFIG_SCANLOG=y CONFIG_PPC_RTAS=y # # General setup # CONFIG_PCI=y CONFIG_PCI_DOMAINS=y CONFIG_BINFMT_ELF=y # CONFIG_BINFMT_MISC is not set # CONFIG_PCI_LEGACY_PROC is not set # CONFIG_PCI_NAMES is not set # CONFIG_HOTPLUG is not set CONFIG_PROC_DEVICETREE=y CONFIG_CMDLINE_BOOL=y CONFIG_CMDLINE="console=ttyS0,9600 kdb=on root=/dev/sda6" # # Generic Driver Options # # # Memory Technology Devices (MTD) # # CONFIG_MTD is not set # # Parallel port support # # CONFIG_PARPORT is not set # # Plug and Play support # # CONFIG_PNP is not set # # Block devices # # CONFIG_BLK_DEV_FD is not set # CONFIG_BLK_CPQ_DA is not set # CONFIG_BLK_CPQ_CISS_DA is not set # CONFIG_BLK_DEV_DAC960 is not set # CONFIG_BLK_DEV_UMEM is not set # CONFIG_BLK_DEV_LOOP is not set # CONFIG_BLK_DEV_NBD is not set # CONFIG_BLK_DEV_RAM is not set # CONFIG_BLK_DEV_INITRD is not set # # ATA/ATAPI/MFM/RLL support # # CONFIG_IDE is not set # # SCSI device 
support # CONFIG_SCSI=y # CONFIG_SCSI_PROC_FS is not set # # SCSI support type (disk, tape, CD-ROM) # CONFIG_BLK_DEV_SD=y # CONFIG_CHR_DEV_ST is not set # CONFIG_CHR_DEV_OSST is not set CONFIG_BLK_DEV_SR=y CONFIG_BLK_DEV_SR_VENDOR=y CONFIG_CHR_DEV_SG=y # # Some SCSI devices (e.g. CD jukebox) support multiple LUNs # CONFIG_SCSI_MULTI_LUN=y CONFIG_SCSI_REPORT_LUNS=y CONFIG_SCSI_CONSTANTS=y # CONFIG_SCSI_LOGGING is not set # # SCSI low-level drivers # # CONFIG_BLK_DEV_3W_XXXX_RAID is not set # CONFIG_SCSI_ACARD is not set # CONFIG_SCSI_AACRAID is not set # CONFIG_SCSI_AIC7XXX is not set # CONFIG_SCSI_AIC7XXX_OLD is not set # CONFIG_SCSI_AIC79XX is not set # CONFIG_SCSI_ADVANSYS is not set # CONFIG_SCSI_MEGARAID is not set # CONFIG_SCSI_SATA is not set # CONFIG_SCSI_BUSLOGIC is not set # CONFIG_SCSI_CPQFCTS is not set # CONFIG_SCSI_DMX3191D is not set # CONFIG_SCSI_EATA is not set # CONFIG_SCSI_EATA_PIO is not set # CONFIG_SCSI_FUTURE_DOMAIN is not set # CONFIG_SCSI_GDTH is not set # CONFIG_SCSI_IPS is not set # CONFIG_SCSI_INIA100 is not set CONFIG_SCSI_SYM53C8XX_2=y CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=0 CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16 CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64 # CONFIG_SCSI_SYM53C8XX_IOMAPPED is not set # CONFIG_SCSI_QLOGIC_ISP is not set # CONFIG_SCSI_QLOGIC_FC is not set # CONFIG_SCSI_QLOGIC_1280 is not set # CONFIG_SCSI_DC395x is not set # CONFIG_SCSI_NSP32 is not set # CONFIG_SCSI_DEBUG is not set # # Multi-device support (RAID and LVM) # CONFIG_MD=y CONFIG_BLK_DEV_MD=y CONFIG_MD_LINEAR=y CONFIG_MD_RAID0=y CONFIG_MD_RAID1=y CONFIG_MD_RAID5=y # CONFIG_MD_MULTIPATH is not set CONFIG_BLK_DEV_DM=y CONFIG_DM_IOCTL_V4=y # # Fusion MPT device support # # CONFIG_FUSION is not set # # IEEE 1394 (FireWire) support (EXPERIMENTAL) # # CONFIG_IEEE1394 is not set # # I2O device support # # CONFIG_I2O is not set # # Networking support # CONFIG_NET=y # # Networking options # CONFIG_PACKET=y CONFIG_PACKET_MMAP=y CONFIG_NETLINK_DEV=y CONFIG_UNIX=y 
CONFIG_NET_KEY=y CONFIG_INET=y CONFIG_IP_MULTICAST=y # CONFIG_IP_ADVANCED_ROUTER is not set # CONFIG_IP_PNP is not set # CONFIG_NET_IPIP is not set # CONFIG_NET_IPGRE is not set # CONFIG_IP_MROUTE is not set # CONFIG_ARPD is not set CONFIG_INET_ECN=y CONFIG_SYN_COOKIES=y CONFIG_INET_AH=y CONFIG_INET_ESP=y CONFIG_INET_IPCOMP=y # CONFIG_IPV6 is not set # CONFIG_DECNET is not set # CONFIG_BRIDGE is not set # CONFIG_NETFILTER is not set CONFIG_XFRM=y # CONFIG_XFRM_USER is not set # # SCTP Configuration (EXPERIMENTAL) # CONFIG_IPV6_SCTP__=y # CONFIG_IP_SCTP is not set # CONFIG_ATM is not set # CONFIG_VLAN_8021Q is not set # CONFIG_LLC2 is not set # CONFIG_IPX is not set # CONFIG_ATALK is not set # CONFIG_X25 is not set # CONFIG_LAPB is not set # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set # CONFIG_NET_FASTROUTE is not set # CONFIG_NET_HW_FLOWCONTROL is not set # # QoS and/or fair queueing # # CONFIG_NET_SCHED is not set # # Network testing # # CONFIG_NET_PKTGEN is not set CONFIG_NETDEVICES=y # # ARCnet devices # # CONFIG_ARCNET is not set # CONFIG_DUMMY is not set # CONFIG_BONDING is not set # CONFIG_EQUALIZER is not set # CONFIG_TUN is not set # CONFIG_ETHERTAP is not set # # Ethernet (10 or 100Mbit) # CONFIG_NET_ETHERNET=y CONFIG_MII=y # CONFIG_OAKNET is not set # CONFIG_HAPPYMEAL is not set # CONFIG_SUNGEM is not set # CONFIG_NET_VENDOR_3COM is not set # # Tulip family network device support # # CONFIG_NET_TULIP is not set # CONFIG_HP100 is not set CONFIG_NET_PCI=y CONFIG_PCNET32=y # CONFIG_AMD8111_ETH is not set # CONFIG_ADAPTEC_STARFIRE is not set # CONFIG_B44 is not set # CONFIG_DGRS is not set # CONFIG_EEPRO100 is not set # CONFIG_E100 is not set # CONFIG_FEALNX is not set # CONFIG_NATSEMI is not set # CONFIG_NE2K_PCI is not set # CONFIG_8139CP is not set # CONFIG_8139TOO is not set # CONFIG_SIS900 is not set # CONFIG_EPIC100 is not set # CONFIG_SUNDANCE is not set # CONFIG_VIA_RHINE is not set # # Ethernet (1000 Mbit) # 
# CONFIG_ACENIC is not set # CONFIG_DL2K is not set # CONFIG_E1000 is not set # CONFIG_NS83820 is not set # CONFIG_HAMACHI is not set # CONFIG_YELLOWFIN is not set # CONFIG_R8169 is not set # CONFIG_SIS190 is not set # CONFIG_SK98LIN is not set # CONFIG_TIGON3 is not set # # Ethernet (10000 Mbit) # # CONFIG_IXGB is not set # CONFIG_FDDI is not set # CONFIG_HIPPI is not set # CONFIG_PPP is not set # CONFIG_SLIP is not set # # Wireless LAN (non-hamradio) # # CONFIG_NET_RADIO is not set # # Token Ring devices # # CONFIG_TR is not set # CONFIG_NET_FC is not set # CONFIG_SHAPER is not set # # Wan interfaces # # CONFIG_WAN is not set # # Amateur Radio support # # CONFIG_HAMRADIO is not set # # IrDA (infrared) support # # CONFIG_IRDA is not set # # Bluetooth support # # CONFIG_BT is not set # # ISDN subsystem # # CONFIG_ISDN_BOOL is not set # # Telephony Support # # CONFIG_PHONE is not set # # Input device support # CONFIG_INPUT=y # # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y CONFIG_INPUT_MOUSEDEV_PSAUX=y CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 # CONFIG_INPUT_JOYDEV is not set # CONFIG_INPUT_TSDEV is not set # CONFIG_INPUT_EVDEV is not set # CONFIG_INPUT_EVBUG is not set # # Input I/O drivers # # CONFIG_GAMEPORT is not set CONFIG_SOUND_GAMEPORT=y CONFIG_SERIO=y CONFIG_SERIO_I8042=y # CONFIG_SERIO_SERPORT is not set # CONFIG_SERIO_CT82C710 is not set # CONFIG_SERIO_PCIPS2 is not set # # Input Device Drivers # CONFIG_INPUT_KEYBOARD=y CONFIG_KEYBOARD_ATKBD=y # CONFIG_KEYBOARD_SUNKBD is not set # CONFIG_KEYBOARD_XTKBD is not set # CONFIG_KEYBOARD_NEWTON is not set CONFIG_INPUT_MOUSE=y CONFIG_MOUSE_PS2=y # CONFIG_MOUSE_PS2_SYNAPTICS is not set # CONFIG_MOUSE_SERIAL is not set # CONFIG_INPUT_JOYSTICK is not set # CONFIG_INPUT_TOUCHSCREEN is not set CONFIG_INPUT_MISC=y CONFIG_INPUT_PCSPKR=y # CONFIG_INPUT_UINPUT is not set # # Character devices # CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_HW_CONSOLE=y # CONFIG_SERIAL_NONSTANDARD is not set # # 
Serial drivers # CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_8250_NR_UARTS=4 # CONFIG_SERIAL_8250_EXTENDED is not set # # Non-8250 serial port support # CONFIG_SERIAL_CORE=y CONFIG_SERIAL_CORE_CONSOLE=y CONFIG_UNIX98_PTYS=y CONFIG_UNIX98_PTY_COUNT=256 CONFIG_HVC_CONSOLE=y # # I2C support # # CONFIG_I2C is not set # # I2C Algorithms # # # I2C Hardware Bus support # # # I2C Hardware Sensors Chip support # # CONFIG_I2C_SENSOR is not set # # Mice # # CONFIG_BUSMOUSE is not set # CONFIG_QIC02_TAPE is not set # # IPMI # # CONFIG_IPMI_HANDLER is not set # # Watchdog Cards # # CONFIG_WATCHDOG is not set # CONFIG_NVRAM is not set # CONFIG_RTC is not set # CONFIG_GEN_RTC is not set # CONFIG_DTLK is not set # CONFIG_R3964 is not set # CONFIG_APPLICOM is not set # # Ftape, the floppy tape device driver # # CONFIG_AGP is not set # CONFIG_DRM is not set CONFIG_RAW_DRIVER=y CONFIG_MAX_RAW_DEVS=256 # # Multimedia devices # # CONFIG_VIDEO_DEV is not set # # Digital Video Broadcasting Devices # # CONFIG_DVB is not set # # File systems # CONFIG_EXT2_FS=y CONFIG_EXT2_FS_XATTR=y CONFIG_EXT2_FS_POSIX_ACL=y # CONFIG_EXT2_FS_SECURITY is not set CONFIG_EXT3_FS=y CONFIG_EXT3_FS_XATTR=y CONFIG_EXT3_FS_POSIX_ACL=y # CONFIG_EXT3_FS_SECURITY is not set CONFIG_JBD=y # CONFIG_JBD_DEBUG is not set CONFIG_FS_MBCACHE=y CONFIG_REISERFS_FS=y # CONFIG_REISERFS_CHECK is not set # CONFIG_REISERFS_PROC_INFO is not set # CONFIG_JFS_FS is not set CONFIG_FS_POSIX_ACL=y # CONFIG_XFS_FS is not set # CONFIG_MINIX_FS is not set # CONFIG_ROMFS_FS is not set # CONFIG_QUOTA is not set CONFIG_AUTOFS_FS=y # CONFIG_AUTOFS4_FS is not set # # CD-ROM/DVD Filesystems # CONFIG_ISO9660_FS=y # CONFIG_JOLIET is not set # CONFIG_ZISOFS is not set # CONFIG_UDF_FS is not set # # DOS/FAT/NT Filesystems # CONFIG_FAT_FS=y CONFIG_MSDOS_FS=y CONFIG_VFAT_FS=y # CONFIG_NTFS_FS is not set # # Pseudo filesystems # CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y # CONFIG_DEVFS_FS is not set CONFIG_DEVPTS_FS=y 
CONFIG_DEVPTS_FS_XATTR=y # CONFIG_DEVPTS_FS_SECURITY is not set CONFIG_TMPFS=y # CONFIG_HUGETLBFS is not set # CONFIG_HUGETLB_PAGE is not set CONFIG_RAMFS=y # # Miscellaneous filesystems # # CONFIG_ADFS_FS is not set # CONFIG_AFFS_FS is not set # CONFIG_HFS_FS is not set # CONFIG_BEFS_FS is not set # CONFIG_BFS_FS is not set # CONFIG_EFS_FS is not set CONFIG_CRAMFS=y # CONFIG_VXFS_FS is not set # CONFIG_HPFS_FS is not set # CONFIG_QNX4FS_FS is not set # CONFIG_SYSV_FS is not set # CONFIG_UFS_FS is not set # # Network File Systems # CONFIG_NFS_FS=y CONFIG_NFS_V3=y CONFIG_NFS_V4=y # CONFIG_NFS_DIRECTIO is not set # CONFIG_NFSD is not set CONFIG_LOCKD=y CONFIG_LOCKD_V4=y # CONFIG_EXPORTFS is not set CONFIG_SUNRPC=y # CONFIG_SUNRPC_GSS is not set # CONFIG_SMB_FS is not set # CONFIG_CIFS is not set # CONFIG_NCP_FS is not set # CONFIG_CODA_FS is not set # CONFIG_INTERMEZZO_FS is not set # CONFIG_AFS_FS is not set # # Partition Types # # CONFIG_PARTITION_ADVANCED is not set CONFIG_MSDOS_PARTITION=y CONFIG_NLS=y # # Native Language Support # CONFIG_NLS_DEFAULT="iso8859-1" # CONFIG_NLS_CODEPAGE_437 is not set # CONFIG_NLS_CODEPAGE_737 is not set # CONFIG_NLS_CODEPAGE_775 is not set # CONFIG_NLS_CODEPAGE_850 is not set # CONFIG_NLS_CODEPAGE_852 is not set # CONFIG_NLS_CODEPAGE_855 is not set # CONFIG_NLS_CODEPAGE_857 is not set # CONFIG_NLS_CODEPAGE_860 is not set # CONFIG_NLS_CODEPAGE_861 is not set # CONFIG_NLS_CODEPAGE_862 is not set # CONFIG_NLS_CODEPAGE_863 is not set # CONFIG_NLS_CODEPAGE_864 is not set # CONFIG_NLS_CODEPAGE_865 is not set # CONFIG_NLS_CODEPAGE_866 is not set # CONFIG_NLS_CODEPAGE_869 is not set # CONFIG_NLS_CODEPAGE_936 is not set # CONFIG_NLS_CODEPAGE_950 is not set # CONFIG_NLS_CODEPAGE_932 is not set # CONFIG_NLS_CODEPAGE_949 is not set # CONFIG_NLS_CODEPAGE_874 is not set # CONFIG_NLS_ISO8859_8 is not set # CONFIG_NLS_CODEPAGE_1250 is not set # CONFIG_NLS_CODEPAGE_1251 is not set # CONFIG_NLS_ISO8859_1 is not set # CONFIG_NLS_ISO8859_2 is not set 
# CONFIG_NLS_ISO8859_3 is not set # CONFIG_NLS_ISO8859_4 is not set # CONFIG_NLS_ISO8859_5 is not set # CONFIG_NLS_ISO8859_6 is not set # CONFIG_NLS_ISO8859_7 is not set # CONFIG_NLS_ISO8859_9 is not set # CONFIG_NLS_ISO8859_13 is not set # CONFIG_NLS_ISO8859_14 is not set # CONFIG_NLS_ISO8859_15 is not set # CONFIG_NLS_KOI8_R is not set # CONFIG_NLS_KOI8_U is not set # CONFIG_NLS_UTF8 is not set # # Graphics support # # CONFIG_FB is not set # # Console display driver support # # CONFIG_VGA_CONSOLE is not set # CONFIG_MDA_CONSOLE is not set CONFIG_DUMMY_CONSOLE=y # # Sound # # CONFIG_SOUND is not set # # USB support # # CONFIG_USB is not set # CONFIG_USB_GADGET is not set # # Profiling support # CONFIG_PROFILING=y CONFIG_OPROFILE=y # # Kernel hacking # CONFIG_DEBUG_KERNEL=y # CONFIG_DEBUG_SLAB is not set # CONFIG_MAGIC_SYSRQ is not set # CONFIG_XMON is not set CONFIG_KDB=y # CONFIG_KDB_OFF is not set # CONFIG_PPCDBG is not set # CONFIG_DEBUG_INFO is not set # # Security options # # CONFIG_SECURITY is not set # # Cryptographic options # CONFIG_CRYPTO=y CONFIG_CRYPTO_HMAC=y CONFIG_CRYPTO_NULL=y CONFIG_CRYPTO_MD4=y CONFIG_CRYPTO_MD5=y CONFIG_CRYPTO_SHA1=y CONFIG_CRYPTO_SHA256=y CONFIG_CRYPTO_SHA512=y CONFIG_CRYPTO_DES=y CONFIG_CRYPTO_BLOWFISH=y CONFIG_CRYPTO_TWOFISH=y # CONFIG_CRYPTO_SERPENT is not set CONFIG_CRYPTO_AES=y CONFIG_CRYPTO_CAST5=y CONFIG_CRYPTO_CAST6=y CONFIG_CRYPTO_DEFLATE=y CONFIG_CRYPTO_TEST=y # # Library routines # CONFIG_CRC32=y CONFIG_ZLIB_INFLATE=y CONFIG_ZLIB_DEFLATE=y From amodra at bigpond.net.au Tue Nov 11 09:35:16 2003 From: amodra at bigpond.net.au (Alan Modra) Date: Tue, 11 Nov 2003 09:05:16 +1030 Subject: gcc33 creates a zImage which doesnt boot In-Reply-To: <20031110215428.GA21424@suse.de> References: <20031110215428.GA21424@suse.de> Message-ID: <20031110223516.GI2506@bubble.sa.bigpond.net.au> On Mon, Nov 10, 2003 at 10:54:28PM +0100, Olaf Hering wrote: > Good morning, > > the attached .config creates a zImage which boots on a p610 when > 
compiled with SLES8 gcc3.2.2, but not with gcc 3.3.2. Hi Olaf, > CONFIG_64BIT=y This means you use -mcpu=power4 doesn't it? Probably a bad idea on power3 processors. Likely gcc-3.3 is generating some mtcrf insns with a single cr field destination, and gas is then using the power4 form of the instruction. -- Alan Modra IBM OzLabs - Linux Technology Centre ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From paulus at samba.org Tue Nov 11 09:44:06 2003 From: paulus at samba.org (Paul Mackerras) Date: Tue, 11 Nov 2003 09:44:06 +1100 Subject: gcc33 creates a zImage which doesnt boot In-Reply-To: <20031110215428.GA21424@suse.de> References: <20031110215428.GA21424@suse.de> Message-ID: <16304.5302.329925.535567@cargo.ozlabs.ibm.com> Olaf Hering writes: > the attached .config creates a zImage which boots on a p610 when > compiled with SLES8 gcc3.2.2, but not with gcc 3.3.2. Which kernel tree? ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From amodra at bigpond.net.au Tue Nov 11 10:03:24 2003 From: amodra at bigpond.net.au (Alan Modra) Date: Tue, 11 Nov 2003 09:33:24 +1030 Subject: gcc33 creates a zImage which doesnt boot In-Reply-To: <20031110223516.GI2506@bubble.sa.bigpond.net.au> References: <20031110215428.GA21424@suse.de> <20031110223516.GI2506@bubble.sa.bigpond.net.au> Message-ID: <20031110230324.GK2506@bubble.sa.bigpond.net.au> On Tue, Nov 11, 2003 at 09:05:16AM +1030, Alan Modra wrote: > > On Mon, Nov 10, 2003 at 10:54:28PM +0100, Olaf Hering wrote: > > Good morning, > > > > the attached .config creates a zImage which boots on a p610 when > > compiled with SLES8 gcc3.2.2, but not with gcc 3.3.2. > > Hi Olaf, > > > CONFIG_64BIT=y > > This means you use -mcpu=power4 doesn't it? OK, so it doesn't mean that at all. I was confusing this one with Anton's new option to choose power4 compilation. 
However, if you've hacked the makefiles and are somehow getting -mpower4 being passed to gas, the following comment is still true. > Probably a bad idea on > power3 processors. Likely gcc-3.3 is generating some mtcrf insns with a > single cr field destination, and gas is then using the power4 form of > the instruction. Incidentally, I think gcc-3.3 is more likely to generate single field mtcrf insns than gcc-3.2. -- Alan Modra IBM OzLabs - Linux Technology Centre ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Tue Nov 11 10:11:53 2003 From: anton at samba.org (Anton Blanchard) Date: Tue, 11 Nov 2003 10:11:53 +1100 Subject: 2.6.0 kernel: Bind interrupt question. In-Reply-To: References: Message-ID: <20031110231153.GG930@krispykreme> > You're right. By default, the CONFIG_NR_CPUS is set to 32. > I need to reset that and rebuild the kernel to try the interrupt binding > again. Well it should work with NR_CPUS == 32 as well. Sounds like a bug. Anton ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Tue Nov 11 17:24:16 2003 From: anton at samba.org (Anton Blanchard) Date: Tue, 11 Nov 2003 17:24:16 +1100 Subject: important ppc64 bug fixes Message-ID: <20031111062416.GH930@krispykreme> Hi, I had a look at our outstanding patches and came up with the following list that could potentially be merged before 2.6.0. Even if we don't get them merged, I want to make sure we don't lose track of any bug fixes that are floating around. Am I missing anything in the bug fix category which isn't in Linus' tree?
- PCI IO windows can start at 0x0 (Anton)
- revert IRQ_INPROGRESS change (Anton)
- Fix POWER4 compile (Anton/Hollis/Segher)
- Limit stack growth (Anil/Anton)
- fix sleep inside spinlock in HVC console (Nathan)
- fix signal wakeup race due to unordered access of SIGPENDING and TASK_INTERRUPTIBLE (Anton)
- fix interrupt affinity bugs (Anton)
- add proper sched_clock (???)
Anton ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olh at suse.de Tue Nov 11 17:30:34 2003 From: olh at suse.de (Olaf Hering) Date: Tue, 11 Nov 2003 07:30:34 +0100 Subject: gcc33 creates a zImage which doesnt boot In-Reply-To: <20031110230324.GK2506@bubble.sa.bigpond.net.au> References: <20031110215428.GA21424@suse.de> <20031110223516.GI2506@bubble.sa.bigpond.net.au> <20031110230324.GK2506@bubble.sa.bigpond.net.au> Message-ID: <20031111063034.GA22520@suse.de> On Tue, Nov 11, Alan Modra wrote: > On Tue, Nov 11, 2003 at 09:05:16AM +1030, Alan Modra wrote: > > > > On Mon, Nov 10, 2003 at 10:54:28PM +0100, Olaf Hering wrote: > > > Good morning, > > > > > > the attached .config creates a zImage which boots on a p610 when > > > compiled with SLES8 gcc3.2.2, but not with gcc 3.3.2. > > > > Hi Olaf, > > > > > CONFIG_64BIT=y > > > > This means you use -mcpu=power4 doesn't it? > > OK, so it doesn't mean that at all. I was confusing this one with > Anton's new option to choose power4 compilation. However, if you've > hacked the makefiles and are somehow getting -mpower4 being passed to > gas, the following comment is still true. > > > Probably a bad idea on > > power3 processors. Likely gcc-3.3 is generating some mtcrf insns with a > > single cr field destination, and gas is then using the power4 form of > > the instruction. > > Incidentally, I think gcc-3.3 is more likely to generate single field > mtcrf insns than gcc-3.2. It is exactly the same source, ameslab 2.5 from yesterday. I had to tweak compiler.h for 3.2.2. (people/akpm/patches/2.6/2.6.0-test9/2.6.0-test9-mm2/broken-out/ppc64-reloc_hide.patch) I tried to use -mcpu=power3 instead of -mcpu=power4 with the gcc33 build, but that did not help. I will copy the 3.2 .o files into the 3.3 tree and see where it breaks. -- USB is for mice, FireWire is for men! sUse lINUX ag, nÜRNBERG ** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/ From anton at samba.org Tue Nov 11 17:35:34 2003 From: anton at samba.org (Anton Blanchard) Date: Tue, 11 Nov 2003 17:35:34 +1100 Subject: gcc33 creates a zImage which doesnt boot In-Reply-To: <20031111063034.GA22520@suse.de> References: <20031110215428.GA21424@suse.de> <20031110223516.GI2506@bubble.sa.bigpond.net.au> <20031110230324.GK2506@bubble.sa.bigpond.net.au> <20031111063034.GA22520@suse.de> Message-ID: <20031111063534.GI930@krispykreme> > It is exactly the same source, ameslab 2.5 from yesterday. I had to > tweak compiler.h for 3.2.2. > (people/akpm/patches/2.6/2.6.0-test9/2.6.0-test9-mm2/broken-out/ppc64-reloc_hide.patch) > > I tried to use -mcpu=power3 instead of -mcpu=power4 with the gcc33 > build, but that did not help. I will copy the 3.2 .o files into the 3.3 > tree and see where it breaks. What was the exact error when -mcpu=power4 wasn't added? Anton ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Tue Nov 11 17:37:32 2003 From: anton at samba.org (Anton Blanchard) Date: Tue, 11 Nov 2003 17:37:32 +1100 Subject: important ppc64 bug fixes In-Reply-To: <20031111062416.GH930@krispykreme> References: <20031111062416.GH930@krispykreme> Message-ID: <20031111063732.GJ930@krispykreme> > - PCI IO windows can start at 0x0 (Anton) The current PCI probe code ignores PCI IO windows that start at 0x0. A window starting at 0x0 is legal and can be found on some ppc64 boxes.
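The effect of the one-line change can be seen in isolation with a small stand-alone sketch. The two helper names below are hypothetical and are not kernel functions; they only model the condition that guards the resource setup in pci_read_bridge_bases(), under the assumption that base and limit have already been assembled from the bridge registers.

```c
/*
 * Hypothetical helpers, not kernel code: they model only the test that
 * decides whether a bridge I/O window is accepted.
 */

/* Old condition: a window whose base is 0x0 is always rejected. */
static int window_valid_old(unsigned long base, unsigned long limit)
{
	return base && base <= limit;
}

/* New condition: only an inverted range (base > limit) is rejected. */
static int window_valid_new(unsigned long base, unsigned long limit)
{
	return base <= limit;
}
```

With base 0x0 and limit 0xfff the old condition rejects the window and the new one accepts it; a window at 0x1000-0x1fff is accepted by both, and an inverted range such as base 0x2000 with limit 0x1000 is still rejected.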
foo-anton/drivers/pci/probe.c | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) diff -puN drivers/pci/probe.c~pci_patch drivers/pci/probe.c --- foo/drivers/pci/probe.c~pci_patch 2003-09-06 21:23:49.000000000 -0500 +++ foo-anton/drivers/pci/probe.c 2003-09-06 21:23:49.000000000 -0500 @@ -176,7 +176,7 @@ void __devinit pci_read_bridge_bases(str limit |= (io_limit_hi << 16); } - if (base && base <= limit) { + if (base <= limit) { res->flags = (io_base_lo & PCI_IO_RANGE_TYPE_MASK) | IORESOURCE_IO; res->start = base; res->end = limit + 0xfff; _ ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Tue Nov 11 17:37:59 2003 From: anton at samba.org (Anton Blanchard) Date: Tue, 11 Nov 2003 17:37:59 +1100 Subject: important ppc64 bug fixes In-Reply-To: <20031111062416.GH930@krispykreme> References: <20031111062416.GH930@krispykreme> Message-ID: <20031111063758.GK930@krispykreme> > - revert IRQ_INPROGRESS change (Anton) Revert the IRQ_INPROGRESS fix as x86 has done recently. --- linux-2.5/arch/ppc64/kernel/irq.c 2003-10-18 07:30:09.000000000 +1000 +++ for-linus-ppc64/arch/ppc64/kernel/irq.c 2003-11-11 17:12:39.508732513 +1100 @@ -300,7 +300,7 @@ spin_lock_irqsave(&desc->lock, flags); switch (desc->depth) { case 1: { - unsigned int status = desc->status & ~(IRQ_DISABLED | IRQ_INPROGRESS); + unsigned int status = desc->status & ~IRQ_DISABLED; desc->status = status; if ((status & (IRQ_PENDING | IRQ_REPLAY)) == IRQ_PENDING) { desc->status = status | IRQ_REPLAY; ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From olh at suse.de Tue Nov 11 17:42:41 2003 From: olh at suse.de (Olaf Hering) Date: Tue, 11 Nov 2003 07:42:41 +0100 Subject: gcc33 creates a zImage which doesnt boot In-Reply-To: <20031111063534.GI930@krispykreme> References: <20031110215428.GA21424@suse.de> <20031110223516.GI2506@bubble.sa.bigpond.net.au> <20031110230324.GK2506@bubble.sa.bigpond.net.au> <20031111063034.GA22520@suse.de> <20031111063534.GI930@krispykreme> Message-ID: <20031111064241.GD22520@suse.de> On Tue, Nov 11, Anton Blanchard wrote: > > > It is exactly the same source, ameslab 2.5 from yesterday. I had to > > tweak compiler.h for 3.2.2. > > (people/akpm/patches/2.6/2.6.0-test9/2.6.0-test9-mm2/broken-out/ppc64-reloc_hide.patch) > > > > I tried to use -mcpu=power3 instead of -mcpu=power4 with the gcc33 > > build, but that did not help. I will copy the 3.2 .o files into the 3.3 > > tree and see where it breaks. > > What was the exact error when -mcpu=power4 wasn't added? There was no error. But I just realized that '-mcpu=power4' is the wrong if branch in the Makefile, so my change was never active. Earlier debugging showed that it dies in lmb_alloc_base(), probably in the while() loop where lmb_overlaps_region() is called. -- USB is for mice, FireWire is for men! sUse lINUX ag, nÜRNBERG ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Tue Nov 11 17:43:36 2003 From: anton at samba.org (Anton Blanchard) Date: Tue, 11 Nov 2003 17:43:36 +1100 Subject: important ppc64 bug fixes In-Reply-To: <20031111062416.GH930@krispykreme> References: <20031111062416.GH930@krispykreme> Message-ID: <20031111064336.GL930@krispykreme> > - Limit stack growth (Anil/Anton) I cleaned up Anil's patch a bit. I'm a little worried by the two constants (1MB and 2kB); do they still hold?
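For readers who want to try the instruction test outside the kernel, here is a user-space sketch of the decode performed by store_updates_sp() in the patch. It is not the patch itself: the function name insn_store_updates_sp is hypothetical, it takes the instruction word directly instead of fetching it from regs->nip with get_user(), and it assumes one refinement the posted patch does not make, namely that primary opcode 62 is shared by std and stdu and that the low two bits select stdu. The mnemonics in the comments follow the PowerPC encoding tables as I understand them.

```c
#include <stdint.h>

/*
 * User-space sketch, not the kernel function. Returns 1 if the 32-bit
 * PowerPC instruction word is a store-with-update whose rA field names
 * r1 (the stack pointer), and 0 otherwise.
 */
static int insn_store_updates_sp(uint32_t inst)
{
	/* the rA field must be 1, i.e. the stack pointer */
	if (((inst >> 16) & 0x1f) != 1)
		return 0;

	switch (inst >> 26) {		/* primary opcode */
	case 37:	/* stwu */
	case 39:	/* stbu */
	case 45:	/* sthu */
	case 53:	/* stfsu */
	case 55:	/* stfdu */
		return 1;
	case 62:
		/*
		 * Assumption beyond the posted patch: std and stdu share
		 * this opcode, and low two bits == 1 selects stdu.
		 */
		return (inst & 3) == 1;
	case 31:
		switch ((inst >> 1) & 0x3ff) {	/* extended opcode */
		case 181:	/* stdux */
		case 183:	/* stwux */
		case 247:	/* stbux */
		case 439:	/* sthux */
		case 695:	/* stfsux */
		case 759:	/* stfdux */
			return 1;
		}
	}

	return 0;
}
```

As a usage check, 0x9421fff0 (stwu r1,-16(r1)), 0xf821ff91 (stdu r1,-112(r1)) and 0x7c21016a (stdux r1,r1,r0) all return 1, while 0x90010008 (stw r0,8(r1)), 0xf8010010 (std r0,16(r1)) and 0x9464fff0 (stwu r3,-16(r4)) return 0.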
foo_work-anton/arch/ppc64/mm/fault.c | 72 ++++++++++++++++++++++++++++++++++ 1 files changed, 72 insertions(+) diff -puN arch/ppc64/mm/fault.c~stackgrowth arch/ppc64/mm/fault.c --- foo_work/arch/ppc64/mm/fault.c~stackgrowth 2003-11-10 21:05:47.950740614 -0600 +++ foo_work-anton/arch/ppc64/mm/fault.c 2003-11-11 00:38:52.905989057 -0600 @@ -46,6 +46,46 @@ int debugger_kernel_faults = 1; void bad_page_fault(struct pt_regs *, unsigned long, int); /* + * Check whether the instruction at regs->nip is a store using an + * update addressing form which will update sp + */ +static int store_updates_sp(struct pt_regs *regs) +{ + unsigned int inst; + + if (get_user(inst, (unsigned int *)regs->nip)) + return 0; + + /* check for 1 in the rA field */ + if (((inst >> 16) & 0x1f) != 1) + return 0; + + /* check major opcode */ + switch (inst >> 26) { + case 37: /* stwu */ + case 39: /* stbu */ + case 45: /* sthu */ + case 53: /* stfsu */ + case 55: /* stfdu */ + case 62: /* stdu */ + return 1; + case 31: + /* check minor opcode */ + switch((inst >> 1) & 0x3ff) { + case 181: /* stwux */ + case 183: /* stbux */ + case 247: /* stdux */ + case 439: /* sthux */ + case 695: /* stfsux */ + case 759: /* stfdux */ + return 1; + } + } + + return 0; +} + +/* * The error_code parameter is * - DSISR for a non-SLB data access fault, * - SRR1 & 0x08000000 for a non-SLB instruction access fault @@ -96,6 +136,38 @@ void do_page_fault(struct pt_regs *regs, } if (!(vma->vm_flags & VM_GROWSDOWN)) goto bad_area; + + /* + * N.B. The ppc64 ABI allows programs to access up to 288 + * bytes below the stack pointer. + * The kernel signal delivery code writes up to about 1.5kB + * below the stack pointer (r1) before decrementing it. + * The exec code can write slightly over 640kB to the stack + * before setting the user r1. Thus we allow the stack to + * expand to 1MB without further checks. 
+ */ + if (address + 0x100000 < vma->vm_end) { + /* get user regs even if this fault is in kernel mode */ + struct pt_regs *uregs = current->thread.regs; + if (uregs == NULL) + goto bad_area; + + /* + * A user-mode access to an address a long way below + * the stack pointer is only valid if the instruction + * is one which would update the stack pointer to the + * address accessed if the instruction completed, + * i.e. either stwu rs,n(r1) or stwux rs,r1,rb + * (or the byte, halfword, float or double forms). + * + * If we don't check this then any write to the area + * between the last mapped region and the stack will + * expand the stack rather than segfaulting. + */ + if (address + 2048 < uregs->gpr[1] + && (!user_mode(regs) || !store_updates_sp(regs))) + goto bad_area; + } if (expand_stack(vma, address)) goto bad_area; _ ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Tue Nov 11 17:47:13 2003 From: anton at samba.org (Anton Blanchard) Date: Tue, 11 Nov 2003 17:47:13 +1100 Subject: gcc33 creates a zImage which doesnt boot In-Reply-To: <20031111064241.GD22520@suse.de> References: <20031110215428.GA21424@suse.de> <20031110223516.GI2506@bubble.sa.bigpond.net.au> <20031110230324.GK2506@bubble.sa.bigpond.net.au> <20031111063034.GA22520@suse.de> <20031111063534.GI930@krispykreme> <20031111064241.GD22520@suse.de> Message-ID: <20031111064713.GM930@krispykreme> > Earlier debugging showed that it dies in lmb_alloc_base(), probably in > the while() loop where lmb_overlaps_region() is called. Im willing to bet you a beer thats the first single field mtcrf instruction you execute. It was dying in the lmb code in rtas_instantiate when I first debugged the -mcpu=power4 bug :) Anton ** Sent via the linuxppc64-dev mail list. 
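A user-space sketch of the instruction check from the stack-growth patch above, assuming the same field layout the patch uses (major opcode in the top 6 bits, rA in bits 16-20, extended opcode in bits 1-10). The name insn_updates_sp is hypothetical; unlike store_updates_sp() it takes the instruction word directly rather than fetching it with get_user().

```c
#include <stdint.h>

/*
 * Hypothetical user-space analogue of store_updates_sp(): returns 1 if
 * the instruction word is one of the update-form stores listed in the
 * patch and its base register (rA) is r1, the stack pointer; 0 otherwise.
 */
int insn_updates_sp(uint32_t inst)
{
	/* rA field (bits 16-20) must be 1, i.e. r1 */
	if (((inst >> 16) & 0x1f) != 1)
		return 0;

	switch (inst >> 26) {		/* major opcode */
	case 37: case 39: case 45:	/* update-form stores with a displacement */
	case 53: case 55: case 62:
		return 1;
	case 31:			/* indexed forms: check extended opcode */
		switch ((inst >> 1) & 0x3ff) {
		case 181: case 183: case 247:
		case 439: case 695: case 759:
			return 1;
		}
		break;
	}
	return 0;
}
```

For instance, 0x9421fff0 (stwu r1,-16(r1)) passes the check, while 0x90010004 (stw r0,4(r1)) does not, because plain stw does not update its base register.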
See http://lists.linuxppc.org/ From olh at suse.de Tue Nov 11 17:59:27 2003 From: olh at suse.de (Olaf Hering) Date: Tue, 11 Nov 2003 07:59:27 +0100 Subject: gcc33 creates a zImage which doesnt boot In-Reply-To: <20031111064713.GM930@krispykreme> References: <20031110215428.GA21424@suse.de> <20031110223516.GI2506@bubble.sa.bigpond.net.au> <20031110230324.GK2506@bubble.sa.bigpond.net.au> <20031111063034.GA22520@suse.de> <20031111063534.GI930@krispykreme> <20031111064241.GD22520@suse.de> <20031111064713.GM930@krispykreme> Message-ID: <20031111065927.GA6904@suse.de> On Tue, Nov 11, Anton Blanchard wrote: > > > Earlier debugging showed that it dies in lmb_alloc_base(), probably in > > the while() loop where lmb_overlaps_region() is called. > > Im willing to bet you a beer thats the first single field mtcrf > instruction you execute. It was dying in the lmb code in > rtas_instantiate when I first debugged the -mcpu=power4 bug :) I disabled this line in arch/ppc64/Makefile and replaced all tlbiel calls with kdb(). #CFLAGS += -mtune=power4 -Wa,-mpower4 This helps. -- USB is for mice, FireWire is for men! sUse lINUX ag, nÜRNBERG ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olh at suse.de Tue Nov 11 22:48:16 2003 From: olh at suse.de (Olaf Hering) Date: Tue, 11 Nov 2003 12:48:16 +0100 Subject: gcc33 creates a zImage which doesnt boot In-Reply-To: <20031111065927.GA6904@suse.de> References: <20031110215428.GA21424@suse.de> <20031110223516.GI2506@bubble.sa.bigpond.net.au> <20031110230324.GK2506@bubble.sa.bigpond.net.au> <20031111063034.GA22520@suse.de> <20031111063534.GI930@krispykreme> <20031111064241.GD22520@suse.de> <20031111064713.GM930@krispykreme> <20031111065927.GA6904@suse.de> Message-ID: <20031111114816.GC14718@suse.de> On Tue, Nov 11, Olaf Hering wrote: > This helps.
This one helps as well, with recent binutils: --- arch/ppc64/Makefile~ 2003-10-30 20:19:37.000000000 +0100 +++ arch/ppc64/Makefile 2003-11-11 08:48:58.000000000 +0100 @@ -33,7 +33,7 @@ CFLAGS += -msoft-float -pipe -Wno-unini ifeq ($(CONFIG_POWER4_ONLY),y) CFLAGS += -mcpu=power4 else -CFLAGS += -mtune=power4 -Wa,-mpower4 +CFLAGS += -mtune=power4 -Wa,-many endif have_zero_bss := $(shell if $(CC) -fno-zero-initialized-in-bss -S -o /dev/null -xc /dev/null > /dev/null 2>&1; then echo y; else echo n; fi) -- USB is for mice, FireWire is for men! sUse lINUX ag, nÜRNBERG ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From hollisb at us.ibm.com Wed Nov 12 01:21:17 2003 From: hollisb at us.ibm.com (Hollis Blanchard) Date: Tue, 11 Nov 2003 08:21:17 -0600 Subject: gcc33 creates a zImage which doesnt boot In-Reply-To: <20031111065927.GA6904@suse.de> Message-ID: <50879E80-1452-11D8-84DC-000A95A0560C@us.ibm.com> On Tuesday, Nov 11, 2003, at 00:59 US/Central, Olaf Hering wrote: > > I disabled this line in arch/ppc64/Makefile and replaced all tlbiel > calls with kdb(). > > #CFLAGS += -mtune=power4 -Wa,-mpower4 > > This helps. Yes, Alan already explained that -Wa,-mpower4 is a bad idea because it frees the assembler to use power4-specific instructions. The latest idea was Anton's patch, which also replaces tlbiel with a .long directive (in addition to playing with the compiler flags). -- Hollis Blanchard IBM Linux Technology Center ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Wed Nov 12 04:12:13 2003 From: anton at samba.org (Anton Blanchard) Date: Wed, 12 Nov 2003 04:12:13 +1100 Subject: important ppc64 bug fixes In-Reply-To: <20031111062416.GH930@krispykreme> References: <20031111062416.GH930@krispykreme> Message-ID: <20031111171213.GO930@krispykreme> > - fix signal wakeup race due to unordered access of SIGPENDING and > TASK_INTERRUPTIBLE (Anton) The ppc64 specific part.
diff --exclude=SCCS -ur linux-2.5/arch/ppc64/kernel/signal.c for-linus-ppc64/arch/ppc64/kernel/signal.c --- linux-2.5/arch/ppc64/kernel/signal.c 2003-07-17 06:09:04.000000000 +1000 +++ for-linus-ppc64/arch/ppc64/kernel/signal.c 2003-11-12 04:07:21.310243191 +1100 @@ -95,7 +95,7 @@ regs->gpr[3] = EINTR; regs->ccr |= 0x10000000; while (1) { - current->state = TASK_INTERRUPTIBLE; + set_current_state(TASK_INTERRUPTIBLE); schedule(); if (do_signal(&saveset, regs)) return regs->gpr[3]; diff --exclude=SCCS -ur linux-2.5/arch/ppc64/kernel/signal32.c for-linus-ppc64/arch/ppc64/kernel/signal32.c --- linux-2.5/arch/ppc64/kernel/signal32.c 2003-07-17 06:09:04.000000000 +1000 +++ for-linus-ppc64/arch/ppc64/kernel/signal32.c 2003-11-12 04:07:22.381217152 +1100 @@ -133,7 +133,7 @@ regs->gpr[3] = EINTR; regs->ccr |= 0x10000000; while (1) { - current->state = TASK_INTERRUPTIBLE; + set_current_state(TASK_INTERRUPTIBLE); schedule(); if (do_signal32(&saveset, regs)) /* @@ -806,7 +806,7 @@ regs->gpr[3] = EINTR; regs->ccr |= 0x10000000; while (1) { - current->state = TASK_INTERRUPTIBLE; + set_current_state(TASK_INTERRUPTIBLE); schedule(); if (do_signal32(&saveset, regs)) /* ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Wed Nov 12 04:17:31 2003 From: anton at samba.org (Anton Blanchard) Date: Wed, 12 Nov 2003 04:17:31 +1100 Subject: important ppc64 bug fixes In-Reply-To: <20031111062416.GH930@krispykreme> References: <20031111062416.GH930@krispykreme> Message-ID: <20031111171731.GP930@krispykreme> > - fix signal wakeup race due to unordered access of SIGPENDING and > TASK_INTERRUPTIBLE (Anton) The arch independent bit (havent compile tested this yet). I originally added smp_mb()'s around set/clear_thread_flag but ended up putting it in the functions themselves. Thoughts? 
===== include/linux/thread_info.h 1.5 vs edited ===== --- 1.5/include/linux/thread_info.h Tue Mar 18 16:32:12 2003 +++ edited/include/linux/thread_info.h Mon Nov 10 04:55:14 2003 @@ -25,16 +25,18 @@ /* * flag set/clear/test wrappers * - pass TIF_xxxx constants to these functions + * - we use the test_and_* bitop versions because they guarantee memory + * ordering */ static inline void set_thread_flag(int flag) { - set_bit(flag,&current_thread_info()->flags); + (void)test_and_set_bit(flag,&current_thread_info()->flags); } static inline void clear_thread_flag(int flag) { - clear_bit(flag,&current_thread_info()->flags); + (void)test_and_clear_bit(flag,&current_thread_info()->flags); } static inline int test_and_set_thread_flag(int flag) @@ -54,12 +56,12 @@ static inline void set_ti_thread_flag(struct thread_info *ti, int flag) { - set_bit(flag,&ti->flags); + (void)test_and_set_bit(flag,&ti->flags); } static inline void clear_ti_thread_flag(struct thread_info *ti, int flag) { - clear_bit(flag,&ti->flags); + (void)test_and_clear_bit(flag,&ti->flags); } static inline int test_and_set_ti_thread_flag(struct thread_info *ti, int flag) ===== kernel/signal.c 1.98 vs edited ===== --- 1.98/kernel/signal.c Fri Oct 10 08:13:54 2003 +++ edited/kernel/signal.c Wed Oct 22 16:37:37 2003 @@ -2112,7 +2114,7 @@ recalc_sigpending(); spin_unlock_irq(&current->sighand->siglock); - current->state = TASK_INTERRUPTIBLE; + set_current_state(TASK_INTERRUPTIBLE); timeout = schedule_timeout(timeout); spin_lock_irq(&current->sighand->siglock); @@ -2534,7 +2536,7 @@ asmlinkage long sys_pause(void) { - current->state = TASK_INTERRUPTIBLE; + set_current_state(TASK_INTERRUPTIBLE); schedule(); return -ERESTARTNOHAND; } ** Sent via the linuxppc64-dev mail list.
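A user-space sketch of the ordering this change appears to be after, using C11 atomics as a stand-in for the kernel primitives. All names here are hypothetical and nothing below is kernel code. The sleeper publishes its state and then checks the flag; the waker sets the flag with a read-modify-write and then checks the state. With sequentially consistent operations at least one side must observe the other's write, so the sleeper cannot decide to sleep while the waker also decides that no wakeup is needed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the task state and the thread flag word. */
enum { STATE_RUNNING = 0, STATE_INTERRUPTIBLE = 1 };
#define SIGPENDING_BIT 2

static atomic_int task_state;		/* zero-initialised: STATE_RUNNING */
static atomic_ulong thread_flags;	/* zero-initialised: no flags set */

/* Sleeper side: ordered store of the state, then a load of the flag.
 * Returns true if no signal is pending and it is safe to sleep. */
bool sleeper_may_sleep(void)
{
	atomic_store(&task_state, STATE_INTERRUPTIBLE);
	return (atomic_load(&thread_flags) & (1UL << SIGPENDING_BIT)) == 0;
}

/* Waker side: ordered read-modify-write of the flag, then a load of
 * the state. Returns true if the target needs an explicit wakeup. */
bool waker_must_wake(void)
{
	(void)atomic_fetch_or(&thread_flags, 1UL << SIGPENDING_BIT);
	return atomic_load(&task_state) == STATE_INTERRUPTIBLE;
}
```

The plain assignment and plain set_bit() being replaced in the patches correspond to unordered versions of the two writes here.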
See http://lists.linuxppc.org/ From nathanl at austin.ibm.com Wed Nov 12 08:56:41 2003 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Tue, 11 Nov 2003 15:56:41 -0600 Subject: [PATCH] clean up compiler warnings in 2.5 bk Message-ID: <3FB15B19.3080607@austin.ibm.com> Hi- Attached patch cleans up some warnings I've seen when building ameslab 2.5 bk: arch/ppc64/kernel/prom.c:194: warning: missing braces around initializer arch/ppc64/kernel/prom.c:194: warning: (near initialization for `hmt_thread_data[0]') arch/ppc64/kernel/prom.c: In function `prom_hold_cpus': arch/ppc64/kernel/prom.c:1076: warning: implicit declaration of function `_get_PIR' arch/ppc64/kernel/prom.c: In function `inspect_node': arch/ppc64/kernel/prom.c:1578: warning: comparison between pointer and integer Also, the patch fixes an issue with our version of include/linux/proc_fs.h. It doesn't cause problems for ppc64, but when built for something like i386, we see lots of: include/linux/proc_fs.h:135: warning: `struct device_node' declared inside parameter list include/linux/proc_fs.h:135: warning: its scope is only this definition or declaration, which is probably not what you want There exist some other warnings from arch/ppc64/boot/prom.c that this patch does not address (I don't really grok that code yet). Nathan -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: warning_cleanup.patch Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031111/d0edc24a/attachment.txt From amodra at bigpond.net.au Wed Nov 12 09:15:34 2003 From: amodra at bigpond.net.au (Alan Modra) Date: Wed, 12 Nov 2003 08:45:34 +1030 Subject: gcc33 creates a zImage which doesnt boot In-Reply-To: <50879E80-1452-11D8-84DC-000A95A0560C@us.ibm.com> References: <20031111065927.GA6904@suse.de> <50879E80-1452-11D8-84DC-000A95A0560C@us.ibm.com> Message-ID: <20031111221534.GA2542@bubble.sa.bigpond.net.au> On Tue, Nov 11, 2003 at 08:21:17AM -0600, Hollis Blanchard wrote: > > On Tuesday, Nov 11, 2003, at 00:59 US/Central, Olaf Hering wrote: > > > >I disabled this line in arch/ppc64/Makefile and replaced all tlbiel > >calls with kdb(). > > > >#CFLAGS += -mtune=power4 -Wa,-mpower4 > > > >This helps. > > Yes, Alan already explained that -Wa,-mpower4 is a bad idea because it > frees the assembler to use power4-specific instructions. The latest > idea was Anton's patch, which also replaces tlbiel with a .long > directive (in addition to playing with the compiler flags). Yeah, but Anton probably wouldn't have done that if he had remembered that I'd improved ppc gas -many to accept power4 instructions. I'd forgotten too. :) -- Alan Modra IBM OzLabs - Linux Technology Centre ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From moilanen at austin.ibm.com Wed Nov 12 11:10:35 2003 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Tue, 11 Nov 2003 18:10:35 -0600 Subject: [PATCH] nvram buffering/error logging port to 2.6 In-Reply-To: <0B0CFE56-1156-11D8-A79D-000A95A0560C@us.ibm.com> References: <0B0CFE56-1156-11D8-A79D-000A95A0560C@us.ibm.com> Message-ID: <1068595835.2239.58.camel@tin.ibm.com > Here is the patch again w/ a symlink for /proc/rtas to /proc/ppc64/rtas. Per Olof's suggestion I also changed the names of all the symbols to have the "nvram" portion of the name in the front. 
Thanks, Jake > > I personally would rather stay compatible between releases then > > architectures. > > Let's do both with a symlink. Architectures because it's "right", and > releases because we need to cover not-quite-right decisions in the > past. ;) -------------- next part -------------- A non-text attachment was scrubbed... Name: linux-2.6-nvram-errlog-2.patch.bz2 Type: application/x-bzip Size: 11898 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031111/0113dc03/attachment.bin From nathanl at austin.ibm.com Thu Nov 13 03:20:57 2003 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Wed, 12 Nov 2003 10:20:57 -0600 Subject: [PATCH] RTAS syscall - review request In-Reply-To: <1068486681.6301.15.camel@verve> References: <1068486681.6301.15.camel@verve> Message-ID: <3FB25DE9.5090809@austin.ibm.com> Hi John- > +/* RTAS Userspace access */ > +static ssize_t ppc_rtas_rmo_buf_read(struct file *file, char *buf, > + size_t count, loff_t *ppos) > +{ > + int n; > + > + n = sprintf(buf, "%p %x\n", rtas_rmo_buf, RTAS_SYSCALL_MAX); > if (*ppos >= strlen(buf)) > return 0; > if (n > strlen(buf) - *ppos) I don't think you can directly write to buf here; you should be using copy_to_user(). I believe it should look something like this (I've omitted error and bounds checking): static ssize_t ppc_rtas_rmo_buf_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) { int n; char *kbuf = kmalloc(count, GFP_KERNEL); n = sprintf(buf, "%p %x\n", rtas_rmo_buf, RTAS_SYSCALL_MAX); ... copy_to_user(buf, kbuf, count); kfree(kbuf); } Nathan ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From olof at austin.ibm.com Thu Nov 13 03:31:10 2003 From: olof at austin.ibm.com (Olof Johansson) Date: Wed, 12 Nov 2003 10:31:10 -0600 Subject: [PATCH] RTAS syscall - review request In-Reply-To: <3FB25DE9.5090809@austin.ibm.com> References: <1068486681.6301.15.camel@verve> <3FB25DE9.5090809@austin.ibm.com> Message-ID: <3FB2604E.3090009@austin.ibm.com> Nathan Lynch wrote: > static ssize_t ppc_rtas_rmo_buf_read(struct file *file, char __user > *buf, size_t count, loff_t *ppos) > { > int n; > char *kbuf = kmalloc(count, GFP_KERNEL); > > n = sprintf(buf, "%p %x\n", rtas_rmo_buf, RTAS_SYSCALL_MAX); ^^^ kbuf, right? > ... > copy_to_user(buf, kbuf, count); > kfree(kbuf); > } -- Olof Johansson Office: 4E002/905 pSeries Linux Development IBM Systems Group Email: olof at austin.ibm.com Phone: 512-838-9858 All opinions are my own and not those of IBM ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From hollisb at us.ibm.com Thu Nov 13 03:37:25 2003 From: hollisb at us.ibm.com (Hollis Blanchard) Date: Wed, 12 Nov 2003 10:37:25 -0600 Subject: user accesses In-Reply-To: <3FB25DE9.5090809@austin.ibm.com> Message-ID: <7EF3D8FC-152E-11D8-A26B-000A95A0560C@us.ibm.com> On Wednesday, Nov 12, 2003, at 10:20 US/Central, Nathan Lynch wrote: > > I don't think you can directly write to buf here; you should be using > copy_to_user(). I believe it should look something like this (I've > omitted error and bounds checking): > > static ssize_t ppc_rtas_rmo_buf_read(struct file *file, char __user > *buf, size_t count, loff_t *ppos) Notice the "__user" that Nathan slipped in here. :) It enables automated checking for such problems. See it barely documented at http://lkml.org/lkml/2003/6/6/146 , and a URL to Linus' "sparse" checking tool... -- Hollis Blanchard IBM Linux Technology Center ** Sent via the linuxppc64-dev mail list. 
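A user-space sketch of the read pattern discussed in this thread: format into a small local buffer with a bounds-checked snprintf(), then copy at most count bytes to the caller, honouring the file offset. copy_to_user_sketch() is a hypothetical memcpy() stand-in for copy_to_user(), and the buffer size and function names are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for copy_to_user(): returns the number of
 * bytes that could NOT be copied, so 0 means success. */
unsigned long copy_to_user_sketch(char *dst, const char *src, unsigned long n)
{
	memcpy(dst, src, n);
	return 0;
}

/* Returns the number of bytes copied, 0 at end of data, -1 on error. */
long rmo_buf_read_sketch(char *ubuf, size_t count, long *ppos,
			 unsigned long addr, unsigned int size)
{
	char kbuf[32];
	int n = snprintf(kbuf, sizeof(kbuf), "%016lx %x\n", addr, size);

	if (n < 0 || (size_t)n >= sizeof(kbuf))
		return -1;	/* output would not fit; refuse to truncate */
	if (*ppos < 0 || *ppos >= n)
		return 0;	/* nothing left to read */
	if (count > (size_t)(n - *ppos))
		count = (size_t)(n - *ppos);
	if (copy_to_user_sketch(ubuf, kbuf + *ppos, count))
		return -1;	/* the kernel version would return -EFAULT */
	*ppos += (long)count;
	return (long)count;
}
```

A caller that asks for 5 bytes and then for the rest gets the string in two pieces and a return of 0 on the third call, which is how a reader detects end of file.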
See http://lists.linuxppc.org/ From nathanl at austin.ibm.com Thu Nov 13 03:50:38 2003 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Wed, 12 Nov 2003 10:50:38 -0600 Subject: [PATCH] RTAS syscall - review request In-Reply-To: <3FB2604E.3090009@austin.ibm.com> References: <1068486681.6301.15.camel@verve> <3FB25DE9.5090809@austin.ibm.com> <3FB2604E.3090009@austin.ibm.com> Message-ID: <3FB264DE.1040709@austin.ibm.com> Olof Johansson wrote: > > Nathan Lynch wrote: > >> static ssize_t ppc_rtas_rmo_buf_read(struct file *file, char __user >> *buf, size_t count, loff_t *ppos) >> { >> int n; >> char *kbuf = kmalloc(count, GFP_KERNEL); >> >> n = sprintf(buf, "%p %x\n", rtas_rmo_buf, RTAS_SYSCALL_MAX); > > > ^^^ kbuf, right? Yes, that's what I meant. Thanks. Also, John, you probably want to use snprintf or some other way of bounds-checking to avoid inadvertently overflowing the buffer. Nathan ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From jschopp at austin.ibm.com Thu Nov 13 10:38:06 2003 From: jschopp at austin.ibm.com (Joel Schopp) Date: Wed, 12 Nov 2003 17:38:06 -0600 (CST) Subject: [PATCH] hang on p630 Message-ID: This short patch fixes a hang when physical cpu numbering is not the same as logical cpu numbering. Those running pSeries partitions may experience this hang at boot.
The symptom is the message: Kernel panic: bad return code qirr - rc = fffffffffffffffc diff -Nru a/arch/ppc64/kernel/xics.c b/arch/ppc64/kernel/xics.c --- a/arch/ppc64/kernel/xics.c Wed Nov 12 11:10:36 2003 +++ b/arch/ppc64/kernel/xics.c Wed Nov 12 11:10:36 2003 @@ -202,7 +202,7 @@ { unsigned long lpar_rc; - lpar_rc = plpar_ipi(n_cpu, value); + lpar_rc = plpar_ipi(get_hard_smp_processor_id(n_cpu), value); if (lpar_rc != H_Success) panic("bad return code qirr - rc = %lx\n", lpar_rc); } @@ -448,7 +448,7 @@ np; np = of_find_node_by_type(np, "cpu")) { ireg = (uint *)get_property(np, "reg", &ilen); - if (ireg && ireg[0] == smp_processor_id()) { + if (ireg && ireg[0] == hard_smp_processor_id()) { ireg = (uint *)get_property(np, "ibm,ppc-interrupt-gserver#s", &ilen); i = ilen / sizeof(int); if (ireg && i > 0) { @@ -485,8 +485,8 @@ for (i = 0; i < NR_CPUS; ++i) { if (!cpu_possible(i)) continue; - xics_per_cpu[i] = __ioremap((ulong)inodes[i].addr, - (ulong)inodes[i].size, + xics_per_cpu[i] = __ioremap((ulong)inodes[get_hard_smp_processor_id(i)].addr, + (ulong)inodes[get_hard_smp_processor_id(i)].size, _PAGE_NO_CACHE); } #else @@ -569,7 +569,7 @@ cpus_and(tmp, cpu_online_map, cpumask); if (cpus_empty(tmp)) goto out; - newmask = first_cpu(cpumask); + newmask = get_hard_smp_processor_id(first_cpu(cpumask)); } status = rtas_call(ibm_set_xive, 3, 1, NULL, ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From paulus at samba.org Thu Nov 13 22:10:12 2003 From: paulus at samba.org (Paul Mackerras) Date: Thu, 13 Nov 2003 22:10:12 +1100 Subject: [PATCH] hang on p630 In-Reply-To: References: Message-ID: <16307.26260.591303.930283@cargo.ozlabs.ibm.com> Hi Anton, Do you think plpar_ipi etc. should translate the cpu number from soft to hard, or should it take the hard cpu number as Joel's patch does? Paul. Joel Schopp writes: > > This short patch fixes a hang when physical cpu numbering is > not the same as logical cpu numbering.
Those running pSeries > paritions may experience this hang at boot. The symptoms is the > message: > > Kernel panic: bad return code qirr - rc = fffffffffffffffc > > diff -Nru a/arch/ppc64/kernel/xics.c b/arch/ppc64/kernel/xics.c > --- a/arch/ppc64/kernel/xics.c Wed Nov 12 11:10:36 2003 > +++ b/arch/ppc64/kernel/xics.c Wed Nov 12 11:10:36 2003 > @@ -202,7 +202,7 @@ > { > unsigned long lpar_rc; > > - lpar_rc = plpar_ipi(n_cpu, value); > + lpar_rc = plpar_ipi(get_hard_smp_processor_id(n_cpu), value); > if (lpar_rc != H_Success) > panic("bad return code qirr - rc = %lx\n", lpar_rc); > } > @@ -448,7 +448,7 @@ > np; > np = of_find_node_by_type(np, "cpu")) { > ireg = (uint *)get_property(np, "reg", &ilen); > - if (ireg && ireg[0] == smp_processor_id()) { > + if (ireg && ireg[0] == hard_smp_processor_id()) { > ireg = (uint *)get_property(np, "ibm,ppc-interrupt-gserver#s", &ilen); > i = ilen / sizeof(int); > if (ireg && i > 0) { > @@ -485,8 +485,8 @@ > for (i = 0; i < NR_CPUS; ++i) { > if (!cpu_possible(i)) > continue; > - xics_per_cpu[i] = __ioremap((ulong)inodes[i].addr, > - (ulong)inodes[i].size, > + xics_per_cpu[i] = __ioremap((ulong)inodes[get_hard_smp_processor_id(i)].addr, > + (ulong)inodes[get_hard_smp_processor_id(i)].size, > _PAGE_NO_CACHE); > } > #else > @@ -569,7 +569,7 @@ > cpus_and(tmp, cpu_online_map, cpumask); > if (cpus_empty(tmp)) > goto out; > - newmask = first_cpu(cpumask); > + newmask = get_hard_smp_processor_id(first_cpu(cpumask)); > } > > status = rtas_call(ibm_set_xive, 3, 1, NULL, > > ** Sent via the linuxppc64-dev mail list. 
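A minimal sketch of the logical-to-hardware translation at issue in this patch, assuming a boot-time table indexed by logical cpu number. The table contents and the *_sketch names are made up for illustration; in the kernel the forward lookup is get_hard_smp_processor_id().

```c
#define NR_CPUS_SKETCH 4
#define NO_CPU_SKETCH (-1)

/* Hypothetical mapping: index is the logical cpu number used by the
 * kernel, value is the hardware id the platform expects. The values
 * are deliberately sparse to show that the two numberings differ. */
static const int hard_id_table_sketch[NR_CPUS_SKETCH] = { 0, 2, 4, 6 };

/* Logical -> hardware, for anything handed to firmware or hypervisor. */
int hard_id_sketch(int logical)
{
	if (logical < 0 || logical >= NR_CPUS_SKETCH)
		return NO_CPU_SKETCH;
	return hard_id_table_sketch[logical];
}

/* Hardware -> logical, e.g. when matching a device-tree "reg" value. */
int logical_id_sketch(int hard)
{
	int i;

	for (i = 0; i < NR_CPUS_SKETCH; i++)
		if (hard_id_table_sketch[i] == hard)
			return i;
	return NO_CPU_SKETCH;
}
```

With a table like this, passing the logical number 1 straight to the platform would address hardware cpu 1, which does not exist in this partition; the translated value is 2.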
See http://lists.linuxppc.org/ From paulus at samba.org Thu Nov 13 22:39:39 2003 From: paulus at samba.org (Paul Mackerras) Date: Thu, 13 Nov 2003 22:39:39 +1100 Subject: [PATCH] nvram buffering/error logging port to 2.6 In-Reply-To: <1068595835.2239.58.camel@tin.ibm.com > References: <0B0CFE56-1156-11D8-A79D-000A95A0560C@us.ibm.com> <1068595835.2239.58.camel@tin.ibm.com > Message-ID: <16307.28027.222060.344303@cargo.ozlabs.ibm.com> Jake, > Here is the patch again w/ a symlink for /proc/rtas to > /proc/ppc64/rtas. > > Per Olof's suggestion I also changed the names of all the symbols to > have the "nvram" portion of the name in the front. Mostly looks fine. A few comments: * Please add a comment somewhere explaining why it is necessary to have the kernel write the error log to nvram, i.e. what the circumstances are under which Bad Things would happen if we relied on userspace to do it (and what those Bad Things are). * We are going to have to separate the general /dev/nvram support from the methods for reading/writing nvram via RTAS, because the G5 powermacs have nvram but don't use RTAS. This is not directly a criticism of the patch, but if you felt like reworking it to make this separation (and maybe put function pointers for nvram_read / nvram_write into ppc_md) that would be great. * With the kernel parsing the partitions in the nvram, I wonder if we should be exporting that information to userspace somehow? * In future, as a way of reducing the bulk of the code, could we consider having userspace do the repartitioning to create the ppc64,linux partition instead of the kernel? That is, could we have rtasd (or something similar) create the partition ahead of time so that the kernel would only have to read the partitioning? Or is there some scenario where this would be impossible? * The nvram_read and nvram_write routines could be rewritten to be a bit shorter and neater. 
In particular we should be able to do it with just one loop containing the rtas_call call, rather than an if and a loop each with a call to rtas_call. And we shouldn't need to do a mod operation. Like this: ssize_t nvram_read(char *buf, size_t count, loff_t *index) { unsigned int i; unsigned long len, done; unsigned long flags; char *p = buf; if (*index >= rtas_nvram_size) return 0; i = *index; if (i + count > rtas_nvram_size) count = rtas_nvram_size - i; spin_lock_irqsave(&nvram_lock, flags); for (; count != 0; count -= len) { len = count; if (len > NVRW_CNT) len = NVRW_CNT; if ((rtas_call(nvram_fetch, 3, 2, &done, i, __pa(nvram_buf), len) != 0) || done != len) { spin_unlock_irqrestore(&nvram_lock, flags); return -EIO; } memcpy(p, nvram_buf, len); p += len; i += len; } spin_unlock_irqrestore(&nvram_lock, flags); *index = i; return p - buf; } Regards, Paul. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From paulus at samba.org Thu Nov 13 22:43:15 2003 From: paulus at samba.org (Paul Mackerras) Date: Thu, 13 Nov 2003 22:43:15 +1100 Subject: [PATCH] clean up compiler warnings in 2.5 bk In-Reply-To: <3FB15B19.3080607@austin.ibm.com> References: <3FB15B19.3080607@austin.ibm.com> Message-ID: <16307.28243.151844.849084@cargo.ozlabs.ibm.com> Nathan Lynch writes: > Attached patch cleans up some warnings I've seen when building ameslab > 2.5 bk: > -} hmt_thread_data[NR_CPUS] = {0}; > +} hmt_thread_data[NR_CPUS] = {{0}}; I guess this is because we don't initialize the bss early enough. Apart from that I'd rather see the initializer disappear. > - if (np->node != NULL) { > + if (np->node) { Why do you consider the + version to be better? I prefer the != NULL form myself. If the compiler is warning about that then the compiler is broken and we should get it fixed. 
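On the hmt_thread_data initializer quoted above: a small sketch of why {{0}} silences the missing-braces warning, assuming a per-cpu record with two integer fields. The struct layout and all names below are illustrative, not copied from prom.c. The inner braces match the nesting of an array of structs, and every member not named explicitly is still zero-initialized. In hosted C the array would be zero even with no initializer at all, though that says nothing about the early-boot bss concern raised above.

```c
#include <stddef.h>

#define NR_CPUS_SKETCH 4

/* Hypothetical stand-in for the per-cpu record; fields are illustrative. */
struct hmt_sketch {
	unsigned int pir;
	unsigned int threadid;
};

/* Outer braces: the array. Inner braces: its first element. The 0
 * initialises the first member; everything else is implicitly zero. */
struct hmt_sketch hmt_data_sketch[NR_CPUS_SKETCH] = {{0}};

/* Returns 1 if every field of every element is zero, 0 otherwise. */
int hmt_all_zero_sketch(void)
{
	size_t i;

	for (i = 0; i < NR_CPUS_SKETCH; i++)
		if (hmt_data_sketch[i].pir != 0 || hmt_data_sketch[i].threadid != 0)
			return 0;
	return 1;
}
```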
> +extern unsigned long _get_PIR(void); > +#ifdef CONFIG_PROC_DEVICETREE > +struct device_node; > extern void proc_device_tree_init(void); > extern void proc_device_tree_add_node(struct device_node *, struct proc_dir_entry *); > +#endif /* CONFIG_PROC_DEVICETREE */ These changes look fine. Paul. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From paulus at samba.org Thu Nov 13 22:51:48 2003 From: paulus at samba.org (Paul Mackerras) Date: Thu, 13 Nov 2003 22:51:48 +1100 Subject: [PATCH] RTAS syscall - review request In-Reply-To: <1068486681.6301.15.camel@verve> References: <1068486681.6301.15.camel@verve> Message-ID: <16307.28756.583878.998250@cargo.ozlabs.ibm.com> John Rose writes: > This patch implements a generic RTAS interface to userspace through a > system call. It was originally written by Rusty Russel and modified by > myself. There are two main parts: Oops, he's "Russell", you mustn't get him confused with the one-l Russel. :) Looks good. Just one very minor nit: since (I presume) lmb_alloc_base returns a real address, I would make rtas_rmo_buf an unsigned long rather than a void *, which would normally indicate a virtual address (effective address in IBM-speak). I assume you have the userspace part working too? Regards, Paul. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From nathanl at austin.ibm.com Fri Nov 14 06:37:24 2003 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Thu, 13 Nov 2003 13:37:24 -0600 Subject: [PATCH] clean up compiler warnings in 2.5 bk In-Reply-To: <16307.28243.151844.849084@cargo.ozlabs.ibm.com> References: <3FB15B19.3080607@austin.ibm.com> <16307.28243.151844.849084@cargo.ozlabs.ibm.com> Message-ID: <3FB3DD74.2050705@austin.ibm.com> Paul Mackerras wrote: >>- if (np->node != NULL) { >>+ if (np->node) { > > > Why do you consider the + version to be better? I prefer the != NULL > form myself. 
If the compiler is warning about that then the compiler > is broken and we should get it fixed. I prefer it because use of NULL implies that np->node is a pointer, which it is not. The compiler's warning seems valid to me. Anyway, both gcc 3.2.1 and 3.3.2 produce the warning. If you prefer explicit comparison, here is a patch that replaces NULL with 0. Nathan -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: warning_cleanup.patch Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031113/6e0941f0/attachment.txt From linas at austin.ibm.com Fri Nov 14 07:38:22 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Thu, 13 Nov 2003 14:38:22 -0600 Subject: marcello kernel &ppc64 In-Reply-To: <16304.2053.123830.139919@cargo.ozlabs.ibm.com>; from paulus@samba.org on Tue, Nov 11, 2003 at 08:49:57AM +1100 References: <20031107103613.A28940@forte.austin.ibm.com> <16303.27947.273573.622813@cargo.ozlabs.ibm.com> <20031110115418.A22020@forte.austin.ibm.com> <16304.2053.123830.139919@cargo.ozlabs.ibm.com> Message-ID: <20031113143822.A25254@forte.austin.ibm.com> On Tue, Nov 11, 2003 at 08:49:57AM +1100, Paul Mackerras wrote: > > linas at austin.ibm.com writes: > > > p.s. is there an ETA for when the marcello kernel will sync up with > > the current ppc64 code? > > If there is anything critical I'll send it off immediately. > Non-critical things can wait until Marcelo releases the final 2.4.23. Nothing critical per se, its just that diff -r marcello-2.4.23pre9/arch/ppc64/kernel sles8/arch/ppc64/kernel shows a vast number of differences. I would be much happier if these two dirs were nearly identical. --linas ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From paulus at samba.org Fri Nov 14 08:00:42 2003 From: paulus at samba.org (Paul Mackerras) Date: Fri, 14 Nov 2003 08:00:42 +1100 Subject: [PATCH] clean up compiler warnings in 2.5 bk In-Reply-To: <3FB3DD74.2050705@austin.ibm.com> References: <3FB15B19.3080607@austin.ibm.com> <16307.28243.151844.849084@cargo.ozlabs.ibm.com> <3FB3DD74.2050705@austin.ibm.com> Message-ID: <16307.61690.798916.622956@cargo.ozlabs.ibm.com> Nathan Lynch writes: > I prefer it because use of NULL implies that np->node is a pointer, > which it is not. The compiler's warning seems valid to me. Ah, yes, my mistake. OK, then your change is fine. I was thinking of ppc32, where a phandle is a void *. Regards, Paul. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From hollisb at us.ibm.com Fri Nov 14 08:10:02 2003 From: hollisb at us.ibm.com (Hollis Blanchard) Date: Thu, 13 Nov 2003 15:10:02 -0600 Subject: marcello kernel &ppc64 In-Reply-To: <20031113143822.A25254@forte.austin.ibm.com> Message-ID: On Thursday, Nov 13, 2003, at 14:38 US/Central, linas at austin.ibm.com wrote: > > Nothing critical per se, its just that > diff -r marcello-2.4.23pre9/arch/ppc64/kernel sles8/arch/ppc64/kernel > shows a vast number of differences. I would be much happier if these > two dirs were nearly identical. Forget SLES8; they're always going to apply hundreds of patches and that's their business (take it up with them). What's more interesting is the diff between Marcello and Ameslab 2.4. But that's something that tree maintainers are always working on, so ... they already think it would be nice if there were no differences. :) -- Hollis Blanchard IBM Linux Technology Center ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From meissner at suse.de Fri Nov 14 08:18:17 2003 From: meissner at suse.de (Marcus Meissner) Date: Thu, 13 Nov 2003 22:18:17 +0100 Subject: marcello kernel &ppc64 In-Reply-To: References: <20031113143822.A25254@forte.austin.ibm.com> Message-ID: <20031113211817.GA13526@suse.de> On Thu, Nov 13, 2003 at 03:10:02PM -0600, Hollis Blanchard wrote: > > On Thursday, Nov 13, 2003, at 14:38 US/Central, linas at austin.ibm.com wrote: > > >Nothing critical per se, its just that diff -r > >marcello-2.4.23pre9/arch/ppc64/kernel sles8/arch/ppc64/kernel shows a > >vast number of differences. I would be much happier if these two dirs > >were nearly identical. > > Forget SLES8; they're always going to apply hundreds of patches and > that's their business (take it up with them). > > What's more interesting is the diff between Marcello and Ameslab 2.4. > But that's something that tree maintainers are always working on, so > ... they already think it would be nice if there were no differences. :) sles8/arch/ppc64/* is taken mostly verbatim from IBM trees, mostly Ameslab. The number of actual SuSE patches applied there is pretty low. But push Ameslab first please. Ciao, Marcus ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Fri Nov 14 09:53:05 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Thu, 13 Nov 2003 16:53:05 -0600 Subject: marcello kernel &ppc64 In-Reply-To: ; from hollisb@us.ibm.com on Thu, Nov 13, 2003 at 03:10:02PM -0600 References: <20031113143822.A25254@forte.austin.ibm.com> Message-ID: <20031113165305.A30094@forte.austin.ibm.com> On Thu, Nov 13, 2003 at 03:10:02PM -0600, Hollis Blanchard wrote: > On Thursday, Nov 13, 2003, at 14:38 US/Central, linas at austin.ibm.com > wrote: > > > > Nothing critical per se, its just that > > diff -r marcello-2.4.23pre9/arch/ppc64/kernel sles8/arch/ppc64/kernel > > shows a vast number of differences. 
I would be much happier if these > > two dirs were nearly identical. > > Forget SLES8; they're always going to apply hundreds of patches and > that's their business (take it up with them). No, I was talking about the ppc64 bits specifically, not about the general kernel. I really would think that the ppc64 bits would be nearly identical across all vendors, etc. ... since I can assure you that SuSE is not developing new ppc64 code. I'm not sure where SuSE pulls their ppc64 code from, I'm guessing it comes from ameslab. I'm guessing those two are real close, but I don't have an ameslab tree anywhere to look at. ================= Which takes me to another, ahem, politically touchy point, I suspect. There's a test lab here that pounds the crap out of SuSE kernels. Speaking from personal experience, 9 out of 10 or 19 out of 20 of the kernel crashes & hangs that they find are *not* in the ppc64 code, but are in the generic linux kernel code. These are races of various sorts, missing locks, data corruptions, you name it, I've seen it. These get fixed in the SuSE kernel. And that's what bugs me... I'm perfectly aware that many/most of these bugs are also in the marcello kernel. (For example, the latest: a race condition on setting/resetting of current->need_resched, which will make a heavily loaded machine go idle. The setting of need_resched uses no locks or semaphores of any kind, but is used as if it were always correct.) But the patches don't go out to the LKML because ... well, it can be hard to defend a patch when the bug was never seen on a marcello kernel. So I'm a tad concerned that there's a bit of a disconnect not only at the source code level, but also at the test level. (OK, you're right, I need to post the need_resched thing to LKML. This one, at least, should be easy to explain). --linas ** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/ From johnrose at austin.ibm.com Sat Nov 15 06:26:07 2003 From: johnrose at austin.ibm.com (John Rose) Date: Fri, 14 Nov 2003 13:26:07 -0600 Subject: [PATCH] RTAS syscall - updated In-Reply-To: <16307.28756.583878.998250@cargo.ozlabs.ibm.com> References: <1068486681.6301.15.camel@verve> <16307.28756.583878.998250@cargo.ozlabs.ibm.com> Message-ID: <1068837967.12651.11.camel@verve> Thanks to everyone for the comments, and apologies to Rusty for the missing "l" :). Below is an updated version of the patch. If there are no further comments, I'm going to push this to 2.6 on Monday. Thanks- John diff -Nru a/arch/ppc/kernel/misc.S b/arch/ppc/kernel/misc.S --- a/arch/ppc/kernel/misc.S Fri Nov 14 13:20:27 2003 +++ b/arch/ppc/kernel/misc.S Fri Nov 14 13:20:27 2003 @@ -1385,3 +1385,4 @@ .long sys_statfs64 .long sys_fstatfs64 .long ppc_fadvise64_64 + .long sys_ni_syscall /* 255 - rtas (used on ppc64) */ diff -Nru a/arch/ppc64/kernel/misc.S b/arch/ppc64/kernel/misc.S --- a/arch/ppc64/kernel/misc.S Fri Nov 14 13:20:27 2003 +++ b/arch/ppc64/kernel/misc.S Fri Nov 14 13:20:27 2003 @@ -852,6 +852,8 @@ .llong .sys32_utimes .llong .sys_statfs64 .llong .sys_fstatfs64 + .llong .sys_ni_syscall /* 32bit only fadvise64 */ + .llong .ppc_rtas /* 255 */ .balign 8 _GLOBAL(sys_call_table) @@ -1109,3 +1111,5 @@ .llong .sys_utimes .llong .sys_statfs64 .llong .sys_fstatfs64 + .llong .sys_ni_syscall /* 32bit only fadvise64 */ + .llong .ppc_rtas /* 255 */ diff -Nru a/arch/ppc64/kernel/prom.c b/arch/ppc64/kernel/prom.c --- a/arch/ppc64/kernel/prom.c Fri Nov 14 13:20:27 2003 +++ b/arch/ppc64/kernel/prom.c Fri Nov 14 13:20:27 2003 @@ -611,6 +611,9 @@ _rtas->base) >= 0) { _rtas->entry = (long)_prom->args.rets[1]; } + RELOC(rtas_rmo_buf) + = lmb_alloc_base(RTAS_RMOBUF_MAX, PAGE_SIZE, + rtas_region); } if (_rtas->entry <= 0) { diff -Nru a/arch/ppc64/kernel/rtas-proc.c b/arch/ppc64/kernel/rtas-proc.c --- a/arch/ppc64/kernel/rtas-proc.c Fri Nov 14 13:20:27 2003 +++ 
b/arch/ppc64/kernel/rtas-proc.c Fri Nov 14 13:20:27 2003 @@ -161,6 +161,8 @@ size_t count, loff_t *ppos); static ssize_t ppc_rtas_tone_volume_read(struct file * file, char * buf, size_t count, loff_t *ppos); +static ssize_t ppc_rtas_rmo_buf_read(struct file *file, char *buf, + size_t count, loff_t *ppos); struct file_operations ppc_rtas_poweron_operations = { .read = ppc_rtas_poweron_read, @@ -185,6 +187,10 @@ .write = ppc_rtas_tone_volume_write }; +static struct file_operations ppc_rtas_rmo_buf_ops = { + .read = ppc_rtas_rmo_buf_read, +}; + int ppc_rtas_find_all_sensors (void); int ppc_rtas_process_sensor(struct individual_sensor s, int state, int error, char * buf); @@ -233,6 +239,9 @@ entry = create_proc_entry("volume", S_IWUSR|S_IRUGO, proc_rtas); if (entry) entry->proc_fops = &ppc_rtas_tone_volume_operations; + + entry = create_proc_entry("rmo_buffer", S_IRUSR, proc_rtas); + if (entry) entry->proc_fops = &ppc_rtas_rmo_buf_ops; } /* ****************************************************************** */ @@ -849,5 +858,30 @@ if (n > count) n = count; *ppos += n; + return n; +} + +#define RMO_READ_BUF_MAX 30 + +/* RTAS Userspace access */ +static ssize_t ppc_rtas_rmo_buf_read(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + char kbuf[RMO_READ_BUF_MAX]; + int n; + + n = sprintf(kbuf, "%016lx %x\n", rtas_rmo_buf, RTAS_RMOBUF_MAX); + if (n > count) + n = count; + + if (ppos && *ppos != 0) + return 0; + + if (copy_to_user(buf, kbuf, n)) + return -EFAULT; + + if (ppos) + *ppos = n; + return n; } diff -Nru a/arch/ppc64/kernel/rtas.c b/arch/ppc64/kernel/rtas.c --- a/arch/ppc64/kernel/rtas.c Fri Nov 14 13:20:27 2003 +++ b/arch/ppc64/kernel/rtas.c Fri Nov 14 13:20:27 2003 @@ -29,6 +29,7 @@ #include #include #include +#include struct flash_block_list_header rtas_firmware_flash_list = {0, 0}; @@ -381,6 +382,44 @@ if (rtas_firmware_flash_list.next) rtas_flash_bypass_warning(); rtas_power_off(); +} + +unsigned long rtas_rmo_buf = NULL; + +asmlinkage int 
ppc_rtas(struct rtas_args __user *uargs) +{ + struct rtas_args args; + unsigned long flags; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (copy_from_user(&args, uargs, 3 * sizeof(u32)) != 0) + return -EFAULT; + + if (args.nargs > ARRAY_SIZE(args.args) + || args.nret > ARRAY_SIZE(args.args) + || args.nargs + args.nret > ARRAY_SIZE(args.args)) + return -EINVAL; + + /* Copy in args. */ + if (copy_from_user(args.args, uargs->args, + args.nargs * sizeof(rtas_arg_t)) != 0) + return -EFAULT; + + spin_lock_irqsave(&rtas.lock, flags); + get_paca()->xRtas = args; + enter_rtas((void *)__pa((unsigned long)&get_paca()->xRtas)); + args = get_paca()->xRtas; + spin_unlock_irqrestore(&rtas.lock, flags); + + /* Copy out args. */ + if (copy_to_user(uargs->args + args.nargs, + args.args + args.nargs, + args.nret * sizeof(rtas_arg_t)) != 0) + return -EFAULT; + + return 0; } EXPORT_SYMBOL(proc_ppc64); diff -Nru a/arch/ppc64/kernel/syscalls.c b/arch/ppc64/kernel/syscalls.c --- a/arch/ppc64/kernel/syscalls.c Fri Nov 14 13:20:27 2003 +++ b/arch/ppc64/kernel/syscalls.c Fri Nov 14 13:20:27 2003 @@ -41,6 +41,7 @@ #include #include #include +#include extern unsigned long wall_jiffies; @@ -234,3 +235,6 @@ return secs; } + +/* Only exists on P-series. 
*/ +cond_syscall(ppc_rtas); diff -Nru a/include/asm-ppc/unistd.h b/include/asm-ppc/unistd.h --- a/include/asm-ppc/unistd.h Fri Nov 14 13:20:27 2003 +++ b/include/asm-ppc/unistd.h Fri Nov 14 13:20:27 2003 @@ -259,8 +259,9 @@ #define __NR_statfs64 252 #define __NR_fstatfs64 253 #define __NR_fadvise64_64 254 +#define __NR_rtas 255 -#define __NR_syscalls 255 +#define __NR_syscalls 256 #define __NR(n) #n diff -Nru a/include/asm-ppc64/rtas.h b/include/asm-ppc64/rtas.h --- a/include/asm-ppc64/rtas.h Fri Nov 14 13:20:27 2003 +++ b/include/asm-ppc64/rtas.h Fri Nov 14 13:20:27 2003 @@ -19,6 +19,9 @@ #define RTAS_UNKNOWN_SERVICE (-1) #define RTAS_INSTANTIATE_MAX (1UL<<30) /* Don't instantiate rtas at/above this value */ +/* Buffer size for ppc_rtas system call. */ +#define RTAS_RMOBUF_MAX (64 * 1024) + /* * In general to call RTAS use rtas_token("string") to lookup * an RTAS token for the given string (e.g. "event-scan"). @@ -188,5 +191,8 @@ extern spinlock_t rtas_data_buf_lock; extern char rtas_data_buf[RTAS_DATA_BUF_SIZE]; + +/* RMO buffer reserved for user-space RTAS use */ +extern unsigned long rtas_rmo_buf; #endif /* _PPC64_RTAS_H */ diff -Nru a/include/asm-ppc64/unistd.h b/include/asm-ppc64/unistd.h --- a/include/asm-ppc64/unistd.h Fri Nov 14 13:20:27 2003 +++ b/include/asm-ppc64/unistd.h Fri Nov 14 13:20:27 2003 @@ -264,8 +264,10 @@ #define __NR_utimes 251 #define __NR_statfs64 252 #define __NR_fstatfs64 253 +#define __NR_fadvise64_64 254 +#define __NR_rtas 255 -#define __NR_syscalls 254 +#define __NR_syscalls 256 #ifdef __KERNEL__ #define NR_syscalls __NR_syscalls #endif ** Sent via the linuxppc64-dev mail list. 
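A side note on the argument check in ppc_rtas() in the patch above: nargs and nret are unsigned 32-bit counts, so testing only their sum would let a huge nargs wrap around and slip through. The following is a minimal userspace sketch of that reasoning, not the kernel code itself. RTAS_MAX_ARGS, naive_counts_ok() and rtas_counts_ok() are hypothetical names made up for illustration, and the 16-slot array size is an assumption rather than something taken from the patch.

```c
#include <stdint.h>

/* Assumed number of slots in the args[] array (illustration only). */
#define RTAS_MAX_ARGS 16u

/*
 * Sum-only check: looks sufficient, but the 32-bit addition can wrap,
 * e.g. 0xFFFFFFFF + 2 == 1, which is <= RTAS_MAX_ARGS.
 */
static int naive_counts_ok(uint32_t nargs, uint32_t nret)
{
    return (uint32_t)(nargs + nret) <= RTAS_MAX_ARGS;
}

/*
 * Same shape as the check in the patch: bound each count first, so the
 * sum of two values <= RTAS_MAX_ARGS cannot wrap, then bound the sum.
 */
static int rtas_counts_ok(uint32_t nargs, uint32_t nret)
{
    if (nargs > RTAS_MAX_ARGS || nret > RTAS_MAX_ARGS)
        return 0;
    return nargs + nret <= RTAS_MAX_ARGS;
}
```

With these helpers, a request of (0xFFFFFFFF, 2) passes the sum-only test but is rejected once the individual bounds are checked first.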
See http://lists.linuxppc.org/ From hozer at hozed.org Sat Nov 15 07:36:20 2003 From: hozer at hozed.org (Troy Benjegerdes) Date: Fri, 14 Nov 2003 14:36:20 -0600 Subject: marcello kernel &ppc64 In-Reply-To: <20031113165305.A30094@forte.austin.ibm.com> References: <20031113143822.A25254@forte.austin.ibm.com> <20031113165305.A30094@forte.austin.ibm.com> Message-ID: <20031114203620.GS3504@kalmia.hozed.org> > ================= > Which takes me to another, ahem, politically touchy point, I suspect. > There's a test lab here that pounds the crap out of SuSE kernels. > Speaking from personal experience, 9 out of 10 or 19 out of 20 of > the kernel crashes & hangs that they find are *not* in the ppc64 code, > but are in the generic linux kernel code. These are races of > various sorts, missing locks, data corruptions, you name it, I've > seen it. These get fixed in the SuSE kernel. What we need is this testing lab also pounding the crap out of marcelo kernels.. I was hoping the Linux Test project and the stuff OSDL is doing would wind up with a 'test suite' that mere mortals with real work to do could easily run. Personally, I'd love to be able to build every revision that gets pushed into the ameslab tree, and then run regression tests on it. And then do the same thing for the mainline kernels. It would be a lot easier to defend patches when you can show test logs of kernel.org kernels failing. OSDL and other internal testing labs are nice and all, but if we want the mainline kernel fixed, we have to either test mainline kernels, or make it easy for LKML people to get access to test results. > > And that's what bugs me... I'm perfectly aware that many/most of > these bugs are also in the marcello kernel. (For example, the latest: > a race condition on setting/resetting of current->need_resched, > which will make a heavily loaded machine go idle. The setting of > need_resched uses no locks or semaphores of any kind, but is > used as if it were always correct.) 
But the patches don't go out > to the LKML because ... well, it can be hard to defend a patch when > the bug was never seen on a marcello kernel. So I'm a tad concerned > that there's a bit of a disconnect not only at the source code level, > but also at the test level. > > (OK, your right, I need to post the need_resched thing to LKML. This > one, at least, should be easy to explain). > > --linas > > > -- -------------------------------------------------------------------------- Troy Benjegerdes 'da hozer' hozer at drgw.net ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From sglass at us.ibm.com Sat Nov 15 08:07:27 2003 From: sglass at us.ibm.com (Stephanie Glass) Date: Fri, 14 Nov 2003 15:07:27 -0600 Subject: marcello kernel &ppc64 Message-ID: Tony wrote: > >> Which takes me to another, ahem, politically touchy point, I suspect. >> There's a test lab here that pounds the crap out of SuSE kernels. >> Speaking from personal experience, 9 out of 10 or 19 out of 20 of the >> kernel crashes & hangs that they find are *not* in the ppc64 code, >> but are in the generic linux kernel code. These are races of various >> sorts, missing locks, data corruptions, you name it, I've seen it. >> These get fixed in the SuSE kernel. > >What we need is this testing lab also pounding the crap out of marcelo >kernels.. > >I was hoping the Linux Test project and the stuff OSDL is doing would >wind up with a 'test suite' that mere mortals with real work to do >could easily run. > >Personally, I'd love to be able to build every revision that gets >pushed into the ameslab tree, and then run regression tests on it. And >then do the same thing for the mainline kernels. > >It would be a lot easier to defend patches when you can show test logs >of kernel.org kernels failing. 
OSDL and other internal testing labs >are nice and all, but if we want the mainline kernel fixed, we have to >either test mainline kernels, or make it easy for LKML people to get >access to test results. I am not sure what you want, Tony. Right now, the LTP is run on the x86 mainline kernel every night. We think that the LTP is something that can be easily run by mere mortals. If you have any specific problem, please let us know and we will work to correct it. We do not run nightly on ppc64 due to a lack of hardware, not desire. The LTC Test group right now picks up the Rochester p kernel builds (2.6 based) as they come out and runs many different tests on them. Is that the tree you want runs done on, or is there another place to pick up the kernels you are talking about? If you contact me directly and give me the details, we will try to do everything we can to help. Stephanie Linux Technology Center IBM, 11400 Burnet Road, Austin, TX 78758 Phone: (512) 838-9284 T/L: 678-9284 Fax: (512) 838-3882 E-Mail: sglass at us.ibm.com ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From paulus at samba.org Sat Nov 15 10:18:02 2003 From: paulus at samba.org (Paul Mackerras) Date: Sat, 15 Nov 2003 10:18:02 +1100 Subject: [PATCH] RTAS syscall - updated In-Reply-To: <1068837967.12651.11.camel@verve> References: <1068486681.6301.15.camel@verve> <16307.28756.583878.998250@cargo.ozlabs.ibm.com> <1068837967.12651.11.camel@verve> Message-ID: <16309.25258.441330.205683@cargo.ozlabs.ibm.com> John Rose writes: > Thanks to everyone for the comments, and apologies to Rusty for the > missing "l" :). Below is an updated version of the patch. If there > are no further comments, I'm going to push this to 2.6 on Monday. Looks good. You have reminded me that we need to add the fadvise64_64 system call for 32-bit processes, but that can go in after you have pushed the patch. Paul. ** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/ From anton at samba.org Sat Nov 15 10:26:06 2003 From: anton at samba.org (Anton Blanchard) Date: Sat, 15 Nov 2003 10:26:06 +1100 Subject: [PATCH] RTAS syscall - updated In-Reply-To: <16309.25258.441330.205683@cargo.ozlabs.ibm.com> References: <1068486681.6301.15.camel@verve> <16307.28756.583878.998250@cargo.ozlabs.ibm.com> <1068837967.12651.11.camel@verve> <16309.25258.441330.205683@cargo.ozlabs.ibm.com> Message-ID: <20031114232606.GD11253@krispykreme> > Looks good. You have reminded me that we need to add the fadvise64_64 > system call for 32-bit processes, but that can go in after you have > pushed the patch. Speaking of which, recent glibc uses the new statfs64 calls, we need the following patch from davem. Anton --- fs/compat.c.~1~ Wed Nov 12 16:09:49 2003 +++ fs/compat.c Wed Nov 12 16:10:35 2003 @@ -169,7 +169,6 @@ static int put_compat_statfs64(struct compat_statfs64 *ubuf, struct kstatfs *kbuf) { - if (sizeof ubuf->f_blocks == 4) { if ((kbuf->f_blocks | kbuf->f_bfree | kbuf->f_bavail | kbuf->f_files | kbuf->f_ffree) & @@ -192,10 +191,13 @@ return 0; } -asmlinkage long compat_statfs64(const char *path, struct compat_statfs64 *buf) +asmlinkage long compat_statfs64(const char *path, compat_size_t sz, struct compat_statfs64 *buf) { struct nameidata nd; int error; + + if (sz != sizeof(*buf)) + return -EINVAL; error = user_path_walk(path, &nd); if (!error) { We also need to hook them in: --- /tmp/misc.S 2003-10-16 14:30:22.000000000 +0000 +++ for-linus-ppc64/arch/ppc64/kernel/misc.S 2003-11-14 09:24:24.000000000 +0000 @@ -850,8 +850,8 @@ .llong .sys_ni_syscall .llong .sys32_tgkill /* 250 */ .llong .sys32_utimes - .llong .sys_statfs64 - .llong .sys_fstatfs64 + .llong .compat_statfs64 + .llong .compat_fstatfs64 .balign 8 _GLOBAL(sys_call_table) ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From hozer at hozed.org Sat Nov 15 17:22:27 2003 From: hozer at hozed.org (Troy Benjegerdes) Date: Sat, 15 Nov 2003 00:22:27 -0600 Subject: marcello kernel &ppc64 In-Reply-To: References: Message-ID: <20031115062227.GT3504@kalmia.hozed.org> On Fri, Nov 14, 2003 at 03:07:27PM -0600, Stephanie Glass wrote: > > Tony wrote: > > > >> Which takes me to another, ahem, politically touchy point, I suspect. > >> There's a test lab here that pounds the crap out of SuSE kernels. > >> Speaking from personal experience, 9 out of 10 or 19 out of 20 of the > >> kernel crashes & hangs that they find are *not* in the ppc64 code, > >> but are in the generic linux kernel code. These are races of various > >> sorts, missing locks, data corruptions, you name it, I've seen it. > >> These get fixed in the SuSE kernel. > > > >What we need is this testing lab also pounding the crap out of marcelo > >kernels.. > > > >I was hoping the Linux Test project and the stuff OSDL is doing would > >wind up with a 'test suite' that mere mortals with real work to do > >could easily run. > > > >Personally, I'd love to be able to build every revision that gets > >pushed into the ameslab tree, and then run regression tests on it. And > >then do the same thing for the mainline kernels. > > > >It would be a lot easier to defend patches when you can show test logs > >of kernel.org kernels failing. OSDL and other internal testing labs > >are nice and all, but if we want the mainline kernel fixed, we have to > >either test mainline kernels, or make it easy for LKML people to get > >access to test results. > > I am not sure what you want Tony. Right now, the LTP is run on the x-86 > mainline kernel every night. We think that the LTP is something that can > be easily run by mere mortals. If you have any specific problem, please > let us know and we will work to correct. We do not run nightly on ppc64 > due to a lack of hardware not desire. Well, honestly, I haven't tried it. 
But what I'd really like to be able to do is: Download 'linux-test-thingy.tar.gz' (or get via cvs/bk/whatever), extract it somewhere, export via nfs/whatever, then set up DHCP and tftp to support netboot of some target machines, and maybe a short shell script to be able to power-cycle the target machine, as well as record serial console output. Then, point it to a kernel tree to start running tests. Provided I already have DHCP, tftp, console, and power cycling set up, I'd like this to take less than an hour or so to get running. Otherwise, I'd be happy with a RedHat rpm or debian package I can install on one of my cluster nodes that installs all the tests and some reporting tools or somesuch. My impression of where things are now is it would take me at least a couple of days to get something usable up and running. > The LTC Test group right now picks up the Rochester p kernel builds (2.6 > based) as they come out and runs many different tests on them. Is that > the tree you want runs done on or is there another place to pick up the > kernels you are talking about. If you contact me directly and give me > the details we will try and do everything we can to help. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From benh at kernel.crashing.org Sat Nov 15 18:19:46 2003 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 15 Nov 2003 18:19:46 +1100 Subject: [PATCH] clean up compiler warnings in 2.5 bk In-Reply-To: <16307.28243.151844.849084@cargo.ozlabs.ibm.com> References: <3FB15B19.3080607@austin.ibm.com> <16307.28243.151844.849084@cargo.ozlabs.ibm.com> Message-ID: <1068880786.4642.5.camel@gaston> On Thu, 2003-11-13 at 22:43, Paul Mackerras wrote: > Nathan Lynch writes: > > > Attached patch cleans up some warnings I've seen when building ameslab > > 2.5 bk: > > > -} hmt_thread_data[NR_CPUS] = {0}; > > +} hmt_thread_data[NR_CPUS] = {{0}}; > > I guess this is because we don't initialize the bss early enough.
> Apart from that I'd rather see the initializer disappear. I have a patch doing the init from prom_init, is that early enough ? I can dig that out on Monday. Ben. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From dwm at austin.ibm.com Mon Nov 17 05:15:08 2003 From: dwm at austin.ibm.com (Doug Maxey) Date: Sun, 16 Nov 2003 12:15:08 -0600 Subject: building cross-ppc64 on RH9 Message-ID: <200311161815.hAGIF87J018480@localhost.localdomain> Howdy, It's been a few years since building a cross gcc, and must say that over all, it appears much easier. Also have read all the docs that I could find on the subject. I am having trouble, however, determining where to put the target include files for building the initial pass with xgcc. Started with the instructions from ftp://ftp.linuxppc64.org/pub/people/amodra/gcc-3.2/bootstrap and have built in binutils-2.14 and apparently installed ok, but building the cross-gcc always fails at the point where xgcc is attempting to pick up the ppc include files... Have pulled down the latest glibc and attempted to install the generic headers, but cannot cleanly build or install that either because I don't have the cross-gcc built yet. Any hints? It seems that I am missing a crucial first or second step to get the correct (or at least reasonable) version of the standard headers installed... ++doug -------------- next part -------------- cd ~/build/toolchain/gcc-33/ make /home/dwm/build/toolchain/gcc-33/libiberty make[1]: Entering directory `/home/dwm/build/toolchain/gcc-33/libiberty' /home/dwm/build/toolchain/gcc-33/libiberty/testsuite make[2]: Entering directory `/home/dwm/build/toolchain/gcc-33/libiberty/testsuite' make[2]: Nothing to be done for `all'. 
make[2]: Leaving directory `/home/dwm/build/toolchain/gcc-33/libiberty/testsuite' make[1]: Leaving directory `/home/dwm/build/toolchain/gcc-33/libiberty' /home/dwm/build/toolchain/gcc-33/gcc make[1]: Entering directory `/home/dwm/build/toolchain/gcc-33/gcc' echo "/* This file is machine generated. Do not edit. */" > tmp-gtyp.h echo "static const char *srcdir = " >> tmp-gtyp.h echo "\"/home/dwm/sb/toolchain/gcc-33/gcc\"" >> tmp-gtyp.h echo ";" >> tmp-gtyp.h echo "static const char *lang_files[] = {" >> tmp-gtyp.h ll="/home/dwm/sb/toolchain/gcc-33/gcc/c-lang.c /home/dwm/sb/toolchain/gcc-33/gcc/c-parse.in /home/dwm/sb/toolchain/gcc-33/gcc/c-tree.h /home/dwm/sb/toolchain/gcc-33/gcc/c-decl.c /home/dwm/sb/toolchain/gcc-33/gcc/c-common.c /home/dwm/sb/toolchain/gcc-33/gcc/c-common.h /home/dwm/sb/toolchain/gcc-33/gcc/c-pragma.c /home/dwm/sb/toolchain/gcc-33/gcc/c-objc-common.c "; \ for f in $ll; do \ echo "\"$f\", "; done >> tmp-gtyp.h echo "NULL};" >> tmp-gtyp.h echo "static const char *langs_for_lang_files[] = {" >> tmp-gtyp.h ff="c c c c c c c c "; \ for f in $ff; do \ echo "\"$f\", " ; done >> tmp-gtyp.h echo "NULL};" >> tmp-gtyp.h echo "static const char *all_files[] = {" >> tmp-gtyp.h gf="config.h auto-host.h /home/dwm/sb/toolchain/gcc-33/gcc/../include/ansidecl.h /home/dwm/sb/toolchain/gcc-33/gcc/config/rs6000/rs6000.h /home/dwm/sb/toolchain/gcc-33/gcc/config/dbxelf.h /home/dwm/sb/toolchain/gcc-33/gcc/config/elfos.h /home/dwm/sb/toolchain/gcc-33/gcc/config/svr4.h /home/dwm/sb/toolchain/gcc-33/gcc/config/freebsd-spec.h /home/dwm/sb/toolchain/gcc-33/gcc/config/rs6000/sysv4.h /home/dwm/sb/toolchain/gcc-33/gcc/config/rs6000/linux.h /home/dwm/sb/toolchain/gcc-33/gcc/defaults.h /home/dwm/sb/toolchain/gcc-33/gcc/defaults.h /home/dwm/sb/toolchain/gcc-33/gcc/location.h /home/dwm/sb/toolchain/gcc-33/gcc/../include/hashtab.h /home/dwm/sb/toolchain/gcc-33/gcc/bitmap.h /home/dwm/sb/toolchain/gcc-33/gcc/function.h /home/dwm/sb/toolchain/gcc-33/gcc/rtl.h 
/home/dwm/sb/toolchain/gcc-33/gcc/optabs.h /home/dwm/sb/toolchain/gcc-33/gcc/tree.h /home/dwm/sb/toolchain/gcc-33/gcc/libfuncs.h /home/dwm/sb/toolchain/gcc-33/gcc/hashtable.h /home/dwm/sb/toolchain/gcc-33/gcc/real.h /home/dwm/sb/toolchain/gcc-33/gcc/varray.h /home/dwm/sb/toolchain/gcc-33/gcc/ssa.h /home/dwm/sb/toolchain/gcc-33/gcc/insn-addr.h /home/dwm/sb/toolchain/gcc-33/gcc/cselib.h /home/dwm/sb/toolchain/gcc-33/gcc/c-common.h /home/dwm/sb/toolchain/gcc-33/gcc/c-tree.h /home/dwm/sb/toolchain/gcc-33/gcc/basic-block.h /home/dwm/sb/toolchain/gcc-33/gcc/alias.c /home/dwm/sb/toolchain/gcc-33/gcc/bitmap.c /home/dwm/sb/toolchain/gcc-33/gcc/cselib.c /home/dwm/sb/toolchain/gcc-33/gcc/dwarf2out.c /home/dwm/sb/toolchain/gcc-33/gcc/emit-rtl.c /home/dwm/sb/toolchain/gcc-33/gcc/except.c /home/dwm/sb/toolchain/gcc-33/gcc/explow.c /home/dwm/sb/toolchain/gcc-33/gcc/expr.c /home/dwm/sb/toolchain/gcc-33/gcc/fold-const.c /home/dwm/sb/toolchain/gcc-33/gcc/function.c /home/dwm/sb/toolchain/gcc-33/gcc/gcse.c /home/dwm/sb/toolchain/gcc-33/gcc/integrate.c /home/dwm/sb/toolchain/gcc-33/gcc/lists.c /home/dwm/sb/toolchain/gcc-33/gcc/optabs.c /home/dwm/sb/toolchain/gcc-33/gcc/profile.c /home/dwm/sb/toolchain/gcc-33/gcc/ra-build.c /home/dwm/sb/toolchain/gcc-33/gcc/regclass.c
/home/dwm/sb/toolchain/gcc-33/gcc/reg-stack.c /home/dwm/sb/toolchain/gcc-33/gcc/sdbout.c /home/dwm/sb/toolchain/gcc-33/gcc/stmt.c /home/dwm/sb/toolchain/gcc-33/gcc/stor-layout.c /home/dwm/sb/toolchain/gcc-33/gcc/tree.c /home/dwm/sb/toolchain/gcc-33/gcc/varasm.c /home/dwm/sb/toolchain/gcc-33/gcc/config/rs6000/rs6000.c /home/dwm/sb/toolchain/gcc-33/gcc/c-lang.c /home/dwm/sb/toolchain/gcc-33/gcc/c-parse.in /home/dwm/sb/toolchain/gcc-33/gcc/c-tree.h /home/dwm/sb/toolchain/gcc-33/gcc/c-decl.c /home/dwm/sb/toolchain/gcc-33/gcc/c-common.c /home/dwm/sb/toolchain/gcc-33/gcc/c-common.h /home/dwm/sb/toolchain/gcc-33/gcc/c-pragma.c /home/dwm/sb/toolchain/gcc-33/gcc/c-objc-common.c"; \ for f in $gf; do \ echo "\"$f\", "; done >> tmp-gtyp.h echo " NULL};" >> tmp-gtyp.h echo "static const char *lang_dir_names[] = { \"c\", " >> tmp-gtyp.h gf=""; \ for l in $gf; do \ echo "\"$l\", "; done >> tmp-gtyp.h echo "NULL};" >> tmp-gtyp.h /bin/sh /home/dwm/sb/toolchain/gcc-33/gcc/move-if-change tmp-gtyp.h gtyp-gen.h gtyp-gen.h is unchanged (cd intl && make all) /home/dwm/build/toolchain/gcc-33/gcc/intl make[2]: Entering directory `/home/dwm/build/toolchain/gcc-33/gcc/intl' make[2]: Nothing to be done for `all'. 
make[2]: Leaving directory `/home/dwm/build/toolchain/gcc-33/gcc/intl' if [ -f specs.ready ] ; then \ true; \ else \ echo timestamp > specs.ready; \ fi make GCC_FOR_TARGET="/home/dwm/build/toolchain/gcc-33/gcc/xgcc -B/home/dwm/build/toolchain/gcc-33/gcc/ -B/opt/ppc64-33/powerpc-linux/bin/ -B/opt/ppc64-33/powerpc-linux/lib/ -isystem /opt/ppc64-33/powerpc-linux/include" \ BUILD_PREFIX="" BUILD_PREFIX_1="loser-" \ AR_FOR_TARGET="powerpc-linux-ar" \ AR_CREATE_FOR_TARGET="powerpc-linux-ar rc" \ AR_FLAGS_FOR_TARGET="" \ CFLAGS="-g -O2 -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wtraditional -pedantic -Wno-long-long " \ RANLIB_FOR_TARGET="powerpc-linux-ranlib" \ RANLIB_TEST_FOR_TARGET="[ -f powerpc-linux-ranlib ] || ( [ "i686-pc-linux-gnu" = "powerpc-unknown-linux-gnu" ] && [ -f /usr/bin/ranlib -o -f /bin/ranlib ] )" \ NM_FOR_TARGET="/opt/ppc64-33/powerpc-linux/bin/nm" AWK="gawk" \ LIBGCC2_CFLAGS="-O2 -DIN_GCC -DCROSS_COMPILE -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -isystem ./include -fPIC -g -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -Dinhibit_libc" \ INCLUDES="-I. -I. -I/home/dwm/sb/toolchain/gcc-33/gcc -I/home/dwm/sb/toolchain/gcc-33/gcc/. 
-I/home/dwm/sb/toolchain/gcc-33/gcc/config -I/home/dwm/sb/toolchain/gcc-33/gcc/../include" \ CONFIG_H="config.h auto-host.h /home/dwm/sb/toolchain/gcc-33/gcc/../include/ansidecl.h /home/dwm/sb/toolchain/gcc-33/gcc/config/rs6000/rs6000.h /home/dwm/sb/toolchain/gcc-33/gcc/config/dbxelf.h /home/dwm/sb/toolchain/gcc-33/gcc/config/elfos.h /home/dwm/sb/toolchain/gcc-33/gcc/config/svr4.h /home/dwm/sb/toolchain/gcc-33/gcc/config/freebsd-spec.h /home/dwm/sb/toolchain/gcc-33/gcc/config/rs6000/sysv4.h /home/dwm/sb/toolchain/gcc-33/gcc/config/rs6000/linux.h /home/dwm/sb/toolchain/gcc-33/gcc/defaults.h /home/dwm/sb/toolchain/gcc-33/gcc/defaults.h insn-constants.h insn-flags.h" MACHMODE_H="machmode.h machmode.def /home/dwm/sb/toolchain/gcc-33/gcc/config/rs6000/rs6000-modes.def" \ LIB1ASMSRC='' \ MAKEOVERRIDES= \ -f libgcc.mk all make[2]: Entering directory `/home/dwm/build/toolchain/gcc-33/gcc' for d in libgcc nof libgcc/nof; do \ if [ -d $d ]; then true; else /bin/sh /home/dwm/sb/toolchain/gcc-33/gcc/mkinstalldirs $d; fi; \ done if [ -f stmp-dirs ]; then true; else touch stmp-dirs; fi /home/dwm/build/toolchain/gcc-33/gcc/xgcc -B/home/dwm/build/toolchain/gcc-33/gcc/ -B/opt/ppc64-33/powerpc-linux/bin/ -B/opt/ppc64-33/powerpc-linux/lib/ -isystem /opt/ppc64-33/powerpc-linux/include -O2 -DIN_GCC -DCROSS_COMPILE -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -isystem ./include -fPIC -g -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -Dinhibit_libc -I. -I. -I/home/dwm/sb/toolchain/gcc-33/gcc -I/home/dwm/sb/toolchain/gcc-33/gcc/. 
-I/home/dwm/sb/toolchain/gcc-33/gcc/config -I/home/dwm/sb/toolchain/gcc-33/gcc/../include -fPIC -mstrict-align -DL_muldi3 -c /home/dwm/sb/toolchain/gcc-33/gcc/libgcc2.c -o libgcc/./_muldi3.o In file included from tconfig.h:22, from /home/dwm/sb/toolchain/gcc-33/gcc/libgcc2.c:36: /home/dwm/sb/toolchain/gcc-33/gcc/config/rs6000/linux.h:89:20: signal.h: No such file or directory In file included from tconfig.h:22, from /home/dwm/sb/toolchain/gcc-33/gcc/libgcc2.c:36: /home/dwm/sb/toolchain/gcc-33/gcc/config/rs6000/linux.h:98: error: parse error before "stack_t" /home/dwm/sb/toolchain/gcc-33/gcc/config/rs6000/linux.h:98: warning: no semicolon at end of struct or union /home/dwm/sb/toolchain/gcc-33/gcc/config/rs6000/linux.h:100: error: parse error before "uc_sigmask" /home/dwm/sb/toolchain/gcc-33/gcc/config/rs6000/linux.h:100: warning: type defaults to `int' in declaration of `uc_sigmask' /home/dwm/sb/toolchain/gcc-33/gcc/config/rs6000/linux.h:100: warning: data definition has no type or storage class /home/dwm/sb/toolchain/gcc-33/gcc/config/rs6000/linux.h:99: error: storage size of `uc_mcontext' isn't known make[2]: *** [libgcc/./_muldi3.o] Error 1 make[2]: Leaving directory `/home/dwm/build/toolchain/gcc-33/gcc' make[1]: *** [stmp-multilib] Error 2 make[1]: Leaving directory `/home/dwm/build/toolchain/gcc-33/gcc' make: *** [all-gcc] Error 2 Compilation exited abnormally with code 2 at Sat Nov 15 14:50:41 From amodra at bigpond.net.au Mon Nov 17 21:40:06 2003 From: amodra at bigpond.net.au (Alan Modra) Date: Mon, 17 Nov 2003 21:10:06 +1030 Subject: building cross-ppc64 on RH9 In-Reply-To: <200311161815.hAGIF87J018480@localhost.localdomain> References: <200311161815.hAGIF87J018480@localhost.localdomain> Message-ID: <20031117104006.GM14287@bubble.sa.bigpond.net.au> On Sun, Nov 16, 2003 at 12:15:08PM -0600, Doug Maxey wrote: > have built in binutils-2.14 and apparently installed ok, but > building the cross-gcc always fails at the point where xgcc is > attempting to pick 
up the ppc include files... [snip] > gcc/config/rs6000/linux.h:89:20: signal.h: No such file or directory Did you pass --with-headers to the first stage gcc configure? You shouldn't do that. Without --with-headers, inhibit_libc should be defined, and that disables the code that needs signal.h. -- Alan Modra IBM OzLabs - Linux Technology Centre ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From johnrose at austin.ibm.com Tue Nov 18 06:40:08 2003 From: johnrose at austin.ibm.com (John Rose) Date: Mon, 17 Nov 2003 13:40:08 -0600 Subject: [PATCH] RTAS syscall - updated In-Reply-To: <20031114232606.GD11253@krispykreme> References: <1068486681.6301.15.camel@verve> <16307.28756.583878.998250@cargo.ozlabs.ibm.com> <1068837967.12651.11.camel@verve> <16309.25258.441330.205683@cargo.ozlabs.ibm.com> <20031114232606.GD11253@krispykreme> Message-ID: <1069098008.25176.7.camel@verve> Does anyone have thoughts on whether this could go into 2.4 as well? Thanks- John On Fri, 2003-11-14 at 17:26, Anton Blanchard wrote: > > Looks good. You have reminded me that we need to add the fadvise64_64 > > system call for 32-bit processes, but that can go in after you have > > pushed the patch. > > Speaking of which, recent glibc uses the new statfs64 calls, we need the > following patch from davem. 
> > Anton > > --- fs/compat.c.~1~ Wed Nov 12 16:09:49 2003 > +++ fs/compat.c Wed Nov 12 16:10:35 2003 > @@ -169,7 +169,6 @@ > > static int put_compat_statfs64(struct compat_statfs64 *ubuf, struct kstatfs *kbuf) > { > - > if (sizeof ubuf->f_blocks == 4) { > if ((kbuf->f_blocks | kbuf->f_bfree | > kbuf->f_bavail | kbuf->f_files | kbuf->f_ffree) & > @@ -192,10 +191,13 @@ > return 0; > } > > -asmlinkage long compat_statfs64(const char *path, struct compat_statfs64 *buf) > +asmlinkage long compat_statfs64(const char *path, compat_size_t sz, struct compat_statfs64 *buf) > { > struct nameidata nd; > int error; > + > + if (sz != sizeof(*buf)) > + return -EINVAL; > > error = user_path_walk(path, &nd); > if (!error) { > > > We also need to hook them in: > > --- /tmp/misc.S 2003-10-16 14:30:22.000000000 +0000 > +++ for-linus-ppc64/arch/ppc64/kernel/misc.S 2003-11-14 09:24:24.000000000 +0000 > @@ -850,8 +850,8 @@ > .llong .sys_ni_syscall > .llong .sys32_tgkill /* 250 */ > .llong .sys32_utimes > - .llong .sys_statfs64 > - .llong .sys_fstatfs64 > + .llong .compat_statfs64 > + .llong .compat_fstatfs64 > > .balign 8 > _GLOBAL(sys_call_table) ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From moilanen at austin.ibm.com Wed Nov 19 02:59:49 2003 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Tue, 18 Nov 2003 09:59:49 -0600 Subject: [PATCH] nvram buffering/error logging port to 2.6 In-Reply-To: <16307.28027.222060.344303@cargo.ozlabs.ibm.com> References: <0B0CFE56-1156-11D8-A79D-000A95A0560C@us.ibm.com> <1068595835.2239.58.camel@tin.ibm.com > <16307.28027.222060.344303@cargo.ozlabs.ibm.com> Message-ID: <1069171188.1137.356.camel@tin.ibm.com > Here is an updated version of the patch changes from Paul's comments. > Mostly looks fine. A few comments: > > * Please add a comment somewhere explaining why it is necessary > to have the kernel write the error log to nvram, i.e. 
what the > circumstances are under which Bad Things would happen if we relied > on userspace to do it (and what those Bad Things are). Done in nvram_write_error_log()'s comments. > * We are going to have to separate the general /dev/nvram support > from the methods for reading/writing nvram via RTAS, because the > G5 powermacs have nvram but don't use RTAS. This is not directly a > criticism of the patch, but if you felt like reworking it to make > this separation (and maybe put function pointers for nvram_read / > nvram_write into ppc_md) that would be great. I did this abstraction. > * With the kernel parsing the partitions in the nvram, I wonder if we > should be exporting that information to userspace somehow? The nvram partition table is printed during boot right now. There is also the ppc64-util command 'nvram' that can be used to read the partition table. > * In future, as a way of reducing the bulk of the code, could we > consider having userspace do the repartitioning to create the > ppc64,linux partition instead of the kernel? That is, could we have > rtasd (or something similar) create the partition ahead of time so > that the kernel would only have to read the partitioning? Or is > there some scenario where this would be impossible? I think this is a good idea. This is something that I've considered doing. There are a number of components coming down the line that will be using NVRAM and some of them may have restrictions. > * The nvram_read and nvram_write routines could be rewritten to be a > bit shorter and neater. In particular we should be able to do it > with just one loop containing the rtas_call call, rather than an if > and a loop each with a call to rtas_call. And we shouldn't need to > do a mod operation. Like this: I like your method much more, I've made this change. Thanks, Jake -------------- next part -------------- A non-text attachment was scrubbed... 
Name: linux-2.6-nvram-errlog-3.patch.bz2 Type: application/x-bzip Size: 12352 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031118/56ceaab6/attachment.bin From olh at suse.de Wed Nov 19 05:19:26 2003 From: olh at suse.de (Olaf Hering) Date: Tue, 18 Nov 2003 19:19:26 +0100 Subject: [PATCH] missing ppc32 time syscalls in 2.6.0-test9 Message-ID: <20031118181926.GA3204@suse.de> The new coreutils package uses syscall 246 for `sleep 42`, it has no fallback. syscall 249 is a new one from linuxppc-2.5 tree. diff -p -purNX /suse/olh/kernel/kernel_exclude.txt x/linux-2.6.0-test9/arch/ppc64/kernel/misc.S linux-2.6.0-test9/arch/ppc64/kernel/misc.S --- x/linux-2.6.0-test9/arch/ppc64/kernel/misc.S 2003-11-18 16:38:29.000000000 +0100 +++ linux-2.6.0-test9/arch/ppc64/kernel/misc.S 2003-11-18 19:12:42.000000000 +0100 @@ -843,11 +843,11 @@ _GLOBAL(sys_call_table32) .llong .sys_ni_syscall .llong .sys_ni_syscall .llong .sys_ni_syscall - .llong .sys_ni_syscall /* 245 */ - .llong .sys_ni_syscall - .llong .sys_ni_syscall - .llong .sys_ni_syscall - .llong .sys_ni_syscall + .llong .compat_clock_settime /* 245 */ + .llong .compat_clock_gettime + .llong .compat_clock_getres + .llong .compat_clock_nanosleep + .llong .sys_ni_syscall /* 249 swapcontext */ .llong .sys32_tgkill /* 250 */ .llong .sys32_utimes .llong .compat_statfs64 -- USB is for mice, FireWire is for men! sUse lINUX ag, nÜRNBERG ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Wed Nov 19 16:37:19 2003 From: anton at samba.org (Anton Blanchard) Date: Wed, 19 Nov 2003 16:37:19 +1100 Subject: [PATCH] missing ppc32 time syscalls in 2.6.0-test9 In-Reply-To: <20031118181926.GA3204@suse.de> References: <20031118181926.GA3204@suse.de> Message-ID: <20031119053719.GE26020@krispykreme> Hi, > The new coreutils package uses syscall 246 for `sleep 42`, it has no > fallback. > syscall 249 is a new one from linuxppc-2.5 tree. Looks good, thanks Olaf.
BTW It seems to be falling back to gettimeofday on my system: SYS_246(0, 0xffffecc8, 0, 0, 0) = -1 ENOSYS (Function not implemented) gettimeofday({1069220171, 27198}, NULL) = 0 nanosleep({10, 0}, Anton ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From benh at kernel.crashing.org Wed Nov 19 21:29:37 2003 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 19 Nov 2003 21:29:37 +1100 Subject: [PATCH] missing ppc32 time syscalls in 2.6.0-test9 In-Reply-To: <20031118181926.GA3204@suse.de> References: <20031118181926.GA3204@suse.de> Message-ID: <1069237776.31665.80.camel@gaston> On Wed, 2003-11-19 at 05:19, Olaf Hering wrote: > The new coreutils package uses syscall 246 for `sleep 42`, it has no > fallback. > syscall 249 is a new one from linuxppc-2.5 tree. FYI, I will do the swapcontext stuff for ppc64, next week hopefully Ben. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Thu Nov 20 05:35:44 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Wed, 19 Nov 2003 12:35:44 -0600 Subject: [PATCH]: viewing TCE usage stats in /proc Message-ID: <20031119123543.A30968@forte.austin.ibm.com> Hi, Here is a half-finished patch I'm using to debug TCE allocation problems on the ppc64 platform. TCEs are used to translate PCI DMA addresses, and TCE table entries are heavily alloced/freed during SCSI scatter-gather DMA ops. This patch prints usage statistics for the TCE table, and can give you some idea of what a device driver is doing with all those DMA blocks. I'm trying to gauge interest in this patch... anyone out there find this interesting? Anyone have feature requests? This patch is currently for 2.4.21. It's only for ppc64 pSeries, since I don't think the ppc processors use TCEs in their I/O subsystem. However, I'm surely not the first to look at PCI DMA scatter-gather bugs, and so I thought this topic might be of general interest.
I'd like to submit the final patch to the 2.6.0 tree at some point, unless everyone thinks this would be a dumb idea. The patch collects tce usage stats. These can be viewed either with kdb, or with a /proc/ppc64/tce/stats entry. I'm working on putting detailed stats into /proc/ppc64/tce/busid:devid files. (The patch is half-finished, so, yes there is some crud in it). --linas -------------- next part -------------- Index: arch/ppc64/kdb/kdbasupport.c =================================================================== RCS file: /cvs/local/sles8/arch/ppc64/kdb/kdbasupport.c,v retrieving revision 1.1.1.2 diff -u -p -r1.1.1.2 kdbasupport.c --- arch/ppc64/kdb/kdbasupport.c 10 Sep 2003 13:14:42 -0000 1.1.1.2 +++ arch/ppc64/kdb/kdbasupport.c 19 Nov 2003 18:02:12 -0000 @@ -1693,11 +1693,16 @@ kdba_dump_tce_table(int argc, const char long tce_table_address; int nr; int i,j,k; - int full,empty; + int full,partial,empty; int fulldump=0; u64 mapentry; - int totalpages; + int freepages; int levelpages; +#ifdef CONFIG_TCE_STATS + struct tce_blk_stats *blk_stats; + int alloced_blocks, stale_blocks; + unsigned long alloc_jiffies; +#endif /* CONFIG_TCE_STATS */ if (argc == 0) { kdb_printf("need address\n"); @@ -1710,17 +1715,21 @@ kdba_dump_tce_table(int argc, const char if (strcmp(argv[2], "full") == 0) fulldump=1; - /* with address, read contents of memory and dump tce table. */ - /* possibly making some assumptions on the depth and size of table..*/ + /* use address to read contents of memory and dump tce table. 
*/ - nr = kdba_readarea_size(tce_table_address+0 ,&kt.busNumber,8); - nr = kdba_readarea_size(tce_table_address+8 ,&kt.size,8); - nr = kdba_readarea_size(tce_table_address+16,&kt.startOffset,8); - nr = kdba_readarea_size(tce_table_address+24,&kt.base,8); - nr = kdba_readarea_size(tce_table_address+32,&kt.index,8); - nr = kdba_readarea_size(tce_table_address+40,&kt.tceType,8); +#define GET_TCE_VAL(X) \ + nr = kdba_readarea_size( \ + ((long) &(((struct TceTable *)tce_table_address)->X)), \ + &(kt.X), sizeof(kt.X)); + + GET_TCE_VAL (busNumber); + GET_TCE_VAL (size); + GET_TCE_VAL (startOffset); + GET_TCE_VAL (base); + GET_TCE_VAL (index); + GET_TCE_VAL (tceType); #ifdef CONFIG_SMP - nr = kdba_readarea_size(tce_table_address+48,&kt.lock,8); + GET_TCE_VAL (lock); #endif kdb_printf("\n"); @@ -1734,43 +1743,90 @@ kdba_dump_tce_table(int argc, const char #ifdef CONFIG_SMP kdb_printf("lock: 0x%x \n",(uint)kt.lock.lock); #endif - nr = kdba_readarea_size(tce_table_address+56,&kt.mlbm.maxLevel,8); - kdb_printf(" maxLevel: 0x%x \n",(uint)kt.mlbm.maxLevel); - totalpages=0; + GET_TCE_VAL (mlbm.maxLevel); + kdb_printf(" maxLevel: 0x%x \n",(uint)kt.mlbm.maxLevel); +#ifdef CONFIG_TCE_STATS + GET_TCE_VAL (use_cnt); + kdb_printf(" use_cnt: %d \n",(uint)kt.use_cnt); +#endif + freepages=0; for (i=0;i 3*HZ) { + stale_blocks ++; + } + } + } + kdb_printf(" blk_stats: %p\n", blk_stats); + } else { + alloced_blocks = -1; + stale_blocks = -1; + } + GET_TCE_VAL (mlbm.level[i].use_cnt); + GET_TCE_VAL (mlbm.level[i].split_cnt); + GET_TCE_VAL (mlbm.level[i].merge_cnt); + kdb_printf(" use_cnt: %d split: %d merge: %d alloced: %d stale: %d\n", + kt.mlbm.level[i].use_cnt, + kt.mlbm.level[i].split_cnt, + kt.mlbm.level[i].merge_cnt, + alloced_blocks, + stale_blocks); +#endif /* if these dont match, this might not be a valid tce table, so dont try to iterate the map entries. 
*/ if (kt.mlbm.level[i].numBits == 8*kt.mlbm.level[i].numBytes) { - full=0;empty=0;levelpages=0; + int n=0; + full=0;partial=0;empty=0;levelpages=0; for (j=0;j>= 56; + if (mapentry == 0xff) full++; + else if (mapentry) + partial++; else empty++; if (mapentry && fulldump) { - kdb_printf("0x%lx\n",mapentry); + if (n && (n%32 == 0)) kdb_printf ("\n"); + kdb_printf("%02lx ",(int) mapentry); + n++; } - for (k=0;(k<=64) && ((0x1UL<mlbm.level[0].numBits; + num_entries *= 2; /* room for other levels as well */ + num_bytes = num_entries * sizeof( struct tce_blk_stats ); + p = (struct tce_blk_stats *)__get_free_pages( GFP_ATOMIC, get_order( num_bytes )); + + /* alloc may fail for large areas; keep driving */ + if (p) memset( p, 0, num_bytes ); + + for (i=0; imlbm.level[i].use_cnt = 0; + tbl->mlbm.level[i].split_cnt = 0; + tbl->mlbm.level[i].merge_cnt = 0; + + if (p) { + tbl->mlbm.level[i].blk_stats = p; + p += tbl->mlbm.level[i].numBits; + } else { + tbl->mlbm.level[i].blk_stats = 0x0; + } + } + + tbl->use_cnt = 0; +} + +#endif /* CONFIG_TCE_STATS */ /* * Build a TceTable structure. This contains a multi-level bit map which * is used to manage allocation of the tce space. 
@@ -276,7 +312,6 @@ struct TceTable *build_tce_table(struct } /* For the highest level, turn on all the bits */ - i = tbl->mlbm.maxLevel; p = tbl->mlbm.level[i].map; m = numBits[i]; @@ -301,6 +336,10 @@ struct TceTable *build_tce_table(struct } } +#ifdef CONFIG_TCE_STATS + build_tce_stats (tbl); +#endif /* CONFIG_TCE_STATS */ + return tbl; } @@ -364,6 +403,14 @@ static long alloc_tce_range_nolock( stru */ PPCDBG(PPCDBG_TCE, "alloc_tce_range_nolock: allocating block %ld, (byte=%ld, bit=%ld) order %d\n", block, i, bit, order ); tcenum = block << order; +#ifdef CONFIG_TCE_STATS + if (tbl->mlbm.level[order].blk_stats) { + tbl->mlbm.level[order].blk_stats[block].use_cnt ++; + tbl->mlbm.level[order].blk_stats[block].alloc_jiffies = jiffies; + } + tbl->mlbm.level[order].use_cnt ++; + tbl->use_cnt ++; +#endif /* CONFIG_TCE_STATS */ return tcenum; } ++map; @@ -388,6 +435,19 @@ static long alloc_tce_range_nolock( stru if((tcenum == -1) && (order < (NUM_TCE_LEVELS - 1))) { tcenum = alloc_tce_range_nolock( tbl, order+1 ); if ( tcenum != -1 ) { +#ifdef CONFIG_TCE_STATS + /* fix up stats for 'what we actually used' */ + if (tbl->mlbm.level[order].blk_stats) { + tbl->mlbm.level[order].blk_stats[(tcenum>>order)].alloc_jiffies = jiffies; + tbl->mlbm.level[order].blk_stats[(tcenum>>order)].use_cnt ++; + tbl->mlbm.level[order].blk_stats[(tcenum>>order)+1].alloc_jiffies = jiffies; + } + if (tbl->mlbm.level[order+1].blk_stats) { + tbl->mlbm.level[order+1].blk_stats[(tcenum>>(order+1))].alloc_jiffies = (unsigned long) -1; + tbl->mlbm.level[order+1].blk_stats[(tcenum>>(order+1))].use_cnt --; + } + tbl->mlbm.level[order].split_cnt ++; +#endif /* CONFIG_TCE_STATS */ free_tce_range_nolock( tbl, tcenum+(1<> bit; bytep = map + byte; +#ifdef CONFIG_TCE_STATS + if (tbl->mlbm.level[order].blk_stats) { + if (0 == tbl->mlbm.level[order].blk_stats[block].alloc_jiffies) { + printk("PCI_DMA: Freeing tce that wasn't alloced: device %s, busno 0x%x tcenum %lx, order %x\n", 
(tbl->dn)?(tbl->dn->full_name):"?", tbl->busNumber, tcenum,order); + } + tbl->mlbm.level[order].blk_stats[block].alloc_jiffies = 0; + } +#endif /* CONFIG_TCE_STATS */ + #ifdef DEBUG_TCE PPCDBG(PPCDBG_TCE,"free_tce_range_nolock: freeing block %ld (byte=%d, bit=%d) of order %d\n", block, byte, bit, order); @@ -487,6 +556,9 @@ void free_tce_range_nolock(struct TceTab PPCDBG(PPCDBG_TCE, "free_tce_range: buddying blocks %ld & %ld\n", block, block+1); +#ifdef CONFIG_TCE_STATS + tbl->mlbm.level[order].merge_cnt ++; +#endif /* CONFIG_TCE_STATS */ free_tce_range_nolock( tbl, tcenum, order+1 ); } } @@ -689,6 +761,7 @@ void create_tce_tables_for_buses(struct if ((1<dma_window_size = 1 << (22 - num_slots_ilog2); +printk ("duuuude create_tce_tables phb slots=%d size=0x%lx\n", num_slots,phb->dma_window_size); /* Reserve 16MB of DMA space on the first PHB. * We should probably be more careful and use firmware props. * In reality this space is remapped, not lost. But we don't @@ -757,8 +830,8 @@ void create_tce_tables(void) { void create_pci_bus_tce_table( unsigned long token ) { struct TceTable * newTceTable; - PPCDBG(PPCDBG_TCE, "Entering create_pci_bus_tce_table.\n"); - PPCDBG(PPCDBG_TCE, "\ttoken = 0x%lx\n", token); + PPCDBG(PPCDBG_TCEINIT, "Entering create_pci_bus_tce_table.\n"); + PPCDBG(PPCDBG_TCEINIT, "\ttoken = 0x%lx\n", token); newTceTable = (struct TceTable *)kmalloc( sizeof(struct TceTable), GFP_KERNEL ); @@ -799,6 +872,9 @@ void create_pci_bus_tce_table( unsigned getTceTableParmsPSeriesLP(phb, dn, newTceTable); dn->tce_table = build_tce_table( newTceTable ); +#ifdef CONFIG_TCE_STATS + newTceTable->dn = dn; +#endif /* CONFIG_TCE_STATS */ } } @@ -1084,7 +1160,7 @@ dma_addr_t pci_map_single(struct pci_dev unsigned order, nPages; PPCDBG(PPCDBG_TCE, "pci_map_single:\n"); - PPCDBG(PPCDBG_TCE, "\thwdev = 0x%16.16lx, size = 0x%16.16lx, direction = 0x%16.16lx, vaddr = 0x%16.16lx\n", hwdev, size, direction, vaddr); + PPCDBG(PPCDBG_TCE, "\thwdev = 0x%16.16lx, size = 0x%lx, 
direction = %ld, vaddr = 0x%16.16lx\n", hwdev, size, direction, vaddr); if (direction == PCI_DMA_NONE) BUG(); @@ -1297,7 +1373,7 @@ static dma_addr_t create_tces_sg(struct /* Client asked for way to much space. This is checked later anyway */ /* It is easier to debug here for the drivers than in the tce tables.*/ if(order >= NUM_TCE_LEVELS) { - printk("PCI_DMA: create_tces_sg size too large: 0x%llx \n",(numTces << PAGE_SHIFT)); + printk("PCI_DMA: create_tces_sg size too large: 0x%x \n",(numTces << PAGE_SHIFT)); panic("numTces is off"); return NO_TCE; } @@ -1403,7 +1479,7 @@ void pci_unmap_sg( struct pci_dev *hwdev dma_addr_t dma_end_page, dma_start_page; PPCDBG(PPCDBG_TCE, "pci_unmap_sg:\n"); - PPCDBG(PPCDBG_TCE, "\thwdev = 0x%16.16lx, sg = 0x%16.16lx, direction = 0x%16.16lx, nelms = 0x%16.16lx\n", hwdev, sg, direction, nelms); + PPCDBG(PPCDBG_TCE, "\thwdev = 0x%16.16lx, sg = 0x%16.16lx, direction = %ld, nelms = %ld\n", hwdev, sg, direction, nelms); if ( direction == PCI_DMA_NONE || nelms == 0 ) BUG(); @@ -1425,7 +1501,7 @@ void pci_unmap_sg( struct pci_dev *hwdev /* Client asked for way to much space. 
This is checked later anyway */ /* It is easier to debug here for the drivers than in the tce tables.*/ if(order >= NUM_TCE_LEVELS) { - printk("PCI_DMA: dma_start_page:0x%lx dma_end_page:0x%lx\n",dma_start_page,dma_end_page); + printk("PCI_DMA: dma_start_page:0x%x dma_end_page:0x%x\n",dma_start_page,dma_end_page); printk("PCI_DMA: pci_unmap_sg size too large: 0x%x \n",(numTces << PAGE_SHIFT)); return; } Index: arch/ppc64/kernel/proc_pmc.c =================================================================== RCS file: /cvs/local/sles8/arch/ppc64/kernel/proc_pmc.c,v retrieving revision 1.1.1.1 diff -u -p -r1.1.1.1 proc_pmc.c --- arch/ppc64/kernel/proc_pmc.c 7 Aug 2003 03:23:04 -0000 1.1.1.1 +++ arch/ppc64/kernel/proc_pmc.c 19 Nov 2003 18:02:12 -0000 @@ -47,6 +47,12 @@ /* pci Flight Recorder AHT */ extern void proc_pciFr_init(struct proc_dir_entry *proc_ppc64_root); +#define CONFIG_TCE_STATS +#ifdef CONFIG_TCE_STATS +/* PCI TCE stats interface */ +extern void proc_tce_init(struct proc_dir_entry *proc_ppc64_root); +#endif /* CONFIG_TCE_STATS */ + static int proc_pmc_control_mode = 0; struct proc_dir_entry *proc_ppc64_root = NULL; @@ -184,6 +190,11 @@ void proc_ppc64_init(void) /* Create the /proc/ppc64/pcifr for the Pci Flight Recorder. 
*/ proc_pciFr_init(proc_ppc64_root); + +#ifdef CONFIG_TCE_STATS + /* Create the /proc/ppc64/tce entry for TCE stats/debugging */ + proc_tce_init (proc_ppc64_root); +#endif /* CONFIG_TCE_STATS */ proc_ppc64_pmc_root = proc_mkdir("pmc", proc_ppc64_root); Index: include/asm-ppc64/pci_dma.h =================================================================== RCS file: /cvs/local/sles8/include/asm-ppc64/pci_dma.h,v retrieving revision 1.1.1.1 diff -u -p -r1.1.1.1 pci_dma.h --- include/asm-ppc64/pci_dma.h 7 Aug 2003 03:23:25 -0000 1.1.1.1 +++ include/asm-ppc64/pci_dma.h 19 Nov 2003 18:02:23 -0000 @@ -23,6 +23,8 @@ #include #include +#define CONFIG_TCE_STATS + /* * NUM_TCE_LEVELS defines the largest contiguous block * of dma (tce) space we can get. NUM_TCE_LEVELS = 10 @@ -53,10 +55,29 @@ union Tce { } tceBits; }; +#ifdef CONFIG_TCE_STATS +struct tce_blk_stats { + unsigned long alloc_jiffies; /* time when last allocated, helps find leaks */ + unsigned int use_cnt; /* how many times this block has been alloced */ + char direction; /* last i/o direction */ +}; +#endif /* CONFIG_TCE_STATS */ + struct Bitmap { unsigned long numBits; unsigned long numBytes; unsigned char * map; +#ifdef CONFIG_TCE_STATS + unsigned int use_cnt; /* count - block of this order has been alloced */ + + /* The split/merge counts provide stats about the buddy system, + * helping debug fragmentation problems. */ + unsigned int split_cnt; /* count - block split to make smaller blocks */ + unsigned int merge_cnt; /* count - block buddied back up by free */ + + /* Individual block stats should help debug alloc leaks. 
 */ + struct tce_blk_stats * blk_stats; +#endif /* CONFIG_TCE_STATS */ }; struct MultiLevelBitmap { @@ -73,6 +94,10 @@ struct TceTable { u64 tceType; spinlock_t lock; struct MultiLevelBitmap mlbm; +#ifdef CONFIG_TCE_STATS + unsigned int use_cnt; /* count of times an alloc was made in this table */ + struct device_node *dn; /* simplify diagnostics */ +#endif /* CONFIG_TCE_STATS */ }; struct TceTableManagerCB { From linas at austin.ibm.com Thu Nov 20 08:22:05 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Wed, 19 Nov 2003 15:22:05 -0600 Subject: marcello kernel &ppc64 In-Reply-To: <20031114203620.GS3504@kalmia.hozed.org>; from hozer@hozed.org on Fri, Nov 14, 2003 at 02:36:20PM -0600 References: <20031113143822.A25254@forte.austin.ibm.com> <20031113165305.A30094@forte.austin.ibm.com> <20031114203620.GS3504@kalmia.hozed.org> Message-ID: <20031119152205.A23282@forte.austin.ibm.com> On Fri, Nov 14, 2003 at 02:36:20PM -0600, Troy Benjegerdes wrote: > > I was hoping the Linux Test project and the stuff OSDL is doing would > wind up with a 'test suite' that mere mortals with real work to do could > easily run. Off-topic, but ... LTP is OK, but not really a thorough test suite; it doesn't test many major components (disclaimer: to the best of my understanding). My understanding is that it's good at single-box tests, but doesn't really stress network, NFS or device drivers (disk, SCSI, etc). Testing is, by its very nature, not going to be something that 'mere mortals with work to do' are going to be able to do. First of all, just obtaining all the hardware and setting up all the interesting permutations can be a full-time job, esp. if you are testing networks or NFS or disk, e.g. fibre-channel I/O. Then you've got to monitor all of your test machines every now and then, to make sure none of them hung or had a period of inexplicable slowness. Simply reviewing performance stats for a dozen machines is time consuming.
Finally, and you'll hate me for saying this: if all your tests are passing, that means you aren't testing hard enough. It's a full-time job to figure out how you might be able to break the system, then to write that test case, and then try to break it. --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olh at suse.de Sat Nov 22 06:39:51 2003 From: olh at suse.de (Olaf Hering) Date: Fri, 21 Nov 2003 20:39:51 +0100 Subject: [PATCH] kconfig change for HOTPLUG_PCI_RPA Message-ID: <20031121193951.GA15002@suse.de> I bet it's ppc64 only. --- ./drivers/pci/hotplug/Kconfig~ 2003-11-21 19:45:38.000000000 +0100 +++ ./drivers/pci/hotplug/Kconfig 2003-11-21 20:37:48.000000000 +0100 @@ -124,7 +124,7 @@ config HOTPLUG_PCI_CPCI_GENERIC config HOTPLUG_PCI_RPA tristate "RPA PCI Hotplug driver" - depends on HOTPLUG_PCI + depends on HOTPLUG_PCI && PPC64 help Say Y here if you have a a RPA system that supports PCI Hotplug. -- USB is for mice, FireWire is for men! sUse lINUX ag, nÜRNBERG ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From hollisb at us.ibm.com Sat Nov 22 06:53:56 2003 From: hollisb at us.ibm.com (Hollis Blanchard) Date: Fri, 21 Nov 2003 13:53:56 -0600 Subject: [PATCH] kconfig change for HOTPLUG_PCI_RPA In-Reply-To: <20031121193951.GA15002@suse.de> Message-ID: <70A88E14-1C5C-11D8-9229-000A95A0560C@us.ibm.com> On Friday, Nov 21, 2003, at 13:39 US/Central, Olaf Hering wrote: > > I bet it's ppc64 only. > > --- ./drivers/pci/hotplug/Kconfig~ 2003-11-21 19:45:38.000000000 > +0100 > +++ ./drivers/pci/hotplug/Kconfig 2003-11-21 20:37:48.000000000 > +0100 > @@ -124,7 +124,7 @@ config HOTPLUG_PCI_CPCI_GENERIC > > config HOTPLUG_PCI_RPA > tristate "RPA PCI Hotplug driver" > - depends on HOTPLUG_PCI > + depends on HOTPLUG_PCI && PPC64 > help > Say Y here if you have a a RPA system that supports PCI > Hotplug. And perhaps the help could be elaborated to say something about pSeries?
Believe it or not, most people have absolutely no idea what "RPA" means. Also there was an "a a" typo. Also, which pSeries systems support hotplug? (Do not apply patch without fixing AAA/BBB/CCC. :) ===== linuxppc64-2.5/drivers/pci/hotplug/Kconfig 1.15 vs edited ===== --- 1.15/drivers/pci/hotplug/Kconfig Wed Nov 19 11:10:13 2003 +++ edited/linuxppc64-2.5/drivers/pci/hotplug/Kconfig Fri Nov 21 13:51:37 2003 @@ -123,10 +123,11 @@ When in doubt, say N. config HOTPLUG_PCI_RPA - tristate "RPA PCI Hotplug driver" - depends on HOTPLUG_PCI + tristate "pSeries PCI Hotplug driver" + depends on HOTPLUG_PCI && PPC64 help - Say Y here if you have a a RPA system that supports PCI Hotplug. + Say Y here if you have a pSeries system that supports PCI Hotplug, + such as AAA, BBB, or CCC. To compile this driver as a module, choose M here: the module will be called rpaphp. -- Hollis Blanchard IBM Linux Technology Center ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From hollisb at us.ibm.com Tue Nov 25 06:12:12 2003 From: hollisb at us.ibm.com (Hollis Blanchard) Date: Mon, 24 Nov 2003 13:12:12 -0600 Subject: new vscsi files In-Reply-To: <20031123175324.BA22524064@source.scl.ameslab.gov> Message-ID: <1BAE4F74-1EB2-11D8-B67D-000A95A0560C@us.ibm.com> On Sunday, Nov 23, 2003, at 11:53 US/Central, ppc64 at source.scl.ameslab.gov wrote: > full patch URL: > http://source.scl.ameslab.gov:14690//linux-2.4/patch at 1.1202 > > ChangeSet > 1.1202 03/11/23 11:37:33 boutcher at brule.rchland.ibm.com +6 -0 > Add ibmvscsi server [...] > drivers/scsi/srp.h > 1.0 03/11/23 11:20:52 boutcher at brule.rchland.ibm.com +0 -0 > BitKeeper file > /development/boutcher/bk/ppc64-2.4new/drivers/scsi/srp.h > > drivers/scsi/viosrp.h > 1.0 03/11/23 11:20:49 boutcher at brule.rchland.ibm.com +0 -0 > BitKeeper file > /development/boutcher/bk/ppc64-2.4new/drivers/scsi/viosrp.h These files need at least a copyright statement at the top. 
IMO they also need at least *some* sort of commenting, for example: - what does "SRP" mean? - on what systems can one find SRP? - where can one find more information on SRP? -- Hollis Blanchard IBM Linux Technology Center ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Tue Nov 25 09:16:46 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Mon, 24 Nov 2003 16:16:46 -0600 Subject: [PATCH] RTAS syscall - review request In-Reply-To: <1068486681.6301.15.camel@verve>; from johnrose@austin.ibm.com on Mon, Nov 10, 2003 at 11:51:21AM -0600 References: <1068486681.6301.15.camel@verve> Message-ID: <20031124161646.A32192@forte.austin.ibm.com> On Mon, Nov 10, 2003 at 11:51:21AM -0600, John Rose wrote: > > This patch implements a generic RTAS interface to userspace through a > system call. It was originally written by Rusty Russell and modified by > myself. There are two main parts: .... In the patch below, someone mentioned the need for using copy_to_user() in the /proc routines. But I couldn't help noticing that much of the existing ppc64 *proc*.c code doesn't use copy_to_user() (at least not in the 2.4 tree; well, at least ppc_rtas_tone_volume_read() doesn't, and I thought I saw more).
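For reference, a minimal userspace sketch of the pattern being asked for: format into a kernel-side buffer first, then copy only the requested window out to the caller's buffer. This is an illustration under stated assumptions, not code from the patch. fake_copy_to_user() and tone_volume_read_sketch() are hypothetical names, and memcpy() merely stands in for the kernel's copy_to_user() so that the example compiles outside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical userspace stand-in for copy_to_user(); like the real one,
 * it returns the number of bytes that could NOT be copied. */
static unsigned long fake_copy_to_user(void *to, const void *from,
                                       unsigned long n)
{
    memcpy(to, from, n);
    return 0;
}

/* Hypothetical read handler: the value is formatted into a local buffer,
 * and only the window [*ppos, *ppos + count) is copied to the caller. */
static ssize_t tone_volume_read_sketch(unsigned long value, char *ubuf,
                                       size_t count, long long *ppos)
{
    char kbuf[40];
    int len = snprintf(kbuf, sizeof(kbuf), "%lu\n", value);

    if (len < 0)
        return -EIO;
    if ((size_t)len >= sizeof(kbuf))
        len = sizeof(kbuf) - 1;         /* output was truncated */
    if (*ppos < 0 || *ppos >= len)
        return 0;                       /* nothing left: EOF */
    if (count > (size_t)(len - *ppos))
        count = (size_t)(len - *ppos);  /* clamp to remaining bytes */
    if (fake_copy_to_user(ubuf, kbuf + *ppos, count))
        return -EFAULT;
    *ppos += (long long)count;
    return (ssize_t)count;
}
```

A caller would invoke tone_volume_read_sketch() repeatedly with the same ppos until it returns 0. The point of the local buffer is that the user pointer is never handed to sprintf() or strlen() directly.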
 --linas > diff -Nru a/arch/ppc64/kernel/rtas-proc.c b/arch/ppc64/kernel/rtas-proc.c > --- a/arch/ppc64/kernel/rtas-proc.c Mon Nov 10 11:34:58 2003 > +++ b/arch/ppc64/kernel/rtas-proc.c Mon Nov 10 11:34:58 2003 > @@ -842,6 +851,23 @@ > int n; > n = sprintf(buf, "%lu\n", rtas_tone_volume); > > + if (*ppos >= strlen(buf)) > + return 0; > + if (n > strlen(buf) - *ppos) > + n = strlen(buf) - *ppos; > + if (n > count) > + n = count; > + *ppos += n; > + return n; > +} > + > +/* RTAS Userspace access */ > +static ssize_t ppc_rtas_rmo_buf_read(struct file *file, char *buf, > + size_t count, loff_t *ppos) > +{ > + int n; > + > + n = sprintf(buf, "%p %x\n", rtas_rmo_buf, RTAS_SYSCALL_MAX); > if (*ppos >= strlen(buf)) > return 0; > if (n > strlen(buf) - *ppos) ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From moilanen at austin.ibm.com Tue Nov 25 10:12:13 2003 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Mon, 24 Nov 2003 17:12:13 -0600 Subject: [PATCH][2.6] JS20 support Message-ID: <1069715532.1258.68.camel@tin.ibm.com > Here are some patches and the config to boot a JS20 Blade. After the interrupt abstraction was done, there was not much code needed. Note: There is a current FW bug (should be fixed in a couple of weeks), so you need to make sure you do not call event-scan. From the OF prompt, run these commands: dev /rtas " event-scan" delete-property Patches ------- linux-2.6-amd74x-irq-1.patch - Bug fix required linux-2.6-js20-ide-workaorund-1.patch - Temporary workaround for IDE IO space being stuck in the ISA range. FW is working on a fix. linux-2.6-pcibios-scan-all-fns-1.patch - This is a forward port of the 2.4 patch to allow the arch to decide whether it needs to scan all PCI functions or not. This is based on a patch from Anton. defconfig.js20.bz2 - The config file to boot.
Thanks, Jake -------------- next part -------------- # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. # This patch includes the following deltas: # ChangeSet 1.1342 -> 1.1343 # drivers/ide/pci/amd74xx.c 1.22 -> 1.23 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/11/24 moilanen at threadlp13.austin.ibm.com 1.1343 # init chipset does not return irq, like is required. # -------------------------------------------- # diff -Nru a/drivers/ide/pci/amd74xx.c b/drivers/ide/pci/amd74xx.c --- a/drivers/ide/pci/amd74xx.c Mon Nov 24 14:23:40 2003 +++ b/drivers/ide/pci/amd74xx.c Mon Nov 24 14:23:40 2003 @@ -374,7 +374,7 @@ #endif /* DISPLAY_AMD_TIMINGS && CONFIG_PROC_FS */ - return 0; + return dev->irq; } static void __init init_hwif_amd74xx(ide_hwif_t *hwif) -------------- next part -------------- # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. # This patch includes the following deltas: # ChangeSet 1.1341 -> 1.1342 # arch/ppc64/defconfig 1.41 -> 1.42 # arch/ppc64/Kconfig 1.25 -> 1.26 # include/asm-ppc64/eeh.h 1.6 -> 1.7 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/11/24 moilanen at threadlp13.austin.ibm.com 1.1342 # This is a temporary workaround for us the IDE IO space being in ISA # space instead of PCI space. Firmware is working on a fix for this. # -------------------------------------------- # diff -Nru a/arch/ppc64/Kconfig b/arch/ppc64/Kconfig --- a/arch/ppc64/Kconfig Mon Nov 24 14:20:01 2003 +++ b/arch/ppc64/Kconfig Mon Nov 24 14:20:01 2003 @@ -142,6 +142,12 @@ bool "Proc interface to RTAS" depends on !PPC_ISERIES +config JS20 + bool "JS20 System" + depends on !PPC_ISERIES + help + This option is for a JS20 box. 
+ endmenu diff -Nru a/arch/ppc64/defconfig b/arch/ppc64/defconfig --- a/arch/ppc64/defconfig Mon Nov 24 14:20:01 2003 +++ b/arch/ppc64/defconfig Mon Nov 24 14:20:01 2003 @@ -59,6 +59,7 @@ # CONFIG_RTAS_FLASH is not set CONFIG_SCANLOG=y CONFIG_PPC_RTAS=y +# CONFIG_JS20 is not set # # General setup diff -Nru a/include/asm-ppc64/eeh.h b/include/asm-ppc64/eeh.h --- a/include/asm-ppc64/eeh.h Mon Nov 24 14:20:01 2003 +++ b/include/asm-ppc64/eeh.h Mon Nov 24 14:20:01 2003 @@ -150,7 +150,11 @@ * ISA does not implement EEH and ISA may not exist in the system. * For PCI we check for EEH failures. */ +#ifdef CONFIG_JS20 +#define _IO_IS_ISA(port) ((port) < 0x00000) +#else #define _IO_IS_ISA(port) ((port) < 0x10000) +#endif #define _IO_HAS_ISA_BUS (isa_io_base != 0) static inline u8 eeh_inb(unsigned long port) { -------------- next part -------------- # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. 
# This patch includes the following deltas: # ChangeSet 1.1343 -> 1.1344 # include/asm-alpha/pci.h 1.18 -> 1.19 # include/asm-sparc64/pci.h 1.17 -> 1.18 # include/asm-arm/pci.h 1.21 -> 1.22 # include/asm-v850/pci.h 1.2 -> 1.3 # include/asm-ppc64/pci.h 1.11 -> 1.12 # include/asm-i386/pci.h 1.25 -> 1.26 # include/asm-x86_64/pci.h 1.12 -> 1.13 # include/asm-um/pci.h 1.1 -> 1.2 # include/asm-m68k/pci.h 1.6 -> 1.7 # include/asm-generic/pci.h 1.1 -> 1.2 # include/asm-h8300/pci.h 1.3 -> 1.4 # include/asm-sh/pci.h 1.14 -> 1.15 # include/asm-arm26/pci.h 1.1 -> 1.2 # include/asm-mips/pci.h 1.13 -> 1.14 # drivers/pci/probe.c 1.37 -> 1.38 # include/asm-parisc/pci.h 1.9 -> 1.10 # include/asm-ppc/pci.h 1.22 -> 1.23 # include/asm-sparc/pci.h 1.11 -> 1.12 # include/asm-m68knommu/pci.h 1.2 -> 1.3 # include/asm-ia64/pci.h 1.22 -> 1.23 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/11/24 moilanen at threadlp13.austin.ibm.com 1.1344 # Have callouts for each architecture to determine if need to # scanning all pci functions or not. # -------------------------------------------- # diff -Nru a/drivers/pci/probe.c b/drivers/pci/probe.c --- a/drivers/pci/probe.c Mon Nov 24 14:30:06 2003 +++ b/drivers/pci/probe.c Mon Nov 24 14:30:06 2003 @@ -7,6 +7,8 @@ #include #include +#include + #undef DEBUG #ifdef DEBUG @@ -552,7 +554,7 @@ struct pci_dev *dev; dev = pci_scan_device(bus, devfn); - if (func == 0) { + if (!pcibios_scan_all_fns() && func == 0) { if (!dev) break; } else { diff -Nru a/include/asm-alpha/pci.h b/include/asm-alpha/pci.h --- a/include/asm-alpha/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-alpha/pci.h Mon Nov 24 14:30:06 2003 @@ -51,6 +51,7 @@ bus numbers. 
*/ #define pcibios_assign_all_busses() 1 +#define pcibios_scan_all_fns() 0 #define PCIBIOS_MIN_IO alpha_mv.min_io_address #define PCIBIOS_MIN_MEM alpha_mv.min_mem_address diff -Nru a/include/asm-arm/pci.h b/include/asm-arm/pci.h --- a/include/asm-arm/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-arm/pci.h Mon Nov 24 14:30:06 2003 @@ -20,6 +20,8 @@ #endif +#define pcibios_scan_all_fns() 0 + static inline void pcibios_set_master(struct pci_dev *dev) { /* No special bus mastering setup handling */ diff -Nru a/include/asm-arm26/pci.h b/include/asm-arm26/pci.h --- a/include/asm-arm26/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-arm26/pci.h Mon Nov 24 14:30:06 2003 @@ -1,5 +1,6 @@ /* Should not be needed. IDE stupidity */ /* JMA 18.05.03 - is kinda needed, if only to tell it we don't have a PCI bus */ -#define PCI_DMA_BUS_IS_PHYS 0 +#define PCI_DMA_BUS_IS_PHYS 0 +#define pcibios_scan_all_fns() 0 diff -Nru a/include/asm-generic/pci.h b/include/asm-generic/pci.h --- a/include/asm-generic/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-generic/pci.h Mon Nov 24 14:30:06 2003 @@ -22,4 +22,6 @@ region->end = res->end; } +#define pcibios_scan_all_fns() 0 + #endif diff -Nru a/include/asm-h8300/pci.h b/include/asm-h8300/pci.h --- a/include/asm-h8300/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-h8300/pci.h Mon Nov 24 14:30:06 2003 @@ -8,6 +8,7 @@ */ #define pcibios_assign_all_busses() 0 +#define pcibios_scan_all_fns() 0 extern inline void pcibios_set_master(struct pci_dev *dev) { diff -Nru a/include/asm-i386/pci.h b/include/asm-i386/pci.h --- a/include/asm-i386/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-i386/pci.h Mon Nov 24 14:30:06 2003 @@ -15,6 +15,7 @@ #else #define pcibios_assign_all_busses() 0 #endif +#define pcibios_scan_all_fns() 0 extern unsigned long pci_mem_start; #define PCIBIOS_MIN_IO 0x1000 diff -Nru a/include/asm-ia64/pci.h b/include/asm-ia64/pci.h --- a/include/asm-ia64/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-ia64/pci.h Mon Nov 24 
14:30:06 2003 @@ -16,6 +16,7 @@ * loader. */ #define pcibios_assign_all_busses() 0 +#define pcibios_scan_all_fns() 0 #define PCIBIOS_MIN_IO 0x1000 #define PCIBIOS_MIN_MEM 0x10000000 diff -Nru a/include/asm-m68k/pci.h b/include/asm-m68k/pci.h --- a/include/asm-m68k/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-m68k/pci.h Mon Nov 24 14:30:06 2003 @@ -36,6 +36,7 @@ }; #define pcibios_assign_all_busses() 0 +#define pcibios_scan_all_fns() 0 extern inline void pcibios_set_master(struct pci_dev *dev) { diff -Nru a/include/asm-m68knommu/pci.h b/include/asm-m68knommu/pci.h --- a/include/asm-m68knommu/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-m68knommu/pci.h Mon Nov 24 14:30:06 2003 @@ -11,6 +11,8 @@ #define PCIBIOS_MIN_IO 0x100 #define PCIBIOS_MIN_MEM 0x00010000 +#define pcibios_scan_all_fns() 0 + /* * Return whether the given PCI device DMA address mask can * be supported properly. For example, if your device can diff -Nru a/include/asm-mips/pci.h b/include/asm-mips/pci.h --- a/include/asm-mips/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-mips/pci.h Mon Nov 24 14:30:06 2003 @@ -20,6 +20,7 @@ #else #define pcibios_assign_all_busses() 0 #endif +#define pcibios_scan_all_fns() 0 #define PCIBIOS_MIN_IO 0x1000 #define PCIBIOS_MIN_MEM 0x10000000 diff -Nru a/include/asm-parisc/pci.h b/include/asm-parisc/pci.h --- a/include/asm-parisc/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-parisc/pci.h Mon Nov 24 14:30:06 2003 @@ -174,6 +174,7 @@ ** to zero for legacy platforms and one for PAT platforms. 
*/ #define pcibios_assign_all_busses() (pdc_type == PDC_TYPE_PAT) +#define pcibios_scan_all_fns() 0 #define PCIBIOS_MIN_IO 0x10 #define PCIBIOS_MIN_MEM 0x1000 /* NBPG - but pci/setup-res.c dies */ diff -Nru a/include/asm-ppc/pci.h b/include/asm-ppc/pci.h --- a/include/asm-ppc/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-ppc/pci.h Mon Nov 24 14:30:06 2003 @@ -26,6 +26,7 @@ extern int pci_assign_all_busses; #define pcibios_assign_all_busses() (pci_assign_all_busses) +#define pcibios_scan_all_fns() 0 #define PCIBIOS_MIN_IO 0x1000 #define PCIBIOS_MIN_MEM 0x10000000 diff -Nru a/include/asm-ppc64/pci.h b/include/asm-ppc64/pci.h --- a/include/asm-ppc64/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-ppc64/pci.h Mon Nov 24 14:30:06 2003 @@ -19,6 +19,12 @@ #define PCIBIOS_MIN_IO 0x1000 #define PCIBIOS_MIN_MEM 0x10000000 +/* + * ppc64 can have multifunction devices that do not respond to function 0. + * In this case we must scan all functions. + */ +#define pcibios_scan_all_fns() 1 + static inline void pcibios_set_master(struct pci_dev *dev) { /* No special bus mastering setup handling */ diff -Nru a/include/asm-sh/pci.h b/include/asm-sh/pci.h --- a/include/asm-sh/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-sh/pci.h Mon Nov 24 14:30:06 2003 @@ -12,6 +12,7 @@ or architectures with incomplete PCI setup by the loader */ #define pcibios_assign_all_busses() 1 +#define pcibios_scan_all_fns() 0 #if defined(CONFIG_CPU_SUBTYPE_ST40STB1) /* These are currently the correct values for the STM overdrive board. diff -Nru a/include/asm-sparc/pci.h b/include/asm-sparc/pci.h --- a/include/asm-sparc/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-sparc/pci.h Mon Nov 24 14:30:06 2003 @@ -8,6 +8,7 @@ * or architectures with incomplete PCI setup by the loader. 
*/ #define pcibios_assign_all_busses() 0 +#define pcibios_scan_all_fns() 0 #define PCIBIOS_MIN_IO 0UL #define PCIBIOS_MIN_MEM 0UL diff -Nru a/include/asm-sparc64/pci.h b/include/asm-sparc64/pci.h --- a/include/asm-sparc64/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-sparc64/pci.h Mon Nov 24 14:30:06 2003 @@ -11,6 +11,7 @@ * or architectures with incomplete PCI setup by the loader. */ #define pcibios_assign_all_busses() 0 +#define pcibios_scan_all_fns() 0 #define PCIBIOS_MIN_IO 0UL #define PCIBIOS_MIN_MEM 0UL diff -Nru a/include/asm-um/pci.h b/include/asm-um/pci.h --- a/include/asm-um/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-um/pci.h Mon Nov 24 14:30:06 2003 @@ -2,5 +2,6 @@ #define __UM_PCI_H #define PCI_DMA_BUS_IS_PHYS (1) +#define pcibios_scan_all_fns() 0 #endif diff -Nru a/include/asm-v850/pci.h b/include/asm-v850/pci.h --- a/include/asm-v850/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-v850/pci.h Mon Nov 24 14:30:06 2003 @@ -17,6 +17,8 @@ /* Get any platform-dependent definitions. */ #include +#define pcibios_scan_all_fns() 0 + /* Generic declarations. */ struct scatterlist; diff -Nru a/include/asm-x86_64/pci.h b/include/asm-x86_64/pci.h --- a/include/asm-x86_64/pci.h Mon Nov 24 14:30:06 2003 +++ b/include/asm-x86_64/pci.h Mon Nov 24 14:30:06 2003 @@ -17,6 +17,7 @@ #else #define pcibios_assign_all_busses() 0 #endif +#define pcibios_scan_all_fns() 0 extern int no_iommu, force_iommu; -------------- next part -------------- A non-text attachment was scrubbed... 
Name: defconfig.js20.bz2 Type: application/x-bzip Size: 4368 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031124/2d831892/attachment.bin From olof at austin.ibm.com Tue Nov 25 10:36:10 2003 From: olof at austin.ibm.com (Olof Johansson) Date: Mon, 24 Nov 2003 17:36:10 -0600 Subject: [PATCH] [2.4] Fix failing LTP testcase recvmsg01 Message-ID: <3FC295EA.6010009@austin.ibm.com> Attached patch fixes the LTP testcase recvmsg01, we don't return the same error (EMSGSIZE) for the translated 32-bit system call. 2.5 is not affected, translations are handled differently there. I'll push this to ameslab tomorrow. -Olof -- Olof Johansson Office: 4F005/905 pSeries Linux Development IBM Systems Group Email: olof at austin.ibm.com Phone: 512-838-9858 All opinions are my own and not those of IBM -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: recvmsg-ltp-patch Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031124/dbb6e616/attachment.txt From paulus at samba.org Tue Nov 25 12:14:02 2003 From: paulus at samba.org (Paul Mackerras) Date: Tue, 25 Nov 2003 12:14:02 +1100 Subject: [PATCH][2.6] JS20 support In-Reply-To: <1069715532.1258.68.camel@tin.ibm.com > References: <1069715532.1258.68.camel@tin.ibm.com > Message-ID: <16322.44250.747555.159787@cargo.ozlabs.ibm.com> Jake Moilanen writes: > This is a forward port of the 2.4 patch to allow arch > decide if need to scan all pci functions or not. This > is based on a patch from Anton. Why is this needed on a JS20 blade? My understanding is that this patch is needed in a partitioned environment where we may get function N of a PCI-PCI bridge assigned to a partition but not function 0. The JS20 blade isn't partitioned, so why do we need this patch? Regards, Paul. ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From nathanl at austin.ibm.com Tue Nov 25 16:06:38 2003 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Mon, 24 Nov 2003 23:06:38 -0600 Subject: [PATCH] of_add_property and property list locking (2.5) Message-ID: <3FC2E35E.30400@austin.ibm.com> Hi- We need the ability to add custom properties to OF device nodes in certain situations (e.g. cpu hotplug). And if we're going to be adding properties, we need some sort of locking for struct device_node's properties list. I've also added a public proc_device_tree_add_property interface to fs/proc/proc_devtree.c to avoid duplication of code. The natural thing to follow this would be deprecating get_property in favor of a function (of_get_property?) that takes the new lock when searching the property list. Comments? Nathan -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: of_add_property.patch Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20031124/260ab916/attachment.txt From benh at kernel.crashing.org Tue Nov 25 16:48:42 2003 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 25 Nov 2003 16:48:42 +1100 Subject: [PATCH] of_add_property and property list locking (2.5) In-Reply-To: <3FC2E35E.30400@austin.ibm.com> References: <3FC2E35E.30400@austin.ibm.com> Message-ID: <1069739322.669.0.camel@gaston> On Tue, 2003-11-25 at 16:06, Nathan Lynch wrote: > Hi- > > We need the ability to add custom properties to OF device nodes in > certain situations (e.g. cpu hotplug). And if we're going to be adding > properties, we need some sort of locking for struct device_node's > properties list. I've also added a public proc_device_tree_add_property > interface to fs/proc/proc_devtree.c to avoid duplication of code. > > The natural thing to follow this would be deprecating get_property in > favor of a function (of_get_property?) that takes the new lock when > searching the property list. Hrm... 
how do you protect the property content against deletion ? Ben. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From paulus at samba.org Tue Nov 25 21:36:43 2003 From: paulus at samba.org (Paul Mackerras) Date: Tue, 25 Nov 2003 21:36:43 +1100 Subject: iSeries support in ameslab linux-2.5 Message-ID: <16323.12475.839925.470110@cargo.ozlabs.ibm.com> The ameslab linux-2.5 tree now has iSeries support that appears to be working well enough for people to start trying it out. There are still some problems with it, and we don't guarantee it won't eat your (virtual) disks, so don't try it on any systems with important data on them. * Virtual console is working but sometimes outputs rubbish and locks up. We need to forward-port the recent changes that went into the 2.4 iSeries virtual console code. * Virtual ethernet appears to be working well. * Virtual disk is mostly working but seems to crash the kernel in some error situations. * Native PCI support is still completely untested and probably doesn't work at all. Comments/questions/patches welcome. Paul. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From jdewand at redhat.com Tue Nov 25 23:34:53 2003 From: jdewand at redhat.com (Julie DeWandel) Date: Tue, 25 Nov 2003 07:34:53 -0500 Subject: [PATCH] [2.4] Fix failing LTP testcase recvmsg01 References: <3FC295EA.6010009@austin.ibm.com> Message-ID: <3FC34C6D.1090904@redhat.com> Same fix is also needed for sys32_sendmsg(). -Julie Olof Johansson wrote: > Attached patch fixes the LTP testcase recvmsg01, we don't return the > same error (EMSGSIZE) for the > translated 32-bit system call. 2.5 is not affected, translations are > handled differently there. > > I'll push this to ameslab tomorrow. 
> > > -Olof > > -- > Olof Johansson Office: 4F005/905 > pSeries Linux Development IBM Systems Group > Email: olof at austin.ibm.com Phone: 512-838-9858 > All opinions are my own and not those of IBM > >------------------------------------------------------------------------ > >===== arch/ppc64/kernel/sys_ppc32.c 1.11 vs edited ===== >--- 1.11/arch/ppc64/kernel/sys_ppc32.c Tue Sep 2 10:49:30 2003 >+++ edited/arch/ppc64/kernel/sys_ppc32.c Mon Nov 24 16:21:08 2003 >@@ -3759,7 +3759,7 @@ > if(msghdr_from_user32_to_kern(&kern_msg, user_msg)) > return -EFAULT; > if(kern_msg.msg_iovlen > UIO_MAXIOV) >- return -EINVAL; >+ return -EMSGSIZE; > > uaddr = kern_msg.msg_name; > uaddr_len = &user_msg->msg_namelen; > > -- Julie DeWandel Red Hat, Inc. Tel (978) 692-3113 x23251 ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olh at suse.de Wed Nov 26 00:05:00 2003 From: olh at suse.de (Olaf Hering) Date: Tue, 25 Nov 2003 14:05:00 +0100 Subject: [PATCH] _syscall6 for 2.6 Message-ID: <20031125130500.GA29319@suse.de> This patch implements _syscall6 for ppc64, it is required for klibc. 
--- linuxppc64-2.5/include/asm-ppc64/unistd.h	2003-09-12 13:06:52.000000000 +0200
+++ linuxppc64-2.5/include/asm-ppc64/unistd.h	2003-11-24 20:26:02.000000000 +0100
@@ -287,6 +287,7 @@
 	register unsigned long __sc_5 __asm__ ("r5"); \
 	register unsigned long __sc_6 __asm__ ("r6"); \
 	register unsigned long __sc_7 __asm__ ("r7"); \
+	register unsigned long __sc_8 __asm__ ("r8"); \
 	\
 	__sc_loadargs_##nr(name, args); \
 	__asm__ __volatile__ \
@@ -295,10 +296,10 @@
 	: "=&r" (__sc_0), \
 	  "=&r" (__sc_3), "=&r" (__sc_4), \
 	  "=&r" (__sc_5), "=&r" (__sc_6), \
-	  "=&r" (__sc_7) \
+	  "=&r" (__sc_7), "=&r" (__sc_8) \
 	: __sc_asm_input_##nr \
 	: "cr0", "ctr", "memory", \
-	  "r8", "r9", "r10","r11", "r12"); \
+	  "r9", "r10","r11", "r12"); \
 	__sc_ret = __sc_3; \
 	__sc_err = __sc_0; \
 } \
@@ -326,6 +327,9 @@
 #define __sc_loadargs_5(name, arg1, arg2, arg3, arg4, arg5) \
 	__sc_loadargs_4(name, arg1, arg2, arg3, arg4); \
 	__sc_7 = (unsigned long) (arg5)
+#define __sc_loadargs_6(name, arg1, arg2, arg3, arg4, arg5, arg6) \
+	__sc_loadargs_5(name, arg1, arg2, arg3, arg4, arg5); \
+	__sc_8 = (unsigned long) (arg6)

 #define __sc_asm_input_0 "0" (__sc_0)
 #define __sc_asm_input_1 __sc_asm_input_0, "1" (__sc_3)
@@ -333,6 +337,7 @@
 #define __sc_asm_input_3 __sc_asm_input_2, "3" (__sc_5)
 #define __sc_asm_input_4 __sc_asm_input_3, "4" (__sc_6)
 #define __sc_asm_input_5 __sc_asm_input_4, "5" (__sc_7)
+#define __sc_asm_input_6 __sc_asm_input_5, "6" (__sc_8)

 #define _syscall0(type,name) \
 type name(void) \
@@ -369,6 +374,11 @@ type name(type1 arg1, type2 arg2, type3
 { \
 	__syscall_nr(5, type, name, arg1, arg2, arg3, arg4, arg5); \
 }
+#define _syscall6(type,name,type1,arg1,type2,arg2,type3,arg3,type4,arg4,type5,arg5,type6,arg6) \
+type name(type1 arg1, type2 arg2, type3 arg3, type4 arg4, type5 arg5, type6 arg6) \
+{ \
+	__syscall_nr(6, type, name, arg1, arg2, arg3, arg4, arg5, arg6); \
+}

 #ifdef __KERNEL_SYSCALLS__

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From Franz.Sirl-ppc at lauterbach.com Wed Nov 26 00:26:18 2003
From: Franz.Sirl-ppc at lauterbach.com (Franz Sirl)
Date: Tue, 25 Nov 2003 14:26:18 +0100
Subject: [PATCH] _syscall6 for 2.6
In-Reply-To: <20031125130500.GA29319@suse.de>
References: <20031125130500.GA29319@suse.de>
Message-ID: <6.0.1.1.2.20031125142121.0335c4a8@mail.lauterbach.com>

At 14:05 25.11.2003, Olaf Hering wrote:

>This patch implements _syscall6 for ppc64, it is required for klibc.

Hi Olaf,

why do we need this in a 2.6 kernel? Can't we call everything directly now in-kernel? And using this _syscallN stuff in userspace is deprecated AFAIK and if there was some consensus across architectures, we could remove them completely.

Franz.

>--- linuxppc64-2.5/include/asm-ppc64/unistd.h 2003-09-12
>13:06:52.000000000 +0200
>+++ linuxppc64-2.5/include/asm-ppc64/unistd.h 2003-11-24
>20:26:02.000000000 +0100
>@@ -287,6 +287,7 @@
> 	register unsigned long __sc_5 __asm__ ("r5"); \
> 	register unsigned long __sc_6 __asm__ ("r6"); \
> 	register unsigned long __sc_7 __asm__ ("r7"); \
>+	register unsigned long __sc_8 __asm__ ("r8"); \
> 	\
> 	__sc_loadargs_##nr(name, args); \
> 	__asm__ __volatile__ \
>@@ -295,10 +296,10 @@
> 	: "=&r" (__sc_0), \
> 	  "=&r" (__sc_3), "=&r" (__sc_4), \
> 	  "=&r" (__sc_5), "=&r" (__sc_6), \
>-	  "=&r" (__sc_7) \
>+	  "=&r" (__sc_7), "=&r" (__sc_8) \
> 	: __sc_asm_input_##nr \
> 	: "cr0", "ctr", "memory", \
>-	  "r8", "r9", "r10","r11", "r12"); \
>+	  "r9", "r10","r11", "r12"); \
> 	__sc_ret = __sc_3; \
> 	__sc_err = __sc_0; \
> } \
>@@ -326,6 +327,9 @@
> #define __sc_loadargs_5(name, arg1, arg2, arg3, arg4, arg5) \
> 	__sc_loadargs_4(name, arg1, arg2, arg3, arg4); \
> 	__sc_7 = (unsigned long) (arg5)
>+#define __sc_loadargs_6(name, arg1, arg2, arg3, arg4, arg5, arg6) \
>+	__sc_loadargs_5(name, arg1, arg2, arg3, arg4, arg5); \
>+	__sc_8 = (unsigned long) (arg6)
>
> #define __sc_asm_input_0 "0" (__sc_0)
> #define __sc_asm_input_1 __sc_asm_input_0, "1" (__sc_3)
>@@ -333,6 +337,7 @@
> #define __sc_asm_input_3 __sc_asm_input_2, "3" (__sc_5)
> #define __sc_asm_input_4 __sc_asm_input_3, "4" (__sc_6)
> #define __sc_asm_input_5 __sc_asm_input_4, "5" (__sc_7)
>+#define __sc_asm_input_6 __sc_asm_input_5, "6" (__sc_8)
>
> #define _syscall0(type,name) \
> type
> name(void) \
>@@ -369,6 +374,11 @@ type name(type1 arg1, type2 arg2, type3
> { \
> 	__syscall_nr(5, type, name, arg1, arg2, arg3, arg4, arg5); \
> }
>+#define
>_syscall6(type,name,type1,arg1,type2,arg2,type3,arg3,type4,arg4,type5,arg5,type6,arg6)
>\
>+type name(type1 arg1, type2 arg2, type3 arg3, type4 arg4, type5 arg5,
>type6 arg6) \
>+{ \
>+	__syscall_nr(6, type, name, arg1, arg2, arg3, arg4, arg5,
>arg6); \
>+}
>
> #ifdef __KERNEL_SYSCALLS__
>
>--
>USB is for mice, FireWire is for men!
>
>sUse lINUX ag, nÜRNBERG
>

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From olh at suse.de Wed Nov 26 00:29:27 2003
From: olh at suse.de (Olaf Hering)
Date: Tue, 25 Nov 2003 14:29:27 +0100
Subject: [PATCH] _syscall6 for 2.6
In-Reply-To: <6.0.1.1.2.20031125142121.0335c4a8@mail.lauterbach.com>
References: <20031125130500.GA29319@suse.de> <6.0.1.1.2.20031125142121.0335c4a8@mail.lauterbach.com>
Message-ID: <20031125132927.GA5072@suse.de>

On Tue, Nov 25, Franz Sirl wrote:

> At 14:05 25.11.2003, Olaf Hering wrote:
>
> >This patch implements _syscall6 for ppc64, it is required for klibc.
>
> Hi Olaf,
>
> why do we need this in a 2.6 kernel? Can't we call everything directly now
> in-kernel? And using this _syscallN stuff in userspace is deprecated AFAIK
> and if there was some consensus across architectures, we could remove them
> completely.

klibc includes the kernel headers, and uses the syscall6 macro. How should it be done? Maybe we can update klibc to use something else. Is there example code?

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From moilanen at austin.ibm.com Wed Nov 26 01:28:10 2003
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Tue, 25 Nov 2003 08:28:10 -0600
Subject: [PATCH][2.6] JS20 support
In-Reply-To: <16322.44250.747555.159787@cargo.ozlabs.ibm.com>
References: <1069715532.1258.68.camel@tin.ibm.com > <16322.44250.747555.159787@cargo.ozlabs.ibm.com>
Message-ID: <1069770489.1264.132.camel@tin.ibm.com >

> Why is this needed on a JS20 blade? My understanding is that this
> patch is needed in a partitioned environment where we may get function
> N of a PCI-PCI bridge assigned to a partition but not function 0. The
> JS20 blade isn't partitioned, so why do we need this patch?

You are correct, the JS20 blade is not partitioned. But on the 8111 the second device is the LPC bus, IDE controller, and some other controllers we don't use. On this second device the LPC bus is function 0 and function 1 is the IDE controller. Firmware wanted to keep the LPC bus hidden since it is not available for use and if AIX sees an ISA bus, it will assume it has a certain level of ISA functionality. Because of this, firmware does not have a function 0 in the device-tree. So without this patch we will not see the IDE controller.

Thanks,
Jake

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From olof at austin.ibm.com Wed Nov 26 03:32:52 2003
From: olof at austin.ibm.com (Olof Johansson)
Date: Tue, 25 Nov 2003 10:32:52 -0600
Subject: [PATCH] [2.4] Fix failing LTP testcase recvmsg01
In-Reply-To: <3FC34C6D.1090904@redhat.com>
References: <3FC295EA.6010009@austin.ibm.com> <3FC34C6D.1090904@redhat.com>
Message-ID: <3FC38434.90406@austin.ibm.com>

Julie DeWandel wrote:

> Same fix is also needed for sys32_sendmsg().

Right, thanks for catching that. I'll push the equivalent fix for sendmsg at the same time.
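[Illustration only, not part of the posted patch.] The check discussed in this thread can be sketched in user space roughly as follows. This is a minimal sketch under stated assumptions: `sketch_check_iovlen` and `SKETCH_UIO_MAXIOV` are hypothetical names invented here, and the limit of 1024 mirrors the value Linux uses for UIO_MAXIOV. The only point it makes is the one from the patch: an oversized iovec count should be reported as EMSGSIZE rather than EINVAL, for recvmsg and sendmsg alike.

```c
#include <errno.h>
#include <stddef.h>

/* Assumed for illustration; Linux defines UIO_MAXIOV as 1024. */
#define SKETCH_UIO_MAXIOV 1024

/*
 * Hypothetical helper, not a kernel function: validates the number of
 * iovec entries the way the patched 32-bit recvmsg/sendmsg wrappers do.
 * Returns 0 if the count is acceptable, -EMSGSIZE if it is too large.
 */
static int sketch_check_iovlen(size_t iovlen)
{
	if (iovlen > SKETCH_UIO_MAXIOV)
		return -EMSGSIZE;	/* the old code returned -EINVAL here */
	return 0;
}
```

A caller would run this check before copying any iovec entries from user space, so that the 32-bit path fails with the same errno as the native 64-bit path.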
Thanks, Olof -- Olof Johansson Office: 4F005/905 pSeries Linux Development IBM Systems Group Email: olof at austin.ibm.com Phone: 512-838-9858 All opinions are my own and not those of IBM ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Wed Nov 26 03:54:18 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Tue, 25 Nov 2003 10:54:18 -0600 Subject: [PATCH] _syscall6 for 2.6 In-Reply-To: <6.0.1.1.2.20031125142121.0335c4a8@mail.lauterbach.com>; from Franz.Sirl-ppc@lauterbach.com on Tue, Nov 25, 2003 at 02:26:18PM +0100 References: <20031125130500.GA29319@suse.de> <6.0.1.1.2.20031125142121.0335c4a8@mail.lauterbach.com> Message-ID: <20031125105418.A36052@forte.austin.ibm.com> On Tue, Nov 25, 2003 at 02:26:18PM +0100, Franz Sirl wrote: > > why do we need this in a 2.6 kernel? Can't we call everything directly now > in-kernel? And using this _syscallN stuff in userspace is deprecated AFAIK > and if there was some consensus across architectures, we could remove them > completely. Out of curiosity, what is this replaced by? Is the syscall ABI sufficiently spec'ed out so that glibc can safely "guess" the right way to make a syscall? (Since I thought glibc used _syscallN, or does it have its own macros?) --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From nathanl at austin.ibm.com Wed Nov 26 04:36:05 2003 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Tue, 25 Nov 2003 11:36:05 -0600 Subject: [PATCH] of_add_property and property list locking (2.5) In-Reply-To: <1069739322.669.0.camel@gaston> References: <3FC2E35E.30400@austin.ibm.com> <1069739322.669.0.camel@gaston> Message-ID: <3FC39305.1080306@austin.ibm.com> Benjamin Herrenschmidt wrote: > On Tue, 2003-11-25 at 16:06, Nathan Lynch wrote: >> >>We need the ability to add custom properties to OF device nodes in >>certain situations (e.g. cpu hotplug). 
And if we're going to be adding >>properties, we need some sort of locking for struct device_node's >>properties list. I've also added a public proc_device_tree_add_property >>interface to fs/proc/proc_devtree.c to avoid duplication of code. >> >>The natural thing to follow this would be deprecating get_property in >>favor of a function (of_get_property?) that takes the new lock when >>searching the property list. > > > Hrm... how do you protect the property content against deletion ? The only place that deletes properties afaik is the of_remove_node code. Properties are not deleted until their node is removed from the tree and the global list -- meaning that no other users should have references to the node or its properties at that time. If you are thinking of reference counts for properties, this is doable. But don't you think it could wait until we are deleting properties outside the context of deleting a device node? Nathan ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Wed Nov 26 05:23:12 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Tue, 25 Nov 2003 12:23:12 -0600 Subject: [PATCH][2.6] JS20 support In-Reply-To: <1069715532.1258.68.camel@tin.ibm.com >; from moilanen@austin.ibm.com on Mon, Nov 24, 2003 at 05:12:13PM -0600 References: <1069715532.1258.68.camel@tin.ibm.com > Message-ID: <20031125122312.A30364@forte.austin.ibm.com> Hi Jake, On Mon, Nov 24, 2003 at 05:12:13PM -0600, Jake Moilanen wrote: > diff -Nru a/drivers/ide/pci/amd74xx.c b/drivers/ide/pci/amd74xx.c > --- a/drivers/ide/pci/amd74xx.c Mon Nov 24 14:23:40 2003 > +++ b/drivers/ide/pci/amd74xx.c Mon Nov 24 14:23:40 2003 > @@ -374,7 +374,7 @@ > #endif /* DISPLAY_AMD_TIMINGS && CONFIG_PROC_FS */ > > > - return 0; > + return dev->irq; > } > > static void __init init_hwif_amd74xx(ide_hwif_t *hwif) FYI, 6 months ago I tried submitting a similar patch for a different ide driver. 
Alan Cox wrote back to say that doing this was wrong, and started talking something about 'legacy ide', and how this would break certain older pc's. I didn't understand the issue. So just be forewarned. Maybe Doug Maxey understands the issue?

--linas

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From linas at austin.ibm.com Wed Nov 26 05:51:03 2003
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Tue, 25 Nov 2003 12:51:03 -0600
Subject: Dumb kernel stacksize question
Message-ID: <20031125125103.C57226@forte.austin.ibm.com>

Is there a guideline for the largest thing one should alloc on a kernel stack? I want to have a temp buffer for a workspace, but doing a __get_free_page(GFP_KERNEL); free_page(); seems like a waste of time & resource if I can get a suitably large buffer on the stack. But I fear overflowing the kernel stack, which, if I understand correctly, is limited to 2 pages (or something like that?)

--linas

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From haveblue at us.ibm.com Wed Nov 26 06:01:13 2003
From: haveblue at us.ibm.com (Dave Hansen)
Date: 25 Nov 2003 11:01:13 -0800
Subject: Dumb kernel stacksize question
In-Reply-To: <20031125125103.C57226@forte.austin.ibm.com>
References: <20031125125103.C57226@forte.austin.ibm.com>
Message-ID: <1069786873.25213.110.camel@nighthawk>

On Tue, 2003-11-25 at 10:51, linas at austin.ibm.com wrote:
> Is there a guideline for the largest thing one should alloc on
> a kernel stack? I want to have a temp buffer for a workspace,
> but doing a __get_free_page(GFP_KERNEL); free_page(); seems like
> a waste of time & resource If I can get a suitably large buffer
> on the stack. But I fear overflowing the kernel stack, which,
> if I understand correctly, is limited to 2 pages (or something
> like that?)

Look for THREAD_SIZE or THREAD_ORDER, those are the stack size definitions. How much you allocate on the stack largely depends on context. I tend to be a lot more careful when I know I'm far down in a call path than when I'm early in the system-call path. In any case, a page is way too much to allocate on the stack, even with the 4 pages of stack on ppc64. We have a very fast single-page allocator, so your get/free overhead shouldn't really be a problem. The other options are a slab or some statically allocated per-cpu buffer. We can thank Martin Bligh's hot and cold pages for the fast 0-order allocations.

--
Dave Hansen
haveblue at us.ibm.com

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From anton at samba.org Wed Nov 26 06:02:16 2003
From: anton at samba.org (Anton Blanchard)
Date: Wed, 26 Nov 2003 06:02:16 +1100
Subject: Dumb kernel stacksize question
In-Reply-To: <20031125125103.C57226@forte.austin.ibm.com>
References: <20031125125103.C57226@forte.austin.ibm.com>
Message-ID: <20031125190216.GB26811@krispykreme>

Hi,

> Is there a guideline for the largest thing one should alloc on
> a kernel stack? I want to have a temp buffer for a workspace,
> but doing a __get_free_page(GFP_KERNEL); free_page(); seems like
> a waste of time & resource If I can get a suitably large buffer
> on the stack. But I fear overflowing the kernel stack, which,
> if I understand correctly, is limited to 2 pages (or something
> like that?)

I tend to attack any function with over 1kB of stack usage. On ppc64 the kernel stack is actually 4 pages and we really do need that much. Below is a quick patch that warns whenever the kernel stack usage goes over 8kB, it's not too hard to trip.
Anton ===== arch/ppc64/Kconfig 1.31 vs edited ===== foo-anton/arch/ppc64/Kconfig | 4 ++++ foo-anton/arch/ppc64/kernel/irq.c | 15 +++++++++++++++ 2 files changed, 19 insertions(+) diff -puN arch/ppc64/Kconfig~debug_stackoverflow arch/ppc64/Kconfig --- foo/arch/ppc64/Kconfig~debug_stackoverflow 2003-11-17 12:16:49.623357752 -0600 +++ foo-anton/arch/ppc64/Kconfig 2003-11-17 12:16:49.631357801 -0600 @@ -331,6 +331,10 @@ config DEBUG_KERNEL Say Y here if you are developing drivers or trying to debug and identify kernel problems. +config DEBUG_STACKOVERFLOW + bool "Check for stack overflows" + depends on DEBUG_KERNEL + config DEBUG_SLAB bool "Debug memory allocations" depends on DEBUG_KERNEL diff -puN arch/ppc64/kernel/irq.c~debug_stackoverflow arch/ppc64/kernel/irq.c --- foo/arch/ppc64/kernel/irq.c~debug_stackoverflow 2003-11-17 12:16:49.627357777 -0600 +++ foo-anton/arch/ppc64/kernel/irq.c 2003-11-17 12:16:49.632357807 -0600 @@ -571,6 +571,21 @@ int do_IRQ(struct pt_regs *regs) irq_enter(); +#ifdef CONFIG_DEBUG_STACKOVERFLOW + /* Debugging check for stack overflow: is there less than 8KB free? */ + { + long sp; + + sp = (unsigned long)_get_SP() & (THREAD_SIZE-1); + + if (unlikely(sp < (sizeof(struct thread_info) + 8192))) { + printk("do_IRQ: stack overflow: %ld\n", + sp - sizeof(struct thread_info)); + dump_stack(); + } + } +#endif + #ifdef CONFIG_PPC_ISERIES lpaca = get_paca(); #ifdef CONFIG_SMP _ ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From haveblue at us.ibm.com Wed Nov 26 06:17:24 2003 From: haveblue at us.ibm.com (Dave Hansen) Date: 25 Nov 2003 11:17:24 -0800 Subject: Dumb kernel stacksize question In-Reply-To: <20031125190216.GB26811@krispykreme> References: <20031125125103.C57226@forte.austin.ibm.com> <20031125190216.GB26811@krispykreme> Message-ID: <1069787844.25254.133.camel@nighthawk> On Tue, 2003-11-25 at 11:02, Anton Blanchard wrote: > Below is a quick patch that warns whenever the kernel stack usage > goes over 8kB, its not too hard to trip. It only catches you whenever you get an _interrupt_ and are deep into the stack. It's all a question of whether the interrupt comes in when you have your pants down. This will catch a lot of problems, but it isn't very deterministic. There was an x86 patch from Ben La Haise that used gcc's mcount feature to check stack depth at every function call, but it was never ported to ppc and will probably impair performance. How is ppc mcount support these days? -- Dave Hansen haveblue at us.ibm.com ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Wed Nov 26 06:25:33 2003 From: anton at samba.org (Anton Blanchard) Date: Wed, 26 Nov 2003 06:25:33 +1100 Subject: Dumb kernel stacksize question In-Reply-To: <1069787844.25254.133.camel@nighthawk> References: <20031125125103.C57226@forte.austin.ibm.com> <20031125190216.GB26811@krispykreme> <1069787844.25254.133.camel@nighthawk> Message-ID: <20031125192532.GD26811@krispykreme> > It only catches you whenever you get an _interrupt_ and are deep into > the stack. It's all a question of whether the interrupt comes in when > you have your pants down. This will catch a lot of problems, but it > isn't very deterministic. Yep but its simple :) > There was an x86 patch from Ben La Haise that used gcc's mcount feature > to check stack depth at every function call, but it was never ported to > ppc and will probably impair performance. 
> How is ppc mcount support
> these days?

It should be fine, some of the performance guys were using it a while
ago I think.

Anton

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From benh at kernel.crashing.org Wed Nov 26 09:09:57 2003
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 26 Nov 2003 09:09:57 +1100
Subject: [PATCH] of_add_property and property list locking (2.5)
In-Reply-To: <3FC39305.1080306@austin.ibm.com>
References: <3FC2E35E.30400@austin.ibm.com> <1069739322.669.0.camel@gaston> <3FC39305.1080306@austin.ibm.com>
Message-ID: <1069798196.6691.33.camel@gaston>

> If you are thinking of reference counts for properties, this is doable.
> But don't you think it could wait until we are deleting properties
> outside the context of deleting a device node?

No, that's ok, just wanted to be sure. One thing is that your current
API doesn't leave room for adding property deletion later, if we ever
want it. For that, you would probably need to define an
of_put_property.

Ben.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From linas at austin.ibm.com Wed Nov 26 09:22:01 2003
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Tue, 25 Nov 2003 16:22:01 -0600
Subject: [PATCH] [2.4] show TCE statistics in /proc/ppc64
Message-ID: <20031125162201.B20296@forte.austin.ibm.com>

The attached patch should apply cleanly against the current
ameslab-2.4 tree. It provides TCE usage statistics in /proc
that might be useful to anyone interested in DMA performance
or debugging DMA usage in device drivers.

I don't have BK access so I can't bk push anything at this time:
Paul, if this patch looks good, can you apply & etc. as needed?

The TCE usage stats are in the /proc/ppc64/tce directory.
Get stats for all PCI devices by 'cat /proc/ppc64/tce/stats'.
One can also get more detailed per-device stats by performing an
'echo "show bb:dd" > /proc/ppc64/tce/stats'
where bb is the hex bus number, dd is the hex device number
(cat /proc/ppc64/pci to figure these out). One can then
'cat /proc/ppc64/tce/detail-bb:dd' to see the detailed stats.

Here's a sample of the 'detail' output:

device 21:01
Fri Nov 21 14:04:07 PST 2003
total_use_cnt=92042 alloc_cnt=1267 max_alloc_cnt=1267
        Level   use_cnt split   merge   alloc   maxaloc actual  stale   entries
        0       29552   0       0       59      114     59      1       32768
        1       1988    60      7       662     675     3       0       16384
        2       3882    35      5       11      25      11      0       8192
        3       7147    31      5       30      44      30      0       4096
        4       12405   44      12      45      69      45      0       2048
        5       28815   71      26      285     285     285     0       1024
        6       8253    336     171     175     175     175     0       512
        7       0       633     463     0       1       0       0       256
        8       0       314     229     0       1       0       0       128
        9       0       153     110     0       1       0       0       64

[dump of TCE bitmaps cut out]

--linas

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From dwm at austin.ibm.com Wed Nov 26 09:27:44 2003
From: dwm at austin.ibm.com (dwm at austin.ibm.com)
Date: Tue, 25 Nov 2003 16:27:44 -0600
Subject: [PATCH][2.6] JS20 support
In-Reply-To: linas's message of Tue, 25 Nov 2003 12:23:12 CST.<20031125122312.A30364@forte.austin.ibm.com>
Message-ID: <200311252227.hAPMRi8o027048@falcon10.austin.ibm.com>

Linas,

Don't recall exactly, but it seems the issue being resolved in the
ppc64 case is that just after this point, the code assumes that the
values of the controller/drive have been set by the BIOS and reported.
If the value is zero, use the values reported, which are not set,
and the calling code exits before actually doing anything. The IDE
controller is not configured :-/

Since we don't have a "BIOS", perhaps this should be a conditional on
CONFIG_PSERIES.
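
A minimal sketch of what such a conditional could look like, written as
a standalone C fragment rather than a patch against amd74xx.c. The
fake_pci_dev struct and init_chipset_tail() are hypothetical stand-ins
for struct pci_dev and the tail of the chipset init routine, and
CONFIG_PSERIES is defined by hand only so the fragment builds outside a
kernel tree; which config symbol is the right one to test is an
assumption taken from the suggestion above, not something confirmed
against the tree.

```c
/* Normally set by the kernel configuration; defined here only for the sketch. */
#define CONFIG_PSERIES 1

/* Hypothetical stand-in for struct pci_dev; only the irq field matters here. */
struct fake_pci_dev {
	unsigned int irq;
};

/*
 * Hypothetical tail of a chipset init routine.
 * Without a BIOS to preset the controller, hand back the device irq;
 * on every other platform keep the original "return 0" behaviour.
 */
static unsigned int init_chipset_tail(const struct fake_pci_dev *dev)
{
#ifdef CONFIG_PSERIES
	return dev->irq;
#else
	(void)dev;	/* unused when the legacy path is compiled in */
	return 0;
#endif
}
```

With the define in place the function returns whatever irq the device
carries; with the define removed it returns 0 as the unpatched driver
does, so legacy PC behaviour would be left alone.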

from the keyboard of linas:
>
>Hi Jake,
>
>
>On Mon, Nov 24, 2003 at 05:12:13PM -0600, Jake Moilanen wrote:
>> diff -Nru a/drivers/ide/pci/amd74xx.c b/drivers/ide/pci/amd74xx.c
>> --- a/drivers/ide/pci/amd74xx.c Mon Nov 24 14:23:40 2003
>> +++ b/drivers/ide/pci/amd74xx.c Mon Nov 24 14:23:40 2003
>> @@ -374,7 +374,7 @@
>>  #endif /* DISPLAY_AMD_TIMINGS && CONFIG_PROC_FS */
>>
>>
>> -	return 0;
>> +	return dev->irq;
>>  }
>>
>>  static void __init init_hwif_amd74xx(ide_hwif_t *hwif)
>
>
>FYI, 6 months ago I tried submitting a similar patch for a different
>ide driver. Alan Cox wrote back to say that doing this was wrong,
>and started talking something about 'legacy ide', and how this would
>break certain older pc's. I didn't understand the issue. So just
>be forewarned. Maybe Doug Maxey understands the issue?
>
>--linas

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From linas at austin.ibm.com Wed Nov 26 10:09:21 2003
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Tue, 25 Nov 2003 17:09:21 -0600
Subject: [PATCH] [2.4] show TCE statistics in /proc/ppc64
In-Reply-To: <20031125162201.B20296@forte.austin.ibm.com>; from linas@austin.ibm.com on Tue, Nov 25, 2003 at 04:22:01PM -0600
References: <20031125162201.B20296@forte.austin.ibm.com>
Message-ID: <20031125170921.A20298@forte.austin.ibm.com>

Actually attaching the actual patch would be appropriate ...

On Tue, Nov 25, 2003 at 04:22:01PM -0600, linas at austin.ibm.com wrote:
>
> The attached patch should apply cleanly against the current
> ameslab-2.4 tree. It provides TCE usage statistics in /proc
> that might be useful to anyone interested in DMA performance
> or debugging DMA usage in device drivers.
>
> I don't have BK access so I can't bk push anything at this time:
> Paul, if this patch looks good, can you apply & etc. as needed?
>
> The TCE usage stats are in the /proc/ppc64/tce directory.
> Get stats for all PCI devices by 'cat /proc/ppc64/tce/stats'.
> One can also get more detailed per-device stats by performing an > 'echo "show bb:dd" > /proc/ppc64/tce/stats' > where bb os the hex bus number, dd is the hex device number > (cat /proc/ppc64/pci to figure these out). One can then > 'cat /proc/ppc64/tce/detail-bb:dd' to see the detailed stats. > > Here's a sample of the 'detail' output: > > device 21:01 > Fri Nov 21 14:04:07 PST 2003 > total_use_cnt=92042 alloc_cnt=1267 max_alloc_cnt=1267 > Level use_cnt split merge alloc maxaloc actual stale entries > 0 29552 0 0 59 114 59 1 32768 > 1 1988 60 7 662 675 3 0 16384 > 2 3882 35 5 11 25 11 0 8192 > 3 7147 31 5 30 44 30 0 4096 > 4 12405 44 12 45 69 45 0 2048 > 5 28815 71 26 285 285 285 0 1024 > 6 8253 336 171 175 175 175 0 512 > 7 0 633 463 0 1 0 0 256 > 8 0 314 229 0 1 0 0 128 > 9 0 153 110 0 1 0 0 64 > > [dump of TCE bitmaps cut out] > > --linas > > -------------- next part -------------- Index: arch/ppc64/config.in =================================================================== RCS file: /home/linas/cvsroot/linux24/arch/ppc64/config.in,v retrieving revision 1.1.1.4 diff -u -p -u -p -r1.1.1.4 config.in --- arch/ppc64/config.in 25 Nov 2003 20:04:34 -0000 1.1.1.4 +++ arch/ppc64/config.in 25 Nov 2003 20:43:04 -0000 @@ -50,6 +50,11 @@ bool 'Shared kernel/user space addressin tristate 'LPAR Configuration Data' CONFIG_LPARCFG +if [ "$CONFIG_PPC_ISERIES" != "y" ]; then + if [ "$CONFIG_PROC_FS" = "y" ]; then + bool 'Show realtime tce usage stats in /proc/ppc64/tce' CONFIG_TCE_STATS + fi +fi endmenu mainmenu_option next_comment Index: arch/ppc64/kdb/kdbasupport.c =================================================================== RCS file: /home/linas/cvsroot/linux24/arch/ppc64/kdb/Attic/kdbasupport.c,v retrieving revision 1.1.1.1 diff -u -p -u -p -r1.1.1.1 kdbasupport.c --- arch/ppc64/kdb/kdbasupport.c 25 Nov 2003 20:04:34 -0000 1.1.1.1 +++ arch/ppc64/kdb/kdbasupport.c 25 Nov 2003 20:32:44 -0000 @@ -1617,11 +1617,16 @@ kdba_dump_tce_table(int argc, const char long 
tce_table_address; int nr; int i,j,k; - int full,empty; + int full,partial,empty; int fulldump=0; u64 mapentry; - int totalpages; + int freepages; int levelpages; +#ifdef CONFIG_TCE_STATS + struct tce_blk_stats *blk_stats; + int alloced_blocks, stale_blocks; + unsigned long alloc_jiffies; +#endif /* CONFIG_TCE_STATS */ if (argc == 0) { kdb_printf("need address\n"); @@ -1634,17 +1639,21 @@ kdba_dump_tce_table(int argc, const char if (strcmp(argv[2], "full") == 0) fulldump=1; - /* with address, read contents of memory and dump tce table. */ - /* possibly making some assumptions on the depth and size of table..*/ + /* use address to read contents of memory and dump tce table. */ - nr = kdba_readarea_size(tce_table_address+0 ,&kt.busNumber,8); - nr = kdba_readarea_size(tce_table_address+8 ,&kt.size,8); - nr = kdba_readarea_size(tce_table_address+16,&kt.startOffset,8); - nr = kdba_readarea_size(tce_table_address+24,&kt.base,8); - nr = kdba_readarea_size(tce_table_address+32,&kt.index,8); - nr = kdba_readarea_size(tce_table_address+40,&kt.tceType,8); +#define GET_TCE_VAL(X) \ + nr = kdba_readarea_size( \ + ((long) &(((struct TceTable *)tce_table_address)->X)), \ + &(kt.X), sizeof(kt.X)); + + GET_TCE_VAL (busNumber); + GET_TCE_VAL (size); + GET_TCE_VAL (startOffset); + GET_TCE_VAL (base); + GET_TCE_VAL (index); + GET_TCE_VAL (tceType); #ifdef CONFIG_SMP - nr = kdba_readarea_size(tce_table_address+48,&kt.lock,8); + GET_TCE_VAL (lock); #endif kdb_printf("\n"); @@ -1658,43 +1667,94 @@ kdba_dump_tce_table(int argc, const char #ifdef CONFIG_SMP kdb_printf("lock: 0x%x \n",(uint)kt.lock.lock); #endif - nr = kdba_readarea_size(tce_table_address+56,&kt.mlbm.maxLevel,8); - kdb_printf(" maxLevel: 0x%x \n",(uint)kt.mlbm.maxLevel); - totalpages=0; + GET_TCE_VAL (mlbm.maxLevel); + kdb_printf(" maxLevel: 0x%x \n",(uint)kt.mlbm.maxLevel); +#ifdef CONFIG_TCE_STATS + GET_TCE_VAL (use_cnt); + GET_TCE_VAL (alloc_cnt); + kdb_printf(" use_cnt: %d \n",(uint)kt.use_cnt); + kdb_printf(" 
alloc_cnt: %d \n",(uint)kt.alloc_cnt); +#endif + freepages=0; for (i=0;i 3*HZ) { + stale_blocks ++; + } + } + } + kdb_printf(" blk_stats: %p\n", blk_stats); + } else { + alloced_blocks = -1; + stale_blocks = -1; + } + GET_TCE_VAL (mlbm.level[i].use_cnt); + GET_TCE_VAL (mlbm.level[i].split_cnt); + GET_TCE_VAL (mlbm.level[i].merge_cnt); + GET_TCE_VAL (mlbm.level[i].alloc_cnt); + kdb_printf(" use_cnt: %d split: %d merge: %d alloced: %d cnt-alloc: %d stale: %d\n", + kt.mlbm.level[i].use_cnt, + kt.mlbm.level[i].split_cnt, + kt.mlbm.level[i].merge_cnt, + kt.mlbm.level[i].alloc_cnt, + alloced_blocks, + stale_blocks); +#endif /* if these dont match, this might not be a valid tce table, so dont try to iterate the map entries. */ if (kt.mlbm.level[i].numBits == 8*kt.mlbm.level[i].numBytes) { - full=0;empty=0;levelpages=0; + int n=0; + full=0;partial=0;empty=0;levelpages=0; for (j=0;j>= 56; + if (mapentry == 0xff) full++; + else if (mapentry) + partial++; else empty++; if (mapentry && fulldump) { - kdb_printf("0x%lx\n",mapentry); + if (n && (n%32 == 0)) kdb_printf ("\n"); + kdb_printf("%02lx ",(int) mapentry); + n++; } - for (k=0;(k<=64) && ((0x1UL<mlbm.level[i].use_cnt = 0; + tbl->mlbm.level[i].split_cnt = 0; + tbl->mlbm.level[i].merge_cnt = 0; + tbl->mlbm.level[i].alloc_cnt = 0; + tbl->mlbm.level[i].max_alloc_cnt = 0; + tbl->mlbm.level[i].blk_stats = 0x0; + } + + tbl->use_cnt = 0; + tbl->alloc_cnt = 0; + tbl->max_alloc_cnt = 0; +} + +#endif /* CONFIG_TCE_STATS */ /* * Build a TceTable structure. This contains a multi-level bit map which * is used to manage allocation of the tce space. 
@@ -276,7 +300,6 @@ struct TceTable *build_tce_table(struct } /* For the highest level, turn on all the bits */ - i = tbl->mlbm.maxLevel; p = tbl->mlbm.level[i].map; m = numBits[i]; @@ -301,6 +324,10 @@ struct TceTable *build_tce_table(struct } } +#ifdef CONFIG_TCE_STATS + init_tce_stats (tbl); +#endif /* CONFIG_TCE_STATS */ + return tbl; } @@ -364,6 +391,37 @@ static long alloc_tce_range_nolock( stru */ PPCDBG(PPCDBG_TCE, "alloc_tce_range_nolock: allocating block %ld, (byte=%ld, bit=%ld) order %d\n", block, i, bit, order ); tcenum = block << order; +#ifdef CONFIG_TCE_STATS + if (tbl->mlbm.level[order].blk_stats) { + tbl->mlbm.level[order].blk_stats[block].use_cnt ++; + tbl->mlbm.level[order].blk_stats[block].alloc_jiffies = jiffies; + } + tbl->mlbm.level[order].use_cnt ++; + tbl->use_cnt ++; + tbl->mlbm.level[order].alloc_cnt ++; + tbl->alloc_cnt ++; + if (tbl->mlbm.level[order].max_alloc_cnt < + tbl->mlbm.level[order].alloc_cnt) { + tbl->mlbm.level[order].max_alloc_cnt = + tbl->mlbm.level[order].alloc_cnt; + } + if (tbl->max_alloc_cnt < tbl->alloc_cnt) { + tbl->max_alloc_cnt = tbl->alloc_cnt; + } + +#define THRESHOLD 1000 + static int watermark = THRESHOLD; + if (tbl->alloc_cnt > watermark) { + printk ("alloc_tce_range: more than %d ranges alloced (%d)\n", + watermark, tbl->alloc_cnt); + watermark += THRESHOLD; + } + if (((int)tbl->alloc_cnt) < ((int)(watermark - 2*THRESHOLD))) { + watermark -= THRESHOLD; + printk ("alloc_tce_range: alloc usage dropped below %d (%d)\n", + watermark-THRESHOLD, tbl->alloc_cnt); + } +#endif /* CONFIG_TCE_STATS */ return tcenum; } ++map; @@ -388,6 +446,25 @@ static long alloc_tce_range_nolock( stru if((tcenum == -1) && (order < (NUM_TCE_LEVELS - 1))) { tcenum = alloc_tce_range_nolock( tbl, order+1 ); if ( tcenum != -1 ) { +#ifdef CONFIG_TCE_STATS + /* fix up stats for 'what we actually used' */ + if (tbl->mlbm.level[order].blk_stats) { + tbl->mlbm.level[order].blk_stats[(tcenum>>order)].alloc_jiffies = jiffies; + 
tbl->mlbm.level[order].blk_stats[(tcenum>>order)].use_cnt ++; + tbl->mlbm.level[order].blk_stats[(tcenum>>order)+1].alloc_jiffies = jiffies; + } + if (tbl->mlbm.level[order+1].blk_stats) { + tbl->mlbm.level[order+1].blk_stats[(tcenum>>(order+1))].alloc_jiffies = (unsigned long) -1; + tbl->mlbm.level[order+1].blk_stats[(tcenum>>(order+1))].use_cnt --; + } + tbl->mlbm.level[order].use_cnt ++; + tbl->mlbm.level[order+1].use_cnt --; + tbl->mlbm.level[order+1].split_cnt ++; + + tbl->mlbm.level[order+1].alloc_cnt --; /* uncount higher order */ + tbl->mlbm.level[order].alloc_cnt +=2; /* count twice, since next free will uncount */ + tbl->alloc_cnt ++; /* count 'twice' since free will uncount */ +#endif /* CONFIG_TCE_STATS */ free_tce_range_nolock( tbl, tcenum+(1<> bit; bytep = map + byte; +#ifdef CONFIG_TCE_STATS + tbl->alloc_cnt --; + tbl->mlbm.level[order].alloc_cnt --; + if (tbl->mlbm.level[order].blk_stats) { + if (0 == tbl->mlbm.level[order].blk_stats[block].alloc_jiffies) { + printk("PCI_DMA: Freeing tce that wasn't alloced: busno 0x%lx tcenum %lx, order %x\n", tbl->busNumber, tcenum,order); + } + tbl->mlbm.level[order].blk_stats[block].alloc_jiffies = 0; + } +#endif /* CONFIG_TCE_STATS */ + #ifdef DEBUG_TCE PPCDBG(PPCDBG_TCE,"free_tce_range_nolock: freeing block %ld (byte=%d, bit=%d) of order %d\n", block, byte, bit, order); @@ -487,6 +575,11 @@ void free_tce_range_nolock(struct TceTab PPCDBG(PPCDBG_TCE, "free_tce_range: buddying blocks %ld & %ld\n", block, block+1); +#ifdef CONFIG_TCE_STATS + tbl->mlbm.level[order+1].merge_cnt ++; + tbl->alloc_cnt ++; /* undo excess counting */ + tbl->mlbm.level[order+1].alloc_cnt ++; /* undo excess counts */ +#endif /* CONFIG_TCE_STATS */ free_tce_range_nolock( tbl, tcenum, order+1 ); } } @@ -757,8 +850,8 @@ void create_tce_tables(void) { void create_pci_bus_tce_table( unsigned long token ) { struct TceTable * newTceTable; - PPCDBG(PPCDBG_TCE, "Entering create_pci_bus_tce_table.\n"); - PPCDBG(PPCDBG_TCE, "\ttoken = 0x%lx\n", 
token); + PPCDBG(PPCDBG_TCEINIT, "Entering create_pci_bus_tce_table.\n"); + PPCDBG(PPCDBG_TCEINIT, "\ttoken = 0x%lx\n", token); newTceTable = (struct TceTable *)kmalloc( sizeof(struct TceTable), GFP_KERNEL ); @@ -1084,7 +1177,7 @@ dma_addr_t pci_map_single(struct pci_dev unsigned order, nPages; PPCDBG(PPCDBG_TCE, "pci_map_single:\n"); - PPCDBG(PPCDBG_TCE, "\thwdev = 0x%16.16lx, size = 0x%16.16lx, direction = 0x%16.16lx, vaddr = 0x%16.16lx\n", hwdev, size, direction, vaddr); + PPCDBG(PPCDBG_TCE, "\thwdev = 0x%16.16lx, size = 0x%lx, direction = %ld, vaddr = 0x%16.16lx\n", hwdev, size, direction, vaddr); if (direction == PCI_DMA_NONE) BUG(); @@ -1297,7 +1390,7 @@ static dma_addr_t create_tces_sg(struct /* Client asked for way to much space. This is checked later anyway */ /* It is easier to debug here for the drivers than in the tce tables.*/ if(order >= NUM_TCE_LEVELS) { - printk("PCI_DMA: create_tces_sg size too large: 0x%llx \n",(numTces << PAGE_SHIFT)); + printk("PCI_DMA: create_tces_sg size too large: 0x%x \n",(numTces << PAGE_SHIFT)); panic("numTces is off"); return NO_TCE; } @@ -1403,7 +1496,7 @@ void pci_unmap_sg( struct pci_dev *hwdev dma_addr_t dma_end_page, dma_start_page; PPCDBG(PPCDBG_TCE, "pci_unmap_sg:\n"); - PPCDBG(PPCDBG_TCE, "\thwdev = 0x%16.16lx, sg = 0x%16.16lx, direction = 0x%16.16lx, nelms = 0x%16.16lx\n", hwdev, sg, direction, nelms); + PPCDBG(PPCDBG_TCE, "\thwdev = 0x%16.16lx, sg = 0x%16.16lx, direction = %ld, nelms = %ld\n", hwdev, sg, direction, nelms); if ( direction == PCI_DMA_NONE || nelms == 0 ) BUG(); @@ -1425,7 +1518,7 @@ void pci_unmap_sg( struct pci_dev *hwdev /* Client asked for way to much space. 
This is checked later anyway */ /* It is easier to debug here for the drivers than in the tce tables.*/ if(order >= NUM_TCE_LEVELS) { - printk("PCI_DMA: dma_start_page:0x%lx dma_end_page:0x%lx\n",dma_start_page,dma_end_page); + printk("PCI_DMA: dma_start_page:0x%x dma_end_page:0x%x\n",dma_start_page,dma_end_page); printk("PCI_DMA: pci_unmap_sg size too large: 0x%x \n",(numTces << PAGE_SHIFT)); return; } Index: arch/ppc64/kernel/proc_pmc.c =================================================================== RCS file: /home/linas/cvsroot/linux24/arch/ppc64/kernel/proc_pmc.c,v retrieving revision 1.1.1.3 diff -u -p -u -p -r1.1.1.3 proc_pmc.c --- arch/ppc64/kernel/proc_pmc.c 25 Nov 2003 20:04:36 -0000 1.1.1.3 +++ arch/ppc64/kernel/proc_pmc.c 25 Nov 2003 20:32:44 -0000 @@ -47,6 +47,11 @@ /* pci Flight Recorder AHT */ extern void proc_pciFr_init(struct proc_dir_entry *proc_ppc64_root); +#ifdef CONFIG_TCE_STATS +/* PCI TCE stats interface */ +extern void proc_tce_init(struct proc_dir_entry *proc_ppc64_root); +#endif /* CONFIG_TCE_STATS */ + static int proc_pmc_control_mode = 0; struct proc_dir_entry *proc_ppc64_root = NULL; @@ -188,6 +193,11 @@ void proc_ppc64_init(void) /* Create the /proc/ppc64/pcifr for the Pci Flight Recorder. 
*/ proc_pciFr_init(proc_ppc64_root); + +#ifdef CONFIG_TCE_STATS + /* Create the /proc/ppc64/tce entry for TCE stats/debugging */ + proc_tce_init (proc_ppc64_root); +#endif /* CONFIG_TCE_STATS */ proc_ppc64_pmc_root = proc_mkdir("pmc", proc_ppc64_root); Index: include/asm-ppc64/pci_dma.h =================================================================== RCS file: /home/linas/cvsroot/linux24/include/asm-ppc64/pci_dma.h,v retrieving revision 1.1.1.1 diff -u -p -u -p -r1.1.1.1 pci_dma.h --- include/asm-ppc64/pci_dma.h 15 Jul 2003 16:54:54 -0000 1.1.1.1 +++ include/asm-ppc64/pci_dma.h 25 Nov 2003 21:27:08 -0000 @@ -53,10 +53,32 @@ union Tce { } tceBits; }; +#ifdef CONFIG_TCE_STATS +struct tce_blk_stats { + unsigned long alloc_jiffies; /* time when last allocated, helps find leaks */ + unsigned int use_cnt; /* how many times this block has been alloced */ + char direction; /* last i/o direction */ +}; +#endif /* CONFIG_TCE_STATS */ + struct Bitmap { unsigned long numBits; unsigned long numBytes; unsigned char * map; +#ifdef CONFIG_TCE_STATS + unsigned int use_cnt; /* num of blocks that were ever alloced */ + + /* The split/merge counts provide stats about the buddy system, + * helping debug fragmentation problems. */ + unsigned int split_cnt; /* num blocks split to make smaller blocks */ + unsigned int merge_cnt; /* num blocks buddied back up by free */ + + unsigned int alloc_cnt; /* num alloc's currently pending */ + unsigned int max_alloc_cnt; /* highest num alloc's ever */ + + /* Individual block stats should help debug alloc leaks. 
*/ + struct tce_blk_stats * blk_stats; +#endif /* CONFIG_TCE_STATS */ }; struct MultiLevelBitmap { @@ -73,6 +95,11 @@ struct TceTable { u64 tceType; spinlock_t lock; struct MultiLevelBitmap mlbm; +#ifdef CONFIG_TCE_STATS + unsigned int use_cnt; /* num alloc's there were ever made */ + unsigned int alloc_cnt; /* num alloc's currently pending */ + unsigned int max_alloc_cnt; /* highest num alloc's ever */ +#endif /* CONFIG_TCE_STATS */ }; struct TceTableManagerCB { --- arch/ppc64/kernel/proc_tce.c.orig 2003-11-21 18:34:35.000000000 -0600 +++ arch/ppc64/kernel/proc_tce.c 2003-11-24 18:11:10.000000000 -0600 @@ -0,0 +1,484 @@ +/* + * proc_tce.c + * Copyright (C) 2003 Linas Vepstas, IBM Corporation + * + * Dynamic DMA mapping statistics support. + * + * Manages the TCE space assigned to this partition. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include +#include +#include +#include +#include + +#include "pci.h" + +#ifdef CONFIG_TCE_STATS + +static struct proc_dir_entry *proc_ppc64_tce_root = NULL; + +/* ================================================================= */ +/* Alloc the detail-stats array. 
*/ + +static inline int +get_tce_stats_bytes(struct TceTable * tbl) +{ + int num_entries, num_bytes; + + num_entries = tbl->mlbm.level[0].numBits; + num_entries *= 2; /* room for other levels as well */ + num_bytes = num_entries * sizeof( struct tce_blk_stats ); + return num_bytes; +} + +static inline int +get_tce_stats_order(struct TceTable * tbl) +{ + return get_order (get_tce_stats_bytes(tbl)); +} + +#define TRUE 1 +#define FALSE 0 + +static int +blk_is_alloced (struct TceTable * tbl, int tcenum, int order) +{ + unsigned byte, bit; + unsigned char mask, *bytep; + + if (order < 0) return TRUE; + if (order > tbl->mlbm.maxLevel) return TRUE; + bit = tcenum >> order; + byte = bit /8; + bit = bit%8; + mask = 0x80>>bit; + bytep = tbl->mlbm.level[order].map + byte; + if (mask & *bytep) return FALSE; + + /* check downwards */ + if (FALSE == blk_is_alloced (tbl, tcenum, order-1)) return FALSE; + if (FALSE == blk_is_alloced (tbl, tcenum+1, order-1)) return FALSE; + + return blk_is_alloced (tbl, tcenum, order+1); +} + +static void +setup_detail_tce_stats(struct TceTable * tbl) +{ + int i; + struct tce_blk_stats *p; + + /* Alloc per-block stats array */ + p = (struct tce_blk_stats *) + __get_free_pages( GFP_ATOMIC, get_tce_stats_order(tbl)); + + /* alloc may fail for large areas; keep driving */ + if (p) memset( p, 0, get_tce_stats_bytes(tbl) ); + + for (i=0; i<=tbl->mlbm.maxLevel; ++i) { + tbl->mlbm.level[i].use_cnt = 0; + tbl->mlbm.level[i].split_cnt = 0; + tbl->mlbm.level[i].merge_cnt = 0; + tbl->mlbm.level[i].max_alloc_cnt = 0; + + if (p) { + tbl->mlbm.level[i].blk_stats = p; + p += tbl->mlbm.level[i].numBits; + } else { + tbl->mlbm.level[i].blk_stats = 0x0; + } + } + + tbl->use_cnt = 0; + tbl->max_alloc_cnt = 0; + +#if 0 + /* make block stats match current bitmap */ + for (i=0; i<=tbl->mlbm.maxLevel; ++i) { + p = tbl->mlbm.level[i].blk_stats; + if (p) { + int j; + for (j=0; jmlbm.level[i].numBits; j++) { + int tcenum = j<mlbm.level[0].blk_stats; + if (!p) return; + int 
i; + for (i=0; i<=tbl->mlbm.maxLevel; ++i) { + tbl->mlbm.level[i].blk_stats = NULL; + } + free_pages ((unsigned long)p, get_tce_stats_order(tbl)); +} + +/* ================================================================= */ +#define SZ ((0<(count-n))?(count-n):0) + +static ssize_t +proc_tce_detail_read (struct file * file, char * user_buf, + size_t count, loff_t *ppos) +{ + int n = 0; + + /* Find the tce table */ + struct inode * inode = file->f_dentry->d_inode; + struct proc_dir_entry * dp; + dp = (struct proc_dir_entry *) inode->u.generic_ip; + struct TceTable *tbl = dp->data; + + char * buf = (char*) __get_free_page(GFP_KERNEL); + if (!buf) return -ENOMEM; + + /* start of virtual pci_for_each_dev(pdev_iter) */ + static int loop_iter; + if (*ppos == 0) { + loop_iter = 0; + + /* print header, summary stats */ + n += snprintf (buf+n, SZ, "total_use_cnt=%d", tbl->use_cnt); + n += snprintf (buf+n, SZ, " alloc_cnt=%d", tbl->alloc_cnt); + n += snprintf (buf+n, SZ, " max_alloc_cnt=%d\n", tbl->max_alloc_cnt); + n += snprintf (buf+n, SZ, + "\tLevel\tuse_cnt\tsplit\tmerge\talloc\tmaxaloc\tactual\tstale\tentries\n"); + + int i; + for (i=0; i<= tbl->mlbm.maxLevel; i++) { + struct Bitmap *lvl = &tbl->mlbm.level[i]; + + struct tce_blk_stats * blk_stats; + blk_stats = lvl->blk_stats; + + int alloced_blocks=0, stale_blocks=0; + if (blk_stats) { + + /* alloc_jiffies will be set if the block is + * allocated and not freed. 
Stale blocks suggest + * a leak or a really slow i/o system */ + int j; + for (j=0; jnumBits; j++) { + unsigned long alloc_jiffies = blk_stats[j].alloc_jiffies; + if (alloc_jiffies && alloc_jiffies != ((unsigned long) -1)) { + alloced_blocks++; + /* 'stale' if alloc happened more than 3 seconds ago */ + if (jiffies - alloc_jiffies > 3*HZ) { + stale_blocks ++; + } + } + } + } else { + n += snprintf (buf+n, SZ, "\t*** No Block Stats Available ***\n"); + } + n += snprintf (buf+n, SZ, + "\t%d\t%d\t%d\t%d\t%d\t%d\t%d\t%d\t%ld\n", i, + lvl->use_cnt, + lvl->split_cnt, + lvl->merge_cnt, + lvl->alloc_cnt, + lvl->max_alloc_cnt, + alloced_blocks, + stale_blocks, + lvl->numBits); + } + n += snprintf (buf+n, SZ, "\n"); + + /* we are done printing header */ + cond_resched(); + if (n > count) n = count; + copy_to_user (user_buf, buf, n); + free_page((unsigned long) buf); + + *ppos += n; + return n; + } + + /* end of iteration over levels */ + if (loop_iter > tbl->mlbm.maxLevel) { + free_page((unsigned long) buf); + return 0; + } + + struct Bitmap *lvl = &tbl->mlbm.level[loop_iter]; + + /* Dump bits for each level */ + n += snprintf (buf+n, SZ, "\nlevel[%d] num_entries=%ld\n", + loop_iter, lvl->numBits); + + struct tce_blk_stats * blk_stats; + blk_stats = lvl->blk_stats; + + if (blk_stats) { + int i; + for (i=0; inumBytes; i++) { + if (i && 0 == i%4) n += snprintf (buf+n, SZ, " "); + if (i && 0 == i%32) n += snprintf (buf+n, SZ, "\n"); + n += snprintf (buf+n, SZ, "%02x", lvl->map[i]); + if (count-n < 10) break; + } + } + if (count-n < 10) { n += snprintf (buf+n, SZ, "..."); } + n += snprintf (buf+n, SZ, "\n"); + + /* iterate loop one more time. 
*/ + loop_iter ++; + + cond_resched(); + if (n > count) n = count; + copy_to_user (user_buf, buf, n); + free_page((unsigned long) buf); + + *ppos += n; + return n; +} + +static ssize_t +proc_tce_detail_write(struct file * file, const char * buf, + size_t count, loff_t *ppos) +{ + return count; +} + +static int +proc_tce_detail_unlink (struct inode *inode, struct dentry *dent) +{ + struct proc_dir_entry * dp; + dp = (struct proc_dir_entry *) inode->u.generic_ip; + struct TceTable *tbl = dp->data; + + printk ("attempt cleanup of tce stats upon file deletion tbl=%p\n", tbl); + + teardown_detail_tce_stats(tbl); + remove_proc_entry(dp->name, dp->parent); + return 0; +} + +/* ================================================================= */ + +struct file_operations tce_detail_stats_operations = { + .read = proc_tce_detail_read, + .write = proc_tce_detail_write +}; + +struct inode_operations tce_detail_inode_ops = { + .unlink = proc_tce_detail_unlink, +}; + +/* ================================================================= */ + +static ssize_t +proc_tce_stats_read (struct file * file, char * user_buf, + size_t count, loff_t *ppos) +{ + int n = 0; + + static struct pci_dev *pdev_iter; + + /* start of virtual pci_for_each_dev(pdev_iter) */ + if (*ppos == 0) { + pdev_iter = pci_dev_g(pci_devices.next); + } + + /* while not done virtual pci_for_each_dev(pdev_iter) */ + if (pdev_iter == pci_dev_g(&pci_devices)) { + return 0; + } + + char * buf = (char*) __get_free_page(GFP_KERNEL); + if (!buf) return -ENOMEM; + + /* Attempt to print just one device per call, so as to not + * overflow the user's buffer. If user gives us too small + * a buffer, we'll send the garbled data but who cares. 
*/ + while (pdev_iter != pci_dev_g(&pci_devices)) { + if (PCI_SLOT(pdev_iter->devfn) == 0) goto try_again; + if (pdev_iter->sysdata == NULL) goto try_again; + + n += format_device_location (pdev_iter, buf+n, SZ); + n += snprintf (buf+n, SZ, "\n"); + struct device_node *dn = (struct device_node *)pdev_iter->sysdata; + if (!dn) goto try_again; + struct TceTable *tbl = dn->tce_table; + if (!tbl) goto try_again; + n += snprintf (buf+n, SZ, "\ttotal_use_cnt=%d", tbl->use_cnt); + n += snprintf (buf+n, SZ, " alloc_cnt=%d", tbl->alloc_cnt); + n += snprintf (buf+n, SZ, " max_alloc_cnt=%d\n", tbl->max_alloc_cnt); + + n += snprintf (buf+n, SZ, + "\tLevel\tuse_cnt\tsplit\tmerge\talloc\tmax_allo\n"); + int i; + for (i=0; i<= tbl->mlbm.maxLevel; i++) { + n += snprintf (buf+n, SZ, + "\t%d\t%d\t%d\t%d\t%d\t%d\n", + i, tbl->mlbm.level[i].use_cnt, + tbl->mlbm.level[i].split_cnt, + tbl->mlbm.level[i].merge_cnt, + tbl->mlbm.level[i].alloc_cnt, + tbl->mlbm.level[i].max_alloc_cnt); + } + break; +try_again: + pdev_iter = pci_dev_g(pdev_iter->global_list.next); + } + n += snprintf (buf+n, SZ, "\n"); + + /* iterate once for next time */ + pdev_iter = pci_dev_g(pdev_iter->global_list.next); + + cond_resched(); + if (n > count) n = count; + copy_to_user (user_buf, buf, n); + free_page((unsigned long) buf); + + *ppos += n; + return n; +} + +static ssize_t +proc_tce_stats_write(struct file * file, const char * buf, + size_t count, loff_t *ppos) +{ + if (!buf || count == 0) return 0; + + /* the 'reset' keyword zero's out the stats for all pci devices */ + if (0 == strncmp (buf, "reset", 5)) { + struct pci_dev *pdev; + + pci_for_each_dev(pdev) { + if (PCI_SLOT(pdev->devfn) == 0) continue; + if (pdev->sysdata == NULL) continue; + + struct device_node *dn = (struct device_node *)pdev->sysdata; + if (!dn) continue; + struct TceTable *tbl = dn->tce_table; + if (!tbl) continue; + + int i; + for (i=0; i<= tbl->mlbm.maxLevel; i++) { + tbl->mlbm.level[i].use_cnt = 0; + tbl->mlbm.level[i].split_cnt = 0; 
+ tbl->mlbm.level[i].merge_cnt = 0; + tbl->mlbm.level[i].max_alloc_cnt = 0; + } + tbl->use_cnt = 0; + tbl->max_alloc_cnt = 0; + teardown_detail_tce_stats (tbl); + } + *ppos += count; + return count; + } + + /* The 'show' keyword attempts to enable collection of detailed stats + * for the indicated bus:deviceid */ + if (0 == strncmp (buf, "show", 4)) { + char * p = strchr (buf, ':'); + if (!p) return count; + unsigned long busno = simple_strtoul (buf+5, &p , 16); + if (!p) return count; + unsigned long devno = simple_strtoul (p+1, NULL , 16); + // printk ("parsed out bus=0x%lx dev=0x%lx\n", busno, devno); + + /* try to find the matching pci_dev */ + struct pci_dev *pdev; + struct device_node *dn; + struct TceTable *tbl; + + pci_for_each_dev(pdev) { + if (devno != PCI_SLOT(pdev->devfn)) continue; + if (busno != pdev->bus->number) continue; + if (pdev->sysdata == NULL) continue; + dn = (struct device_node *)pdev->sysdata; + if (!dn) continue; + tbl = dn->tce_table; + if (!tbl) continue; + break; + } + if (pdev == pci_dev_g(&pci_devices)) { + printk (KERN_INFO "tce_stats: uanble to find device %lx:%lx\n", busno, devno); + return count; + } + setup_detail_tce_stats(tbl); + + /* Create the coresponding entry in the proc table */ + char fname[100]; + snprintf (fname, 100, "detail-%02lx:%02lx",busno, devno); + struct proc_dir_entry *ent; + ent = create_proc_entry (fname, S_IWUSR|S_IRUGO, proc_ppc64_tce_root); + if (!ent) { + teardown_detail_tce_stats(tbl); + return count; + } + + ent->proc_fops = &tce_detail_stats_operations; + ent->proc_iops = &tce_detail_inode_ops; + // ent->read_proc = proc_tce_page_read; + ent->data = tbl; + + return count; + } + + *ppos += count; + return count; +} + +/* ================================================================= */ + +struct file_operations tce_stats_operations = { + .read = proc_tce_stats_read, + .write = proc_tce_stats_write +}; + +/* ================================================================= */ +/* Create entry 
/proc/ppc64/tce */ + +void proc_tce_init(struct proc_dir_entry *proc_ppc64_root) +{ + struct proc_dir_entry *ent = NULL; + + if (!proc_ppc64_root) return; + + printk(KERN_INFO "proc_tce: creating /proc/ppc64/tce\n"); + ent = proc_mkdir("tce", proc_ppc64_root); + if (!ent) { + printk (KERN_ERR "Failed to create /proc/ppc64/tce\n"); + return; + } + proc_ppc64_tce_root = ent; + + /* create the 'listener' */ + ent = create_proc_entry ("stats", S_IWUSR|S_IRUGO, proc_ppc64_tce_root); + if (!ent) return; + + ent->proc_fops = &tce_stats_operations; + +} + +#endif /* CONFIG_TCE_STATS */ +/* ============================= END OF FILE ================================ */ From linas at austin.ibm.com Wed Nov 26 10:17:22 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Tue, 25 Nov 2003 17:17:22 -0600 Subject: [PATCH][2.6] JS20 support In-Reply-To: <200311252227.hAPMRi8o027048@falcon10.austin.ibm.com>; from dwm@austin.ibm.com on Tue, Nov 25, 2003 at 04:27:44PM -0600 References: <20031125122312.A30364@forte.austin.ibm.com> <200311252227.hAPMRi8o027048@falcon10.austin.ibm.com> Message-ID: <20031125171722.B20298@forte.austin.ibm.com> On Tue, Nov 25, 2003 at 04:27:44PM -0600, dwm at austin.ibm.com wrote: > Linas, > > > Don't recall exactly, but it seems the issue being resolved in the > ppc64 case is that just after this point, the code assumes that the > values of the controller/drive have been set by the BIOS and reported. > If the value is zero, use the values reported, which are not set, > and the calling code exits before actually doing anything. The IDE > controller is not configured:-/ > > Since we don't have a "BIOS", perhaps this should be a conditional on > CONFIG_PSERIES. Right. It's just that I never figured out what an "acceptable" patch would look like; I thought you might know. Of course, if Jake sneaks this one in, under the radar, and no one notices, then it's a moot issue, right?
--linas (fwiw, Alan seemed to imply that there were two classes of ide controllers, 'new' and 'legacy', and that returning dev->irq was wrong only for the 'legacy' ide controllers. Since the amd74xx.c is presumably 'new', then maybe this patch is 'correct'.) > > from the keyboard of linas: > > > >Hi Jake, > > > > > >On Mon, Nov 24, 2003 at 05:12:13PM -0600, Jake Moilanen wrote: > >> diff -Nru a/drivers/ide/pci/amd74xx.c b/drivers/ide/pci/amd74xx.c > >> --- a/drivers/ide/pci/amd74xx.c Mon Nov 24 14:23:40 2003 > >> +++ b/drivers/ide/pci/amd74xx.c Mon Nov 24 14:23:40 2003 > >> @@ -374,7 +374,7 @@ > >> #endif /* DISPLAY_AMD_TIMINGS && CONFIG_PROC_FS */ > >> > >> > >> - return 0; > >> + return dev->irq; > >> } > >> > >> static void __init init_hwif_amd74xx(ide_hwif_t *hwif) > > > > > >FYI, 6 months ago I tried submitting a similar patch for a different > >ide driver. Alan Cox wrote back to say that doing this was wrong, > >and started talking something about 'legacy ide', and how this would > >break certain older pc's. I didn't understand the issue. So just > >be forewarned. Maybe Doug Maxey understands the issue? > > > >--linas > > ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From benh at kernel.crashing.org Wed Nov 26 11:37:05 2003 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 26 Nov 2003 11:37:05 +1100 Subject: [PATCH][2.6] JS20 support In-Reply-To: <20031125122312.A30364@forte.austin.ibm.com> References: <1069715532.1258.68.camel@tin.ibm.com > <20031125122312.A30364@forte.austin.ibm.com> Message-ID: <1069807025.671.71.camel@gaston> > FYI, 6 months ago I tried submitting a similar patch for a different > ide driver. Alan Cox wrote back to say that doing this was wrong, > and started talking something about 'legacy ide', and how this would > break certain older pc's. I didn't understand the issue. So just > be forewarned. Maybe Doug Maxey understands the issue? It's a mess.
Basically, this "fix" is needed for anything "sane" but will break a few broken x86 (and I suppose x86-wannabe like PReP) setups, and since Linux is all about running on broken x86 hardware, we can't get it fixed properly =P The root of the problem is the amount of tricks done by x86 hardware to mimic the good old disk controllers at fixed port & fixed interrupt addresses found in old HW and to try to make anything recent still somewhat compatible with this old junk. Mix that with the fact that both channel interrupts may be wired to different physical lines and not to the PCI interrupt line, just so they end up being on the old "legacy" disk interrupt numbers and you get an idea of how much of a mess it is. So far, I had rather good results by forcing the controllers into "fully native" mode from the arch PCI quirks though. Something along the lines of what I have in the pmac code: void pmac_pci_fixup_pciata(struct pci_dev* dev) { u8 progif = 0; /* * On PowerMacs, we try to switch any PCI ATA controller to * fully native mode */ if (_machine != _MACH_Pmac) return; /* Some controllers don't have the class IDE */ if (dev->vendor == PCI_VENDOR_ID_PROMISE) switch(dev->device) { case PCI_DEVICE_ID_PROMISE_20246: case PCI_DEVICE_ID_PROMISE_20262: case PCI_DEVICE_ID_PROMISE_20263: case PCI_DEVICE_ID_PROMISE_20265: case PCI_DEVICE_ID_PROMISE_20267: case PCI_DEVICE_ID_PROMISE_20268: case PCI_DEVICE_ID_PROMISE_20269: case PCI_DEVICE_ID_PROMISE_20270: case PCI_DEVICE_ID_PROMISE_20271: case PCI_DEVICE_ID_PROMISE_20275: case PCI_DEVICE_ID_PROMISE_20276: case PCI_DEVICE_ID_PROMISE_20277: goto good; } /* Others, check PCI class */ if ((dev->class >> 8) != PCI_CLASS_STORAGE_IDE) return; good: pci_read_config_byte(dev, PCI_CLASS_PROG, &progif); if ((progif & 5) != 5) { printk(KERN_INFO "Forcing PCI IDE into native mode: %s\n", pci_name(dev)); (void) pci_write_config_byte(dev, PCI_CLASS_PROG, progif|5); if (pci_read_config_byte(dev, PCI_CLASS_PROG, &progif) || (progif & 5) !=
5) printk(KERN_ERR "Rewrite of PROGIF failed !\n"); } } ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From benh at kernel.crashing.org Wed Nov 26 11:39:23 2003 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 26 Nov 2003 11:39:23 +1100 Subject: [PATCH][2.6] JS20 support In-Reply-To: <1069807025.671.71.camel@gaston> References: <1069715532.1258.68.camel@tin.ibm.com > <20031125122312.A30364@forte.austin.ibm.com> <1069807025.671.71.camel@gaston> Message-ID: <1069807163.19543.75.camel@gaston> > So far, I had rather good results by forcing the controllers into > "fully native" mode from the arch PCI quirks though. Another option is to have the asm/ide.h hooks for setting the "default" IDE ports set up the on-board channels completely, including interrupts. I think in this case, the generic code may pick them up properly (provided it doesn't try to go back to some stupid hard-coded irq stuff). If you end up implementing something in those hooks, then please route them through ppc_md. or some equivalent structure so that other machine types can use different ones like we do on ppc32. Ben. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From benh at kernel.crashing.org Wed Nov 26 11:41:55 2003 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 26 Nov 2003 11:41:55 +1100 Subject: [PATCH][2.6] JS20 support In-Reply-To: <1069770489.1264.132.camel@tin.ibm.com > References: <1069715532.1258.68.camel@tin.ibm.com > <16322.44250.747555.159787@cargo.ozlabs.ibm.com> <1069770489.1264.132.camel@tin.ibm.com > Message-ID: <1069807315.671.77.camel@gaston> On Wed, 2003-11-26 at 01:28, Jake Moilanen wrote: > > Why is this needed on a JS20 blade? My understanding is that this > > patch is needed in a partitioned environment where we may get function > > N of a PCI-PCI bridge assigned to a partition but not function 0. The > > JS20 blade isn't partitioned, so why do we need this patch?
> > You are correct, the JS20 blade is not partitioned. But on the 8111 the > second device is the LPC bus, IDE controller, and some other > controllers we don't use. On this second device the LPC bus is function > 0 and function 1 is the IDE controller. > > Firmware wanted to keep the LPC bus hidden since it is not available for > use and if AIX sees an ISA bus, it will assume it has a certain level of > ISA functionality. Because of this, firmware does not have a function 0 > in the device-tree. So without this patch we will not see the IDE > controller. So basically, we have this nice new hw doing a perfectly normal thing, and we now add junk to the firmware which itself requires a horrible hack in the kernel just for the sake of not fixing AIX's wrong assumptions? Nice.... Ben. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From sjmunroe at us.ibm.com Wed Nov 26 14:11:22 2003 From: sjmunroe at us.ibm.com (Steve Munroe) Date: Tue, 25 Nov 2003 21:11:22 -0600 Subject: [PATCH] _syscall6 for 2.6 Message-ID: GLIBC does have its own syscall macros and does not depend on the unistd.h versions. I believe that GLIBC's syscalls are correct for the management of volatile registers. Unfortunately the kernel's syscall macros are used by the LTP in many of their kernel-level tests. We can't change this without annoying a lot of people. Steven J. Munroe Power Linux Toolchain Architect IBM Corporation, Linux Technology Center [ linas at austin.ibm.com writes: ] > > On Tue, Nov 25, 2003 at 02:26:18PM +0100, Franz Sirl wrote: > > > > why do we need this in a 2.6 kernel? Can't we call everything directly > > now in-kernel? And using this _syscallN stuff in userspace is > > deprecated AFAIK and if there was some consensus across architectures, > > we could remove them completely. > > Out of curiosity, what is this replaced by? Is the syscall ABI > sufficiently spec'ed out so that glibc can safely "guess" the right way > to make a syscall?
(Since I thought glibc used _syscallN, or does it > have its own macros?) ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From hozer at hozed.org Wed Nov 26 17:25:35 2003 From: hozer at hozed.org (Troy Benjegerdes) Date: Wed, 26 Nov 2003 00:25:35 -0600 Subject: [PATCH][2.6] JS20 support In-Reply-To: <1069715532.1258.68.camel@tin.ibm.com> References: <1069715532.1258.68.camel@tin.ibm.com > Message-ID: <20031126062535.GC3504@kalmia.hozed.org> On Mon, Nov 24, 2003 at 05:12:13PM -0600, Jake Moilanen wrote: > Here are some patches and the config to boot a JS20 Blade. After the > interrupt abstraction was done, there was not much code needed. > > Note: There is a current FW bug (should be fixed in a couple weeks) that > you need to make sure you do not call event-scan. From the OF prompt > run these commands. > > dev /rtas > " event-scan" delete-property Has anyone done any testing yet to see how much running the hypervisor hurts myrinet bandwidth and latency? I don't have any js20's, but I have a couple of p630's to test with. And if nobody has done so already, I'd like to make a request for being able to run a js20 without the hypervisor. -- -------------------------------------------------------------------------- Troy Benjegerdes 'da hozer' hozer at drgw.net ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From benh at kernel.crashing.org Wed Nov 26 17:32:55 2003 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 26 Nov 2003 17:32:55 +1100 Subject: [BUG] ameslab-2.4 missing altivec bits Message-ID: <1069828374.669.111.camel@gaston> Looks like we missed the sys_ppc32.c bits in 2.4: Whoever takes care of that tree currently, please apply. Ben. 
===== arch/ppc64/kernel/sys_ppc32.c 1.14 vs edited ===== --- 1.14/arch/ppc64/kernel/sys_ppc32.c Wed Nov 26 03:50:06 2003 +++ edited/arch/ppc64/kernel/sys_ppc32.c Wed Nov 26 17:30:21 2003 @@ -4000,6 +4000,10 @@ goto out; if (regs->msr & MSR_FP) giveup_fpu(current); +#ifdef CONFIG_ALTIVEC + if (regs->msr & MSR_VEC) + giveup_altivec(current); +#endif /* CONFIG_ALTIVEC */ error = do_execve32(filename, (u32*) a1, (u32*) a2, regs); @@ -4023,8 +4027,16 @@ #ifndef CONFIG_SMP if (last_task_used_math == current) last_task_used_math = 0; + if (last_task_used_altivec == current) + last_task_used_altivec = 0; #endif + memset(current->thread.fpr, 0, sizeof(current->thread.fpr)); current->thread.fpscr = 0; +#ifdef CONFIG_ALTIVEC + memset(&current->thread.vr[0], 0,offsetof(struct thread_struct,vrsave[2])- + offsetof(struct thread_struct,vr[0])); + current->thread.vscr.u[3] = 0x00010000; /* Java mode disabled */ +#endif /* CONFIG_ALTIVEC */ } extern asmlinkage int sys_prctl(int option, unsigned long arg2, unsigned long arg3, ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From benh at kernel.crashing.org Wed Nov 26 18:59:42 2003 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 26 Nov 2003 18:59:42 +1100 Subject: Altivec registers in ppc64 u/sigcontext Message-ID: <1069833581.671.131.camel@gaston> I have a problem implementing the proper altivec stuffs on the signal frame and context frames (based on current ameslab-2.5). I've already added the context switch code so I can actually run altivec applications on the g5/ppc64, and I'm now adding the signal part. I'll then implement a sys_swapcontext syscall like ppc32 so that glibc can call this instead of re-implementing it all, which should be more efficient.
The way the frame is defined currently is: struct sigcontext { unsigned long _unused[4]; int signal; int _pad0; unsigned long handler; unsigned long oldmask; struct pt_regs *regs; elf_gregset_t gp_regs; elf_fpregset_t fp_regs; /* * To maintain compatibility with current implementations the sigcontext is * extended by appending a pointer (v_regs) to a quadword type (elf_vrreg_t) * followed by an unstructured (vmx_reserve) field of 69 doublewords. This * allows the array of vector registers to be quadword aligned independent of * the alignment of the containing sigcontext or ucontext. It is the * responsibility of the code setting the sigcontext to set this pointer to * either NULL (if this processor does not support the VMX feature) or the * address of the first quadword within the allocated (vmx_reserve) area. * * The pointer (v_regs) of vector type (elf_vrreg_t) is type compatible with * an array of 34 quadword entries (elf_vrregset_t). The entries with * indexes 0-31 contain the corresponding vector registers. The entry with * index 32 contains the vscr as the last word (offset 12) within the * quadword. This allows the vscr to be stored as either a quadword (since * it must be copied via a vector register to/from storage) or as a word. * The entry with index 33 contains the vrsave as the first word (offset 0) * within the quadword. */ elf_vrreg_t *v_regs; long vmx_reserve[ELF_NVRREG+ELF_NVRREG+1]; }; The problem is: how does userland here (or swapcontext when getting a sigcontext from userland, or even sigreturn) know about the validity of the vector registers in there? This isn't a _simple_ problem unfortunately, as there is a distinction between having valid saved vector regs and a valid VRSAVE.
The latter is _always_ saved and restored when the kernel has CONFIG_ALTIVEC so that it can really be used as an indication of the current altivec usage of a task (provided we even decide to _enforce_ that in the ABI and I'm all for doing it) while the actual vector regs may or may not be, depending on an internal kernel flag indicating if the task ever used altivec. What ppc32 does is that vrsave is always saved and we set MSR_VEC in the context pt_regs' MSR copy whenever there's an altivec context (regardless of the actual state of MSR_VEC in the task at the moment the signal is issued). This is fine... except for a small bug: we do that only with CONFIG_ALTIVEC, which means that a kernel lacking that option will not copy vrsave (which doesn't exist in the task struct) at all, but will also not write a 0 there, where it obviously should. So the problem only really happens if glibc ever _uses_ that value. If it's only ever the kernel manipulating this context structure, then a given kernel will always have CONFIG_ALTIVEC either set or not set between 2 calls using the sigcontext. That would work provided glibc uses only the kernel calls and doesn't do set/get_context by itself based on a signal context. With ppc64, I was tempted to use the v_regs pointer as an indication that there is an altivec context, but that fails because of the need to also have VRSAVE even when there's no altivec context. So I'm setting up v_regs all the time when CONFIG_ALTIVEC is set, and will set MSR_VEC like ppc32 when there's a valid context, while VRSAVE will always be backed up/restored. I'm also clearing v_regs when CONFIG_ALTIVEC is not set, so that at least, this will be a reliable way in the future to know not to try to tap VRSAVE from glibc if it ever wants to do it. Earlier glibc's will have a problem though...
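To make the scheme above concrete, here is a minimal sketch of how a consumer of the sigcontext might decide what is valid. It is only an illustration under stated assumptions: the struct, enum and helper names are made up and are not the kernel's or glibc's, MSR_VEC is assumed to be bit 25 of the MSR, and the entry layout (vrsave as first word of entry 33, 16-byte alignment inside vmx_reserve) follows the comment in the struct definition quoted earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: MSR[VEC] is bit 25; the real value comes from the kernel headers. */
#define SKETCH_MSR_VEC (1UL << 25)

/* Hypothetical stand-in for elf_vrreg_t: one 16-byte quadword. */
typedef struct { uint32_t u[4]; } sketch_vrreg_t;

/* Hypothetical, trimmed-down view of the fields that matter here. */
struct sketch_sigcontext {
    unsigned long   msr;     /* copy of regs->msr saved in the frame */
    sketch_vrreg_t *v_regs;  /* NULL when the kernel lacks CONFIG_ALTIVEC */
};

enum sketch_vmx_state {
    SKETCH_VMX_NONE,         /* v_regs == NULL: do not touch VRSAVE at all */
    SKETCH_VMX_VRSAVE_ONLY,  /* VRSAVE is valid, vector regs are not */
    SKETCH_VMX_FULL          /* VRSAVE and vr0-vr31/vscr are all valid */
};

/* Decide which parts of the vector state in the frame can be trusted. */
static enum sketch_vmx_state sketch_classify(const struct sketch_sigcontext *sc)
{
    if (sc->v_regs == NULL)
        return SKETCH_VMX_NONE;
    if (sc->msr & SKETCH_MSR_VEC)
        return SKETCH_VMX_FULL;
    return SKETCH_VMX_VRSAVE_ONLY;
}

/* VRSAVE lives in the first word (offset 0) of entry 33. */
static uint32_t sketch_read_vrsave(const struct sketch_sigcontext *sc)
{
    assert(sc->v_regs != NULL);
    return sc->v_regs[33].u[0];
}

/* Round the start of the reserve area up to the next 16-byte boundary. */
static sketch_vrreg_t *sketch_align_v_regs(void *vmx_reserve)
{
    uintptr_t p = ((uintptr_t)vmx_reserve + 15) & ~(uintptr_t)15;
    return (sketch_vrreg_t *)p;
}
```

With 8-byte doublewords the reserve area is 69 * 8 = 552 bytes, which is exactly 34 quadwords (544 bytes) plus the 8 bytes of slack that rounding up can consume, so the aligned array always fits.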
However, if I add the sys_swapcontext syscall at the same time, then we are mostly fine, provided we consider that glibc will always rely on the kernel for "altivec enabled" 64-bit environments and use its own implementation only for earlier kernels that don't support altivec at all. Any thoughts? Ben. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From sjmunroe at us.ibm.com Thu Nov 27 03:59:46 2003 From: sjmunroe at us.ibm.com (Steve Munroe) Date: Wed, 26 Nov 2003 10:59:46 -0600 Subject: Altivec registers in ppc64 u/sigcontext Message-ID: Benjamin Herrenschmidt writes: > I have a problem implementing the proper altivec stuffs on the signal > frame and context frames (based on current ameslab-2.5). On reflection the use of VRSAVE is a tertiary test for VMX context. The primary and secondary controls are whether AT_HWCAP has VMX and the setting of sigcontext.v_regs. For GLIBC the AT_HWCAP flag will be picked up in libc_start_main and stored in a static flag. This flag will be used by setjmp/longjmp and *context. If FALSE then VMX regs are not touched. This is required to allow integrated GLIBC VMX support to run on non-VMX hardware. The sigcontext.v_regs pointer should be set (!=NULL) only if the VMX is supported and can be set NULL if VMX is supported but not in use by this process. The state of VRSAVE can only be tested/set if AT_HWCAP has VMX and sigcontext.v_regs is != NULL. So VRSAVE can only be used as a final optimization to avoid saving a subset of the VRs. Steven J. Munroe Power Linux Toolchain Architect IBM Corporation, Linux Technology Center ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olh at suse.de Fri Nov 28 03:25:11 2003 From: olh at suse.de (Olaf Hering) Date: Thu, 27 Nov 2003 17:25:11 +0100 Subject: 2.6.0-test10, oops in nfs_kill_super Message-ID: <20031127162511.GA15901@suse.de> I got this oops on ppc64 with 2.6.0-test10. System is a p660, 6 cpus, 6 gig.
autofs is in use, which triggered the umount. Please cc me, I'm not subscribed to the nfs list. cpu 2: Vector: 300 (Data Access) at [c00000004b20b610] pc: c0000000000cd71c (.invalidate_list+0x60/0x160) lr: c0000000000cd8ec (.invalidate_inodes+0xd0/0x1b0) sp: c00000004b20b890 msr: a000000000009032 dar: 16400000450 dsisr: 40000000 current = 0xc00000011aa46080 paca = 0xc0000000005aa000 pid = 6843, comm = umount 2:mon> t c00000004b20b890 0000000000000000 c00000004b20b950 c0000000000cd8ec .invalidate_inodes+0xd0/0x1b0 c00000004b20ba00 c0000000000b21d8 .generic_shutdown_super+0x12c/0x354 c00000004b20baa0 c0000000000b36e0 .kill_anon_super+0x20/0xe0 c00000004b20bb30 c0000000001498b4 .nfs_kill_super+0x18/0x48 c00000004b20bbc0 c0000000000b1d9c .deactivate_super+0xbc/0x190 c00000004b20bc60 c0000000000d219c .__mntput+0x38/0x60 c00000004b20bcf0 c0000000000bcf9c .path_release+0x6c/0x80 c00000004b20bd80 c0000000000d2df8 .sys_umount+0x58/0xc8 c00000004b20be30 c0000000000118d4 ret_from_syscall_1 exception: c00 (System Call) regs c00000004b20bea0 000000000ff88574 2:mon> r R00 = 000000040000000c R16 = 0000000000000000 R01 = c00000004b20b890 R17 = 0000000000000000 R02 = c000000000639000 R18 = 0000000000000000 R03 = c00000010a0b8528 R19 = 0000000000000000 R04 = c000000146006c00 R20 = 0000000000000001 R05 = c00000004b20b9c0 R21 = c0000000005aaec0 R06 = c000000000636008 R22 = 0000000000000000 R07 = 0000000000000000 R23 = 0000000000000000 R08 = c000000146006ce0 R24 = c00000004b20b9c0 R09 = 0000000000000000 R25 = 0000000000000000 R10 = 0000000000000000 R26 = c000000146006c00 R11 = c00000011aa46080 R27 = c000000000536148 R12 = c00000014ec00d80 R28 = 0000016400000450 R13 = c0000000005aa000 R29 = 0000016400000440 R14 = 0000000000000000 R30 = c00000000056a7d8 R15 = 0000000000000000 R31 = 0000016400000450 pc = c0000000000cd71c msr = a000000000009032 lr = c0000000000cd8ec cr = 0000000040000844 ctr = 0000000000000000 xer = 0000000020000000 trap = 300 the function looks like that: c0000000000cd6bc 
<.invalidate_list>: c0000000000cd6bc: 7c 08 02 a6 mflr r0 c0000000000cd6c0: fa c1 ff b0 std r22,-80(r1) c0000000000cd6c4: fa e1 ff b8 std r23,-72(r1) c0000000000cd6c8: 3a c0 00 00 li r22,0 c0000000000cd6cc: fb 01 ff c0 std r24,-64(r1) c0000000000cd6d0: fb 21 ff c8 std r25,-56(r1) c0000000000cd6d4: 3a e0 00 00 li r23,0 c0000000000cd6d8: 7c b8 2b 78 mr r24,r5 c0000000000cd6dc: fb 41 ff d0 std r26,-48(r1) c0000000000cd6e0: fb 61 ff d8 std r27,-40(r1) c0000000000cd6e4: 3b 20 00 00 li r25,0 c0000000000cd6e8: 7c 9a 23 78 mr r26,r4 c0000000000cd6ec: fb c1 ff f0 std r30,-16(r1) c0000000000cd6f0: fb 81 ff e0 std r28,-32(r1) c0000000000cd6f4: 7c 7b 1b 78 mr r27,r3 c0000000000cd6f8: fb a1 ff e8 std r29,-24(r1) c0000000000cd6fc: fb e1 ff f8 std r31,-8(r1) c0000000000cd700: f8 01 00 10 std r0,16(r1) c0000000000cd704: eb c2 bb 18 ld r30,-17640(r2) c0000000000cd708: f8 21 ff 41 stdu r1,-192(r1) c0000000000cd70c: eb 83 00 00 ld r28,0(r3) c0000000000cd710: 7c 3c d8 00 cmpd r28,r27 c0000000000cd714: 3b bc ff f0 addi r29,r28,-16 c0000000000cd718: 7f 9f e3 78 mr r31,r28 c0000000000cd71c: eb 9c 00 00 ld r28,0(r28) It dies in invalidate_list() next = next->next; However, why do r3 and r27 differ even if they should be the same? -- USB is for mice, FireWire is for men! sUse lINUX ag, nÜRNBERG ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olh at suse.de Fri Nov 28 04:02:04 2003 From: olh at suse.de (Olaf Hering) Date: Thu, 27 Nov 2003 18:02:04 +0100 Subject: 2.6.0-test10, oops in nfs_kill_super In-Reply-To: <20031127162511.GA15901@suse.de> References: <20031127162511.GA15901@suse.de> Message-ID: <20031127170204.GA14765@suse.de> On Thu, Nov 27, Olaf Hering wrote: > It dies in invalidate_list() > next = next->next; > > However, why does r3 and r27 differ even if they should be the same? Scratch that, it's a loop... -- USB is for mice, FireWire is for men! sUse lINUX ag, nÜRNBERG ** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/ From olh at suse.de Sat Nov 29 02:41:51 2003 From: olh at suse.de (Olaf Hering) Date: Fri, 28 Nov 2003 16:41:51 +0100 Subject: bogus changes for generic pci_scan_slot() in ameslab-2.5 Message-ID: <20031128154151.GA30606@suse.de> Good morning, what is the purpose of this change? diff -purN linux-2.5/drivers/pci/probe.c linuxppc64-2.5/drivers/pci/probe.c --- linux-2.5/drivers/pci/probe.c 2003-08-06 15:34:30.000000000 +0000 +++ linuxppc64-2.5/drivers/pci/probe.c 2003-11-05 22:12:33.000000000 +0000 @@ -552,6 +552,7 @@ int __devinit pci_scan_slot(struct pci_b struct pci_dev *dev; dev = pci_scan_device(bus, devfn); +#if 0 if (func == 0) { if (!dev) break; @@ -560,6 +561,10 @@ int __devinit pci_scan_slot(struct pci_b continue; dev->multifunction = 1; } +#else + if (!dev) + continue; +#endif /* Fix up broken headers */ pci_fixup_device(PCI_FIXUP_HEADER, dev); It breaks on ppc32, B&W G3, dies in indirect_read_config() because the pointer *cfg_data becomes bogus, devfn is > 0xff (no idea if that matters). Turning #if 0 into #if 1 cures it. I haven't tried it on other systems yet, but at least a PReP MTX+ works with the patch above. -- USB is for mice, FireWire is for men! sUse lINUX ag, nÜRNBERG ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From mort at bork.org Sat Nov 29 03:53:44 2003 From: mort at bork.org (Martin Hicks) Date: Fri, 28 Nov 2003 11:53:44 -0500 Subject: [PATCH] _syscall6 for 2.6 In-Reply-To: <20031125132927.GA5072@suse.de> References: <20031125130500.GA29319@suse.de> <6.0.1.1.2.20031125142121.0335c4a8@mail.lauterbach.com> <20031125132927.GA5072@suse.de> Message-ID: <1070038424.8280.5.camel@plato.i.bork.org> On Tue, 2003-11-25 at 08:29, Olaf Hering wrote: > On Tue, Nov 25, Franz Sirl wrote: > > > At 14:05 25.11.2003, Olaf Hering wrote: > > > > >This patch implements _syscall6 for ppc64, it is required for > > >klibc. > > > > why do we need this in a 2.6 kernel?
Can't we call everything > > directly now in-kernel? And using this _syscallN stuff in userspace > > is deprecated AFAIK and if there was some consensus across > > architectures, we could remove them completely. > > klibc includes the kernel headers, and uses the syscall6 macro. How > should it be done? Maybe we can update klibc to use something else. Is > there example code? Take a look at ia64. I have defined the _syscall* macros in arch/ia64/include/klibc. The defines in klibc/userland really have nothing to do with the _syscall macros in the kernel. The kernel ones were in place to allow in-kernel syscalls. mh -- Martin Hicks || mort at bork.org || PGP/GnuPG: 0x4C7F2BEE ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/
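For readers wondering what userspace code can do instead of the _syscallN macros, here is a small sketch. It is not taken from klibc or from any patch in this thread; it assumes a Linux system whose C library provides the generic syscall() wrapper (glibc does), and the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_getpid, SYS_write */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: invoke getpid through the generic wrapper. */
static long sketch_raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* Hypothetical helper: a three-argument call goes through the same
 * variadic wrapper; calls with more arguments are passed the same way. */
static long sketch_raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}

/* Sanity check: the raw call and the libc wrapper must agree. */
static void sketch_selfcheck(void)
{
    assert(sketch_raw_getpid() == (long)getpid());
}
```

The wrapper leaves register setup and clobber handling to the C library, which is the part the per-architecture _syscallN macros had to get right by hand.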