From johnrose at austin.ibm.com Fri Oct 1 01:45:15 2004
From: johnrose at austin.ibm.com (John Rose)
Date: Thu, 30 Sep 2004 10:45:15 -0500
Subject: Why do we map PCI IO space so late ?
In-Reply-To: <1096532573.32754.13.camel@gaston>
References: <1096532573.32754.13.camel@gaston>
Message-ID: <1096559115.27021.33.camel@sinatra.austin.ibm.com>

Hi Ben-

Good questions :) First let me clear something up, and forgive me if
I'm telling you stuff you already know. The ioremap()'s that we do at
boot are _exclusively_ done for PHBs. This creates mappings that span
the ranges for their children buses. Why do we do this when drivers can
themselves use ioremap()? Because some drivers still use inb()/outb(),
etc, without remapping their own space.

The short answer to your questions is that I/O DLPAR required these PHB
ioremap()'s to be moved to a later chronological point during boot, so
that imalloc records would be kept.

Here's the long answer. To dynamically remove a bus (EADS or PHB), we
need to iounmap() the range associated with it. The iounmap() function
is prototyped in generic code to take one argument, the virtual address
in question. In order to know the size of the region to unmap, we need
to keep some records of what was ioremap()'ed originally. The imalloc
subsystem exists to keep these records.

The ppc64 ioremap() implementation has the limitation that if one calls
it before mem_init_done, no imalloc records are left behind. If we
remap the PHBs early in boot, we have no way to unmap them (or their
children) at DLPAR remove time. Does this make sense?

As a side note, we didn't similarly defer the remap for ISA, b/c we
assumed that we'd never want to unmap this range. I wrote the function
that remaps for ISA, and it's a hack, you're right :) Suggestions are
welcome. I would ask why your ISA node doesn't have a ranges property,
b/c I thought it was mandatory from some spec.

You asked about ioremap_explicit(). This is used in two ways. First
during boot, to remap the necessary regions for PHBs after
mem_init_done. We've saved off the "physical" range info from the ofdt
early in boot, and now we explicitly remap starting at virtual addr
PHBS_IO_BASE. Second, we use it to remap the range of a newly
DLPAR-added bus. You can imagine that in the case of adding an EADS
slot, we need the mappings to exist at exact virtual addresses relative
to its parent PHB, etc. Hence the creation of ioremap_explicit().

Suggestions on improvements are welcome. Hope this helps, it's before
lunch and I'm being wordy. :)

Thanks-
John

On Thu, 2004-09-30 at 03:22, Benjamin Herrenschmidt wrote:
> Hi John !
>
> I was going through some of the PCI setup code while working on
> some bringup stuff, and had an issue which was related to the way
> we do the ioremap'ing of the PCI IO space.
>
> So the current scenario is:
>
>  - early (setup_arch() time basically), we ioremap_explicit the ISA
> space and that only
>
>  - later (pcibios_fixup time), we scan all busses and ioremap_explicit
> their various IO spaces.
>
> I have two problems with that at the moment.
>
> First is, I'm annoyed that during the actual PCI probing, the IO space
> is not mapped. That means that any quirk that needs IO accesses to the
> device will not work. I wonder also in which conditions we might end up
> instantiating a PCI driver as early as the PCI probing and thus crash.
> Also, this is all after console_initcalls(), so that leaves a gap of
> code that runs with PCI IO space not mapped. So far, it ended up being
> mostly ok because our console uses legacy serial drivers that use the
> ISA space which happen to be mapped early, but that sounds fragile &
> bogus to me. (For the short story, I found that while working on a board
> for which the "isa" node didn't have a "ranges" property, so we failed
> to early map it, thus the serial driver would crash doing IO cycles).
> Why can't we do the ioremap_explicit right after setting up the PHBs ?
>
> The second thing that annoys me is that it seems we are also doing an
> ioremap_explicit for each p2p bridge IO space, aren't we ? I don't fully
> understand the logic here. Aren't those supposed to be fully enclosed by
> their parent PHB IO space, and thus mapped by those ?
>
> Thanks for enlightening me,
> Ben.
>
>
>
>

From segher at kernel.crashing.org Fri Oct 1 02:38:20 2004
From: segher at kernel.crashing.org (Segher Boessenkool)
Date: Thu, 30 Sep 2004 11:38:20 -0500
Subject: reading files in /proc/device-tree
In-Reply-To: <1096546849.3081.2.camel@gaston>
References: <20040929101700.GA2623@in.ibm.com> <1096546849.3081.2.camel@gaston>
Message-ID: <23A68A84-12FF-11D9-8370-000A95A4DC02@kernel.crashing.org>

>> Also, the format of the entries is dependent on the
>> #address-cells and #size-cells properties.
>
> ... of the parent node :) read the OF spec for more details

Of the first (not necessarily immediate) parent that has those
properties, yes. As memory is a child of the root node, it will
be its direct parent, yes.

Segher

From david at gibson.dropbear.id.au Fri Oct 1 14:03:25 2004
From: david at gibson.dropbear.id.au (David Gibson)
Date: Fri, 1 Oct 2004 14:03:25 +1000
Subject: mapping memory in 0xb space
In-Reply-To:
References: <20040929014017.GC5470@zax>
Message-ID: <20041001040325.GB12890@zax>

On Wed, Sep 29, 2004 at 12:14:08AM -0500, Igor Grobman wrote:
> On Wed, 29 Sep 2004, David Gibson wrote:
>
> > On Tue, Sep 28, 2004 at 01:52:16PM -0500, Igor Grobman wrote:
> > > On Tue, 28 Sep 2004, David Gibson wrote:
> > >
> > > > Recent kernels don't even
> > > > have VSIDs allocated for the 0xb... region.
> > >
> > > Looking at both 2.6.8 and 2.4.21, I don't see a difference in
> > > get_kernel_vsid() code.
> >
> > Ok, *very* recent kernels. The new VSID algorithm has gone into the
> > BK tree since 2.6.8.
>
> From the description I read, I might be better off using 0xfff.. addresses
> with that algorithm. Not a big deal.

Perhaps.
However, there are issues there as well: older kernels have
the same 41-bit address restriction (maybe somewhat extendable) in the
0xf region, just like 0xb. The new VSID algo gives VSIDs for every
address above 0xc000000000000000 *except* the very last segment,
0xfffffffff0000000-0xffffffffffffffff.

> > > This leaves segments. Both
> > > DataAccess_common and DataAccessSLB_common call
> > > do_stab_bolted/do_slb_bolted when confronted with an address in 0xb
> > > region.
> >
> > Oh, so it does. That, I think is a 2.4 thing, long gone in 2.6 (even
> > before the SLB rewrite, I'm pretty sure do_slb_bolted was only called
> > for 0xc addresses).
>
> In my 2.4.21 source, do_slb_bolted does get called for 0xb addresses.
> And thanks for letting me know about power4 being SLB. I was clueless on
> the issue.

> > > Presumably, this will fault in the segments I am interested in.
> >
> > Yes, actually, it should. Ok, I guess the problem is deeper than I
> > thought.
>
> Or is it?
>
> > > Also, I narrowed it down to
> > > working (or appearing to work) as long as the highest 5 bits of the page
> > > index (those that end up as partial index in the HPTE) are zero. This may
> > > just be a weird coincidence.
> >
> > Could be.
> >
> > > > Why on earth do you want to do this?
> > >
> > > Good question ;-). A long long time ago, I posted on this list and
> > > explained. Since then, I found what appeared to be a solution, except
> > > that it appears power4 breaks it. I am building a tool that allows
> > > dynamic splicing of code into a running kernel (see
> > > http://www.paradyn.org/html/kerninst.html). In order for this to work, I
> > > need to be able to overwrite a single instruction with a jump to
> > > spliced-in code. The target of the jump needs to be within the range (26
> > > bits). Therefore, I have a choice of 0xbfff.. addresses with backward
> > > jumps from 0xc region, or the 0xff.. addresses for absolute jumps. I
> > > chose 0xbff.., because I found already-working code, originally written
> > > for the performance counter interface. Am I making more sense now?
> >
> > Aha! But this does actually explain the problem - there are only
> > VSIDs assigned for the first 2^41 bits of each region - so although
> > there are vsids for 0xb000000000000000-0xb00001ffffffffff, there
> > aren't any for 0xbff... addresses. Likewise the Linux pagetables only
> > cover a 41-bit address range, but that won't matter if you're creating
> > HPTEs directly.
>
> And this is why I avoided explaining fully in my first email :-). I'd
> like to solve one problem at a time. What I said in my initial email
> is accurate. Even within the valid VSID range, if the highest 5 bits of
> the page index are not zero, I get a crash on access (e.g.
> 0xb00001FFFFF00000, but works on 0xb00001FFF0000000).

Hrm. Ok. I'm not sure why that would be.

> As for why I thought 0xbff would work, I reasoned that
> since the highest bits are masked out in get_kernel_vsid(), and since
> nobody else is using the 0xb region, it doesn't matter if I get a VSID
> that is the same as some other VSID in 0xb region. However, I did not
> consider the bug in do_slb_bolted that you describe below.

Yes, with that bug the collision can be with a segment anywhere, not
just in the 0xb region.

> > You may have seen the comment in do_slb_bolted which claims to permit
> > a full 32-bits of ESID - it's wrong. The code doesn't mask the ESID
> > down to 13 bits as get_kernel_vsid() does, but it probably should - an
> > overlarge ESID will cause collisions with VSIDs from entirely
> > different address places, which would be a Bad Thing.
>
> This must be happening, although I would still like to know why it
> misbehaves even within the valid VSID range.
>
> >
> > Actually, you should be able to allow ESIDs of up to 21 bits there (36
> > bit VSID - 15 bits of "context"). But you will need to make sure
> > get_kernel_vsid(), or whatever you're using to calculate the VAs for
> > the hash HPTEs is updated to match - at the moment I think it will
> > mask down to 13 bits. I'm not sure if that will get you sufficiently
> > close to 0xc0... for your purposes.
>
> No, it's not close enough--I really must have that very last segment.
> It sounds like I was simply getting lucky on the power3 machine.
> Without the mask, I must have been getting random pages, and
> happily overwriting them.
>
> Any ideas on how I might map that very last segment of 0xb, or for
> that matter the very last segment of 0xf ? It need not be pretty,
> but it cannot involve modifying the kernel source, though it can rely on
> whatever dirty tricks a kernel module might get away with. I don't
> want to modify the source, because I would like the tool to work on
> unmodified kernels.

Um... right. You know, I'm really not sure it's possible without
changing the kernel source, short of binary patching the do_slb_bolted
code from a module. Sorry. The segment code's just really not set up
to handle this.

Though, come to that, you do only need one segment, so it might not be
that hard to binary patch in a branch to some code of your own which
provides a VSID for that one segment.

> It's starting to sound like an impossible task (at least on non-recent
> kernels). I think I might go with a backup suboptimal solution, which
> involves extra jumps, but at least it might work.

That may be a better idea.

-- 
David Gibson                    | For every complex problem there is a
david AT gibson.dropbear.id.au  | solution which is simple, neat and
                                | wrong.
http://www.ozlabs.org/people/dgibson

From benh at kernel.crashing.org Fri Oct 1 17:21:04 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 01 Oct 2004 17:21:04 +1000
Subject: Why do we map PCI IO space so late ?
In-Reply-To: <1096559115.27021.33.camel@sinatra.austin.ibm.com>
References: <1096532573.32754.13.camel@gaston> <1096559115.27021.33.camel@sinatra.austin.ibm.com>
Message-ID: <1096615264.11463.93.camel@gaston>

On Fri, 2004-10-01 at 01:45, John Rose wrote:
> Hi Ben-
>
> Good questions :) First let me clear something up, and forgive me if
> I'm telling you stuff you already know. The ioremap()'s that we do at
> boot are _exclusively_ done for PHBs. This creates mappings that span
> the ranges for their children buses. Why do we do this when drivers can
> themselves use ioremap()? Because some drivers still use inb()/outb(),
> etc, without remapping their own space.

Yah, that at least is obvious :)

> The short answer to your questions is that I/O DLPAR required these PHB
> ioremap()'s to be moved to a later chronological point during boot, so
> that imalloc records would be kept.

Okay, that makes more sense to me now.

> Here's the long answer. To dynamically remove a bus (EADS or PHB), we
> need to iounmap() the range associated with it. The iounmap() function
> is prototyped in generic code to take one argument, the virtual address
> in question. In order to know the size of the region to unmap, we need
> to keep some records of what was ioremap()'ed originally. The imalloc
> subsystem exists to keep these records.

Right.

> The ppc64 ioremap() implementation has the limitation that if one calls
> it before mem_init_done, no imalloc records are left behind. If we
> remap the PHBs early in boot, we have no way to unmap them (or their
> children) at DLPAR remove time. Does this make sense?

Yup.

> As a side note, we didn't similarly defer the remap for ISA, b/c we
> assumed that we'd never want to unmap this range. I wrote the function
> that remaps for ISA, and it's a hack, you're right :) Suggestions are
> welcome. I would ask why your ISA node doesn't have a ranges property,
> b/c I thought it was mandatory from some spec.

The OF tree of this board is still a work in progress. It has to be
mapped early anyway for other reasons, like the console serial driver
which will be initialized before we do the real mapping.

> You asked about ioremap_explicit(). This is used in two ways. First
> during boot, to remap the necessary regions for PHBs after
> mem_init_done. We've saved off the "physical" range info from the ofdt
> early in boot, and now we explicitly remap starting at virtual addr
> PHBS_IO_BASE. Second, we use it to remap the range of a newly
> DLPAR-added bus. You can imagine that in the case of adding an EADS
> slot, we need the mappings to exist at exact virtual addresses relative
> to its parent PHB, etc. Hence the creation of ioremap_explicit().
>
> Suggestions on improvements are welcome. Hope this helps, it's before
> lunch and I'm being wordy. :)

Thanks, it's enough for now, I need to think of alternative (read:
simpler) ways to deal with that in the future, but for now, it's fine.

Ben.

From david at gibson.dropbear.id.au Fri Oct 1 18:45:14 2004
From: david at gibson.dropbear.id.au (David Gibson)
Date: Fri, 1 Oct 2004 18:45:14 +1000
Subject: [PPC64] Change bad choice of VSID_MULTIPLIER
Message-ID: <20041001084514.GB19046@zax>

Andrew/Linus, please apply:

We recently changed the VSID allocation on PPC64 to use a new scheme
based on a multiplicative hash. It turns out our choice of multiplier
(the largest 28-bit prime) wasn't so great: with large contiguous
mappings, we can get very poor hash scattering. In particular earlier
machines (without 16M pages) which had a reasonable amount of RAM (>2G
or so) wouldn't boot, because the linear mapping overflowed some hash
buckets.

This patch changes the multiplier to something which seems to work
better (it is, rather arbitrarily, the median of the primes between
2^27 and 2^28). Some more theory should almost certainly go into the
choice of this constant, to avoid more pathological cases.
But for now, this choice fixes a serious bug, and seems to do at least
as well at scattering as the old choice on a handful of simple
testcases.

Signed-off-by: David Gibson

Index: working-2.6/include/asm-ppc64/mmu_context.h
===================================================================
--- working-2.6.orig/include/asm-ppc64/mmu_context.h	2004-09-20 10:12:50.000000000 +1000
+++ working-2.6/include/asm-ppc64/mmu_context.h	2004-10-01 18:28:01.565963320 +1000
@@ -108,11 +108,10 @@
  *
  * This scramble is only well defined for proto-VSIDs below
  * 0xFFFFFFFFF, so both proto-VSID and actual VSID 0xFFFFFFFFF are
- * reserved. VSID_MULTIPLIER is prime (the largest 28-bit prime, in
- * fact), so in particular it is co-prime to VSID_MODULUS, making this
- * a 1:1 scrambling function. Because the modulus is 2^n-1 we can
- * compute it efficiently without a divide or extra multiply (see
- * below).
+ * reserved. VSID_MULTIPLIER is prime, so in particular it is
+ * co-prime to VSID_MODULUS, making this a 1:1 scrambling function.
+ * Because the modulus is 2^n-1 we can compute it efficiently without
+ * a divide or extra multiply (see below).
  *
  * This scheme has several advantages over older methods:
  *
Index: working-2.6/include/asm-ppc64/mmu.h
===================================================================
--- working-2.6.orig/include/asm-ppc64/mmu.h	2004-09-20 10:12:50.000000000 +1000
+++ working-2.6/include/asm-ppc64/mmu.h	2004-10-01 18:28:01.566963168 +1000
@@ -202,7 +202,7 @@
 #define SLB_VSID_KERNEL	(SLB_VSID_KP|SLB_VSID_C)
 #define SLB_VSID_USER	(SLB_VSID_KP|SLB_VSID_KS)
 
-#define VSID_MULTIPLIER	ASM_CONST(268435399)	/* largest 28-bit prime */
+#define VSID_MULTIPLIER	ASM_CONST(200730139)	/* 28-bit prime */
 #define VSID_BITS	36
 #define VSID_MODULUS	((1UL<>SID_SHIFT)
-	.llong	0x40bffffd5		/* KERNELBASE VSID */
+	.llong	0x408f92c94		/* KERNELBASE VSID */
 	/* We have to list the bolted VMALLOC segment here, too, so that it
 	 * will be restored on shared processor switch */
 	.llong	(VMALLOCBASE>>SID_SHIFT)
-	.llong	0xb0cffffd1		/* VMALLOCBASE VSID */
+	.llong	0xf09b89af5		/* VMALLOCBASE VSID */
 	.llong	8192			/* # pages to map (32 MB) */
 	.llong	0			/* Offset from start of loadarea to start of map */
-	.llong	0x40bffffd50000		/* VPN of first page to map */
+	.llong	0x408f92c940000		/* VPN of first page to map */
 
 	. = 0x6100

-- 
David Gibson                    | For every complex problem there is a
david AT gibson.dropbear.id.au  | solution which is simple, neat and
                                | wrong.
http://www.ozlabs.org/people/dgibson

From grave at ipno.in2p3.fr Sat Oct 2 02:04:14 2004
From: grave at ipno.in2p3.fr (grave)
Date: Fri, 01 Oct 2004 16:04:14 +0000
Subject: XServe Node running a debian with only one processor
In-Reply-To: <1096548321l.32616l.0l@ipnnarval> (from grave@ipno.in2p3.fr on Thu Sep 30 14:45:21 2004)
References: <1096546729l.32147l.0l@ipnnarval> <1096548321l.32616l.0l@ipnnarval>
Message-ID: <1096646654l.2901l.2l@ipnnarval>

Got the xserve booting
(thanks to http://ozlabs.org/ppc64-patches/patch.pl?id=59)

But I can only run a single CPU kernel, does somebody know how to get
the second CPU on ?

The kernel is a ppc64 one with smp compiled in but only able to boot
with nosmp option
kernel from kernel.org + patch to setup.c and pmac_features.c)

Thanks in advance for any hint...

xavier

From igor at cs.wisc.edu Sat Oct 2 04:05:12 2004
From: igor at cs.wisc.edu (Igor Grobman)
Date: Fri, 1 Oct 2004 13:05:12 -0500 (CDT)
Subject: mapping memory in 0xb space
In-Reply-To: <20041001040325.GB12890@zax>
References: <20040929014017.GC5470@zax> <20041001040325.GB12890@zax>
Message-ID:

A question for the rest of you, who haven't been following this thread.
Is there publicly available documentation on the power4 extensions,
specifically the large page support, how it affects the HPT hashing, and
the SLB, including the new instructions for maintaining it in software?
I haven't been able to find anything yet.

On Fri, 1 Oct 2004, David Gibson wrote:

> On Wed, Sep 29, 2004 at 12:14:08AM -0500, Igor Grobman wrote:
> > On Wed, 29 Sep 2004, David Gibson wrote:
> >
> > > On Tue, Sep 28, 2004 at 01:52:16PM -0500, Igor Grobman wrote:
> > > > On Tue, 28 Sep 2004, David Gibson wrote:
> > > >
> > > > > Recent kernels don't even
> > > > > have VSIDs allocated for the 0xb... region.
> > > >
> > > > Looking at both 2.6.8 and 2.4.21, I don't see a difference in
> > > > get_kernel_vsid() code.
> > >
> > > Ok, *very* recent kernels. The new VSID algorithm has gone into the
> > > BK tree since 2.6.8.
> >
> > From the description I read, I might be better off using 0xfff.. addresses
> > with that algorithm. Not a big deal.
>
> Perhaps. However, there are issues there as well: older kernels have
> the same 41-bit address restriction (maybe somewhat extendable) in the
> 0xf region, just like 0xb. The new VSID algo gives VSIDs for every
> address above 0xc000000000000000 *except* the very last segment,
> 0xfffffffff0000000-0xffffffffffffffff.

Lucky me! I'll take a look at what the VSID for the last segment
conflicts with, maybe it will be something unused. Or I'll have to
think of something else clever. Right now, I still want my 2.4.21
implementation to work.

> > > > Also, I narrowed it down to
> > > > working (or appearing to work) as long as the highest 5 bits of the page
> > > > index (those that end up as partial index in the HPTE) are zero. This may
> > > > just be a weird coincidence.
> > >
> > > Could be.
> > >
> > > > > Why on earth do you want to do this?
> > > >
> > > > Good question ;-). A long long time ago, I posted on this list and
> > > > explained. Since then, I found what appeared to be a solution, except
> > > > that it appears power4 breaks it. I am building a tool that allows
> > > > dynamic splicing of code into a running kernel (see
> > > > http://www.paradyn.org/html/kerninst.html). In order for this to work, I
> > > > need to be able to overwrite a single instruction with a jump to
> > > > spliced-in code. The target of the jump needs to be within the range (26
> > > > bits). Therefore, I have a choice of 0xbfff.. addresses with backward
> > > > jumps from 0xc region, or the 0xff.. addresses for absolute jumps. I
> > > > chose 0xbff.., because I found already-working code, originally written
> > > > for the performance counter interface. Am I making more sense now?
> > >
> > > Aha! But this does actually explain the problem - there are only
> > > VSIDs assigned for the first 2^41 bits of each region - so although
> > > there are vsids for 0xb000000000000000-0xb00001ffffffffff, there
> > > aren't any for 0xbff... addresses. Likewise the Linux pagetables only
> > > cover a 41-bit address range, but that won't matter if you're creating
> > > HPTEs directly.
> >
> > And this is why I avoided explaining fully in my first email :-). I'd
> > like to solve one problem at a time. What I said in my initial email
> > is accurate. Even within the valid VSID range, if the highest 5 bits of
> > the page index are not zero, I get a crash on access (e.g.
> > 0xb00001FFFFF00000, but works on 0xb00001FFF0000000).
>
> Hrm. Ok. I'm not sure why that would be.

Here is some more background. Maybe it will help you think of what's
going wrong here. I noticed that if I write to the remapped
0xb00001FFF0000000, the changes do not show up at the physical address I
mapped it to. At this point, I noticed that get_free_page() returns a
4K page frame above 256MB, which means that in reality, it's an address
within a large page. SLB entry created by do_slb_bolted likewise has
the large page bit set. I changed my code to create an HPTE mapping for
the large page, and finally I get a sensible result: changes to the
remapped page show up on the physical page. Note that even though I
create a mapping for the whole large page, I only write to the 4K chunk
that corresponds to the address returned by get_free_page() -- I do not
want to clobber random memory.

In summary, mapping the first large page of the 0xb00001FFF segment
works, but mapping any other within that segment causes a kernel crash.
There must be something I don't understand about how large pages fit
into the HPT. Could you point me to documentation on the large page
extensions of power4, and, while we are at it, documentation on the SLB?
So far, I simply guessed on how it works, based on the code I see in the
kernel.

For what it's worth, here is (roughly) the relevant code I am using:

frame = get_free_page(GFP_KERNEL);
pa = (unsigned long)__v2a(frame) & 0xFFFFFFFFFF000000; //want physical address to point to the corresponding large page.
ea = 0xb00001FFFF000000;
vsid = get_kernel_vsid(ea);
va = ( vsid << 28 ) | ( ea & 0xfffffff );
vpn = va >> PAGE_SHIFT;
rpn = pa >> PAGE_SHIFT;
hpteflags = _PAGE_ACCESSED|_PAGE_COHERENT|PP_RWXX;
slot = ppc_md->hpte_insert(vpn, rpn, hpteflags, 1, 1);
smallpage_offset = ( (unsigned long) __v2a(frame) - pa);
return ea + smallpage_offset; //only access the relevant 4K chunk within the large page

>
> > As for why I thought 0xbff would work, I reasoned that
> > since the highest bits are masked out in get_kernel_vsid(), and since
> > nobody else is using the 0xb region, it doesn't matter if I get a VSID
> > that is the same as some other VSID in 0xb region. However, I did not
> > consider the bug in do_slb_bolted that you describe below.
>
> Yes, with that bug the collision can be with a segment anywhere, not
> just in the 0xb region.

OK, I will deal with this, somehow. Binary patch idea might just work.

> Though, come to that, you do only need one segment, so it might not be
> that hard to binary patch in branch to some code of your own which
> provides a VSID for that one segment.
>
> > It's starting to sound like an impossible task (at least on non-recent
> > kernels). I think I might go with a backup suboptimal solution, which
> > involves extra jumps, but at least it might work.
>
> That may be a better idea.

I'd like to avoid this, but if I only have to incur this for the binary
patch to do_slb_bolted, I might be fine.

Thanks,
Igor

From jschopp at austin.ibm.com Sat Oct 2 06:41:55 2004
From: jschopp at austin.ibm.com (Joel Schopp)
Date: Fri, 01 Oct 2004 15:41:55 -0500
Subject: [PATCH][0/2] ppc64 pre/post boot memory macros
Message-ID: <415DC113.1080007@austin.ibm.com>

I'm sending two patches for review and passing upstream. The basic idea
is that these patches put in place some macros such that memory
management can be easily split into pre and post boot. This is based on
the work of Mike Kravetz and Dave Hansen.
It should be harmless, as the new macros are currently defined to the
same thing the old macros were. It is also isolated to ppc64 files, so
the other arch guys don't need to worry.

Ultimately my motivation is to move toward hotplug memory. Acceptance of
these patches will allow us to carry smaller patches out of mainline and
ease our development greatly.

Comments/feedback/flames welcome.

Patches against 2.6.9-rc3 and have been boot tested on Power5 LPAR.

From jschopp at austin.ibm.com Sat Oct 2 06:43:27 2004
From: jschopp at austin.ibm.com (Joel Schopp)
Date: Fri, 01 Oct 2004 15:43:27 -0500
Subject: [PATCH][1/2] ppc64 pre/post boot memory macros
In-Reply-To: <415DC113.1080007@austin.ibm.com>
References: <415DC113.1080007@austin.ibm.com>
Message-ID: <415DC16F.7030402@austin.ibm.com>

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ppc64-daveh.patch
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041001/14bca17c/attachment.txt

From jschopp at austin.ibm.com Sat Oct 2 06:43:56 2004
From: jschopp at austin.ibm.com (Joel Schopp)
Date: Fri, 01 Oct 2004 15:43:56 -0500
Subject: [PATCH][2/2] ppc64 pre/post boot memory macros
In-Reply-To: <415DC113.1080007@austin.ibm.com>
References: <415DC113.1080007@austin.ibm.com>
Message-ID: <415DC18C.6040501@austin.ibm.com>

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ppc64-dave-hmore.patch
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041001/52e74dd2/attachment.txt

From schwab at suse.de Sat Oct 2 07:40:04 2004
From: schwab at suse.de (Andreas Schwab)
Date: Fri, 01 Oct 2004 23:40:04 +0200
Subject: Machine check during PCI scan on PMac G5
Message-ID:

Has anyone been able to get 2.6.9-rc3 running on the new PMacs
(PowerMac7,3)? I'm getting a machine check during PCI scan in
u3_ht_read_config while doing in_8 on 0xe00000008094800e.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

From benh at kernel.crashing.org Sat Oct 2 21:17:01 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 02 Oct 2004 21:17:01 +1000
Subject: Machine check during PCI scan on PMac G5
In-Reply-To:
References:
Message-ID: <1096715821.26913.35.camel@gaston>

On Sat, 2004-10-02 at 07:40, Andreas Schwab wrote:
> Has anyone been able to get 2.6.9-rc3 running on the new PMacs
> (PowerMac7,3)? I'm getting a machine check during PCI scan in
> u3_ht_read_config while doing in_8 on 0xe00000008094800e.

Argh... again ! Looks like the box doesn't like us to probe the
PCI device that is there. Can you print out the precise devfn
bus number & offset where the machine check happens ?

I wonder if it's something that is turned off by the firmware
like one of the K2 internal USB1 controllers that are unused on
this machine. K2 is notoriously allergic to us probing things
that are turned off.

This patch should help by preventing the config space accesses to
occur on those devices that aren't in the device-tree, I'll push it
to Linus as a temporary fix if you confirm it works.

Ben.

===== arch/ppc64/kernel/pmac_pci.c 1.5 vs edited =====
--- 1.5/arch/ppc64/kernel/pmac_pci.c	2004-07-25 14:51:52 +10:00
+++ edited/arch/ppc64/kernel/pmac_pci.c	2004-08-04 10:26:07 +10:00
@@ -271,7 +271,7 @@
 				 int offset, int len, u32 *val)
 {
 	struct pci_controller *hose;
-	struct device_node *busdn;
+	struct device_node *busdn, *dn;
 	unsigned long addr;
 
 	if (bus->self)
@@ -282,6 +282,16 @@
 		return PCIBIOS_DEVICE_NOT_FOUND;
 	hose = busdn->phb;
 	if (hose == NULL)
+		return PCIBIOS_DEVICE_NOT_FOUND;
+
+	/* We only allow config cycles to devices that are in OF device-tree
+	 * as we are apparently having some weird things going on with some
+	 * revs of K2 on recent G5s
+	 */
+	for (dn = busdn->child; dn; dn = dn->sibling)
+		if (dn->devfn == devfn)
+			break;
+	if (dn == NULL)
 		return PCIBIOS_DEVICE_NOT_FOUND;
 
 	addr = u3_ht_cfg_access(hose, bus->number, devfn, offset);

From benh at kernel.crashing.org Sat Oct 2 21:21:57 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 02 Oct 2004 21:21:57 +1000
Subject: XServe Node running a debian with only one processor
In-Reply-To: <1096646654l.2901l.2l@ipnnarval>
References: <1096546729l.32147l.0l@ipnnarval> <1096548321l.32616l.0l@ipnnarval> <1096646654l.2901l.2l@ipnnarval>
Message-ID: <1096716117.3634.40.camel@gaston>

On Sat, 2004-10-02 at 02:04, grave wrote:
> Got the xserve booting
> (thanks to http://ozlabs.org/ppc64-patches/patch.pl?id=59)
>
> But I can only run a single CPU kernel, does somebody know how to get
> the second CPU on ?
>
> The kernel is a ppc64 one with smp compiled in but only able to boot
> with nosmp option
> kernel from kernel.org + patch to setup.c and pmac_features.c)
>
> Thanks in advance for any hint...

What exact version ? what patches ? What happens (last printed on
serial console) if you try to boot SMP ?

Ben.

From schwab at suse.de Sun Oct 3 05:50:54 2004
From: schwab at suse.de (Andreas Schwab)
Date: Sat, 02 Oct 2004 21:50:54 +0200
Subject: Machine check during PCI scan on PMac G5
In-Reply-To: <1096715821.26913.35.camel@gaston> (Benjamin Herrenschmidt's message of "Sat, 02 Oct 2004 21:17:01 +1000")
References: <1096715821.26913.35.camel@gaston>
Message-ID:

Benjamin Herrenschmidt writes:

> Argh... again ! Looks like the box doesn't like us to probe the
> PCI device that is there. Can you print out the precise devfn
> bus number & offset where the machine check happens ?

The first occurrence is devfn 48, bus number 0, offset 14.

> This patch should help by preventing the config space accesses to
> occur on those devices that aren't in the device-tree, I'll push it
> to Linus as a temporary fix if you confirm it works.

Thanks, I can confirm that it works.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

From benh at kernel.crashing.org Sun Oct 3 10:38:38 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sun, 03 Oct 2004 10:38:38 +1000
Subject: [PATCH] Fix booting on some recent G5s
Message-ID: <1096763918.26914.63.camel@gaston>

Hi !

Some recent G5s have a problem with PCI/HT probing. They crash (machine
check) during the probe of some slot numbers, it seems to be related to
some functions being disabled by the firmware inside the K2 ASIC.

This patch limits the config space accesses to devices that are present
in the OF device-tree. This fixes the problem and shouldn't "add" any
limitation. If you plug a "random" PCI card with no OF driver, the
firmware will still build a node for it with the default set of
properties created from the config space.

Ben.
Signed-off-by: Benjamin Herrenschmidt --- 1.5/arch/ppc64/kernel/pmac_pci.c 2004-07-25 14:51:52 +10:00 +++ edited/arch/ppc64/kernel/pmac_pci.c 2004-08-04 10:26:07 +10:00 @@ -271,7 +271,7 @@ int offset, int len, u32 *val) { struct pci_controller *hose; - struct device_node *busdn; + struct device_node *busdn, *dn; unsigned long addr; if (bus->self) @@ -282,6 +282,16 @@ return PCIBIOS_DEVICE_NOT_FOUND; hose = busdn->phb; if (hose == NULL) + return PCIBIOS_DEVICE_NOT_FOUND; + + /* We only allow config cycles to devices that are in OF device-tree + * as we are apparently having some weird things going on with some + * revs of K2 on recent G5s + */ + for (dn = busdn->child; dn; dn = dn->sibling) + if (dn->devfn == devfn) + break; + if (dn == NULL) return PCIBIOS_DEVICE_NOT_FOUND; addr = u3_ht_cfg_access(hose, bus->number, devfn, offset); --- 1.21/arch/ppc/platforms/pmac_pci.c 2004-07-29 14:58:35 +10:00 +++ edited/arch/ppc/platforms/pmac_pci.c 2004-08-17 14:18:09 +10:00 @@ -315,6 +315,10 @@ unsigned int addr; int i; + struct device_node *np = pci_busdev_to_OF_node(bus, devfn); + if (np == NULL) + return PCIBIOS_DEVICE_NOT_FOUND; + /* * When a device in K2 is powered down, we die on config * cycle accesses. Fix that here. @@ -362,6 +366,9 @@ unsigned int addr; int i; + struct device_node *np = pci_busdev_to_OF_node(bus, devfn); + if (np == NULL) + return PCIBIOS_DEVICE_NOT_FOUND; /* * When a device in K2 is powered down, we die on config * cycle accesses. Fix that here. From benh at kernel.crashing.org Sun Oct 3 10:51:46 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 03 Oct 2004 10:51:46 +1000 Subject: Machine check during PCI scan on PMac G5 In-Reply-To: <4231083A-14D4-11D9-AE7A-000A95A4DC02@kernel.crashing.org> References: <1096715821.26913.35.camel@gaston> <4231083A-14D4-11D9-AE7A-000A95A4DC02@kernel.crashing.org> Message-ID: <1096764706.11996.77.camel@gaston> On Sun, 2004-10-03 at 10:36, Segher Boessenkool wrote: > >> Argh... again ! 
Looks like the box doesn't like us to probe the > >> PCI device that is there. Can you print out the precise devfn > >> bus number & offset where the machine check happens ? > > > > The first occurence is devfn 48, bus number 0, offset 14. > > That's the "header type" field on the GEM shim. > > I'd rather not have this fixed by the device-tree check, for > various reasons; note that this issue probably is related to the > "config space not readable while GEM is in sleep mode" problem > on older Macs. Is the GEM powered on during boot, on these boxes? I'm suprised, I'm not sure it's actually GEM (Andreas, is the Sungem properly functionning on this box after this fix ?). I think the numbering of the Shims can change from firmware to firmware, it's more probably one of the USBs. There is code in pmac_feature.c to power up the GEM (but only if it has a device-node). I think the proper solution is the filter from the device-tree on Apple G5s, at least for now, though OF itself probably has a property somewhere that tells it which slots to probe and not to probe, I need to find it. Ben. From segher at kernel.crashing.org Sun Oct 3 10:36:26 2004 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Sat, 2 Oct 2004 19:36:26 -0500 Subject: Machine check during PCI scan on PMac G5 In-Reply-To: References: <1096715821.26913.35.camel@gaston> Message-ID: <4231083A-14D4-11D9-AE7A-000A95A4DC02@kernel.crashing.org> >> Argh... again ! Looks like the box doesn't like us to probe the >> PCI device that is there. Can you print out the precise devfn >> bus number & offset where the machine check happens ? > > The first occurence is devfn 48, bus number 0, offset 14. That's the "header type" field on the GEM shim. I'd rather not have this fixed by the device-tree check, for various reasons; note that this issue probably is related to the "config space not readable while GEM is in sleep mode" problem on older Macs. Is the GEM powered on during boot, on these boxes? 
Segher

From benh at kernel.crashing.org Sun Oct 3 13:44:44 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sun, 03 Oct 2004 13:44:44 +1000
Subject: Machine check during PCI scan on PMac G5
In-Reply-To: <1096764706.11996.77.camel@gaston>
References: <1096715821.26913.35.camel@gaston> <4231083A-14D4-11D9-AE7A-000A95A4DC02@kernel.crashing.org> <1096764706.11996.77.camel@gaston>
Message-ID: <1096775084.9539.4.camel@gaston>

> I'm surprised, I'm not sure it's actually GEM (Andreas, is the Sungem
> properly functioning on this box after this fix ?). I think the
> numbering of the Shims can change from firmware to firmware, it's
> more probably one of the USBs. There is code in pmac_feature.c to
> power up the GEM (but only if it has a device-node).

Ok, after digging in the OF code, it seems that on machines without a
PCI-X bridge, shim 6 is just not used and the stuff is really upset when
we probe it. K2 is a weird beast that needs care...

Ben.

From schwab at suse.de Sun Oct 3 21:52:54 2004
From: schwab at suse.de (Andreas Schwab)
Date: Sun, 03 Oct 2004 13:52:54 +0200
Subject: Machine check during PCI scan on PMac G5
In-Reply-To: <1096764706.11996.77.camel@gaston> (Benjamin Herrenschmidt's message of "Sun, 03 Oct 2004 10:51:46 +1000")
References: <1096715821.26913.35.camel@gaston> <4231083A-14D4-11D9-AE7A-000A95A4DC02@kernel.crashing.org> <1096764706.11996.77.camel@gaston>
Message-ID:

Benjamin Herrenschmidt writes:

> I'm surprised, I'm not sure it's actually GEM (Andreas, is the Sungem
> properly functioning on this box after this fix ?).

It appears to be.

Andreas.

--
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
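
[Editorial sketch] The failing access reported in this thread can be decoded with the usual PCI devfn layout, and the device-tree filter from the patch can be mimicked in user space. This is a minimal sketch under stated assumptions, not the kernel code: struct of_node, find_child_by_devfn() and config_access_allowed() are hypothetical stand-ins for the kernel's device_node handling, and it assumes the devfn and offset quoted above were printed in decimal.

```c
#include <stddef.h>

/* Standard PCI encoding: devfn = (slot << 3) | function. */
#define SLOT_OF(devfn)	(((devfn) >> 3) & 0x1f)
#define FUNC_OF(devfn)	((devfn) & 0x07)

/* Offset of the "header type" byte in PCI configuration space. */
#define CFG_HEADER_TYPE	0x0e

/* Hypothetical, simplified stand-in for the kernel's struct device_node. */
struct of_node {
	int devfn;
	struct of_node *child;		/* first child of this node */
	struct of_node *sibling;	/* next node on the same bus */
};

/* Walk the children of a bus node looking for a matching devfn. */
static struct of_node *find_child_by_devfn(struct of_node *busdn, int devfn)
{
	struct of_node *dn;

	if (busdn == NULL)
		return NULL;
	for (dn = busdn->child; dn != NULL; dn = dn->sibling)
		if (dn->devfn == devfn)
			return dn;
	return NULL;
}

/* Non-zero if a config cycle to devfn should be allowed. */
static int config_access_allowed(struct of_node *busdn, int devfn)
{
	return find_child_by_devfn(busdn, devfn) != NULL;
}
```

With these definitions, devfn 48 decodes to slot 6, function 0, and offset 14 equals CFG_HEADER_TYPE, which is consistent with the reading above that an unused shim 6 was being probed. A devfn with no node under the bus is simply refused before any config cycle is issued.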
From schwab at suse.de Sun Oct 3 21:59:47 2004
From: schwab at suse.de (Andreas Schwab)
Date: Sun, 03 Oct 2004 13:59:47 +0200
Subject: PM72 works also on PowerMac7,3
Message-ID:

The therm_pm72 driver appears to work fine on the PowerMac7,3.

Andreas.

Signed-off-by: Andreas Schwab

--- linux-2.6/drivers/macintosh/therm_pm72.c.~1~	2004-08-19 11:31:30.000000000 +0200
+++ linux-2.6/drivers/macintosh/therm_pm72.c	2004-10-03 13:55:22.361631501 +0200
@@ -1301,7 +1301,8 @@ static int __init therm_pm72_init(void)
 {
 	struct device_node *np;
 
-	if (!machine_is_compatible("PowerMac7,2"))
+	if (!machine_is_compatible("PowerMac7,2") &&
+	    !machine_is_compatible("PowerMac7,3"))
 		return -ENODEV;
 
 	printk(KERN_INFO "PowerMac G5 Thermal control driver %s\n", VERSION);

--
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

From benh at kernel.crashing.org Sun Oct 3 22:06:45 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sun, 03 Oct 2004 22:06:45 +1000
Subject: PM72 works also on PowerMac7,3
In-Reply-To:
References:
Message-ID: <1096805205.9514.17.camel@gaston>

On Sun, 2004-10-03 at 21:59, Andreas Schwab wrote:
> The therm_pm72 driver appears to work fine on the PowerMac7,3.

Before committing this, I'd rather make sure the code & fan IDs are
actually the same in Darwin. Also, just allowing the 7,3 may enable the
code on the new water cooling machines. Before doing so, I'd rather make
sure we get that right too. I'm waiting for one of these to be delivered
by Apple, they seem to take ages, but hopefully, it should be there soon.

Ben.

From schwab at suse.de Sun Oct 3 22:20:43 2004
From: schwab at suse.de (Andreas Schwab)
Date: Sun, 03 Oct 2004 14:20:43 +0200
Subject: Properly recognize PowerMac7,3
Message-ID:

Make the PowerMac7,3 no longer unknown.

Andreas.
Signed-off-by: Andreas Schwab --- linux-2.6/arch/ppc64/kernel/pmac_feature.c.~1~ 2004-09-28 00:28:34.000000000 +0200 +++ linux-2.6/arch/ppc64/kernel/pmac_feature.c 2004-10-03 14:17:03.458461540 +0200 @@ -343,6 +343,10 @@ static struct pmac_mb_def pmac_mb_defs[] PMAC_TYPE_POWERMAC_G5, g5_features, 0, }, + { "PowerMac7,3", "PowerMac G5", + PMAC_TYPE_POWERMAC_G5, g5_features, + 0, + }, { "RackMac3,1", "XServe G5", PMAC_TYPE_POWERMAC_G5, g5_features, 0, -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux AG, Maxfeldstra?e 5, 90409 N?rnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From grave at ipno.in2p3.fr Mon Oct 4 17:59:44 2004 From: grave at ipno.in2p3.fr (grave) Date: Mon, 04 Oct 2004 07:59:44 +0000 Subject: =?iso-8859-1?q?Re=A0=3A?= XServe Node running a debian with only one processor In-Reply-To: <415D85D1.4040701@austin.ibm.com> (from olof@austin.ibm.com on Fri Oct 1 18:29:05 2004) References: <1096546729l.32147l.0l@ipnnarval> <1096548321l.32616l.0l@ipnnarval> <1096646654l.2901l.2l@ipnnarval> <415D85D1.4040701@austin.ibm.com> Message-ID: <1096876784l.19627l.4l@ipnnarval> Here are a few informations : console output in the joined file I use a cross compiler ppc32 -> ppc64 gcc-3.4.1 GNU ld version 2.15 kernel from ftp.kernel.org 2.6.6 patch (had to apply it reversed because of the initial diff I think) : diff -ur linux-2.6.6-working/arch/ppc64/kernel/pmac_feature.c linux-2.6.6/arch/ppc64/kernel/pmac_feature.c --- linux-2.6.6-working/arch/ppc64/kernel/pmac_feature.c 2004-05-13 17:00:12.000000000 -0600 +++ linux-2.6.6/arch/ppc64/kernel/pmac_feature.c 2004-05-09 20:32:54.000000000 -0600 @@ -343,10 +343,6 @@ PMAC_TYPE_POWERMAC_G5, g5_features, 0, }, - { "RackMac3,1", "XServe G5", - PMAC_TYPE_POWERMAC_G5, g5_features, - 0, - }, }; /* diff -ur linux-2.6.6-working/arch/ppc64/kernel/setup.c linux-2.6.6/ arch/ppc64/kernel/setup.c --- linux-2.6.6-working/arch/ppc64/kernel/setup.c 
2004-05-13 16:06:33.000000000 -0600 +++ linux-2.6.6/arch/ppc64/kernel/setup.c 2004-05-09 20:32:29.000000000 -0600 @@ -547,7 +547,7 @@ int __init ppc_init(void) { /* clear the progress line */ - if(ppc_md.progress) ppc_md.progress(" ", 0xffff); + ppc_md.progress(" ", 0xffff); if (ppc_md.init != NULL) { ppc_md.init(); -------------- next part -------------- Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes) Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes) Mount-cache hash table entries: 256 (order: 0, 4096 bytes) POSIX conformance testing by UNIFIX PowerMac SMP probe found 2 cpus Processor 1 found. Synchronizing timebase Got ack score 299, offset 1000 score 299, offset 500 score -299, offset 250 score 299, offset 375 score -299, offset 312 score -299, offset 343 score -299, offset 359 score -299, offset 367 score -283, offset 371 score -247, offset 373 score 133, offset 374 score -239, offset 373 Min 373 (score -237), Max 374 (score 129) Final offset: 374 (127/300) Brought up 2 CPUs a few seconds and : [c0000000000172f4] .kernel_thread+0x4c/0x68 <0>Kernel panic: Attempted to k00025ef1c0[1] 'swapper' THREAD: c0000000025e8000 CPU: 0 GPR00: C000000000077E90 C0000000025EBAE0 C00000000045EA58 FFFFFFFFFFFFFFFF GPR04: 0000000000000DE7 FFFFFFFFFFFFFFFF C000000000397200 C000000000397218 GPR08: 0000000000000000 C000000000359180 0000000000000001 C0000000004AE730 GPR12: 0000000088004044 C000000000308000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000001000 GPR20: C0000000004B16E0 000000000000007F 0000000000000008 000000000000001B GPR24: C00000007DFCDD80 C0000000025EBBE0 0000000000000036 0000000000000080 GPR28: C0000000025EBBE0 C0000000004364F0 C0000000003A5628 0000000000000000 NIP [c000000000077e9c] .smp_call_function_all_cpus+0x7c/0x98 LR [c000000000077e90] .smp_call_function_all_cpus+0x70/0x98 Call Trace: [c000000000079c28] .do_tune_cpucache+0xb4/0x3fc [c00000000007a050] 
.enable_cpucache+0xe0/0x118 [c00000000007a7b0] .kmem_cache_create+0x728/0x79c [c0000000002f5448] .sk_init+0x30/0xdc [c0000000002f539c] .sock_init+0x3c/0xb8 [c00000000000c6ec] .init+0x238/0x43c [c0000000000172f4] .kernel_thread+0x4c/0x68 <0>Kernel panic: Attempted to kill init! smp_call_function on cpu 0: other cpus not responding (0) Rebooting in 180 seconds.. From grave at ipno.in2p3.fr Mon Oct 4 18:48:26 2004 From: grave at ipno.in2p3.fr (grave) Date: Mon, 04 Oct 2004 08:48:26 +0000 Subject: discovered the patch pages and how it work on penguinppc64.org sorry for the previous mail... Message-ID: <1096879706l.20867l.0l@ipnnarval> http://ozlabs.org/ppc64-patches/patch.pl?id=62 make the all things going right ! One more time sorry... From benh at kernel.crashing.org Mon Oct 4 18:55:29 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 04 Oct 2004 18:55:29 +1000 Subject: discovered the patch pages and how it work on penguinppc64.org sorry for the previous mail... In-Reply-To: <1096879706l.20867l.0l@ipnnarval> References: <1096879706l.20867l.0l@ipnnarval> Message-ID: <1096880129.9514.70.camel@gaston> On Mon, 2004-10-04 at 18:48, grave wrote: > http://ozlabs.org/ppc64-patches/patch.pl?id=62 make the all things > going right ! > > One more time sorry... Hrm, that should be in Linus tree already... Ben. From grave at ipno.in2p3.fr Mon Oct 4 22:31:21 2004 From: grave at ipno.in2p3.fr (grave) Date: Mon, 04 Oct 2004 12:31:21 +0000 Subject: =?iso-8859-1?q?Re=A0=3A?= discovered the patch pages and how it work on penguinppc64.org sorry for the previous mail... 
In-Reply-To: <1096880129.9514.70.camel@gaston> (from benh@kernel.crashing.org on Mon Oct 4 10:55:29 2004) References: <1096879706l.20867l.0l@ipnnarval> <1096880129.9514.70.camel@gaston> Message-ID: <1096893081l.23876l.0l@ipnnarval> On 04.10.2004 10:55:29, Benjamin Herrenschmidt wrote: > On Mon, 2004-10-04 at 18:48, grave wrote: > > http://ozlabs.org/ppc64-patches/patch.pl?id=62 make the all things > > going right ! > > > > One more time sorry... > > Hrm, that should be in Linus tree already... Not in the 2.6.6 tree from www.kernel.org It's present in 2.6.8.1 but this one crash at boot (see attached file). This kernel also crash if I use the nosmp option xavier -------------- next part -------------- Min 8 (score -13), Max 9 (score 51) Final offset: 8 (9/300) Brought up 2 CPUs NET: Registered protocol family 16 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2 POWERMAC NIP: C0000000002DA15C XER: 0000000000000000 LR: C00000000000C600 REGS: c0000000027e7be0 TRAP: 0300 Not tainted (2.6.8.1) MSR: 9000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 DAR: 0000000000000000, DSISR: 0000000008000000 TASK: c0000000027e1200[1] 'swapper' THREAD: c0000000027e4000 CPU: 0 GPR00: C00000000000C600 C0000000027E7E60 C000000000437E78 C0000000002AEC28 GPR04: 000000000000FFFF 0000000000000000 C000000000493C48 C00000007DE5BD78 GPR08: 0000000000000002 0000000000000000 0000000000000002 0000000000000000 GPR12: 0000000028000042 C000000000304000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000220000 0000000000230000 0000000001400000 GPR24: C000000000304000 C000000000435008 C0000000002F4348 C0000000002F8268 GPR28: 0000000000000000 C000000000436420 C000000000364260 C0000000002F7F30 NIP [c0000000002da15c] .ppc_init+0x30/0xa4 LR [c00000000000c600] .init+0x234/0x428 Call Trace: [c0000000027e7e60] [c0000000027e7ef0] 0xc0000000027e7ef0 (unreliable) [c0000000027e7ef0] [c00000000000c600] 
.init+0x234/0x428 [c0000000027e7f90] [c000000000017734] .kernel_thread+0x4c/0x68 <0>Kernel panic: Attempted to kill init! <0>Rebooting in 180 seconds.. From benh at kernel.crashing.org Mon Oct 4 23:32:23 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 04 Oct 2004 23:32:23 +1000 Subject: =?iso-8859-1?q?Re=3A_Re=C2=A0=3A_discovered_the_patch_pages_and_?= =?iso-8859-1?q?how_it_work_on=0D=0A=09penguinppc64=2Eorg_sorry_for_th?= =?iso-8859-1?q?e_previous_mail=2E=2E=2E?= In-Reply-To: <1096893081l.23876l.0l@ipnnarval> References: <1096879706l.20867l.0l@ipnnarval> <1096880129.9514.70.camel@gaston> <1096893081l.23876l.0l@ipnnarval> Message-ID: <1096896743.9516.84.camel@gaston> On Mon, 2004-10-04 at 22:31, grave wrote: > On 04.10.2004 10:55:29, Benjamin Herrenschmidt wrote: > > On Mon, 2004-10-04 at 18:48, grave wrote: > > > http://ozlabs.org/ppc64-patches/patch.pl?id=62 make the all things > > > going right ! > > > > > > One more time sorry... > > > > Hrm, that should be in Linus tree already... > Not in the 2.6.6 tree from www.kernel.org > > It's present in 2.6.8.1 but this one crash at boot (see attached file). > This kernel also crash if I use the nosmp option Can you try 2.6.9-rc3 and let me know ? Or beter, the current bk snapshot of 2.6.9 Ben. From grave at ipno.in2p3.fr Mon Oct 4 23:48:28 2004 From: grave at ipno.in2p3.fr (grave) Date: Mon, 04 Oct 2004 13:48:28 +0000 Subject: =?iso-8859-1?q?Re=A0=3A_Re=A0=3A?= discovered the patch pages and how it work on penguinppc64.org sorry for the previous mail... In-Reply-To: <1096896743.9516.84.camel@gaston> (from benh@kernel.crashing.org on Mon Oct 4 15:32:23 2004) References: <1096879706l.20867l.0l@ipnnarval> <1096880129.9514.70.camel@gaston> <1096893081l.23876l.0l@ipnnarval> <1096896743.9516.84.camel@gaston> Message-ID: <1096897708l.24855l.1l@ipnnarval> > Can you try 2.6.9-rc3 and let me know ? 
Or beter, the current bk > snapshot of 2.6.9 I also tryed with the bk tree (2.6.9-rc1-ames) it also crashed... I'll retry in order to send you a log of the crash... Where can I get the 2.6.9-rc3 tree ? I didn't find where it is ? I'm trying to have something "better" than 2.6.6 in order to have termal management. xavier From benh at kernel.crashing.org Mon Oct 4 23:45:50 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 04 Oct 2004 23:45:50 +1000 Subject: =?iso-8859-1?q?Re=3A_Re=C2=A0=3A_Re=C2=A0=3A_discovered_the_patc?= =?iso-8859-1?q?h_pages_and_how_it_work_on=0D=0A=09penguinppc64=2Eorg_?= =?iso-8859-1?q?sorry_for_the_previous_mail=2E=2E=2E?= In-Reply-To: <1096897708l.24855l.1l@ipnnarval> References: <1096879706l.20867l.0l@ipnnarval> <1096880129.9514.70.camel@gaston> <1096893081l.23876l.0l@ipnnarval> <1096896743.9516.84.camel@gaston> <1096897708l.24855l.1l@ipnnarval> Message-ID: <1096897549.23141.93.camel@gaston> On Mon, 2004-10-04 at 23:48, grave wrote: > > Can you try 2.6.9-rc3 and let me know ? Or beter, the current bk > > snapshot of 2.6.9 > > I also tryed with the bk tree (2.6.9-rc1-ames) it also crashed... > I'll retry in order to send you a log of the crash... ames ? just use mainstream > Where can I get the 2.6.9-rc3 tree ? I didn't find where it is ? kernel.org ? > I'm trying to have something "better" than 2.6.6 in order to have > termal management. > > xavier -- Benjamin Herrenschmidt From benh at kernel.crashing.org Mon Oct 4 23:46:54 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 04 Oct 2004 23:46:54 +1000 Subject: =?iso-8859-1?q?Re=C2?= =?iso-8859-1?q?=C2=A0=3A?= =?iso-8859-1?q?Re=C2?= =?iso-8859-1?q?=C2=A0=3A_discovered?= the patch pages and how it work on penguinppc64.org sorry for the previous mail... 
In-Reply-To: <1096897549.23141.93.camel@gaston>
References: <1096879706l.20867l.0l@ipnnarval> <1096880129.9514.70.camel@gaston> <1096893081l.23876l.0l@ipnnarval> <1096896743.9516.84.camel@gaston> <1096897708l.24855l.1l@ipnnarval> <1096897549.23141.93.camel@gaston>
Message-ID: <1096897613.9539.95.camel@gaston>

On Mon, 2004-10-04 at 23:45, Benjamin Herrenschmidt wrote:
> > I'm trying to have something "better" than 2.6.6 in order to have
> > thermal management.

BTW. Thermal control isn't there yet for xserve's ... soon hopefully

Ben.

From moilanen at austin.ibm.com Tue Oct 5 05:43:05 2004
From: moilanen at austin.ibm.com (moilanen at austin.ibm.com)
Date: Mon, 4 Oct 2004 14:43:05 -0500
Subject: [PATCH 1/1] rtas_flash_4gig
Message-ID: <200410041942.i94Jg4WA154540@westrelay04.boulder.ibm.com>

We should probably check to make sure that all of the flash list headers
are below 4gig, not just the first one. We could see this situation
happen if we are low on memory and get a page allocated that's over the
4 gig boundary.

Jake

Signed-off-by: Jake Moilanen

---

diff -puN arch/ppc64/kernel/rtas.c~rtas_flash_4gig arch/ppc64/kernel/rtas.c
--- linux-2.6-bk/arch/ppc64/kernel/rtas.c~rtas_flash_4gig	Mon Oct 4 10:46:46 2004
+++ linux-2.6-bk-moilanen/arch/ppc64/kernel/rtas.c	Mon Oct 4 14:22:31 2004
@@ -338,6 +338,12 @@ rtas_flash_firmware(void)
 			f->next = (struct flash_block_list *)virt_to_abs(f->next);
 		else
 			f->next = NULL;
+
+		if (f->next >= 4UL*1024*1024*1024) {
+			printk(KERN_ALERT "FLASH: aborted...flash list header addr above 4GB\n");
+			return;
+		}
+
 		/* make num_blocks into the version/length field */
 		f->num_blocks = (FLASH_BLOCK_LIST_VERSION << 56) | ((f->num_blocks+1)*16);
 	}
_

From schwab at suse.de Tue Oct 5 06:51:55 2004
From: schwab at suse.de (Andreas Schwab)
Date: Mon, 04 Oct 2004 22:51:55 +0200
Subject: Sound on G5
Message-ID:

Is anyone already working on sound support for the PowerMac G5, by chance?
That's actually the only thing still missing.

Andreas.
--
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

From benh at kernel.crashing.org Tue Oct 5 11:25:03 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 05 Oct 2004 11:25:03 +1000
Subject: Sound on G5
In-Reply-To:
References:
Message-ID: <1096939502.24584.6.camel@gaston>

On Tue, 2004-10-05 at 06:51, Andreas Schwab wrote:
> Is anyone already working on sound support for the PowerMac G5, by chance?
> That's actually the only thing still missing.

Nobody really seriously ATM. One of the main issues is that the Darwin
driver abuses Apple "do-platform-*" shit. It's a mechanism they invented
to put sort-of "scripts" (in binary form) in the device-tree that can
contain elementary ops such as write GPIOs, I2C, etc... This is
extremely messy and difficult to parse. I have written the basis for
parsing them, but interpreting them is even more shitty as the actual
implementation of each op sort-of depends on the target object. It's
really a piece-of-shit imho.

So we could go that way and complete my "interpreter" or just hard code
all of the GPIOs we need in the driver, hoping Apple doesn't shuffle them
too much in upcoming models...

Ben.

From david at gibson.dropbear.id.au Tue Oct 5 13:13:41 2004
From: david at gibson.dropbear.id.au (David Gibson)
Date: Tue, 5 Oct 2004 13:13:41 +1000
Subject: [PPC64] Squash EEH warnings
Message-ID: <20041005031341.GA3695@zax>

Andrew, please apply:

A slightly non-ideal version of the recent patch which fixed EEH being
a no-op went in. The srcsave variable in eeh_memcpy_to_io() is now
never referenced on non-pSeries machines, and so spews hundreds of
warnings. The variable doesn't actually accomplish anything, so this
patch gets rid of it.
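
[Editorial sketch] The warning described above follows a common pattern: when a checking macro expands to something that ignores one of its arguments, a local variable kept only to feed that argument ends up unused. The following is a minimal sketch, not the eeh.h code: HAVE_EEH, check_failure(), check_failure_slow() and copy_from_io() are hypothetical names, and the shape of the macro in the non-EEH case is an assumption.

```c
#include <stddef.h>
#include <string.h>

#ifdef HAVE_EEH
/* Hypothetical platform hook: would inspect the I/O token on failure. */
unsigned int check_failure_slow(const volatile void *token, unsigned int val);
#define check_failure(token, val)	check_failure_slow((token), (val))
#else
/* Without EEH the token is dropped, so nothing references it. */
#define check_failure(token, val)	(val)
#endif

/* Copy n bytes and run the check on the last 32-bit word copied. */
static unsigned int copy_from_io(void *dest, const volatile void *src, size_t n)
{
	unsigned int last = 0;

	memcpy(dest, (const void *)src, n);
	if (n >= 4) {
		memcpy(&last, (const char *)dest + n - 4, 4);
		/* Pass src directly; a separate saved copy would add nothing. */
		last = check_failure(src, last);
	}
	return last;
}
```

Because src is already used by the copy itself, passing it straight to the macro leaves no variable that exists only for the check, so the build stays quiet whether or not the macro looks at its first argument.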
Signed-off-by: David Gibson Index: working-2.6/include/asm-ppc64/eeh.h =================================================================== --- working-2.6.orig/include/asm-ppc64/eeh.h 2004-10-05 10:08:10.000000000 +1000 +++ working-2.6/include/asm-ppc64/eeh.h 2004-10-05 13:09:24.730992368 +1000 @@ -196,7 +196,6 @@ static inline void eeh_memcpy_fromio(void *dest, const volatile void __iomem *src, unsigned long n) { void *vsrc = (void __force *) src; void *destsave = dest; - const volatile void __iomem *srcsave = src; unsigned long nsave = n; while(n && (!EEH_CHECK_ALIGN(vsrc, 4) || !EEH_CHECK_ALIGN(dest, 4))) { @@ -227,7 +226,7 @@ */ if ((nsave >= 4) && (EEH_POSSIBLE_ERROR((*((u32 *) destsave+nsave-4)), u32))) { - eeh_check_failure(srcsave, (*((u32 *) destsave+nsave-4))); + eeh_check_failure(src, (*((u32 *) destsave+nsave-4))); } } -- David Gibson | For every complex problem there is a david AT gibson.dropbear.id.au | solution which is simple, neat and | wrong. http://www.ozlabs.org/people/dgibson From david at gibson.dropbear.id.au Tue Oct 5 15:26:27 2004 From: david at gibson.dropbear.id.au (David Gibson) Date: Tue, 5 Oct 2004 15:26:27 +1000 Subject: [TRIVIAL, PPC64] Remove redundant #ifdef CONFIG_ALTIVEC Message-ID: <20041005052627.GD3695@zax> Andrew, please apply: arch/ppc64/kernel/process.c has an #ifdef CONFIG_ALTIVEC within an #ifdef CONFIG_ALTIVEC. This patch removes the inner one. 
Signed-off-by: David Gibson Index: working-2.6/arch/ppc64/kernel/process.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/process.c 2004-10-05 10:08:10.000000000 +1000 +++ working-2.6/arch/ppc64/kernel/process.c 2004-10-05 15:18:56.581996496 +1000 @@ -147,7 +147,6 @@ */ void flush_altivec_to_thread(struct task_struct *tsk) { -#ifdef CONFIG_ALTIVEC if (tsk->thread.regs) { preempt_disable(); if (tsk->thread.regs->msr & MSR_VEC) { @@ -158,7 +157,6 @@ } preempt_enable(); } -#endif } int dump_task_altivec(struct pt_regs *regs, elf_vrregset_t *vrregs) -- David Gibson | For every complex problem there is a david AT gibson.dropbear.id.au | solution which is simple, neat and | wrong. http://www.ozlabs.org/people/dgibson From david at gibson.dropbear.id.au Tue Oct 5 16:42:56 2004 From: david at gibson.dropbear.id.au (David Gibson) Date: Tue, 5 Oct 2004 16:42:56 +1000 Subject: [PPC64] xmon sparse cleanups Message-ID: <20041005064255.GF3695@zax> Andrew, please apply: This patch removes many sparse warnings from the xmon code. Mostly K&R function declarations and 0-instead-of-NULLs. I believe this removes all save one sparse error in xmon, excepting those inherited from header files. 
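
[Editorial sketch] The two kinds of change this patch makes can be illustrated outside the kernel. This is a minimal sketch with hypothetical names (struct bp, bpts, count_enabled(), find_bp()), not code taken from xmon. In C before C23 an empty parameter list declares a function with unspecified arguments, so (void) is needed for a real prototype, and a pointer-returning function reads more clearly, and passes sparse, when it returns NULL rather than a bare 0.

```c
#include <stddef.h>

struct bp {
	unsigned long address;
	int enabled;
};

#define NBPTS 4
static struct bp bpts[NBPTS];

/* Prototype with (void): the compiler can reject calls that pass arguments. */
static int count_enabled(void)
{
	int i, n = 0;

	for (i = 0; i < NBPTS; ++i)
		if (bpts[i].enabled)
			++n;
	return n;
}

/* ANSI-style parameter list instead of the K&R form "find_bp(pc) unsigned long pc;". */
static struct bp *find_bp(unsigned long pc)
{
	int i;

	for (i = 0; i < NBPTS; ++i)
		if (bpts[i].enabled && bpts[i].address == pc)
			return &bpts[i];
	return NULL;	/* a null pointer, spelled as one */
}
```

Neither change alters behaviour; both only give the compiler and sparse enough type information to check callers.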
Signed-off-by: David Gibson Index: working-2.6/arch/ppc64/xmon/xmon.c =================================================================== --- working-2.6.orig/arch/ppc64/xmon/xmon.c 2004-09-24 10:14:09.000000000 +1000 +++ working-2.6/arch/ppc64/xmon/xmon.c 2004-10-05 16:31:01.822963256 +1000 @@ -645,7 +645,7 @@ for (i = 0; i < NBPTS; ++i, ++bp) if (bp->enabled && pc == bp->address) return bp; - return 0; + return NULL; } static struct bpt *in_breakpoint_table(unsigned long nip, unsigned long *offp) @@ -1582,7 +1582,7 @@ extern char dec_exc; void -super_regs() +super_regs(void) { int cmd; unsigned long val; @@ -1816,7 +1816,7 @@ ""; void -memex() +memex(void) { int cmd, inc, i, nslash; unsigned long n; @@ -1967,7 +1967,7 @@ } int -bsesc() +bsesc(void) { int c; @@ -1985,7 +1985,7 @@ || ('a' <= (c) && (c) <= 'f') \ || ('A' <= (c) && (c) <= 'F')) void -dump() +dump(void) { int c; @@ -2150,7 +2150,7 @@ static unsigned mask; void -memlocate() +memlocate(void) { unsigned a, n; unsigned char val[4]; @@ -2183,7 +2183,7 @@ static unsigned long mlim = 0xffffffff; void -memzcan() +memzcan(void) { unsigned char v; unsigned a; @@ -2212,7 +2212,7 @@ /* Input scanning routines */ int -skipbl() +skipbl(void) { int c; @@ -2237,8 +2237,7 @@ }; int -scanhex(vp) -unsigned long *vp; +scanhex(unsigned long *vp) { int c, d; unsigned long v; @@ -2322,7 +2321,7 @@ } void -scannl() +scannl(void) { int c; @@ -2365,13 +2364,13 @@ static char *lineptr; void -flush_input() +flush_input(void) { lineptr = NULL; } int -inchar() +inchar(void) { if (lineptr == NULL || *lineptr == 0) { if (fgets(line, sizeof(line), stdin) == NULL) { @@ -2384,8 +2383,7 @@ } void -take_input(str) -char *str; +take_input(char *str) { lineptr = str; } Index: working-2.6/arch/ppc64/xmon/ppc-opc.c =================================================================== --- working-2.6.orig/arch/ppc64/xmon/ppc-opc.c 2004-08-09 09:51:38.000000000 +1000 +++ working-2.6/arch/ppc64/xmon/ppc-opc.c 2004-10-05 16:41:20.355047248 +1000 
@@ -20,6 +20,7 @@ Software Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ +#include #include "nonstdio.h" #include "ppc.h" @@ -110,12 +111,12 @@ /* The zero index is used to indicate the end of the list of operands. */ #define UNUSED 0 - { 0, 0, 0, 0, 0 }, + { 0, 0, NULL, NULL, 0 }, /* The BA field in an XL form instruction. */ #define BA UNUSED + 1 #define BA_MASK (0x1f << 16) - { 5, 16, 0, 0, PPC_OPERAND_CR }, + { 5, 16, NULL, NULL, PPC_OPERAND_CR }, /* The BA field in an XL form instruction when it must be the same as the BT field in the same instruction. */ @@ -125,7 +126,7 @@ /* The BB field in an XL form instruction. */ #define BB BAT + 1 #define BB_MASK (0x1f << 11) - { 5, 11, 0, 0, PPC_OPERAND_CR }, + { 5, 11, NULL, NULL, PPC_OPERAND_CR }, /* The BB field in an XL form instruction when it must be the same as the BA field in the same instruction. */ @@ -168,21 +169,21 @@ /* The BF field in an X or XL form instruction. */ #define BF BDPA + 1 - { 3, 23, 0, 0, PPC_OPERAND_CR }, + { 3, 23, NULL, NULL, PPC_OPERAND_CR }, /* An optional BF field. This is used for comparison instructions, in which an omitted BF field is taken as zero. */ #define OBF BF + 1 - { 3, 23, 0, 0, PPC_OPERAND_CR | PPC_OPERAND_OPTIONAL }, + { 3, 23, NULL, NULL, PPC_OPERAND_CR | PPC_OPERAND_OPTIONAL }, /* The BFA field in an X or XL form instruction. */ #define BFA OBF + 1 - { 3, 18, 0, 0, PPC_OPERAND_CR }, + { 3, 18, NULL, NULL, PPC_OPERAND_CR }, /* The BI field in a B form or XL form instruction. */ #define BI BFA + 1 #define BI_MASK (0x1f << 16) - { 5, 16, 0, 0, PPC_OPERAND_CR }, + { 5, 16, NULL, NULL, PPC_OPERAND_CR }, /* The BO field in a B form instruction. Certain values are illegal. */ @@ -197,36 +198,36 @@ /* The BT field in an X or XL form instruction. */ #define BT BOE + 1 - { 5, 21, 0, 0, PPC_OPERAND_CR }, + { 5, 21, NULL, NULL, PPC_OPERAND_CR }, /* The condition register number portion of the BI field in a B form or XL form instruction. 
This is used for the extended conditional branch mnemonics, which set the lower two bits of the BI field. This field is optional. */ #define CR BT + 1 - { 3, 18, 0, 0, PPC_OPERAND_CR | PPC_OPERAND_OPTIONAL }, + { 3, 18, NULL, NULL, PPC_OPERAND_CR | PPC_OPERAND_OPTIONAL }, /* The CRB field in an X form instruction. */ #define CRB CR + 1 - { 5, 6, 0, 0, 0 }, + { 5, 6, NULL, NULL, 0 }, /* The CRFD field in an X form instruction. */ #define CRFD CRB + 1 - { 3, 23, 0, 0, PPC_OPERAND_CR }, + { 3, 23, NULL, NULL, PPC_OPERAND_CR }, /* The CRFS field in an X form instruction. */ #define CRFS CRFD + 1 - { 3, 0, 0, 0, PPC_OPERAND_CR }, + { 3, 0, NULL, NULL, PPC_OPERAND_CR }, /* The CT field in an X form instruction. */ #define CT CRFS + 1 - { 5, 21, 0, 0, PPC_OPERAND_OPTIONAL }, + { 5, 21, NULL, NULL, PPC_OPERAND_OPTIONAL }, /* The D field in a D form instruction. This is a displacement off a register, and implies that the next operand is a register in parentheses. */ #define D CT + 1 - { 16, 0, 0, 0, PPC_OPERAND_PARENS | PPC_OPERAND_SIGNED }, + { 16, 0, NULL, NULL, PPC_OPERAND_PARENS | PPC_OPERAND_SIGNED }, /* The DE field in a DE form instruction. This is like D, but is 12 bits only. */ @@ -252,40 +253,40 @@ /* The E field in a wrteei instruction. */ #define E DS + 1 - { 1, 15, 0, 0, 0 }, + { 1, 15, NULL, NULL, 0 }, /* The FL1 field in a POWER SC form instruction. */ #define FL1 E + 1 - { 4, 12, 0, 0, 0 }, + { 4, 12, NULL, NULL, 0 }, /* The FL2 field in a POWER SC form instruction. */ #define FL2 FL1 + 1 - { 3, 2, 0, 0, 0 }, + { 3, 2, NULL, NULL, 0 }, /* The FLM field in an XFL form instruction. */ #define FLM FL2 + 1 - { 8, 17, 0, 0, 0 }, + { 8, 17, NULL, NULL, 0 }, /* The FRA field in an X or A form instruction. */ #define FRA FLM + 1 #define FRA_MASK (0x1f << 16) - { 5, 16, 0, 0, PPC_OPERAND_FPR }, + { 5, 16, NULL, NULL, PPC_OPERAND_FPR }, /* The FRB field in an X or A form instruction. 
*/ #define FRB FRA + 1 #define FRB_MASK (0x1f << 11) - { 5, 11, 0, 0, PPC_OPERAND_FPR }, + { 5, 11, NULL, NULL, PPC_OPERAND_FPR }, /* The FRC field in an A form instruction. */ #define FRC FRB + 1 #define FRC_MASK (0x1f << 6) - { 5, 6, 0, 0, PPC_OPERAND_FPR }, + { 5, 6, NULL, NULL, PPC_OPERAND_FPR }, /* The FRS field in an X form instruction or the FRT field in a D, X or A form instruction. */ #define FRS FRC + 1 #define FRT FRS - { 5, 21, 0, 0, PPC_OPERAND_FPR }, + { 5, 21, NULL, NULL, PPC_OPERAND_FPR }, /* The FXM field in an XFX instruction. */ #define FXM FRS + 1 @@ -298,11 +299,11 @@ /* The L field in a D or X form instruction. */ #define L FXM4 + 1 - { 1, 21, 0, 0, PPC_OPERAND_OPTIONAL }, + { 1, 21, NULL, NULL, PPC_OPERAND_OPTIONAL }, /* The LEV field in a POWER SC form instruction. */ #define LEV L + 1 - { 7, 5, 0, 0, 0 }, + { 7, 5, NULL, NULL, 0 }, /* The LI field in an I form instruction. The lower two bits are forced to zero. */ @@ -316,24 +317,24 @@ /* The LS field in an X (sync) form instruction. */ #define LS LIA + 1 - { 2, 21, 0, 0, PPC_OPERAND_OPTIONAL }, + { 2, 21, NULL, NULL, PPC_OPERAND_OPTIONAL }, /* The MB field in an M form instruction. */ #define MB LS + 1 #define MB_MASK (0x1f << 6) - { 5, 6, 0, 0, 0 }, + { 5, 6, NULL, NULL, 0 }, /* The ME field in an M form instruction. */ #define ME MB + 1 #define ME_MASK (0x1f << 1) - { 5, 1, 0, 0, 0 }, + { 5, 1, NULL, NULL, 0 }, /* The MB and ME fields in an M form instruction expressed a single operand which is a bitmask indicating which bits to select. This is a two operand form using PPC_OPERAND_NEXT. See the description in opcode/ppc.h for what this means. */ #define MBE ME + 1 - { 5, 6, 0, 0, PPC_OPERAND_OPTIONAL | PPC_OPERAND_NEXT }, + { 5, 6, NULL, NULL, PPC_OPERAND_OPTIONAL | PPC_OPERAND_NEXT }, { 32, 0, insert_mbe, extract_mbe, 0 }, /* The MB or ME field in an MD or MDS form instruction. The high @@ -345,7 +346,7 @@ /* The MO field in an mbar instruction. 
*/ #define MO MB6 + 1 - { 5, 21, 0, 0, 0 }, + { 5, 21, NULL, NULL, 0 }, /* The NB field in an X form instruction. The value 32 is stored as 0. */ @@ -361,34 +362,34 @@ /* The RA field in an D, DS, DQ, X, XO, M, or MDS form instruction. */ #define RA NSI + 1 #define RA_MASK (0x1f << 16) - { 5, 16, 0, 0, PPC_OPERAND_GPR }, + { 5, 16, NULL, NULL, PPC_OPERAND_GPR }, /* The RA field in the DQ form lq instruction, which has special value restrictions. */ #define RAQ RA + 1 - { 5, 16, insert_raq, 0, PPC_OPERAND_GPR }, + { 5, 16, insert_raq, NULL, PPC_OPERAND_GPR }, /* The RA field in a D or X form instruction which is an updating load, which means that the RA field may not be zero and may not equal the RT field. */ #define RAL RAQ + 1 - { 5, 16, insert_ral, 0, PPC_OPERAND_GPR }, + { 5, 16, insert_ral, NULL, PPC_OPERAND_GPR }, /* The RA field in an lmw instruction, which has special value restrictions. */ #define RAM RAL + 1 - { 5, 16, insert_ram, 0, PPC_OPERAND_GPR }, + { 5, 16, insert_ram, NULL, PPC_OPERAND_GPR }, /* The RA field in a D or X form instruction which is an updating store or an updating floating point load, which means that the RA field may not be zero. */ #define RAS RAM + 1 - { 5, 16, insert_ras, 0, PPC_OPERAND_GPR }, + { 5, 16, insert_ras, NULL, PPC_OPERAND_GPR }, /* The RB field in an X, XO, M, or MDS form instruction. */ #define RB RAS + 1 #define RB_MASK (0x1f << 11) - { 5, 11, 0, 0, PPC_OPERAND_GPR }, + { 5, 11, NULL, NULL, PPC_OPERAND_GPR }, /* The RB field in an X form instruction when it must be the same as the RS field in the instruction. This is used for extended @@ -402,22 +403,22 @@ #define RS RBS + 1 #define RT RS #define RT_MASK (0x1f << 21) - { 5, 21, 0, 0, PPC_OPERAND_GPR }, + { 5, 21, NULL, NULL, PPC_OPERAND_GPR }, /* The RS field of the DS form stq instruction, which has special value restrictions. 
*/ #define RSQ RS + 1 - { 5, 21, insert_rsq, 0, PPC_OPERAND_GPR }, + { 5, 21, insert_rsq, NULL, PPC_OPERAND_GPR }, /* The RT field of the DQ form lq instruction, which has special value restrictions. */ #define RTQ RSQ + 1 - { 5, 21, insert_rtq, 0, PPC_OPERAND_GPR }, + { 5, 21, insert_rtq, NULL, PPC_OPERAND_GPR }, /* The SH field in an X or M form instruction. */ #define SH RTQ + 1 #define SH_MASK (0x1f << 11) - { 5, 11, 0, 0, 0 }, + { 5, 11, NULL, NULL, 0 }, /* The SH field in an MD form instruction. This is split. */ #define SH6 SH + 1 @@ -426,12 +427,12 @@ /* The SI field in a D form instruction. */ #define SI SH6 + 1 - { 16, 0, 0, 0, PPC_OPERAND_SIGNED }, + { 16, 0, NULL, NULL, PPC_OPERAND_SIGNED }, /* The SI field in a D form instruction when we accept a wide range of positive values. */ #define SISIGNOPT SI + 1 - { 16, 0, 0, 0, PPC_OPERAND_SIGNED | PPC_OPERAND_SIGNOPT }, + { 16, 0, NULL, NULL, PPC_OPERAND_SIGNED | PPC_OPERAND_SIGNOPT }, /* The SPR field in an XFX form instruction. This is flipped--the lower 5 bits are stored in the upper 5 and vice- versa. */ @@ -443,25 +444,25 @@ /* The BAT index number in an XFX form m[ft]ibat[lu] instruction. */ #define SPRBAT SPR + 1 #define SPRBAT_MASK (0x3 << 17) - { 2, 17, 0, 0, 0 }, + { 2, 17, NULL, NULL, 0 }, /* The SPRG register number in an XFX form m[ft]sprg instruction. */ #define SPRG SPRBAT + 1 #define SPRG_MASK (0x3 << 16) - { 2, 16, 0, 0, 0 }, + { 2, 16, NULL, NULL, 0 }, /* The SR field in an X form instruction. */ #define SR SPRG + 1 - { 4, 16, 0, 0, 0 }, + { 4, 16, NULL, NULL, 0 }, /* The STRM field in an X AltiVec form instruction. */ #define STRM SR + 1 #define STRM_MASK (0x3 << 21) - { 2, 21, 0, 0, 0 }, + { 2, 21, NULL, NULL, 0 }, /* The SV field in a POWER SC form instruction. */ #define SV STRM + 1 - { 14, 2, 0, 0, 0 }, + { 14, 2, NULL, NULL, 0 }, /* The TBR field in an XFX form instruction. This is like the SPR field, but it is optional. 
*/ @@ -471,52 +472,52 @@ /* The TO field in a D or X form instruction. */ #define TO TBR + 1 #define TO_MASK (0x1f << 21) - { 5, 21, 0, 0, 0 }, + { 5, 21, NULL, NULL, 0 }, /* The U field in an X form instruction. */ #define U TO + 1 - { 4, 12, 0, 0, 0 }, + { 4, 12, NULL, NULL, 0 }, /* The UI field in a D form instruction. */ #define UI U + 1 - { 16, 0, 0, 0, 0 }, + { 16, 0, NULL, NULL, 0 }, /* The VA field in a VA, VX or VXR form instruction. */ #define VA UI + 1 #define VA_MASK (0x1f << 16) - { 5, 16, 0, 0, PPC_OPERAND_VR }, + { 5, 16, NULL, NULL, PPC_OPERAND_VR }, /* The VB field in a VA, VX or VXR form instruction. */ #define VB VA + 1 #define VB_MASK (0x1f << 11) - { 5, 11, 0, 0, PPC_OPERAND_VR }, + { 5, 11, NULL, NULL, PPC_OPERAND_VR }, /* The VC field in a VA form instruction. */ #define VC VB + 1 #define VC_MASK (0x1f << 6) - { 5, 6, 0, 0, PPC_OPERAND_VR }, + { 5, 6, NULL, NULL, PPC_OPERAND_VR }, /* The VD or VS field in a VA, VX, VXR or X form instruction. */ #define VD VC + 1 #define VS VD #define VD_MASK (0x1f << 21) - { 5, 21, 0, 0, PPC_OPERAND_VR }, + { 5, 21, NULL, NULL, PPC_OPERAND_VR }, /* The SIMM field in a VX form instruction. */ #define SIMM VD + 1 - { 5, 16, 0, 0, PPC_OPERAND_SIGNED}, + { 5, 16, NULL, NULL, PPC_OPERAND_SIGNED}, /* The UIMM field in a VX form instruction. */ #define UIMM SIMM + 1 - { 5, 16, 0, 0, 0 }, + { 5, 16, NULL, NULL, 0 }, /* The SHB field in a VA form instruction. */ #define SHB UIMM + 1 - { 4, 6, 0, 0, 0 }, + { 4, 6, NULL, NULL, 0 }, /* The other UIMM field in a EVX form instruction. */ #define EVUIMM SHB + 1 - { 5, 11, 0, 0, 0 }, + { 5, 11, NULL, NULL, 0 }, /* The other UIMM field in a half word EVX form instruction. */ #define EVUIMM_2 EVUIMM + 1 @@ -533,11 +534,11 @@ /* The WS field. 
*/ #define WS EVUIMM_8 + 1 #define WS_MASK (0x7 << 11) - { 3, 11, 0, 0, 0 }, + { 3, 11, NULL, NULL, 0 }, /* The L field in an mtmsrd instruction */ #define MTMSRD_L WS + 1 - { 1, 16, 0, 0, PPC_OPERAND_OPTIONAL }, + { 1, 16, NULL, NULL, PPC_OPERAND_OPTIONAL }, }; Index: working-2.6/arch/ppc64/xmon/start.c =================================================================== --- working-2.6.orig/arch/ppc64/xmon/start.c 2004-08-09 09:51:38.000000000 +1000 +++ working-2.6/arch/ppc64/xmon/start.c 2004-10-05 16:33:50.355028808 +1000 @@ -173,7 +173,7 @@ c = xmon_getchar(); if (c == -1) { if (p == str) - return 0; + return NULL; break; } *p++ = c; -- David Gibson | For every complex problem there is a david AT gibson.dropbear.id.au | solution which is simple, neat and | wrong. http://www.ozlabs.org/people/dgibson From grave at ipno.in2p3.fr Tue Oct 5 18:41:04 2004 From: grave at ipno.in2p3.fr (grave) Date: Tue, 05 Oct 2004 08:41:04 +0000 Subject: xserve and 2.6.9-rc3 and 2.6.9-rc3-bk4 Message-ID: <1096965664l.7230l.0l@ipnnarval> Hi, I've tryed both kernel and got crashes (see attached files). Do I missed a patch ? xavier PS:2.6.6 + smp patch run fine -------------- next part -------------- PCI: Probing PCI hardware done SCSI subsystem initialized usbcore: registered new driver usbfs usbcore: registered new driver hub nvram_init: Could not find nvram partition for nvram buffered error logging. 
rtasd: no RTAS on system devfs: 2004-01-31 Richard Gooch (rgooch at atnf.csiro.au) devfs: boot_options: 0x0 Oops: Machine check, sig: 0 [#1] SMP NR_CPUS=2 POWERMAC NIP: C00000000014A640 XER: 0000000000000000 LR: C00000000014A614 REGS: c000000001a17a50 TRAP: 0200 Not tainted (2.6.9-rc3-bk4) MSR: 9000000000101032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11 TASK: c000000001a110c0[1] 'swapper' THREAD: c000000001a14000 CPU: 0 GPR00: FFFFFFFFFFFFFFFF C000000001A17CD0 C00000000043C390 00000000000000FF GPR04: C00000000FEFB400 0000000000000010 C0000000002AAAA0 C000000000468298 GPR08: C000000000468268 E0000000828CD000 C00000000045AD5C 9000000000009032 GPR12: 0000000028000042 C000000000355780 0000000000000000 0000000000000000 GPR16: 0000000001400000 00000000016FB720 00000000016FB720 BFFFFFFFFEC00000 GPR20: 000000000023FD58 0000000000000000 0000000001A6A020 00000000016FB998 GPR24: 9000000000009032 0000000000000032 C00000000043F730 C000000000352D58 GPR28: C00000000043F728 0000000000000000 C0000000003D5718 C00000000043F730 NIP [c00000000014a640] .i8042_flush+0x6c/0x15c LR [c00000000014a614] .i8042_flush+0x40/0x15c Call Trace: [c000000001a17cd0] [c000000000355780] 0xc000000000355780 (unreliable) [c000000001a17d80] [c00000000014b240] .i8042_controller_init+0x1c/0x1e4 [c000000001a17e10] [c0000000002f4164] .i8042_init+0xe8/0x64c [c000000001a17ef0] [c00000000000c688] .init+0x234/0x440 [c000000001a17f90] [c0000000000172b8] .kernel_thread+0x4c/0x6c <0>Kernel panic - not syncing: Attempted to kill init! <0>Rebooting in 180 seconds.. -------------- next part -------------- PCI: Probing PCI hardware done SCSI subsystem initialized usbcore: registered new driver usbfs usbcore: registered new driver hub nvram_init: Could not find nvram partition for nvram buffered error logging. 
rtasd: no RTAS on system devfs: 2004-01-31 Richard Gooch (rgooch at atnf.csiro.au) devfs: boot_options: 0x0 Oops: Machine check, sig: 0 [#1] SMP NR_CPUS=2 POWERMAC NIP: C00000000014A244 XER: 0000000000000000 LR: C00000000014A218 REGS: c000000001a17a50 TRAP: 0200 Not tainted (2.6.9-rc3) MSR: 9000000000101032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11 TASK: c000000001a110c0[1] 'swapper' THREAD: c000000001a14000 CPU: 0 GPR00: FFFFFFFFFFFFFFFF C000000001A17CD0 C0000000004383A8 00000000000000FF GPR04: C00000000FEED3C0 0000000000000010 C0000000002A7A48 C000000000464298 GPR08: C000000000464268 E0000000828CD000 C000000000456D64 9000000000009032 GPR12: 0000000028000042 C000000000351780 0000000000000000 0000000000000000 GPR16: 0000000001400000 00000000016F8720 00000000016F8720 BFFFFFFFFEC00000 GPR20: 000000000023FD58 0000000000000000 0000000001A66020 00000000016F8998 GPR24: 9000000000009032 0000000000000032 C00000000043B730 C00000000034ED58 GPR28: C00000000043B728 0000000000000000 C0000000003D1728 C00000000043B730 NIP [c00000000014a244] .i8042_flush+0x6c/0x15c LR [c00000000014a218] .i8042_flush+0x40/0x15c Call Trace: [c000000001a17cd0] [c000000000351780] 0xc000000000351780 (unreliable) [c000000001a17d80] [c00000000014ae44] .i8042_controller_init+0x1c/0x1e4 [c000000001a17e10] [c0000000002f1164] .i8042_init+0xe8/0x64c [c000000001a17ef0] [c00000000000c688] .init+0x234/0x440 [c000000001a17f90] [c0000000000172b8] .kernel_thread+0x4c/0x6c <0>Kernel panic - not syncing: Attempted to kill init! <0>Rebooting in 180 seconds.. From benh at kernel.crashing.org Tue Oct 5 18:46:43 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 05 Oct 2004 18:46:43 +1000 Subject: xserve and 2.6.9-rc3 and 2.6.9-rc3-bk4 In-Reply-To: <1096965664l.7230l.0l@ipnnarval> References: <1096965664l.7230l.0l@ipnnarval> Message-ID: <1096966003.24535.48.camel@gaston> On Tue, 2004-10-05 at 18:41, grave wrote: > Hi, > > I've tryed both kernel and got crashes (see attached files). 
> > Do I missed a patch ? > > xavier > PS:2.6.6 + smp patch run fine That's your .config You have enabled the legacy x86 keyboard support ! :) Use a g5_defconfig I'm working on a fix so that this driver stops crashing though. Ben. From grave at ipno.in2p3.fr Tue Oct 5 19:25:20 2004 From: grave at ipno.in2p3.fr (grave) Date: Tue, 05 Oct 2004 09:25:20 +0000 Subject: Re: xserve and 2.6.9-rc3 and 2.6.9-rc3-bk4 In-Reply-To: <1096966003.24535.48.camel@gaston> (from benh@kernel.crashing.org on Tue Oct 5 10:46:43 2004) References: <1096965664l.7230l.0l@ipnnarval> <1096966003.24535.48.camel@gaston> Message-ID: <1096968320l.7230l.2l@ipnnarval> On 05.10.2004 10:46:43, Benjamin Herrenschmidt wrote: > On Tue, 2004-10-05 at 18:41, grave wrote: > > Hi, > > > > I've tryed both kernel and got crashes (see attached files). > > > > Do I missed a patch ? > > > > xavier > > PS:2.6.6 + smp patch run fine > > That's your .config > > You have enabled the legacy x86 keyboard support ! :) > > Use a g5_defconfig It works now... Thanks one more time ! From schwab at suse.de Tue Oct 5 19:50:44 2004 From: schwab at suse.de (Andreas Schwab) Date: Tue, 05 Oct 2004 11:50:44 +0200 Subject: Sound on G5 In-Reply-To: <1096939502.24584.6.camel@gaston> (Benjamin Herrenschmidt's message of "Tue, 05 Oct 2004 11:25:03 +1000") References: <1096939502.24584.6.camel@gaston> Message-ID: Benjamin Herrenschmidt writes: > So we could go that way and complete my "interpreter" or just hard code > all of the GPIOs we need in the driver hoping apple don't shuffle them > too much in upcoming models... I would be happy to test anything that is available. Thanks, Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
From paulus at samba.org Tue Oct 5 20:38:52 2004 From: paulus at samba.org (Paul Mackerras) Date: Tue, 5 Oct 2004 20:38:52 +1000 Subject: [PPC64] xmon sparse cleanups In-Reply-To: <20041005064255.GF3695@zax> References: <20041005064255.GF3695@zax> Message-ID: <16738.31164.464250.638432@cargo.ozlabs.ibm.com> David Gibson writes: > Andrew, please apply: > > This patch removes many sparse warnings from the xmon code. Mostly > K&R function declarations and 0-instead-of-NULLs. The trouble with this patch is that it makes ppc-opc.c diverge from the version in binutils, which is where it came from. I'd rather keep it as close as possible to that version. I have no problem with the changes to the other files. Paul. From igor at cs.wisc.edu Wed Oct 6 03:46:53 2004 From: igor at cs.wisc.edu (Igor Grobman) Date: Tue, 5 Oct 2004 12:46:53 -0500 (CDT) Subject: mapping memory in 0xb space In-Reply-To: <3337F539-14B0-11D9-AE7A-000A95A4DC02@kernel.crashing.org> References: <20040929014017.GC5470@zax> <20041001040325.GB12890@zax> <3337F539-14B0-11D9-AE7A-000A95A4DC02@kernel.crashing.org> Message-ID: On Sat, 2 Oct 2004, Segher Boessenkool wrote: > > A question for the rest of you, who haven't been following this thread. > > Is there publicly available documentation on the power4 extensions, > > specifically the large page support, how it effects the HPT hashing, > > and > > the SLB, including the new instructions for maintaining it in software? > > I haven't been able to find anything yet. > > http://www-106.ibm.com/developerworks/eserver/pdfs/archpub3.pdf > > has some info, don't know if that is enough for you -- nothing > much POWER4 specific in there, but large pages are part of the > architecture, so it does talk about the instructions to handle > them etc. Thanks, this is what I was looking for. 
-Igor From igor at cs.wisc.edu Wed Oct 6 03:45:47 2004 From: igor at cs.wisc.edu (Igor Grobman) Date: Tue, 5 Oct 2004 12:45:47 -0500 (CDT) Subject: mapping memory in 0xb space In-Reply-To: <20041001040325.GB12890@zax> References: <20040929014017.GC5470@zax> <20041001040325.GB12890@zax> Message-ID: One more followup on this issue, since I do have the base code working now. The problem was in the fact that do_slb_bolted code sets the large page bit in the SLB entry, but my code (and particularly hpte_insert code) did not insert a proper large page mapping. On Fri, 1 Oct 2004, David Gibson wrote: > On Wed, Sep 29, 2004 at 12:14:08AM -0500, Igor Grobman wrote: > > On Wed, 29 Sep 2004, David Gibson wrote: > > > > > On Tue, Sep 28, 2004 at 01:52:16PM -0500, Igor Grobman wrote: > > > > On Tue, 28 Sep 2004, David Gibson wrote: > > As for why I thought 0xbff would work, I reasoned that > > since the highest bits are masked out in get_kernel_vsid(), and since > > nobody else is using the 0xb region, it doesn't matter if I get a VSID > > that is the same as some other VSID in 0xb region. However, I did not > > consider the bug in do_slb_bolted that you describe below. > > Yes, with that bug the collision can be with a segment anywhere, not > just in the 0xb region. > I am not convinced anymore. The lower 36 bits of the ordinal are still the same in do_slb_bolted and get_kernel_vsid. Multiplying the ordinal by the 36-bit randomizer should produce the same lower 36 bits whether or not the upper bits are different. do_slb_bolted eventually clears the upper 28 bits, before using the VSID. I no longer think there can be a conflict outside the 0xb region. Is my reasoning correct? > > > You may have seen the comment in do_slb_bolted which claims to permit > > > a full 32-bits of ESID - it's wrong. 
The code doesn't mask the ESID > > > down to 13 bits as get_kernel_vsid() does, but it probably should - an > > > overlarge ESID will cause collisions with VSIDs from entirely > > > different address places, which would be a Bad Thing. > > > > This must be happening, although I would still like to know why it > > misbehaves even within the valid VSID range. > > > > > > > > Actually, you should be able to allow ESIDs of up to 21 bits there (36 > > > bit VSID - 15 bits of "context"). But you will need to make sure > > > get_kernel_vsid(), or whatever you're using to calculate the VAs for > > > the hash HPTEs is updated to match - at the moment I think it will > > > mask down to 13 bits. I'm not sure if that will get you sufficiently > > > close to 0xc0... for your purposes. > > Thanks, Igor From caveman at boxacle.net Wed Oct 6 04:24:25 2004 From: caveman at boxacle.net (CAVEMAN) Date: Tue, 5 Oct 2004 13:24:25 -0500 Subject: Sound on G5 In-Reply-To: <1096939502.24584.6.camel@gaston> References: <1096939502.24584.6.camel@gaston> Message-ID: <200410051324.25817@laptop> On Monday 04 October 2004 20:25, Benjamin Herrenschmidt wrote: > On Tue, 2004-10-05 at 06:51, Andreas Schwab wrote: > > Is anyone already working on sound support for the PowerMac G5, by > > chance? That's actually the only thing still missing. > > Nobody really seriously ATM. One of the main issue is that the darwin > driver abuses apple "do-platform-*" shit. It's a mecanism they invented > to put sort-of "scripts" (in binary form) in the device-tree that can > contains elementary ops such as write GPIOs, I2C, etc... > > This is extremely messy and difficult to parse. I have written the > basis for parsing them, but interpreting them is even more shitty as > the actual implementation of each ops sort-of depends on the target > object. > > It's really a piece-of-shit imho. 
> > So we could go that way and complete my "interpreter" or just hard code > all of the GPIOs we need in the driver hoping apple don't shuffle them > too much in upcoming models... I'd be willing to do some work and/or testing on this, where can I get the code? Regards, caveman From rmk+lkml at arm.linux.org.uk Wed Oct 6 17:26:59 2004 From: rmk+lkml at arm.linux.org.uk (Russell King) Date: Wed, 6 Oct 2004 08:26:59 +0100 Subject: [RFC][PATCH] Way for platforms to alter built-in serial ports In-Reply-To: <1096534248.32721.36.camel@gaston>; from benh@kernel.crashing.org on Thu, Sep 30, 2004 at 06:50:48PM +1000 References: <1096534248.32721.36.camel@gaston> Message-ID: <20041006082658.A18379@flint.arm.linux.org.uk> On Thu, Sep 30, 2004 at 06:50:48PM +1000, Benjamin Herrenschmidt wrote: > +#ifndef ARCH_HAS_GET_LEGACY_SERIAL_PORTS > static struct old_serial_port old_serial_port[] = { > SERIAL_PORT_DFNS /* defined in asm/serial.h */ > }; > - > +static inline struct old_serial_port *get_legacy_serial_ports(unsigned int *count) > +{ > + *count = ARRAY_SIZE(old_serial_port); > + return old_serial_port; > +} > #define UART_NR (ARRAY_SIZE(old_serial_port) + CONFIG_SERIAL_8250_NR_UARTS) > +#endif /* ARCH_HAS_GET_LEGACY_SERIAL_PORTS */ > + What happens if 8250.c is built as a module and ARCH_HAS_GET_LEGACY_SERIAL_PORTS is defined? 
> diff -urN linux-2.5/include/linux/serial.h linux-maple/include/linux/serial.h > --- linux-2.5/include/linux/serial.h 2004-09-30 18:31:55.867785437 +1000 > +++ linux-maple/include/linux/serial.h 2004-09-30 15:36:57.981697919 +1000 > @@ -14,6 +14,21 @@ > #include > > /* > + * Definition of a legacy serial port > + */ > +struct old_serial_port { > + unsigned int uart; > + unsigned int baud_base; > + unsigned int port; > + unsigned int irq; > + unsigned int flags; > + unsigned char hub6; > + unsigned char io_type; > + unsigned char *iomem_base; > + unsigned short iomem_reg_shift; > +}; > + > +/* > * Counters of the input lines (CTS, DSR, RI, CD) interrupts > */ serial.h is used by userspace programs. We should not expose this structure to those programs. Instead, maybe creating an 8250.h header, or even moving the existing 8250.h header ? -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core From benh at kernel.crashing.org Wed Oct 6 18:15:11 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 06 Oct 2004 18:15:11 +1000 Subject: [RFC][PATCH] Way for platforms to alter built-in serial ports In-Reply-To: <20041006082658.A18379@flint.arm.linux.org.uk> References: <1096534248.32721.36.camel@gaston> <20041006082658.A18379@flint.arm.linux.org.uk> Message-ID: <1097050508.21132.15.camel@gaston> On Wed, 2004-10-06 at 17:26, Russell King wrote: > On Thu, Sep 30, 2004 at 06:50:48PM +1000, Benjamin Herrenschmidt wrote: > > +#ifndef ARCH_HAS_GET_LEGACY_SERIAL_PORTS > > static struct old_serial_port old_serial_port[] = { > > SERIAL_PORT_DFNS /* defined in asm/serial.h */ > > }; > > - > > +static inline struct old_serial_port *get_legacy_serial_ports(unsigned int *count) > > +{ > > + *count = ARRAY_SIZE(old_serial_port); > > + return old_serial_port; > > +} > > #define UART_NR (ARRAY_SIZE(old_serial_port) + CONFIG_SERIAL_8250_NR_UARTS) > > +#endif /* 
ARCH_HAS_GET_LEGACY_SERIAL_PORTS */ > > + > > What happens if 8250.c is built as a module and > ARCH_HAS_GET_LEGACY_SERIAL_PORTS is defined? It will call get_legacy_serial_ports() which is hopefully exported by the arch code. > serial.h is used by userspace programs. We should not expose this > structure to those programs. Instead, maybe creating an 8250.h > header, or even moving the existing 8250.h header ? Hrm... ok. Or adding a #ifdef __KERNEL__ (sic !) :) I'll send you a new patch later today as I had to do another fix: we apparently tend to "force" register_console() even when we have nothing to register, because we set "ops" on all ports, even those that were never configured, and we test "ops" to decide whether to succeed or fail in the console setup() callback. Ben. From benh at kernel.crashing.org Wed Oct 6 19:07:44 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 06 Oct 2004 19:07:44 +1000 Subject: [RFC][PATCH] Way for platforms to alter built-in serial ports In-Reply-To: <20041006082658.A18379@flint.arm.linux.org.uk> References: <1096534248.32721.36.camel@gaston> <20041006082658.A18379@flint.arm.linux.org.uk> Message-ID: <1097053663.21132.56.camel@gaston> On Wed, 2004-10-06 at 17:26, Russell King wrote: > serial.h is used by userspace programs. We should not expose this > structure to those programs. Instead, maybe creating an 8250.h > header, or even moving the existing 8250.h header ? Here's a new version of that patch that moves 8250.h to include/linux, moves the definition of struct old_serial_port there, and also corrects the problem I told you about with serial console. Let me know if I can send it to Andrew... Ben.
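For readers following the thread, here is a minimal userspace sketch of the hook this patch introduces. It is not kernel code and not part of the patch: the struct is trimmed to three fields, platform_ports[] and its values are made up for illustration, and init_ports() is a hypothetical stand-in for the port setup loop in 8250.c. Only the signature of get_legacy_serial_ports() and the "baud_base * 16" clock computation are taken from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed copy of struct old_serial_port: only the fields used below. */
struct old_serial_port {
	unsigned int baud_base;
	unsigned int port;
	unsigned int irq;
};

/* Hypothetical platform table; the values are illustrative only. */
static struct old_serial_port platform_ports[] = {
	{ .baud_base = 115200, .port = 0x3f8, .irq = 4 },
	{ .baud_base = 115200, .port = 0x2f8, .irq = 3 },
};

/*
 * What an architecture defining ARCH_HAS_GET_LEGACY_SERIAL_PORTS might
 * provide: same signature as the default inline version in the patch.
 */
struct old_serial_port *get_legacy_serial_ports(unsigned int *count)
{
	assert(count != NULL);
	*count = sizeof(platform_ports) / sizeof(platform_ports[0]);
	return platform_ports;
}

/*
 * Stand-in for the consumer loop: asks the hook for the table, bails
 * out if there is none, and derives the UART clock for each port.
 * Returns the number of entries written to uartclk[].
 */
unsigned int init_ports(unsigned int *uartclk, unsigned int max)
{
	unsigned int count, i;
	struct old_serial_port *ports = get_legacy_serial_ports(&count);

	if (ports == NULL)
		return 0;
	for (i = 0; i < count && i < max; i++)
		uartclk[i] = ports[i].baud_base * 16;
	return i;
}
```

The point of the indirection is that the driver no longer needs the size of a compile-time array: a platform can build its table at boot (from the device tree, say) and hand back both the pointer and the count. In a real modular build the arch function would also have to be exported, which is the concern raised earlier in the thread.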
diff -urN linux-2.5/drivers/serial/8250.c linux-maple/drivers/serial/8250.c --- linux-2.5/drivers/serial/8250.c 2004-09-30 18:31:42.000000000 +1000 +++ linux-maple/drivers/serial/8250.c 2004-10-06 19:05:13.042342513 +1000 @@ -41,7 +41,7 @@ #endif #include -#include "8250.h" +#include /* * Configuration: @@ -112,11 +112,18 @@ #define SERIAL_PORT_DFNS #endif +#ifndef ARCH_HAS_GET_LEGACY_SERIAL_PORTS static struct old_serial_port old_serial_port[] = { SERIAL_PORT_DFNS /* defined in asm/serial.h */ }; - +static inline struct old_serial_port *get_legacy_serial_ports(unsigned int *count) +{ + *count = ARRAY_SIZE(old_serial_port); + return old_serial_port; +} #define UART_NR (ARRAY_SIZE(old_serial_port) + CONFIG_SERIAL_8250_NR_UARTS) +#endif /* ARCH_HAS_GET_LEGACY_SERIAL_PORTS */ + #ifdef CONFIG_SERIAL_8250_RSA @@ -1839,22 +1846,28 @@ { struct uart_8250_port *up; static int first = 1; + struct old_serial_port *old_ports; + unsigned int count; int i; if (!first) return; first = 0; - for (i = 0, up = serial8250_ports; i < ARRAY_SIZE(old_serial_port); + old_ports = get_legacy_serial_ports(&count); + if (old_ports == NULL) + return; + + for (i = 0, up = serial8250_ports; i < count; i++, up++) { - up->port.iobase = old_serial_port[i].port; - up->port.irq = irq_canonicalize(old_serial_port[i].irq); - up->port.uartclk = old_serial_port[i].baud_base * 16; - up->port.flags = old_serial_port[i].flags; - up->port.hub6 = old_serial_port[i].hub6; - up->port.membase = old_serial_port[i].iomem_base; - up->port.iotype = old_serial_port[i].io_type; - up->port.regshift = old_serial_port[i].iomem_reg_shift; + up->port.iobase = old_ports[i].port; + up->port.irq = irq_canonicalize(old_ports[i].irq); + up->port.uartclk = old_ports[i].baud_base * 16; + up->port.flags = old_ports[i].flags; + up->port.hub6 = old_ports[i].hub6; + up->port.membase = old_ports[i].iomem_base; + up->port.iotype = old_ports[i].io_type; + up->port.regshift = old_ports[i].iomem_reg_shift; up->port.ops = &serial8250_pops; if
(share_irqs) up->port.flags |= UPF_SHARE_IRQ; @@ -1870,6 +1883,9 @@ for (i = 0; i < UART_NR; i++) { struct uart_8250_port *up = &serial8250_ports[i]; + if (!up->port.iobase) + continue; + up->port.line = i; up->port.ops = &serial8250_pops; init_timer(&up->timer); diff -urN linux-2.5/drivers/serial/8250.h linux-maple/drivers/serial/8250.h --- linux-2.5/drivers/serial/8250.h 2004-09-30 18:31:42.000000000 +1000 +++ /dev/null 2004-10-05 22:10:47.391719208 +1000 @@ -1,71 +0,0 @@ -/* - * linux/drivers/char/8250.h - * - * Driver for 8250/16550-type serial ports - * - * Based on drivers/char/serial.c, by Linus Torvalds, Theodore Ts'o. - * - * Copyright (C) 2001 Russell King. - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * $Id: 8250.h,v 1.8 2002/07/21 21:32:30 rmk Exp $ - */ - -#include - -void serial8250_get_irq_map(unsigned int *map); -void serial8250_suspend_port(int line); -void serial8250_resume_port(int line); - -struct old_serial_port { - unsigned int uart; - unsigned int baud_base; - unsigned int port; - unsigned int irq; - unsigned int flags; - unsigned char hub6; - unsigned char io_type; - unsigned char *iomem_base; - unsigned short iomem_reg_shift; -}; - -/* - * This replaces serial_uart_config in include/linux/serial.h - */ -struct serial8250_config { - const char *name; - unsigned int fifo_size; - unsigned int tx_loadsz; - unsigned int flags; -}; - -#define UART_CAP_FIFO (1 << 8) /* UART has FIFO */ -#define UART_CAP_EFR (1 << 9) /* UART has EFR */ -#define UART_CAP_SLEEP (1 << 10) /* UART has IER sleep */ - -#undef SERIAL_DEBUG_PCI - -#if defined(__i386__) && (defined(CONFIG_M386) || defined(CONFIG_M486)) -#define SERIAL_INLINE -#endif - -#ifdef SERIAL_INLINE -#define _INLINE_ inline -#else -#define _INLINE_ -#endif - -#define PROBE_RSA (1 << 
0) -#define PROBE_ANY (~0) - -#define HIGH_BITS_OFFSET ((sizeof(long)-sizeof(int))*8) - -#ifdef CONFIG_SERIAL_8250_SHARE_IRQ -#define SERIAL8250_SHARE_IRQS 1 -#else -#define SERIAL8250_SHARE_IRQS 0 -#endif diff -urN linux-2.5/drivers/serial/8250_pci.c linux-maple/drivers/serial/8250_pci.c --- linux-2.5/drivers/serial/8250_pci.c 2004-09-30 18:31:42.000000000 +1000 +++ linux-maple/drivers/serial/8250_pci.c 2004-10-06 19:05:41.301674308 +1000 @@ -25,13 +25,12 @@ #include #include #include +#include #include #include #include -#include "8250.h" - /* * Definitions for PCI support. */ diff -urN linux-2.5/drivers/serial/8250_pnp.c linux-maple/drivers/serial/8250_pnp.c --- linux-2.5/drivers/serial/8250_pnp.c 2004-09-30 18:31:42.000000000 +1000 +++ linux-maple/drivers/serial/8250_pnp.c 2004-10-06 19:05:55.788749883 +1000 @@ -25,13 +25,12 @@ #include #include #include +#include #include #include #include -#include "8250.h" - #define UNKNOWN_DEV 0x3000 diff -urN linux-2.5/drivers/serial/au1x00_uart.c linux-maple/drivers/serial/au1x00_uart.c --- linux-2.5/drivers/serial/au1x00_uart.c 2004-09-30 18:31:42.000000000 +1000 +++ linux-maple/drivers/serial/au1x00_uart.c 2004-10-06 19:07:39.461032916 +1000 @@ -40,7 +40,7 @@ #endif #include -#include "8250.h" +#include /* * Debugging. 
diff -urN linux-2.5/drivers/serial/serial_cs.c linux-maple/drivers/serial/serial_cs.c --- linux-2.5/drivers/serial/serial_cs.c 2004-09-30 18:31:42.000000000 +1000 +++ linux-maple/drivers/serial/serial_cs.c 2004-10-06 19:07:35.059700476 +1000 @@ -44,6 +44,7 @@ #include #include #include +#include #include #include @@ -55,8 +56,6 @@ #include #include -#include "8250.h" - #ifdef PCMCIA_DEBUG static int pc_debug = PCMCIA_DEBUG; MODULE_PARM(pc_debug, "i"); diff -urN linux-2.5/include/linux/8250.h linux-maple/include/linux/8250.h --- /dev/null 2004-10-05 22:10:47.391719208 +1000 +++ linux-maple/include/linux/8250.h 2004-10-06 19:06:45.680713598 +1000 @@ -0,0 +1,74 @@ +/* + * linux/drivers/char/8250.h + * + * Driver for 8250/16550-type serial ports + * + * Based on drivers/char/serial.c, by Linus Torvalds, Theodore Ts'o. + * + * Copyright (C) 2001 Russell King. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + * $Id: 8250.h,v 1.8 2002/07/21 21:32:30 rmk Exp $ + */ + +#include + +void serial8250_get_irq_map(unsigned int *map); +void serial8250_suspend_port(int line); +void serial8250_resume_port(int line); + +/* + * This replaces serial_uart_config in include/linux/serial.h + */ +struct serial8250_config { + const char *name; + unsigned int fifo_size; + unsigned int tx_loadsz; + unsigned int flags; +}; + +#define UART_CAP_FIFO (1 << 8) /* UART has FIFO */ +#define UART_CAP_EFR (1 << 9) /* UART has EFR */ +#define UART_CAP_SLEEP (1 << 10) /* UART has IER sleep */ + +/* + * Definition of a legacy serial port + */ +struct old_serial_port { + unsigned int uart; + unsigned int baud_base; + unsigned int port; + unsigned int irq; + unsigned int flags; + unsigned char hub6; + unsigned char io_type; + unsigned char *iomem_base; + unsigned short iomem_reg_shift; +}; + +#undef SERIAL_DEBUG_PCI + +#if defined(__i386__) && (defined(CONFIG_M386) || defined(CONFIG_M486)) +#define SERIAL_INLINE +#endif + +#ifdef SERIAL_INLINE +#define _INLINE_ inline +#else +#define _INLINE_ +#endif + +#define PROBE_RSA (1 << 0) +#define PROBE_ANY (~0) + +#define HIGH_BITS_OFFSET ((sizeof(long)-sizeof(int))*8) + +#ifdef CONFIG_SERIAL_8250_SHARE_IRQ +#define SERIAL8250_SHARE_IRQS 1 +#else +#define SERIAL8250_SHARE_IRQS 0 +#endif From clmason at gmail.com Thu Oct 7 00:58:56 2004 From: clmason at gmail.com (Chris L. Mason) Date: Wed, 6 Oct 2004 11:58:56 -0300 Subject: iMac G5 available for testing Message-ID: <610e346604100607581144298e@mail.gmail.com> Hi all, I have a new iMac G5/1.8 GHz/17-inch system that I would like to make available for testing/debugging. If you have anything you would like me to try booting, checking in open firmware, etc., let me know. 
Thanks, Chris From benh at kernel.crashing.org Thu Oct 7 08:36:09 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 07 Oct 2004 08:36:09 +1000 Subject: iMac G5 available for testing In-Reply-To: <610e346604100607581144298e@mail.gmail.com> References: <610e346604100607581144298e@mail.gmail.com> Message-ID: <1097102169.8448.14.camel@gaston> On Thu, 2004-10-07 at 00:58, Chris L. Mason wrote: > Hi all, > > I have a new iMac G5/1.8 GHz/17-inch system that I would like to make > available for testing/debugging. If you have anything you would like > me to try booting, checking in open firmware, etc., let me know. We have ordered one here. It will require some reverse engineering work since it's a new rev of the chipset and the good old PMU chip was finally, years later, replaced by a new "SMU" that is totally undocumented of course... Ben. From clmason at gmail.com Thu Oct 7 09:12:04 2004 From: clmason at gmail.com (Chris L. Mason) Date: Wed, 6 Oct 2004 20:12:04 -0300 Subject: iMac G5 available for testing In-Reply-To: <1097102169.8448.14.camel@gaston> References: <610e346604100607581144298e@mail.gmail.com> <1097102169.8448.14.camel@gaston> Message-ID: <610e34660410061612379af1c8@mail.gmail.com> On Thu, 07 Oct 2004 08:36:09 +1000, Benjamin Herrenschmidt wrote: > > > On Thu, 2004-10-07 at 00:58, Chris L. Mason wrote: > > Hi all, > > > > I have a new iMac G5/1.8 GHz/17-inch system that I would like to make > > available for testing/debugging. If you have anything you would like > > me to try booting, checking in open firmware, etc., let me know. > > We have ordered one here. It will require some reverse engineering work > since it's a new rev of the chipset and the good old PMU chip was finally, > years later, replaced by a new "SMU" that is totally undocumented of course... > Ah, wonderful. 
:) The good news is that with tgall's latest debug kernel, I do at least get to boot as far as the ATA drive detection before it freezes, although it gets a kernel error too right after the tux logo. Here's an image of my boot attempt: http://homepage.mac.com/clmason/imacboot.jpg (Sorry for the bad quality of the image) Segher also told me how to use the romgrabber. I have a copy up at: http://homepage.mac.com/clmason/OF-5.2.2f1-2004-08-18 Chris From david at gibson.dropbear.id.au Thu Oct 7 11:01:54 2004 From: david at gibson.dropbear.id.au (David Gibson) Date: Thu, 7 Oct 2004 11:01:54 +1000 Subject: mapping memory in 0xb space In-Reply-To: References: <20040929014017.GC5470@zax> <20041001040325.GB12890@zax> Message-ID: <20041007010154.GC25012@zax> On Tue, Oct 05, 2004 at 12:45:47PM -0500, Igor Grobman wrote: > One more followup on this issue, since I do have the base code working > now. The problem was in the fact that do_slb_bolted code sets the large > page bit in the SLB entry, but my code (and particularly hpte_insert code) > did not insert a proper large page mapping. > > > On Fri, 1 Oct 2004, David Gibson wrote: > > On Wed, Sep 29, 2004 at 12:14:08AM -0500, Igor Grobman wrote: > > > On Wed, 29 Sep 2004, David Gibson wrote: > > > > > > > On Tue, Sep 28, 2004 at 01:52:16PM -0500, Igor Grobman wrote: > > > > > On Tue, 28 Sep 2004, David Gibson wrote: > > > As for why I thought 0xbff would work, I reasoned that > > > since the highest bits are masked out in get_kernel_vsid(), and since > > > nobody else is using the 0xb region, it doesn't matter if I get a VSID > > > that is the same as some other VSID in 0xb region. However, I did not > > > consider the bug in do_slb_bolted that you describe below. > > > > Yes, with that bug the collision can be with a segment anywhere, not > > just in the 0xb region. > > > > I am not convinced anymore. The lower 36 bits of the ordinal are still > the same in do_slb_bolted and get_kernel_vsid.
Multiplying the ordinal > by the 36-bit randomizer should produce the same lower 36 bits whether or > not the upper bits are different. do_slb_bolted eventually clears the > upper 28 bits, before using the VSID. I no longer think there can be > a conflict outside the 0xb region. Is my reasoning correct? Ah, yes, I think it is. Sorry, I guess I wasn't thinking very clearly when I decided the collisions could be anywhere. -- David Gibson | For every complex problem there is a david AT gibson.dropbear.id.au | solution which is simple, neat and | wrong. http://www.ozlabs.org/people/dgibson From benh at kernel.crashing.org Thu Oct 7 18:30:09 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 07 Oct 2004 18:30:09 +1000 Subject: Gothic horrors in pci_dn.c Message-ID: <1097137808.4894.73.camel@gaston> Hi ! To all those who had to deal with the guts of the PCI layer on ppc64, I'd like your comments on the following, and on what you think I might break. Currently, the code in pci_dn.c does 2 things articulated around a single function. That function is traverse_pci_devices() and is supposed to traverse the PCI tree exposed by Open Firmware and call back a function passed as an argument for each node in the tree. The 2 things it does are - Setting up the "devfn" and "busno" fields of the device nodes in the tree in an initial traversal pass at boot - "Finding" a device node for a given pci_dev at any time However, the current code makes a number of assumptions and is bogus in many cases. Among the issues are: - The tree traversal goes all the way down the tree only skipping things that don't have a "class code". That means potentially walking on subtrees of a PCI device that aren't PCI themselves (USB ? FireWire ?) and we have no guarantee that those busses have no "class-code" property, though we are sure to misinterpret anything we find here.
- We try to manipulate host bridge nodes as if they were PCI devices, which leads us to various funny and totally bogus special cases. First, in update_dn_pci_info(), where we have an "interesting" (at least) heuristic to find out if a node is a host bridge or not, with a horrible special case for avoiding setting the devfn 0 on U3 on blades, and then we "use" those devfn and busno of the host bridge property in is_devfn_node() later on when trying to match which is why we have to do the above bogus workaround. - Our firmware (and Apple's too in some cases) is broken in the sense that it doesn't show the host bridge in the tree as a PCI device. Host bridges that are themselves visible as devices on their own PCI bus should have an additional node in the PCI domain named "host" that represents them. The solution to this however is very simple, but I need to make sure I won't break anything else by doing so. It's based on a few facts: - The "node" of the host bridge is _NOT_ a PCI node, and thus should not be traversed by traverse_pci_devices(). This is very easy to do without any assumption due to the way this function works, just remove 2 lines near the beginning before the for loop. - The result for the update_dn_pci_info() pass is that we can rip out the workaround completely. busno and devfn in the host bridge node are undefined and that's how they should be as they won't be traversed. There is no "driver" for the host bridge that should make use of them. - Same thing with is_devfn_node(). - We initialize "sysdata" of all pci_dev to point to the host bridge by default. So if the host bridge happens to have an associated pci_dev, and no "specific" node (as explained above), then we'll point to the root node of that pci tree which is exactly what we want, cool !
- Now the only remaining problem is the test if (dn->devfn == dev->devfn && dn->busno == (dev->bus->number&0xff)) which will give an incorrect result if the host bridge has undefined (and typically 0) values in devfn and busno fields and the device we are looking for happens to really be 0:00.0. This is fixed by forcing those fields on all PHB nodes to -1. (No special U3 case, all of them). Here's a patch (untested, it's getting late here) implementing those; I need to know if it will work at all. Comments welcome :) Note to Anton & Milton: Pretty much nothing relies anymore on the device nodes for PCI devices to exist. The only mandatory ones are PHBs, but you can easily statically lay them out in a static device-tree blob for BM. By default, all pci_dev point to the PHB. I have a couple of fixes coming in for u3_iommu to properly set up iommu_table for PHB nodes (it forgot to do it) and I confirm it works with no OF nodes for the devices themselves. Config space accesses never need the OF node either except when you have RTAS, but then you don't care since you have real nodes for everything. I added a simple helper to my tree (will be pushed after 2.6.9) that gives you the pci_controller* from the pci_dev* without doing a full device-tree walk, and I use that for pmac & maple. You should do the same for PM. Ben. ===== arch/ppc64/kernel/pci_dn.c 1.18 vs edited ===== --- 1.18/arch/ppc64/kernel/pci_dn.c 2004-10-05 17:24:47 +10:00 +++ edited/arch/ppc64/kernel/pci_dn.c 2004-10-07 18:35:41 +10:00 @@ -46,28 +46,13 @@ { struct pci_controller *phb = data; u32 *regs; - char *device_type = get_property(dn, "device_type", NULL); - char *model; dn->phb = phb; - if (device_type && (strcmp(device_type, "pci") == 0) && - (get_property(dn, "class-code", NULL) == 0)) { - /* special case for PHB's. Sigh.
*/ - regs = (u32 *)get_property(dn, "bus-range", NULL); - dn->busno = regs[0]; - - model = (char *)get_property(dn, "model", NULL); - if (model && strstr(model, "U3")) - dn->devfn = -1; - else - dn->devfn = 0; /* assumption */ - } else { - regs = (u32 *)get_property(dn, "reg", NULL); - if (regs) { - /* First register entry is addr (00BBSS00) */ - dn->busno = (regs[0] >> 16) & 0xff; - dn->devfn = (regs[0] >> 8) & 0xff; - } + regs = (u32 *)get_property(dn, "reg", NULL); + if (regs) { + /* First register entry is addr (00BBSS00) */ + dn->busno = (regs[0] >> 16) & 0xff; + dn->devfn = (regs[0] >> 8) & 0xff; } return NULL; } @@ -96,20 +81,25 @@ struct device_node *dn, *nextdn; void *ret; - if (pre && ((ret = pre(start, data)) != NULL)) - return ret; + /* We started with a phb, iterate all childs */ for (dn = start->child; dn; dn = nextdn) { + u32 *classp, class; + nextdn = NULL; - if (get_property(dn, "class-code", NULL)) { - if (pre && ((ret = pre(dn, data)) != NULL)) - return ret; - if (dn->child) - /* Depth first...do children */ - nextdn = dn->child; - else if (dn->sibling) - /* ok, try next sibling instead. */ - nextdn = dn->sibling; - } + classp = (u32 *)get_property(dn, "class-code", NULL); + class = classp ? *classp : 0; + + if (pre && ((ret = pre(dn, data)) != NULL)) + return ret; + + /* If we are a PCI bridge, go down */ + if (dn->child && (class >> 8) == PCI_CLASS_BRIDGE_PCI && + (class >> 8) == PCI_CLASS_BRIDGE_CARDBUS) + /* Depth first...do children */ + nextdn = dn->child; + else if (dn->sibling) + /* ok, try next sibling instead. */ + nextdn = dn->sibling; if (!nextdn) { /* Walk up to next valid sibling. */ do { @@ -123,21 +113,6 @@ return NULL; } -/* - * Same as traverse_pci_devices except this does it for all phbs. 
- */ -static void *traverse_all_pci_devices(traverse_func pre) -{ - struct pci_controller *phb, *tmp; - void *ret; - - list_for_each_entry_safe(phb, tmp, &hose_list, list_node) - if ((ret = traverse_pci_devices(phb->arch_data, pre, phb)) - != NULL) - return ret; - return NULL; -} - /* * Traversal func that looks for a value. @@ -147,6 +122,7 @@ { int busno = ((unsigned long)data >> 8) & 0xff; int devfn = ((unsigned long)data) & 0xff; + return ((devfn == dn->devfn) && (busno == dn->busno)) ? dn : NULL; } @@ -173,10 +149,8 @@ phb_dn = phb->arch_data; dn = traverse_pci_devices(phb_dn, is_devfn_node, (void *)searchval); - if (dn) { + if (dn) dev->sysdata = dn; - /* ToDo: call some device init hook here */ - } return dn; } EXPORT_SYMBOL(fetch_dev_dn); @@ -188,8 +162,16 @@ */ void __init pci_devs_phb_init(void) { + struct pci_controller *phb, *tmp; + /* This must be done first so the device nodes have valid pci info! */ - traverse_all_pci_devices(update_dn_pci_info); + list_for_each_entry_safe(phb, tmp, &hose_list, list_node) { + struct device_node * dn = (struct device_node *) phb->arch_data; + /* PHB nodes themselves must not match */ + dn->devfn = dn->busno = -1; + dn->phb = phb; + traverse_pci_devices(phb->arch_data, update_dn_pci_info, phb); + } } From hollisb at us.ibm.com Thu Oct 7 20:40:27 2004 From: hollisb at us.ibm.com (Hollis Blanchard) Date: Thu, 7 Oct 2004 10:40:27 +0000 Subject: [patch] HVSI udbg Message-ID: <200410071040.27907.hollisb@us.ibm.com> This fixes a long-standing omission in HVSI support: dropping to xmon would basically hang your system, as there was no udbg code to read/write chars from xmon. It's based on the existing "LP" routines. Could we get this pushed upstream soon? 
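
[A rough userspace sketch of the read side, going only by what the patch below does: it treats a first byte of 0xff as a data packet, drops a 4-byte header, and hands out the payload one character at a time. The names hvsi_rx, hvsi_rx_feed() and hvsi_rx_getc() are made up for illustration and are not in the patch; the real code gets its packets from plpar_get_term_char().]

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HVSI_DATA_PACKET 0xff /* type byte the patch's putc sends and its getc checks */
#define HVSI_HEADER_LEN  4    /* bytes the patch strips before the payload */

/* Hypothetical receive buffer holding payload bytes only. */
struct hvsi_rx {
	uint8_t buf[256];
	size_t len;
};

/*
 * Take one raw packet and keep its payload if it is a data packet.
 * Returns the number of payload bytes stored (0 if the packet is dropped).
 */
static size_t hvsi_rx_feed(struct hvsi_rx *rx, const uint8_t *pkt, size_t pkt_len)
{
	size_t payload;

	if (pkt_len <= HVSI_HEADER_LEN || pkt[0] != HVSI_DATA_PACKET)
		return 0; /* short read or non-data packet */

	payload = pkt_len - HVSI_HEADER_LEN;
	if (payload > sizeof(rx->buf) - rx->len)
		return 0; /* would overflow the buffer, so drop it */

	memcpy(rx->buf + rx->len, pkt + HVSI_HEADER_LEN, payload);
	rx->len += payload;
	return payload;
}

/* Return the next buffered character, or -1 when nothing is ready. */
static int hvsi_rx_getc(struct hvsi_rx *rx)
{
	int ch;

	if (rx->len == 0)
		return -1;

	ch = rx->buf[0];
	/* shift the remaining data down, as the patch does with a loop */
	memmove(rx->buf, rx->buf + 1, rx->len - 1);
	rx->len--;
	return ch;
}
```

[Returning -1 for "no data" keeps the poll variant non-blocking; a blocking getc can then simply retry until it gets a character, which is how the patch layers udbg_hvsi_getc() on top of udbg_hvsi_getc_poll().]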
-- Hollis Blanchard IBM Linux Technology Center ===== arch/ppc64/kernel/pSeries_lpar.c 1.41 vs edited ===== --- 1.41/arch/ppc64/kernel/pSeries_lpar.c Tue Sep 21 23:40:30 2004 +++ edited/arch/ppc64/kernel/pSeries_lpar.c Thu Oct 7 10:52:23 2004 @@ -59,6 +59,74 @@ int vtermno; /* virtual terminal# for udbg */ +#define __ALIGNED__ __attribute__((__aligned__(sizeof(long)))) +static void udbg_hvsi_putc(unsigned char c) +{ + /* packet's seqno isn't used anyways */ + uint8_t packet[] __ALIGNED__ = { 0xff, 5, 0, 0, c }; + int rc; + + if (c == '\n') + udbg_hvsi_putc('\r'); + + do { + rc = plpar_put_term_char(vtermno, sizeof(packet), packet); + } while (rc == H_Busy); +} + +static long hvsi_udbg_buf_len; +static uint8_t hvsi_udbg_buf[256]; + +static int udbg_hvsi_getc_poll(void) +{ + unsigned char ch; + int rc, i; + + if (hvsi_udbg_buf_len == 0) { + rc = plpar_get_term_char(vtermno, &hvsi_udbg_buf_len, hvsi_udbg_buf); + if (rc != H_Success || hvsi_udbg_buf[0] != 0xff) { + /* bad read or non-data packet */ + hvsi_udbg_buf_len = 0; + } else { + /* remove the packet header */ + for (i = 4; i < hvsi_udbg_buf_len; i++) + hvsi_udbg_buf[i-4] = hvsi_udbg_buf[i]; + hvsi_udbg_buf_len -= 4; + } + } + + if (hvsi_udbg_buf_len <= 0 || hvsi_udbg_buf_len > 256) { + /* no data ready */ + hvsi_udbg_buf_len = 0; + return -1; + } + + ch = hvsi_udbg_buf[0]; + /* shift remaining data down */ + for (i = 1; i < hvsi_udbg_buf_len; i++) { + hvsi_udbg_buf[i-1] = hvsi_udbg_buf[i]; + } + hvsi_udbg_buf_len--; + + return ch; +} + +static unsigned char udbg_hvsi_getc(void) +{ + int ch; + for (;;) { + ch = udbg_hvsi_getc_poll(); + if (ch == -1) { + /* This shouldn't be needed...but... 
*/ + volatile unsigned long delay; + for (delay=0; delay < 2000000; delay++) + ; + } else { + return ch; + } + } +} + static void udbg_putcLP(unsigned char c) { char buf[16]; @@ -167,11 +235,15 @@ ppc_md.udbg_getc_poll = udbg_getc_pollLP; found = 1; } - } else { - /* XXX implement udbg_putcLP_vtty for hvterm-protocol1 case */ - printk(KERN_WARNING "%s doesn't speak hvterm1; " - "can't print udbg messages\n", - stdout_node->full_name); + } else if (device_is_compatible(stdout_node, "hvterm-protocol")) { + termno = (u32 *)get_property(stdout_node, "reg", NULL); + if (termno) { + vtermno = termno[0]; + ppc_md.udbg_putc = udbg_hvsi_putc; + ppc_md.udbg_getc = udbg_hvsi_getc; + ppc_md.udbg_getc_poll = udbg_hvsi_getc_poll; + found = 1; + } } } else if (strncmp(name, "serial", 6)) { /* XXX fix ISA serial console */ From johnrose at austin.ibm.com Fri Oct 8 03:54:21 2004 From: johnrose at austin.ibm.com (John Rose) Date: Thu, 07 Oct 2004 12:54:21 -0500 Subject: [PATCH] create iommu_free_table() Message-ID: <1097171661.7087.1.camel@sinatra.austin.ibm.com> The patch below creates iommu_free_table(). Iommu tables are not currently freed in PPC64. This could cause a memory leak for DLPAR of an EADS slot. The function verifies that there are no outstanding TCE entries for the range of the table before freeing it. I added a call to iommu_free_table() to the code that dynamically removes a device node. This should be fairly symmetrical with the table allocation, which happens during dynamic addition of a device node. Comments welcome. 
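
[A minimal sketch of the "no outstanding entries" check, written as plain userspace C rather than kernel code. It assumes the allocation map is an array of 64-bit words with one bit per entry and bit i at position i % 64 of word i / 64. The helper names bitmap_is_empty() and bitmap_bytes() are hypothetical. Unlike a loop bounded by mapsize/64, it also looks at the trailing bits when the entry count is not a multiple of 64; whether that can happen for a real it_mapsize is not something stated here.]

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 64

/* Return true when none of the first nbits bits of map are set. */
static bool bitmap_is_empty(const uint64_t *map, size_t nbits)
{
	size_t full_words = nbits / BITS_PER_WORD;
	size_t tail_bits = nbits % BITS_PER_WORD;
	size_t i;

	/* whole words: any nonzero word means an entry is still in use */
	for (i = 0; i < full_words; i++)
		if (map[i] != 0)
			return false;

	/* partial last word: only the low tail_bits bits belong to the map */
	if (tail_bits) {
		uint64_t mask = (UINT64_C(1) << tail_bits) - 1;

		if (map[full_words] & mask)
			return false;
	}
	return true;
}

/* Size of the bitmap in bytes, rounded up, as in the patch's bitmap_sz. */
static size_t bitmap_bytes(size_t nbits)
{
	return (nbits + 7) / 8;
}
```

[The same rounded-up byte count is what has to be fed to get_order() on the free side, so that the order matches the one used when the map was allocated.]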
Thanks- John Signed-off-by: John Rose diff -Nru a/arch/ppc64/kernel/pSeries_iommu.c b/arch/ppc64/kernel/pSeries_iommu.c --- a/arch/ppc64/kernel/pSeries_iommu.c Thu Oct 7 11:08:19 2004 +++ b/arch/ppc64/kernel/pSeries_iommu.c Thu Oct 7 11:08:19 2004 @@ -412,6 +412,38 @@ dn->iommu_table = iommu_init_table(tbl); } +void iommu_free_table(struct device_node *dn) +{ + struct iommu_table *tbl = dn->iommu_table; + unsigned long bitmap_sz, i; + unsigned int order; + + if (!tbl || !tbl->it_map) { + printk(KERN_ERR "%s: expected TCE map for %s\n", __FUNCTION__, + dn->full_name); + return; + } + + /* verify that table contains no entries */ + /* it_mapsize is in entries, and we're examining 64 at a time */ + for (i = 0; i < (tbl->it_mapsize/64); i++) { + if (tbl->it_map[i] != 0) { + printk(KERN_WARNING "%s: Unexpected TCEs for %s\n", + __FUNCTION__, dn->full_name); + break; + } + } + + /* calculate bitmap size in bytes */ + bitmap_sz = (tbl->it_mapsize + 7) / 8; + + /* free bitmap */ + order = get_order(bitmap_sz); + free_pages((unsigned long) tbl->it_map, order); + + /* free table */ + kfree(tbl); +} void iommu_setup_pSeries(void) { diff -Nru a/arch/ppc64/kernel/prom.c b/arch/ppc64/kernel/prom.c --- a/arch/ppc64/kernel/prom.c Thu Oct 7 11:08:19 2004 +++ b/arch/ppc64/kernel/prom.c Thu Oct 7 11:08:19 2004 @@ -1818,6 +1818,9 @@ return -EBUSY; } + if (np->iommu_table) + iommu_free_table(np); + write_lock(&devtree_lock); OF_MARK_STALE(np); remove_node_proc_entries(np); diff -Nru a/include/asm-ppc64/iommu.h b/include/asm-ppc64/iommu.h --- a/include/asm-ppc64/iommu.h Thu Oct 7 11:08:19 2004 +++ b/include/asm-ppc64/iommu.h Thu Oct 7 11:08:19 2004 @@ -113,6 +113,9 @@ /* Creates table for an individual device node */ extern void iommu_devnode_init(struct device_node *dn); +/* Frees table for an individual device node */ +extern void iommu_free_table(struct device_node *dn); + #endif /* CONFIG_PPC_MULTIPLATFORM */ #ifdef CONFIG_PPC_ISERIES From linas at austin.ibm.com Fri Oct 8 04:13:35 
2004 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 7 Oct 2004 13:13:35 -0500 Subject: [linas: [PATCH] PPC64: crash during firmware flash update] Message-ID: <20041007181335.GA21633@austin.ibm.com> Sent to the wrong mailing list :) ----- Forwarded message from linas ----- To: paulus at samba.org, anton at samba.org Cc: linuxppc64-dev at lists.linuxppc.org Subject: [PATCH] PPC64: crash during firmware flash update Race conditions during system shutdown after a firmware flash can sometimes lead to an invalid pointer deref (deref to freed memory). This patch fixes this. In addition, it makes sure that the proc entries created by the firmware flash module are removed when the module is unloaded. Signed-off-by: Linas Vepstas --- a/arch/ppc64/kernel/rtas_flash.c.orig 2004-09-20 11:59:18.000000000 -0500 +++ b/arch/ppc64/kernel/rtas_flash.c 2004-10-06 11:19:45.000000000 -0500 @@ -562,6 +562,7 @@ static int validate_flash_release(struct validate_flash(args_buf); } + /* The matching atomic_inc was in rtas_excl_open() */ atomic_dec(&dp->count); return 0; @@ -572,7 +573,8 @@ static void remove_flash_pde(struct proc if (dp) { if (dp->data != NULL) kfree(dp->data); - remove_proc_entry(dp->name, NULL); + dp->owner = NULL; + remove_proc_entry(dp->name, dp->parent); } } ----- End forwarded message ----- From benh at kernel.crashing.org Fri Oct 8 12:27:07 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 08 Oct 2004 12:27:07 +1000 Subject: Gothic horrors in pci_dn.c In-Reply-To: <1097137808.4894.73.camel@gaston> References: <1097137808.4894.73.camel@gaston> Message-ID: <1097202427.846.102.camel@gaston> On Thu, 2004-10-07 at 18:30, Benjamin Herrenschmidt wrote: > + /* If we are a PCI bridge, go down */ > + if (dn->child && (class >> 8) == PCI_CLASS_BRIDGE_PCI && > + (class >> 8) == PCI_CLASS_BRIDGE_CARDBUS) > + /* Depth first...do children */ > + nextdn = dn->child; Of course, that should have been + /* If we are a PCI bridge, go down */ + if 
(dn->child && ((class >> 8) == PCI_CLASS_BRIDGE_PCI || + (class >> 8) == PCI_CLASS_BRIDGE_CARDBUS)) + /* Depth first...do children */ + nextdn = dn->child; Ben. From paulus at samba.org Fri Oct 8 10:44:32 2004 From: paulus at samba.org (Paul Mackerras) Date: Fri, 8 Oct 2004 10:44:32 +1000 Subject: [patch] HVSI udbg In-Reply-To: <200410071040.27907.hollisb@us.ibm.com> References: <200410071040.27907.hollisb@us.ibm.com> Message-ID: <16741.58096.932315.526999@cargo.ozlabs.ibm.com> Hollis, > --- 1.41/arch/ppc64/kernel/pSeries_lpar.c Tue Sep 21 23:40:30 2004 > +++ edited/arch/ppc64/kernel/pSeries_lpar.c Thu Oct 7 10:52:23 2004 > @@ -59,6 +59,74 @@ > > int vtermno; /* virtual terminal# for udbg */ > > +#define __ALIGNED__ __attribute__((__aligned__(sizeof(long)))) > +static void udbg_hvsi_putc(unsigned char c) > +{ > + /* packet's seqno isn't used anyways */ > + uint8_t packet[] __ALIGNED__ = { 0xff, 5, 0, 0, c }; > + int rc; All the tabs in the patch seem to have got changed to spaces. Is it your mailer or is the list software doing something bad? Paul. From arnd at arndb.de Fri Oct 8 16:22:57 2004 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 8 Oct 2004 08:22:57 +0200 Subject: [patch] HVSI udbg In-Reply-To: <16741.58096.932315.526999@cargo.ozlabs.ibm.com> References: <200410071040.27907.hollisb@us.ibm.com> <16741.58096.932315.526999@cargo.ozlabs.ibm.com> Message-ID: <200410080823.03298.arnd@arndb.de> On Friday 08 October 2004 02:44, Paul Mackerras wrote: > All the tabs in the patch seem to have got changed to spaces. Is it > your mailer or is the list software doing something bad? It's the latest kmail (or Qt) update from Debian Sarge that broke this. I have the same problem here. Attachments appear to be still working. http://bugs.kde.org/show_bug.cgi?id=90688 Arnd <>< -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: signature Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041008/33488d20/attachment.pgp From hollisb at us.ibm.com Fri Oct 8 22:42:13 2004 From: hollisb at us.ibm.com (Hollis Blanchard) Date: Fri, 8 Oct 2004 12:42:13 +0000 Subject: [patch 2] HVSI udbg Message-ID: <200410081242.13486.hollisb@us.ibm.com> This patch (resent as attachment due to mailer troubles) adds support for the udbg early console interfaces when using an HVSI console. -- Hollis Blanchard IBM Linux Technology Center -------------- next part -------------- A non-text attachment was scrubbed... Name: hvsi-udbg.diff Type: text/x-diff Size: 2552 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041008/663aa099/attachment.diff From david at gibson.dropbear.id.au Mon Oct 11 12:11:46 2004 From: david at gibson.dropbear.id.au (David Gibson) Date: Mon, 11 Oct 2004 12:11:46 +1000 Subject: [PPC64] xmon sparse cleanups In-Reply-To: <16738.31164.464250.638432@cargo.ozlabs.ibm.com> References: <20041005064255.GF3695@zax> <16738.31164.464250.638432@cargo.ozlabs.ibm.com> Message-ID: <20041011021146.GA1556@zax> On Tue, Oct 05, 2004 at 08:38:52PM +1000, Paul Mackerras wrote: > David Gibson writes: > > > Andrew, please apply: > > > > This patch removes many sparse warnings from the xmon code. Mostly > > K&R function declarations and 0-instead-of-NULLs. > > The trouble with this patch is that it makes ppc-opc.c diverge from > the version in binutils, which is where it came from. I'd rather keep > it as close as possible to that version. I have no problem with the > changes to the other files. A corresponding patch has now gone into binutils CVS. As it happens there has already been a certain amount of divergence between the versions, presumably because the kernel copy hasn't been updated from binutils in quite a while. 
-- David Gibson | For every complex problem there is a david AT gibson.dropbear.id.au | solution which is simple, neat and | wrong. http://www.ozlabs.org/people/dgibson From schwab at suse.de Tue Oct 12 06:11:42 2004 From: schwab at suse.de (Andreas Schwab) Date: Mon, 11 Oct 2004 22:11:42 +0200 Subject: 2.6.9-rc4: oops during ide probing Message-ID: I'm getting an oops during ide probing on the PMac G5 with 2.6.9-rc4: ide-pmac: cannot find MacIO node for Kauai ATA interface ide0: Found Apple OHare ATA controller, bus ID 0, irq 0 Oops: Kernel access of bad area, sig: 11 [#1] NIP [...] .ide_mm_inb+0x0/0x14 LR [...] .ide_wait_not_busy+0x98/0xf0 (Sorry, I couldn't capture the whole oops.) I've tried also with the patch from , but that didn't help. Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From pbadari at us.ibm.com Tue Oct 12 08:07:32 2004 From: pbadari at us.ibm.com (Badari Pulavarty) Date: 11 Oct 2004 15:07:32 -0700 Subject: 2.6.9-rc4-mm1 doesn't boot on my Power3 box Message-ID: <1097532452.12861.398.camel@dyn318077bld.beaverton.ibm.com> Hi, My Power3 box doesn't boot with 2.6.9-rc4-mm1. I get following OOPs. (2.6.9-rc3-mm3 also same issue). Any fixes ? Thanks, Badari kernel BUG in __flush_tlb_pending at arch/ppc64/mm/tlb.c:125!
Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=128 NUMA PSERIES NIP: C00000000003E344 XER: 0000000020000000 LR: C000000000014DA0 REGS: c000000001963550 TRAP: 0700 Not tainted (2.6.9-rc4-mm1) MSR: a000000000023032 EE: 0 PR: 0 FP: 1 ME: 1 IR/DR: 11 TASK: c00000003f7577e0[1396] 'hotplug' THREAD: c000000001960000 CPU: 0 GPR00: 0000000004000000 C0000000019637D0 C0000000005D29F0 C0000000006B70A0 GPR04: C00000003FB597E0 000000028904198B C0000000005D1008 C0000000004583B0 GPR08: 0000000000260F00 C000000001960000 C0000000005D1008 0000000000000002 GPR12: 0000000022222482 C0000000004B9900 C00000003F757A80 00000030CAC526D0 GPR16: C0000000005D1008 000000000065E4C0 0000000000000000 C00000000F052500 GPR20: C0000000006BAD88 C000000001963990 C00000003FB597E0 C0000000006B9B38 GPR24: C000000001945200 C00000003F7577E0 0000000018221613 C00000003FB597E0 GPR28: 0000000000001260 C00000003F7577E0 0000000000000000 C0000000006B70A0 NIP [c00000000003e344] .__flush_tlb_pending+0x38/0x150 LR [c000000000014da0] .__switch_to+0xb4/0xd8 Call Trace: [c0000000019637d0] [00000000f7fad210] 0xf7fad210 (unreliable) --- Exception: 901 at .copy_page_range+0x218/0x61c LR = .copy_page_range+0x160/0x61c [c000000001963890] [c000000000014da0] .__switch_to+0xb4/0xd8 (unreliable) [c000000001963920] [c00000000039a5dc] .schedule+0x38c/0xc3c [c000000001963a40] [c00000000039b028] .cond_resched+0x4c/0x80 [c000000001963ac0] [c000000000096eb0] .copy_page_range+0x29c/0x61c [c000000001963bd0] [c00000000004fecc] .copy_process+0x8c0/0x148c [c000000001963ce0] [c000000000050b38] .do_fork+0xa0/0x25c [c000000001963dc0] [c000000000014680] .sys_clone+0x5c/0x74 [c000000001963e30] [c000000000010208] .ppc_clone+0x8/0xc From dwmw2 at infradead.org Tue Oct 12 23:51:49 2004 From: dwmw2 at infradead.org (David Woodhouse) Date: Tue, 12 Oct 2004 14:51:49 +0100 Subject: cond_syscall() and new ABI. 
Message-ID: <1097589108.318.425.camel@hades.cambridge.redhat.com> This (in linux/asm-ppc64/unistd.h) doesn't work with the new ABI: /* * "Conditional" syscalls * * What we want is __attribute__((weak,alias("sys_ni_syscall"))), * but it doesn't work on all toolchains, so we just do it by hand */ #define cond_syscall(x) asm(".weak\t." #x "\n\t.set\t." #x ",.sys_ni_syscall"); Two options -- either we ditch older toolchains (before 2002-03-01 probably), by switching to what we say in the comment, or we introduce an ifdef to choose whether to include the '.' in the symbol names... Both attached. Someone who cares can choose one :) -- dwmw2 -------------- next part -------------- ===== include/asm-ppc64/unistd.h 1.34 vs edited ===== --- 1.34/include/asm-ppc64/unistd.h Tue Sep 14 01:23:12 2004 +++ edited/include/asm-ppc64/unistd.h Tue Oct 12 14:49:48 2004 @@ -468,7 +468,11 @@ * What we want is __attribute__((weak,alias("sys_ni_syscall"))), * but it doesn't work on all toolchains, so we just do it by hand */ +#if __GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ > 3) +#define cond_syscall(x) asm(".weak\t" #x "\n\t.set\t" #x ",sys_ni_syscall"); +#else #define cond_syscall(x) asm(".weak\t." #x "\n\t.set\t." #x ",.sys_ni_syscall"); +#endif #endif /* __KERNEL__ */ -------------- next part -------------- ===== include/asm-ppc64/unistd.h 1.34 vs edited ===== --- 1.34/include/asm-ppc64/unistd.h Tue Sep 14 01:23:12 2004 +++ edited/include/asm-ppc64/unistd.h Tue Oct 12 14:48:08 2004 @@ -468,7 +468,7 @@ * What we want is __attribute__((weak,alias("sys_ni_syscall"))), * but it doesn't work on all toolchains, so we just do it by hand */ -#define cond_syscall(x) asm(".weak\t." #x "\n\t.set\t." #x ",.sys_ni_syscall"); +#define cond_syscall(x) void x(void) __attribute__((weak,alias("sys_ni_syscall"))); #endif /* __KERNEL__ */ From hch at lst.de Wed Oct 13 00:26:27 2004 From: hch at lst.de (Christoph Hellwig) Date: Tue, 12 Oct 2004 16:26:27 +0200 Subject: cond_syscall() and new ABI. 
In-Reply-To: <1097589108.318.425.camel@hades.cambridge.redhat.com> References: <1097589108.318.425.camel@hades.cambridge.redhat.com> Message-ID: <20041012142627.GA19091@lst.de> On Tue, Oct 12, 2004 at 02:51:49PM +0100, David Woodhouse wrote: > This (in linux/asm-ppc64/unistd.h) doesn't work with the new ABI: > > /* > * "Conditional" syscalls > * > * What we want is __attribute__((weak,alias("sys_ni_syscall"))), > * but it doesn't work on all toolchains, so we just do it by hand > */ > #define cond_syscall(x) asm(".weak\t." #x "\n\t.set\t." #x ",.sys_ni_syscall"); > > Two options -- either we ditch older toolchains (before 2002-03-01 > probably), by switching to what we say in the comment, or we introduce > an ifdef to choose whether to include the '.' in the symbol names... > > Both attached. Someone who cares can choose one :) > > -- > dwmw2 > ===== include/asm-ppc64/unistd.h 1.34 vs edited ===== > --- 1.34/include/asm-ppc64/unistd.h Tue Sep 14 01:23:12 2004 > +++ edited/include/asm-ppc64/unistd.h Tue Oct 12 14:49:48 2004 > @@ -468,7 +468,11 @@ > * What we want is __attribute__((weak,alias("sys_ni_syscall"))), > * but it doesn't work on all toolchains, so we just do it by hand > */ > +#if __GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ > 3) > +#define cond_syscall(x) asm(".weak\t" #x "\n\t.set\t" #x ",sys_ni_syscall"); > +#else > #define cond_syscall(x) asm(".weak\t." #x "\n\t.set\t." #x ",.sys_ni_syscall"); this is broken. Gcc 3.4 doesn't even have support for the non-dotted ABI, nevermind uses it by default. > ===== include/asm-ppc64/unistd.h 1.34 vs edited ===== > --- 1.34/include/asm-ppc64/unistd.h Tue Sep 14 01:23:12 2004 > +++ edited/include/asm-ppc64/unistd.h Tue Oct 12 14:48:08 2004 > @@ -468,7 +468,7 @@ > * What we want is __attribute__((weak,alias("sys_ni_syscall"))), > * but it doesn't work on all toolchains, so we just do it by hand > */ > -#define cond_syscall(x) asm(".weak\t." #x "\n\t.set\t." 
#x ",.sys_ni_syscall"); > +#define cond_syscall(x) void x(void) __attribute__((weak,alias("sys_ni_syscall"))); this one otoh makes lots of sense - it's what most architectures use. From moilanen at austin.ibm.com Wed Oct 13 00:56:19 2004 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Tue, 12 Oct 2004 09:56:19 -0500 Subject: [PATCH 1/2][RFC] PPC64 no-exec support for user space In-Reply-To: <20041012095248.2b6418c4@localhost> References: <20041012095248.2b6418c4@localhost> Message-ID: <20041012095619.63a38530@localhost> Here is no-exec support for user space. This patch also includes base no-exec support. Once again it requires Ben's signal trampoline in vdso piece. Thanks, Jake Signed-off-by: Jake Moilanen --- diff -puN arch/ppc64/kernel/head.S~nx-user-ppc64 arch/ppc64/kernel/head.S --- linux-2.6-bk/arch/ppc64/kernel/head.S~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/head.S Thu Oct 7 15:23:52 2004 @@ -35,6 +35,7 @@ #include #include #include +#include #include #ifdef CONFIG_PPC_ISERIES @@ -879,6 +880,7 @@ InstructionAccess_common: ld r3,_NIP(r1) andis. r4,r12,0x5820 li r5,0x400 + ori r4,r4,_PAGE_EXEC b .do_hash_page /* Try to handle as hpte fault */ .align 7 @@ -964,11 +966,10 @@ END_FTR_SECTION_IFCLR(CPU_FTR_SLB) * accessing a userspace segment (even from the kernel). We assume * kernel addresses always have the high bit set. 
*/ - rlwinm r4,r4,32-23,29,29 /* DSISR_STORE -> _PAGE_RW */ + rlwinm r4,r4,32-25+9,31-9,31-9 /* DSISR_STORE -> _PAGE_RW */ rotldi r0,r3,15 /* Move high bit into MSR_PR posn */ orc r0,r12,r0 /* MSR_PR | ~high_bit */ rlwimi r4,r0,32-13,30,30 /* becomes _PAGE_USER access bit */ - ori r4,r4,1 /* add _PAGE_PRESENT */ /* * On iSeries, we soft-disable interrupts here, then diff -puN arch/ppc64/mm/fault.c~nx-user-ppc64 arch/ppc64/mm/fault.c --- linux-2.6-bk/arch/ppc64/mm/fault.c~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c Thu Oct 7 15:23:52 2004 @@ -92,6 +92,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long code = SEGV_MAPERR; unsigned long is_write = error_code & 0x02000000; unsigned long trap = TRAP(regs); + unsigned long is_exec = trap == 0x400; BUG_ON((trap == 0x380) || (trap == 0x480)); @@ -191,16 +192,19 @@ int do_page_fault(struct pt_regs *regs, good_area: code = SEGV_ACCERR; + if (is_exec) { + /* protection fault */ + if (error_code & 0x08000000) + goto bad_area; + if (!(vma->vm_flags & VM_EXEC)) + goto bad_area; /* a write */ - if (is_write) { + } else if (is_write) { if (!(vma->vm_flags & VM_WRITE)) goto bad_area; /* a read */ } else { - /* protection fault */ - if (error_code & 0x08000000) - goto bad_area; - if (!(vma->vm_flags & (VM_READ | VM_EXEC))) + if (!(vma->vm_flags & VM_READ)) goto bad_area; } diff -puN arch/ppc64/mm/hash_low.S~nx-user-ppc64 arch/ppc64/mm/hash_low.S --- linux-2.6-bk/arch/ppc64/mm/hash_low.S~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/hash_low.S Thu Oct 7 15:23:52 2004 @@ -89,7 +89,7 @@ _GLOBAL(__hash_page) /* Prepare new PTE value (turn access RW into DIRTY, then * add BUSY,HASHPTE and ACCESSED) */ - rlwinm r30,r4,5,24,24 /* _PAGE_RW -> _PAGE_DIRTY */ + rlwinm r30,r4,32-9+7,31-7,31-7 /* _PAGE_RW -> _PAGE_DIRTY */ or r30,r30,r31 ori r30,r30,_PAGE_BUSY | _PAGE_ACCESSED | _PAGE_HASHPTE /* Write the linux PTE atomically (setting busy) */ @@ -112,11 +112,11 
@@ _GLOBAL(__hash_page) rldicl r5,r5,0,25 /* vsid & 0x0000007fffffffff */ rldicl r0,r3,64-12,48 /* (ea >> 12) & 0xffff */ xor r28,r5,r0 - - /* Convert linux PTE bits into HW equivalents - */ - andi. r3,r30,0x1fa /* Get basic set of flags */ - rlwinm r0,r30,32-2+1,30,30 /* _PAGE_RW -> _PAGE_USER (r0) */ + + /* Convert linux PTE bits into HW equivalents */ + andi. r3,r30,0x1fe /* Get basic set of flags */ + xori r3,r3,HW_NO_EXEC /* _PAGE_EXEC -> NOEXEC */ + rlwinm r0,r30,32-9+1,30,30 /* _PAGE_RW -> _PAGE_USER (r0) */ rlwinm r4,r30,32-7+1,30,30 /* _PAGE_DIRTY -> _PAGE_USER (r4) */ and r0,r0,r4 /* _PAGE_RW & _PAGE_DIRTY -> r0 bit 30 */ andc r0,r30,r0 /* r0 = pte & ~r0 */ diff -puN fs/binfmt_elf.c~nx-user-ppc64 fs/binfmt_elf.c --- linux-2.6-bk/fs/binfmt_elf.c~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/fs/binfmt_elf.c Thu Oct 7 15:23:52 2004 @@ -89,8 +89,11 @@ static int set_brk(unsigned long start, end = ELF_PAGEALIGN(end); if (end > start) { unsigned long addr = do_brk(start, end - start); + if (BAD_ADDR(addr)) return addr; + + sys_mprotect(start, end-start, PROT_READ|PROT_WRITE|PROT_EXEC); } current->mm->start_brk = current->mm->brk = end; return 0; diff -puN include/asm-ppc64/elf.h~nx-user-ppc64 include/asm-ppc64/elf.h --- linux-2.6-bk/include/asm-ppc64/elf.h~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/include/asm-ppc64/elf.h Thu Oct 7 15:23:52 2004 @@ -226,6 +226,13 @@ do { \ else if (current->personality != PER_LINUX32) \ set_personality(PER_LINUX); \ } while (0) + +/* + * An executable for which elf_read_implies_exec() returns TRUE will + * have the READ_IMPLIES_EXEC personality flag set automatically. 
+ */ +#define elf_read_implies_exec_binary(ex, have_pt_gnu_stack) (!(have_pt_gnu_stack)) + #endif /* diff -puN include/asm-ppc64/page.h~nx-user-ppc64 include/asm-ppc64/page.h --- linux-2.6-bk/include/asm-ppc64/page.h~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/include/asm-ppc64/page.h Thu Oct 7 15:23:52 2004 @@ -233,8 +233,25 @@ extern int page_is_ram(unsigned long pfn #define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) -#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_EXEC | \ +#define VM_DATA_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | \ VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_DATA_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | VM_EXEC | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_DATA_DEFAULT_FLAGS \ + (test_thread_flag(TIF_32BIT) ? \ + VM_DATA_DEFAULT_FLAGS32 : VM_DATA_DEFAULT_FLAGS64) + +#define VM_STACK_DEFAULT_FLAGS \ + (test_thread_flag(TIF_32BIT) ? 
\ + VM_STACK_DEFAULT_FLAGS32 : VM_STACK_DEFAULT_FLAGS64) #endif /* __KERNEL__ */ #endif /* _PPC64_PAGE_H */ diff -puN include/asm-ppc64/pgtable.h~nx-user-ppc64 include/asm-ppc64/pgtable.h --- linux-2.6-bk/include/asm-ppc64/pgtable.h~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/include/asm-ppc64/pgtable.h Thu Oct 7 15:23:52 2004 @@ -86,24 +86,25 @@ #define _PAGE_PRESENT 0x0001 /* software: pte contains a translation */ #define _PAGE_USER 0x0002 /* matches one of the PP bits */ #define _PAGE_FILE 0x0002 /* (!present only) software: pte holds file offset */ -#define _PAGE_RW 0x0004 /* software: user write access allowed */ +#define _PAGE_EXEC 0x0004 /* No execute on POWER4 and newer (we invert) */ #define _PAGE_GUARDED 0x0008 #define _PAGE_COHERENT 0x0010 /* M: enforce memory coherence (SMP systems) */ #define _PAGE_NO_CACHE 0x0020 /* I: cache inhibit */ #define _PAGE_WRITETHRU 0x0040 /* W: cache write-through */ #define _PAGE_DIRTY 0x0080 /* C: page changed */ #define _PAGE_ACCESSED 0x0100 /* R: page referenced */ -#define _PAGE_EXEC 0x0200 /* software: i-cache coherence required */ +#define _PAGE_RW 0x0200 /* software: user write access allowed */ #define _PAGE_HASHPTE 0x0400 /* software: pte has an associated HPTE */ #define _PAGE_BUSY 0x0800 /* software: PTE & hash are busy */ #define _PAGE_SECONDARY 0x8000 /* software: HPTE is in secondary group */ #define _PAGE_GROUP_IX 0x7000 /* software: HPTE index within group */ /* Bits 0x7000 identify the index within an HPT Group */ #define _PAGE_HPTEFLAGS (_PAGE_BUSY | _PAGE_HASHPTE | _PAGE_SECONDARY | _PAGE_GROUP_IX) + /* PAGE_MASK gives the right answer below, but only by accident */ /* It should be preserving the high 48 bits and then specifically */ /* preserving _PAGE_SECONDARY | _PAGE_GROUP_IX */ -#define _PAGE_CHG_MASK (PAGE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_HPTEFLAGS) +#define _PAGE_CHG_MASK (_PAGE_GUARDED | _PAGE_COHERENT | _PAGE_NO_CACHE | _PAGE_WRITETHRU | _PAGE_DIRTY | 
_PAGE_ACCESSED | _PAGE_HPTEFLAGS | PAGE_MASK) #define _PAGE_BASE (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_COHERENT) @@ -119,31 +120,32 @@ #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER) #define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) #define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_WRENABLE) -#define PAGE_KERNEL_CI __pgprot(_PAGE_PRESENT | _PAGE_ACCESSED | \ - _PAGE_WRENABLE | _PAGE_NO_CACHE | _PAGE_GUARDED) /* - * The PowerPC can only do execute protection on a segment (256MB) basis, - * not on a page basis. So we consider execute permission the same as read. + * POWER4 and newer have per page execute protection, older chips can only + * do this on a segment (256MB) basis. + * * Also, write permissions imply read permissions. * This is the closest we can get.. + * + * Note due to the way vm flags are laid out, the bits are XWR */ #define __P000 PAGE_NONE -#define __P001 PAGE_READONLY_X +#define __P001 PAGE_READONLY #define __P010 PAGE_COPY -#define __P011 PAGE_COPY_X -#define __P100 PAGE_READONLY +#define __P011 PAGE_COPY +#define __P100 PAGE_READONLY_X #define __P101 PAGE_READONLY_X -#define __P110 PAGE_COPY +#define __P110 PAGE_COPY_X #define __P111 PAGE_COPY_X #define __S000 PAGE_NONE -#define __S001 PAGE_READONLY_X +#define __S001 PAGE_READONLY #define __S010 PAGE_SHARED -#define __S011 PAGE_SHARED_X -#define __S100 PAGE_READONLY +#define __S011 PAGE_SHARED +#define __S100 PAGE_READONLY_X #define __S101 PAGE_READONLY_X -#define __S110 PAGE_SHARED +#define __S110 PAGE_SHARED_X #define __S111 PAGE_SHARED_X #ifndef __ASSEMBLY__ @@ -200,7 +202,8 @@ int hash_huge_page(struct mm_struct *mm, }) #define pte_modify(_pte, newprot) \ - (__pte((pte_val(_pte) & _PAGE_CHG_MASK) | pgprot_val(newprot))) + (__pte((pte_val(_pte) & _PAGE_CHG_MASK) | \ + (pgprot_val(newprot) & ~_PAGE_CHG_MASK))) #define pte_none(pte) ((pte_val(pte) & ~_PAGE_HPTEFLAGS) == 0) #define pte_present(pte) (pte_val(pte) & _PAGE_PRESENT) @@ -270,9 +273,6 @@ static inline int 
pte_dirty(pte_t pte) { static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED;} static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE;} -static inline void pte_uncache(pte_t pte) { pte_val(pte) |= _PAGE_NO_CACHE; } -static inline void pte_cache(pte_t pte) { pte_val(pte) &= ~_PAGE_NO_CACHE; } - static inline pte_t pte_rdprotect(pte_t pte) { pte_val(pte) &= ~_PAGE_USER; return pte; } static inline pte_t pte_exprotect(pte_t pte) { @@ -420,7 +420,7 @@ static inline void set_pte(pte_t *ptep, static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry, int dirty) { unsigned long bits = pte_val(entry) & - (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW); + (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC); unsigned long old, tmp; __asm__ __volatile__( diff -puN arch/ppc64/mm/hugetlbpage.c~nx-user-ppc64 arch/ppc64/mm/hugetlbpage.c --- linux-2.6-bk/arch/ppc64/mm/hugetlbpage.c~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/hugetlbpage.c Thu Oct 7 15:23:52 2004 @@ -29,8 +29,8 @@ /* HugePTE layout: * - * 31 30 ... 15 14 13 12 10 9 8 7 6 5 4 3 2 1 0 - * PFN>>12..... - - - - - - HASH_IX.... 2ND HASH RW - HG=1 + * 31 30 ... 15 14 13 12 10 9 8 7 6 5 4 3 2 1 0 + * PFN>>12..... - - - - - - HASH_IX.... 
2ND HASH !EXEC RW HG=1 */ #define HUGEPTE_SHIFT 15 @@ -41,7 +41,8 @@ #define _HUGEPAGE_GROUP_IX 0x000000e0 #define _HUGEPAGE_HPTEFLAGS (_HUGEPAGE_HASHPTE | _HUGEPAGE_SECONDARY | \ _HUGEPAGE_GROUP_IX) -#define _HUGEPAGE_RW 0x00000004 +#define _HUGEPAGE_RW 0x00000002 +#define _HUGEPAGE_EXEC 0x00000004 /* this is inverted */ typedef struct {unsigned int val;} hugepte_t; #define hugepte_val(hugepte) ((hugepte).val) @@ -722,6 +723,7 @@ int hash_huge_page(struct mm_struct *mm, hugepte_t *ptep; unsigned long va, vpn; int is_write; + int is_exec; hugepte_t old_pte, new_pte; unsigned long hpteflags, prpn, flags; long slot; @@ -752,6 +754,10 @@ int hash_huge_page(struct mm_struct *mm, if (unlikely(is_write && !(hugepte_val(*ptep) & _HUGEPAGE_RW))) return 1; + is_exec = access & _PAGE_EXEC; + if (unlikely(is_exec && !(hugepte_val(*ptep) & _HUGEPAGE_EXEC))) + return 1; + /* * At this point, we have a pte (old_pte) which can be used to build * or update an HPTE. There are 2 cases: @@ -769,7 +775,10 @@ int hash_huge_page(struct mm_struct *mm, old_pte = *ptep; new_pte = old_pte; - hpteflags = 0x2 | (! 
(hugepte_val(new_pte) & _HUGEPAGE_RW)); + /* _HUGEPAGE_EXEC -> HW_NO_EXEC since it's inverted */ + hpteflags = (hugepte_val(new_pte) & _HUGEPAGE_RW) | + (hugepte_val(new_pte) ^ HW_NO_EXEC) | + (!(hugepte_val(new_pte) & _HUGEPAGE_RW)); /* Check if pte already has an hpte (case 2) */ if (unlikely(hugepte_val(old_pte) & _HUGEPAGE_HASHPTE)) { diff -L arch/ppc64/kernel/pSeries_htab.c -puN /dev/null /dev/null diff -puN arch/ppc64/kernel/pSeries_lpar.c~nx-user-ppc64 arch/ppc64/kernel/pSeries_lpar.c --- linux-2.6-bk/arch/ppc64/kernel/pSeries_lpar.c~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/pSeries_lpar.c Thu Oct 7 15:23:52 2004 @@ -384,7 +384,7 @@ static void pSeries_lpar_hpte_updatebolt slot = pSeries_lpar_hpte_find(vpn); BUG_ON(slot == -1); - flags = newpp & 3; + flags = newpp & 7; lpar_rc = plpar_pte_protect(flags, slot, 0); BUG_ON(lpar_rc != H_Success); diff -puN arch/ppc64/kernel/iSeries_htab.c~nx-user-ppc64 arch/ppc64/kernel/iSeries_htab.c --- linux-2.6-bk/arch/ppc64/kernel/iSeries_htab.c~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/iSeries_htab.c Thu Oct 7 15:23:52 2004 @@ -144,6 +144,10 @@ static long iSeries_hpte_updatepp(unsign HvCallHpt_get(&hpte, slot); if ((hpte.dw0.dw0.avpn == avpn) && (hpte.dw0.dw0.v)) { + /* + * Hypervisor expects bit's as NPPP, which is + * different from how they are mapped in our PP. + */ HvCallHpt_setPp(slot, (newpp & 0x3) | ((newpp & 0x4) << 1)); iSeries_hunlock(slot); return 0; _ From moilanen at austin.ibm.com Wed Oct 13 00:52:48 2004 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Tue, 12 Oct 2004 09:52:48 -0500 Subject: [PATCH 0/2][RFC] PPC64 no-exec support Message-ID: <20041012095248.2b6418c4@localhost> These patches add no exec support to PPC64. It should prohibit executing code out of the stack, or most any non-text segment. 
For distros that compile w/ pt_gnu_stacks, they depend on Ben's signal trampoline changes, or else it will hang on the first signal due to the return code being put on the signal context stack to return to the kernel on the completion of the signal handler. The patches include a base fixup from Anton of the wrong bit being used for no-exec and for read/write on the hardware PTEs. The patch is broken into two parts: 1/2: PPC64 no-exec support for user space: This will prohibit user space apps from executing in segments not marked as executable. The base support is in here as well. 2/2: PPC64 no-exec support for kernel space: This prohibits the kernel from executing non-text code. Thanks, Jake From moilanen at austin.ibm.com Wed Oct 13 00:58:52 2004 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Tue, 12 Oct 2004 09:58:52 -0500 Subject: [PATCH 2/2][RFC] PPC64 no-exec support for kernel space In-Reply-To: <20041012095248.2b6418c4@localhost> References: <20041012095248.2b6418c4@localhost> Message-ID: <20041012095852.29e583a3@localhost> Here is the kernel piece of no-exec. It marks all non-text pages as no-execute. It depends on the no-exec for user-space patch. 
Thanks, Jake Signed-off-by: Jake Moilanen --- diff -puN arch/ppc64/kernel/module.c~nx-kernel-ppc64 arch/ppc64/kernel/module.c --- linux-2.6-bk/arch/ppc64/kernel/module.c~nx-kernel-ppc64 Thu Oct 7 15:23:55 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/module.c Thu Oct 7 15:23:55 2004 @@ -102,7 +102,8 @@ void *module_alloc(unsigned long size) { if (size == 0) return NULL; - return vmalloc(size); + + return vmalloc_exec(size); } /* Free memory returned from module_alloc */ diff -puN arch/ppc64/mm/fault.c~nx-kernel-ppc64 arch/ppc64/mm/fault.c --- linux-2.6-bk/arch/ppc64/mm/fault.c~nx-kernel-ppc64 Thu Oct 7 15:23:55 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c Thu Oct 7 15:23:55 2004 @@ -75,6 +75,21 @@ static int store_updates_sp(struct pt_re return 0; } +pte_t *lookup_address(unsigned long address) +{ + pgd_t *pgd = pgd_offset_k(address); + pmd_t *pmd; + + if (pgd_none(*pgd)) + return NULL; + + pmd = pmd_offset(pgd, address); + if (pmd_none(*pmd)) + return NULL; + + return pte_offset_kernel(pmd, address); +} + /* * The error_code parameter is * - DSISR for a non-SLB data access fault, @@ -93,6 +108,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long is_write = error_code & 0x02000000; unsigned long trap = TRAP(regs); unsigned long is_exec = trap == 0x400; + pte_t *ptep; BUG_ON((trap == 0x380) || (trap == 0x480)); @@ -245,6 +261,15 @@ bad_area_nosemaphore: info.si_addr = (void __user *) address; force_sig_info(SIGSEGV, &info, current); return 0; + } + + ptep = lookup_address(address); + + if (ptep && pte_present(*ptep) && !pte_exec(*ptep)) { + if (printk_ratelimit()) + printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? 
(uid: %d)\n", current->uid); + show_stack(current, (unsigned long *)__get_SP()); + do_exit(SIGKILL); } return SIGSEGV; diff -puN arch/ppc64/mm/hash_utils.c~nx-kernel-ppc64 arch/ppc64/mm/hash_utils.c --- linux-2.6-bk/arch/ppc64/mm/hash_utils.c~nx-kernel-ppc64 Thu Oct 7 15:23:55 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/hash_utils.c Thu Oct 7 15:23:55 2004 @@ -52,6 +52,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -89,12 +90,23 @@ static inline void loop_forever(void) ; } +int is_kernel_text(unsigned long addr) +{ + if (addr >= (unsigned long)_stext && addr < (unsigned long)__init_end) + return 1; + + return 0; +} + + + #ifdef CONFIG_PPC_MULTIPLATFORM static inline void create_pte_mapping(unsigned long start, unsigned long end, unsigned long mode, int large) { unsigned long addr; unsigned int step; + unsigned long tmp_mode; if (large) step = 16*MB; @@ -112,6 +124,13 @@ static inline void create_pte_mapping(un else vpn = va >> PAGE_SHIFT; + + tmp_mode = mode; + + /* Make non-kernel text non-executable */ + if (!is_kernel_text(addr)) + tmp_mode = mode | HW_NO_EXEC; + hash = hpt_hash(vpn, large); hpteg = ((hash & htab_data.htab_hash_mask)*HPTES_PER_GROUP); @@ -120,12 +139,12 @@ static inline void create_pte_mapping(un if (systemcfg->platform & PLATFORM_LPAR) ret = pSeries_lpar_hpte_insert(hpteg, va, virt_to_abs(addr) >> PAGE_SHIFT, - 0, mode, 1, large); + 0, tmp_mode, 1, large); else #endif /* CONFIG_PPC_PSERIES */ ret = native_hpte_insert(hpteg, va, virt_to_abs(addr) >> PAGE_SHIFT, - 0, mode, 1, large); + 0, tmp_mode, 1, large); if (ret == -1) { ppc64_terminate_msg(0x20, "create_pte_mapping"); @@ -239,8 +258,6 @@ unsigned int hash_page_do_lazy_icache(un { struct page *page; -#define PPC64_HWNOEXEC (1 << 2) - if (!pfn_valid(pte_pfn(pte))) return pp; @@ -251,8 +268,8 @@ unsigned int hash_page_do_lazy_icache(un if (trap == 0x400) { __flush_dcache_icache(page_address(page)); set_bit(PG_arch_1, &page->flags); - } else - pp 
|= PPC64_HWNOEXEC; + } else + pp |= HW_NO_EXEC; } return pp; } diff -puN include/asm-ppc64/mmu.h~nx-kernel-ppc64 include/asm-ppc64/mmu.h diff -puN include/asm-ppc64/pgtable.h~nx-kernel-ppc64 include/asm-ppc64/pgtable.h --- linux-2.6-bk/include/asm-ppc64/pgtable.h~nx-kernel-ppc64 Thu Oct 7 15:23:55 2004 +++ linux-2.6-bk-moilanen/include/asm-ppc64/pgtable.h Thu Oct 7 15:23:55 2004 @@ -101,6 +101,12 @@ /* Bits 0x7000 identify the index within an HPT Group */ #define _PAGE_HPTEFLAGS (_PAGE_BUSY | _PAGE_HASHPTE | _PAGE_SECONDARY | _PAGE_GROUP_IX) +#define HW_NO_EXEC _PAGE_EXEC /* This is used when the bit is + * inverted, even though it's the + * same value, hopefully it will be + * clearer in the code what is + * going on. */ + /* PAGE_MASK gives the right answer below, but only by accident */ /* It should be preserving the high 48 bits and then specifically */ /* preserving _PAGE_SECONDARY | _PAGE_GROUP_IX */ @@ -120,6 +126,7 @@ #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER) #define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) #define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_WRENABLE) +#define PAGE_KERNEL_EXEC __pgprot(_PAGE_BASE | _PAGE_WRENABLE | _PAGE_EXEC) /* * POWER4 and newer have per page execute protection, older chips can only @@ -266,6 +273,7 @@ int hash_huge_page(struct mm_struct *mm, * The following only work if pte_present() is true. * Undefined behaviour if not.. 
*/ +static inline int pte_user(pte_t pte) { return pte_val(pte) & _PAGE_USER;} static inline int pte_read(pte_t pte) { return pte_val(pte) & _PAGE_USER;} static inline int pte_write(pte_t pte) { return pte_val(pte) & _PAGE_RW;} static inline int pte_exec(pte_t pte) { return pte_val(pte) & _PAGE_EXEC;} diff -puN arch/ppc64/kernel/iSeries_setup.c~nx-kernel-ppc64 arch/ppc64/kernel/iSeries_setup.c --- linux-2.6-bk/arch/ppc64/kernel/iSeries_setup.c~nx-kernel-ppc64 Thu Oct 7 15:23:55 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/iSeries_setup.c Thu Oct 7 15:23:55 2004 @@ -622,6 +622,7 @@ static void __init iSeries_bolt_kernel(u { unsigned long pa; unsigned long mode_rw = _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX; + unsigned long tmp_mode; HPTE hpte; for (pa = saddr; pa < eaddr ;pa += PAGE_SIZE) { @@ -630,6 +631,12 @@ static void __init iSeries_bolt_kernel(u unsigned long va = (vsid << 28) | (pa & 0xfffffff); unsigned long vpn = va >> PAGE_SHIFT; unsigned long slot = HvCallHpt_findValid(&hpte, vpn); + + tmp_mode = mode_rw; + + /* Make non-kernel text non-executable */ + if (!is_kernel_text(ea)) + tmp_mode = mode_rw | HW_NO_EXEC; if (hpte.dw0.dw0.v) { /* HPTE exists, so just bolt it */ _ From dwmw2 at infradead.org Wed Oct 13 05:08:02 2004 From: dwmw2 at infradead.org (David Woodhouse) Date: Tue, 12 Oct 2004 20:08:02 +0100 Subject: cond_syscall() and new ABI. In-Reply-To: <200410122043.52351.arnd@arndb.de> References: <1097589108.318.425.camel@hades.cambridge.redhat.com> <20041012142627.GA19091@lst.de> <200410122043.52351.arnd@arndb.de> Message-ID: <1097608083.5178.5.camel@localhost.localdomain> On Tue, 2004-10-12 at 20:43 +0200, Arnd Bergmann wrote: > A better solution IMHO would be to include the right headers from sys.c > and have > > #define cond_syscall(x) typeof(x) (x) __attribute__((weak,alias("sys_ni_syscall"))); That's true in theory, yes -- not that I can see any way that having the 'correct' prototype will actually make a difference in practice. 
> Also, someone should try to find out which toolchains don't support this > and if anybody is still using those. One issue seems to be the one from > http://seclists.org/lists/linux-kernel/2004/Jan/2474.html, but I'm not > sure if that is the problem that the comment refers to. That happens with both the current inline asm method, and with the 'alias' method which translates to basically the same asm output from gcc, but without the ifdefs. -- dwmw2 From arnd at arndb.de Wed Oct 13 04:43:52 2004 From: arnd at arndb.de (Arnd Bergmann) Date: Tue, 12 Oct 2004 20:43:52 +0200 Subject: cond_syscall() and new ABI. In-Reply-To: <20041012142627.GA19091@lst.de> References: <1097589108.318.425.camel@hades.cambridge.redhat.com> <20041012142627.GA19091@lst.de> Message-ID: <200410122043.52351.arnd@arndb.de> On Tuesday 12 October 2004 16:26, Christoph Hellwig wrote: > > ===== include/asm-ppc64/unistd.h 1.34 vs edited ===== > > --- 1.34/include/asm-ppc64/unistd.h Tue Sep 14 01:23:12 2004 > > +++ edited/include/asm-ppc64/unistd.h Tue Oct 12 14:48:08 2004 > > @@ -468,7 +468,7 @@ > > * What we want is __attribute__((weak,alias("sys_ni_syscall"))), > > * but it doesn't work on all toolchains, so we just do it by hand > > */ > > -#define cond_syscall(x) asm(".weak\t." #x "\n\t.set\t." #x ",.sys_ni_syscall"); > > +#define cond_syscall(x) void x(void) __attribute__((weak,alias("sys_ni_syscall"))); > > > this one otoh makes lots of sense - it's what most architectures use. It's also something that looks suboptimal to me. The syscalls should already have a proper prototype in , which typically is not "void sys_foo(void)". A better solution IMHO would be to include the right headers from sys.c and have #define cond_syscall(x) typeof(x) (x) __attribute__((weak,alias("sys_ni_syscall"))); Also, someone should try to find out which toolchains don't support this and if anybody is still using those. 
One issue seems to be the one from http://seclists.org/lists/linux-kernel/2004/Jan/2474.html, but I'm not sure if that is the problem that the comment refers to. Arnd <>< -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: signature Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041012/782ffcb7/attachment.pgp From anton at samba.org Wed Oct 13 05:39:02 2004 From: anton at samba.org (Anton Blanchard) Date: Wed, 13 Oct 2004 05:39:02 +1000 Subject: cond_syscall() and new ABI. In-Reply-To: <1097589108.318.425.camel@hades.cambridge.redhat.com> References: <1097589108.318.425.camel@hades.cambridge.redhat.com> Message-ID: <20041012193902.GB3315@krispykreme.ozlabs.ibm.com> > This (in linux/asm-ppc64/unistd.h) doesn't work with the new ABI: > > /* > * "Conditional" syscalls > * > * What we want is __attribute__((weak,alias("sys_ni_syscall"))), > * but it doesn't work on all toolchains, so we just do it by hand > */ > #define cond_syscall(x) asm(".weak\t." #x "\n\t.set\t." #x ",.sys_ni_syscall"); > > Two options -- either we ditch older toolchains (before 2002-03-01 > probably), by switching to what we say in the comment, or we introduce > an ifdef to choose whether to include the '.' in the symbol names... http://ozlabs.org/ppc64-patches/ Has 5 remove -mminimal-toc patches which should fix this mess. The syscall table is currently abusing the ABI, it would be nice to fix it. If there are no complaints I'd like to push this patchset once 2.6.10 opens. Anton From grave at ipno.in2p3.fr Wed Oct 13 17:19:56 2004 From: grave at ipno.in2p3.fr (grave) Date: Wed, 13 Oct 2004 07:19:56 +0000 Subject: libmotovec Message-ID: <1097651996l.1092l.0l@ipnnarval> Hi, Just to know : does the current powerpc kernel benefit from something like libmotovec ? 
Since VMX is also on the ppc970 family, perhaps we will see more IBM processors in the future with such a velocity engine, so... xavier From benh at kernel.crashing.org Wed Oct 13 17:44:23 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 13 Oct 2004 17:44:23 +1000 Subject: 2.6.9-rc4: oops during ide probing In-Reply-To: References: Message-ID: <1097653462.5553.43.camel@gaston> On Tue, 2004-10-12 at 06:11, Andreas Schwab wrote: > I'm getting an oops during ide probing on the PMac G5 with 2.6.9-rc4: Can you send me a dump of the whole device-tree ? Ben. From arnd at arndb.de Wed Oct 13 19:19:19 2004 From: arnd at arndb.de (Arnd Bergmann) Date: Wed, 13 Oct 2004 11:19:19 +0200 Subject: cond_syscall() and new ABI. In-Reply-To: <1097608083.5178.5.camel@localhost.localdomain> References: <1097589108.318.425.camel@hades.cambridge.redhat.com> <200410122043.52351.arnd@arndb.de> <1097608083.5178.5.camel@localhost.localdomain> Message-ID: <200410131119.23409.arnd@arndb.de> On Tuesday 12 October 2004 21:08, David Woodhouse wrote: > On Tue, 2004-10-12 at 20:43 +0200, Arnd Bergmann wrote: > > A better solution IMHO would be to include the right headers from sys.c > > and have > > > > #define cond_syscall(x) typeof(x) (x) __attribute__((weak,alias("sys_ni_syscall"))); > > That's true in theory, yes -- not that I can see any way that having the > 'correct' prototype will actually make a difference in practice. Right, my point was mostly about having an implementation that is less surprising to the reader, not about correctness. It might actually become a bug as soon as someone tries to build the kernel with a compiler that does inter-module analysis, but that's not likely to happen soon. Arnd <>< -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: signature Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041013/d033fef2/attachment.pgp From schwab at suse.de Wed Oct 13 19:48:52 2004 From: schwab at suse.de (Andreas Schwab) Date: Wed, 13 Oct 2004 11:48:52 +0200 Subject: 2.6.9-rc4: oops during ide probing In-Reply-To: <1097653462.5553.43.camel@gaston> (Benjamin Herrenschmidt's message of "Wed, 13 Oct 2004 17:44:23 +1000") References: <1097653462.5553.43.camel@gaston> Message-ID: Benjamin Herrenschmidt writes: > On Tue, 2004-10-12 at 06:11, Andreas Schwab wrote: >> I'm getting an oops during ide probing on the PMac G5 with 2.6.9-rc4: > > Can you send me a dump of the whole device-tree ? By "dump" do you mean ls -R or something more fancy? Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From benh at kernel.crashing.org Thu Oct 14 00:28:53 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 14 Oct 2004 00:28:53 +1000 Subject: 2.6.9-rc4: oops during ide probing In-Reply-To: References: <1097653462.5553.43.camel@gaston> Message-ID: <1097677732.10215.1.camel@gaston> On Wed, 2004-10-13 at 19:48, Andreas Schwab wrote: > Benjamin Herrenschmidt writes: > > > On Tue, 2004-10-12 at 06:11, Andreas Schwab wrote: > >> I'm getting an oops during ide probing on the PMac G5 with 2.6.9-rc4: > > > > Can you send me a dump of the whole device-tree ? > > By "dump" do you mean ls -R or something more fancy? tarball of /proc/device-tree From linas at austin.ibm.com Thu Oct 14 05:23:56 2004 From: linas at austin.ibm.com (Linas Vepstas) Date: Wed, 13 Oct 2004 14:23:56 -0500 Subject: Hardware Watchdog Device in pSeries? 
In-Reply-To: <416D6D89.6030300@unix.sh> References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> Message-ID: <20041013192356.GE12237@austin.ibm.com> Hi, I'm copying over to the linuxppc64-dev at ozlabs.org mailing list, which is the right place to discuss this. On Wed, Oct 13, 2004 at 12:01:45PM -0600, Alan Robertson was heard to remark: > Linas Vepstas wrote: > >Hi, > > > >On Wed, Oct 13, 2004 at 09:12:23AM +0800, Zhen Huang was heard to remark: > > > >>Hi, > >> > >>The watchdog I mentioned means such a device: > >>Once we open it we must write to it regularly. > >>Otherwise the whole system will be reset. > >> > >>Many OSes have a software implementation of this. > >>But the software watchdog will depend on the health of the OS. > >> > >>I want to know whether there is any hardware implementation in pServer. > > > > > >Yes, there is a hardware watchdog; it's implemented on all pSeries > >machines that have service processors (thus, it goes back to at > >least power3). However, it is not a unix 'device' that a user-land > >process can 'open'; it is only accessible through RTAS calls. The > >kernel daemon rtasd provides the regular heartbeat. > > > >The kernel enables the watchdog function with the 'enable_surveillance()' > >subroutine call (see arch/ppc64/kernel/rtasd.c). > >Once it's enabled, the heartbeat is the 'event-scan' RTAS call, > >which the kernel must call regularly from each CPU. (I guess this > >helps detect hung CPU's on SMP systems). If the event-scan call > >isn't made within the 'surveillance timeout', the SP will reboot > >the OS (or call in a service request, etc.) > > > >I don't know if there is any interest in moving this heartbeat > >watchdog out from kernel space into user space; right now, > >rtasd is a kernel daemon, and it more or less just works. > > > >If it ever is converted to userland, it's not likely it will > >ever be a traditional unix device; instead, functions like > >this are moving to the sysfs file system. 
> > This would be a logical equivalent to the well-known and long-standing > 'softdog' device driver which already has a well-known API, which is also > implemented on other hardware devices and architectures. > > So, my suggestion would be that if it were moved to a userspace driver, > that the softdog API be retained. I might have volunteered to hack this up real quick, were it not for Mike Strosaker's correction, that the surveillance features were taken out of Power5. Anyone on this list know why? --linas From strosake at austin.ibm.com Thu Oct 14 05:57:35 2004 From: strosake at austin.ibm.com (Mike Strosaker) Date: Wed, 13 Oct 2004 14:57:35 -0500 Subject: Hardware Watchdog Device in pSeries? In-Reply-To: <20041013192356.GE12237@austin.ibm.com> References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> Message-ID: <416D88AF.1010706@austin.ibm.com> Linas Vepstas wrote: > I might have volunteered to hack this up real quick, were it not for > Mike Strosaker's correction, that the surveillance features were taken > out of Power5. > > Anyone on this list know why? > I sent the reason I got from the hardware RAS folks to this list a while back. Luckily, it's still in my sent mail folder: "Because of the virtualization layer and partitioning, the surveillance requirement was moved to PHYP<->SP. Apparently, this was a hotly contested issue among the platform design folks (especially considering that partitioned power4 systems still have OS<->SP surveillance). I think the logic is: If an OS goes down, it's not likely a server problem, hence no requirement to monitor from the server side. At least the platform gets notified of panics via os-term. I gather that some user space tools are expected to monitor for deadlocks/hangs (maybe clustering tools). 
" Thanks, Mike From alanr at unix.sh Thu Oct 14 07:30:02 2004 From: alanr at unix.sh (Alan Robertson) Date: Wed, 13 Oct 2004 15:30:02 -0600 Subject: Hardware Watchdog Device in pSeries? In-Reply-To: <416D88AF.1010706@austin.ibm.com> References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <416D88AF.1010706@austin.ibm.com> Message-ID: <416D9E5A.9080102@unix.sh> Mike Strosaker wrote: > Linas Vepstas wrote: > >> I might have volunteered to hack this up real quick, were it not for >> Mike Strosaker's correction, that the surveillance featues were taken >> out of Power5. >> Anyone on this list know why? >> > > I sent the reason I got from the hardware RAS folks to this list a while > back. > Luckily, it's still in my sent mail folder: > > "Because of the virtualization layer and partitioning, the surveillance > requirement was moved to PHYP<->SP. Apparently, this was a hotly > contested issue among the platform design folks (especially considering > that > partitioned power4 systems still have OS<->SP surveillance). I think > the logic > is: If an OS goes down, its not likely a server problem, hence no > requirement > to monitor from the server side. > > At least the platform gets notified of panics via os-term. I gather > that some user space tools are expected to monitor for deadlocks/hangs > (maybe clustering tools). " This is about half-right. There is one particular circumstance which can ONLY be monitored from a hardware-level monitor. OS hangs. If the OS hangs, then, nothing but a hardware timer can bring the machine out of it's hung state. Hangs do NOT panic (by definition), and can't be reliably detected any other way. In highly available systems (like telecom systems), hardware level monitors are required. Leaving it out sends the message that "availability isn't important". The normal way that a highly available systems is to have layers (or a hierarchy) of watchers. 
At the bottom is the hardware monitor. Above that is an application monitor, above that are resource monitors, etc. But, there are certain kinds of faults which cannot be caught without this bottom layer monitor.

--
Alan Robertson

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce

From linas at austin.ibm.com Thu Oct 14 08:12:54 2004
From: linas at austin.ibm.com (Linas Vepstas)
Date: Wed, 13 Oct 2004 17:12:54 -0500
Subject: Hardware Watchdog Device in pSeries?
In-Reply-To: <416D9E5A.9080102@unix.sh>
References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <416D88AF.1010706@austin.ibm.com> <416D9E5A.9080102@unix.sh>
Message-ID: <20041013221254.GF12237@austin.ibm.com>

Hi,

On Wed, Oct 13, 2004 at 03:30:02PM -0600, Alan Robertson was heard to remark:
> Mike Strosaker wrote:
> >Linas Vepstas wrote:
> >
> >>I might have volunteered to hack this up real quick, were it not for
> >>Mike Strosaker's correction, that the surveillance features were taken
> >>out of Power5.
> >>Anyone on this list know why?
> >>
> >
> >I sent the reason I got from the hardware RAS folks to this list a while
> >back.
> >Luckily, it's still in my sent mail folder:
> >
> >"Because of the virtualization layer and partitioning, the surveillance
> >requirement was moved to PHYP<->SP. Apparently, this was a hotly
> >contested issue among the platform design folks (especially considering
> >that partitioned power4 systems still have OS<->SP surveillance). I think
> >the logic is: If an OS goes down, it's not likely a server problem, hence
> >no requirement to monitor from the server side.
> >
> >At least the platform gets notified of panics via os-term. I gather
> >that some user space tools are expected to monitor for deadlocks/hangs
> >(maybe clustering tools). "
>
> This is about half-right.
>
> There is one particular circumstance which can ONLY be monitored from a
> hardware-level monitor.
>
> OS hangs.

Heh. I think I can clarify, after talking to the firmware folks.

The core thinking behind the "platform architecture" was to make sure that the underlying hardware, i.e. the "platform", wasn't hung. They were not concerned about the OS itself; they assumed that OS'es have their own independent mechanisms for detecting hung-ness.

From the platform point of view, they are concerned that they'll have a machine with a dozen different partitions on it (a dozen different OS'es), and a hardware hang will take down all twelve. So they've got the hypervisor and service processor monitoring each other, keeping things humming. If just one partition goes down due to a kernel hang/crash, well, that's too bad, but it's not the end of the world from the platform point of view.

I think Alan's point of view is from the other side of the table: why should someone buy 12 pci-card watchdogs, one for each partition, chewing up 12 pci slots, when the pSeries is already capable of doing watchdog functions? To add insult to injury, the sysadmin now needs to duct-tape each of the watchdog cards to some sort of kill-switch, to reboot a dead partition. The kill-switch needs to then ssh to the fsp or the hmc to start the reboot. So it gets pretty byzantine for something that could have been 'simple' and built-in. Never mind that the reliability goes down: the kill switch could fail, the pci watchdog card could fail (or get EEH'ed out), causing a reboot when no reboot was necessary, etc.

--linas

From jschopp at austin.ibm.com Thu Oct 14 08:32:10 2004
From: jschopp at austin.ibm.com (Joel Schopp)
Date: Wed, 13 Oct 2004 17:32:10 -0500
Subject: Hardware Watchdog Device in pSeries?
In-Reply-To: <20041013221254.GF12237@austin.ibm.com>
References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <416D88AF.1010706@austin.ibm.com> <416D9E5A.9080102@unix.sh> <20041013221254.GF12237@austin.ibm.com>
Message-ID: <416DACEA.1070900@austin.ibm.com>

> I think Alan's point of view is from the other side of the table:
> why should someone buy 12 pci-card watchdogs, one for each partition,
> chewing up 12 pci slots, when the pSeries is already capable of doing
> watchdog functions? To add insult to injury, the sysadmin now needs
> to duct-tape each of the watchdog cards to some sort of kill-switch,
> to reboot a dead partition. The kill-switch needs to then ssh to
> the fsp or the hmc to start the reboot. So it gets pretty byzantine
> for something that could have been 'simple' and built-in. Never mind
> that the reliability goes down: the kill switch could fail, the
> pci watchdog card could fail (or get EEH'ed out), causing a reboot
> when no reboot was necessary, etc.

I will miss the old school hardware watchdog. If I'd had a vote I would have voted to keep it. But since it is not a democracy I can only add a couple points to this argument.

First, if people really care about reliability that much they will be running with hot spares in a HA environment. In that case there are already external monitors that activate the spare on any sign of problems.

Second, this can all be done from the HMC. The HMC is perfectly capable of determining the partition is hung (LED error codes, heartbeat timeouts). It is also perfectly capable of rebooting a partition. I am not aware that there is a way to put the two together right now, so that the HMC automatically reboots the partition if it hangs, but it would certainly be an easy feature to add to the HMC.
From linas at austin.ibm.com Thu Oct 14 08:53:16 2004
From: linas at austin.ibm.com (Linas Vepstas)
Date: Wed, 13 Oct 2004 17:53:16 -0500
Subject: Hardware Watchdog Device in pSeries?
In-Reply-To: <416DACEA.1070900@austin.ibm.com>
References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <416D88AF.1010706@austin.ibm.com> <416D9E5A.9080102@unix.sh> <20041013221254.GF12237@austin.ibm.com> <416DACEA.1070900@austin.ibm.com>
Message-ID: <20041013225316.GH12237@austin.ibm.com>

On Wed, Oct 13, 2004 at 05:32:10PM -0500, Joel Schopp was heard to remark:
>
> >I think Alan's point of view is from the other side of the table:
> >why should someone buy 12 pci-card watchdogs, one for each partition,
> >chewing up 12 pci slots, when the pSeries is already capable of doing
> >watchdog functions? To add insult to injury, the sysadmin now needs
> >to duct-tape each of the watchdog cards to some sort of kill-switch,
> >to reboot a dead partition. The kill-switch needs to then ssh to
> >the fsp or the hmc to start the reboot. So it gets pretty byzantine
> >for something that could have been 'simple' and built-in. Never mind
> >that the reliability goes down: the kill switch could fail, the
> >pci watchdog card could fail (or get EEH'ed out), causing a reboot
> >when no reboot was necessary, etc.
>
> I will miss the old school hardware watchdog. If I'd had a vote I would
> have voted to keep it. But since it is not a democracy I can only add a
> couple points to this argument.
>
> First, if people really care about reliability that much they will be
> running with hot spares in a HA environment. In that case there are
> already external monitors that activate the spare on any sign of problems.

Yes, well, Alan is the guy who designs and builds these systems :) He's trying to figure out how to hook them up to the pSeries.
You can't just cut the power, like you can for PC's :)

http://www.linux-ha.org

> Second, this can all be done from the HMC. The HMC is perfectly capable
> of determining the partition is hung (LED error codes, heartbeat
> timeouts). It is also perfectly capable of rebooting a partition. I am
> not aware that there is a way to put the two together right now, so that
> the HMC automatically reboots the partition if it hangs, but it would
> certainly be an easy feature to add to the HMC.

The HMC is a natural place for this. One of Alan's complaints is that (non-pSeries) HMC's tend to be semi-proprietary and mostly unarchitected, with a wide variation from one model to another. The dependence on Java for core functions also makes them untrustworthy.

--linas

From alanr at unix.sh Thu Oct 14 14:41:26 2004
From: alanr at unix.sh (Alan Robertson)
Date: Wed, 13 Oct 2004 22:41:26 -0600
Subject: Hardware Watchdog Device in pSeries?
In-Reply-To: <20041013221254.GF12237@austin.ibm.com>
References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <416D88AF.1010706@austin.ibm.com> <416D9E5A.9080102@unix.sh> <20041013221254.GF12237@austin.ibm.com>
Message-ID: <416E0376.1010500@unix.sh>

Linas Vepstas wrote:
> Hi,
>
> On Wed, Oct 13, 2004 at 03:30:02PM -0600, Alan Robertson was heard to remark:
>
>>Mike Strosaker wrote:
>>
>>>Linas Vepstas wrote:
>>>
>>>
>>>>I might have volunteered to hack this up real quick, were it not for
>>>>Mike Strosaker's correction, that the surveillance features were taken
>>>>out of Power5.
>>>>Anyone on this list know why?
>>>>
>>>
>>>I sent the reason I got from the hardware RAS folks to this list a while
>>>back.
>>>Luckily, it's still in my sent mail folder:
>>>
>>>"Because of the virtualization layer and partitioning, the surveillance
>>>requirement was moved to PHYP<->SP.
Apparently, this was a hotly
>>>contested issue among the platform design folks (especially considering
>>>that partitioned power4 systems still have OS<->SP surveillance). I think
>>>the logic is: If an OS goes down, it's not likely a server problem, hence
>>>no requirement to monitor from the server side.
>>>
>>>At least the platform gets notified of panics via os-term. I gather
>>>that some user space tools are expected to monitor for deadlocks/hangs
>>>(maybe clustering tools). "
>>
>>This is about half-right.
>>
>>There is one particular circumstance which can ONLY be monitored from a
>>hardware-level monitor.
>>
>>OS hangs.
>
>
> Heh. I think I can clarify, after talking to the firmware folks.
>
> The core thinking behind the "platform architecture" was to make
> sure that the underlying hardware, i.e. the "platform", wasn't hung.
> They were not concerned about the OS itself; they assumed that OS'es
> have their own independent mechanisms for detecting hung-ness.
>
> From the platform point of view, they are concerned that they'll
> have a machine with a dozen different partitions on it (a dozen
> different OS'es), and a hardware hang will take down all twelve.
> So they've got the hypervisor and service processor monitoring
> each other, keeping things humming. If just one partition goes
> down due to a kernel hang/crash, well, that's too bad, but it's
> not the end of the world from the platform point of view.

And this is a great set of goals as far as they go. But, not sufficient when looking at the platform as something which actually delivers services, not just runs the hypervisor.

[[I guess I forgot to say that in addition to being the architect for IBM's OSS Linux strategy and product, I worked for 21 years for Bell Labs on highly reliable telecommunications systems before this. So, I have some reasonable knowledge of how these kinds of things work in well-tested, well-proven systems.
Typically, telephone systems are considered extremely reliable - because they follow a well-proven discipline of design. The international telephone system is in effect the world's largest ultra-reliable computer. And, it has been since back when telephone switches were made with discrete transistors - largely because of good HA system design]]

> I think Alan's point of view is from the other side of the table:
> why should someone buy 12 pci-card watchdogs, one for each partition,
> chewing up 12 pci slots, when the pSeries is already capable of doing
> watchdog functions? To add insult to injury, the sysadmin now needs
> to duct-tape each of the watchdog cards to some sort of kill-switch,
> to reboot a dead partition. The kill-switch needs to then ssh to
> the fsp or the hmc to start the reboot. So it gets pretty byzantine
> for something that could have been 'simple' and built-in. Never mind
> that the reliability goes down: the kill switch could fail, the
> pci watchdog card could fail (or get EEH'ed out), causing a reboot
> when no reboot was necessary, etc.

Linas is right about the cost and complexity of the monitoring cards and the whole system. In addition, if we're trying to see pSeries as a premium highly-reliable system better than the competition, it just doesn't send the right message if you tell a customer that this is what they have to do. It looks really Rube Goldberg-ish (to say the least).

In addition, from a technical perspective, there is a basic principle in HA systems which is being ignored here...

A sick system cannot reliably monitor itself.

If you're relying on a system which you believe to be sick to monitor itself, it will be unable to do this reliably under all circumstances - it's sick, and therefore not reliable -- by definition. Crazy people may not think they're insane ;-).
The hardware watchdog timer is a 3rd party monitoring system, and therefore is likely to be reliable when the thing it is watching is sick - because its sanity is uncorrelated to the failure of the thing it is watching.

For example, if by a programming error in the kernel, you halt or loop with interrupts disabled -- you're screwed with no way out. In mainframes I think this is called a disabled wait state. Of course, there are more complex ways to do this, but hopefully one example makes the point.

This is the point of the hierarchy of monitoring I described before. This is very much standard operating procedure for reliable systems in the telecom industry (and many others). In fact, such a watchdog timer is a requirement for Carrier Grade Linux (CGL).

Here is the standard way which highly available systems are architected to work -- and it's consistent with 35-year industry practice in telephony systems, the formal CGL requirements, and the architecture of the Linux-HA system.

The hardware watchdog timer times out when it doesn't get a heartbeat in the allotted time. (duhhh!)

Just before loading the BIOS, the watchdog timer should be set for some "reasonable" amount of time (like a few seconds) for the BIOS to load and begin executing.

The BIOS should set the timer for a reasonable time for the bootstrap program to load. It must tickle it periodically while waiting for input from humans.*

The bootstrap loader should work much the same way. Before it jumps to the OS, it should set the timer for a reasonable amount of time for the OS to take over the tickling.*

When it first comes up, the OS takes over and tickles the watchdog timer.

When the HA monitoring subsystem comes up, it takes over and tickles the watchdog timer.

As HA-aware processes start up, they tickle individual watchdog timers maintained by the HA monitoring subsystem (apphbd). If they die, or hang, they are restarted by the Recovery Manager.
As a special case, apphbd will restart the recovery manager as described below.

The recovery manager registers with the HA monitoring subsystem and receives notification of insane or dead processes. If they're insane it kills them. When they die, it restarts them. If the recovery manager dies (or goes insane), then apphbd will (kill and) restart the recovery manager.**

When the system panics, then the watchdog timer needs to be tickled while waiting for human input, and while making progress taking a dump. [but only when actually making progress].

When the OS jumps back into the BIOS for any reason then the timer is reset to some value suitable for the BIOS to take over and start tickling it. (~ same as the original value).

Now if the BIOS or OS or bootstrap loader, or dump process craps out and hangs, or the hard disk can't boot, or a peripheral hangs the bus, then this watchdog timer will trigger, and the system will be reset - and you'll get a chance to try it again.

[[If you fail too often in too short a period of time, then "phone home" or cry "uncle" or sit and cry if you like. Or, you can just keep persisting...]]

Later on when the HA monitoring system is running, if it (or the scheduler or other piece of the OS) craps out and the HA monitoring system doesn't (or isn't able to) tickle this watchdog timer - for whatever reason - then everything will reboot just like it should.

Notice how many different kinds of errors this one single timer can detect and recover from - and how many of them cannot easily be recovered from at all without it.

Note how handy it is in designing the system to know that your underlying hardware has this capability built-in. It eliminates a lot of complexity from several pieces of software, and does a better job too!

Without this timer, you can't easily design a truly reliable system. (and maybe not at all).

The lowest level monitor should be the simplest and most reliable. It monitors the OS.
The driver for this in the kernel should also be solid and no-frills. The base-level HA monitoring system (which monitors processes for their health) should also be as simple as possible. Complexity is the enemy of reliability. If any of these components fail, then the system will be rebooted unnecessarily. This is a BadThing(TM).

Now, to use this "right", the thing that any subsystem tickling the timer at the next higher level should do is periodically schedule something to evaluate its internal sanity (data structure consistency or queue lengths or whatever), and tickle the watchdog timer only when it passes whatever its internal sanity measure is.

Then, if you go into an infinite loop, or doubt your own sanity long enough, someone else will eventually do something about it - you'll be killed and restarted (if a process) -- or rebooted (if you're the HA process monitor, or the BIOS, or bootstrap loader or OS).

Of course, this doesn't *replace* external monitoring (see the note above about declaring oneself sick), but it is a good orthogonal measure, and simpler to implement for subsystems with limited external interfaces - like the bootstrap loader.

* = Note that these layers may have to deal with bootstrap loaders and/or OSes which won't tickle the watchdog timer - so they have to shut it off (or set it really long) when booting a layer under them which isn't watchdog-aware.

** = The reason why the recovery manager is not part of the apphbd process in our design is because the apphbd process should be as simple as it can be - because its death or insanity would trigger a system restart. Putting it in a separate process lessens the likelihood of an unnecessary system restart. This is not a necessity, but I believe it to be a good design choice - after all it was my design choice ;-)

It is certainly true that we don't have to implement all these things today, or at all, but with the hardware watchdog timer, they're possible. And, without it, they're not.
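
The "tickle only when sane" rule above can be sketched in a few lines of C. This is only an illustration, not code from Linux-HA or apphbd: it assumes a softdog-style device (typically /dev/watchdog) where any write counts as a heartbeat, and the names tickle_if_sane, sanity_check_fn, always_sane and never_sane are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical sanity hook: returns nonzero when the caller's own checks
 * (data structure consistency, queue lengths, ...) pass. */
typedef int (*sanity_check_fn)(void);

/* Trivial example checks, for illustration only. */
int always_sane(void) { return 1; }
int never_sane(void)  { return 0; }

/*
 * Tickle the watchdog only when the sanity check passes.
 *
 * wd_fd is assumed to be an open descriptor for a softdog-style device,
 * where any write resets the timer.
 *
 * Returns 1 if a heartbeat was written, 0 if it was withheld because the
 * check failed, and -1 if the write itself failed.
 */
int tickle_if_sane(int wd_fd, sanity_check_fn is_sane)
{
    assert(is_sane != NULL);

    if (!is_sane())
        return 0;   /* withhold the heartbeat; the timer keeps counting down */

    if (write(wd_fd, "\0", 1) != 1)
        return -1;  /* could not reach the watchdog device */

    return 1;
}
```

A subsystem would call tickle_if_sane() from its periodic timer. If the check keeps failing, or the caller stops running at all, the heartbeats stop and the layer below does the kill or reboot.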
Even without implementing all these extra HA features, it still monitors the OS more reliably than it can monitor itself.

So, I think this is a very worthwhile feature for the platform to have.

Hope this helps!

--
Alan Robertson

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce

From arnd at arndb.de Thu Oct 14 21:35:13 2004
From: arnd at arndb.de (Arnd Bergmann)
Date: Thu, 14 Oct 2004 13:35:13 +0200
Subject: Hardware Watchdog Device in pSeries?
In-Reply-To: <20041013192356.GE12237@austin.ibm.com>
References: <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com>
Message-ID: <200410141335.17333.arnd@arndb.de>

On Wednesday 13 October 2004 21:23, Linas Vepstas wrote:
> On Wed, Oct 13, 2004 at 12:01:45PM -0600, Alan Robertson was heard to remark:
> > This would be a logical equivalent to the well-known and long-standing
> > 'softdog' device driver which already has a well-known API, which is also
> > implemented on other hardware devices and architectures.
> >
> > So, my suggestion would be that if it were moved to a userspace driver,
> > that the softdog API be retained.
>
> I might have volunteered to hack this up real quick, were it not for
> Mike Strosaker's correction, that the surveillance features were taken
> out of Power5.

FWIW, s390 linux has just added support for a hypervisor watchdog [1] that looks like a hardware watchdog to linux, but is implemented with hypercalls ("diag 0x288"). Since Power5 is typically running in hypervisor mode, the watchdog interface could be provided completely by the firmware.

Arnd <><

[1] http://ftp2.de.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/\
2.6.9-rc4/2.6.9-rc4-mm1/broken-out/s390-9-12-z-vm-watchdog-timer.patch
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: signature
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041014/29c36997/attachment.pgp

From alanr at unix.sh Fri Oct 15 01:56:14 2004
From: alanr at unix.sh (Alan Robertson)
Date: Thu, 14 Oct 2004 09:56:14 -0600
Subject: Hardware Watchdog Device in pSeries?
In-Reply-To: <200410141335.17333.arnd@arndb.de>
References: <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <200410141335.17333.arnd@arndb.de>
Message-ID: <416EA19E.40300@unix.sh>

Arnd Bergmann wrote:
> On Wednesday 13 October 2004 21:23, Linas Vepstas wrote:
>
>>On Wed, Oct 13, 2004 at 12:01:45PM -0600, Alan Robertson was heard to remark:
>
>
>>>This would be a logical equivalent to the well-known and long-standing
>>>'softdog' device driver which already has a well-known API, which is also
>>>implemented on other hardware devices and architectures.
>>>
>>>So, my suggestion would be that if it were moved to a userspace driver,
>>>that the softdog API be retained.
>>
>>I might have volunteered to hack this up real quick, were it not for
>>Mike Strosaker's correction, that the surveillance features were taken
>>out of Power5.
>
>
> FWIW, s390 linux has just added support for a hypervisor watchdog [1]
> that looks like a hardware watchdog to linux, but is implemented with
> hypercalls ("diag 0x288"). Since Power5 is typically running in hypervisor
> mode, the watchdog interface could be provided completely by the
> firmware.

The method of implementation isn't that important. Are these hypervisor calls (or equivalent) provided by power5? Is there any disadvantage to running under the hypervisor?

--
Alan Robertson

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions."
- William Wilberforce

From linas at austin.ibm.com Fri Oct 15 02:21:41 2004
From: linas at austin.ibm.com (Linas Vepstas)
Date: Thu, 14 Oct 2004 11:21:41 -0500
Subject: Hardware Watchdog Device in pSeries?
In-Reply-To: <416E0376.1010500@unix.sh>
References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <416D88AF.1010706@austin.ibm.com> <416D9E5A.9080102@unix.sh> <20041013221254.GF12237@austin.ibm.com> <416E0376.1010500@unix.sh>
Message-ID: <20041014162141.GA958@austin.ibm.com>

Hi Alan,

Long emails confuse me ...

On Wed, Oct 13, 2004 at 10:41:26PM -0600, Alan Robertson was heard to remark:
> Linas Vepstas wrote:
> >why should someone buy 12 pci-card watchdogs, one for each partition,
> >chewing up 12 pci slots, when the pSeries is already capable of doing
>
> It looks really Rube Goldberg-ish (to say the least).

[...]

>
> The hardware watchdog timer is a 3rd party
> monitoring system, and therefore is likely to be reliable when the thing it
> is watching is sick -

Not sure where you're going with this; are you saying that 3rd-party watchdog PCI cards, one for each partition, is a good idea, or a bad idea?

Would you rather have the OS monitoring done with
(a) watchdog PCI cards,
(b) with 'surveillance' done by firmware/hypervisor,
(c) or with some other method?

> The bootstrap loader should work much the

I guess I didn't get this exposition either. Although it's nice to know that boot was successful, I see boot as a whole lot less important than monitoring the system once it's gone 'online'. The boot sequence can be monitored much more loosely, with a whole lot less complexity. The hypervisor knows when the OS boot sequence starts. If the OS hasn't completely booted after, say, 10 minutes, then it can call a human to look at the problem. I don't see why one needs to heartbeat once a second during boot; that's hard to do and seems unnecessary.
By contrast, I'd expect to turn on the once-per-second heartbeat just before the system goes 'online' or 'critical'.

--linas

From alanr at unix.sh Fri Oct 15 03:34:48 2004
From: alanr at unix.sh (Alan Robertson)
Date: Thu, 14 Oct 2004 11:34:48 -0600
Subject: Hardware Watchdog Device in pSeries?
In-Reply-To: <20041014162141.GA958@austin.ibm.com>
References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <416D88AF.1010706@austin.ibm.com> <416D9E5A.9080102@unix.sh> <20041013221254.GF12237@austin.ibm.com> <416E0376.1010500@unix.sh> <20041014162141.GA958@austin.ibm.com>
Message-ID: <416EB8B8.8040601@unix.sh>

Linas Vepstas wrote:
> Hi Alan,
>
> Long emails confuse me ...
>
> On Wed, Oct 13, 2004 at 10:41:26PM -0600, Alan Robertson was heard to remark:
>
>>Linas Vepstas wrote:
>>
>>>why should someone buy 12 pci-card watchdogs, one for each partition,
>>>chewing up 12 pci slots, when the pSeries is already capable of doing
>>
>> It looks really Rube Goldberg-ish (to say the least).
>
>
> [...]
>
>>The hardware watchdog timer is a 3rd party
>>monitoring system, and therefore is likely to be reliable when the thing it
>>is watching is sick -
>
>
>
> Not sure where you're going with this; are you saying that
> 3rd-party watchdog PCI cards, one for each partition, is a
> good idea, or a bad idea?
>
> Would you rather have the OS monitoring done with
> (a) watchdog PCI cards,
> (b) with 'surveillance' done by firmware/hypervisor,
> (c) or with some other method?

I would prefer (b). Because the software and address spaces of the firmware/hypervisor are separate, it is effectively a third party reset mechanism. The test I would use is: Does failure of the thing being monitored cause or correlate to failure in the thing doing the monitoring - and the answer is "no" -- therefore it's a third-party reset.

I don't have a (c) method in mind that would work in this environment.
Evaluating (a) and (b):

Method (a):
+ is third party
- is complex and hard to configure all around (think about configuring those cards with passwords and ssh, and ip addresses and partition names and so on - also think about how many things could break and keep this from working).
- difficult to support
- doesn't scale well in any obvious way
- is relatively expensive for the customer (adds several hundred dollars for each partition - maybe as much as $1K)
- difficult to bring into existence (compared to (b))
- is ugly, kludgy, and Rube Goldberg-ish.

Method (b):
+ is third party
+ is relatively simple when compared to (a) (i.e., more reliable)
+ requires little/no special configuration to make it work
+ Shows off the advantages of pSeries architecture
+ adds no cost to the customer's solution
+ is comparatively easy to bring into existence (compared to (a))
+ is a natural and clean solution.

>> The bootstrap loader should work much the
>
>
> I guess I didn't get this exposition either.

---- OK -- as I said this is an improvement over the above - but not absolutely critical -- But I'll try explaining it again and see if giving a shorter answer helps -------

> Although it's nice to
> know that boot was successful, I see boot as a whole lot less
> important than monitoring the system once it's gone 'online'. The boot
> sequence can be monitored much more loosely, with a whole lot less
> complexity. The hypervisor knows when the OS boot sequence starts.
> If the OS hasn't completely booted after, say, 10 minutes, then it
> can call a human to look at the problem. I don't see why one needs
> to heartbeat once a second during boot; that's hard to do and seems
> unnecessary.

I didn't say anything about once a second. It could be once every 30 seconds - or even 5 minutes. That gives you lots of time, and you then only have to heartbeat in a couple of select places, and while in input loops waiting for human input.
These aren't so much periodic heartbeats as they are progress reports. If you stop making progress, you get reset.

> By contrast, I'd expect to turn on the once-per-second
> heartbeat just before the system goes 'online' or 'critical'.

This change decreases MTTR. MTTR has an effect on system availability - even in a redundant HA cluster - since MTTR determines the probability of "simultaneous" failures from which the HA system cannot recover.

Calling a human is slow and often expensive (particularly on an emergency basis). It takes minutes to hours and may result in an extra service charge from someone (depending on who gets the call, what time it is, and what arrangements are made, etc.).

A system which doesn't boot isn't providing service. If service isn't being provided, it doesn't matter why it's not being provided (OS, dump, bootstrap, BIOS, etc.)... The OS is not the only possible cause of failure. The OS is by far more likely than these others, but all software has bugs. And, hardware has transient failures as well as permanent ones.

A system with these capabilities will continue to try and provide service in the presence of (transient) errors until it succeeds, or exceeds some retry threshold, meaning a human needs to intervene and fix whatever's wrong.

This is essentially autonomic computing for the boot process.

In short: With this architecture, the system will come up and provide service, or it is broken so badly that retrying won't help and a human really is needed. Otherwise, no recovery will be performed for errors which keep the system from coming up (after a crash or otherwise) and some outages may be unnecessarily prolonged.

If your availability is poor, this will make zero difference. If your availability is very good, this helps a little. And, when your availability is very good, it's hard to find things that help even a little...
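
To put rough numbers on the MTTR point, here is a small sketch in C. It uses the usual steady-state formula for a single node, availability = MTBF / (MTBF + MTTR); it does not model the redundant-cluster case, and the function names and any figures plugged into it are made up for illustration.

```c
/* Steady-state availability of a single node: MTBF / (MTBF + MTTR).
 * Both arguments are in hours; the result is a fraction between 0 and 1. */
double availability(double mtbf_hours, double mttr_hours)
{
    return mtbf_hours / (mtbf_hours + mttr_hours);
}

/* Expected downtime in minutes per 365-day year for the same node. */
double downtime_minutes_per_year(double mtbf_hours, double mttr_hours)
{
    const double minutes_per_year = 365.0 * 24.0 * 60.0;

    return (1.0 - availability(mtbf_hours, mttr_hours)) * minutes_per_year;
}
```

With a hypothetical MTBF of 1000 hours, cutting MTTR from an hour (waiting for a human) to a few minutes (an automatic reset and retry) removes most of the yearly downtime, which is the effect described above.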
Of course, being able to say "autonomic computing wired into the lowest levels of the system" probably has marketing value beyond the small amount of improved availability it provides ;-)

[[If this system is running the air traffic control system while I'm in the air, I vote for adding this feature ;-)]].

--
Alan Robertson

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce

From alanr at unix.sh Fri Oct 15 07:19:17 2004
From: alanr at unix.sh (Alan Robertson)
Date: Thu, 14 Oct 2004 15:19:17 -0600
Subject: My use of the term "3rd party"
In-Reply-To: <416E0376.1010500@unix.sh>
References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <416D88AF.1010706@austin.ibm.com> <416D9E5A.9080102@unix.sh> <20041013221254.GF12237@austin.ibm.com> <416E0376.1010500@unix.sh>
Message-ID: <416EED55.7050200@unix.sh>

I just realized that this term has a different meaning to many people than it does to me in this context.

I meant that it was independent of the thing it was monitoring. That is, that its probability of failure is an independent random variable with respect to the thing it is measuring.

In other words, the failure of the watchdog timer is uncorrelated to failures of the operating system or other user of the watchdog timer.

I did *not* mean that you had to buy it from a 3rd party hardware manufacturer.

My apologies for what was probably a poor choice of terminology.

--
Alan Robertson

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce

From benh at kernel.crashing.org Fri Oct 15 19:16:32 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 15 Oct 2004 19:16:32 +1000
Subject: Fan control for PowerMac7_3
Message-ID: <1097831790.1131.111.camel@gaston>

Hi !

This is an experimental (read: totally untested) patch to the G5 fan control code. All I know is that it builds :)

It should add proper support for all desktop G5s including liquid cooling. I suggest you run it with debug enabled (#undef DEBUG -> #define DEBUG in the beginning of the .c file) and send me the output though :)

It does _NOT_ add support for the Xserve yet !

People who already have working cooling don't _need_ to test. They are welcome to do it though in case I broke something, but only send me the output if you feel something is wrong ...

Should apply on top of current bk.

Ben.

diff -urN linux-2.5/drivers/macintosh/therm_pm72.c linux-pogo/drivers/macintosh/therm_pm72.c
--- linux-2.5/drivers/macintosh/therm_pm72.c 2004-09-24 14:34:05.000000000 +1000
+++ linux-pogo/drivers/macintosh/therm_pm72.c 2004-10-15 19:09:05.000000000 +1000
@@ -46,6 +46,8 @@
  * overtemp conditions so userland can take some policy
  * decisions, like slewing down CPUs
  * - Deal with fan and i2c failures in a better way
+ * - Maybe do a generic PID based on params used for
+ * U3 and Drives ?
  *
  * History:
  *
@@ -73,6 +75,13 @@
  * values in the configuration register
  * - Switch back to use of target fan speed for PID, thus lowering
  * pressure on i2c
+ *
+ * Oct.
15, 2004 : 1.1b1 (beta) + * - Add device-tree lookup for fan IDs, should detect liquid cooling + * pumps when present + * - Enable driver for PowerMac7,3 machines + * - Split the U3/Backside cooling on U3 & U3H versions as Darwin does + * - Add new CPU cooling algorithm for machines with liquid cooling */ #include @@ -101,7 +110,7 @@ #include "therm_pm72.h" -#define VERSION "0.9" +#define VERSION "1.1b1" #undef DEBUG @@ -121,16 +130,100 @@ static struct i2c_adapter * u3_1; static struct i2c_client * fcu; static struct cpu_pid_state cpu_state[2]; +static struct basckside_pid_params backside_params; static struct backside_pid_state backside_state; static struct drives_pid_state drives_state; static int state; static int cpu_count; +static int cpu_pid_type; static pid_t ctrl_task; static struct completion ctrl_complete; static int critical_state; static DECLARE_MUTEX(driver_lock); /* + * We have 2 types of CPU PID control. One is "split" old style control + * for intake & exhaust fans, the other is "combined" control for both + * CPUs that also deals with the pumps when present. To be "compatible" + * with OS X at this point, we only use "COMBINED" on the machines that + * are identified as having the pumps (though that identification is at + * least dodgy). Ultimately, we could probably switch completely to this + * algorithm provided we hack it to deal with the UP case + */ +#define CPU_PID_TYPE_SPLIT 0 +#define CPU_PID_TYPE_COMBINED 1 + +/* + * This table describes all fans in the FCU. The "id" and "type" values + * are defaults valid for all earlier machines. 
Newer machines will + * eventually override the table content based on the device-tree + */ +struct fcu_fan_table +{ + char* loc; /* location code */ + int type; /* 0 = rpm, 1 = pwm, 2 = pump */ + int id; /* id or -1 */ +}; + +#define FCU_FAN_RPM 0 +#define FCU_FAN_PWM 1 + +#define FCU_FAN_ABSENT_ID -1 + +#define FCU_FAN_COUNT ARRAY_SIZE(fcu_fans) + +struct fcu_fan_table fcu_fans[] = { + [BACKSIDE_FAN_PWM_INDEX] = { + .loc = "BACKSIDE", + .type = FCU_FAN_PWM, + .id = BACKSIDE_FAN_PWM_DEFAULT_ID, + }, + [DRIVES_FAN_RPM_INDEX] = { + .loc = "DRIVE BAY", + .type = FCU_FAN_RPM, + .id = DRIVES_FAN_RPM_DEFAULT_ID, + }, + [SLOTS_FAN_PWM_INDEX] = { + .loc = "SLOT", + .type = FCU_FAN_PWM, + .id = SLOTS_FAN_PWM_DEFAULT_ID, + }, + [CPUA_INTAKE_FAN_RPM_INDEX] = { + .loc = "CPU A INTAKE", + .type = FCU_FAN_RPM, + .id = CPUA_INTAKE_FAN_RPM_DEFAULT_ID, + }, + [CPUA_EXHAUST_FAN_RPM_INDEX] = { + .loc = "CPU A EXHAUST", + .type = FCU_FAN_RPM, + .id = CPUA_EXHAUST_FAN_RPM_DEFAULT_ID, + }, + [CPUB_INTAKE_FAN_RPM_INDEX] = { + .loc = "CPU B INTAKE", + .type = FCU_FAN_RPM, + .id = CPUB_INTAKE_FAN_RPM_DEFAULT_ID, + }, + [CPUB_EXHAUST_FAN_RPM_INDEX] = { + .loc = "CPU B EXHAUST", + .type = FCU_FAN_RPM, + .id = CPUB_EXHAUST_FAN_RPM_DEFAULT_ID, + }, + /* pumps aren't present by default, have to be looked up in the + * device-tree + */ + [CPUA_PUMP_RPM_INDEX] = { + .loc = "CPU A PUMP", + .type = FCU_FAN_RPM, + .id = FCU_FAN_ABSENT_ID, + }, + [CPUB_PUMP_RPM_INDEX] = { + .loc = "CPU B PUMP", + .type = FCU_FAN_RPM, + .id = FCU_FAN_ABSENT_ID, + }, +}; + +/* * i2c_driver structure to attach to the host i2c controller */ @@ -331,10 +424,16 @@ return 0; } -static int set_rpm_fan(int fan, int rpm) +static int set_rpm_fan(int fan_index, int rpm) { unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_RPM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; if (rpm < 300) rpm = 300; @@ -342,43 +441,55 @@ rpm = 8191; buf[0] 
= rpm >> 5; buf[1] = rpm << 3; - rc = fan_write_reg(0x10 + (fan * 2), buf, 2); + rc = fan_write_reg(0x10 + (id * 2), buf, 2); if (rc < 0) return -EIO; return 0; } -static int get_rpm_fan(int fan, int programmed) +static int get_rpm_fan(int fan_index, int programmed) { unsigned char failure; unsigned char active; unsigned char buf[2]; - int rc, reg_base; + int rc, id, reg_base; + + if (fcu_fans[fan_index].type != FCU_FAN_RPM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; rc = fan_read_reg(0xb, &failure, 1); if (rc != 1) return -EIO; - if ((failure & (1 << fan)) != 0) + if ((failure & (1 << id)) != 0) return -EFAULT; rc = fan_read_reg(0xd, &active, 1); if (rc != 1) return -EIO; - if ((active & (1 << fan)) == 0) + if ((active & (1 << id)) == 0) return -ENXIO; /* Programmed value or real current speed */ reg_base = programmed ? 0x10 : 0x11; - rc = fan_read_reg(reg_base + (fan * 2), buf, 2); + rc = fan_read_reg(reg_base + (id * 2), buf, 2); if (rc != 2) return -EIO; return (buf[0] << 5) | buf[1] >> 3; } -static int set_pwm_fan(int fan, int pwm) +static int set_pwm_fan(int fan_index, int pwm) { unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_PWM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; if (pwm < 10) pwm = 10; @@ -386,32 +497,38 @@ pwm = 100; pwm = (pwm * 2559) / 1000; buf[0] = pwm; - rc = fan_write_reg(0x30 + (fan * 2), buf, 1); + rc = fan_write_reg(0x30 + (id * 2), buf, 1); if (rc < 0) return rc; return 0; } -static int get_pwm_fan(int fan) +static int get_pwm_fan(int fan_index) { unsigned char failure; unsigned char active; unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_PWM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; rc = fan_read_reg(0x2b, &failure, 1); if (rc != 1) return -EIO; - if ((failure & (1 << fan)) != 0) + if 
((failure & (1 << id)) != 0) return -EFAULT; rc = fan_read_reg(0x2d, &active, 1); if (rc != 1) return -EIO; - if ((active & (1 << fan)) == 0) + if ((active & (1 << id)) == 0) return -ENXIO; /* Programmed value or real current speed */ - rc = fan_read_reg(0x30 + (fan * 2), buf, 1); + rc = fan_read_reg(0x30 + (id * 2), buf, 1); if (rc != 1) return -EIO; @@ -513,80 +630,84 @@ /* * CPUs fans control loop */ -static void do_monitor_cpu(struct cpu_pid_state *state) + +static int do_read_one_cpu_values(struct cpu_pid_state *state, s32 *temp, s32 *power) { - s32 temp, voltage, current_a, power, power_target; - s32 integral, derivative, proportional, adj_in_target, sval; - s64 integ_p, deriv_p, prop_p, sum; - int i, intake, rc; + s32 ltemp, volts, amps; + int rc = 0; - DBG("cpu %d:\n", state->index); + /* Default (in case of error) */ + *temp = state->cur_temp; + *power = state->cur_power; /* Read current fan status */ if (state->index == 0) - rc = get_rpm_fan(CPUA_EXHAUST_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); else - rc = get_rpm_fan(CPUB_EXHAUST_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); if (rc < 0) { - printk(KERN_WARNING "Error %d reading CPU %d exhaust fan !\n", - rc, state->index); - /* XXX What do we do now ? */ - } else + /* XXX What do we do now ? Nothing for now, keep old value, but + * return error upstream + */ + DBG(" cpu %d, fan reading error !\n", state->index); + } else { state->rpm = rc; - DBG(" current rpm: %d\n", state->rpm); + DBG(" cpu %d, exhaust RPM: %d\n", state->rpm); + } /* Get some sensor readings and scale it */ - temp = read_smon_adc(state, 1); - if (temp == -1) { + ltemp = read_smon_adc(state, 1); + if (ltemp == -1) { + /* XXX What do we do now ? 
*/ state->overtemp++; - return; + if (rc == 0) + rc = -EIO; + DBG(" cpu %d, temp reading error !\n", state->index); + } else { + /* Fixup temperature according to diode calibration + */ + DBG(" cpu %d, temp raw: %04x, m_diode: %04x, b_diode: %04x\n", + state->index, + ltemp, state->mpu.mdiode, state->mpu.bdiode); + *temp = ((s32)ltemp * (s32)state->mpu.mdiode + ((s32)state->mpu.bdiode << 12)) >> 2; + state->last_temp = *temp; + DBG(" temp: %d.%03d\n", FIX32TOPRINT((*temp))); } - voltage = read_smon_adc(state, 3); - current_a = read_smon_adc(state, 4); - /* Fixup temperature according to diode calibration + /* + * Read voltage & current and calculate power */ - DBG(" temp raw: %04x, m_diode: %04x, b_diode: %04x\n", - temp, state->mpu.mdiode, state->mpu.bdiode); - temp = ((s32)temp * (s32)state->mpu.mdiode + ((s32)state->mpu.bdiode << 12)) >> 2; - state->last_temp = temp; - DBG(" temp: %d.%03d\n", FIX32TOPRINT(temp)); + volts = read_smon_adc(state, 3); + amps = read_smon_adc(state, 4); - /* Check tmax, increment overtemp if we are there. At tmax+8, we go - * full blown immediately and try to trigger a shutdown - */ - if (temp >= ((state->mpu.tmax + 8) << 16)) { - printk(KERN_WARNING "Warning ! 
CPU %d temperature way above maximum" - " (%d) !\n", - state->index, temp >> 16); - state->overtemp = CPU_MAX_OVERTEMP; - } else if (temp > (state->mpu.tmax << 16)) - state->overtemp++; - else - state->overtemp = 0; - if (state->overtemp >= CPU_MAX_OVERTEMP) - critical_state = 1; - if (state->overtemp > 0) { - state->rpm = state->mpu.rmaxn_exhaust_fan; - state->intake_rpm = intake = state->mpu.rmaxn_intake_fan; - goto do_set_fans; - } - - /* Scale other sensor values according to fixed scales + /* Scale voltage and current raw sensor values according to fixed scales * obtained in Darwin and calculate power from I and V */ - state->voltage = voltage *= ADC_CPU_VOLTAGE_SCALE; - state->current_a = current_a *= ADC_CPU_CURRENT_SCALE; - power = (((u64)current_a) * ((u64)voltage)) >> 16; + volts *= ADC_CPU_VOLTAGE_SCALE; + amps *= ADC_CPU_CURRENT_SCALE; + *power = (((u64)volts) * ((u64)amps)) >> 16; + state->voltage = volts; + state->current_a = amps; + state->last_power = *power; + + DBG(" cpu %d, current: %d.%03d, voltage: %d.%03d, power: %d.%03d W\n", + state->index, FIX32TOPRINT(current_a), FIX32TOPRINT(voltage), + FIX32TOPRINT(*power)); + + return 0; +} + +static void do_cpu_pid(struct cpu_pid_state *state, s32 temp, s32 power) +{ + s32 power_target, integral, derivative, proportional, adj_in_target, sval; + s64 integ_p, deriv_p, prop_p, sum; + int i; /* Calculate power target value (could be done once for all) * and convert to a 16.16 fp number */ power_target = ((u32)(state->mpu.pmaxh - state->mpu.padjmax)) << 16; - - DBG(" current: %d.%03d, voltage: %d.%03d\n", - FIX32TOPRINT(current_a), FIX32TOPRINT(voltage)); - DBG(" power: %d.%03d W, target: %d.%03d, error: %d.%03d\n", FIX32TOPRINT(power), + DBG(" power target: %d.%03d, error: %d.%03d\n", FIX32TOPRINT(power_target), FIX32TOPRINT(power_target - power)); /* Store temperature and power in history array */ @@ -659,6 +780,127 @@ state->rpm = state->mpu.rminn_exhaust_fan; if (state->rpm > 
state->mpu.rmaxn_exhaust_fan) state->rpm = state->mpu.rmaxn_exhaust_fan; +} + +static void do_monitor_cpu_combined(void) +{ + struct cpu_pid_state *state0 = &cpu_state[0]; + struct cpu_pid_state *state1 = &cpu_state[1]; + s32 temp0, power0, temp1, power1; + s32 temp_combi, power_combi; + int rc, intake, pump; + + rc = do_read_one_cpu_values(state0, &temp0, &power0); + if (rc < 0) { + /* XXX What do we do now ? */ + } + state1->overtemp = 0; + rc = do_read_one_cpu_values(state1, &temp1, &power1); + if (rc < 0) { + /* XXX What do we do now ? */ + } + if (state1->overtemp) + state0->overtemp++; + + temp_combi = max(temp0, temp1); + power_combi = max(power0, power1); + + /* Check tmax, increment overtemp if we are there. At tmax+8, we go + * full blown immediately and try to trigger a shutdown + */ + if (temp_combi >= ((state0->mpu.tmax + 8) << 16)) { + printk(KERN_WARNING "Warning ! Temperature way above maximum (%d) !\n", + temp_combi >> 16); + state0->overtemp = CPU_MAX_OVERTEMP; + } else if (temp_combi > (state0->mpu.tmax << 16)) + state0->overtemp++; + else + state0->overtemp = 0; + if (state0->overtemp >= CPU_MAX_OVERTEMP) + critical_state = 1; + if (state0->overtemp > 0) { + state0->rpm = state0->mpu.rmaxn_exhaust_fan; + state0->intake_rpm = intake = state0->mpu.rmaxn_intake_fan; + pump = CPU_PUMP_OUTPUT_MAX; + goto do_set_fans; + } + + /* Do the PID */ + do_cpu_pid(state0, temp_combi, power_combi); + + /* Calculate intake fan speed */ + intake = (state0->rpm * CPU_INTAKE_SCALE) >> 16; + if (intake < state0->mpu.rminn_intake_fan) + intake = state0->mpu.rminn_intake_fan; + if (intake > state0->mpu.rmaxn_intake_fan) + intake = state0->mpu.rmaxn_intake_fan; + state0->intake_rpm = intake; + + /* Calculate pump speed */ + pump = (state0->rpm * CPU_PUMP_OUTPUT_MAX) / + state0->mpu.rmaxn_exhaust_fan; + if (pump > CPU_PUMP_OUTPUT_MAX) + pump = CPU_PUMP_OUTPUT_MAX; + if (pump < CPU_PUMP_OUTPUT_MIN) + pump = CPU_PUMP_OUTPUT_MIN; + + do_set_fans: + /* We copy values from 
state 0 to state 1 for /sysfs */ + state1->rpm = state0->rpm; + state1->intake_rpm = state0->intake_rpm; + + DBG("** CPU %d RPM: %d Ex, %d, Pump: %d, In, overtemp: %d\n", + state->index, (int)state->rpm, intake, pump, state->overtemp); + + /* We should check for errors, shouldn't we ? But then, what + * do we do once the error occurs ? For FCU notified fan + * failures (-EFAULT) we probably want to notify userland + * some way... + */ + set_rpm_fan(CPUA_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, state0->rpm); + set_rpm_fan(CPUB_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, state0->rpm); + + if (fcu_fans[CPUA_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) + set_rpm_fan(CPUA_PUMP_RPM_INDEX, pump); + if (fcu_fans[CPUB_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) + set_rpm_fan(CPUB_PUMP_RPM_INDEX, pump); +} + +static void do_monitor_cpu_split(struct cpu_pid_state *state) +{ + s32 temp, power; + int rc, intake; + + /* Read current fan status */ + rc = do_read_one_cpu_values(state, &temp, &power); + if (rc < 0) { + /* XXX What do we do now ? */ + } + + /* Check tmax, increment overtemp if we are there. At tmax+8, we go + * full blown immediately and try to trigger a shutdown + */ + if (temp >= ((state->mpu.tmax + 8) << 16)) { + printk(KERN_WARNING "Warning ! CPU %d temperature way above maximum" + " (%d) !\n", + state->index, temp >> 16); + state->overtemp = CPU_MAX_OVERTEMP; + } else if (temp > (state->mpu.tmax << 16)) + state->overtemp++; + else + state->overtemp = 0; + if (state->overtemp >= CPU_MAX_OVERTEMP) + critical_state = 1; + if (state->overtemp > 0) { + state->rpm = state->mpu.rmaxn_exhaust_fan; + state->intake_rpm = intake = state->mpu.rmaxn_intake_fan; + goto do_set_fans; + } + + /* Do the PID */ + do_cpu_pid(state, temp, power); intake = (state->rpm * CPU_INTAKE_SCALE) >> 16; if (intake < state->mpu.rminn_intake_fan) @@ -677,11 +919,11 @@ * some way... 
*/ if (state->index == 0) { - set_rpm_fan(CPUA_INTAKE_FAN_RPM_ID, intake); - set_rpm_fan(CPUA_EXHAUST_FAN_RPM_ID, state->rpm); + set_rpm_fan(CPUA_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, state->rpm); } else { - set_rpm_fan(CPUB_INTAKE_FAN_RPM_ID, intake); - set_rpm_fan(CPUB_EXHAUST_FAN_RPM_ID, state->rpm); + set_rpm_fan(CPUB_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, state->rpm); } } @@ -696,6 +938,7 @@ state->overtemp = 0; state->adc_config = 0x00; + if (index == 0) state->monitor = attach_i2c_chip(SUPPLY_MONITOR_ID, "CPU0_monitor"); else if (index == 1) @@ -778,7 +1021,7 @@ DBG("backside:\n"); /* Check fan status */ - rc = get_pwm_fan(BACKSIDE_FAN_PWM_ID); + rc = get_pwm_fan(BACKSIDE_FAN_PWM_INDEX); if (rc < 0) { printk(KERN_WARNING "Error %d reading backside fan !\n", rc); /* XXX What do we do now ? */ @@ -790,12 +1033,12 @@ temp = i2c_smbus_read_byte_data(state->monitor, MAX6690_EXT_TEMP) << 16; state->last_temp = temp; DBG(" temp: %d.%03d, target: %d.%03d\n", FIX32TOPRINT(temp), - FIX32TOPRINT(BACKSIDE_PID_INPUT_TARGET)); + FIX32TOPRINT(backside_params.input_target)); /* Store temperature and error in history array */ state->cur_sample = (state->cur_sample + 1) % BACKSIDE_PID_HISTORY_SIZE; state->sample_history[state->cur_sample] = temp; - state->error_history[state->cur_sample] = temp - BACKSIDE_PID_INPUT_TARGET; + state->error_history[state->cur_sample] = temp - backside_params.input_target; /* If first loop, fill the history table */ if (state->first) { @@ -804,7 +1047,7 @@ BACKSIDE_PID_HISTORY_SIZE; state->sample_history[state->cur_sample] = temp; state->error_history[state->cur_sample] = - temp - BACKSIDE_PID_INPUT_TARGET; + temp - backside_params.input_target; } state->first = 0; } @@ -816,7 +1059,7 @@ integral += state->error_history[i]; integral *= BACKSIDE_PID_INTERVAL; DBG(" integral: %08x\n", integral); - integ_p = ((s64)BACKSIDE_PID_G_r) * (s64)integral; + integ_p = 
((s64)backside_params.G_r) * (s64)integral; DBG(" integ_p: %d\n", (int)(integ_p >> 36)); sum += integ_p; @@ -825,12 +1068,12 @@ state->error_history[(state->cur_sample + BACKSIDE_PID_HISTORY_SIZE - 1) % BACKSIDE_PID_HISTORY_SIZE]; derivative /= BACKSIDE_PID_INTERVAL; - deriv_p = ((s64)BACKSIDE_PID_G_d) * (s64)derivative; + deriv_p = ((s64)backside_params.G_d) * (s64)derivative; DBG(" deriv_p: %d\n", (int)(deriv_p >> 36)); sum += deriv_p; /* Calculate the proportional term */ - prop_p = ((s64)BACKSIDE_PID_G_p) * (s64)(state->error_history[state->cur_sample]); + prop_p = ((s64)backside_params.G_p) * (s64)(state->error_history[state->cur_sample]); DBG(" prop_p: %d\n", (int)(prop_p >> 36)); sum += prop_p; @@ -839,13 +1082,13 @@ DBG(" sum: %d\n", (int)sum); state->pwm += (s32)sum; - if (state->pwm < BACKSIDE_PID_OUTPUT_MIN) - state->pwm = BACKSIDE_PID_OUTPUT_MIN; - if (state->pwm > BACKSIDE_PID_OUTPUT_MAX) - state->pwm = BACKSIDE_PID_OUTPUT_MAX; + if (state->pwm < backside_params.output_min) + state->pwm = backside_params.output_min; + if (state->pwm > backside_params.output_max) + state->pwm = backside_params.output_max; DBG("** BACKSIDE PWM: %d\n", (int)state->pwm); - set_pwm_fan(BACKSIDE_FAN_PWM_ID, state->pwm); + set_pwm_fan(BACKSIDE_FAN_PWM_INDEX, state->pwm); } /* @@ -853,6 +1096,35 @@ */ static int init_backside_state(struct backside_pid_state *state) { + struct device_node *u3; + int u3h = 1; /* conservative by default */ + + /* + * There are different PID params for machines with U3 and machines + * with U3H, pick the right ones now + */ + u3 = of_find_node_by_path("/u3"); + if (u3 != NULL) { + u32 *vers = (u32 *)get_property(u3, "device-rev", NULL); + if (vers) + if (((*vers) & 0x3f) < 0x34) + u3h = 0; + of_node_put(u3); + } + + backside_params.G_p = BACKSIDE_PID_G_p; + backside_params.G_r = BACKSIDE_PID_G_r; + backside_params.output_max = BACKSIDE_PID_OUTPUT_MAX; + if (u3h) { + backside_params.G_d = BACKSIDE_PID_U3H_G_d; + backside_params.input_target = 
BACKSIDE_PID_U3H_INPUT_TARGET; + backside_params.output_min = BACKSIDE_PID_U3H_OUTPUT_MIN; + } else { + backside_params.G_d = BACKSIDE_PID_U3_G_d; + backside_params.input_target = BACKSIDE_PID_U3_INPUT_TARGET; + backside_params.output_min = BACKSIDE_PID_U3_OUTPUT_MIN; + } + state->ticks = 1; state->first = 1; state->pwm = 50; @@ -898,7 +1170,7 @@ DBG("drives:\n"); /* Check fan status */ - rc = get_rpm_fan(DRIVES_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(DRIVES_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); if (rc < 0) { printk(KERN_WARNING "Error %d reading drives fan !\n", rc); /* XXX What do we do now ? */ @@ -965,7 +1237,7 @@ state->rpm = DRIVES_PID_OUTPUT_MAX; DBG("** DRIVES RPM: %d\n", (int)state->rpm); - set_rpm_fan(DRIVES_FAN_RPM_ID, state->rpm); + set_rpm_fan(DRIVES_FAN_RPM_INDEX, state->rpm); } /* @@ -1032,7 +1304,7 @@ } /* Set the PCI fan once for now */ - set_pwm_fan(SLOTS_FAN_PWM_ID, SLOTS_FAN_DEFAULT_PWM); + set_pwm_fan(SLOTS_FAN_PWM_INDEX, SLOTS_FAN_DEFAULT_PWM); /* Initialize ADCs */ initialize_adc(&cpu_state[0]); @@ -1047,9 +1319,13 @@ start = jiffies; down(&driver_lock); - do_monitor_cpu(&cpu_state[0]); - if (cpu_state[1].monitor != NULL) - do_monitor_cpu(&cpu_state[1]); + if (cpu_pid_type == CPU_PID_TYPE_COMBINED) + do_monitor_cpu_combined(); + else { + do_monitor_cpu_split(&cpu_state[0]); + if (cpu_state[1].monitor != NULL) + do_monitor_cpu_split(&cpu_state[1]); + } do_monitor_backside(&backside_state); do_monitor_drives(&drives_state); up(&driver_lock); @@ -1113,6 +1389,19 @@ DBG("counted %d CPUs in the device-tree\n", cpu_count); + /* Decide the type of PID algorithm to use based on the presence of + * the pumps, though that may not be the best way, that is good enough + * for now + */ + if (machine_is_compatible("PowerMac7,3") + && (cpu_count > 1) + && fcu_fans[CPUA_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID + && fcu_fans[CPUB_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) { + printk(KERN_INFO "Liquid cooling pumps detected, using new 
algorithm !\n"); + cpu_pid_type = CPU_PID_TYPE_COMBINED; + } else + cpu_pid_type = CPU_PID_TYPE_SPLIT; + /* Create control loops for everything. If any fail, everything * fails */ @@ -1257,12 +1546,91 @@ return 0; } +static void fcu_lookup_fans(struct device_node *fcu_node) +{ + struct device_node *np = NULL; + int i; + + /* The table is filled by default with values that are suitable + * for the old machines without device-tree informations. We scan + * the device-tree and override those values with whatever is + * there + */ + + DBG("Looking up FCU controls in device-tree...\n"); + + while ((np = of_get_next_child(fcu_node, np)) != NULL) { + int type = -1; + char *loc; + u32 *reg; + + DBG(" control: %s, type: %s\n", np->name, np->type); + + /* Detect control type */ + if (!strcmp(np->type, "fan-rpm-control") || + !strcmp(np->type, "fan-rpm")) + type = FCU_FAN_RPM; + if (!strcmp(np->type, "fan-pwm-control") || + !strcmp(np->type, "fan-pwm")) + type = FCU_FAN_PWM; + /* Only care about fans for now */ + if (type == -1) + continue; + + /* Lookup for a matching location */ + loc = (char *)get_property(np, "location", NULL); + reg = (u32 *)get_property(np, "reg", NULL); + if (loc == NULL || reg == NULL) + continue; + DBG(" matching location: %s, reg: 0x%08x\n", loc, *reg); + + for (i = 0; i < FCU_FAN_COUNT; i++) { + int fan_id; + + if (strcmp(loc, fcu_fans[i].loc)) + continue; + DBG(" location match, index: %d\n", i); + fcu_fans[i].id = FCU_FAN_ABSENT_ID; + if (type != fcu_fans[i].type) { + printk(KERN_WARNING "therm_pm72: Fan type mismatch " + "in device-tree for %s\n", np->full_name); + break; + } + if (type == FCU_FAN_RPM) + fan_id = ((*reg) / 2) - 0x10; + else + fan_id = ((*reg) / 2) - 0x30; + if (fan_id > 7) { + printk(KERN_WARNING "therm_pm72: Can't parse " + "fan ID in device-tree for %s\n", np->full_name); + break; + } + DBG(" fan id -> %d, type -> %d\n", fan_id, type); + fcu_fans[i].id = fan_id; + } + } + + /* Now dump the array */ + printk(KERN_INFO "Detected 
fan controls:\n"); + for (i = 0; i < FCU_FAN_COUNT; i++) { + if (fcu_fans[i].id == FCU_FAN_ABSENT_ID) + continue; + printk(KERN_INFO " %d: %s fan, id %d, location: %s\n", i, + fcu_fans[i].type == FCU_FAN_RPM ? "RPM" : "PWM", + fcu_fans[i].id, fcu_fans[i].loc); + } +} + static int fcu_of_probe(struct of_device* dev, const struct of_match *match) { int rc; state = state_detached; + /* Lookup the fans in the device tree */ + fcu_lookup_fans(dev->node); + + /* Add the driver */ rc = i2c_add_driver(&therm_pm72_driver); if (rc < 0) return rc; @@ -1301,7 +1669,8 @@ { struct device_node *np; - if (!machine_is_compatible("PowerMac7,2")) + if (!machine_is_compatible("PowerMac7,2") && + !machine_is_compatible("PowerMac7,3")) return -ENODEV; printk(KERN_INFO "PowerMac G5 Thermal control driver %s\n", VERSION); diff -urN linux-2.5/drivers/macintosh/therm_pm72.h linux-pogo/drivers/macintosh/therm_pm72.h --- linux-2.5/drivers/macintosh/therm_pm72.h 2004-09-24 14:34:05.000000000 +1000 +++ linux-pogo/drivers/macintosh/therm_pm72.h 2004-10-15 18:58:22.000000000 +1000 @@ -119,18 +119,33 @@ #define ADC_CPU_CURRENT_SCALE 0x1f40 /* _AD4 */ /* - * PID factors for the U3/Backside fan control loop + * PID factors for the U3/Backside fan control loop. 
We have 2 sets + * of values here, one set for U3 and one set for U3H */ -#define BACKSIDE_FAN_PWM_ID 1 -#define BACKSIDE_PID_G_d 0x02800000 +#define BACKSIDE_FAN_PWM_DEFAULT_ID 1 +#define BACKSIDE_FAN_PWM_INDEX 0 +#define BACKSIDE_PID_U3_G_d 0x02800000 +#define BACKSIDE_PID_U3H_G_d 0x01400000 #define BACKSIDE_PID_G_p 0x00500000 #define BACKSIDE_PID_G_r 0x00000000 -#define BACKSIDE_PID_INPUT_TARGET 0x00410000 +#define BACKSIDE_PID_U3_INPUT_TARGET 0x00410000 +#define BACKSIDE_PID_U3H_INPUT_TARGET 0x004b0000 #define BACKSIDE_PID_INTERVAL 5 #define BACKSIDE_PID_OUTPUT_MAX 100 -#define BACKSIDE_PID_OUTPUT_MIN 20 +#define BACKSIDE_PID_U3_OUTPUT_MIN 20 +#define BACKSIDE_PID_U3H_OUTPUT_MIN 30 #define BACKSIDE_PID_HISTORY_SIZE 2 +struct basckside_pid_params +{ + u32 G_d; + u32 G_p; + u32 G_r; + u32 input_target; + u32 output_min; + u32 output_max; +}; + struct backside_pid_state { int ticks; @@ -146,7 +161,8 @@ /* * PID factors for the Drive Bay fan control loop */ -#define DRIVES_FAN_RPM_ID 2 +#define DRIVES_FAN_RPM_DEFAULT_ID 2 +#define DRIVES_FAN_RPM_INDEX 1 #define DRIVES_PID_G_d 0x01e00000 #define DRIVES_PID_G_p 0x00500000 #define DRIVES_PID_G_r 0x00000000 @@ -168,7 +184,8 @@ int first; }; -#define SLOTS_FAN_PWM_ID 2 +#define SLOTS_FAN_PWM_DEFAULT_ID 2 +#define SLOTS_FAN_PWM_INDEX 2 #define SLOTS_FAN_DEFAULT_PWM 50 /* Do better here ! 
*/

 /*
@@ -191,10 +208,15 @@
  * CPU B FAKE POWER 49 (I_V_inputs: 18, 19)
  */

-#define CPUA_INTAKE_FAN_RPM_ID 3
-#define CPUA_EXHAUST_FAN_RPM_ID 4
-#define CPUB_INTAKE_FAN_RPM_ID 5
-#define CPUB_EXHAUST_FAN_RPM_ID 6
+#define CPUA_INTAKE_FAN_RPM_DEFAULT_ID 3
+#define CPUA_EXHAUST_FAN_RPM_DEFAULT_ID 4
+#define CPUB_INTAKE_FAN_RPM_DEFAULT_ID 5
+#define CPUB_EXHAUST_FAN_RPM_DEFAULT_ID 6
+
+#define CPUA_INTAKE_FAN_RPM_INDEX 3
+#define CPUA_EXHAUST_FAN_RPM_INDEX 4
+#define CPUB_INTAKE_FAN_RPM_INDEX 5
+#define CPUB_EXHAUST_FAN_RPM_INDEX 6

 #define CPU_INTAKE_SCALE 0x0000f852
 #define CPU_TEMP_HISTORY_SIZE 2
@@ -202,6 +224,11 @@
 #define CPU_PID_INTERVAL 1
 #define CPU_MAX_OVERTEMP 30

+#define CPUA_PUMP_RPM_INDEX 7
+#define CPUB_PUMP_RPM_INDEX 8
+#define CPU_PUMP_OUTPUT_MAX 3700
+#define CPU_PUMP_OUTPUT_MIN 1000
+
 struct cpu_pid_state
 {
  int index;
@@ -219,6 +246,7 @@
  s32 voltage;
  s32 current_a;
  s32 last_temp;
+ s32 last_power;
  int first;
  u8 adc_config;
 };

From benh at kernel.crashing.org Fri Oct 15 19:19:42 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 15 Oct 2004 19:19:42 +1000
Subject: Wrong patch! (Re: Fan control for PowerMac7_3)
In-Reply-To: <1097831790.1131.111.camel@gaston>
References: <1097831790.1131.111.camel@gaston>
Message-ID: <1097831981.1131.113.camel@gaston>

On Fri, 2004-10-15 at 19:16, Benjamin Herrenschmidt wrote:
> Hi !
>
> This is an experimental (read: totally untested) patch to the G5 fan
> control code. All I know is that it builds :)

And I sent a wrong version ... sorry, the good one in a few minutes.

Ben.

From benh at kernel.crashing.org Fri Oct 15 19:20:50 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 15 Oct 2004 19:20:50 +1000
Subject: Fan control for PowerMac7_3
In-Reply-To: <1097831981.1131.113.camel@gaston>
References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston>
Message-ID: <1097832049.1149.115.camel@gaston>

On Fri, 2004-10-15 at 19:19, Benjamin Herrenschmidt wrote:
> On Fri, 2004-10-15 at 19:16, Benjamin Herrenschmidt wrote:
> > Hi !
> >
> > This is an experimental (read: totally untested) patch to the G5 fan
> > control code. All I know is that it builds :)
>
> And I sent a wrong version ... sorry, the good one in a few minutes.

Here it is:

diff -urN linux-2.5/drivers/macintosh/therm_pm72.c linux-pogo/drivers/macintosh/therm_pm72.c
--- linux-2.5/drivers/macintosh/therm_pm72.c 2004-09-24 14:34:05.000000000 +1000
+++ linux-pogo/drivers/macintosh/therm_pm72.c 2004-10-15 19:20:06.000000000 +1000
@@ -46,6 +46,8 @@
  * overtemp conditions so userland can take some policy
  * decisions, like slewing down CPUs
  * - Deal with fan and i2c failures in a better way
+ * - Maybe do a generic PID based on params used for
+ * U3 and Drives ?
  *
  * History:
  *
@@ -73,6 +75,13 @@
  * values in the configuration register
  * - Switch back to use of target fan speed for PID, thus lowering
  * pressure on i2c
+ *
+ * Oct.
15, 2004 : 1.1b1 (beta) + * - Add device-tree lookup for fan IDs, should detect liquid cooling + * pumps when present + * - Enable driver for PowerMac7,3 machines + * - Split the U3/Backside cooling on U3 & U3H versions as Darwin does + * - Add new CPU cooling algorithm for machines with liquid cooling */ #include @@ -101,7 +110,7 @@ #include "therm_pm72.h" -#define VERSION "0.9" +#define VERSION "1.1b1" #undef DEBUG @@ -121,16 +130,100 @@ static struct i2c_adapter * u3_1; static struct i2c_client * fcu; static struct cpu_pid_state cpu_state[2]; +static struct basckside_pid_params backside_params; static struct backside_pid_state backside_state; static struct drives_pid_state drives_state; static int state; static int cpu_count; +static int cpu_pid_type; static pid_t ctrl_task; static struct completion ctrl_complete; static int critical_state; static DECLARE_MUTEX(driver_lock); /* + * We have 2 types of CPU PID control. One is "split" old style control + * for intake & exhaust fans, the other is "combined" control for both + * CPUs that also deals with the pumps when present. To be "compatible" + * with OS X at this point, we only use "COMBINED" on the machines that + * are identified as having the pumps (though that identification is at + * least dodgy). Ultimately, we could probably switch completely to this + * algorithm provided we hack it to deal with the UP case + */ +#define CPU_PID_TYPE_SPLIT 0 +#define CPU_PID_TYPE_COMBINED 1 + +/* + * This table describes all fans in the FCU. The "id" and "type" values + * are defaults valid for all earlier machines. 
Newer machines will + * eventually override the table content based on the device-tree + */ +struct fcu_fan_table +{ + char* loc; /* location code */ + int type; /* 0 = rpm, 1 = pwm, 2 = pump */ + int id; /* id or -1 */ +}; + +#define FCU_FAN_RPM 0 +#define FCU_FAN_PWM 1 + +#define FCU_FAN_ABSENT_ID -1 + +#define FCU_FAN_COUNT ARRAY_SIZE(fcu_fans) + +struct fcu_fan_table fcu_fans[] = { + [BACKSIDE_FAN_PWM_INDEX] = { + .loc = "BACKSIDE", + .type = FCU_FAN_PWM, + .id = BACKSIDE_FAN_PWM_DEFAULT_ID, + }, + [DRIVES_FAN_RPM_INDEX] = { + .loc = "DRIVE BAY", + .type = FCU_FAN_RPM, + .id = DRIVES_FAN_RPM_DEFAULT_ID, + }, + [SLOTS_FAN_PWM_INDEX] = { + .loc = "SLOT", + .type = FCU_FAN_PWM, + .id = SLOTS_FAN_PWM_DEFAULT_ID, + }, + [CPUA_INTAKE_FAN_RPM_INDEX] = { + .loc = "CPU A INTAKE", + .type = FCU_FAN_RPM, + .id = CPUA_INTAKE_FAN_RPM_DEFAULT_ID, + }, + [CPUA_EXHAUST_FAN_RPM_INDEX] = { + .loc = "CPU A EXHAUST", + .type = FCU_FAN_RPM, + .id = CPUA_EXHAUST_FAN_RPM_DEFAULT_ID, + }, + [CPUB_INTAKE_FAN_RPM_INDEX] = { + .loc = "CPU B INTAKE", + .type = FCU_FAN_RPM, + .id = CPUB_INTAKE_FAN_RPM_DEFAULT_ID, + }, + [CPUB_EXHAUST_FAN_RPM_INDEX] = { + .loc = "CPU B EXHAUST", + .type = FCU_FAN_RPM, + .id = CPUB_EXHAUST_FAN_RPM_DEFAULT_ID, + }, + /* pumps aren't present by default, have to be looked up in the + * device-tree + */ + [CPUA_PUMP_RPM_INDEX] = { + .loc = "CPU A PUMP", + .type = FCU_FAN_RPM, + .id = FCU_FAN_ABSENT_ID, + }, + [CPUB_PUMP_RPM_INDEX] = { + .loc = "CPU B PUMP", + .type = FCU_FAN_RPM, + .id = FCU_FAN_ABSENT_ID, + }, +}; + +/* * i2c_driver structure to attach to the host i2c controller */ @@ -331,10 +424,16 @@ return 0; } -static int set_rpm_fan(int fan, int rpm) +static int set_rpm_fan(int fan_index, int rpm) { unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_RPM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; if (rpm < 300) rpm = 300; @@ -342,43 +441,55 @@ rpm = 8191; buf[0] 
= rpm >> 5; buf[1] = rpm << 3; - rc = fan_write_reg(0x10 + (fan * 2), buf, 2); + rc = fan_write_reg(0x10 + (id * 2), buf, 2); if (rc < 0) return -EIO; return 0; } -static int get_rpm_fan(int fan, int programmed) +static int get_rpm_fan(int fan_index, int programmed) { unsigned char failure; unsigned char active; unsigned char buf[2]; - int rc, reg_base; + int rc, id, reg_base; + + if (fcu_fans[fan_index].type != FCU_FAN_RPM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; rc = fan_read_reg(0xb, &failure, 1); if (rc != 1) return -EIO; - if ((failure & (1 << fan)) != 0) + if ((failure & (1 << id)) != 0) return -EFAULT; rc = fan_read_reg(0xd, &active, 1); if (rc != 1) return -EIO; - if ((active & (1 << fan)) == 0) + if ((active & (1 << id)) == 0) return -ENXIO; /* Programmed value or real current speed */ reg_base = programmed ? 0x10 : 0x11; - rc = fan_read_reg(reg_base + (fan * 2), buf, 2); + rc = fan_read_reg(reg_base + (id * 2), buf, 2); if (rc != 2) return -EIO; return (buf[0] << 5) | buf[1] >> 3; } -static int set_pwm_fan(int fan, int pwm) +static int set_pwm_fan(int fan_index, int pwm) { unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_PWM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; if (pwm < 10) pwm = 10; @@ -386,32 +497,38 @@ pwm = 100; pwm = (pwm * 2559) / 1000; buf[0] = pwm; - rc = fan_write_reg(0x30 + (fan * 2), buf, 1); + rc = fan_write_reg(0x30 + (id * 2), buf, 1); if (rc < 0) return rc; return 0; } -static int get_pwm_fan(int fan) +static int get_pwm_fan(int fan_index) { unsigned char failure; unsigned char active; unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_PWM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; rc = fan_read_reg(0x2b, &failure, 1); if (rc != 1) return -EIO; - if ((failure & (1 << fan)) != 0) + if 
((failure & (1 << id)) != 0) return -EFAULT; rc = fan_read_reg(0x2d, &active, 1); if (rc != 1) return -EIO; - if ((active & (1 << fan)) == 0) + if ((active & (1 << id)) == 0) return -ENXIO; /* Programmed value or real current speed */ - rc = fan_read_reg(0x30 + (fan * 2), buf, 1); + rc = fan_read_reg(0x30 + (id * 2), buf, 1); if (rc != 1) return -EIO; @@ -513,80 +630,84 @@ /* * CPUs fans control loop */ -static void do_monitor_cpu(struct cpu_pid_state *state) + +static int do_read_one_cpu_values(struct cpu_pid_state *state, s32 *temp, s32 *power) { - s32 temp, voltage, current_a, power, power_target; - s32 integral, derivative, proportional, adj_in_target, sval; - s64 integ_p, deriv_p, prop_p, sum; - int i, intake, rc; + s32 ltemp, volts, amps; + int rc = 0; - DBG("cpu %d:\n", state->index); + /* Default (in case of error) */ + *temp = state->cur_temp; + *power = state->cur_power; /* Read current fan status */ if (state->index == 0) - rc = get_rpm_fan(CPUA_EXHAUST_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); else - rc = get_rpm_fan(CPUB_EXHAUST_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); if (rc < 0) { - printk(KERN_WARNING "Error %d reading CPU %d exhaust fan !\n", - rc, state->index); - /* XXX What do we do now ? */ - } else + /* XXX What do we do now ? Nothing for now, keep old value, but + * return error upstream + */ + DBG(" cpu %d, fan reading error !\n", state->index); + } else { state->rpm = rc; - DBG(" current rpm: %d\n", state->rpm); + DBG(" cpu %d, exhaust RPM: %d\n", state->rpm); + } /* Get some sensor readings and scale it */ - temp = read_smon_adc(state, 1); - if (temp == -1) { + ltemp = read_smon_adc(state, 1); + if (ltemp == -1) { + /* XXX What do we do now ? 
*/ state->overtemp++; - return; + if (rc == 0) + rc = -EIO; + DBG(" cpu %d, temp reading error !\n", state->index); + } else { + /* Fixup temperature according to diode calibration + */ + DBG(" cpu %d, temp raw: %04x, m_diode: %04x, b_diode: %04x\n", + state->index, + ltemp, state->mpu.mdiode, state->mpu.bdiode); + *temp = ((s32)ltemp * (s32)state->mpu.mdiode + ((s32)state->mpu.bdiode << 12)) >> 2; + state->last_temp = *temp; + DBG(" temp: %d.%03d\n", FIX32TOPRINT((*temp))); } - voltage = read_smon_adc(state, 3); - current_a = read_smon_adc(state, 4); - /* Fixup temperature according to diode calibration + /* + * Read voltage & current and calculate power */ - DBG(" temp raw: %04x, m_diode: %04x, b_diode: %04x\n", - temp, state->mpu.mdiode, state->mpu.bdiode); - temp = ((s32)temp * (s32)state->mpu.mdiode + ((s32)state->mpu.bdiode << 12)) >> 2; - state->last_temp = temp; - DBG(" temp: %d.%03d\n", FIX32TOPRINT(temp)); + volts = read_smon_adc(state, 3); + amps = read_smon_adc(state, 4); - /* Check tmax, increment overtemp if we are there. At tmax+8, we go - * full blown immediately and try to trigger a shutdown - */ - if (temp >= ((state->mpu.tmax + 8) << 16)) { - printk(KERN_WARNING "Warning ! 
CPU %d temperature way above maximum" - " (%d) !\n", - state->index, temp >> 16); - state->overtemp = CPU_MAX_OVERTEMP; - } else if (temp > (state->mpu.tmax << 16)) - state->overtemp++; - else - state->overtemp = 0; - if (state->overtemp >= CPU_MAX_OVERTEMP) - critical_state = 1; - if (state->overtemp > 0) { - state->rpm = state->mpu.rmaxn_exhaust_fan; - state->intake_rpm = intake = state->mpu.rmaxn_intake_fan; - goto do_set_fans; - } - - /* Scale other sensor values according to fixed scales + /* Scale voltage and current raw sensor values according to fixed scales * obtained in Darwin and calculate power from I and V */ - state->voltage = voltage *= ADC_CPU_VOLTAGE_SCALE; - state->current_a = current_a *= ADC_CPU_CURRENT_SCALE; - power = (((u64)current_a) * ((u64)voltage)) >> 16; + volts *= ADC_CPU_VOLTAGE_SCALE; + amps *= ADC_CPU_CURRENT_SCALE; + *power = (((u64)volts) * ((u64)amps)) >> 16; + state->voltage = volts; + state->current_a = amps; + state->last_power = *power; + + DBG(" cpu %d, current: %d.%03d, voltage: %d.%03d, power: %d.%03d W\n", + state->index, FIX32TOPRINT(current_a), FIX32TOPRINT(voltage), + FIX32TOPRINT(*power)); + + return 0; +} + +static void do_cpu_pid(struct cpu_pid_state *state, s32 temp, s32 power) +{ + s32 power_target, integral, derivative, proportional, adj_in_target, sval; + s64 integ_p, deriv_p, prop_p, sum; + int i; /* Calculate power target value (could be done once for all) * and convert to a 16.16 fp number */ power_target = ((u32)(state->mpu.pmaxh - state->mpu.padjmax)) << 16; - - DBG(" current: %d.%03d, voltage: %d.%03d\n", - FIX32TOPRINT(current_a), FIX32TOPRINT(voltage)); - DBG(" power: %d.%03d W, target: %d.%03d, error: %d.%03d\n", FIX32TOPRINT(power), + DBG(" power target: %d.%03d, error: %d.%03d\n", FIX32TOPRINT(power_target), FIX32TOPRINT(power_target - power)); /* Store temperature and power in history array */ @@ -659,6 +780,127 @@ state->rpm = state->mpu.rminn_exhaust_fan; if (state->rpm > 
state->mpu.rmaxn_exhaust_fan) state->rpm = state->mpu.rmaxn_exhaust_fan; +} + +static void do_monitor_cpu_combined(void) +{ + struct cpu_pid_state *state0 = &cpu_state[0]; + struct cpu_pid_state *state1 = &cpu_state[1]; + s32 temp0, power0, temp1, power1; + s32 temp_combi, power_combi; + int rc, intake, pump; + + rc = do_read_one_cpu_values(state0, &temp0, &power0); + if (rc < 0) { + /* XXX What do we do now ? */ + } + state1->overtemp = 0; + rc = do_read_one_cpu_values(state1, &temp1, &power1); + if (rc < 0) { + /* XXX What do we do now ? */ + } + if (state1->overtemp) + state0->overtemp++; + + temp_combi = max(temp0, temp1); + power_combi = max(power0, power1); + + /* Check tmax, increment overtemp if we are there. At tmax+8, we go + * full blown immediately and try to trigger a shutdown + */ + if (temp_combi >= ((state0->mpu.tmax + 8) << 16)) { + printk(KERN_WARNING "Warning ! Temperature way above maximum (%d) !\n", + temp_combi >> 16); + state0->overtemp = CPU_MAX_OVERTEMP; + } else if (temp_combi > (state0->mpu.tmax << 16)) + state0->overtemp++; + else + state0->overtemp = 0; + if (state0->overtemp >= CPU_MAX_OVERTEMP) + critical_state = 1; + if (state0->overtemp > 0) { + state0->rpm = state0->mpu.rmaxn_exhaust_fan; + state0->intake_rpm = intake = state0->mpu.rmaxn_intake_fan; + pump = CPU_PUMP_OUTPUT_MAX; + goto do_set_fans; + } + + /* Do the PID */ + do_cpu_pid(state0, temp_combi, power_combi); + + /* Calculate intake fan speed */ + intake = (state0->rpm * CPU_INTAKE_SCALE) >> 16; + if (intake < state0->mpu.rminn_intake_fan) + intake = state0->mpu.rminn_intake_fan; + if (intake > state0->mpu.rmaxn_intake_fan) + intake = state0->mpu.rmaxn_intake_fan; + state0->intake_rpm = intake; + + /* Calculate pump speed */ + pump = (state0->rpm * CPU_PUMP_OUTPUT_MAX) / + state0->mpu.rmaxn_exhaust_fan; + if (pump > CPU_PUMP_OUTPUT_MAX) + pump = CPU_PUMP_OUTPUT_MAX; + if (pump < CPU_PUMP_OUTPUT_MIN) + pump = CPU_PUMP_OUTPUT_MIN; + + do_set_fans: + /* We copy values from 
state 0 to state 1 for /sysfs */ + state1->rpm = state0->rpm; + state1->intake_rpm = state0->intake_rpm; + + DBG("** CPU %d RPM: %d Ex, %d, Pump: %d, In, overtemp: %d\n", + state->index, (int)state->rpm, intake, pump, state->overtemp); + + /* We should check for errors, shouldn't we ? But then, what + * do we do once the error occurs ? For FCU notified fan + * failures (-EFAULT) we probably want to notify userland + * some way... + */ + set_rpm_fan(CPUA_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, state0->rpm); + set_rpm_fan(CPUB_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, state0->rpm); + + if (fcu_fans[CPUA_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) + set_rpm_fan(CPUA_PUMP_RPM_INDEX, pump); + if (fcu_fans[CPUB_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) + set_rpm_fan(CPUB_PUMP_RPM_INDEX, pump); +} + +static void do_monitor_cpu_split(struct cpu_pid_state *state) +{ + s32 temp, power; + int rc, intake; + + /* Read current fan status */ + rc = do_read_one_cpu_values(state, &temp, &power); + if (rc < 0) { + /* XXX What do we do now ? */ + } + + /* Check tmax, increment overtemp if we are there. At tmax+8, we go + * full blown immediately and try to trigger a shutdown + */ + if (temp >= ((state->mpu.tmax + 8) << 16)) { + printk(KERN_WARNING "Warning ! CPU %d temperature way above maximum" + " (%d) !\n", + state->index, temp >> 16); + state->overtemp = CPU_MAX_OVERTEMP; + } else if (temp > (state->mpu.tmax << 16)) + state->overtemp++; + else + state->overtemp = 0; + if (state->overtemp >= CPU_MAX_OVERTEMP) + critical_state = 1; + if (state->overtemp > 0) { + state->rpm = state->mpu.rmaxn_exhaust_fan; + state->intake_rpm = intake = state->mpu.rmaxn_intake_fan; + goto do_set_fans; + } + + /* Do the PID */ + do_cpu_pid(state, temp, power); intake = (state->rpm * CPU_INTAKE_SCALE) >> 16; if (intake < state->mpu.rminn_intake_fan) @@ -677,11 +919,11 @@ * some way... 
*/ if (state->index == 0) { - set_rpm_fan(CPUA_INTAKE_FAN_RPM_ID, intake); - set_rpm_fan(CPUA_EXHAUST_FAN_RPM_ID, state->rpm); + set_rpm_fan(CPUA_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, state->rpm); } else { - set_rpm_fan(CPUB_INTAKE_FAN_RPM_ID, intake); - set_rpm_fan(CPUB_EXHAUST_FAN_RPM_ID, state->rpm); + set_rpm_fan(CPUB_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, state->rpm); } } @@ -696,6 +938,7 @@ state->overtemp = 0; state->adc_config = 0x00; + if (index == 0) state->monitor = attach_i2c_chip(SUPPLY_MONITOR_ID, "CPU0_monitor"); else if (index == 1) @@ -778,7 +1021,7 @@ DBG("backside:\n"); /* Check fan status */ - rc = get_pwm_fan(BACKSIDE_FAN_PWM_ID); + rc = get_pwm_fan(BACKSIDE_FAN_PWM_INDEX); if (rc < 0) { printk(KERN_WARNING "Error %d reading backside fan !\n", rc); /* XXX What do we do now ? */ @@ -790,12 +1033,12 @@ temp = i2c_smbus_read_byte_data(state->monitor, MAX6690_EXT_TEMP) << 16; state->last_temp = temp; DBG(" temp: %d.%03d, target: %d.%03d\n", FIX32TOPRINT(temp), - FIX32TOPRINT(BACKSIDE_PID_INPUT_TARGET)); + FIX32TOPRINT(backside_params.input_target)); /* Store temperature and error in history array */ state->cur_sample = (state->cur_sample + 1) % BACKSIDE_PID_HISTORY_SIZE; state->sample_history[state->cur_sample] = temp; - state->error_history[state->cur_sample] = temp - BACKSIDE_PID_INPUT_TARGET; + state->error_history[state->cur_sample] = temp - backside_params.input_target; /* If first loop, fill the history table */ if (state->first) { @@ -804,7 +1047,7 @@ BACKSIDE_PID_HISTORY_SIZE; state->sample_history[state->cur_sample] = temp; state->error_history[state->cur_sample] = - temp - BACKSIDE_PID_INPUT_TARGET; + temp - backside_params.input_target; } state->first = 0; } @@ -816,7 +1059,7 @@ integral += state->error_history[i]; integral *= BACKSIDE_PID_INTERVAL; DBG(" integral: %08x\n", integral); - integ_p = ((s64)BACKSIDE_PID_G_r) * (s64)integral; + integ_p = 
((s64)backside_params.G_r) * (s64)integral; DBG(" integ_p: %d\n", (int)(integ_p >> 36)); sum += integ_p; @@ -825,12 +1068,12 @@ state->error_history[(state->cur_sample + BACKSIDE_PID_HISTORY_SIZE - 1) % BACKSIDE_PID_HISTORY_SIZE]; derivative /= BACKSIDE_PID_INTERVAL; - deriv_p = ((s64)BACKSIDE_PID_G_d) * (s64)derivative; + deriv_p = ((s64)backside_params.G_d) * (s64)derivative; DBG(" deriv_p: %d\n", (int)(deriv_p >> 36)); sum += deriv_p; /* Calculate the proportional term */ - prop_p = ((s64)BACKSIDE_PID_G_p) * (s64)(state->error_history[state->cur_sample]); + prop_p = ((s64)backside_params.G_p) * (s64)(state->error_history[state->cur_sample]); DBG(" prop_p: %d\n", (int)(prop_p >> 36)); sum += prop_p; @@ -839,13 +1082,13 @@ DBG(" sum: %d\n", (int)sum); state->pwm += (s32)sum; - if (state->pwm < BACKSIDE_PID_OUTPUT_MIN) - state->pwm = BACKSIDE_PID_OUTPUT_MIN; - if (state->pwm > BACKSIDE_PID_OUTPUT_MAX) - state->pwm = BACKSIDE_PID_OUTPUT_MAX; + if (state->pwm < backside_params.output_min) + state->pwm = backside_params.output_min; + if (state->pwm > backside_params.output_max) + state->pwm = backside_params.output_max; DBG("** BACKSIDE PWM: %d\n", (int)state->pwm); - set_pwm_fan(BACKSIDE_FAN_PWM_ID, state->pwm); + set_pwm_fan(BACKSIDE_FAN_PWM_INDEX, state->pwm); } /* @@ -853,6 +1096,35 @@ */ static int init_backside_state(struct backside_pid_state *state) { + struct device_node *u3; + int u3h = 1; /* conservative by default */ + + /* + * There are different PID params for machines with U3 and machines + * with U3H, pick the right ones now + */ + u3 = of_find_node_by_path("/u3"); + if (u3 != NULL) { + u32 *vers = (u32 *)get_property(u3, "device-rev", NULL); + if (vers) + if (((*vers) & 0x3f) < 0x34) + u3h = 0; + of_node_put(u3); + } + + backside_params.G_p = BACKSIDE_PID_G_p; + backside_params.G_r = BACKSIDE_PID_G_r; + backside_params.output_max = BACKSIDE_PID_OUTPUT_MAX; + if (u3h) { + backside_params.G_d = BACKSIDE_PID_U3H_G_d; + backside_params.input_target = 
BACKSIDE_PID_U3H_INPUT_TARGET; + backside_params.output_min = BACKSIDE_PID_U3H_OUTPUT_MIN; + } else { + backside_params.G_d = BACKSIDE_PID_U3_G_d; + backside_params.input_target = BACKSIDE_PID_U3_INPUT_TARGET; + backside_params.output_min = BACKSIDE_PID_U3_OUTPUT_MIN; + } + state->ticks = 1; state->first = 1; state->pwm = 50; @@ -898,7 +1170,7 @@ DBG("drives:\n"); /* Check fan status */ - rc = get_rpm_fan(DRIVES_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(DRIVES_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); if (rc < 0) { printk(KERN_WARNING "Error %d reading drives fan !\n", rc); /* XXX What do we do now ? */ @@ -965,7 +1237,7 @@ state->rpm = DRIVES_PID_OUTPUT_MAX; DBG("** DRIVES RPM: %d\n", (int)state->rpm); - set_rpm_fan(DRIVES_FAN_RPM_ID, state->rpm); + set_rpm_fan(DRIVES_FAN_RPM_INDEX, state->rpm); } /* @@ -1032,7 +1304,7 @@ } /* Set the PCI fan once for now */ - set_pwm_fan(SLOTS_FAN_PWM_ID, SLOTS_FAN_DEFAULT_PWM); + set_pwm_fan(SLOTS_FAN_PWM_INDEX, SLOTS_FAN_DEFAULT_PWM); /* Initialize ADCs */ initialize_adc(&cpu_state[0]); @@ -1047,9 +1319,13 @@ start = jiffies; down(&driver_lock); - do_monitor_cpu(&cpu_state[0]); - if (cpu_state[1].monitor != NULL) - do_monitor_cpu(&cpu_state[1]); + if (cpu_pid_type == CPU_PID_TYPE_COMBINED) + do_monitor_cpu_combined(); + else { + do_monitor_cpu_split(&cpu_state[0]); + if (cpu_state[1].monitor != NULL) + do_monitor_cpu_split(&cpu_state[1]); + } do_monitor_backside(&backside_state); do_monitor_drives(&drives_state); up(&driver_lock); @@ -1113,6 +1389,19 @@ DBG("counted %d CPUs in the device-tree\n", cpu_count); + /* Decide the type of PID algorithm to use based on the presence of + * the pumps, though that may not be the best way, that is good enough + * for now + */ + if (machine_is_compatible("PowerMac7,3") + && (cpu_count > 1) + && fcu_fans[CPUA_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID + && fcu_fans[CPUB_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) { + printk(KERN_INFO "Liquid cooling pumps detected, using new 
algorithm !\n"); + cpu_pid_type = CPU_PID_TYPE_COMBINED; + } else + cpu_pid_type = CPU_PID_TYPE_SPLIT; + /* Create control loops for everything. If any fail, everything * fails */ @@ -1257,12 +1546,91 @@ return 0; } +static void fcu_lookup_fans(struct device_node *fcu_node) +{ + struct device_node *np = NULL; + int i; + + /* The table is filled by default with values that are suitable + * for the old machines without device-tree informations. We scan + * the device-tree and override those values with whatever is + * there + */ + + DBG("Looking up FCU controls in device-tree...\n"); + + while ((np = of_get_next_child(fcu_node, np)) != NULL) { + int type = -1; + char *loc; + u32 *reg; + + DBG(" control: %s, type: %s\n", np->name, np->type); + + /* Detect control type */ + if (!strcmp(np->type, "fan-rpm-control") || + !strcmp(np->type, "fan-rpm")) + type = FCU_FAN_RPM; + if (!strcmp(np->type, "fan-pwm-control") || + !strcmp(np->type, "fan-pwm")) + type = FCU_FAN_PWM; + /* Only care about fans for now */ + if (type == -1) + continue; + + /* Lookup for a matching location */ + loc = (char *)get_property(np, "location", NULL); + reg = (u32 *)get_property(np, "reg", NULL); + if (loc == NULL || reg == NULL) + continue; + DBG(" matching location: %s, reg: 0x%08x\n", loc, *reg); + + for (i = 0; i < FCU_FAN_COUNT; i++) { + int fan_id; + + if (strcmp(loc, fcu_fans[i].loc)) + continue; + DBG(" location match, index: %d\n", i); + fcu_fans[i].id = FCU_FAN_ABSENT_ID; + if (type != fcu_fans[i].type) { + printk(KERN_WARNING "therm_pm72: Fan type mismatch " + "in device-tree for %s\n", np->full_name); + break; + } + if (type == FCU_FAN_RPM) + fan_id = ((*reg) - 0x10) / 2; + else + fan_id = ((*reg) - 0x30) / 2; + if (fan_id > 7) { + printk(KERN_WARNING "therm_pm72: Can't parse " + "fan ID in device-tree for %s\n", np->full_name); + break; + } + DBG(" fan id -> %d, type -> %d\n", fan_id, type); + fcu_fans[i].id = fan_id; + } + } + + /* Now dump the array */ + printk(KERN_INFO "Detected 
fan controls:\n"); + for (i = 0; i < FCU_FAN_COUNT; i++) { + if (fcu_fans[i].id == FCU_FAN_ABSENT_ID) + continue; + printk(KERN_INFO " %d: %s fan, id %d, location: %s\n", i, + fcu_fans[i].type == FCU_FAN_RPM ? "RPM" : "PWM", + fcu_fans[i].id, fcu_fans[i].loc); + } +} + static int fcu_of_probe(struct of_device* dev, const struct of_match *match) { int rc; state = state_detached; + /* Lookup the fans in the device tree */ + fcu_lookup_fans(dev->node); + + /* Add the driver */ rc = i2c_add_driver(&therm_pm72_driver); if (rc < 0) return rc; @@ -1301,7 +1669,8 @@ { struct device_node *np; - if (!machine_is_compatible("PowerMac7,2")) + if (!machine_is_compatible("PowerMac7,2") && + !machine_is_compatible("PowerMac7,3")) return -ENODEV; printk(KERN_INFO "PowerMac G5 Thermal control driver %s\n", VERSION); diff -urN linux-2.5/drivers/macintosh/therm_pm72.h linux-pogo/drivers/macintosh/therm_pm72.h --- linux-2.5/drivers/macintosh/therm_pm72.h 2004-09-24 14:34:05.000000000 +1000 +++ linux-pogo/drivers/macintosh/therm_pm72.h 2004-10-15 18:58:22.000000000 +1000 @@ -119,18 +119,33 @@ #define ADC_CPU_CURRENT_SCALE 0x1f40 /* _AD4 */ /* - * PID factors for the U3/Backside fan control loop + * PID factors for the U3/Backside fan control loop. 
We have 2 sets + * of values here, one set for U3 and one set for U3H */ -#define BACKSIDE_FAN_PWM_ID 1 -#define BACKSIDE_PID_G_d 0x02800000 +#define BACKSIDE_FAN_PWM_DEFAULT_ID 1 +#define BACKSIDE_FAN_PWM_INDEX 0 +#define BACKSIDE_PID_U3_G_d 0x02800000 +#define BACKSIDE_PID_U3H_G_d 0x01400000 #define BACKSIDE_PID_G_p 0x00500000 #define BACKSIDE_PID_G_r 0x00000000 -#define BACKSIDE_PID_INPUT_TARGET 0x00410000 +#define BACKSIDE_PID_U3_INPUT_TARGET 0x00410000 +#define BACKSIDE_PID_U3H_INPUT_TARGET 0x004b0000 #define BACKSIDE_PID_INTERVAL 5 #define BACKSIDE_PID_OUTPUT_MAX 100 -#define BACKSIDE_PID_OUTPUT_MIN 20 +#define BACKSIDE_PID_U3_OUTPUT_MIN 20 +#define BACKSIDE_PID_U3H_OUTPUT_MIN 30 #define BACKSIDE_PID_HISTORY_SIZE 2 +struct basckside_pid_params +{ + u32 G_d; + u32 G_p; + u32 G_r; + u32 input_target; + u32 output_min; + u32 output_max; +}; + struct backside_pid_state { int ticks; @@ -146,7 +161,8 @@ /* * PID factors for the Drive Bay fan control loop */ -#define DRIVES_FAN_RPM_ID 2 +#define DRIVES_FAN_RPM_DEFAULT_ID 2 +#define DRIVES_FAN_RPM_INDEX 1 #define DRIVES_PID_G_d 0x01e00000 #define DRIVES_PID_G_p 0x00500000 #define DRIVES_PID_G_r 0x00000000 @@ -168,7 +184,8 @@ int first; }; -#define SLOTS_FAN_PWM_ID 2 +#define SLOTS_FAN_PWM_DEFAULT_ID 2 +#define SLOTS_FAN_PWM_INDEX 2 #define SLOTS_FAN_DEFAULT_PWM 50 /* Do better here ! 
*/ /* @@ -191,10 +208,15 @@ * CPU B FAKE POWER 49 (I_V_inputs: 18, 19) */ -#define CPUA_INTAKE_FAN_RPM_ID 3 -#define CPUA_EXHAUST_FAN_RPM_ID 4 -#define CPUB_INTAKE_FAN_RPM_ID 5 -#define CPUB_EXHAUST_FAN_RPM_ID 6 +#define CPUA_INTAKE_FAN_RPM_DEFAULT_ID 3 +#define CPUA_EXHAUST_FAN_RPM_DEFAULT_ID 4 +#define CPUB_INTAKE_FAN_RPM_DEFAULT_ID 5 +#define CPUB_EXHAUST_FAN_RPM_DEFAULT_ID 6 + +#define CPUA_INTAKE_FAN_RPM_INDEX 3 +#define CPUA_EXHAUST_FAN_RPM_INDEX 4 +#define CPUB_INTAKE_FAN_RPM_INDEX 5 +#define CPUB_EXHAUST_FAN_RPM_INDEX 6 #define CPU_INTAKE_SCALE 0x0000f852 #define CPU_TEMP_HISTORY_SIZE 2 @@ -202,6 +224,11 @@ #define CPU_PID_INTERVAL 1 #define CPU_MAX_OVERTEMP 30 +#define CPUA_PUMP_RPM_INDEX 7 +#define CPUB_PUMP_RPM_INDEX 8 +#define CPU_PUMP_OUTPUT_MAX 3700 +#define CPU_PUMP_OUTPUT_MIN 1000 + struct cpu_pid_state { int index; @@ -219,6 +246,7 @@ s32 voltage; s32 current_a; s32 last_temp; + s32 last_power; int first; u8 adc_config; }; From jimix at watson.ibm.com Sat Oct 16 01:53:17 2004 From: jimix at watson.ibm.com (Jimi Xenidis) Date: Fri, 15 Oct 2004 11:53:17 -0400 Subject: [vHype-discussion] u64 in linux In-Reply-To: <1097849471.25095.97.camel@brick.watson.ibm.com> References: <1097849471.25095.97.camel@brick.watson.ibm.com> Message-ID: <16751.62061.393716.650492@kitch0.watson.ibm.com> >>>>> "MO" == Michal Ostrowski writes: MO> In trying to integrate ppc64 changes into the vhype linux tree, I'm MO> coming across a problem with usage of "u64". MO> On x86, u64 is "unsigned long long". On ppc64 it is "unsigned long". *sigh* I thought the hell over size_t unsigned int vs. unsigned long would have taught everyone. BTW: a thread starts here: http://www.ussg.iu.edu/hypermail/linux/kernel/0402.3/1428.html After a whole lot of clicking it looks like a dropped patch. I guess it's the cast, it seems that's the linux way at the moment. 
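A minimal user-space sketch of the cast mentioned above, assuming a made-up `demo_u64` typedef that stands in for the kernel's `u64` (it is not the real definition). The first helper casts to `unsigned long long` so that one `%llu` format works whichever way the typedef is defined. The second shows the C99 `PRIu64` macro from `<inttypes.h>`, which applies to user-space `uint64_t` only. The `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's u64 (not the real typedef). */
#ifdef DEMO_U64_IS_LONG
typedef unsigned long demo_u64;		/* the ppc64 flavour described above */
#else
typedef unsigned long long demo_u64;	/* the x86 flavour described above */
#endif

/* Cast approach: one "%llu" format serves either typedef. */
static int demo_format_cast(char *buf, size_t len, demo_u64 val)
{
	int n = snprintf(buf, len, "%llu", (unsigned long long)val);

	assert(n >= 0 && (size_t)n < len);	/* output was not truncated */
	return n;
}

/* User-space C99 alternative: PRIu64 supplies the conversion for uint64_t. */
static int demo_format_pri(char *buf, size_t len, uint64_t val)
{
	int n = snprintf(buf, len, "%" PRIu64, val);

	assert(n >= 0 && (size_t)n < len);	/* output was not truncated */
	return n;
}
```

Building with `-DDEMO_U64_IS_LONG` switches the typedef. The cast keeps the call free of format warnings in both configurations, at the cost of repeating the cast at every call site.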
-JX From jimix at watson.ibm.com Sat Oct 16 02:46:56 2004 From: jimix at watson.ibm.com (Jimi Xenidis) Date: Fri, 15 Oct 2004 12:46:56 -0400 Subject: u64 in linux In-Reply-To: <16751.62061.393716.650492@kitch0.watson.ibm.com> References: <1097849471.25095.97.camel@brick.watson.ibm.com> <16751.62061.393716.650492@kitch0.watson.ibm.com> Message-ID: <16751.65280.234326.437361@kitch0.watson.ibm.com> >>>>> "JX" == Jimi Xenidis writes: Forgive the CC to my internal list. The real question is, what was the result of this thread? JX> http://www.ussg.iu.edu/hypermail/linux/kernel/0402.3/1428.html And is casting the acceptable thing to do? -JX From hpa at zytor.com Sat Oct 16 03:33:45 2004 From: hpa at zytor.com (H. Peter Anvin) Date: Fri, 15 Oct 2004 10:33:45 -0700 Subject: Fan control for PowerMac7_3 In-Reply-To: <1097832049.1149.115.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> Message-ID: <417009F9.6080007@zytor.com> Hi there, I tried to apply this patch to top-of-tree (bkcvs), but it looks like the current TOT doesn't compile on ppc64 for unrelated reasons: .config attached. 
arch/ppc64/kernel/built-in.o(.text+0x79f8): In function `.sys_call_table32': : undefined reference to `.sys_acct' arch/ppc64/kernel/built-in.o(.text+0x7c78): In function `.sys_call_table32': : undefined reference to `.sys_quotactl' arch/ppc64/kernel/built-in.o(.text+0x8078): In function `.sys_call_table32': : undefined reference to `.compat_mbind' arch/ppc64/kernel/built-in.o(.text+0x8080): In function `.sys_call_table32': : undefined reference to `.compat_get_mempolicy' arch/ppc64/kernel/built-in.o(.text+0x8088): In function `.sys_call_table32': : undefined reference to `.compat_set_mempolicy' arch/ppc64/kernel/built-in.o(.text+0x8090): In function `.sys_call_table32': : undefined reference to `.compat_sys_mq_open' arch/ppc64/kernel/built-in.o(.text+0x8098): In function `.sys_call_table32': : undefined reference to `.sys_mq_unlink' arch/ppc64/kernel/built-in.o(.text+0x80a0): In function `.sys_call_table32': : undefined reference to `.compat_sys_mq_timedsend' arch/ppc64/kernel/built-in.o(.text+0x80a8): In function `.sys_call_table32': : undefined reference to `.compat_sys_mq_timedreceive' arch/ppc64/kernel/built-in.o(.text+0x80b0): In function `.sys_call_table32': : undefined reference to `.compat_sys_mq_notify' arch/ppc64/kernel/built-in.o(.text+0x80b8): In function `.sys_call_table32': : undefined reference to `.compat_sys_mq_getsetattr' arch/ppc64/kernel/built-in.o(.text+0x8260): In function `.sys_call_table': : undefined reference to `.sys_acct' arch/ppc64/kernel/built-in.o(.text+0x84e0): In function `.sys_call_table': : undefined reference to `.sys_quotactl' arch/ppc64/kernel/built-in.o(.text+0x88e0): In function `.sys_call_table': : undefined reference to `.sys_mbind' arch/ppc64/kernel/built-in.o(.text+0x88e8): In function `.sys_call_table': : undefined reference to `.sys_get_mempolicy' arch/ppc64/kernel/built-in.o(.text+0x88f0): In function `.sys_call_table': : undefined reference to `.sys_set_mempolicy' arch/ppc64/kernel/built-in.o(.text+0x88f8): In 
function `.sys_call_table': : undefined reference to `.sys_mq_open' arch/ppc64/kernel/built-in.o(.text+0x8900): In function `.sys_call_table': : undefined reference to `.sys_mq_unlink' arch/ppc64/kernel/built-in.o(.text+0x8908): In function `.sys_call_table': : undefined reference to `.sys_mq_timedsend' arch/ppc64/kernel/built-in.o(.text+0x8910): In function `.sys_call_table': : undefined reference to `.sys_mq_timedreceive' arch/ppc64/kernel/built-in.o(.text+0x8918): In function `.sys_call_table': : undefined reference to `.sys_mq_notify' arch/ppc64/kernel/built-in.o(.text+0x8920): In function `.sys_call_table': : undefined reference to `.sys_mq_getsetattr' make: *** [.tmp_vmlinux1] Error 1 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: .config Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041015/6b5d420b/attachment.txt From arnd at arndb.de Sat Oct 16 04:58:58 2004 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 15 Oct 2004 20:58:58 +0200 Subject: [vHype-discussion] u64 in linux In-Reply-To: <16751.62061.393716.650492@kitch0.watson.ibm.com> References: <1097849471.25095.97.camel@brick.watson.ibm.com> <16751.62061.393716.650492@kitch0.watson.ibm.com> Message-ID: <200410152059.03647.arnd@arndb.de> On Friday 15 October 2004 17:53, Jimi Xenidis wrote: > BTW: a thread starts here: > http://www.ussg.iu.edu/hypermail/linux/kernel/0402.3/1428.html > > After a whole lot of clicking it looks like a dropped patch. > > I guess it's the cast, it seems that's the linux way at the moment. Yes, I think there have been some patches to drivers going in that direction. An alternative if the warning is in your own code is to use 'unsigned long long' or a user-defined 'uval64' directly in the declaration instead of 'u64'. C99 also mandates that the macro PRIu64 contains the correct format string for uint64_t (which afaik is always the same as u64). 
It's currently not defined in linux, but could perhaps be added. Arnd <>< -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: signature Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041015/2b1c6e18/attachment.pgp From hpa at zytor.com Sat Oct 16 05:21:07 2004 From: hpa at zytor.com (H. Peter Anvin) Date: Fri, 15 Oct 2004 12:21:07 -0700 Subject: [vHype-discussion] u64 in linux In-Reply-To: <200410152059.03647.arnd@arndb.de> References: <1097849471.25095.97.camel@brick.watson.ibm.com> <16751.62061.393716.650492@kitch0.watson.ibm.com> <200410152059.03647.arnd@arndb.de> Message-ID: <41702323.9010903@zytor.com> Arnd Bergmann wrote: > On Friday 15 October 2004 17:53, Jimi Xenidis wrote: > >>BTW: a thread starts here: >> http://www.ussg.iu.edu/hypermail/linux/kernel/0402.3/1428.html >> >>After a whole lot of clicking it looks like a dropped patch. >> >>I guess it's the cast, it seems that's the linux way at the moment. > > > Yes, I think there have been some patches to drivers going in that > direction. > An alternative if the warning is in your own code is to use > 'unsigned long long' or a user-defined 'uval64' directly in > the declaration instead of 'u64'. > > C99 also mandates that the macro PRIu64 contains the correct > format string for uint64_t (which afaik is always the same as u64). > It's currently not defined in linux, but could perhaps be added. > Also, in C99, you can print any integer type by casting it to [u]intmax_t and use %j. -hpa From hpa at zytor.com Sat Oct 16 05:27:58 2004 From: hpa at zytor.com (H. 
Peter Anvin) Date: Fri, 15 Oct 2004 12:27:58 -0700 Subject: [vHype-discussion] u64 in linux In-Reply-To: <41702323.9010903@zytor.com> References: <1097849471.25095.97.camel@brick.watson.ibm.com> <16751.62061.393716.650492@kitch0.watson.ibm.com> <200410152059.03647.arnd@arndb.de> <41702323.9010903@zytor.com> Message-ID: <417024BE.3060008@zytor.com> H. Peter Anvin wrote: > > Also, in C99, you can print any integer type by casting it to > [u]intmax_t and use %j. > By the way, my very firm opinion on this is that we should match and use as much as possible. Quite frankly actually resolves a lot of issues that previous attempts at creating these datatypes -- including the one in Linux -- have ignored. This is a good thing. Yes, there is ugliness, and I actually would have liked to see the C99 committee adopt the M$ extension %Inn (e.g. %I64d for a 64-bit signed decimal integer); to make matters worse GNU used %I for a different purpose so it's not even possible to make it a compatible extension. -hpa From olh at suse.de Sat Oct 16 05:34:00 2004 From: olh at suse.de (Olaf Hering) Date: Fri, 15 Oct 2004 21:34:00 +0200 Subject: Fan control for PowerMac7_3 In-Reply-To: <417009F9.6080007@zytor.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <417009F9.6080007@zytor.com> Message-ID: <20041015193400.GA14307@suse.de> On Fri, Oct 15, H. Peter Anvin wrote: > Hi there, > > I tried to apply this patch to top-of-tree (bkcvs), but it looks like > the current TOT doesn't compile on ppc64 for unrelated reasons: > > .config attached. > # Linux kernel version: 2.6.9-rc4 rc4-bk3 builds ok for me with that config. -- USB is for mice, FireWire is for men! sUse lINUX ag, nÜRNBERG From dwmw2 at infradead.org Sat Oct 16 06:52:42 2004 From: dwmw2 at infradead.org (David Woodhouse) Date: Fri, 15 Oct 2004 21:52:42 +0100 Subject: Reserve initrd pages. 
Message-ID: <1097873562.13633.732.camel@hades.cambridge.redhat.com> We don't mark initrd pages as reserved. If we manage to allocate enough other stuff before using the initrd, we end up eating into the initrd and we don't boot. Signed-Off-By: David Woodhouse ===== arch/ppc64/kernel/setup.c 1.83 vs edited ===== --- 1.83/arch/ppc64/kernel/setup.c 2004-10-04 20:17:37 +01:00 +++ edited/arch/ppc64/kernel/setup.c 2004-10-15 21:02:33 +01:00 @@ -30,6 +30,7 @@ #include #include #include +#include #include #include #include @@ -990,6 +991,9 @@ /* set up the bootmem stuff with available memory */ do_init_bootmem(); + + if (initrd_start) + reserve_bootmem(__pa(initrd_start), initrd_end-initrd_start); /* Select the correct idle loop for the platform. */ idle_setup(); -- dwmw2 From schwab at suse.de Sat Oct 16 07:00:48 2004 From: schwab at suse.de (Andreas Schwab) Date: Fri, 15 Oct 2004 23:00:48 +0200 Subject: 2.6.9-rc4: oops during ide probing In-Reply-To: (Andreas Schwab's message of "Mon, 11 Oct 2004 22:11:42 +0200") References: Message-ID: > I'm getting an oops during ide probing on the PMac G5 with 2.6.9-rc4: > > ide-pmac: cannot find MacIO node for Kauai ATA interface > ide0: Found Apple OHare ATA controller, bus ID 0, irq 0 > Oops: Kernel access of bad area, sig: 11 [#1] > NIP [...] .ide_mm_inb+0x0/0x14 > LR [...] .ide_wait_not_busy+0x98/0xf0 That turned out to be an apparent compiler bug. The kernel is working fine for me now. Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." 
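A note on the "Reserve initrd pages" patch earlier in this archive: the failure it describes (allocations "eating into the initrd") can be shown with a toy model. The sketch below is only an illustration under stated assumptions. It is not the kernel's bootmem or LMB code, it tracks page indexes rather than physical addresses, and every name in it (toy_reserve, toy_alloc_page, toy_reset, toy_count_clobbered, NPAGES) is hypothetical. It shows that a first-fit allocator hands out pages inside the initrd range unless that range was marked reserved first.

```c
/*
 * Toy model of a boot-time page allocator. NOT the kernel's bootmem/LMB
 * code: all names below are hypothetical and exist only for illustration.
 */
#include <assert.h>
#include <stddef.h>

#define NPAGES 16

static unsigned char page_used[NPAGES];	/* 1 = reserved or allocated */

/* Reset the map so that every page is free. */
void toy_reset(void)
{
	for (size_t i = 0; i < NPAGES; i++)
		page_used[i] = 0;
}

/* Mark pages [start, start + count) as unavailable to the allocator. */
void toy_reserve(size_t start, size_t count)
{
	assert(start + count <= NPAGES);
	for (size_t i = start; i < start + count; i++)
		page_used[i] = 1;
}

/* Hand out the lowest free page, or -1 if none is left. */
int toy_alloc_page(void)
{
	for (size_t i = 0; i < NPAGES; i++) {
		if (!page_used[i]) {
			page_used[i] = 1;
			return (int)i;
		}
	}
	return -1;
}

/*
 * Allocate n pages and count how many of them landed inside the
 * "initrd" range [initrd_start, initrd_end).
 */
int toy_count_clobbered(int n, size_t initrd_start, size_t initrd_end)
{
	int clobbered = 0;

	for (int k = 0; k < n; k++) {
		int page = toy_alloc_page();

		if (page >= 0 && (size_t)page >= initrd_start &&
		    (size_t)page < initrd_end)
			clobbered++;
	}
	return clobbered;
}
```

With the initrd at pages 4..7, calling toy_count_clobbered(8, 4, 8) on a freshly reset map counts the initrd pages that got handed out; calling toy_reserve(4, 8 - 4) first (mirroring the start and end-minus-start arguments used in the patch) keeps the allocator out of that range.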
From schwab at suse.de Sat Oct 16 08:16:28 2004 From: schwab at suse.de (Andreas Schwab) Date: Sat, 16 Oct 2004 00:16:28 +0200 Subject: Fan control for PowerMac7_3 In-Reply-To: <1097832049.1149.115.camel@gaston> (Benjamin Herrenschmidt's message of "Fri, 15 Oct 2004 19:20:50 +1000") References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> Message-ID: Benjamin Herrenschmidt writes: > On Fri, 2004-10-15 at 19:19, Benjamin Herrenschmidt wrote: >> On Fri, 2004-10-15 at 19:16, Benjamin Herrenschmidt wrote: >> > Hi ! >> > >> > This is an experimental (read: totally untested) patch to the G5 fan >> > control code. All I know is that it builds :) >> >> And I sent a wrong version ... sorry, the good one in a few minutes. > > Here it is: Here's a patch to make it compile with DEBUG enabled: --- linux-2.6.9-rc4/drivers/macintosh/therm_pm72.c.~1~ 2004-10-16 00:02:36.705511068 +0200 +++ linux-2.6.9-rc4/drivers/macintosh/therm_pm72.c 2004-10-16 00:07:04.815455733 +0200 @@ -652,7 +652,7 @@ static int do_read_one_cpu_values(struct DBG(" cpu %d, fan reading error !\n", state->index); } else { state->rpm = rc; - DBG(" cpu %d, exhaust RPM: %d\n", state->rpm); + DBG(" cpu %d, exhaust RPM: %d\n", state->index, state->rpm); } /* Get some sensor readings and scale it */ @@ -691,8 +691,8 @@ static int do_read_one_cpu_values(struct state->last_power = *power; DBG(" cpu %d, current: %d.%03d, voltage: %d.%03d, power: %d.%03d W\n", - state->index, FIX32TOPRINT(current_a), FIX32TOPRINT(voltage), - FIX32TOPRINT(*power)); + state->index, FIX32TOPRINT(state->current_a), + FIX32TOPRINT(state->voltage), FIX32TOPRINT(*power)); return 0; } @@ -850,7 +850,7 @@ static void do_monitor_cpu_combined(void state1->intake_rpm = state0->intake_rpm; DBG("** CPU %d RPM: %d Ex, %d, Pump: %d, In, overtemp: %d\n", - state->index, (int)state->rpm, intake, pump, state->overtemp); + state1->index, (int)state1->rpm, intake, pump, state1->overtemp); /* We 
should check for errors, shouldn't we ? But then, what * do we do once the error occurs ? For FCU notified fan Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From raosanth at us.ibm.com Sat Oct 16 07:00:25 2004 From: raosanth at us.ibm.com (Santhosh Rao) Date: Fri, 15 Oct 2004 16:00:25 -0500 Subject: 2.6.9-rc4 kernel -- "cannot find space for TCE table" Message-ID: Ok, it appears we aren't dropping into the open firmware debugger randomly; the kernel seems to give up early in the boot process. Below is the output of an attempted boot of 2.6.9-rc4. Jose, ever seen anything like this? The machine is a p615 power-4 2-CPU box with 2GB of RAM. -- Sonny Output: Elapsed time since release of system processors: 1 mins 23 secs Config file read, 4096 bytes Welcome to yaboot version 1.3.11.SuSE Enter "help" to get some basic usage information boot: autobench Please wait, loading kernel... Elf64 kernel loaded... OF stdout device is: /pci@400000000110/isa@3/serial@i3f8 command line: root=/dev/sda3 elevator=noop elevator=noop memory layout at init: alloc_bottom : 000000000403c000 alloc_top : 0000000040000000 alloc_top_hi : 0000000080000000 rmo_top : 0000000080000000 ram_top : 0000000080000000 Looking for displays ERROR, cannot find space for TCE table. EXIT called ok 0 > From dwmw2 at infradead.org Sat Oct 16 09:15:50 2004 From: dwmw2 at infradead.org (David Woodhouse) Date: Sat, 16 Oct 2004 00:15:50 +0100 Subject: Reserve initrd pages. 
In-Reply-To: <1097873562.13633.732.camel@hades.cambridge.redhat.com> References: <1097873562.13633.732.camel@hades.cambridge.redhat.com> Message-ID: <1097882150.13633.754.camel@hades.cambridge.redhat.com> On Fri, 2004-10-15 at 21:52 +0100, David Woodhouse wrote: > + reserve_bootmem(__pa(initrd_start), initrd_end-initrd_start); That doesn't work if CONFIG_NUMA is set. This one does... --- linux-2.6.8/arch/ppc64/kernel/setup.c~ 2004-10-15 20:59:01.000000000 +0100 +++ linux-2.6.8/arch/ppc64/kernel/setup.c 2004-10-15 23:59:18.082932384 +0100 @@ -533,6 +533,8 @@ if (initrd_start) printk("Found initrd at 0x%lx:0x%lx\n", initrd_start, initrd_end); + lmb_reserve(__pa(initrd_start), initrd_end-initrd_start); + DBG(" <- check_for_initrd()\n"); #endif /* CONFIG_BLK_DEV_INITRD */ } -- dwmw2 From dwmw2 at infradead.org Sat Oct 16 09:35:23 2004 From: dwmw2 at infradead.org (David Woodhouse) Date: Sat, 16 Oct 2004 00:35:23 +0100 Subject: Fan control for PowerMac7_3 In-Reply-To: <417009F9.6080007@zytor.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <417009F9.6080007@zytor.com> Message-ID: <1097883323.13633.757.camel@hades.cambridge.redhat.com> On Fri, 2004-10-15 at 10:33 -0700, H. Peter Anvin wrote: > Hi there, > > I tried to apply this patch to top-of-tree (bkcvs), but it looks like > the current TOT doesn't compile on ppc64 for unrelated reasons: Building with -mcall-aixdesc will work around that. -- dwmw2 From benh at kernel.crashing.org Sat Oct 16 10:39:21 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 10:39:21 +1000 Subject: Fan control for PowerMac7_3 In-Reply-To: <417009F9.6080007@zytor.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <417009F9.6080007@zytor.com> Message-ID: <1097887160.6527.15.camel@gaston> On Sat, 2004-10-16 at 03:33, H. 
Peter Anvin wrote: > Hi there, > > I tried to apply this patch to top-of-tree (bkcvs), but it looks like > the current TOT doesn't compile on ppc64 for unrelated reasons: Weird... could it be cond_syscall not working ? Ben. From benh at kernel.crashing.org Sat Oct 16 10:42:02 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 10:42:02 +1000 Subject: Fan control for PowerMac7_3 In-Reply-To: <1097883323.13633.757.camel@hades.cambridge.redhat.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <417009F9.6080007@zytor.com> <1097883323.13633.757.camel@hades.cambridge.redhat.com> Message-ID: <1097887322.6487.21.camel@gaston> On Sat, 2004-10-16 at 09:35, David Woodhouse wrote: > On Fri, 2004-10-15 at 10:33 -0700, H. Peter Anvin wrote: > > Hi there, > > > > I tried to apply this patch to top-of-tree (bkcvs), but it looks like > > the current TOT doesn't compile on ppc64 for unrelated reasons: > > Building with -mcall-aixdesc will work around that. What is the exact problem ? Ben. From benh at kernel.crashing.org Sat Oct 16 10:45:10 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 10:45:10 +1000 Subject: 2.6.9-rc4 kernel -- "cannot find space for TCE table" In-Reply-To: References: Message-ID: <1097887510.6487.23.camel@gaston> On Sat, 2004-10-16 at 07:00, Santhosh Rao wrote: > Ok, it appears we aren't dropping into the open firmware debugger > randomly, the kernel seems to give up early in the boot process > Below is the output of an attempted boot of 2.6.9-rc4. > > Jose, ever seen anything like this? > > The machine is a p615 power-4 2-CPU box with 2GB of RAM. Can you enable PROM_DEBUG in arch/ppc64/kernel/prom_init.c and send me the output log ? Ben. 
From benh at kernel.crashing.org Sat Oct 16 10:46:18 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 10:46:18 +1000 Subject: Reserve initrd pages. In-Reply-To: <1097873562.13633.732.camel@hades.cambridge.redhat.com> References: <1097873562.13633.732.camel@hades.cambridge.redhat.com> Message-ID: <1097887578.6546.25.camel@gaston> On Sat, 2004-10-16 at 06:52, David Woodhouse wrote: > We don't mark initrd pages as reserved. If we manage to allocate enough > other stuff before using the initrd, we end up eating into the initrd > and we don't boot. Hrm... that should be done in > Signed-Off-By: David Woodhouse > > ===== arch/ppc64/kernel/setup.c 1.83 vs edited ===== > --- 1.83/arch/ppc64/kernel/setup.c 2004-10-04 20:17:37 +01:00 > +++ edited/arch/ppc64/kernel/setup.c 2004-10-15 21:02:33 +01:00 > @@ -30,6 +30,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -990,6 +991,9 @@ > > /* set up the bootmem stuff with available memory */ > do_init_bootmem(); > + > + if (initrd_start) > + reserve_bootmem(__pa(initrd_start), initrd_end-initrd_start); > > /* Select the correct idle loop for the platform. */ > idle_setup(); -- Benjamin Herrenschmidt From benh at kernel.crashing.org Sat Oct 16 10:47:41 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 10:47:41 +1000 Subject: Reserve initrd pages. In-Reply-To: <1097873562.13633.732.camel@hades.cambridge.redhat.com> References: <1097873562.13633.732.camel@hades.cambridge.redhat.com> Message-ID: <1097887661.6487.28.camel@gaston> On Sat, 2004-10-16 at 06:52, David Woodhouse wrote: > We don't mark initrd pages as reserved. If we manage to allocate enough > other stuff before using the initrd, we end up eating into the initrd > and we don't boot. 
That should be done in mm/init.c, do_init_bootmem() itself: /* reserve the sections we're already using */ for (i=0; i < lmb.reserved.cnt; i++) { unsigned long physbase = lmb.reserved.region[i].physbase; unsigned long size = lmb.reserved.region[i].size; reserve_bootmem(physbase, size); } The initrd is part of the "reserved map" passed in by prom_init and thus is put in the list of reserved lmb regions. Ben. From benh at kernel.crashing.org Sat Oct 16 10:48:11 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 10:48:11 +1000 Subject: Reserve initrd pages. In-Reply-To: <1097882150.13633.754.camel@hades.cambridge.redhat.com> References: <1097873562.13633.732.camel@hades.cambridge.redhat.com> <1097882150.13633.754.camel@hades.cambridge.redhat.com> Message-ID: <1097887691.6527.30.camel@gaston> On Sat, 2004-10-16 at 09:15, David Woodhouse wrote: > On Fri, 2004-10-15 at 21:52 +0100, David Woodhouse wrote: > > + reserve_bootmem(__pa(initrd_start), initrd_end-initrd_start); > > That doesn't work if CONFIG_NUMA is set. This one does... Again, it should be already in the LMB reserve map, if not, then there is a bug, but that isn't the right fix. Ben. From dwmw2 at infradead.org Sat Oct 16 10:47:46 2004 From: dwmw2 at infradead.org (David Woodhouse) Date: Sat, 16 Oct 2004 01:47:46 +0100 Subject: Fan control for PowerMac7_3 In-Reply-To: <1097887322.6487.21.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <417009F9.6080007@zytor.com> <1097883323.13633.757.camel@hades.cambridge.redhat.com> <1097887322.6487.21.camel@gaston> Message-ID: <1097887666.5788.2059.camel@baythorne.infradead.org> On Sat, 2004-10-16 at 10:42 +1000, Benjamin Herrenschmidt wrote: > On Sat, 2004-10-16 at 09:35, David Woodhouse wrote: > > On Fri, 2004-10-15 at 10:33 -0700, H. 
Peter Anvin wrote: > > > Hi there, > > > > > > I tried to apply this patch to top-of-tree (bkcvs), but it looks like > > > the current TOT doesn't compile on ppc64 for unrelated reasons: > > > > Building with -mcall-aixdesc will work around that. > > What is the exact problem ? cond_syscall not working due to new ABI. -- dwmw2 From benh at kernel.crashing.org Sat Oct 16 12:23:53 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 12:23:53 +1000 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <1097832049.1149.115.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> Message-ID: <1097893432.6546.37.camel@gaston> Ok, here's a new patch that fixes a few issues. It has been tested on a non-liquid-cooled system and appears to work ok. diff -urN linux-2.5/drivers/macintosh/therm_pm72.c linux-pogo/drivers/macintosh/therm_pm72.c --- linux-2.5/drivers/macintosh/therm_pm72.c 2004-09-24 14:34:05.000000000 +1000 +++ linux-pogo/drivers/macintosh/therm_pm72.c 2004-10-16 12:21:42.000000000 +1000 @@ -46,6 +46,8 @@ * overtemp conditions so userland can take some policy * decisions, like slewing down CPUs * - Deal with fan and i2c failures in a better way + * - Maybe do a generic PID based on params used for + * U3 and Drives ? * * History: * @@ -73,6 +75,14 @@ * values in the configuration register * - Switch back to use of target fan speed for PID, thus lowering * pressure on i2c + * + * Oct. 
16, 2004 : 1.1b2 (beta) + * - Add device-tree lookup for fan IDs, should detect liquid cooling + * pumps when present + * - Enable driver for PowerMac7,3 machines + * - Split the U3/Backside cooling on U3 & U3H versions as Darwin does + * - Add new CPU cooling algorithm for machines with liquid cooling + * - Workaround for some PowerMac7,3 with empty "fan" node in the devtree */ #include @@ -101,7 +111,7 @@ #include "therm_pm72.h" -#define VERSION "0.9" +#define VERSION "1.1b2" #undef DEBUG @@ -121,16 +131,100 @@ static struct i2c_adapter * u3_1; static struct i2c_client * fcu; static struct cpu_pid_state cpu_state[2]; +static struct basckside_pid_params backside_params; static struct backside_pid_state backside_state; static struct drives_pid_state drives_state; static int state; static int cpu_count; +static int cpu_pid_type; static pid_t ctrl_task; static struct completion ctrl_complete; static int critical_state; static DECLARE_MUTEX(driver_lock); /* + * We have 2 types of CPU PID control. One is "split" old style control + * for intake & exhaust fans, the other is "combined" control for both + * CPUs that also deals with the pumps when present. To be "compatible" + * with OS X at this point, we only use "COMBINED" on the machines that + * are identified as having the pumps (though that identification is at + * least dodgy). Ultimately, we could probably switch completely to this + * algorithm provided we hack it to deal with the UP case + */ +#define CPU_PID_TYPE_SPLIT 0 +#define CPU_PID_TYPE_COMBINED 1 + +/* + * This table describes all fans in the FCU. The "id" and "type" values + * are defaults valid for all earlier machines. 
Newer machines will + * eventually override the table content based on the device-tree + */ +struct fcu_fan_table +{ + char* loc; /* location code */ + int type; /* 0 = rpm, 1 = pwm, 2 = pump */ + int id; /* id or -1 */ +}; + +#define FCU_FAN_RPM 0 +#define FCU_FAN_PWM 1 + +#define FCU_FAN_ABSENT_ID -1 + +#define FCU_FAN_COUNT ARRAY_SIZE(fcu_fans) + +struct fcu_fan_table fcu_fans[] = { + [BACKSIDE_FAN_PWM_INDEX] = { + .loc = "BACKSIDE", + .type = FCU_FAN_PWM, + .id = BACKSIDE_FAN_PWM_DEFAULT_ID, + }, + [DRIVES_FAN_RPM_INDEX] = { + .loc = "DRIVE BAY", + .type = FCU_FAN_RPM, + .id = DRIVES_FAN_RPM_DEFAULT_ID, + }, + [SLOTS_FAN_PWM_INDEX] = { + .loc = "SLOT", + .type = FCU_FAN_PWM, + .id = SLOTS_FAN_PWM_DEFAULT_ID, + }, + [CPUA_INTAKE_FAN_RPM_INDEX] = { + .loc = "CPU A INTAKE", + .type = FCU_FAN_RPM, + .id = CPUA_INTAKE_FAN_RPM_DEFAULT_ID, + }, + [CPUA_EXHAUST_FAN_RPM_INDEX] = { + .loc = "CPU A EXHAUST", + .type = FCU_FAN_RPM, + .id = CPUA_EXHAUST_FAN_RPM_DEFAULT_ID, + }, + [CPUB_INTAKE_FAN_RPM_INDEX] = { + .loc = "CPU B INTAKE", + .type = FCU_FAN_RPM, + .id = CPUB_INTAKE_FAN_RPM_DEFAULT_ID, + }, + [CPUB_EXHAUST_FAN_RPM_INDEX] = { + .loc = "CPU B EXHAUST", + .type = FCU_FAN_RPM, + .id = CPUB_EXHAUST_FAN_RPM_DEFAULT_ID, + }, + /* pumps aren't present by default, have to be looked up in the + * device-tree + */ + [CPUA_PUMP_RPM_INDEX] = { + .loc = "CPU A PUMP", + .type = FCU_FAN_RPM, + .id = FCU_FAN_ABSENT_ID, + }, + [CPUB_PUMP_RPM_INDEX] = { + .loc = "CPU B PUMP", + .type = FCU_FAN_RPM, + .id = FCU_FAN_ABSENT_ID, + }, +}; + +/* * i2c_driver structure to attach to the host i2c controller */ @@ -331,10 +425,16 @@ return 0; } -static int set_rpm_fan(int fan, int rpm) +static int set_rpm_fan(int fan_index, int rpm) { unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_RPM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; if (rpm < 300) rpm = 300; @@ -342,43 +442,55 @@ rpm = 8191; buf[0] 
= rpm >> 5; buf[1] = rpm << 3; - rc = fan_write_reg(0x10 + (fan * 2), buf, 2); + rc = fan_write_reg(0x10 + (id * 2), buf, 2); if (rc < 0) return -EIO; return 0; } -static int get_rpm_fan(int fan, int programmed) +static int get_rpm_fan(int fan_index, int programmed) { unsigned char failure; unsigned char active; unsigned char buf[2]; - int rc, reg_base; + int rc, id, reg_base; + + if (fcu_fans[fan_index].type != FCU_FAN_RPM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; rc = fan_read_reg(0xb, &failure, 1); if (rc != 1) return -EIO; - if ((failure & (1 << fan)) != 0) + if ((failure & (1 << id)) != 0) return -EFAULT; rc = fan_read_reg(0xd, &active, 1); if (rc != 1) return -EIO; - if ((active & (1 << fan)) == 0) + if ((active & (1 << id)) == 0) return -ENXIO; /* Programmed value or real current speed */ reg_base = programmed ? 0x10 : 0x11; - rc = fan_read_reg(reg_base + (fan * 2), buf, 2); + rc = fan_read_reg(reg_base + (id * 2), buf, 2); if (rc != 2) return -EIO; return (buf[0] << 5) | buf[1] >> 3; } -static int set_pwm_fan(int fan, int pwm) +static int set_pwm_fan(int fan_index, int pwm) { unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_PWM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; if (pwm < 10) pwm = 10; @@ -386,32 +498,38 @@ pwm = 100; pwm = (pwm * 2559) / 1000; buf[0] = pwm; - rc = fan_write_reg(0x30 + (fan * 2), buf, 1); + rc = fan_write_reg(0x30 + (id * 2), buf, 1); if (rc < 0) return rc; return 0; } -static int get_pwm_fan(int fan) +static int get_pwm_fan(int fan_index) { unsigned char failure; unsigned char active; unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_PWM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; rc = fan_read_reg(0x2b, &failure, 1); if (rc != 1) return -EIO; - if ((failure & (1 << fan)) != 0) + if 
((failure & (1 << id)) != 0) return -EFAULT; rc = fan_read_reg(0x2d, &active, 1); if (rc != 1) return -EIO; - if ((active & (1 << fan)) == 0) + if ((active & (1 << id)) == 0) return -ENXIO; /* Programmed value or real current speed */ - rc = fan_read_reg(0x30 + (fan * 2), buf, 1); + rc = fan_read_reg(0x30 + (id * 2), buf, 1); if (rc != 1) return -EIO; @@ -513,80 +631,84 @@ /* * CPUs fans control loop */ -static void do_monitor_cpu(struct cpu_pid_state *state) + +static int do_read_one_cpu_values(struct cpu_pid_state *state, s32 *temp, s32 *power) { - s32 temp, voltage, current_a, power, power_target; - s32 integral, derivative, proportional, adj_in_target, sval; - s64 integ_p, deriv_p, prop_p, sum; - int i, intake, rc; + s32 ltemp, volts, amps; + int rc = 0; - DBG("cpu %d:\n", state->index); + /* Default (in case of error) */ + *temp = state->cur_temp; + *power = state->cur_power; /* Read current fan status */ if (state->index == 0) - rc = get_rpm_fan(CPUA_EXHAUST_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); else - rc = get_rpm_fan(CPUB_EXHAUST_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); if (rc < 0) { - printk(KERN_WARNING "Error %d reading CPU %d exhaust fan !\n", - rc, state->index); - /* XXX What do we do now ? */ - } else + /* XXX What do we do now ? Nothing for now, keep old value, but + * return error upstream + */ + DBG(" cpu %d, fan reading error !\n", state->index); + } else { state->rpm = rc; - DBG(" current rpm: %d\n", state->rpm); + DBG(" cpu %d, exhaust RPM: %d\n", state->index, state->rpm); + } /* Get some sensor readings and scale it */ - temp = read_smon_adc(state, 1); - if (temp == -1) { + ltemp = read_smon_adc(state, 1); + if (ltemp == -1) { + /* XXX What do we do now ? 
*/ state->overtemp++; - return; + if (rc == 0) + rc = -EIO; + DBG(" cpu %d, temp reading error !\n", state->index); + } else { + /* Fixup temperature according to diode calibration + */ + DBG(" cpu %d, temp raw: %04x, m_diode: %04x, b_diode: %04x\n", + state->index, + ltemp, state->mpu.mdiode, state->mpu.bdiode); + *temp = ((s32)ltemp * (s32)state->mpu.mdiode + ((s32)state->mpu.bdiode << 12)) >> 2; + state->last_temp = *temp; + DBG(" temp: %d.%03d\n", FIX32TOPRINT((*temp))); } - voltage = read_smon_adc(state, 3); - current_a = read_smon_adc(state, 4); - /* Fixup temperature according to diode calibration + /* + * Read voltage & current and calculate power */ - DBG(" temp raw: %04x, m_diode: %04x, b_diode: %04x\n", - temp, state->mpu.mdiode, state->mpu.bdiode); - temp = ((s32)temp * (s32)state->mpu.mdiode + ((s32)state->mpu.bdiode << 12)) >> 2; - state->last_temp = temp; - DBG(" temp: %d.%03d\n", FIX32TOPRINT(temp)); + volts = read_smon_adc(state, 3); + amps = read_smon_adc(state, 4); - /* Check tmax, increment overtemp if we are there. At tmax+8, we go - * full blown immediately and try to trigger a shutdown - */ - if (temp >= ((state->mpu.tmax + 8) << 16)) { - printk(KERN_WARNING "Warning ! 
CPU %d temperature way above maximum" - " (%d) !\n", - state->index, temp >> 16); - state->overtemp = CPU_MAX_OVERTEMP; - } else if (temp > (state->mpu.tmax << 16)) - state->overtemp++; - else - state->overtemp = 0; - if (state->overtemp >= CPU_MAX_OVERTEMP) - critical_state = 1; - if (state->overtemp > 0) { - state->rpm = state->mpu.rmaxn_exhaust_fan; - state->intake_rpm = intake = state->mpu.rmaxn_intake_fan; - goto do_set_fans; - } - - /* Scale other sensor values according to fixed scales + /* Scale voltage and current raw sensor values according to fixed scales * obtained in Darwin and calculate power from I and V */ - state->voltage = voltage *= ADC_CPU_VOLTAGE_SCALE; - state->current_a = current_a *= ADC_CPU_CURRENT_SCALE; - power = (((u64)current_a) * ((u64)voltage)) >> 16; + volts *= ADC_CPU_VOLTAGE_SCALE; + amps *= ADC_CPU_CURRENT_SCALE; + *power = (((u64)volts) * ((u64)amps)) >> 16; + state->voltage = volts; + state->current_a = amps; + state->last_power = *power; + + DBG(" cpu %d, current: %d.%03d, voltage: %d.%03d, power: %d.%03d W\n", + state->index, FIX32TOPRINT(state->current_a), + FIX32TOPRINT(state->voltage), FIX32TOPRINT(*power)); + + return 0; +} + +static void do_cpu_pid(struct cpu_pid_state *state, s32 temp, s32 power) +{ + s32 power_target, integral, derivative, proportional, adj_in_target, sval; + s64 integ_p, deriv_p, prop_p, sum; + int i; /* Calculate power target value (could be done once for all) * and convert to a 16.16 fp number */ power_target = ((u32)(state->mpu.pmaxh - state->mpu.padjmax)) << 16; - - DBG(" current: %d.%03d, voltage: %d.%03d\n", - FIX32TOPRINT(current_a), FIX32TOPRINT(voltage)); - DBG(" power: %d.%03d W, target: %d.%03d, error: %d.%03d\n", FIX32TOPRINT(power), + DBG(" power target: %d.%03d, error: %d.%03d\n", FIX32TOPRINT(power_target), FIX32TOPRINT(power_target - power)); /* Store temperature and power in history array */ @@ -626,7 +748,7 @@ * input target is mpu.ttarget, input max is mpu.tmax */ integ_p = 
((s64)state->mpu.pid_gr) * (s64)integral; - DBG(" integ_p: %d\n", (int)(deriv_p >> 36)); + DBG(" integ_p: %d\n", (int)(integ_p >> 36)); sval = (state->mpu.tmax << 16) - ((integ_p >> 20) & 0xffffffff); adj_in_target = (state->mpu.ttarget << 16); if (adj_in_target > sval) @@ -659,6 +781,127 @@ state->rpm = state->mpu.rminn_exhaust_fan; if (state->rpm > state->mpu.rmaxn_exhaust_fan) state->rpm = state->mpu.rmaxn_exhaust_fan; +} + +static void do_monitor_cpu_combined(void) +{ + struct cpu_pid_state *state0 = &cpu_state[0]; + struct cpu_pid_state *state1 = &cpu_state[1]; + s32 temp0, power0, temp1, power1; + s32 temp_combi, power_combi; + int rc, intake, pump; + + rc = do_read_one_cpu_values(state0, &temp0, &power0); + if (rc < 0) { + /* XXX What do we do now ? */ + } + state1->overtemp = 0; + rc = do_read_one_cpu_values(state1, &temp1, &power1); + if (rc < 0) { + /* XXX What do we do now ? */ + } + if (state1->overtemp) + state0->overtemp++; + + temp_combi = max(temp0, temp1); + power_combi = max(power0, power1); + + /* Check tmax, increment overtemp if we are there. At tmax+8, we go + * full blown immediately and try to trigger a shutdown + */ + if (temp_combi >= ((state0->mpu.tmax + 8) << 16)) { + printk(KERN_WARNING "Warning ! 
Temperature way above maximum (%d) !\n", + temp_combi >> 16); + state0->overtemp = CPU_MAX_OVERTEMP; + } else if (temp_combi > (state0->mpu.tmax << 16)) + state0->overtemp++; + else + state0->overtemp = 0; + if (state0->overtemp >= CPU_MAX_OVERTEMP) + critical_state = 1; + if (state0->overtemp > 0) { + state0->rpm = state0->mpu.rmaxn_exhaust_fan; + state0->intake_rpm = intake = state0->mpu.rmaxn_intake_fan; + pump = CPU_PUMP_OUTPUT_MAX; + goto do_set_fans; + } + + /* Do the PID */ + do_cpu_pid(state0, temp_combi, power_combi); + + /* Calculate intake fan speed */ + intake = (state0->rpm * CPU_INTAKE_SCALE) >> 16; + if (intake < state0->mpu.rminn_intake_fan) + intake = state0->mpu.rminn_intake_fan; + if (intake > state0->mpu.rmaxn_intake_fan) + intake = state0->mpu.rmaxn_intake_fan; + state0->intake_rpm = intake; + + /* Calculate pump speed */ + pump = (state0->rpm * CPU_PUMP_OUTPUT_MAX) / + state0->mpu.rmaxn_exhaust_fan; + if (pump > CPU_PUMP_OUTPUT_MAX) + pump = CPU_PUMP_OUTPUT_MAX; + if (pump < CPU_PUMP_OUTPUT_MIN) + pump = CPU_PUMP_OUTPUT_MIN; + + do_set_fans: + /* We copy values from state 0 to state 1 for /sysfs */ + state1->rpm = state0->rpm; + state1->intake_rpm = state0->intake_rpm; + + DBG("** CPU %d RPM: %d Ex, %d, Pump: %d, In, overtemp: %d\n", + state1->index, (int)state1->rpm, intake, pump, state1->overtemp); + + /* We should check for errors, shouldn't we ? But then, what + * do we do once the error occurs ? For FCU notified fan + * failures (-EFAULT) we probably want to notify userland + * some way... 
+ */ + set_rpm_fan(CPUA_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, state0->rpm); + set_rpm_fan(CPUB_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, state0->rpm); + + if (fcu_fans[CPUA_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) + set_rpm_fan(CPUA_PUMP_RPM_INDEX, pump); + if (fcu_fans[CPUB_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) + set_rpm_fan(CPUB_PUMP_RPM_INDEX, pump); +} + +static void do_monitor_cpu_split(struct cpu_pid_state *state) +{ + s32 temp, power; + int rc, intake; + + /* Read current fan status */ + rc = do_read_one_cpu_values(state, &temp, &power); + if (rc < 0) { + /* XXX What do we do now ? */ + } + + /* Check tmax, increment overtemp if we are there. At tmax+8, we go + * full blown immediately and try to trigger a shutdown + */ + if (temp >= ((state->mpu.tmax + 8) << 16)) { + printk(KERN_WARNING "Warning ! CPU %d temperature way above maximum" + " (%d) !\n", + state->index, temp >> 16); + state->overtemp = CPU_MAX_OVERTEMP; + } else if (temp > (state->mpu.tmax << 16)) + state->overtemp++; + else + state->overtemp = 0; + if (state->overtemp >= CPU_MAX_OVERTEMP) + critical_state = 1; + if (state->overtemp > 0) { + state->rpm = state->mpu.rmaxn_exhaust_fan; + state->intake_rpm = intake = state->mpu.rmaxn_intake_fan; + goto do_set_fans; + } + + /* Do the PID */ + do_cpu_pid(state, temp, power); intake = (state->rpm * CPU_INTAKE_SCALE) >> 16; if (intake < state->mpu.rminn_intake_fan) @@ -677,11 +920,11 @@ * some way... 
*/ if (state->index == 0) { - set_rpm_fan(CPUA_INTAKE_FAN_RPM_ID, intake); - set_rpm_fan(CPUA_EXHAUST_FAN_RPM_ID, state->rpm); + set_rpm_fan(CPUA_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, state->rpm); } else { - set_rpm_fan(CPUB_INTAKE_FAN_RPM_ID, intake); - set_rpm_fan(CPUB_EXHAUST_FAN_RPM_ID, state->rpm); + set_rpm_fan(CPUB_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, state->rpm); } } @@ -696,6 +939,7 @@ state->overtemp = 0; state->adc_config = 0x00; + if (index == 0) state->monitor = attach_i2c_chip(SUPPLY_MONITOR_ID, "CPU0_monitor"); else if (index == 1) @@ -778,7 +1022,7 @@ DBG("backside:\n"); /* Check fan status */ - rc = get_pwm_fan(BACKSIDE_FAN_PWM_ID); + rc = get_pwm_fan(BACKSIDE_FAN_PWM_INDEX); if (rc < 0) { printk(KERN_WARNING "Error %d reading backside fan !\n", rc); /* XXX What do we do now ? */ @@ -790,12 +1034,12 @@ temp = i2c_smbus_read_byte_data(state->monitor, MAX6690_EXT_TEMP) << 16; state->last_temp = temp; DBG(" temp: %d.%03d, target: %d.%03d\n", FIX32TOPRINT(temp), - FIX32TOPRINT(BACKSIDE_PID_INPUT_TARGET)); + FIX32TOPRINT(backside_params.input_target)); /* Store temperature and error in history array */ state->cur_sample = (state->cur_sample + 1) % BACKSIDE_PID_HISTORY_SIZE; state->sample_history[state->cur_sample] = temp; - state->error_history[state->cur_sample] = temp - BACKSIDE_PID_INPUT_TARGET; + state->error_history[state->cur_sample] = temp - backside_params.input_target; /* If first loop, fill the history table */ if (state->first) { @@ -804,7 +1048,7 @@ BACKSIDE_PID_HISTORY_SIZE; state->sample_history[state->cur_sample] = temp; state->error_history[state->cur_sample] = - temp - BACKSIDE_PID_INPUT_TARGET; + temp - backside_params.input_target; } state->first = 0; } @@ -816,7 +1060,7 @@ integral += state->error_history[i]; integral *= BACKSIDE_PID_INTERVAL; DBG(" integral: %08x\n", integral); - integ_p = ((s64)BACKSIDE_PID_G_r) * (s64)integral; + integ_p = 
((s64)backside_params.G_r) * (s64)integral; DBG(" integ_p: %d\n", (int)(integ_p >> 36)); sum += integ_p; @@ -825,12 +1069,12 @@ state->error_history[(state->cur_sample + BACKSIDE_PID_HISTORY_SIZE - 1) % BACKSIDE_PID_HISTORY_SIZE]; derivative /= BACKSIDE_PID_INTERVAL; - deriv_p = ((s64)BACKSIDE_PID_G_d) * (s64)derivative; + deriv_p = ((s64)backside_params.G_d) * (s64)derivative; DBG(" deriv_p: %d\n", (int)(deriv_p >> 36)); sum += deriv_p; /* Calculate the proportional term */ - prop_p = ((s64)BACKSIDE_PID_G_p) * (s64)(state->error_history[state->cur_sample]); + prop_p = ((s64)backside_params.G_p) * (s64)(state->error_history[state->cur_sample]); DBG(" prop_p: %d\n", (int)(prop_p >> 36)); sum += prop_p; @@ -839,13 +1083,13 @@ DBG(" sum: %d\n", (int)sum); state->pwm += (s32)sum; - if (state->pwm < BACKSIDE_PID_OUTPUT_MIN) - state->pwm = BACKSIDE_PID_OUTPUT_MIN; - if (state->pwm > BACKSIDE_PID_OUTPUT_MAX) - state->pwm = BACKSIDE_PID_OUTPUT_MAX; + if (state->pwm < backside_params.output_min) + state->pwm = backside_params.output_min; + if (state->pwm > backside_params.output_max) + state->pwm = backside_params.output_max; DBG("** BACKSIDE PWM: %d\n", (int)state->pwm); - set_pwm_fan(BACKSIDE_FAN_PWM_ID, state->pwm); + set_pwm_fan(BACKSIDE_FAN_PWM_INDEX, state->pwm); } /* @@ -853,6 +1097,35 @@ */ static int init_backside_state(struct backside_pid_state *state) { + struct device_node *u3; + int u3h = 1; /* conservative by default */ + + /* + * There are different PID params for machines with U3 and machines + * with U3H, pick the right ones now + */ + u3 = of_find_node_by_path("/u3@0,f8000000"); + if (u3 != NULL) { + u32 *vers = (u32 *)get_property(u3, "device-rev", NULL); + if (vers) + if (((*vers) & 0x3f) < 0x34) + u3h = 0; + of_node_put(u3); + } + + backside_params.G_p = BACKSIDE_PID_G_p; + backside_params.G_r = BACKSIDE_PID_G_r; + backside_params.output_max = BACKSIDE_PID_OUTPUT_MAX; + if (u3h) { + backside_params.G_d = BACKSIDE_PID_U3H_G_d; + 
backside_params.input_target = BACKSIDE_PID_U3H_INPUT_TARGET; + backside_params.output_min = BACKSIDE_PID_U3H_OUTPUT_MIN; + } else { + backside_params.G_d = BACKSIDE_PID_U3_G_d; + backside_params.input_target = BACKSIDE_PID_U3_INPUT_TARGET; + backside_params.output_min = BACKSIDE_PID_U3_OUTPUT_MIN; + } + state->ticks = 1; state->first = 1; state->pwm = 50; @@ -898,7 +1171,7 @@ DBG("drives:\n"); /* Check fan status */ - rc = get_rpm_fan(DRIVES_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(DRIVES_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); if (rc < 0) { printk(KERN_WARNING "Error %d reading drives fan !\n", rc); /* XXX What do we do now ? */ @@ -965,7 +1238,7 @@ state->rpm = DRIVES_PID_OUTPUT_MAX; DBG("** DRIVES RPM: %d\n", (int)state->rpm); - set_rpm_fan(DRIVES_FAN_RPM_ID, state->rpm); + set_rpm_fan(DRIVES_FAN_RPM_INDEX, state->rpm); } /* @@ -1032,7 +1305,7 @@ } /* Set the PCI fan once for now */ - set_pwm_fan(SLOTS_FAN_PWM_ID, SLOTS_FAN_DEFAULT_PWM); + set_pwm_fan(SLOTS_FAN_PWM_INDEX, SLOTS_FAN_DEFAULT_PWM); /* Initialize ADCs */ initialize_adc(&cpu_state[0]); @@ -1047,9 +1320,13 @@ start = jiffies; down(&driver_lock); - do_monitor_cpu(&cpu_state[0]); - if (cpu_state[1].monitor != NULL) - do_monitor_cpu(&cpu_state[1]); + if (cpu_pid_type == CPU_PID_TYPE_COMBINED) + do_monitor_cpu_combined(); + else { + do_monitor_cpu_split(&cpu_state[0]); + if (cpu_state[1].monitor != NULL) + do_monitor_cpu_split(&cpu_state[1]); + } do_monitor_backside(&backside_state); do_monitor_drives(&drives_state); up(&driver_lock); @@ -1113,6 +1390,19 @@ DBG("counted %d CPUs in the device-tree\n", cpu_count); + /* Decide the type of PID algorithm to use based on the presence of + * the pumps, though that may not be the best way, that is good enough + * for now + */ + if (machine_is_compatible("PowerMac7,3") + && (cpu_count > 1) + && fcu_fans[CPUA_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID + && fcu_fans[CPUB_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) { + printk(KERN_INFO "Liquid cooling 
pumps detected, using new algorithm !\n"); + cpu_pid_type = CPU_PID_TYPE_COMBINED; + } else + cpu_pid_type = CPU_PID_TYPE_SPLIT; + /* Create control loops for everything. If any fail, everything * fails */ @@ -1257,12 +1547,91 @@ return 0; } +static void fcu_lookup_fans(struct device_node *fcu_node) +{ + struct device_node *np = NULL; + int i; + + /* The table is filled by default with values that are suitable + * for the old machines without device-tree informations. We scan + * the device-tree and override those values with whatever is + * there + */ + + DBG("Looking up FCU controls in device-tree...\n"); + + while ((np = of_get_next_child(fcu_node, np)) != NULL) { + int type = -1; + char *loc; + u32 *reg; + + DBG(" control: %s, type: %s\n", np->name, np->type); + + /* Detect control type */ + if (!strcmp(np->type, "fan-rpm-control") || + !strcmp(np->type, "fan-rpm")) + type = FCU_FAN_RPM; + if (!strcmp(np->type, "fan-pwm-control") || + !strcmp(np->type, "fan-pwm")) + type = FCU_FAN_PWM; + /* Only care about fans for now */ + if (type == -1) + continue; + + /* Lookup for a matching location */ + loc = (char *)get_property(np, "location", NULL); + reg = (u32 *)get_property(np, "reg", NULL); + if (loc == NULL || reg == NULL) + continue; + DBG(" matching location: %s, reg: 0x%08x\n", loc, *reg); + + for (i = 0; i < FCU_FAN_COUNT; i++) { + int fan_id; + + if (strcmp(loc, fcu_fans[i].loc)) + continue; + DBG(" location match, index: %d\n", i); + fcu_fans[i].id = FCU_FAN_ABSENT_ID; + if (type != fcu_fans[i].type) { + printk(KERN_WARNING "therm_pm72: Fan type mismatch " + "in device-tree for %s\n", np->full_name); + break; + } + if (type == FCU_FAN_RPM) + fan_id = ((*reg) - 0x10) / 2; + else + fan_id = ((*reg) - 0x30) / 2; + if (fan_id > 7) { + printk(KERN_WARNING "therm_pm72: Can't parse " + "fan ID in device-tree for %s\n", np->full_name); + break; + } + DBG(" fan id -> %d, type -> %d\n", fan_id, type); + fcu_fans[i].id = fan_id; + } + } + + /* Now dump the array */ + 
printk(KERN_INFO "Detected fan controls:\n"); + for (i = 0; i < FCU_FAN_COUNT; i++) { + if (fcu_fans[i].id == FCU_FAN_ABSENT_ID) + continue; + printk(KERN_INFO " %d: %s fan, id %d, location: %s\n", i, + fcu_fans[i].type == FCU_FAN_RPM ? "RPM" : "PWM", + fcu_fans[i].id, fcu_fans[i].loc); + } +} + static int fcu_of_probe(struct of_device* dev, const struct of_match *match) { int rc; state = state_detached; + /* Lookup the fans in the device tree */ + fcu_lookup_fans(dev->node); + + /* Add the driver */ rc = i2c_add_driver(&therm_pm72_driver); if (rc < 0) return rc; @@ -1301,15 +1670,20 @@ { struct device_node *np; - if (!machine_is_compatible("PowerMac7,2")) + if (!machine_is_compatible("PowerMac7,2") && + !machine_is_compatible("PowerMac7,3")) return -ENODEV; printk(KERN_INFO "PowerMac G5 Thermal control driver %s\n", VERSION); np = of_find_node_by_type(NULL, "fcu"); if (np == NULL) { - printk(KERN_ERR "Can't find FCU in device-tree !\n"); - return -ENODEV; + /* Some machines have strangely broken device-tree */ + np = of_find_node_by_path("/u3@0,f8000000/i2c@f8001000/fan@15e"); + if (np == NULL) { + printk(KERN_ERR "Can't find FCU in device-tree !\n"); + return -ENODEV; + } } of_dev = of_platform_device_create(np, "temperature"); if (of_dev == NULL) { diff -urN linux-2.5/drivers/macintosh/therm_pm72.h linux-pogo/drivers/macintosh/therm_pm72.h --- linux-2.5/drivers/macintosh/therm_pm72.h 2004-09-24 14:34:05.000000000 +1000 +++ linux-pogo/drivers/macintosh/therm_pm72.h 2004-10-15 18:58:22.000000000 +1000 @@ -119,18 +119,33 @@ #define ADC_CPU_CURRENT_SCALE 0x1f40 /* _AD4 */ /* - * PID factors for the U3/Backside fan control loop + * PID factors for the U3/Backside fan control loop.
We have 2 sets + * of values here, one set for U3 and one set for U3H */ -#define BACKSIDE_FAN_PWM_ID 1 -#define BACKSIDE_PID_G_d 0x02800000 +#define BACKSIDE_FAN_PWM_DEFAULT_ID 1 +#define BACKSIDE_FAN_PWM_INDEX 0 +#define BACKSIDE_PID_U3_G_d 0x02800000 +#define BACKSIDE_PID_U3H_G_d 0x01400000 #define BACKSIDE_PID_G_p 0x00500000 #define BACKSIDE_PID_G_r 0x00000000 -#define BACKSIDE_PID_INPUT_TARGET 0x00410000 +#define BACKSIDE_PID_U3_INPUT_TARGET 0x00410000 +#define BACKSIDE_PID_U3H_INPUT_TARGET 0x004b0000 #define BACKSIDE_PID_INTERVAL 5 #define BACKSIDE_PID_OUTPUT_MAX 100 -#define BACKSIDE_PID_OUTPUT_MIN 20 +#define BACKSIDE_PID_U3_OUTPUT_MIN 20 +#define BACKSIDE_PID_U3H_OUTPUT_MIN 30 #define BACKSIDE_PID_HISTORY_SIZE 2 +struct basckside_pid_params +{ + u32 G_d; + u32 G_p; + u32 G_r; + u32 input_target; + u32 output_min; + u32 output_max; +}; + struct backside_pid_state { int ticks; @@ -146,7 +161,8 @@ /* * PID factors for the Drive Bay fan control loop */ -#define DRIVES_FAN_RPM_ID 2 +#define DRIVES_FAN_RPM_DEFAULT_ID 2 +#define DRIVES_FAN_RPM_INDEX 1 #define DRIVES_PID_G_d 0x01e00000 #define DRIVES_PID_G_p 0x00500000 #define DRIVES_PID_G_r 0x00000000 @@ -168,7 +184,8 @@ int first; }; -#define SLOTS_FAN_PWM_ID 2 +#define SLOTS_FAN_PWM_DEFAULT_ID 2 +#define SLOTS_FAN_PWM_INDEX 2 #define SLOTS_FAN_DEFAULT_PWM 50 /* Do better here ! 
*/ /* @@ -191,10 +208,15 @@ * CPU B FAKE POWER 49 (I_V_inputs: 18, 19) */ -#define CPUA_INTAKE_FAN_RPM_ID 3 -#define CPUA_EXHAUST_FAN_RPM_ID 4 -#define CPUB_INTAKE_FAN_RPM_ID 5 -#define CPUB_EXHAUST_FAN_RPM_ID 6 +#define CPUA_INTAKE_FAN_RPM_DEFAULT_ID 3 +#define CPUA_EXHAUST_FAN_RPM_DEFAULT_ID 4 +#define CPUB_INTAKE_FAN_RPM_DEFAULT_ID 5 +#define CPUB_EXHAUST_FAN_RPM_DEFAULT_ID 6 + +#define CPUA_INTAKE_FAN_RPM_INDEX 3 +#define CPUA_EXHAUST_FAN_RPM_INDEX 4 +#define CPUB_INTAKE_FAN_RPM_INDEX 5 +#define CPUB_EXHAUST_FAN_RPM_INDEX 6 #define CPU_INTAKE_SCALE 0x0000f852 #define CPU_TEMP_HISTORY_SIZE 2 @@ -202,6 +224,11 @@ #define CPU_PID_INTERVAL 1 #define CPU_MAX_OVERTEMP 30 +#define CPUA_PUMP_RPM_INDEX 7 +#define CPUB_PUMP_RPM_INDEX 8 +#define CPU_PUMP_OUTPUT_MAX 3700 +#define CPU_PUMP_OUTPUT_MIN 1000 + struct cpu_pid_state { int index; @@ -219,6 +246,7 @@ s32 voltage; s32 current_a; s32 last_temp; + s32 last_power; int first; u8 adc_config; }; From hpa at zytor.com Sat Oct 16 14:24:05 2004 From: hpa at zytor.com (H. Peter Anvin) Date: Fri, 15 Oct 2004 21:24:05 -0700 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <1097893432.6546.37.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> Message-ID: <4170A265.6030402@zytor.com> Benjamin Herrenschmidt wrote: > Ok, here's a new patch that fixes a few issues, it's been > tested on a non-liquid cooled system and appear to work ok. I'm testing it out right now. It is definitely suffering from some degree of oscillation when idling, and it seems to be one particular (set of) fan(s) that is having that problem. Note that one cause of oscillation at low speed is that there is a minimum speed below which the fans will simply stop. This may be what is happening here. Some time later I'll try to figure out which numbers to collect and try to generate a graph over time. 
It's definitely passing the stress test, though; make -j4 on the whole kernel (with the .config posted earlier) took 4:27.19. The other stress test -- which used to kill the old thermal driver dead in a matter of seconds -- is to start a bunch of "cat /dev/zero > /dev/null" is happily running, and nice and quiet. The oscillation is obnoxious, though. -hpa From nathanl at austin.ibm.com Sat Oct 16 14:37:04 2004 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Fri, 15 Oct 2004 23:37:04 -0500 Subject: cpu hotplug broken in 2.6.9-rc4 Message-ID: <1097901423.3226.42.camel@biclops> (Urgh, sent this to the wrong list initially, sorry. Second try...) Hi- Seems that cpu hotplug got broken when benh's monster cleanup patch went into bk (in 2.6.9-rc2-bk10). System boots fine, but if I take down a cpu and then try to bring it back up, I get: # echo 1 > /sys/devices/system/cpu/cpu1/online Bad kernel stack pointer 7d23080 at 6373c0 cpu 0x1: Vector: c000000002ff4d80 at [c0000000077c7d40] pc: 00000000006373c0 lr: 00000000006373c0 sp: 7d23080 msr: 1002 current = 0xc00000000779a8a0 paca = 0xc000000000493d00 pid = 0, comm = swapper enter ? 
for help 1:mon> t SP (7d23080) is in userspace 1:mon> r R00 = 0000000000000000 R16 = 0000000000000000 R01 = 0000000007d23080 R17 = 0000000000000000 R02 = 0000000007ad4b68 R18 = 0000000000000000 R03 = 0000000000000001 R19 = 0000000000000000 R04 = 00000000006373c0 R20 = c000000000493880 R05 = 0000000000000001 R21 = 00016bb01a585f1a R06 = 0000000000000020 R22 = c000000000493d00 R07 = fffffffd00000000 R23 = c000000002565008 R08 = 00000000000d6000 R24 = 0000000000000001 R09 = c00000000737ef80 R25 = 0000000000000000 R10 = c00000000737ed40 R26 = 0000000000000008 R11 = c00000000737eb40 R27 = 0000000000000010 R12 = 0000000000000001 R28 = c0000000077a4000 R13 = c000000000493d00 R29 = 0000000007889698 R14 = 0000000000000000 R30 = c0000000077a4010 R15 = 0000000007ab0420 R31 = 0000000007d23080 pc = 00000000006373c0 lr = 00000000006373c0 msr = 0000000000001002 cr = 22000024 ctr = 800000000010dd60 xer = 0000000000000001 trap = c000000002ff4d80 For what it's worth, the least significant half of pc (00000000006373c0) matches the address of pseries_secondary_smp_init in the System.map: c0000000006373c0 D pseries_secondary_smp_init If I revert the monster patch from the 2.6.9-rc2-bk10 snapshot things work fine. I haven't been able to figure out yet how the stack pointer gets a bad value. Nathan From benh at kernel.crashing.org Sat Oct 16 14:50:07 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 14:50:07 +1000 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <4170A265.6030402@zytor.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> Message-ID: <1097902206.8965.2.camel@gaston> On Sat, 2004-10-16 at 14:24, H. Peter Anvin wrote: > Benjamin Herrenschmidt wrote: > > Ok, here's a new patch that fixes a few issues, it's been > > tested on a non-liquid cooled system and appear to work ok. 
> > I'm testing it out right now. It is definitely suffering from some > degree of oscillation when idling, and it seems to be one particular > (set of) fan(s) that is having that problem. Which ones ? The CPU fans ? > Note that one cause of oscillation at low speed is that there is a > minimum speed below which the fans will simply stop. This may be what > is happening here. Do the fan actually stop ? Yes we "floor" the fan speeds and indeed, Apple algorithm is known to slowly oscillate, on my box it's between 300 and 1000 RPM for the CPU fans over a period of a minute or 2. Such an oscillation is expected. Something worse would mean we get something wrong. Did you compare against OS X ? > Some time later I'll try to figure out which numbers to collect and try > to generate a graph over time. > > It's definitely passing the stress test, though; make -j4 on the whole > kernel (with the .config posted earlier) took 4:27.19. The other stress > test -- which used to kill the old thermal driver dead in a matter of > seconds -- is to start a bunch of "cat /dev/zero > /dev/null" is happily > running, and nice and quiet. > > The oscillation is obnoxious, though. Hehe... Ben. From hpa at zytor.com Sat Oct 16 14:58:07 2004 From: hpa at zytor.com (H. Peter Anvin) Date: Fri, 15 Oct 2004 21:58:07 -0700 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <1097902206.8965.2.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> Message-ID: <4170AA5F.6060107@zytor.com> Benjamin Herrenschmidt wrote: > On Sat, 2004-10-16 at 14:24, H. Peter Anvin wrote: > >>Benjamin Herrenschmidt wrote: >> >>>Ok, here's a new patch that fixes a few issues, it's been >>>tested on a non-liquid cooled system and appear to work ok. >> >>I'm testing it out right now. 
It is definitely suffering from some >>degree of oscillation when idling, and it seems to be one particular >>(set of) fan(s) that is having that problem. > > Which ones ? The CPU fans ? I don't know how to tell; it's a significant sound. Let me see if I can figure it out. >>Note that one cause of oscillation at low speed is that there is a >>minimum speed below which the fans will simply stop. This may be what >>is happening here. > > Do the fan actually stop ? Yes we "floor" the fan speeds and indeed, > Apple algorithm is known to slowly oscillate, on my box it's between > 300 and 1000 RPM for the CPU fans over a period of a minute or 2. > > Such an oscillation is expected. Something worse would mean we get > something wrong. Did you compare against OS X ? OS X doesn't sound like this. The oscillation period for what it's worth is 10 seconds. -hpa From benh at kernel.crashing.org Sat Oct 16 14:55:45 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 14:55:45 +1000 Subject: [PATCH] ppc64: Fix a typo in the code that reserves memory at boot Message-ID: <1097902544.8963.5.camel@gaston> Hi ! The code that marks memory regions as "reserved" early during boot has a typo (doing incorrect rounding of the top address) which can cause some areas to not be properly reserved. That may explain some cases of initrd corruption reported recently. 
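[Editorial sketch] The rounding problem described above can be illustrated with a few lines of standalone C. This is only a sketch under stated assumptions: align_down(), align_up(), reserved_size() and SKETCH_PAGE_SIZE are hypothetical names made up for the illustration, a 4K page is assumed, and none of this is the kernel's _ALIGN_DOWN/_ALIGN_UP code. It shows that rounding the top of a region down can leave its tail outside the reserved range, while rounding it up covers the whole region.

```c
#include <assert.h>

/* Assumed page size for the illustration only (4K). */
#define SKETCH_PAGE_SIZE 0x1000UL

/* Round addr down to a multiple of size; size must be a power of two. */
static unsigned long align_down(unsigned long addr, unsigned long size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return addr & ~(size - 1);
}

/* Round addr up to a multiple of size; size must be a power of two. */
static unsigned long align_up(unsigned long addr, unsigned long size)
{
	return align_down(addr + size - 1, size);
}

/*
 * Size of the page-aligned reservation for [base, top).
 * round_top_up == 0 mimics the behaviour described as the typo,
 * round_top_up != 0 mimics the behaviour described as the fix.
 */
static unsigned long reserved_size(unsigned long base, unsigned long top,
				   int round_top_up)
{
	base = align_down(base, SKETCH_PAGE_SIZE);
	if (round_top_up)
		top = align_up(top, SKETCH_PAGE_SIZE);
	else
		top = align_down(top, SKETCH_PAGE_SIZE);
	return top - base;
}
```

For a region from 0x1234 to 0x5678, rounding the top down reserves 0x4000 bytes and leaves 0x5000..0x5678 unprotected; rounding it up reserves 0x5000 bytes and covers the whole region.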
Signed-off-by: Benjamin Herrenschmidt ===== arch/ppc64/kernel/prom_init.c 1.2 vs edited ===== --- 1.2/arch/ppc64/kernel/prom_init.c 2004-09-27 19:12:49 +10:00 +++ edited/arch/ppc64/kernel/prom_init.c 2004-10-16 14:53:28 +10:00 @@ -595,7 +595,7 @@ * dumb and just copy this entire array to the boot params */ base = _ALIGN_DOWN(base, PAGE_SIZE); - top = _ALIGN_DOWN(top, PAGE_SIZE); + top = _ALIGN_UP(top, PAGE_SIZE); size = top - base; if (cnt >= (MEM_RESERVE_MAP_SIZE - 1)) From benh at kernel.crashing.org Sat Oct 16 14:58:14 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 14:58:14 +1000 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <4170AA5F.6060107@zytor.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> Message-ID: <1097902694.8965.8.camel@gaston> On Sat, 2004-10-16 at 14:58, H. Peter Anvin wrote: > I don't know how to tell; it's a significant sound. Let me see if I can > figure it out. If the dual 2.5Ghz is like the old dual 2Ghz, you can run prefectly well with the case open, as long as you keep the plexiglass in place, which drives the air flow, and you'll be able to see the CPU and slots fans. You can also read the speed values from /sys/devices/temperature > OS X doesn't sound like this. The oscillation period for what it's > worth is 10 seconds. Ok, there must be something wrong then... Ben. From hpa at zytor.com Sat Oct 16 15:03:26 2004 From: hpa at zytor.com (H. 
Peter Anvin) Date: Fri, 15 Oct 2004 22:03:26 -0700 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <1097902694.8965.8.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> Message-ID: <4170AB9E.5010006@zytor.com> Benjamin Herrenschmidt wrote: > On Sat, 2004-10-16 at 14:58, H. Peter Anvin wrote: > > >>I don't know how to tell; it's a significant sound. Let me see if I can >>figure it out. > > > If the dual 2.5Ghz is like the old dual 2Ghz, you can run prefectly well > with the case open, as long as you keep the plexiglass in place, which > drives the air flow, and you'll be able to see the CPU and slots fans. > > You can also read the speed values from /sys/devices/temperature > That's what I'm about to do. Hang on. -hpa From benh at kernel.crashing.org Sat Oct 16 15:02:53 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 15:02:53 +1000 Subject: [PATCH] ppc64: Fix a typo in the code that reserves memory at boot In-Reply-To: <1097902544.8963.5.camel@gaston> References: <1097902544.8963.5.camel@gaston> Message-ID: <1097902973.9026.10.camel@gaston> On Sat, 2004-10-16 at 14:55, Benjamin Herrenschmidt wrote: > Hi ! > > The code that marks memory regions as "reserved" early during boot > has a typo (doing incorrect rounding of the top address) which can > cause some areas to not be properly reserved. That may explain some > cases of initrd corruption reported recently. > > Signed-off-by: Benjamin Herrenschmidt Ok, ignore it and take Anton's one instead. Ben. From hpa at zytor.com Sat Oct 16 15:32:07 2004 From: hpa at zytor.com (H. 
Peter Anvin) Date: Fri, 15 Oct 2004 22:32:07 -0700 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <1097902694.8965.8.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> Message-ID: <4170B257.1010602@zytor.com> Benjamin Herrenschmidt wrote: > On Sat, 2004-10-16 at 14:58, H. Peter Anvin wrote: > > >>I don't know how to tell; it's a significant sound. Let me see if I can >>figure it out. > > > If the dual 2.5Ghz is like the old dual 2Ghz, you can run prefectly well > with the case open, as long as you keep the plexiglass in place, which > drives the air flow, and you'll be able to see the CPU and slots fans. > > You can also read the speed values from /sys/devices/temperature > > >>OS X doesn't sound like this. The oscillation period for what it's >>worth is 10 seconds. > > > Ok, there must be something wrong then... > It's the backside fan that oscillates; backside_fan_pwm varies between 30 and 100 in what is pretty much a squarewave. See attached graph (and note how the other fans vary with workload.) I probably need to write a "power virus" program for the G5 to really test out the high end (a power virus is a program which keeps the chip running as hard as it can; generally keep all pipelines stuffed.) -hpa -------------- next part -------------- A non-text attachment was scrubbed... 
Name: temps.pdf Type: application/pdf Size: 11823 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041015/4f39c9d7/attachment.pdf From benh at kernel.crashing.org Sat Oct 16 15:33:04 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 15:33:04 +1000 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <4170B257.1010602@zytor.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> <4170B257.1010602@zytor.com> Message-ID: <1097904783.8961.23.camel@gaston> On Sat, 2004-10-16 at 15:32, H. Peter Anvin wrote: > It's the backside fan that oscillates; backside_fan_pwm varies between > 30 and 100 in what is pretty much a squarewave. See attached graph (and > note how the other fans vary with workload.) > > I probably need to write a "power virus" program for the G5 to really > test out the high end (a power virus is a program which keeps the chip > running as hard as it can; generally keep all pipelines stuffed.) think about also banging FPU and Altivec units then :) Since it's low oscillation point is 30, I suppose it properly detects U3H (can you verify that in the code, adding a printk for example in init_backside_state()). 
I'll double check the values used for the PID in darwin Ben From benh at kernel.crashing.org Sat Oct 16 15:43:19 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 15:43:19 +1000 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <4170B257.1010602@zytor.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> <4170B257.1010602@zytor.com> Message-ID: <1097905399.8963.26.camel@gaston> On Sat, 2004-10-16 at 15:32, H. Peter Anvin wrote: > It's the backside fan that oscillates; backside_fan_pwm varies between > 30 and 100 in what is pretty much a squarewave. See attached graph (and > note how the other fans vary with workload.) > > I probably need to write a "power virus" program for the G5 to really > test out the high end (a power virus is a program which keeps the chip > running as hard as it can; generally keep all pipelines stuffed.) Strange... The values used seem to be identical to OS X (a 75° target which is high actually, and a different G_d value than old U3). I would need to see the debug output and compare with the OS X driver built with debug output as well (don't ask me to fully understand the math of the PID algorithm) Ben. From hpa at zytor.com Sat Oct 16 16:01:08 2004 From: hpa at zytor.com (H.
Peter Anvin) Date: Fri, 15 Oct 2004 23:01:08 -0700 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <1097904783.8961.23.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> <4170B257.1010602@zytor.com> <1097904783.8961.23.camel@gaston> Message-ID: <4170B924.3040104@zytor.com> Benjamin Herrenschmidt wrote: > On Sat, 2004-10-16 at 15:32, H. Peter Anvin wrote: > > >>It's the backside fan that oscillates; backside_fan_pwm varies between >>30 and 100 in what is pretty much a squarewave. See attached graph (and >>note how the other fans vary with workload.) >> >>I probably need to write a "power virus" program for the G5 to really >>test out the high end (a power virus is a program which keeps the chip >>running as hard as it can; generally keep all pipelines stuffed.) > > think about also banging FPU and Altivec units then :) > Those would be included in "all pipelines." I need to learn more about the specifics of the G5 -- and general PowerPC stuff -- before I can write such a program, though. > Since it's low oscillation point is 30, I suppose it properly detects > U3H (can you verify that in the code, adding a printk for example in > init_backside_state()). > > I'll double check the values used for the PID in darwin I'll do that and compile with debugging enabled, and send you a log from hell. -hpa From nathanl at austin.ibm.com Sat Oct 16 16:14:17 2004 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Sat, 16 Oct 2004 01:14:17 -0500 Subject: [PATCH] ppc64: fix smp_startup_cpu for cpu hotplug In-Reply-To: <1097901423.3226.42.camel@biclops> References: <1097901423.3226.42.camel@biclops> Message-ID: <1097907257.3226.47.camel@biclops> This change is needed in order to allow cpus to be onlined after boot. 
This used to work but the declaration of pseries_secondary_smp_init in this file was changed in Ben's big cleanup patch a while back, so the cpu would start at a bad address. Signed-off-by: Nathan Lynch smp.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletion(-) Index: 2.6.9-rc4/arch/ppc64/kernel/smp.c =================================================================== --- 2.6.9-rc4.orig/arch/ppc64/kernel/smp.c 2004-10-16 00:38:57.404529136 -0500 +++ 2.6.9-rc4/arch/ppc64/kernel/smp.c 2004-10-16 00:56:13.266054248 -0500 @@ -390,7 +390,8 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu) { int status; - unsigned long start_here = __pa(pseries_secondary_smp_init); + unsigned long start_here = __pa((u32)*((unsigned long *) + pseries_secondary_smp_init)); unsigned int pcpu; /* At boot time the cpus are already spinning in hold From schwab at suse.de Sun Oct 17 06:05:22 2004 From: schwab at suse.de (Andreas Schwab) Date: Sat, 16 Oct 2004 22:05:22 +0200 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <1097893432.6546.37.camel@gaston> (Benjamin Herrenschmidt's message of "Sat, 16 Oct 2004 12:23:53 +1000") References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> Message-ID: Benjamin Herrenschmidt writes: > Ok, here's a new patch that fixes a few issues, it's been > tested on a non-liquid cooled system and appear to work ok. That doesn't work very well for me. The fans are constantly spinning at a rather high rate independent of how loaded the system is. Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
From paulus at samba.org Sun Oct 17 10:53:21 2004 From: paulus at samba.org (Paul Mackerras) Date: Sun, 17 Oct 2004 10:53:21 +1000 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <4170B257.1010602@zytor.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> <4170B257.1010602@zytor.com> Message-ID: <16753.49793.159513.618588@cargo.ozlabs.ibm.com> H. Peter Anvin writes: > It's the backside fan that oscillates; backside_fan_pwm varies between > 30 and 100 in what is pretty much a squarewave. See attached graph (and > note how the other fans vary with workload.) The sharp rises look like the code thinks it gets into an over-temperature situation and turns the fans on full blast. It could be worth putting some printks in the overtemp code. > I probably need to write a "power virus" program for the G5 to really > test out the high end (a power virus is a program which keeps the chip > running as hard as it can; generally keep all pipelines stuffed.) Hmmm, I should see if I can dig such a thing out of somewhere in IBM. Paul. From hpa at zytor.com Sun Oct 17 10:58:22 2004 From: hpa at zytor.com (H. Peter Anvin) Date: Sat, 16 Oct 2004 17:58:22 -0700 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <16753.49793.159513.618588@cargo.ozlabs.ibm.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> <4170B257.1010602@zytor.com> <16753.49793.159513.618588@cargo.ozlabs.ibm.com> Message-ID: <4171C3AE.3010302@zytor.com> Paul Mackerras wrote: > H. 
Peter Anvin writes: > > >>It's the backside fan that oscillates; backside_fan_pwm varies between >>30 and 100 in what is pretty much a squarewave. See attached graph (and >>note how the other fans vary with workload.) > > The sharp rises look like the code thinks it gets into an > over-temperature situation and turns the fans on full blast. It could > be worth putting some printks in the overtemp code. > Changing the unsigned variables to signed per Ben's suggestion seems to have solved the problem. > >>I probably need to write a "power virus" program for the G5 to really >>test out the high end (a power virus is a program which keeps the chip >>running as hard as it can; generally keep all pipelines stuffed.) > > Hmmm, I should see if I can dig such a thing out of somewhere in IBM. > That would be good; otherwise they're not too hard to write given a microarchitectural description. -hpa From benh at kernel.crashing.org Sun Oct 17 10:58:04 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 17 Oct 2004 10:58:04 +1000 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> Message-ID: <1097974684.8965.59.camel@gaston> On Sun, 2004-10-17 at 06:05, Andreas Schwab wrote: > Benjamin Herrenschmidt writes: > > > Ok, here's a new patch that fixes a few issues, it's been > > tested on a non-liquid cooled system and appear to work ok. > > That doesn't work very well for me. The fans are constantly spinning at a > rather high rate independent of how loaded the system is. Is it all fans or just the backside fan getting crazy ? This later bug is fixed by version #4 I'll post in a minute... Ben. 
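[Editorial sketch] The signed/unsigned issue discussed in this thread can be reproduced in a few lines of standalone C. This is a sketch under assumptions, not the driver code: it assumes the PWM value is a signed 32-bit quantity (as the (s32) cast in the #3 patch suggests) while the limits are u32 (as in the parameter struct of that patch); clamp_unsigned_limits() and clamp_signed_limits() are hypothetical helper names. The explicit (u32) casts spell out the conversion that C applies implicitly when a signed 32-bit value is compared with an unsigned one.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's fixed-width types. */
typedef int32_t s32;
typedef uint32_t u32;

/*
 * Clamp with unsigned limits. Comparing an s32 against a u32 converts
 * the s32 to unsigned first; the casts below make that visible. A
 * negative pwm therefore looks like a huge positive number: it fails
 * the "below minimum" test and then trips the "above maximum" test.
 */
static s32 clamp_unsigned_limits(s32 pwm, u32 out_min, u32 out_max)
{
	assert(out_min <= out_max);
	if ((u32)pwm < out_min)
		pwm = (s32)out_min;
	if ((u32)pwm > out_max)
		pwm = (s32)out_max;
	return pwm;
}

/* Same clamp with signed limits: negative values go to the minimum. */
static s32 clamp_signed_limits(s32 pwm, s32 out_min, s32 out_max)
{
	assert(out_min <= out_max);
	if (pwm < out_min)
		pwm = out_min;
	if (pwm > out_max)
		pwm = out_max;
	return pwm;
}
```

With limits of 30 and 100, a requested value of -5 comes back as 100 from the first helper and as 30 from the second. That would be consistent with the 30-to-100 square wave reported for the backside fan, although the thread itself does not spell out the exact mechanism.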
From benh at kernel.crashing.org Sun Oct 17 11:01:03 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 17 Oct 2004 11:01:03 +1000 Subject: Fan control for PowerMac7_3 (#4) In-Reply-To: <1097893432.6546.37.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> Message-ID: <1097974861.8965.62.camel@gaston> This version fixes a bug with the backside fan doing crazy things, it appears to work properly on the dual 2.5Ghz now. Unless I get a negative report, I intend to submit it to Linus in a couple of days. diff -urN linux-2.5/drivers/macintosh/therm_pm72.c linux-pogo/drivers/macintosh/therm_pm72.c --- linux-2.5/drivers/macintosh/therm_pm72.c 2004-09-24 14:34:05.000000000 +1000 +++ linux-pogo/drivers/macintosh/therm_pm72.c 2004-10-16 18:49:57.000000000 +1000 @@ -46,6 +46,9 @@ * overtemp conditions so userland can take some policy * decisions, like slewing down CPUs * - Deal with fan and i2c failures in a better way + * - Maybe do a generic PID based on params used for + * U3 and Drives ? + * - Add RackMac3,1 support (XServe g5) * * History: * @@ -73,6 +76,15 @@ * values in the configuration register * - Switch back to use of target fan speed for PID, thus lowering * pressure on i2c + * + * Oct. 
16, 2004 : 1.1b3 (beta) + * - Add device-tree lookup for fan IDs, should detect liquid cooling + * pumps when present + * - Enable driver for PowerMac7,3 machines + * - Split the U3/Backside cooling on U3 & U3H versions as Darwin does + * - Add new CPU cooling algorithm for machines with liquid cooling + * - Workaround for some PowerMac7,3 with empty "fan" node in the devtree + * - Fix a signed/unsigned compare issue in some PID loops */ #include @@ -101,7 +113,7 @@ #include "therm_pm72.h" -#define VERSION "0.9" +#define VERSION "1.1b3" #undef DEBUG @@ -121,16 +133,100 @@ static struct i2c_adapter * u3_1; static struct i2c_client * fcu; static struct cpu_pid_state cpu_state[2]; +static struct basckside_pid_params backside_params; static struct backside_pid_state backside_state; static struct drives_pid_state drives_state; static int state; static int cpu_count; +static int cpu_pid_type; static pid_t ctrl_task; static struct completion ctrl_complete; static int critical_state; static DECLARE_MUTEX(driver_lock); /* + * We have 2 types of CPU PID control. One is "split" old style control + * for intake & exhaust fans, the other is "combined" control for both + * CPUs that also deals with the pumps when present. To be "compatible" + * with OS X at this point, we only use "COMBINED" on the machines that + * are identified as having the pumps (though that identification is at + * least dodgy). Ultimately, we could probably switch completely to this + * algorithm provided we hack it to deal with the UP case + */ +#define CPU_PID_TYPE_SPLIT 0 +#define CPU_PID_TYPE_COMBINED 1 + +/* + * This table describes all fans in the FCU. The "id" and "type" values + * are defaults valid for all earlier machines. 
Newer machines will + * eventually override the table content based on the device-tree + */ +struct fcu_fan_table +{ + char* loc; /* location code */ + int type; /* 0 = rpm, 1 = pwm, 2 = pump */ + int id; /* id or -1 */ +}; + +#define FCU_FAN_RPM 0 +#define FCU_FAN_PWM 1 + +#define FCU_FAN_ABSENT_ID -1 + +#define FCU_FAN_COUNT ARRAY_SIZE(fcu_fans) + +struct fcu_fan_table fcu_fans[] = { + [BACKSIDE_FAN_PWM_INDEX] = { + .loc = "BACKSIDE", + .type = FCU_FAN_PWM, + .id = BACKSIDE_FAN_PWM_DEFAULT_ID, + }, + [DRIVES_FAN_RPM_INDEX] = { + .loc = "DRIVE BAY", + .type = FCU_FAN_RPM, + .id = DRIVES_FAN_RPM_DEFAULT_ID, + }, + [SLOTS_FAN_PWM_INDEX] = { + .loc = "SLOT", + .type = FCU_FAN_PWM, + .id = SLOTS_FAN_PWM_DEFAULT_ID, + }, + [CPUA_INTAKE_FAN_RPM_INDEX] = { + .loc = "CPU A INTAKE", + .type = FCU_FAN_RPM, + .id = CPUA_INTAKE_FAN_RPM_DEFAULT_ID, + }, + [CPUA_EXHAUST_FAN_RPM_INDEX] = { + .loc = "CPU A EXHAUST", + .type = FCU_FAN_RPM, + .id = CPUA_EXHAUST_FAN_RPM_DEFAULT_ID, + }, + [CPUB_INTAKE_FAN_RPM_INDEX] = { + .loc = "CPU B INTAKE", + .type = FCU_FAN_RPM, + .id = CPUB_INTAKE_FAN_RPM_DEFAULT_ID, + }, + [CPUB_EXHAUST_FAN_RPM_INDEX] = { + .loc = "CPU B EXHAUST", + .type = FCU_FAN_RPM, + .id = CPUB_EXHAUST_FAN_RPM_DEFAULT_ID, + }, + /* pumps aren't present by default, have to be looked up in the + * device-tree + */ + [CPUA_PUMP_RPM_INDEX] = { + .loc = "CPU A PUMP", + .type = FCU_FAN_RPM, + .id = FCU_FAN_ABSENT_ID, + }, + [CPUB_PUMP_RPM_INDEX] = { + .loc = "CPU B PUMP", + .type = FCU_FAN_RPM, + .id = FCU_FAN_ABSENT_ID, + }, +}; + +/* * i2c_driver structure to attach to the host i2c controller */ @@ -331,10 +427,16 @@ return 0; } -static int set_rpm_fan(int fan, int rpm) +static int set_rpm_fan(int fan_index, int rpm) { unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_RPM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; if (rpm < 300) rpm = 300; @@ -342,43 +444,55 @@ rpm = 8191; buf[0] 
= rpm >> 5; buf[1] = rpm << 3; - rc = fan_write_reg(0x10 + (fan * 2), buf, 2); + rc = fan_write_reg(0x10 + (id * 2), buf, 2); if (rc < 0) return -EIO; return 0; } -static int get_rpm_fan(int fan, int programmed) +static int get_rpm_fan(int fan_index, int programmed) { unsigned char failure; unsigned char active; unsigned char buf[2]; - int rc, reg_base; + int rc, id, reg_base; + + if (fcu_fans[fan_index].type != FCU_FAN_RPM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; rc = fan_read_reg(0xb, &failure, 1); if (rc != 1) return -EIO; - if ((failure & (1 << fan)) != 0) + if ((failure & (1 << id)) != 0) return -EFAULT; rc = fan_read_reg(0xd, &active, 1); if (rc != 1) return -EIO; - if ((active & (1 << fan)) == 0) + if ((active & (1 << id)) == 0) return -ENXIO; /* Programmed value or real current speed */ reg_base = programmed ? 0x10 : 0x11; - rc = fan_read_reg(reg_base + (fan * 2), buf, 2); + rc = fan_read_reg(reg_base + (id * 2), buf, 2); if (rc != 2) return -EIO; return (buf[0] << 5) | buf[1] >> 3; } -static int set_pwm_fan(int fan, int pwm) +static int set_pwm_fan(int fan_index, int pwm) { unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_PWM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; if (pwm < 10) pwm = 10; @@ -386,32 +500,38 @@ pwm = 100; pwm = (pwm * 2559) / 1000; buf[0] = pwm; - rc = fan_write_reg(0x30 + (fan * 2), buf, 1); + rc = fan_write_reg(0x30 + (id * 2), buf, 1); if (rc < 0) return rc; return 0; } -static int get_pwm_fan(int fan) +static int get_pwm_fan(int fan_index) { unsigned char failure; unsigned char active; unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_PWM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; rc = fan_read_reg(0x2b, &failure, 1); if (rc != 1) return -EIO; - if ((failure & (1 << fan)) != 0) + if 
((failure & (1 << id)) != 0) return -EFAULT; rc = fan_read_reg(0x2d, &active, 1); if (rc != 1) return -EIO; - if ((active & (1 << fan)) == 0) + if ((active & (1 << id)) == 0) return -ENXIO; /* Programmed value or real current speed */ - rc = fan_read_reg(0x30 + (fan * 2), buf, 1); + rc = fan_read_reg(0x30 + (id * 2), buf, 1); if (rc != 1) return -EIO; @@ -513,80 +633,84 @@ /* * CPUs fans control loop */ -static void do_monitor_cpu(struct cpu_pid_state *state) + +static int do_read_one_cpu_values(struct cpu_pid_state *state, s32 *temp, s32 *power) { - s32 temp, voltage, current_a, power, power_target; - s32 integral, derivative, proportional, adj_in_target, sval; - s64 integ_p, deriv_p, prop_p, sum; - int i, intake, rc; + s32 ltemp, volts, amps; + int rc = 0; - DBG("cpu %d:\n", state->index); + /* Default (in case of error) */ + *temp = state->cur_temp; + *power = state->cur_power; /* Read current fan status */ if (state->index == 0) - rc = get_rpm_fan(CPUA_EXHAUST_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); else - rc = get_rpm_fan(CPUB_EXHAUST_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); if (rc < 0) { - printk(KERN_WARNING "Error %d reading CPU %d exhaust fan !\n", - rc, state->index); - /* XXX What do we do now ? */ - } else + /* XXX What do we do now ? Nothing for now, keep old value, but + * return error upstream + */ + DBG(" cpu %d, fan reading error !\n", state->index); + } else { state->rpm = rc; - DBG(" current rpm: %d\n", state->rpm); + DBG(" cpu %d, exhaust RPM: %d\n", state->index, state->rpm); + } /* Get some sensor readings and scale it */ - temp = read_smon_adc(state, 1); - if (temp == -1) { + ltemp = read_smon_adc(state, 1); + if (ltemp == -1) { + /* XXX What do we do now ? 
*/ state->overtemp++; - return; + if (rc == 0) + rc = -EIO; + DBG(" cpu %d, temp reading error !\n", state->index); + } else { + /* Fixup temperature according to diode calibration + */ + DBG(" cpu %d, temp raw: %04x, m_diode: %04x, b_diode: %04x\n", + state->index, + ltemp, state->mpu.mdiode, state->mpu.bdiode); + *temp = ((s32)ltemp * (s32)state->mpu.mdiode + ((s32)state->mpu.bdiode << 12)) >> 2; + state->last_temp = *temp; + DBG(" temp: %d.%03d\n", FIX32TOPRINT((*temp))); } - voltage = read_smon_adc(state, 3); - current_a = read_smon_adc(state, 4); - /* Fixup temperature according to diode calibration + /* + * Read voltage & current and calculate power */ - DBG(" temp raw: %04x, m_diode: %04x, b_diode: %04x\n", - temp, state->mpu.mdiode, state->mpu.bdiode); - temp = ((s32)temp * (s32)state->mpu.mdiode + ((s32)state->mpu.bdiode << 12)) >> 2; - state->last_temp = temp; - DBG(" temp: %d.%03d\n", FIX32TOPRINT(temp)); + volts = read_smon_adc(state, 3); + amps = read_smon_adc(state, 4); - /* Check tmax, increment overtemp if we are there. At tmax+8, we go - * full blown immediately and try to trigger a shutdown - */ - if (temp >= ((state->mpu.tmax + 8) << 16)) { - printk(KERN_WARNING "Warning ! 
CPU %d temperature way above maximum" - " (%d) !\n", - state->index, temp >> 16); - state->overtemp = CPU_MAX_OVERTEMP; - } else if (temp > (state->mpu.tmax << 16)) - state->overtemp++; - else - state->overtemp = 0; - if (state->overtemp >= CPU_MAX_OVERTEMP) - critical_state = 1; - if (state->overtemp > 0) { - state->rpm = state->mpu.rmaxn_exhaust_fan; - state->intake_rpm = intake = state->mpu.rmaxn_intake_fan; - goto do_set_fans; - } - - /* Scale other sensor values according to fixed scales + /* Scale voltage and current raw sensor values according to fixed scales * obtained in Darwin and calculate power from I and V */ - state->voltage = voltage *= ADC_CPU_VOLTAGE_SCALE; - state->current_a = current_a *= ADC_CPU_CURRENT_SCALE; - power = (((u64)current_a) * ((u64)voltage)) >> 16; + volts *= ADC_CPU_VOLTAGE_SCALE; + amps *= ADC_CPU_CURRENT_SCALE; + *power = (((u64)volts) * ((u64)amps)) >> 16; + state->voltage = volts; + state->current_a = amps; + state->last_power = *power; + + DBG(" cpu %d, current: %d.%03d, voltage: %d.%03d, power: %d.%03d W\n", + state->index, FIX32TOPRINT(state->current_a), + FIX32TOPRINT(state->voltage), FIX32TOPRINT(*power)); + + return 0; +} + +static void do_cpu_pid(struct cpu_pid_state *state, s32 temp, s32 power) +{ + s32 power_target, integral, derivative, proportional, adj_in_target, sval; + s64 integ_p, deriv_p, prop_p, sum; + int i; /* Calculate power target value (could be done once for all) * and convert to a 16.16 fp number */ power_target = ((u32)(state->mpu.pmaxh - state->mpu.padjmax)) << 16; - - DBG(" current: %d.%03d, voltage: %d.%03d\n", - FIX32TOPRINT(current_a), FIX32TOPRINT(voltage)); - DBG(" power: %d.%03d W, target: %d.%03d, error: %d.%03d\n", FIX32TOPRINT(power), + DBG(" power target: %d.%03d, error: %d.%03d\n", FIX32TOPRINT(power_target), FIX32TOPRINT(power_target - power)); /* Store temperature and power in history array */ @@ -626,7 +750,7 @@ * input target is mpu.ttarget, input max is mpu.tmax */ integ_p = 
((s64)state->mpu.pid_gr) * (s64)integral; - DBG(" integ_p: %d\n", (int)(deriv_p >> 36)); + DBG(" integ_p: %d\n", (int)(integ_p >> 36)); sval = (state->mpu.tmax << 16) - ((integ_p >> 20) & 0xffffffff); adj_in_target = (state->mpu.ttarget << 16); if (adj_in_target > sval) @@ -655,15 +779,136 @@ DBG(" sum: %d\n", (int)sum); state->rpm += (s32)sum; - if (state->rpm < state->mpu.rminn_exhaust_fan) + if (state->rpm < (int)state->mpu.rminn_exhaust_fan) state->rpm = state->mpu.rminn_exhaust_fan; - if (state->rpm > state->mpu.rmaxn_exhaust_fan) + if (state->rpm > (int)state->mpu.rmaxn_exhaust_fan) state->rpm = state->mpu.rmaxn_exhaust_fan; +} + +static void do_monitor_cpu_combined(void) +{ + struct cpu_pid_state *state0 = &cpu_state[0]; + struct cpu_pid_state *state1 = &cpu_state[1]; + s32 temp0, power0, temp1, power1; + s32 temp_combi, power_combi; + int rc, intake, pump; + + rc = do_read_one_cpu_values(state0, &temp0, &power0); + if (rc < 0) { + /* XXX What do we do now ? */ + } + state1->overtemp = 0; + rc = do_read_one_cpu_values(state1, &temp1, &power1); + if (rc < 0) { + /* XXX What do we do now ? */ + } + if (state1->overtemp) + state0->overtemp++; + + temp_combi = max(temp0, temp1); + power_combi = max(power0, power1); + + /* Check tmax, increment overtemp if we are there. At tmax+8, we go + * full blown immediately and try to trigger a shutdown + */ + if (temp_combi >= ((state0->mpu.tmax + 8) << 16)) { + printk(KERN_WARNING "Warning ! 
Temperature way above maximum (%d) !\n", + temp_combi >> 16); + state0->overtemp = CPU_MAX_OVERTEMP; + } else if (temp_combi > (state0->mpu.tmax << 16)) + state0->overtemp++; + else + state0->overtemp = 0; + if (state0->overtemp >= CPU_MAX_OVERTEMP) + critical_state = 1; + if (state0->overtemp > 0) { + state0->rpm = state0->mpu.rmaxn_exhaust_fan; + state0->intake_rpm = intake = state0->mpu.rmaxn_intake_fan; + pump = CPU_PUMP_OUTPUT_MAX; + goto do_set_fans; + } + + /* Do the PID */ + do_cpu_pid(state0, temp_combi, power_combi); + + /* Calculate intake fan speed */ + intake = (state0->rpm * CPU_INTAKE_SCALE) >> 16; + if (intake < (int)state0->mpu.rminn_intake_fan) + intake = state0->mpu.rminn_intake_fan; + if (intake > (int)state0->mpu.rmaxn_intake_fan) + intake = state0->mpu.rmaxn_intake_fan; + state0->intake_rpm = intake; + + /* Calculate pump speed */ + pump = (state0->rpm * CPU_PUMP_OUTPUT_MAX) / + state0->mpu.rmaxn_exhaust_fan; + if (pump > CPU_PUMP_OUTPUT_MAX) + pump = CPU_PUMP_OUTPUT_MAX; + if (pump < CPU_PUMP_OUTPUT_MIN) + pump = CPU_PUMP_OUTPUT_MIN; + + do_set_fans: + /* We copy values from state 0 to state 1 for /sysfs */ + state1->rpm = state0->rpm; + state1->intake_rpm = state0->intake_rpm; + + DBG("** CPU %d RPM: %d Ex, %d, Pump: %d, In, overtemp: %d\n", + state1->index, (int)state1->rpm, intake, pump, state1->overtemp); + + /* We should check for errors, shouldn't we ? But then, what + * do we do once the error occurs ? For FCU notified fan + * failures (-EFAULT) we probably want to notify userland + * some way... 
+ */ + set_rpm_fan(CPUA_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, state0->rpm); + set_rpm_fan(CPUB_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, state0->rpm); + + if (fcu_fans[CPUA_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) + set_rpm_fan(CPUA_PUMP_RPM_INDEX, pump); + if (fcu_fans[CPUB_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) + set_rpm_fan(CPUB_PUMP_RPM_INDEX, pump); +} + +static void do_monitor_cpu_split(struct cpu_pid_state *state) +{ + s32 temp, power; + int rc, intake; + + /* Read current fan status */ + rc = do_read_one_cpu_values(state, &temp, &power); + if (rc < 0) { + /* XXX What do we do now ? */ + } + + /* Check tmax, increment overtemp if we are there. At tmax+8, we go + * full blown immediately and try to trigger a shutdown + */ + if (temp >= ((state->mpu.tmax + 8) << 16)) { + printk(KERN_WARNING "Warning ! CPU %d temperature way above maximum" + " (%d) !\n", + state->index, temp >> 16); + state->overtemp = CPU_MAX_OVERTEMP; + } else if (temp > (state->mpu.tmax << 16)) + state->overtemp++; + else + state->overtemp = 0; + if (state->overtemp >= CPU_MAX_OVERTEMP) + critical_state = 1; + if (state->overtemp > 0) { + state->rpm = state->mpu.rmaxn_exhaust_fan; + state->intake_rpm = intake = state->mpu.rmaxn_intake_fan; + goto do_set_fans; + } + + /* Do the PID */ + do_cpu_pid(state, temp, power); intake = (state->rpm * CPU_INTAKE_SCALE) >> 16; - if (intake < state->mpu.rminn_intake_fan) + if (intake < (int)state->mpu.rminn_intake_fan) intake = state->mpu.rminn_intake_fan; - if (intake > state->mpu.rmaxn_intake_fan) + if (intake > (int)state->mpu.rmaxn_intake_fan) intake = state->mpu.rmaxn_intake_fan; state->intake_rpm = intake; @@ -677,11 +922,11 @@ * some way... 
*/ if (state->index == 0) { - set_rpm_fan(CPUA_INTAKE_FAN_RPM_ID, intake); - set_rpm_fan(CPUA_EXHAUST_FAN_RPM_ID, state->rpm); + set_rpm_fan(CPUA_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, state->rpm); } else { - set_rpm_fan(CPUB_INTAKE_FAN_RPM_ID, intake); - set_rpm_fan(CPUB_EXHAUST_FAN_RPM_ID, state->rpm); + set_rpm_fan(CPUB_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, state->rpm); } } @@ -696,6 +941,7 @@ state->overtemp = 0; state->adc_config = 0x00; + if (index == 0) state->monitor = attach_i2c_chip(SUPPLY_MONITOR_ID, "CPU0_monitor"); else if (index == 1) @@ -778,7 +1024,7 @@ DBG("backside:\n"); /* Check fan status */ - rc = get_pwm_fan(BACKSIDE_FAN_PWM_ID); + rc = get_pwm_fan(BACKSIDE_FAN_PWM_INDEX); if (rc < 0) { printk(KERN_WARNING "Error %d reading backside fan !\n", rc); /* XXX What do we do now ? */ @@ -790,12 +1036,12 @@ temp = i2c_smbus_read_byte_data(state->monitor, MAX6690_EXT_TEMP) << 16; state->last_temp = temp; DBG(" temp: %d.%03d, target: %d.%03d\n", FIX32TOPRINT(temp), - FIX32TOPRINT(BACKSIDE_PID_INPUT_TARGET)); + FIX32TOPRINT(backside_params.input_target)); /* Store temperature and error in history array */ state->cur_sample = (state->cur_sample + 1) % BACKSIDE_PID_HISTORY_SIZE; state->sample_history[state->cur_sample] = temp; - state->error_history[state->cur_sample] = temp - BACKSIDE_PID_INPUT_TARGET; + state->error_history[state->cur_sample] = temp - backside_params.input_target; /* If first loop, fill the history table */ if (state->first) { @@ -804,7 +1050,7 @@ BACKSIDE_PID_HISTORY_SIZE; state->sample_history[state->cur_sample] = temp; state->error_history[state->cur_sample] = - temp - BACKSIDE_PID_INPUT_TARGET; + temp - backside_params.input_target; } state->first = 0; } @@ -816,7 +1062,7 @@ integral += state->error_history[i]; integral *= BACKSIDE_PID_INTERVAL; DBG(" integral: %08x\n", integral); - integ_p = ((s64)BACKSIDE_PID_G_r) * (s64)integral; + integ_p = 
((s64)backside_params.G_r) * (s64)integral; DBG(" integ_p: %d\n", (int)(integ_p >> 36)); sum += integ_p; @@ -825,12 +1071,12 @@ state->error_history[(state->cur_sample + BACKSIDE_PID_HISTORY_SIZE - 1) % BACKSIDE_PID_HISTORY_SIZE]; derivative /= BACKSIDE_PID_INTERVAL; - deriv_p = ((s64)BACKSIDE_PID_G_d) * (s64)derivative; + deriv_p = ((s64)backside_params.G_d) * (s64)derivative; DBG(" deriv_p: %d\n", (int)(deriv_p >> 36)); sum += deriv_p; /* Calculate the proportional term */ - prop_p = ((s64)BACKSIDE_PID_G_p) * (s64)(state->error_history[state->cur_sample]); + prop_p = ((s64)backside_params.G_p) * (s64)(state->error_history[state->cur_sample]); DBG(" prop_p: %d\n", (int)(prop_p >> 36)); sum += prop_p; @@ -839,13 +1085,13 @@ DBG(" sum: %d\n", (int)sum); state->pwm += (s32)sum; - if (state->pwm < BACKSIDE_PID_OUTPUT_MIN) - state->pwm = BACKSIDE_PID_OUTPUT_MIN; - if (state->pwm > BACKSIDE_PID_OUTPUT_MAX) - state->pwm = BACKSIDE_PID_OUTPUT_MAX; + if (state->pwm < backside_params.output_min) + state->pwm = backside_params.output_min; + if (state->pwm > backside_params.output_max) + state->pwm = backside_params.output_max; DBG("** BACKSIDE PWM: %d\n", (int)state->pwm); - set_pwm_fan(BACKSIDE_FAN_PWM_ID, state->pwm); + set_pwm_fan(BACKSIDE_FAN_PWM_INDEX, state->pwm); } /* @@ -853,6 +1099,35 @@ */ static int init_backside_state(struct backside_pid_state *state) { + struct device_node *u3; + int u3h = 1; /* conservative by default */ + + /* + * There are different PID params for machines with U3 and machines + * with U3H, pick the right ones now + */ + u3 = of_find_node_by_path("/u3@0,f8000000"); + if (u3 != NULL) { + u32 *vers = (u32 *)get_property(u3, "device-rev", NULL); + if (vers) + if (((*vers) & 0x3f) < 0x34) + u3h = 0; + of_node_put(u3); + } + + backside_params.G_p = BACKSIDE_PID_G_p; + backside_params.G_r = BACKSIDE_PID_G_r; + backside_params.output_max = BACKSIDE_PID_OUTPUT_MAX; + if (u3h) { + backside_params.G_d = BACKSIDE_PID_U3H_G_d; + 
backside_params.input_target = BACKSIDE_PID_U3H_INPUT_TARGET; + backside_params.output_min = BACKSIDE_PID_U3H_OUTPUT_MIN; + } else { + backside_params.G_d = BACKSIDE_PID_U3_G_d; + backside_params.input_target = BACKSIDE_PID_U3_INPUT_TARGET; + backside_params.output_min = BACKSIDE_PID_U3_OUTPUT_MIN; + } + state->ticks = 1; state->first = 1; state->pwm = 50; @@ -898,7 +1173,7 @@ DBG("drives:\n"); /* Check fan status */ - rc = get_rpm_fan(DRIVES_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(DRIVES_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); if (rc < 0) { printk(KERN_WARNING "Error %d reading drives fan !\n", rc); /* XXX What do we do now ? */ @@ -965,7 +1240,7 @@ state->rpm = DRIVES_PID_OUTPUT_MAX; DBG("** DRIVES RPM: %d\n", (int)state->rpm); - set_rpm_fan(DRIVES_FAN_RPM_ID, state->rpm); + set_rpm_fan(DRIVES_FAN_RPM_INDEX, state->rpm); } /* @@ -1032,7 +1307,7 @@ } /* Set the PCI fan once for now */ - set_pwm_fan(SLOTS_FAN_PWM_ID, SLOTS_FAN_DEFAULT_PWM); + set_pwm_fan(SLOTS_FAN_PWM_INDEX, SLOTS_FAN_DEFAULT_PWM); /* Initialize ADCs */ initialize_adc(&cpu_state[0]); @@ -1047,9 +1322,13 @@ start = jiffies; down(&driver_lock); - do_monitor_cpu(&cpu_state[0]); - if (cpu_state[1].monitor != NULL) - do_monitor_cpu(&cpu_state[1]); + if (cpu_pid_type == CPU_PID_TYPE_COMBINED) + do_monitor_cpu_combined(); + else { + do_monitor_cpu_split(&cpu_state[0]); + if (cpu_state[1].monitor != NULL) + do_monitor_cpu_split(&cpu_state[1]); + } do_monitor_backside(&backside_state); do_monitor_drives(&drives_state); up(&driver_lock); @@ -1113,6 +1392,19 @@ DBG("counted %d CPUs in the device-tree\n", cpu_count); + /* Decide the type of PID algorithm to use based on the presence of + * the pumps, though that may not be the best way, that is good enough + * for now + */ + if (machine_is_compatible("PowerMac7,3") + && (cpu_count > 1) + && fcu_fans[CPUA_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID + && fcu_fans[CPUB_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) { + printk(KERN_INFO "Liquid cooling 
pumps detected, using new algorithm !\n"); + cpu_pid_type = CPU_PID_TYPE_COMBINED; + } else + cpu_pid_type = CPU_PID_TYPE_SPLIT; + /* Create control loops for everything. If any fail, everything * fails */ @@ -1257,12 +1549,91 @@ return 0; } +static void fcu_lookup_fans(struct device_node *fcu_node) +{ + struct device_node *np = NULL; + int i; + + /* The table is filled by default with values that are suitable + * for the old machines without device-tree informations. We scan + * the device-tree and override those values with whatever is + * there + */ + + DBG("Looking up FCU controls in device-tree...\n"); + + while ((np = of_get_next_child(fcu_node, np)) != NULL) { + int type = -1; + char *loc; + u32 *reg; + + DBG(" control: %s, type: %s\n", np->name, np->type); + + /* Detect control type */ + if (!strcmp(np->type, "fan-rpm-control") || + !strcmp(np->type, "fan-rpm")) + type = FCU_FAN_RPM; + if (!strcmp(np->type, "fan-pwm-control") || + !strcmp(np->type, "fan-pwm")) + type = FCU_FAN_PWM; + /* Only care about fans for now */ + if (type == -1) + continue; + + /* Lookup for a matching location */ + loc = (char *)get_property(np, "location", NULL); + reg = (u32 *)get_property(np, "reg", NULL); + if (loc == NULL || reg == NULL) + continue; + DBG(" matching location: %s, reg: 0x%08x\n", loc, *reg); + + for (i = 0; i < FCU_FAN_COUNT; i++) { + int fan_id; + + if (strcmp(loc, fcu_fans[i].loc)) + continue; + DBG(" location match, index: %d\n", i); + fcu_fans[i].id = FCU_FAN_ABSENT_ID; + if (type != fcu_fans[i].type) { + printk(KERN_WARNING "therm_pm72: Fan type mismatch " + "in device-tree for %s\n", np->full_name); + break; + } + if (type == FCU_FAN_RPM) + fan_id = ((*reg) - 0x10) / 2; + else + fan_id = ((*reg) - 0x30) / 2; + if (fan_id > 7) { + printk(KERN_WARNING "therm_pm72: Can't parse " + "fan ID in device-tree for %s\n", np->full_name); + break; + } + DBG(" fan id -> %d, type -> %d\n", fan_id, type); + fcu_fans[i].id = fan_id; + } + } + + /* Now dump the array */ + 
printk(KERN_INFO "Detected fan controls:\n"); + for (i = 0; i < FCU_FAN_COUNT; i++) { + if (fcu_fans[i].id == FCU_FAN_ABSENT_ID) + continue; + printk(KERN_INFO " %d: %s fan, id %d, location: %s\n", i, + fcu_fans[i].type == FCU_FAN_RPM ? "RPM" : "PWM", + fcu_fans[i].id, fcu_fans[i].loc); + } +} + static int fcu_of_probe(struct of_device* dev, const struct of_match *match) { int rc; state = state_detached; + /* Lookup the fans in the device tree */ + fcu_lookup_fans(dev->node); + + /* Add the driver */ rc = i2c_add_driver(&therm_pm72_driver); if (rc < 0) return rc; @@ -1301,15 +1672,20 @@ { struct device_node *np; - if (!machine_is_compatible("PowerMac7,2")) + if (!machine_is_compatible("PowerMac7,2") && + !machine_is_compatible("PowerMac7,3")) return -ENODEV; printk(KERN_INFO "PowerMac G5 Thermal control driver %s\n", VERSION); np = of_find_node_by_type(NULL, "fcu"); if (np == NULL) { - printk(KERN_ERR "Can't find FCU in device-tree !\n"); - return -ENODEV; + /* Some machines have strangely broken device-tree */ + np = of_find_node_by_path("/u3@0,f8000000/i2c@f8001000/fan@15e"); + if (np == NULL) { + printk(KERN_ERR "Can't find FCU in device-tree !\n"); + return -ENODEV; + } } of_dev = of_platform_device_create(np, "temperature"); if (of_dev == NULL) { diff -urN linux-2.5/drivers/macintosh/therm_pm72.h linux-pogo/drivers/macintosh/therm_pm72.h --- linux-2.5/drivers/macintosh/therm_pm72.h 2004-09-24 14:34:05.000000000 +1000 +++ linux-pogo/drivers/macintosh/therm_pm72.h 2004-10-16 18:29:29.000000000 +1000 @@ -119,18 +119,33 @@ #define ADC_CPU_CURRENT_SCALE 0x1f40 /* _AD4 */ /* - * PID factors for the U3/Backside fan control loop + * PID factors for the U3/Backside fan control loop. 
We have 2 sets + * of values here, one set for U3 and one set for U3H */ -#define BACKSIDE_FAN_PWM_ID 1 -#define BACKSIDE_PID_G_d 0x02800000 +#define BACKSIDE_FAN_PWM_DEFAULT_ID 1 +#define BACKSIDE_FAN_PWM_INDEX 0 +#define BACKSIDE_PID_U3_G_d 0x02800000 +#define BACKSIDE_PID_U3H_G_d 0x01400000 #define BACKSIDE_PID_G_p 0x00500000 #define BACKSIDE_PID_G_r 0x00000000 -#define BACKSIDE_PID_INPUT_TARGET 0x00410000 +#define BACKSIDE_PID_U3_INPUT_TARGET 0x00410000 +#define BACKSIDE_PID_U3H_INPUT_TARGET 0x004b0000 #define BACKSIDE_PID_INTERVAL 5 #define BACKSIDE_PID_OUTPUT_MAX 100 -#define BACKSIDE_PID_OUTPUT_MIN 20 +#define BACKSIDE_PID_U3_OUTPUT_MIN 20 +#define BACKSIDE_PID_U3H_OUTPUT_MIN 30 #define BACKSIDE_PID_HISTORY_SIZE 2 +struct basckside_pid_params +{ + s32 G_d; + s32 G_p; + s32 G_r; + s32 input_target; + s32 output_min; + s32 output_max; +}; + struct backside_pid_state { int ticks; @@ -146,7 +161,8 @@ /* * PID factors for the Drive Bay fan control loop */ -#define DRIVES_FAN_RPM_ID 2 +#define DRIVES_FAN_RPM_DEFAULT_ID 2 +#define DRIVES_FAN_RPM_INDEX 1 #define DRIVES_PID_G_d 0x01e00000 #define DRIVES_PID_G_p 0x00500000 #define DRIVES_PID_G_r 0x00000000 @@ -168,7 +184,8 @@ int first; }; -#define SLOTS_FAN_PWM_ID 2 +#define SLOTS_FAN_PWM_DEFAULT_ID 2 +#define SLOTS_FAN_PWM_INDEX 2 #define SLOTS_FAN_DEFAULT_PWM 50 /* Do better here ! 
*/ /* @@ -191,10 +208,15 @@ * CPU B FAKE POWER 49 (I_V_inputs: 18, 19) */ -#define CPUA_INTAKE_FAN_RPM_ID 3 -#define CPUA_EXHAUST_FAN_RPM_ID 4 -#define CPUB_INTAKE_FAN_RPM_ID 5 -#define CPUB_EXHAUST_FAN_RPM_ID 6 +#define CPUA_INTAKE_FAN_RPM_DEFAULT_ID 3 +#define CPUA_EXHAUST_FAN_RPM_DEFAULT_ID 4 +#define CPUB_INTAKE_FAN_RPM_DEFAULT_ID 5 +#define CPUB_EXHAUST_FAN_RPM_DEFAULT_ID 6 + +#define CPUA_INTAKE_FAN_RPM_INDEX 3 +#define CPUA_EXHAUST_FAN_RPM_INDEX 4 +#define CPUB_INTAKE_FAN_RPM_INDEX 5 +#define CPUB_EXHAUST_FAN_RPM_INDEX 6 #define CPU_INTAKE_SCALE 0x0000f852 #define CPU_TEMP_HISTORY_SIZE 2 @@ -202,6 +224,11 @@ #define CPU_PID_INTERVAL 1 #define CPU_MAX_OVERTEMP 30 +#define CPUA_PUMP_RPM_INDEX 7 +#define CPUB_PUMP_RPM_INDEX 8 +#define CPU_PUMP_OUTPUT_MAX 3700 +#define CPU_PUMP_OUTPUT_MIN 1000 + struct cpu_pid_state { int index; @@ -219,6 +246,7 @@ s32 voltage; s32 current_a; s32 last_temp; + s32 last_power; int first; u8 adc_config; }; From benh at kernel.crashing.org Sun Oct 17 11:12:44 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 17 Oct 2004 11:12:44 +1000 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <16753.49793.159513.618588@cargo.ozlabs.ibm.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> <4170B257.1010602@zytor.com> <16753.49793.159513.618588@cargo.ozlabs.ibm.com> Message-ID: <1097975564.14005.66.camel@gaston> On Sun, 2004-10-17 at 10:53, Paul Mackerras wrote: > H. Peter Anvin writes: > > > It's the backside fan that oscillates; backside_fan_pwm varies between > > 30 and 100 in what is pretty much a squarewave. See attached graph (and > > note how the other fans vary with workload.) 
> > The sharp rises look like the code thinks it gets into an > over-temperature situation and turns the fans on full blast. It could > be worth putting some printks in the overtemp code. It was in practice a problem when i turned the min/max values into variables, I set them unsigned. That caused that code to crap out: state->pwm += (s32)sum; if (state->pwm < backside_params.output_min) state->pwm = backside_params.output_min; if (state->pwm > backside_params.output_max) state->pwm = backside_params.output_max; When "sum" was negative enough to cause state->pwm to drop below 0 Turning backside_params.* to signed fixed this issue (and possibly others as the other factors are also used as signed fixed values into the previous calculations). Ben. From schwab at suse.de Mon Oct 18 00:50:47 2004 From: schwab at suse.de (Andreas Schwab) Date: Sun, 17 Oct 2004 16:50:47 +0200 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <1097974684.8965.59.camel@gaston> (Benjamin Herrenschmidt's message of "Sun, 17 Oct 2004 10:58:04 +1000") References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <1097974684.8965.59.camel@gaston> Message-ID: Benjamin Herrenschmidt writes: > Is it all fans or just the backside fan getting crazy ? This later bug > is fixed by version #4 I'll post in a minute... I think it's all fans, but I'll test your patch just in case. Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From hpa at zytor.com Sat Oct 16 15:57:09 2004 From: hpa at zytor.com (H. 
Peter Anvin) Date: Fri, 15 Oct 2004 22:57:09 -0700 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <1097905399.8963.26.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> <4170B257.1010602@zytor.com> <1097905399.8963.26.camel@gaston> Message-ID: <4170B835.8050205@zytor.com> Benjamin Herrenschmidt wrote: > On Sat, 2004-10-16 at 15:32, H. Peter Anvin wrote: > > >>It's the backside fan that oscillates; backside_fan_pwm varies between >>30 and 100 in what is pretty much a squarewave. See attached graph (and >>note how the other fans vary with workload.) >> >>I probably need to write a "power virus" program for the G5 to really >>test out the high end (a power virus is a program which keeps the chip >>running as hard as it can; generally keep all pipelines stuffed.) > > > Strange... The values used seem to be identical to OS X (a 75° target > which is high actually, and a different G_d value than old U3). I would > need to see the debug output and compare with the OS X driver built with > debug output as well (don't ask me to fully understand the math of the > PID algorithm) > If you want the file I used to produce the graph, it has all the entries in /sys/devices/temperature snapshotted at 100 ms intervals (attached) in the following order: [time] backside_fan_pwm backside_temperature cpu0_current cpu0_exhaust_fan_rpm cpu0_intake_fan_rpm cpu0_temperature cpu0_voltage cpu1_current cpu1_exhaust_fan_rpm cpu1_intake_fan_rpm cpu1_temperature cpu1_voltage drives_fan_rpm drives_temperature I've also attached /var/log/dmesg in case that's useful. -hpa -------------- next part -------------- A non-text attachment was scrubbed... 
Name: dmesg.bz2
Type: application/x-bzip2
Size: 5502 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041015/7bbba4f9/attachment.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: temps.dat.bz2
Type: application/x-bzip2
Size: 25736 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041015/7bbba4f9/attachment-0001.bin

From schwab at suse.de Mon Oct 18 01:37:22 2004
From: schwab at suse.de (Andreas Schwab)
Date: Sun, 17 Oct 2004 17:37:22 +0200
Subject: Fan control for PowerMac7_3 (#4)
In-Reply-To: <1097974861.8965.62.camel@gaston> (Benjamin Herrenschmidt's message of "Sun, 17 Oct 2004 11:01:03 +1000")
References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <1097974861.8965.62.camel@gaston>
Message-ID:

Benjamin Herrenschmidt writes:

> This version fixes a bug with the backside fan doing crazy things,
> it appears to work properly on the dual 2.5Ghz now. Unless I get a
> negative report, I intend to submit it to Linus in a couple of days.

This works fine for me, too. Thanks!

Andreas.

--
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

From olh at suse.de Mon Oct 18 04:55:57 2004
From: olh at suse.de (Olaf Hering)
Date: Sun, 17 Oct 2004 20:55:57 +0200
Subject: [PATCH] allow kernel compile with native ppc64 compiler
Message-ID: <20041017185557.GA9619@suse.de>

The zImage is a 32bit binary, but a native powerpc64-linux gcc will
produce 64bit objects in arch/ppc64/boot.
This patch fixes it.

Signed-off-by: Olaf Hering

diff -purN linux-2.6.9-final/arch/ppc64/boot/Makefile linux-2.6.9-final.native/arch/ppc64/boot/Makefile
--- linux-2.6.9-final/arch/ppc64/boot/Makefile	2004-10-16 03:03:50.000000000 +0000
+++ linux-2.6.9-final.native/arch/ppc64/boot/Makefile	2004-10-17 18:44:33.229249956 +0000
@@ -23,14 +23,14 @@
 CROSS32_COMPILE ?=
 #CROSS32_COMPILE = /usr/local/ppc/bin/powerpc-linux-
 
-BOOTCC := $(CROSS32_COMPILE)gcc
+BOOTCC := $(CROSS32_COMPILE)gcc -m32
 HOSTCC := gcc
 BOOTCFLAGS := $(HOSTCFLAGS) $(LINUXINCLUDE) -fno-builtin
-BOOTAS := $(CROSS32_COMPILE)as
+BOOTAS := $(CROSS32_COMPILE)as -a32
 BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -traditional
-BOOTLD := $(CROSS32_COMPILE)ld
+BOOTLD := $(CROSS32_COMPILE)ld -m elf32ppc
 BOOTLFLAGS := -Ttext 0x00400000 -e _start -T $(srctree)/$(src)/zImage.lds
-BOOTOBJCOPY := $(CROSS32_COMPILE)objcopy
+BOOTOBJCOPY := $(CROSS32_COMPILE)objcopy --target elf32-powerpc
 OBJCOPYFLAGS := contents,alloc,load,readonly,data
 
 src-boot := crt0.S string.S prom.c main.c zlib.c imagesize.c div64.S
diff -purN linux-2.6.9-final/arch/ppc64/boot/zImage.lds linux-2.6.9-final.native/arch/ppc64/boot/zImage.lds
--- linux-2.6.9-final/arch/ppc64/boot/zImage.lds	2004-10-16 03:01:55.000000000 +0000
+++ linux-2.6.9-final.native/arch/ppc64/boot/zImage.lds	2004-10-17 18:48:14.824288338 +0000
@@ -1,4 +1,4 @@
-OUTPUT_ARCH(powerpc)
+OUTPUT_ARCH(powerpc:common)
 SEARCH_DIR(/lib); SEARCH_DIR(/usr/lib); SEARCH_DIR(/usr/local/lib); SEARCH_DIR(/usr/local/powerpc-any-elf/lib);
 /* Do we need any of these for elf?
    __DYNAMIC = 0;    */

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

From paulus at samba.org Mon Oct 18 07:46:26 2004
From: paulus at samba.org (Paul Mackerras)
Date: Mon, 18 Oct 2004 07:46:26 +1000
Subject: [PATCH] allow kernel compile with native ppc64 compiler
In-Reply-To: <20041017185557.GA9619@suse.de>
References: <20041017185557.GA9619@suse.de>
Message-ID: <16754.59442.992185.715900@cargo.ozlabs.ibm.com>

Olaf Hering writes:

> The zImage is a 32bit binary, but a native powerpc64-linux gcc will
> produce 64bit objects in arch/ppc64/boot.
> This patch fixes it.

... and breaks the compile on older toolchains that don't understand
-m32. We need to make the -m32 conditional on HAS_BIARCH as defined
in arch/ppc64/Makefile.

Paul.

From olh at suse.de Mon Oct 18 14:56:03 2004
From: olh at suse.de (Olaf Hering)
Date: Mon, 18 Oct 2004 06:56:03 +0200
Subject: [PATCH] allow kernel compile with native ppc64 compiler
In-Reply-To: <16754.59442.992185.715900@cargo.ozlabs.ibm.com>
References: <20041017185557.GA9619@suse.de> <16754.59442.992185.715900@cargo.ozlabs.ibm.com>
Message-ID: <20041018045603.GA8500@suse.de>

On Mon, Oct 18, Paul Mackerras wrote:

> Olaf Hering writes:
>
> > The zImage is a 32bit binary, but a native powerpc64-linux gcc will
> > produce 64bit objects in arch/ppc64/boot.
> > This patch fixes it.
>
> ... and breaks the compile on older toolchains that don't understand
> -m32. We need to make the -m32 conditional on HAS_BIARCH as defined
> in arch/ppc64/Makefile.

how old?

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

From paulus at samba.org Mon Oct 18 15:55:52 2004
From: paulus at samba.org (Paul Mackerras)
Date: Mon, 18 Oct 2004 15:55:52 +1000
Subject: [PATCH] allow kernel compile with native ppc64 compiler
In-Reply-To: <20041018045603.GA8500@suse.de>
References: <20041017185557.GA9619@suse.de> <16754.59442.992185.715900@cargo.ozlabs.ibm.com> <20041018045603.GA8500@suse.de>
Message-ID: <16755.23272.754150.209624@cargo.ozlabs.ibm.com>

Olaf Hering writes:

> > ...
and breaks the compile on older toolchains that don't understand
> > -m32. We need to make the -m32 conditional on HAS_BIARCH as defined
> > in arch/ppc64/Makefile.
>
> how old?

The gcc that comes with debian sid doesn't understand -m32. That's a
32-bit gcc, which means that I set CROSS_COMPILE when doing a ppc64
kernel compile. With your patch I have to set CROSS32_COMPILE as
well, which seems silly when I'm compiling on a ppc32 box already.

Ben H suggested making the default BOOTCC be $(CC) -m32, which makes
sense to me.

Paul.

From olh at suse.de Mon Oct 18 17:54:33 2004
From: olh at suse.de (Olaf Hering)
Date: Mon, 18 Oct 2004 09:54:33 +0200
Subject: [PATCH] allow kernel compile with native ppc64 compiler
In-Reply-To: <16755.23272.754150.209624@cargo.ozlabs.ibm.com>
References: <20041017185557.GA9619@suse.de> <16754.59442.992185.715900@cargo.ozlabs.ibm.com> <20041018045603.GA8500@suse.de> <16755.23272.754150.209624@cargo.ozlabs.ibm.com>
Message-ID: <20041018075433.GA24927@suse.de>

On Mon, Oct 18, Paul Mackerras wrote:

> Olaf Hering writes:
>
> > > ... and breaks the compile on older toolchains that don't understand
> > > -m32. We need to make the -m32 conditional on HAS_BIARCH as defined
> > > in arch/ppc64/Makefile.
> >
> > how old?
>
> The gcc that comes with debian sid doesn't understand -m32. That's a
> 32-bit gcc, which means that I set CROSS_COMPILE when doing a ppc64
> kernel compile. With your patch I have to set CROSS32_COMPILE as
> well, which seems silly when I'm compiling on a ppc32 box already.

Makes sense, I confused a native powerpc64-linux gcc from last century
with a native/cross powerpc-linux gcc from last century.

> Ben H suggested making the default BOOTCC be $(CC) -m32, which makes
> sense to me.

That may break cross compile. I will provide a new patch.

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

From ananth at in.ibm.com Mon Oct 18 19:52:29 2004
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Mon, 18 Oct 2004 15:22:29 +0530
Subject: [PATCH] Kprobes for ppc64
Message-ID: <20041018095229.GA7394@in.ibm.com>

Hi,

Here is kprobes for ppc64. The patch applies on 2.6.9-rc4/2.6.9-final
and provides the kprobes + jprobes functionality. My earlier post did
not reach the mailing lists, hence this resend.

Kprobes (Kernel dynamic probes) is a lightweight mechanism for kernel
modules to insert probes into a running kernel, without the need to
modify the underlying source. The probe handlers can then be coded to
log relevant data at the probe point. More information on kprobes can
be found at:

http://www-124.ibm.com/developerworks/oss/linux/projects/kprobes/

Jprobes (or jumper probes) is a small infrastructure to access function
arguments. It can be used by defining a small stub with the same
template as the routine in kernel, within which the required parameters
can be logged.

The following pseudocode illustrates the usage of a jprobe, where the
skbuff at tcp_v4_rcv() needs to be decoded:

............
struct jprobe jp;

jtcp_v4_rcv(struct skbuff *skb)
{
	/* decode and log skb related details as required */
	jprobe_return();
	return 0;
}

init_module
{
	jp.kp.addr = (kprobe_opcode_t *);
	jp.entry = JPROBE_ENTRY(jtcp_v4_rcv);
	register_jprobe(&jp);
	return 0;
}

cleanup_module
{
	unregister_jprobe(&jp);
}
............

NOTE:
1. The current implementation uses xmon's emulate_step() and hence
   requires xmon to be compiled in.
2. arch_prepare_kprobe() now returns an int. I have made the necessary
   changes to the i386 and sparc64 kprobes files, but they are untested.
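
As a rough user-space illustration of note 2 (this is not part of the
patch): the sketch below models how a nonzero return from
arch_prepare_kprobe() can be turned into an error by the caller. The
IS_MTMSRD/IS_RFID masks are the ones the patch defines in
include/asm-ppc64/kprobes.h; the model_* function names and PROBE_EINVAL
are made up for this example and assume the usual Linux value of EINVAL.

```c
#include <assert.h>
#include <stdint.h>

/* Masks copied from the patch's include/asm-ppc64/kprobes.h. */
#define IS_MTMSRD(instr) (((instr) & 0xfc0007fe) == 0x7c000164)
#define IS_RFID(instr)   (((instr) & 0xfc0007fe) == 0x4c000024)

/* Assumption: stand-in for -EINVAL (EINVAL is 22 on Linux). */
#define PROBE_EINVAL (-22)

/*
 * Hypothetical user-space model of the ppc64 arch_prepare_kprobe():
 * a nonzero result means "do not put a breakpoint on this instruction".
 */
static int model_arch_prepare_kprobe(const uint32_t *insn)
{
	assert(insn != 0);
	if (IS_MTMSRD(*insn) || IS_RFID(*insn))
		return 1;
	return 0;
}

/* Hypothetical model of the caller: any nonzero result becomes an error. */
static int model_register_kprobe(const uint32_t *insn)
{
	return model_arch_prepare_kprobe(insn) ? PROBE_EINVAL : 0;
}
```

With these definitions, probing an rfid (0x4c000024) or any mtmsrd
encoding is refused, while an ordinary instruction such as nop
(0x60000000) is accepted.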
Thanks, Ananth diff -Naurp temp/linux-2.6.9-rc4/arch/i386/kernel/kprobes.c linux-2.6.9-rc4/arch/i386/kernel/kprobes.c --- temp/linux-2.6.9-rc4/arch/i386/kernel/kprobes.c 2004-10-11 08:27:50.000000000 +0530 +++ linux-2.6.9-rc4/arch/i386/kernel/kprobes.c 2004-10-11 15:30:41.000000000 +0530 @@ -58,9 +58,10 @@ static inline int is_IF_modifier(kprobe_ return 0; } -void arch_prepare_kprobe(struct kprobe *p) +int arch_prepare_kprobe(struct kprobe *p) { memcpy(p->insn, p->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t)); + return 0; } static inline void disarm_kprobe(struct kprobe *p, struct pt_regs *regs) diff -Naurp temp/linux-2.6.9-rc4/arch/ppc64/Kconfig.debug linux-2.6.9-rc4/arch/ppc64/Kconfig.debug --- temp/linux-2.6.9-rc4/arch/ppc64/Kconfig.debug 2004-10-11 08:28:49.000000000 +0530 +++ linux-2.6.9-rc4/arch/ppc64/Kconfig.debug 2004-10-11 15:30:41.000000000 +0530 @@ -6,6 +6,16 @@ config DEBUG_STACKOVERFLOW bool "Check for stack overflows" depends on DEBUG_KERNEL +config KPROBES + bool "Kprobes" + depends on DEBUG_KERNEL + help + Kprobes allows you to trap at almost any kernel address and + execute a callback function. register_kprobe() establishes + a probepoint and specifies the callback. Kprobes is useful + for kernel debugging, non-intrusive instrumentation and testing. + If in doubt, say "N". 
+ config DEBUG_STACK_USAGE bool "Stack utilization instrumentation" depends on DEBUG_KERNEL diff -Naurp temp/linux-2.6.9-rc4/arch/ppc64/kernel/kprobes.c linux-2.6.9-rc4/arch/ppc64/kernel/kprobes.c --- temp/linux-2.6.9-rc4/arch/ppc64/kernel/kprobes.c 1970-01-01 05:30:00.000000000 +0530 +++ linux-2.6.9-rc4/arch/ppc64/kernel/kprobes.c 2004-10-11 15:30:41.000000000 +0530 @@ -0,0 +1,260 @@ +/* + * Kernel Probes (KProbes) + * arch/ppc64/kernel/kprobes.c + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) IBM Corporation, 2002, 2004 + * + * 2002-Oct Created by Vamsi Krishna S Kernel + * Probes initial implementation ( includes contributions from + * Rusty Russell). + * 2004-July Suparna Bhattacharya added jumper probes + * interface to access function arguments. 
+ * 2004-Oct Ananth N Mavinakayanahalli kprobes port + * for PPC64 + */ + +#include +#include +#include +#include +#include +#include + +/* kprobe_status settings */ +#define KPROBE_HIT_ACTIVE 0x00000001 +#define KPROBE_HIT_SS 0x00000002 + +static struct kprobe *current_kprobe; +static unsigned long kprobe_status, kprobe_saved_msr; +static struct pt_regs jprobe_saved_regs; + +/* we re-use xmon's emulate_step here */ +extern int emulate_step(struct pt_regs *regs, unsigned int instr); + +int arch_prepare_kprobe(struct kprobe *p) +{ + memcpy(p->insn, p->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t)); + if (IS_MTMSRD(p->insn[0]) || IS_RFID(p->insn[0])) + /* cannot put bp on RFID/MTMSRD */ + return 1; + return 0; +} + +static inline void disarm_kprobe(struct kprobe *p, struct pt_regs *regs) +{ + *p->addr = p->opcode; + regs->nip = (unsigned long)p->addr; +} + +static inline void prepare_singlestep(struct kprobe *p, struct pt_regs *regs) +{ + regs->msr |= MSR_SE; + regs->nip = (unsigned long)&p->insn; +} + +/* + * Interrupts are disabled on entry as trap3 is an interrupt gate and they + * remain disabled thorough out this function. + */ +static inline int kprobe_handler(struct pt_regs *regs) +{ + struct kprobe *p; + int ret = 0; + unsigned int *addr = (unsigned int *)regs->nip; + + /* We're in an interrupt, but this is clear and BUG()-safe. */ + preempt_disable(); + + /* Check we're not actually recursing */ + if (kprobe_running()) { + /* We *are* holding lock here, so this is safe. + Disarm the probe we just hit, and ignore it. */ + p = get_kprobe(addr); + if (p) { + disarm_kprobe(p, regs); + ret = 1; + } else { + p = current_kprobe; + if (p->break_handler && p->break_handler(p, regs)) { + goto ss_probe; + } + } + /* If it's not ours, can't be delete race, (we hold lock). 
*/ + goto no_kprobe; + } + + lock_kprobes(); + p = get_kprobe(addr); + if (!p) { + unlock_kprobes(); + if (*addr != BREAKPOINT_INSTRUCTION) { + /* + * The breakpoint instruction was removed right + * after we hit it. Another cpu has removed + * either a probepoint or a debugger breakpoint + * at this address. In either case, no further + * handling of this interrupt is appropriate. + */ + ret = 1; + } + /* Not one of ours: let kernel handle it */ + goto no_kprobe; + } + + kprobe_status = KPROBE_HIT_ACTIVE; + current_kprobe = p; + kprobe_saved_msr = regs->msr; + if (p->pre_handler(p, regs)) { + /* handler has already set things up, so skip ss setup */ + return 1; + } + +ss_probe: + prepare_singlestep(p, regs); + kprobe_status = KPROBE_HIT_SS; + return 1; + +no_kprobe: + preempt_enable_no_resched(); + return ret; +} + +/* + * Called after single-stepping. p->addr is the address of the + * instruction whose first byte has been replaced by the "breakpoint" + * instruction. To avoid the SMP problems that can occur when we + * temporarily put back the original opcode to single-step, we + * single-stepped a copy of the instruction. The address of this + * copy is p->insn. + */ +static void resume_execution(struct kprobe *p, struct pt_regs *regs) +{ + int ret; + + regs->nip = (unsigned long)p->addr; + ret = emulate_step(regs, p->insn[0]); + if (ret == 0) + regs->nip = (unsigned long)p->addr + 4; + + regs->msr &= ~MSR_SE; +} + +static inline int post_kprobe_handler(struct pt_regs *regs) +{ + if (!kprobe_running()) + return 0; + + if (current_kprobe->post_handler) + current_kprobe->post_handler(current_kprobe, regs, 0); + + resume_execution(current_kprobe, regs); + regs->msr |= kprobe_saved_msr; + + unlock_kprobes(); + preempt_enable_no_resched(); + + /* + * if somebody else is singlestepping across a probe point, msr + * will have SE set, in which case, continue the remaining processing + * of do_debug, as if this is not a probe hit. 
+ */ + if (regs->msr & MSR_SE) + return 0; + + return 1; +} + +/* Interrupts disabled, kprobe_lock held. */ +static inline int kprobe_fault_handler(struct pt_regs *regs, int trapnr) +{ + if (current_kprobe->fault_handler + && current_kprobe->fault_handler(current_kprobe, regs, trapnr)) + return 1; + + if (kprobe_status & KPROBE_HIT_SS) { + resume_execution(current_kprobe, regs); + regs->msr |= kprobe_saved_msr; + + unlock_kprobes(); + preempt_enable_no_resched(); + } + return 0; +} + +/* + * Wrapper routine to for handling exceptions. + */ +int kprobe_exceptions_notify(struct notifier_block *self, unsigned long val, + void *data) +{ + struct die_args *args = (struct die_args *)data; + switch (val) { + case DIE_IABR_MATCH: + case DIE_DABR_MATCH: + case DIE_BPT: + if (kprobe_handler(args->regs)) + return NOTIFY_STOP; + break; + case DIE_SSTEP: + if (post_kprobe_handler(args->regs)) + return NOTIFY_STOP; + break; + case DIE_GPF: + case DIE_PAGE_FAULT: + if (kprobe_running() && + kprobe_fault_handler(args->regs, args->trapnr)) + return NOTIFY_STOP; + break; + default: + break; + } + return NOTIFY_DONE; +} + +int setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs) +{ + struct jprobe *jp = container_of(p, struct jprobe, kp); + + memcpy(&jprobe_saved_regs, regs, sizeof(struct pt_regs)); + + /* setup return addr to the jprobe handler routine */ + regs->nip = (unsigned long)(((func_descr_t *)jp->entry)->entry); + regs->gpr[2] = (unsigned long)(((func_descr_t *)jp->entry)->toc); + + return 1; +} + +void jprobe_return(void) +{ + preempt_enable_no_resched(); + asm volatile("trap" ::: "memory"); +} + +void jprobe_return_end(void) +{ +}; + +int longjmp_break_handler(struct kprobe *p, struct pt_regs *regs) +{ + /* + * FIXME - we should ideally be validating that we got here 'cos + * of the "trap" in jprobe_return() above, before restoring the + * saved regs... 
+ */ + memcpy(regs, &jprobe_saved_regs, sizeof(struct pt_regs)); + return 1; +} diff -Naurp temp/linux-2.6.9-rc4/arch/ppc64/kernel/Makefile linux-2.6.9-rc4/arch/ppc64/kernel/Makefile --- temp/linux-2.6.9-rc4/arch/ppc64/kernel/Makefile 2004-10-11 08:28:50.000000000 +0530 +++ linux-2.6.9-rc4/arch/ppc64/kernel/Makefile 2004-10-11 15:30:41.000000000 +0530 @@ -56,5 +56,6 @@ obj-$(CONFIG_PPC_PMAC) += pmac_smp.o sm endif obj-$(CONFIG_ALTIVEC) += vecemu.o vector.o +obj-$(CONFIG_KPROBES) += kprobes.o CFLAGS_ioctl32.o += -Ifs/ diff -Naurp temp/linux-2.6.9-rc4/arch/ppc64/kernel/traps.c linux-2.6.9-rc4/arch/ppc64/kernel/traps.c --- temp/linux-2.6.9-rc4/arch/ppc64/kernel/traps.c 2004-10-11 08:27:59.000000000 +0530 +++ linux-2.6.9-rc4/arch/ppc64/kernel/traps.c 2004-10-11 15:30:41.000000000 +0530 @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -61,6 +62,20 @@ EXPORT_SYMBOL(__debugger_dabr_match); EXPORT_SYMBOL(__debugger_fault_handler); #endif +struct notifier_block *ppc64_die_chain; +static spinlock_t die_notifier_lock = SPIN_LOCK_UNLOCKED; + +int register_die_notifier(struct notifier_block *nb) +{ + int err = 0; + unsigned long flags; + + spin_lock_irqsave(&die_notifier_lock, flags); + err = notifier_chain_register(&ppc64_die_chain, nb); + spin_unlock_irqrestore(&die_notifier_lock, flags); + return err; +} + /* * Trap & Exception support */ @@ -287,6 +302,9 @@ UnknownException(struct pt_regs *regs) void InstructionBreakpointException(struct pt_regs *regs) { + if (notify_die(DIE_BPT, "iabr_match", regs, 5, + 5, SIGTRAP) == NOTIFY_STOP) + return; if (debugger_iabr_match(regs)) return; _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip); @@ -297,6 +315,9 @@ SingleStepException(struct pt_regs *regs { regs->msr &= ~MSR_SE; /* Turn off 'trace' bit */ + if (notify_die(DIE_SSTEP, "single_step", regs, 5, + 5, SIGTRAP) == NOTIFY_STOP) + return; if (debugger_sstep(regs)) return; @@ -470,6 +491,9 @@ ProgramCheckException(struct pt_regs *re } else if (regs->msr & 
0x20000) { /* trap exception */ + if (notify_die(DIE_BPT, "breakpoint", regs, 5, + 5, SIGTRAP) == NOTIFY_STOP) + return; if (debugger_bpt(regs)) return; diff -Naurp temp/linux-2.6.9-rc4/arch/ppc64/mm/fault.c linux-2.6.9-rc4/arch/ppc64/mm/fault.c --- temp/linux-2.6.9-rc4/arch/ppc64/mm/fault.c 2004-10-11 08:28:24.000000000 +0530 +++ linux-2.6.9-rc4/arch/ppc64/mm/fault.c 2004-10-11 15:30:41.000000000 +0530 @@ -36,6 +36,7 @@ #include #include #include +#include /* * Check whether the instruction at regs->nip is a store using @@ -96,6 +97,9 @@ int do_page_fault(struct pt_regs *regs, BUG_ON((trap == 0x380) || (trap == 0x480)); if (trap == 0x300) { + if (notify_die(DIE_PAGE_FAULT, "page_fault", regs, error_code, + 11, SIGSEGV) == NOTIFY_STOP) + return 0; if (debugger_fault_handler(regs)) return 0; } @@ -105,6 +109,9 @@ int do_page_fault(struct pt_regs *regs, return SIGSEGV; if (error_code & 0x00400000) { + if (notify_die(DIE_BPT, "dabr_match", regs, error_code, + 11, SIGSEGV) == NOTIFY_STOP) + return 0; if (debugger_dabr_match(regs)) return 0; } diff -Naurp temp/linux-2.6.9-rc4/arch/ppc64/xmon/xmon.c linux-2.6.9-rc4/arch/ppc64/xmon/xmon.c --- temp/linux-2.6.9-rc4/arch/ppc64/xmon/xmon.c 2004-10-11 08:28:48.000000000 +0530 +++ linux-2.6.9-rc4/arch/ppc64/xmon/xmon.c 2004-10-11 15:30:41.000000000 +0530 @@ -132,7 +132,7 @@ static void csum(void); static void bootcmds(void); void dump_segments(void); static void symbol_lookup(void); -static int emulate_step(struct pt_regs *regs, unsigned int instr); +int emulate_step(struct pt_regs *regs, unsigned int instr); static void xmon_print_symbol(unsigned long address, const char *mid, const char *after); static const char *getvecname(unsigned long vec); @@ -781,7 +781,7 @@ static int branch_taken(unsigned int ins * or -1 if the instruction is one that should not be stepped, * such as an rfid, or a mtmsrd that would clear MSR_RI. 
*/ -static int emulate_step(struct pt_regs *regs, unsigned int instr) +int emulate_step(struct pt_regs *regs, unsigned int instr) { unsigned int opcode, rd; unsigned long int imm; diff -Naurp temp/linux-2.6.9-rc4/arch/sparc64/kernel/kprobes.c linux-2.6.9-rc4/arch/sparc64/kernel/kprobes.c --- temp/linux-2.6.9-rc4/arch/sparc64/kernel/kprobes.c 2004-10-11 08:28:49.000000000 +0530 +++ linux-2.6.9-rc4/arch/sparc64/kernel/kprobes.c 2004-10-11 15:30:41.000000000 +0530 @@ -38,10 +38,11 @@ * - Mark that we are no longer actively in a kprobe. */ -void arch_prepare_kprobe(struct kprobe *p) +int arch_prepare_kprobe(struct kprobe *p) { p->insn[0] = *p->addr; p->insn[1] = BREAKPOINT_INSTRUCTION_2; + return 0; } /* kprobe_status settings */ diff -Naurp temp/linux-2.6.9-rc4/include/asm-i386/kprobes.h linux-2.6.9-rc4/include/asm-i386/kprobes.h --- temp/linux-2.6.9-rc4/include/asm-i386/kprobes.h 2004-10-11 08:28:07.000000000 +0530 +++ linux-2.6.9-rc4/include/asm-i386/kprobes.h 2004-10-11 19:28:07.000000000 +0530 @@ -38,6 +38,8 @@ typedef u8 kprobe_opcode_t; ? (MAX_STACK_SIZE) \ : (((unsigned long)current_thread_info()) + THREAD_SIZE - (ADDR))) +#define JPROBE_ENTRY(pentry) (kprobe_opcode_t *)pentry + /* trap3/1 are intr gates for kprobes. So, restore the status of IF, * if necessary, before executing the original int3/1 (trap) handler. */ diff -Naurp temp/linux-2.6.9-rc4/include/asm-ppc64/kdebug.h linux-2.6.9-rc4/include/asm-ppc64/kdebug.h --- temp/linux-2.6.9-rc4/include/asm-ppc64/kdebug.h 1970-01-01 05:30:00.000000000 +0530 +++ linux-2.6.9-rc4/include/asm-ppc64/kdebug.h 2004-10-11 15:30:41.000000000 +0530 @@ -0,0 +1,43 @@ +#ifndef _PPC64_KDEBUG_H +#define _PPC64_KDEBUG_H 1 + +/* nearly identical to x86_64/i386 code */ + +#include + +struct pt_regs; + +struct die_args { + struct pt_regs *regs; + const char *str; + long err; + int trapnr; + int signr; +}; + +/* + Note - you should never unregister because that can race with NMIs. 
+ If you really want to do it first unregister - then synchronize_kernel - + then free. + */ +int register_die_notifier(struct notifier_block *nb); +extern struct notifier_block *ppc64_die_chain; + +/* Grossly misnamed. */ +enum die_val { + DIE_OOPS = 1, + DIE_IABR_MATCH, + DIE_DABR_MATCH, + DIE_BPT, + DIE_SSTEP, + DIE_GPF, + DIE_PAGE_FAULT, +}; + +static inline int notify_die(enum die_val val,char *str,struct pt_regs *regs,long err,int trap, int sig) +{ + struct die_args args = { .regs=regs, .str=str, .err=err, .trapnr=trap,.signr=sig }; + return notifier_call_chain(&ppc64_die_chain, val, &args); +} + +#endif diff -Naurp temp/linux-2.6.9-rc4/include/asm-ppc64/kprobes.h linux-2.6.9-rc4/include/asm-ppc64/kprobes.h --- temp/linux-2.6.9-rc4/include/asm-ppc64/kprobes.h 1970-01-01 05:30:00.000000000 +0530 +++ linux-2.6.9-rc4/include/asm-ppc64/kprobes.h 2004-10-12 22:57:04.000000000 +0530 @@ -0,0 +1,53 @@ +#ifndef _ASM_KPROBES_H +#define _ASM_KPROBES_H +/* + * Kernel Probes (KProbes) + * include/asm-ppc64/kprobes.h + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) IBM Corporation, 2002, 2004 + * + * 2002-Oct Created by Vamsi Krishna S Kernel + * Probes initial implementation ( includes suggestions from + * Rusty Russell). 
+ * 2004-Oct Modified for PPC64 by Ananth N Mavinakayanahalli + * + */ +#include +#include + +struct pt_regs; + +typedef unsigned int kprobe_opcode_t; +#define BREAKPOINT_INSTRUCTION 0x7fe00008 /* trap */ +#define MAX_INSN_SIZE 1 + +#define IS_MTMSRD(instr) (((instr) & 0xfc0007fe) == 0x7c000164) +#define IS_RFID(instr) (((instr) & 0xfc0007fe) == 0x4c000024) + +#define JPROBE_ENTRY(pentry) (kprobe_opcode_t *)((func_descr_t *)pentry) + +#ifdef CONFIG_KPROBES +extern int kprobe_exceptions_notify(struct notifier_block *self, + unsigned long val, void *data); +#else /* !CONFIG_KPROBES */ +static inline int kprobe_exceptions_notify(struct notifier_block *self, + unsigned long val, void *data) +{ + return 0; +} +#endif +#endif /* _ASM_KPROBES_H */ diff -Naurp temp/linux-2.6.9-rc4/include/linux/kprobes.h linux-2.6.9-rc4/include/linux/kprobes.h --- temp/linux-2.6.9-rc4/include/linux/kprobes.h 2004-10-11 08:27:16.000000000 +0530 +++ linux-2.6.9-rc4/include/linux/kprobes.h 2004-10-11 15:30:41.000000000 +0530 @@ -94,7 +94,7 @@ static inline int kprobe_running(void) return kprobe_cpu == smp_processor_id(); } -extern void arch_prepare_kprobe(struct kprobe *p); +extern int arch_prepare_kprobe(struct kprobe *p); extern void show_registers(struct pt_regs *regs); /* Get the kprobe at this addr (if any). Must have called lock_kprobes */ diff -Naurp temp/linux-2.6.9-rc4/kernel/kprobes.c linux-2.6.9-rc4/kernel/kprobes.c --- temp/linux-2.6.9-rc4/kernel/kprobes.c 2004-10-11 08:29:12.000000000 +0530 +++ linux-2.6.9-rc4/kernel/kprobes.c 2004-10-11 15:30:41.000000000 +0530 @@ -27,6 +27,8 @@ * interface to access function arguments. * 2004-Sep Prasanna S Panchamukhi Changed Kprobes * exceptions notifier to be first on the priority list. + * 2004-Oct Ananth N Mavinakayanahalli + * arch_prepare_kprobe now returns an int. 
  */
 #include
 #include
@@ -87,12 +89,17 @@ int register_kprobe(struct kprobe *p)
 	hlist_add_head(&p->hlist,
 		       &kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);
 
-	arch_prepare_kprobe(p);
+	ret = arch_prepare_kprobe(p);
+	if (ret) {
+		unregister_kprobe(p);
+		ret = -EINVAL;
+		goto out;
+	}
 	p->opcode = *p->addr;
 	*p->addr = BREAKPOINT_INSTRUCTION;
 	flush_icache_range((unsigned long) p->addr,
 			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
- out:
+out:
 	spin_unlock_irqrestore(&kprobe_lock, flags);
 	return ret;
 }

From segher at kernel.crashing.org Mon Oct 18 19:55:26 2004
From: segher at kernel.crashing.org (Segher Boessenkool)
Date: Mon, 18 Oct 2004 11:55:26 +0200
Subject: [vHype-discussion] u64 in linux
In-Reply-To: <200410152059.03647.arnd@arndb.de>
References: <1097849471.25095.97.camel@brick.watson.ibm.com> <16751.62061.393716.650492@kitch0.watson.ibm.com> <200410152059.03647.arnd@arndb.de>
Message-ID:

> C99 also mandates that the macro PRIu64 contains the correct
> format string for uint64_t (which afaik is always the same as u64).
> It's currently not defined in linux, but could perhaps be added.

Works fine for me:

#include
char x[] = PRIx64;
char u[] = PRIu64;

resulting in

	.globl u
	.section ".data"
	.align 3
	.type u, @object
	.size u, 3
u:
	.string "lu"
	.globl x
	.align 3
	.type x, @object
	.size x, 3
x:
	.string "lx"

(this is on a PPC64 system, GCC 3.4.1).
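
For reference, a minimal user-space sketch of how these macros are
normally used, assuming a C99 toolchain that provides <inttypes.h>.
format_u64() is a hypothetical helper invented for this example, not an
existing kernel or libc function.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Format a 64-bit value in decimal and hex without hard-coding "%lu" or
 * "%llu": PRIu64 and PRIx64 expand to the conversion the platform needs
 * ("lu"/"lx" on the PPC64 system above).
 * Returns the number of characters snprintf() would have written.
 */
static int format_u64(char *buf, size_t len, uint64_t val)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%" PRIu64 " (0x%" PRIx64 ")", val, val);
}
```

Because the macros are plain string literals, they are concatenated
into the format string at compile time, so the same source line works
whether uint64_t is an unsigned long or an unsigned long long.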

Segher

From benh at kernel.crashing.org Mon Oct 18 20:58:27 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Mon, 18 Oct 2004 20:58:27 +1000
Subject: [PATCH] allow kernel compile with native ppc64 compiler
In-Reply-To: <20041018075433.GA24927@suse.de>
References: <20041017185557.GA9619@suse.de> <16754.59442.992185.715900@cargo.ozlabs.ibm.com> <20041018045603.GA8500@suse.de> <16755.23272.754150.209624@cargo.ozlabs.ibm.com> <20041018075433.GA24927@suse.de>
Message-ID: <1098097106.30570.6.camel@gaston>

On Mon, 2004-10-18 at 17:54, Olaf Hering wrote:

>
> > Ben H suggested making the default BOOTCC be $(CC) -m32, which makes
> > sense to me.

How so ? The idea is to add -m32 to whatever compiler you are using for
the rest of the kernel (assuming bi-arch), which is a lot more sane than
using whatever _local_ compiler you are using _and_ assuming bi-arch...

Of course, that would only be the "default", with the ability of
explicitly passing CROSS32_COMPILE to make...

Ben.

From nathanl at austin.ibm.com Tue Oct 19 00:46:44 2004
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Mon, 18 Oct 2004 09:46:44 -0500
Subject: [RFC] maxcpus boot option leads to dropped interrupts
Message-ID: <1098110804.3165.63.camel@biclops>

Hi-

Our test group has discovered that booting a 2.6 kernel on an SMP
pSeries LPAR with maxcpus=1 will either hang or take a very long time to
boot, with lots of dropped interrupt messages or scsi timeouts, e.g.

Probing IDE interface ide2...
hde: IBM DROM00205, ATAPI CD/DVD-ROM drive
Using cfq io scheduler
ide2 at 0xfe400-0xfe407,0xfdc02 on irq 166
Probing IDE interface ide3...
Probing IDE interface ide3...
hde: ATAPI 24X DVD-ROM drive, 256kB Cache
Uniform CD-ROM driver Revision: 3.20
ide-cd: cmd 0x25 timed out
hde: lost interrupt
hde: lost interrupt

The problem goes away if CONFIG_IRQ_ALL_CPUS is not set.

I am about 85% sure that this is due to the OF "start-cpu" method
placing the primary threads of secondary cpus in the global interrupt
queue (see the comment in arch/ppc64/kernel/smp.c::start_secondary).
With the maxcpus parameter, we never "boot" those cpus; they simply sit
in their spin loops waiting to be kicked. However, from the platform's
point of view they are fair game to service device interrupts.

The RTAS "start-cpu" method apparently does not behave the same way -- I
can boot a single CPU (with SMT) Power5 LPAR with maxcpus=1 and
interrupts are not lost, even though the secondary thread on the boot
cpu has been started by RTAS. So this problem is limited to systems
which have more than one cpu device node.

I've worked around the problem by modifying the xics code to use the
default interrupt server (the boot cpu) if cpu_online_map !=
cpu_present_map. However that's a nasty hack which will keep interrupts
from being distributed in the smt-enabled=off case. I'm not sure
whether this happens on non-xics machines.

I'm looking for ideas on how to handle this. Some options that occur to
me are:

o Not booting secondary cpus from the OF client code (but the PPC-OF
  binding document says we can't do this). I believe I've tried this
  before, and RTAS was unable to start the secondary cpus later. So
  this is probably not the way to go.

o In smp_cpus_done(), "shoot down" any cpus which have not been kicked
  out of their spin loops. I've got a very rough version of this
  working. However, this method assumes that the RTAS "stop-cpu"
  interface is available, which is a given on LPAR, but I'm not sure
  it's a safe bet on other systems.

o Directing interrupts to the boot cpu instead of using the GIQ when
  the maxcpus option is detected. This might be the easiest
  alternative; however this could have a performance impact.

Any other ideas? Keep in mind that I would like to get the code to a
state which will allow us to hotplug-online cpus which were not started
at boot.
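
To make the workaround described above concrete, here is a small
user-space model of the decision it makes. This is only a sketch: the
function name, the bitmask representation of the cpu maps and the
GLOBAL_QUEUE marker are all invented for illustration and are not the
real xics code or constants.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Invented marker meaning "leave the choice to the platform's global
 * interrupt queue"; not a real XICS value.
 */
#define GLOBAL_QUEUE (-1)

/*
 * Hypothetical model: each bit in a map stands for one cpu.
 * If some present cpus were never brought online, they may still be
 * sitting in their spin loops, so route everything to the boot cpu.
 */
static int pick_irq_server(uint64_t present_map, uint64_t online_map,
			   int boot_cpu)
{
	assert(boot_cpu >= 0 && boot_cpu < 64);
	/* The boot cpu itself must be online. */
	assert(online_map & (UINT64_C(1) << boot_cpu));

	if (online_map != present_map)
		return boot_cpu;
	return GLOBAL_QUEUE;
}
```

The model also shows the drawback mentioned above: any configuration
where the two maps differ (for example smt-enabled=off) ends up with
every interrupt on the boot cpu, even if several cpus are online.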
Nathan From olof at austin.ibm.com Tue Oct 19 05:40:27 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Mon, 18 Oct 2004 14:40:27 -0500 Subject: [PATCH] [PPC64] Fix CPU numa init code thinkos Message-ID: <20041018194027.GA11753@4> There seems to have been a couple of thinkos in the NUMA init code, in particular in find_cpu_node(): * Property size returned is in bytes, not words * Off-by-one error in loop iteration Signed-off-by: Nathan Lynch Signed-off-by: Olof Johansson --- linux-2.5-olof/arch/ppc64/mm/numa.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletion(-) diff -puN arch/ppc64/mm/numa.c~find-cpu-node arch/ppc64/mm/numa.c --- linux-2.5/arch/ppc64/mm/numa.c~find-cpu-node 2004-10-18 14:21:55.603312384 -0500 +++ linux-2.5-olof/arch/ppc64/mm/numa.c 2004-10-18 14:22:19.271552232 -0500 @@ -75,9 +75,11 @@ static struct device_node * __init find_ interrupt_server = (unsigned int *)get_property(cpu_node, "ibm,ppc-interrupt-server#s", &len); + len = len / sizeof(u32); + if (interrupt_server && (len > 0)) { while (len--) { - if (interrupt_server[len-1] == hw_cpuid) + if (interrupt_server[len] == hw_cpuid) return cpu_node; } } else { _ From benh at kernel.crashing.org Tue Oct 19 09:28:42 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 19 Oct 2004 09:28:42 +1000 Subject: ppc64 breakage. In-Reply-To: <20041018222658.GA31577@redhat.com> References: <20041018222658.GA31577@redhat.com> Message-ID: <1098142122.18687.35.camel@gaston> On Tue, 2004-10-19 at 08:26, Dave Jones wrote: > hey guys, > > During a build for an iseries kernel, it blew up with .. > > arch/ppc64/kernel/built-in.o(.text+0x1cd5c): In function `ioport_map': > arch/ppc64/kernel/iomap.c:84: undefined reference to `._IO_IS_VALID' > make: *** [.tmp_vmlinux1] Error 1 > > Ideas ? > > The '.' looks odd. Toolchain bug ? No, it's my fault and I hate iSeries ! 
You can't do anything in this arch without breaking it :( Why do these systematically pop up just when Linus has released the new kernel ? Grrrrrr Anton/Paul, does iSeries have any kind of PIO on PCI at all ? Should I do #define _IO_IS_VALID(port) (0) or #define _IO_IS_VALID(port) (1) for iSeries ? Ben. From benh at kernel.crashing.org Tue Oct 19 09:33:56 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 19 Oct 2004 09:33:56 +1000 Subject: ppc64 build failure. In-Reply-To: <20041018230529.GB31577@redhat.com> References: <20041018230529.GB31577@redhat.com> Message-ID: <1098142435.18679.38.camel@gaston> On Tue, 2004-10-19 at 09:05, Dave Jones wrote: > Ignore previous mail, this should fix it. > > .../... What about this one instead ? io_page_mask is set to 0 by default, so iSeries would automatically get _IO_IS_VALID(*) == 0 if it doesn't initialize it... ===== include/asm-ppc64/eeh.h 1.20 vs edited ===== --- 1.20/include/asm-ppc64/eeh.h 2004-10-06 16:05:23 +10:00 +++ edited/include/asm-ppc64/eeh.h 2004-10-19 09:31:54 +10:00 @@ -256,10 +256,6 @@ #undef EEH_CHECK_ALIGN -#define MAX_ISA_PORT 0x10000 -extern unsigned long io_page_mask; -#define _IO_IS_VALID(port) ((port) >= MAX_ISA_PORT || (1 << (port>>PAGE_SHIFT)) & io_page_mask) - static inline u8 eeh_inb(unsigned long port) { u8 val; if (!_IO_IS_VALID(port)) ===== include/asm-ppc64/io.h 1.22 vs edited ===== --- 1.22/include/asm-ppc64/io.h 2004-09-21 19:14:10 +10:00 +++ edited/include/asm-ppc64/io.h 2004-10-19 09:32:20 +10:00 @@ -33,6 +33,12 @@ extern unsigned long isa_io_base; extern unsigned long pci_io_base; +extern unsigned long io_page_mask; + +#define MAX_ISA_PORT 0x10000 + +#define _IO_IS_VALID(port) ((port) >= MAX_ISA_PORT || (1 << (port>>PAGE_SHIFT)) \ + & io_page_mask) #ifdef CONFIG_PPC_ISERIES /* __raw_* accessors aren't supported on iSeries */ From benh at kernel.crashing.org Tue Oct 19 11:03:29 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 19 Oct 2004
11:03:29 +1000 Subject: ppc64 build failure. In-Reply-To: <1098142435.18679.38.camel@gaston> References: <20041018230529.GB31577@redhat.com> <1098142435.18679.38.camel@gaston> Message-ID: <1098147809.11402.0.camel@gaston> On Tue, 2004-10-19 at 09:33, Benjamin Herrenschmidt wrote: > On Tue, 2004-10-19 at 09:05, Dave Jones wrote: > > Ignore previous mail, this should fix it. > > > > .../... > > What about this one instead ? io_page_mask is set to 0 by default, > so iSeries would automatically get _IO_IS_VALID(*) == 0 if it doesn't > initialize it... OK, since nobody seem to really know what IO cycles are on iSeries, let's allow them rather than mask them, thus falling back to the former behaviour... ===== include/asm-ppc64/eeh.h 1.20 vs edited ===== --- 1.20/include/asm-ppc64/eeh.h 2004-10-06 16:05:23 +10:00 +++ edited/include/asm-ppc64/eeh.h 2004-10-19 09:31:54 +10:00 @@ -256,10 +256,6 @@ #undef EEH_CHECK_ALIGN -#define MAX_ISA_PORT 0x10000 -extern unsigned long io_page_mask; -#define _IO_IS_VALID(port) ((port) >= MAX_ISA_PORT || (1 << (port>>PAGE_SHIFT)) & io_page_mask) - static inline u8 eeh_inb(unsigned long port) { u8 val; if (!_IO_IS_VALID(port)) ===== include/asm-ppc64/io.h 1.22 vs edited ===== --- 1.22/include/asm-ppc64/io.h 2004-09-21 19:14:10 +10:00 +++ edited/include/asm-ppc64/io.h 2004-10-19 09:32:20 +10:00 @@ -33,6 +33,12 @@ extern unsigned long isa_io_base; extern unsigned long pci_io_base; +extern unsigned long io_page_mask; + +#define MAX_ISA_PORT 0x10000 + +#define _IO_IS_VALID(port) ((port) >= MAX_ISA_PORT || (1 << (port>>PAGE_SHIFT)) \ + & io_page_mask) #ifdef CONFIG_PPC_ISERIES /* __raw_* accessors aren't supported on iSeries */ ===== arch/ppc64/kernel/iSeries_pci.c 1.24 vs edited ===== --- 1.24/arch/ppc64/kernel/iSeries_pci.c 2004-09-11 15:50:12 +10:00 +++ edited/arch/ppc64/kernel/iSeries_pci.c 2004-10-19 11:02:20 +10:00 @@ -55,6 +55,7 @@ extern unsigned long iSeries_Base_Io_Memory; extern struct iommu_table *tceTables[256]; +extern unsigned long 
io_page_mask; extern void iSeries_MmIoTest(void); @@ -196,6 +197,7 @@ PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_init Entry.\n"); iSeries_IoMmTable_Initialize(); find_and_init_phbs(); + io_page_mask = -1; /* pci_assign_all_busses = 0; SFRXXX*/ PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_init Exit.\n"); } From benh at kernel.crashing.org Tue Oct 19 18:28:20 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 19 Oct 2004 18:28:20 +1000 Subject: [PATCH] generic irq subsystem: ppc64 port In-Reply-To: <200410190714.i9J7Elnx027734@hera.kernel.org> References: <200410190714.i9J7Elnx027734@hera.kernel.org> Message-ID: <1098174500.11449.65.camel@gaston> Hi ! That patch will unfortunately break a load of ppc64 boxes. If you look closely at the ppc64 code, you'll notice we don't use the irq_desc array directly but go through a get_irq_desc() accessor. This is because our interrupt numbers can be very large and scattered, and thus we have a remapping tree. I still like the idea of the patch, so it would be useful if you added the possibility for us to just change that behaviour, that is replace all occurrences of irq_descs + i with get_irq_desc() and provide a generic one that just does that, with a #ifndef so that the architecture can provide its own. If you agree with the principle, though, I suppose I can do it and send a proposed patch tomorrow. Ben. From hch at infradead.org Tue Oct 19 18:41:32 2004 From: hch at infradead.org (Christoph Hellwig) Date: Tue, 19 Oct 2004 09:41:32 +0100 Subject: [PATCH] generic irq subsystem: ppc64 port In-Reply-To: <1098174500.11449.65.camel@gaston> References: <200410190714.i9J7Elnx027734@hera.kernel.org> <1098174500.11449.65.camel@gaston> Message-ID: <20041019084131.GA7100@infradead.org> On Tue, Oct 19, 2004 at 06:28:20PM +1000, Benjamin Herrenschmidt wrote: > Hi ! > > That patch will unfortunately break a load of ppc64 boxes.
> > If you look closely at the ppc64 code, you'll notice we don't > use the irq_desc array directly but go through a get_irq_desc() > accessor. This is because our interrupt numbers can be very > large and scattered, and thus we have a remapping tree. > > I still like the idea of the patch, so it would be useful if > you added the possibility for us to just change that behaviour, > that is replace all occurrences of irq_descs + i with get_irq_desc() > and provide a generic one that just does that, with a #ifndef so > that the architecture can provide its own. > > If you agree with the principle, though, I suppose I can do it > and send a proposed patch tomorrow. The PPC64 changes were actually my fault. I think get_irq_desc() is okay. From mingo at elte.hu Tue Oct 19 19:15:57 2004 From: mingo at elte.hu (Ingo Molnar) Date: Tue, 19 Oct 2004 11:15:57 +0200 Subject: [PATCH] generic irq subsystem: ppc64 port In-Reply-To: <1098174500.11449.65.camel@gaston> References: <200410190714.i9J7Elnx027734@hera.kernel.org> <1098174500.11449.65.camel@gaston> Message-ID: <20041019091557.GA17473@elte.hu> * Benjamin Herrenschmidt wrote: > I still like the idea of the patch, so it would be useful if you added > the possibility for us to just change that behaviour, that is replace > all occurrences of irq_descs + i with get_irq_desc() and provide a > generic one that just does that, with a #ifndef so that the > architecture can provide its own. Sure, we could do that. But since there are other architectures with large irq-vector spaces too, you might want to try to move it into the generic IRQ code and just provide a way to switch between 1:1 mapped and sparse-mapped variants. (Of course this still means all of the direct indexing in kernel/irq/*.c would have to change.)
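As a rough sketch of the idea discussed in this thread (a generic 1:1 accessor that an architecture can override), something along these lines would do. It is an assumption-laden illustration, not the code from the patch: the guard macro ARCH_HAS_GET_IRQ_DESC is a hypothetical name, NR_IRQS is set to a toy value, and struct irq_desc is reduced to a single field.

```c
#include <stddef.h>

#define NR_IRQS 16              /* small illustrative value */

struct irq_desc {
    unsigned int status;        /* the real structure has many more fields */
};

static struct irq_desc irq_desc[NR_IRQS];

/*
 * Generic 1:1 mapping.  An architecture with sparse interrupt numbers
 * would define ARCH_HAS_GET_IRQ_DESC (hypothetical guard name) and
 * supply its own lookup, e.g. through a remapping tree.
 */
#ifndef ARCH_HAS_GET_IRQ_DESC
static inline struct irq_desc *get_irq_desc(unsigned int irq)
{
    return irq < NR_IRQS ? &irq_desc[irq] : NULL;
}
#endif
```

Generic code would then call get_irq_desc(irq) everywhere instead of indexing irq_desc directly, which is the part that requires touching all of kernel/irq/*.c.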
Ingo From sfr at canb.auug.org.au Wed Oct 20 03:05:30 2004 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 20 Oct 2004 03:05:30 +1000 Subject: test - please ignore Message-ID: <20041020030530.582725f7.sfr@canb.auug.org.au> Just a test after updating the archives. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ From sfr at canb.auug.org.au Wed Oct 20 03:16:59 2004 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 20 Oct 2004 03:16:59 +1000 Subject: old mailing list archives Message-ID: <20041020031659.220bdfeb.sfr@canb.auug.org.au> Hi all, Thanks to Wolfgang Denk I have recovered (some of) the old list archives. Please see http://ozlabs.org/pipermail/linuxppc64-dev/ I don't know how complete the archive is ... -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ From jschopp at austin.ibm.com Wed Oct 20 02:52:20 2004 From: jschopp at austin.ibm.com (Joel Schopp) Date: Tue, 19 Oct 2004 11:52:20 -0500 Subject: status of ppc64 patches Message-ID: <41754644.1010003@austin.ibm.com> 2.6.9 is now in rc4. Linus claims that the final 2.6.9 is very close. Thus, I expect the floodgates into mainline to open soon.
I would hope that my patches would be sent on by the architecture maintainers at that time. I am concerned that we may be falling behind on reviewing patches in general and if we don't catch up several very deserving patches may miss this next window of opportunity. The backlog of "New" patches is over a month long now. http://ozlabs.org/ppc64-patches/ Either this page is out of date or we have a very serious bottleneck problem. I'm hoping it is the former, but guessing it is the latter. I think we should consider bringing another architecture maintainer on board to help spread out the load of reviewing and approving architecture patches. Somebody like Olof. Barring that I would like to volunteer some of my own cycles to review some of the current backlog, prioritize them, make sure they still compile/boot, and rebase them. From olof at austin.ibm.com Wed Oct 20 06:42:14 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Tue, 19 Oct 2004 15:42:14 -0500 Subject: status of ppc64 patches In-Reply-To: <41754644.1010003@austin.ibm.com> References: <41754644.1010003@austin.ibm.com> Message-ID: <41757C26.2030909@austin.ibm.com> Joel Schopp wrote: > 2.6.9 is now in rc4. Linus claims that the final 2.6.9 is very close. 2.6.9 was released yesterday. :) -Olof From sonny at burdell.org Wed Oct 20 09:00:54 2004 From: sonny at burdell.org (Sonny Rao) Date: Tue, 19 Oct 2004 19:00:54 -0400 Subject: 2.6.9-rc4 kernel -- "cannot find space for TCE table" In-Reply-To: <1097887510.6487.23.camel@gaston> References: <1097887510.6487.23.camel@gaston> Message-ID: <20041019230054.GA3807@kevlar.burdell.org> On Sat, Oct 16, 2004 at 10:45:10AM +1000, Benjamin Herrenschmidt wrote: > On Sat, 2004-10-16 at 07:00, Santhosh Rao wrote: > > Ok, it appears we aren't dropping into the open firmware debugger > > randomly, the kernel seems to give up early in the boot process > > Below is the output of an attempted boot of 2.6.9-rc4. > > > > Jose, ever seen anything like this? 
> > > > The machine is a p615 power-4 2-CPU box with 2GB of RAM. > > Can you enable PROM_DEBUG in arch/ppc64/kernel/prom_init.c and send me the > output log ? > > Ben. > Ben, I'm still seeing this issue with 2.6.9 final, do you need anything else? I'm sure you're very busy, but please let me know if I can help. Sonny Rao From benh at kernel.crashing.org Wed Oct 20 09:38:52 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 20 Oct 2004 09:38:52 +1000 Subject: 2.6.9-rc4 kernel -- "cannot find space for TCE table" In-Reply-To: <20041019230054.GA3807@kevlar.burdell.org> References: <1097887510.6487.23.camel@gaston> <20041019230054.GA3807@kevlar.burdell.org> Message-ID: <1098229131.5792.9.camel@gaston> On Wed, 2004-10-20 at 09:00, Sonny Rao wrote: > Ben, I'm still seeing this issue with 2.6.9 final, do you need > anything else? I'm sure you're very busy, but please let me know if I > can help. Well, I can't reproduce here, but it basically seems that one of the calls to alloc_down() is failing, you may want to trace a bit. I'll try to track it down myself too & let you know. Ben. From nathanl at austin.ibm.com Wed Oct 20 10:22:29 2004 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Tue, 19 Oct 2004 19:22:29 -0500 Subject: status of ppc64 patches In-Reply-To: <41754644.1010003@austin.ibm.com> References: <41754644.1010003@austin.ibm.com> Message-ID: <1098231748.7493.114.camel@pants.austin.ibm.com> On Tue, 2004-10-19 at 11:52, Joel Schopp wrote: > I am concerned that we may be falling behind on reviewing patches in > general and if we don't catch up several very deserving patches may miss > this next window of opportunity. The backlog of "New" patches is over a > month long now. http://ozlabs.org/ppc64-patches/ > Either this page is out of date or we have a very serious bottleneck > problem. I'm hoping it is the former, but guessing it is the latter. It looks to me like the backlog is a bit smaller than a first glance at the page would suggest.
It is somewhat out of date in that several of the patches that are marked "new" have already been picked up by Linus or akpm. I think quite a few of the items in the list do not correspond to patches that are intended for submission upstream (e.g. there are several revisions of "Fan control for PowerMac7_3"). > I think we should consider bringing another architecture maintainer on > board to help spread out the load of reviewing and approving > architecture patches. Somebody like Olof. Barring that I would like to The fact that a web page is slightly out of date and some minor non-bugfix patches were not forwarded upstream during the late 2.6.9-rc series fails to convince me that such a change is needed. If you feel a patch has been overlooked, it's usually just a matter of gently nudging one of the maintainers via email or IRC; it Works For Me (tm) ;) Nathan From olof at austin.ibm.com Wed Oct 20 11:03:01 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Tue, 19 Oct 2004 20:03:01 -0500 Subject: status of ppc64 patches In-Reply-To: <1098231748.7493.114.camel@pants.austin.ibm.com> References: <41754644.1010003@austin.ibm.com> <1098231748.7493.114.camel@pants.austin.ibm.com> Message-ID: <20041020010301.GA29579@4> On Tue, Oct 19, 2004 at 07:22:29PM -0500, Nathan Lynch wrote: > > I think we should consider bringing another architecture maintainer on > > board to help spread out the load of reviewing and approving > > architecture patches. Somebody like Olof. Barring that I would like to > > The fact that a web page is slightly out of date and some minor > non-bugfix patches were not forwarded upstream during the late 2.6.9-rc > series fails to convince me that such a change is needed. Agreed. The page is there for the maintainers to track their work, not for us to track them. :-) I hope that each person tracks their own work and follows up as needed. 
And even if, in the future, current maintainers need help looking at patches, there's no need to promote someone (myself or others) to a "full" maintainer just to pitch in and help out. Anyone has the opportunity to look at a patch and ask questions about it or say that they agree or disagree with it. This happens every day on LKML and other lists, there's no reason we should work differently on our architecture list. Also: Regarding re-basing patches: It has to be the duty of the developer of the patch to re-base it to current trees if it will no longer apply cleanly. I wouldn't expect Anton or Paul to forward-port my patches, just as little as I would expect Andrew Morton or Linus to do so. -Olof From benh at kernel.crashing.org Wed Oct 20 11:24:59 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 20 Oct 2004 11:24:59 +1000 Subject: [PATCH] generic irq subsystem: ppc64 port In-Reply-To: <20041019091557.GA17473@elte.hu> References: <200410190714.i9J7Elnx027734@hera.kernel.org> <1098174500.11449.65.camel@gaston> <20041019091557.GA17473@elte.hu> Message-ID: <1098235499.22943.16.camel@gaston> On Tue, 2004-10-19 at 19:15, Ingo Molnar wrote: > * Benjamin Herrenschmidt wrote: > > > I still like the idea of the patch, so it would be useful if you added > > the possibility for us to just change that behaviour, that is replace > > all occurrences of irq_descs + i with get_irq_desc() and provide a > > generic one that just does that, with a #ifndef so that the > > architecture can provide its own. > > sure, we could do that. But since there are other architectures with > large irq-vector spaces too, you might want to try to move it into the > generic IRQ code and just provide a way to switch between 1:1 mapped and > sparse-mapped variants. False alert ! In fact, Paulus rewrote that stuff a while ago and I totally forgot about it. We no longer do that, our get_irq_desc() is nowadays just doing (&irq_desc[(irq)]).
We map the large physical interrupt numbers to "virtual" numbers that are the only thing the generic code sees, so it's fine. Ben. From sfr at canb.auug.org.au Wed Oct 20 15:47:30 2004 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 20 Oct 2004 15:47:30 +1000 Subject: [PATCH] PPC64 iSeries compile broken in 2.6.9-bk3 Message-ID: <20041020154730.39ea3509.sfr@canb.auug.org.au> Hi Andrew, One of the iSeries specific files used HZ without including linux/param.h and previously got away with it. Signed-off-by: Stephen Rothwell Please apply and send to Linus. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN 2.6.9-bk3/arch/ppc64/kernel/iSeries_proc.c 2.6.9-bk3-sfr.1/arch/ppc64/kernel/iSeries_proc.c --- 2.6.9-bk3/arch/ppc64/kernel/iSeries_proc.c 2004-08-19 17:01:59.000000000 +1000 +++ 2.6.9-bk3-sfr.1/arch/ppc64/kernel/iSeries_proc.c 2004-10-20 15:21:23.000000000 +1000 @@ -20,6 +20,7 @@ #include #include #include +#include <linux/param.h> /* for HZ */ #include #include #include From greg.quinn at anu.edu.au Wed Oct 20 16:09:55 2004 From: greg.quinn at anu.edu.au (Greg Quinn) Date: Wed, 20 Oct 2004 16:09:55 +1000 Subject: 64 bit compilation and linking - help Message-ID: <41760133.6020801@anu.edu.au> Sorry to intrude on your mailing list. Here at bios.org (CAMBIA) we've just acquired two new p615 machines courtesy of a generous IBM donation, and we want to put them to work ASAP. We've installed a Suse 9 Enterprise Server distribution. I'm trying to compile a C application in 64-bit mode, but can't get the compilation to succeed. For example ... cc -o m m.c produces a 32 bit executable, ie pointers are 4 bytes. But ...
cc -o m -m64 m.c dies with a bunch of messages like > /usr/lib/gcc-lib/powerpc-suse-linux/3.3.3/../../../../powerpc-suse-linux/bin/ld: > skipping incompatible > /usr/lib/gcc-lib/powerpc-suse-linux/3.3.3/../../../libc.so when > searching for -lc We seem to have the 64 bit libraries installed (in /lib64 and /usr/lib64), I just need a clue on how to compile and link with them. It's probably something very simple, so I'd appreciate 10 seconds of somebody's time. -- Greg Quinn CAMBIA http://www.cambiaip.org (02) 62464523 From olh at suse.de Wed Oct 20 16:30:46 2004 From: olh at suse.de (Olaf Hering) Date: Wed, 20 Oct 2004 08:30:46 +0200 Subject: 64 bit compilation and linking - help In-Reply-To: <41760133.6020801@anu.edu.au> References: <41760133.6020801@anu.edu.au> Message-ID: <20041020063046.GA28504@suse.de> On Wed, Oct 20, Greg Quinn wrote: > We seem to have the 64 bit libraries installed (in /lib64 and > /usr/lib64), I just need a clue on how to compile and link with them. > It's probably something very simple, so I'd appreciate 10 seconds of > somebody's time. You do not have enough installed, look at 'rpm -qa | grep 64bit'. To install more rpms, use yast and search for package names which contain '64bit'. I think you just need the glibc-devel-64bit for a simple hello_world.c. -- USB is for mice, FireWire is for men! sUse lINUX ag, nÜRNBERG From paulus at samba.org Wed Oct 20 22:10:36 2004 From: paulus at samba.org (Paul Mackerras) Date: Wed, 20 Oct 2004 22:10:36 +1000 Subject: status of ppc64 patches In-Reply-To: <41754644.1010003@austin.ibm.com> References: <41754644.1010003@austin.ibm.com> Message-ID: <16758.21948.795730.268143@cargo.ozlabs.ibm.com> Joel Schopp writes: > 2.6.9 is now in rc4. Linus claims that the final 2.6.9 is very close. > Thus, I expect the floodgates into mainline to open soon. I would hope > that my patches would be sent on by the architecture maintainers at that > time.
And 2.6.9 is now out, and the floodgates are open, and patches are flowing again. As far as your patches are concerned, I am aware of two patches that change things so that we have __boot variants of __pa etc. However, your explanation didn't really get me excited about the change. You said something about "moving towards hotplug memory" but you didn't explain why these changes would help with that, or how I should choose which function to use when I'm making changes in future (that should actually go in a file somewhere under the Documentation directory), or why those changes need to go in now. > I think we should consider bringing another architecture maintainer on > board to help spread out the load of reviewing and approving > architecture patches. Somebody like Olof. Barring that I would like to > volunteer some of my own cycles to review some of the current backlog, > prioritize them, make sure they still compile/boot, and rebase them. Help with reviewing, compile/boot testing and rebasing patches is always welcome. :) Rebasing is really the responsibility of the original submitter though, since they generally know what has been changed and why better than anyone. Paul. From tiwari.amit at gmail.com Wed Oct 20 22:08:28 2004 From: tiwari.amit at gmail.com (Amit K Tiwari) Date: Wed, 20 Oct 2004 17:38:28 +0530 Subject: Max RAM Supported Message-ID: Hi, I have just installed YDL 4.0. The OS does not show all 6GB DRAM I have in my Power Mac G5. It shows only 1.97GB (I ran top to see how much physical memory I have). Looking at the net, http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-Bligh-OLS2003.pdf says that kernel 2.5 should support approx 32GB of memory. Do I need to re-build the kernel to enable the support for all of available memory? If yes, with what options? 'High Memory Support' is already enabled in the kernel config. 
Amit From paulus at samba.org Wed Oct 20 22:21:03 2004 From: paulus at samba.org (Paul Mackerras) Date: Wed, 20 Oct 2004 22:21:03 +1000 Subject: Max RAM Supported In-Reply-To: References: Message-ID: <16758.22575.18560.155884@cargo.ozlabs.ibm.com> Amit K Tiwari writes: > I have just installed YDL 4.0. The OS does not show all 6GB DRAM I > have in my Power Mac G5. It shows only 1.97GB (I ran top to see how > much physical memory I have). Looking at the net, Is that a 32-bit kernel or a 64-bit kernel? (If uname -m prints ppc, it's a 32-bit kernel; if it prints ppc64, it's a 64-bit kernel.) The 32-bit kernel only supports 2GB of RAM, because it can only use physical addresses below 4GB, and the space from 2GB - 4GB in the physical address space is used for I/O and ROM. The 64-bit kernel can address all of the physical address space. > Do I need to re-build the kernel to enable the support for all of > available memory? If yes, with what options? 'High Memory Support' is > already enabled in the kernel config. You need to build a 64-bit kernel (i.e. ARCH=ppc64) rather than a 32-bit kernel (ARCH=ppc). Paul. From dhowells at redhat.com Thu Oct 21 00:44:15 2004 From: dhowells at redhat.com (David Howells) Date: Wed, 20 Oct 2004 15:44:15 +0100 Subject: [PATCH] Add key management syscalls to non-i386 archs Message-ID: <3506.1098283455@redhat.com> Hi Linus, Andrew, The attached patch adds syscalls for almost all archs (everything barring m68knommu which is in a real mess, and i386 which already has it). It also adds 32->64 compatibility where appropriate. 
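To illustrate what the 32->64 compatibility layer has to do, here is a minimal user-space sketch of a compat wrapper, under stated assumptions: native_add_key() is a hypothetical stand-in stub (the real syscall lives in the kernel and takes __user pointers), and compat_add_key() is an invented name. The point is only that the 32-bit entry point widens its integer arguments and passes the native return value back to the caller.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t key_serial_t;

/*
 * Hypothetical stand-in for the native 64-bit syscall.  It simply echoes
 * its numeric arguments so that a caller can check they arrived intact.
 */
static long native_add_key(const char *type, const char *description,
                           const void *payload, size_t plen,
                           key_serial_t ringid)
{
    (void)type;
    (void)description;
    (void)payload;
    return (long)plen + (long)ringid;
}

/*
 * 32-bit entry point: widen the 32-bit values to the native types and
 * return whatever the native call returns.
 */
static long compat_add_key(const char *type, const char *description,
                           const void *payload, uint32_t plen,
                           uint32_t ringid)
{
    return native_add_key(type, description, payload,
                          (size_t)plen, (key_serial_t)ringid);
}
```

Forwarding the return value matters: a wrapper declared to return long that falls off the end without a return statement hands an undefined result back to the 32-bit caller.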
David Signed-Off-By: David Howells --- warthog>diffstat keys-269bk4.diff arch/alpha/kernel/systbls.S | 3 +++ arch/arm/kernel/calls.S | 3 +++ arch/cris/arch-v10/kernel/entry.S | 3 +++ arch/h8300/kernel/syscalls.S | 3 +++ arch/ia64/ia32/ia32_entry.S | 4 ++++ arch/ia64/ia32/sys_ia32.c | 20 ++++++++++++++++++++ arch/ia64/kernel/entry.S | 6 +++--- arch/ia64/kernel/fsys.S | 6 +++--- arch/m32r/kernel/entry.S | 3 +++ arch/m68k/kernel/entry.S | 3 +++ arch/mips/kernel/scall32-o32.S | 3 +++ arch/mips/kernel/scall64-64.S | 3 +++ arch/mips/kernel/scall64-n32.S | 3 +++ arch/mips/kernel/scall64-o32.S | 3 +++ arch/parisc/kernel/syscall_table.S | 4 +++- arch/ppc/kernel/misc.S | 3 +++ arch/ppc64/kernel/misc.S | 6 ++++++ arch/ppc64/kernel/sys_ppc32.c | 33 +++++++++++++++++++++++++++++++++ arch/s390/kernel/compat_wrapper.S | 26 ++++++++++++++++++++++++++ arch/s390/kernel/syscalls.S | 3 +++ arch/sh/kernel/entry.S | 4 ++++ arch/sh64/kernel/syscalls.S | 4 +++- arch/sparc/kernel/systbls.S | 2 +- arch/sparc64/kernel/sys32.S | 3 +++ arch/sparc64/kernel/systbls.S | 4 ++-- arch/um/kernel/sys_call_table.c | 3 +++ arch/v850/kernel/entry.S | 3 +++ arch/x86_64/ia32/ia32entry.S | 4 ++++ include/asm-alpha/unistd.h | 5 ++++- include/asm-arm/unistd.h | 3 +++ include/asm-arm26/unistd.h | 3 +++ include/asm-cris/unistd.h | 5 ++++- include/asm-h8300/unistd.h | 5 ++++- include/asm-ia64/unistd.h | 3 +++ include/asm-m32r/unistd.h | 5 ++++- include/asm-m68k/unistd.h | 5 ++++- include/asm-mips/unistd.h | 17 +++++++++++++---- include/asm-parisc/unistd.h | 5 ++++- include/asm-ppc/unistd.h | 5 ++++- include/asm-ppc64/unistd.h | 5 ++++- include/asm-s390/unistd.h | 5 ++++- include/asm-sh/unistd.h | 5 ++++- include/asm-sh64/unistd.h | 5 ++++- include/asm-sparc/unistd.h | 3 +++ include/asm-sparc64/unistd.h | 3 +++ include/asm-v850/unistd.h | 3 +++ include/asm-x86_64/unistd.h | 8 +++++++- 47 files changed, 239 insertions(+), 27 deletions(-) diff -uNrp linux-2.6.9-bk4/arch/alpha/kernel/systbls.S 
linux-2.6.9-bk4-keys/arch/alpha/kernel/systbls.S --- linux-2.6.9-bk4/arch/alpha/kernel/systbls.S 2004-10-19 10:41:41.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/alpha/kernel/systbls.S 2004-10-20 14:47:43.275151615 +0100 @@ -458,6 +458,9 @@ sys_call_table: .quad sys_mq_notify .quad sys_mq_getsetattr .quad sys_waitid + .quad sys_add_key + .quad sys_request_key + .quad sys_keyctl .size sys_call_table, . - sys_call_table .type sys_call_table, @object diff -uNrp linux-2.6.9-bk4/arch/arm/kernel/calls.S linux-2.6.9-bk4-keys/arch/arm/kernel/calls.S --- linux-2.6.9-bk4/arch/arm/kernel/calls.S 2004-10-19 10:41:42.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/arm/kernel/calls.S 2004-10-20 14:57:39.641915157 +0100 @@ -295,6 +295,9 @@ __syscall_start: .long sys_mq_notify .long sys_mq_getsetattr /* 280 */ .long sys_waitid + .long sys_add_key + .long sys_request_key + .long sys_keyctl __syscall_end: .rept NR_syscalls - (__syscall_end - __syscall_start) / 4 diff -uNrp linux-2.6.9-bk4/arch/cris/arch-v10/kernel/entry.S linux-2.6.9-bk4-keys/arch/cris/arch-v10/kernel/entry.S --- linux-2.6.9-bk4/arch/cris/arch-v10/kernel/entry.S 2004-06-18 13:43:42.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/cris/arch-v10/kernel/entry.S 2004-10-20 14:44:52.215209105 +0100 @@ -1079,6 +1079,9 @@ sys_call_table: .long sys_mq_timedreceive /* 280 */ .long sys_mq_notify .long sys_mq_getsetattr + .long sys_add_key + .long sys_request_key /* 285 */ + .long sys_keyctl /* * NOTE!! 
This doesn't have to be exact - we just have diff -uNrp linux-2.6.9-bk4/arch/h8300/kernel/syscalls.S linux-2.6.9-bk4-keys/arch/h8300/kernel/syscalls.S --- linux-2.6.9-bk4/arch/h8300/kernel/syscalls.S 2004-06-18 13:43:42.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/h8300/kernel/syscalls.S 2004-10-20 15:00:36.035535939 +0100 @@ -289,6 +289,9 @@ SYMBOL_NAME_LABEL(sys_call_table) .long SYMBOL_NAME(sys_utimes) .long SYMBOL_NAME(sys_fadvise64_64) .long SYMBOL_NAME(sys_ni_syscall) /* sys_vserver */ + .long SYMBOL_NAME(sys_add_key) + .long SYMBOL_NAME(sys_request_key) /* 275 */ + .long SYMBOL_NAME(sys_keyctl) .rept NR_syscalls-(.-SYMBOL_NAME(sys_call_table))/4 .long SYMBOL_NAME(sys_ni_syscall) diff -uNrp linux-2.6.9-bk4/arch/ia64/ia32/ia32_entry.S linux-2.6.9-bk4-keys/arch/ia64/ia32/ia32_entry.S --- linux-2.6.9-bk4/arch/ia64/ia32/ia32_entry.S 2004-10-19 10:41:43.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/ia64/ia32/ia32_entry.S 2004-10-20 15:25:01.365546264 +0100 @@ -495,6 +495,10 @@ ia32_syscall_table: data8 compat_sys_mq_getsetattr data8 sys_ni_syscall /* reserved for kexec */ data8 sys32_waitid + data8 sys_ni_syscall /* reserved for setaltroot */ + data8 sys32_add_key + data8 sys32_request_key + data8 sys_keyctl // guard against failures to increase IA32_NR_syscalls .org ia32_syscall_table + 8*IA32_NR_syscalls diff -uNrp linux-2.6.9-bk4/arch/ia64/ia32/sys_ia32.c linux-2.6.9-bk4-keys/arch/ia64/ia32/sys_ia32.c --- linux-2.6.9-bk4/arch/ia64/ia32/sys_ia32.c 2004-10-19 10:41:43.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/ia64/ia32/sys_ia32.c 2004-10-20 15:28:48.663376741 +0100 @@ -2687,6 +2687,26 @@ asmlinkage long sys32_waitid(int which, return copy_siginfo_to_user32(uinfo, &info); } + +asmlinkage long sys32_add_key(const char __user *_type, + const char __user *_description, + const void __user *_payload, + __u32 plen, + __u32 ringid) +{ + return sys_add_key(_type, _description, _payload, (size_t) plen, + (key_serial_t) ringid); +} + +asmlinkage long sys32_request_key(const
char __user *_type, + const char __user *_description, + const char __user *_callout_info, + __u32 destringid) +{ + sys_request_key(_type, _description, _callout_info, + (key_serial_t) destringid); +} + #ifdef NOTYET /* UNTESTED FOR IA64 FROM HERE DOWN */ asmlinkage long sys32_setreuid(compat_uid_t ruid, compat_uid_t euid) diff -uNrp linux-2.6.9-bk4/arch/ia64/kernel/entry.S linux-2.6.9-bk4-keys/arch/ia64/kernel/entry.S --- linux-2.6.9-bk4/arch/ia64/kernel/entry.S 2004-10-20 14:02:54.138626787 +0100 +++ linux-2.6.9-bk4-keys/arch/ia64/kernel/entry.S 2004-10-20 14:45:48.309267588 +0100 @@ -1528,9 +1528,9 @@ sys_call_table: data8 sys_ni_syscall // reserved for kexec_load data8 sys_ni_syscall data8 sys_setaltroot // 1270 - data8 sys_ni_syscall - data8 sys_ni_syscall - data8 sys_ni_syscall + data8 sys_add_key + data8 sys_request_key + data8 sys_keyctl data8 sys_ni_syscall data8 sys_ni_syscall // 1275 data8 sys_ni_syscall diff -uNrp linux-2.6.9-bk4/arch/ia64/kernel/fsys.S linux-2.6.9-bk4-keys/arch/ia64/kernel/fsys.S --- linux-2.6.9-bk4/arch/ia64/kernel/fsys.S 2004-10-19 10:41:43.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/ia64/kernel/fsys.S 2004-10-20 14:46:27.814789684 +0100 @@ -868,9 +868,9 @@ fsyscall_table: data8 0 // kexec_load data8 0 data8 0 // 1270 - data8 0 - data8 0 - data8 0 + data8 0 // add_key + data8 0 // request_key + data8 0 // keyctl data8 0 data8 0 // 1275 data8 0 diff -uNrp linux-2.6.9-bk4/arch/m32r/kernel/entry.S linux-2.6.9-bk4-keys/arch/m32r/kernel/entry.S --- linux-2.6.9-bk4/arch/m32r/kernel/entry.S 2004-10-19 10:41:44.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/m32r/kernel/entry.S 2004-10-20 15:09:17.798751465 +0100 @@ -994,6 +994,9 @@ ENTRY(sys_call_table) .long sys_mq_getsetattr .long sys_ni_syscall /* reserved for kexec */ .long sys_waitid + .long sys_add_key /* 285 */ + .long sys_request_key + .long sys_keyctl syscall_table_size=(.-sys_call_table) diff -uNrp linux-2.6.9-bk4/arch/m68k/kernel/entry.S 
linux-2.6.9-bk4-keys/arch/m68k/kernel/entry.S --- linux-2.6.9-bk4/arch/m68k/kernel/entry.S 2004-06-18 13:43:44.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/m68k/kernel/entry.S 2004-10-20 14:45:20.678701183 +0100 @@ -663,3 +663,6 @@ sys_call_table: .long sys_lremovexattr .long sys_fremovexattr .long sys_futex /* 235 */ + .long sys_add_key + .long sys_request_key + .long sys_keyctl diff -uNrp linux-2.6.9-bk4/arch/mips/kernel/scall32-o32.S linux-2.6.9-bk4-keys/arch/mips/kernel/scall32-o32.S --- linux-2.6.9-bk4/arch/mips/kernel/scall32-o32.S 2004-09-16 12:05:47.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/mips/kernel/scall32-o32.S 2004-10-20 14:30:46.698878816 +0100 @@ -628,6 +628,9 @@ out: jr ra sys sys_mq_notify 2 /* 4275 */ sys sys_mq_getsetattr 3 sys sys_ni_syscall 0 /* sys_vserver */ + sys sys_add_key 5 + sys sys_request_key 4 + sys sys_keyctl 5 .endm diff -uNrp linux-2.6.9-bk4/arch/mips/kernel/scall64-64.S linux-2.6.9-bk4-keys/arch/mips/kernel/scall64-64.S --- linux-2.6.9-bk4/arch/mips/kernel/scall64-64.S 2004-09-16 12:05:47.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/mips/kernel/scall64-64.S 2004-10-20 14:32:42.206470034 +0100 @@ -448,3 +448,6 @@ sys_call_table: PTR sys_mq_notify PTR sys_mq_getsetattr /* 5235 */ PTR sys_ni_syscall /* sys_vserver */ + PTR sys_add_key + PTR sys_request_key + PTR sys_keyctl diff -uNrp linux-2.6.9-bk4/arch/mips/kernel/scall64-n32.S linux-2.6.9-bk4-keys/arch/mips/kernel/scall64-n32.S --- linux-2.6.9-bk4/arch/mips/kernel/scall64-n32.S 2004-09-16 12:05:47.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/mips/kernel/scall64-n32.S 2004-10-20 15:12:10.687967430 +0100 @@ -358,3 +358,6 @@ EXPORT(sysn32_call_table) PTR compat_sys_mq_notify PTR compat_sys_mq_getsetattr /* 6239 */ PTR sys_ni_syscall /* sys_vserver */ + PTR sys_add_key + PTR sys_request_key + PTR sys_keyctl diff -uNrp linux-2.6.9-bk4/arch/mips/kernel/scall64-o32.S linux-2.6.9-bk4-keys/arch/mips/kernel/scall64-o32.S --- linux-2.6.9-bk4/arch/mips/kernel/scall64-o32.S 2004-09-16 
12:05:47.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/mips/kernel/scall64-o32.S 2004-10-20 15:11:26.761722025 +0100 @@ -536,6 +536,9 @@ out: jr ra sys compat_sys_mq_notify 2 /* 4275 */ sys compat_sys_mq_getsetattr 3 sys sys_ni_syscall 0 /* sys_vserver */ + sys sys_add_key 5 + sys sys_request_key 4 + sys sys_keyctl 5 .endm diff -uNrp linux-2.6.9-bk4/arch/parisc/kernel/syscall_table.S linux-2.6.9-bk4-keys/arch/parisc/kernel/syscall_table.S --- linux-2.6.9-bk4/arch/parisc/kernel/syscall_table.S 2004-06-18 13:43:47.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/parisc/kernel/syscall_table.S 2004-10-20 14:58:51.533643420 +0100 @@ -341,5 +341,7 @@ ENTRY_SAME(mq_timedreceive) ENTRY_SAME(mq_notify) ENTRY_SAME(mq_getsetattr) - /* Nothing yet */ /* 235 */ + ENTRY_SAME(add_key) /* 235 */ + ENTRY_SAME(request_key) + ENTRY_SAME(keyctl) diff -uNrp linux-2.6.9-bk4/arch/ppc/kernel/misc.S linux-2.6.9-bk4-keys/arch/ppc/kernel/misc.S --- linux-2.6.9-bk4/arch/ppc/kernel/misc.S 2004-10-19 10:41:46.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/ppc/kernel/misc.S 2004-10-20 14:43:37.665815385 +0100 @@ -1447,3 +1447,6 @@ _GLOBAL(sys_call_table) .long sys_mq_notify .long sys_mq_getsetattr .long sys_ni_syscall /* 268 reserved for sys_kexec_load */ + .long sys_add_key + .long sys_request_key /* 270 */ + .long sys_keyctl diff -uNrp linux-2.6.9-bk4/arch/ppc64/kernel/misc.S linux-2.6.9-bk4-keys/arch/ppc64/kernel/misc.S --- linux-2.6.9-bk4/arch/ppc64/kernel/misc.S 2004-10-20 14:02:55.974474037 +0100 +++ linux-2.6.9-bk4-keys/arch/ppc64/kernel/misc.S 2004-10-20 14:57:18.470763092 +0100 @@ -963,6 +963,9 @@ _GLOBAL(sys_call_table32) .llong .compat_sys_mq_notify .llong .compat_sys_mq_getsetattr .llong .sys_ni_syscall /* 268 reserved for sys_kexec_load */ + .llong .sys32_add_key + .llong .sys32_request_key + .llong .sys32_keyctl .balign 8 _GLOBAL(sys_call_table) @@ -1235,3 +1238,6 @@ _GLOBAL(sys_call_table) .llong .sys_mq_notify .llong .sys_mq_getsetattr .llong .sys_ni_syscall /* 268 reserved for 
sys_kexec_load */ + .llong .sys_add_key + .llong .sys_request_key /* 270 */ + .llong .sys_keyctl diff -uNrp linux-2.6.9-bk4/arch/ppc64/kernel/sys_ppc32.c linux-2.6.9-bk4-keys/arch/ppc64/kernel/sys_ppc32.c --- linux-2.6.9-bk4/arch/ppc64/kernel/sys_ppc32.c 2004-10-20 14:02:56.046468047 +0100 +++ linux-2.6.9-bk4-keys/arch/ppc64/kernel/sys_ppc32.c 2004-10-20 15:29:22.936487493 +0100 @@ -1328,3 +1328,36 @@ long ppc32_timer_create(clockid_t clock, return err; } + +asmlinkage long sys32_add_key(const char __user *_type, + const char __user *_description, + const void __user *_payload, + u32 plen, + u32 ringid) +{ + sys_add_key(_type, _description, _payload, (size_t) plen, + (key_serial_t) ringid); +} + +asmlinkage long sys32_request_key(const char __user *_type, + const char __user *_description, + const char __user *_callout_info, + u32 destringid) +{ + sys_request_key(_type, _description, _callout_info, + (key_serial_t) destringid); +} + +/* Note: it is necessary to treat option as an unsigned int, + * with the corresponding cast to a signed int to insure that the + * proper conversion (sign extension) between the register representation of a signed int (msr in 32-bit mode) + * and the register representation of a signed int (msr in 64-bit mode) is performed. 
+ */ +asmlinkage long sys32_keyctl(u32 option, u32 arg2, u32 arg3, u32 arg4, u32 arg5) +{ + return sys_keyctl((int)option, + (unsigned long) arg2, + (unsigned long) arg3, + (unsigned long) arg4, + (unsigned long) arg5); +} diff -uNrp linux-2.6.9-bk4/arch/s390/kernel/compat_wrapper.S linux-2.6.9-bk4-keys/arch/s390/kernel/compat_wrapper.S --- linux-2.6.9-bk4/arch/s390/kernel/compat_wrapper.S 2004-06-18 13:43:49.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/s390/kernel/compat_wrapper.S 2004-10-20 15:08:00.071403677 +0100 @@ -1406,3 +1406,29 @@ compat_sys_mq_getsetattr_wrapper: llgtr %r3,%r3 # struct compat_mq_attr * llgtr %r4,%r4 # struct compat_mq_attr * jg compat_sys_mq_getsetattr + + .globl sys32_add_key_wrapper +sys32_add_key_wrapper: + lgfr %r2,%r2 # const char * + llgfr %r3,%r3 # const char * + llgfr %r4,%r4 # const void * + llgfr %r5,%r5 # size_t + llgfr %r6,%r6 # key_serial_t + jg sys_add_key # branch to system call + + .globl sys32_request_key_wrapper +sys32_request_key_wrapper: + lgfr %r2,%r2 # const char * + llgfr %r3,%r3 # const char * + llgfr %r4,%r4 # const char * + llgfr %r5,%r5 # key_serial_t + jg sys_request_key # branch to system call + + .globl sys32_keyctl_wrapper +sys32_keyctl_wrapper: + lgfr %r2,%r2 # int + llgfr %r3,%r3 # unsigned long + llgfr %r4,%r4 # unsigned long + llgfr %r5,%r5 # unsigned long + llgfr %r6,%r6 # unsigned long + jg sys_keyctl # branch to system call diff -uNrp linux-2.6.9-bk4/arch/s390/kernel/syscalls.S linux-2.6.9-bk4-keys/arch/s390/kernel/syscalls.S --- linux-2.6.9-bk4/arch/s390/kernel/syscalls.S 2004-06-18 13:43:49.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/s390/kernel/syscalls.S 2004-10-20 15:05:49.863555437 +0100 @@ -285,3 +285,6 @@ SYSCALL(sys_mq_timedsend,sys_mq_timedsen SYSCALL(sys_mq_timedreceive,sys_mq_timedreceive,compat_sys_mq_timedreceive_wrapper) SYSCALL(sys_mq_notify,sys_mq_notify,compat_sys_mq_notify_wrapper) SYSCALL(sys_mq_getsetattr,sys_mq_getsetattr,compat_sys_mq_getsetattr_wrapper) 
+SYSCALL(sys_add_key,sys_add_key,sys32_add_key_wrapper) +SYSCALL(sys_request_key,sys_request_key,sys32_request_key_wrapper) +SYSCALL(sys_keyctl,sys_keyctl,sys32_keyctl_wrapper) diff -uNrp linux-2.6.9-bk4/arch/sh/kernel/entry.S linux-2.6.9-bk4-keys/arch/sh/kernel/entry.S --- linux-2.6.9-bk4/arch/sh/kernel/entry.S 2004-10-20 14:02:56.666416464 +0100 +++ linux-2.6.9-bk4-keys/arch/sh/kernel/entry.S 2004-10-20 14:26:32.677689027 +0100 @@ -1140,5 +1140,9 @@ ENTRY(sys_call_table) .long sys_mq_timedreceive /* 280 */ .long sys_mq_notify .long sys_mq_getsetattr + .long sys_add_key + .long sys_request_key + .long sys_keyctl /* 285 */ + /* End of entry.S */ diff -uNrp linux-2.6.9-bk4/arch/sh64/kernel/syscalls.S linux-2.6.9-bk4-keys/arch/sh64/kernel/syscalls.S --- linux-2.6.9-bk4/arch/sh64/kernel/syscalls.S 2004-09-16 12:05:50.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/sh64/kernel/syscalls.S 2004-10-20 15:08:45.682499668 +0100 @@ -337,4 +337,6 @@ sys_call_table: .long sys_mq_timedreceive .long sys_mq_notify .long sys_mq_getsetattr /* 310 */ - + .long sys_add_key + .long sys_request_key + .long sys_keyctl diff -uNrp linux-2.6.9-bk4/arch/sparc/kernel/systbls.S linux-2.6.9-bk4-keys/arch/sparc/kernel/systbls.S --- linux-2.6.9-bk4/arch/sparc/kernel/systbls.S 2004-10-19 10:41:48.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/sparc/kernel/systbls.S 2004-10-20 14:25:23.775664787 +0100 @@ -75,7 +75,7 @@ sys_call_table: /*265*/ .long sys_timer_delete, sys_timer_create, sys_nis_syscall, sys_io_setup, sys_io_destroy /*270*/ .long sys_io_submit, sys_io_cancel, sys_io_getevents, sys_mq_open, sys_mq_unlink /*275*/ .long sys_mq_timedsend, sys_mq_timedreceive, sys_mq_notify, sys_mq_getsetattr, sys_waitid -/*280*/ .long sys_ni_syscall, sys_ni_syscall, sys_ni_syscall +/*280*/ .long sys_add_key, sys_request_key, sys_keyctl #ifdef CONFIG_SUNOS_EMUL /* Now the SunOS syscall table. 
*/ diff -uNrp linux-2.6.9-bk4/arch/sparc64/kernel/sys32.S linux-2.6.9-bk4-keys/arch/sparc64/kernel/sys32.S --- linux-2.6.9-bk4/arch/sparc64/kernel/sys32.S 2004-10-19 10:41:48.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/sparc64/kernel/sys32.S 2004-10-20 15:22:48.095792589 +0100 @@ -135,6 +135,9 @@ SIGN2(sys32_shutdown, sys_shutdown, %o0, SIGN3(sys32_socketpair, sys_socketpair, %o0, %o1, %o2) SIGN1(sys32_getpeername, sys_getpeername, %o0) SIGN1(sys32_getsockname, sys_getsockname, %o0) +SIGN2(sys32_add_key, sys_add_key, %o3, %o4) +SIGN1(sys32_request_key, sys_request_key, %o3) +SIGN1(sys32_keyctl, sys_keyctl, %o0) .globl sys32_mmap2 sys32_mmap2: diff -uNrp linux-2.6.9-bk4/arch/sparc64/kernel/systbls.S linux-2.6.9-bk4-keys/arch/sparc64/kernel/systbls.S --- linux-2.6.9-bk4/arch/sparc64/kernel/systbls.S 2004-10-19 10:41:48.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/sparc64/kernel/systbls.S 2004-10-20 14:42:28.934934888 +0100 @@ -76,7 +76,7 @@ sys_call_table32: .word sys_timer_delete, sys32_timer_create, sys_ni_syscall, compat_sys_io_setup, sys_io_destroy /*270*/ .word sys32_io_submit, sys_io_cancel, compat_sys_io_getevents, sys32_mq_open, sys_mq_unlink .word sys_mq_timedsend, sys_mq_timedreceive, compat_sys_mq_notify, compat_sys_mq_getsetattr, compat_sys_waitid -/*280*/ .word sys_ni_syscall, sys_ni_syscall, sys_ni_syscall +/*280*/ .word sys32_add_key, sys32_request_key, sys32_keyctl #endif /* CONFIG_COMPAT */ @@ -142,7 +142,7 @@ sys_call_table: .word sys_timer_delete, sys_timer_create, sys_ni_syscall, sys_io_setup, sys_io_destroy /*270*/ .word sys_io_submit, sys_io_cancel, sys_io_getevents, sys_mq_open, sys_mq_unlink .word sys_mq_timedsend, sys_mq_timedreceive, sys_mq_notify, sys_mq_getsetattr, sys_waitid -/*280*/ .word sys_ni_syscall, sys_ni_syscall, sys_ni_syscall +/*280*/ .word sys_add_key, sys_request_key, sys_keyctl #if defined(CONFIG_SUNOS_EMUL) || defined(CONFIG_SOLARIS_EMUL) || \ defined(CONFIG_SOLARIS_EMUL_MODULE) diff -uNrp 
linux-2.6.9-bk4/arch/um/kernel/sys_call_table.c linux-2.6.9-bk4-keys/arch/um/kernel/sys_call_table.c --- linux-2.6.9-bk4/arch/um/kernel/sys_call_table.c 2004-10-19 10:41:49.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/um/kernel/sys_call_table.c 2004-10-20 14:44:10.557889241 +0100 @@ -306,6 +306,9 @@ syscall_handler_t *sys_call_table[] = { [ __NR_utimes ] (syscall_handler_t *) sys_utimes, [ __NR_fadvise64_64 ] (syscall_handler_t *) sys_fadvise64_64, [ __NR_vserver ] (syscall_handler_t *) sys_ni_syscall, + [ __NR_add_key ] (syscall_handler_t *) sys_add_key, + [ __NR_request_key ] (syscall_handler_t *) sys_request_key, + [ __NR_keyctl ] (syscall_handler_t *) sys_keyctl, ARCH_SYSCALLS [ LAST_SYSCALL + 1 ... NR_syscalls ] = diff -uNrp linux-2.6.9-bk4/arch/v850/kernel/entry.S linux-2.6.9-bk4-keys/arch/v850/kernel/entry.S --- linux-2.6.9-bk4/arch/v850/kernel/entry.S 2004-06-18 13:41:13.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/v850/kernel/entry.S 2004-10-20 15:02:06.154739578 +0100 @@ -1117,5 +1117,8 @@ C_DATA(sys_call_table): .long CSYM(sys_pivot_root) // 200 .long CSYM(sys_gettid) .long CSYM(sys_tkill) + .long CSYM(sys_add_key) + .long CSYM(sys_request_key) + .long CSYM(sys_keyctl) // 205 sys_call_table_end: C_END(sys_call_table) diff -uNrp linux-2.6.9-bk4/arch/x86_64/ia32/ia32entry.S linux-2.6.9-bk4-keys/arch/x86_64/ia32/ia32entry.S --- linux-2.6.9-bk4/arch/x86_64/ia32/ia32entry.S 2004-10-19 10:41:49.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/x86_64/ia32/ia32entry.S 2004-10-20 15:04:46.183013167 +0100 @@ -587,6 +587,10 @@ ia32_sys_call_table: .quad compat_sys_mq_getsetattr .quad quiet_ni_syscall /* reserved for kexec */ .quad sys32_waitid + .quad quiet_ni_syscall /* 285 reserved for setaltroot */ + .quad sys_add_key + .quad sys_request_key + .quad sys_keyctl /* don't forget to change IA32_NR_syscalls */ ia32_syscall_end: .rept IA32_NR_syscalls-(ia32_syscall_end-ia32_sys_call_table)/8 diff -uNrp linux-2.6.9-bk4/include/asm-alpha/unistd.h 
linux-2.6.9-bk4-keys/include/asm-alpha/unistd.h --- linux-2.6.9-bk4/include/asm-alpha/unistd.h 2004-10-19 10:42:11.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-alpha/unistd.h 2004-10-20 14:18:36.681064345 +0100 @@ -374,8 +374,11 @@ #define __NR_mq_notify 436 #define __NR_mq_getsetattr 437 #define __NR_waitid 438 +#define __NR_add_key 439 +#define __NR_request_key 440 +#define __NR_keyctl 441 -#define NR_SYSCALLS 439 +#define NR_SYSCALLS 442 #if defined(__GNUC__) diff -uNrp linux-2.6.9-bk4/include/asm-arm/unistd.h linux-2.6.9-bk4-keys/include/asm-arm/unistd.h --- linux-2.6.9-bk4/include/asm-arm/unistd.h 2004-10-19 10:42:12.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-arm/unistd.h 2004-10-20 14:17:35.183426405 +0100 @@ -306,6 +306,9 @@ #define __NR_mq_notify (__NR_SYSCALL_BASE+278) #define __NR_mq_getsetattr (__NR_SYSCALL_BASE+279) #define __NR_waitid (__NR_SYSCALL_BASE+280) +#define __NR_add_key (__NR_SYSCALL_BASE+281) +#define __NR_request_key (__NR_SYSCALL_BASE+282) +#define __NR_keyctl (__NR_SYSCALL_BASE+283) /* * The following SWIs are ARM private. diff -uNrp linux-2.6.9-bk4/include/asm-arm26/unistd.h linux-2.6.9-bk4-keys/include/asm-arm26/unistd.h --- linux-2.6.9-bk4/include/asm-arm26/unistd.h 2004-06-18 13:44:05.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-arm26/unistd.h 2004-10-20 14:16:45.004804472 +0100 @@ -260,6 +260,9 @@ #define __NR_lremovexattr (__NR_SYSCALL_BASE+236) #define __NR_fremovexattr (__NR_SYSCALL_BASE+237) #define __NR_tkill (__NR_SYSCALL_BASE+238) +#define __NR_add_key (__NR_SYSCALL_BASE+239) +#define __NR_request_key (__NR_SYSCALL_BASE+240) +#define __NR_keyctl (__NR_SYSCALL_BASE+241) /* * The following SWIs are ARM private. 
diff -uNrp linux-2.6.9-bk4/include/asm-cris/unistd.h linux-2.6.9-bk4-keys/include/asm-cris/unistd.h --- linux-2.6.9-bk4/include/asm-cris/unistd.h 2004-06-18 13:44:05.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-cris/unistd.h 2004-10-20 14:16:21.025897563 +0100 @@ -288,8 +288,11 @@ #define __NR_mq_timedreceive (__NR_mq_open+3) #define __NR_mq_notify (__NR_mq_open+4) #define __NR_mq_getsetattr (__NR_mq_open+5) +#define __NR_add_key 283 +#define __NR_request_key 284 +#define __NR_keyctl 285 -#define NR_syscalls 283 +#define NR_syscalls 286 #ifdef __KERNEL__ diff -uNrp linux-2.6.9-bk4/include/asm-h8300/unistd.h linux-2.6.9-bk4-keys/include/asm-h8300/unistd.h --- linux-2.6.9-bk4/include/asm-h8300/unistd.h 2004-06-18 13:44:05.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-h8300/unistd.h 2004-10-20 15:01:16.446016959 +0100 @@ -269,8 +269,11 @@ #define __NR_clock_gettime (__NR_timer_create+6) #define __NR_clock_getres (__NR_timer_create+7) #define __NR_clock_nanosleep (__NR_timer_create+8) +#define __NR_add_key 274 +#define __NR_request_key 275 +#define __NR_keyctl 276 -#define NR_syscalls 268 +#define NR_syscalls 277 /* user-visible error numbers are in the range -1 - -122: see diff -uNrp linux-2.6.9-bk4/include/asm-ia64/unistd.h linux-2.6.9-bk4-keys/include/asm-ia64/unistd.h --- linux-2.6.9-bk4/include/asm-ia64/unistd.h 2004-10-20 14:03:14.832904952 +0100 +++ linux-2.6.9-bk4-keys/include/asm-ia64/unistd.h 2004-10-20 14:14:59.746996878 +0100 @@ -260,6 +260,9 @@ #define __NR_kexec_load 1268 #define __NR_vserver 1269 #define __NR_setaltroot 1270 +#define __NR_add_key 1271 +#define __NR_request_key 1272 +#define __NR_keyctl 1273 #ifdef __KERNEL__ diff -uNrp linux-2.6.9-bk4/include/asm-m32r/unistd.h linux-2.6.9-bk4-keys/include/asm-m32r/unistd.h --- linux-2.6.9-bk4/include/asm-m32r/unistd.h 2004-10-19 10:42:13.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-m32r/unistd.h 2004-10-20 14:14:34.284222397 +0100 @@ -294,8 +294,11 @@ #define __NR_mq_getsetattr 
(__NR_mq_open+5) #define __NR_sys_kexec_load 283 #define __NR_waitid 284 +#define __NR_add_key 285 +#define __NR_request_key 286 +#define __NR_keyctl 287 -#define NR_syscalls 285 +#define NR_syscalls 288 /* user-visible error numbers are in the range -1 - -124: see * diff -uNrp linux-2.6.9-bk4/include/asm-m68k/unistd.h linux-2.6.9-bk4-keys/include/asm-m68k/unistd.h --- linux-2.6.9-bk4/include/asm-m68k/unistd.h 2004-06-18 13:44:05.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-m68k/unistd.h 2004-10-20 14:14:06.358663984 +0100 @@ -238,8 +238,11 @@ #define __NR_lremovexattr 233 #define __NR_fremovexattr 234 #define __NR_futex 235 +#define __NR_add_key 236 +#define __NR_request_key 237 +#define __NR_keyctl 238 -#define NR_syscalls 236 +#define NR_syscalls 239 /* user-visible error numbers are in the range -1 - -124: see */ diff -uNrp linux-2.6.9-bk4/include/asm-mips/unistd.h linux-2.6.9-bk4-keys/include/asm-mips/unistd.h --- linux-2.6.9-bk4/include/asm-mips/unistd.h 2004-09-16 12:06:18.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-mips/unistd.h 2004-10-20 14:12:31.321979696 +0100 @@ -298,16 +298,19 @@ #define __NR_mq_notify (__NR_Linux + 275) #define __NR_mq_getsetattr (__NR_Linux + 276) #define __NR_vserver (__NR_Linux + 277) +#define __NR_add_key (__NR_Linux + 278) +#define __NR_request_key (__NR_Linux + 279) +#define __NR_keyctl (__NR_Linux + 280) /* * Offset of the last Linux o32 flavoured syscall */ -#define __NR_Linux_syscalls 277 +#define __NR_Linux_syscalls 280 #endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */ #define __NR_O32_Linux 4000 -#define __NR_O32_Linux_syscalls 277 +#define __NR_O32_Linux_syscalls 280 #if _MIPS_SIM == _MIPS_SIM_ABI64 @@ -552,11 +555,14 @@ #define __NR_mq_notify (__NR_Linux + 234) #define __NR_mq_getsetattr (__NR_Linux + 235) #define __NR_vserver (__NR_Linux + 236) +#define __NR_add_key (__NR_Linux + 237) +#define __NR_request_key (__NR_Linux + 238) +#define __NR_keyctl (__NR_Linux + 239) /* * Offset of the last Linux flavoured 
syscall */ -#define __NR_Linux_syscalls 236 +#define __NR_Linux_syscalls 239 #endif /* _MIPS_SIM == _MIPS_SIM_ABI64 */ @@ -810,11 +816,14 @@ #define __NR_mq_notify (__NR_Linux + 238) #define __NR_mq_getsetattr (__NR_Linux + 239) #define __NR_vserver (__NR_Linux + 240) +#define __NR_add_key (__NR_Linux + 241) +#define __NR_request_key (__NR_Linux + 242) +#define __NR_keyctl (__NR_Linux + 243) /* * Offset of the last N32 flavoured syscall */ -#define __NR_Linux_syscalls 240 +#define __NR_Linux_syscalls 243 #endif /* _MIPS_SIM == _MIPS_SIM_NABI32 */ diff -uNrp linux-2.6.9-bk4/include/asm-parisc/unistd.h linux-2.6.9-bk4-keys/include/asm-parisc/unistd.h --- linux-2.6.9-bk4/include/asm-parisc/unistd.h 2004-09-16 12:06:18.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-parisc/unistd.h 2004-10-20 14:11:00.896901332 +0100 @@ -727,8 +727,11 @@ #define __NR_mq_timedreceive (__NR_Linux + 232) #define __NR_mq_notify (__NR_Linux + 233) #define __NR_mq_getsetattr (__NR_Linux + 234) +#define __NR_add_key (__NR_Linux + 235) +#define __NR_request_key (__NR_Linux + 236) +#define __NR_keyctl (__NR_Linux + 237) -#define __NR_Linux_syscalls 235 +#define __NR_Linux_syscalls 238 #define HPUX_GATEWAY_ADDR 0xC0000004 #define LINUX_GATEWAY_ADDR 0x100 diff -uNrp linux-2.6.9-bk4/include/asm-ppc/unistd.h linux-2.6.9-bk4-keys/include/asm-ppc/unistd.h --- linux-2.6.9-bk4/include/asm-ppc/unistd.h 2004-06-18 13:44:05.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-ppc/unistd.h 2004-10-20 14:10:32.629379614 +0100 @@ -273,8 +273,11 @@ #define __NR_mq_notify 266 #define __NR_mq_getsetattr 267 #define __NR_kexec_load 268 +#define __NR_add_key 269 +#define __NR_request_key 270 +#define __NR_keyctl 271 -#define __NR_syscalls 269 +#define __NR_syscalls 272 #define __NR(n) #n diff -uNrp linux-2.6.9-bk4/include/asm-ppc64/unistd.h linux-2.6.9-bk4-keys/include/asm-ppc64/unistd.h --- linux-2.6.9-bk4/include/asm-ppc64/unistd.h 2004-10-19 10:42:14.000000000 +0100 +++ 
linux-2.6.9-bk4-keys/include/asm-ppc64/unistd.h 2004-10-20 14:10:19.868498694 +0100 @@ -279,8 +279,11 @@ #define __NR_mq_notify 266 #define __NR_mq_getsetattr 267 #define __NR_kexec_load 268 +#define __NR_add_key 269 +#define __NR_request_key 270 +#define __NR_keyctl 271 -#define __NR_syscalls 269 +#define __NR_syscalls 272 #ifdef __KERNEL__ #define NR_syscalls __NR_syscalls #endif diff -uNrp linux-2.6.9-bk4/include/asm-s390/unistd.h linux-2.6.9-bk4-keys/include/asm-s390/unistd.h --- linux-2.6.9-bk4/include/asm-s390/unistd.h 2004-06-18 13:44:05.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-s390/unistd.h 2004-10-20 14:09:39.572899460 +0100 @@ -269,8 +269,11 @@ #define __NR_mq_timedreceive 274 #define __NR_mq_notify 275 #define __NR_mq_getsetattr 276 +#define __NR_add_key 277 +#define __NR_request_key 278 +#define __NR_keyctl 279 -#define NR_syscalls 277 +#define NR_syscalls 280 /* * There are some system calls that are not present on 64 bit, some diff -uNrp linux-2.6.9-bk4/include/asm-sh/unistd.h linux-2.6.9-bk4-keys/include/asm-sh/unistd.h --- linux-2.6.9-bk4/include/asm-sh/unistd.h 2004-10-20 14:03:16.058802954 +0100 +++ linux-2.6.9-bk4-keys/include/asm-sh/unistd.h 2004-10-20 14:09:16.465821351 +0100 @@ -290,8 +290,11 @@ #define __NR_mq_timedreceive (__NR_mq_open+3) #define __NR_mq_notify (__NR_mq_open+4) #define __NR_mq_getsetattr (__NR_mq_open+5) +#define __NR_add_key 283 +#define __NR_request_key 284 +#define __NR_keyctl 285 -#define NR_syscalls 283 +#define NR_syscalls 286 /* user-visible error numbers are in the range -1 - -124: see */ diff -uNrp linux-2.6.9-bk4/include/asm-sh64/unistd.h linux-2.6.9-bk4-keys/include/asm-sh64/unistd.h --- linux-2.6.9-bk4/include/asm-sh64/unistd.h 2004-09-16 12:06:19.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-sh64/unistd.h 2004-10-20 14:08:45.352409218 +0100 @@ -333,8 +333,11 @@ #define __NR_mq_timedreceive (__NR_mq_open+3) #define __NR_mq_notify (__NR_mq_open+4) #define __NR_mq_getsetattr (__NR_mq_open+5) 
+#define __NR_add_key 311 +#define __NR_request_key 312 +#define __NR_keyctl 313 -#define NR_syscalls 311 +#define NR_syscalls 314 /* user-visible error numbers are in the range -1 - -125: see */ diff -uNrp linux-2.6.9-bk4/include/asm-sparc/unistd.h linux-2.6.9-bk4-keys/include/asm-sparc/unistd.h --- linux-2.6.9-bk4/include/asm-sparc/unistd.h 2004-10-19 10:42:14.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-sparc/unistd.h 2004-10-20 14:08:05.303740383 +0100 @@ -296,6 +296,9 @@ #define __NR_mq_notify 277 #define __NR_mq_getsetattr 278 #define __NR_waitid 279 +#define __NR_add_key 280 +#define __NR_request_key 281 +#define __NR_keyctl 282 /* WARNING: You MAY NOT add syscall numbers larger than 282, since * all of the syscall tables in the Sparc kernel are diff -uNrp linux-2.6.9-bk4/include/asm-sparc64/unistd.h linux-2.6.9-bk4-keys/include/asm-sparc64/unistd.h --- linux-2.6.9-bk4/include/asm-sparc64/unistd.h 2004-10-19 10:42:15.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-sparc64/unistd.h 2004-10-20 14:07:45.586380476 +0100 @@ -298,6 +298,9 @@ #define __NR_mq_notify 277 #define __NR_mq_getsetattr 278 #define __NR_waitid 279 +#define __NR_add_key 280 +#define __NR_request_key 281 +#define __NR_keyctl 282 /* WARNING: You MAY NOT add syscall numbers larger than 282, since * all of the syscall tables in the Sparc kernel are diff -uNrp linux-2.6.9-bk4/include/asm-v850/unistd.h linux-2.6.9-bk4-keys/include/asm-v850/unistd.h --- linux-2.6.9-bk4/include/asm-v850/unistd.h 2004-09-16 12:06:20.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-v850/unistd.h 2004-10-20 14:06:45.477380562 +0100 @@ -205,6 +205,9 @@ #define __NR_pivot_root 200 #define __NR_gettid 201 #define __NR_tkill 202 +#define __NR_add_key 203 +#define __NR_request_key 204 +#define __NR_keyctl 205 /* Syscall protocol: diff -uNrp linux-2.6.9-bk4/include/asm-x86_64/unistd.h linux-2.6.9-bk4-keys/include/asm-x86_64/unistd.h --- linux-2.6.9-bk4/include/asm-x86_64/unistd.h 2004-10-19 
10:42:16.000000000 +0100
+++ linux-2.6.9-bk4-keys/include/asm-x86_64/unistd.h 2004-10-20 14:06:01.645026869 +0100
@@ -556,8 +556,14 @@ __SYSCALL(__NR_mq_getsetattr, sys_mq_get
 __SYSCALL(__NR_kexec_load, sys_ni_syscall)
 #define __NR_waitid 247
 __SYSCALL(__NR_waitid, sys_waitid)
+#define __NR_add_key 248
+__SYSCALL(__NR_add_key, sys_add_key)
+#define __NR_request_key 249
+__SYSCALL(__NR_request_key, sys_request_key)
+#define __NR_keyctl 250
+__SYSCALL(__NR_keyctl, sys_keyctl)

-#define __NR_syscall_max __NR_waitid
+#define __NR_syscall_max __NR_keyctl

 #ifndef __NO_STUBS
 /* user-visible error numbers are in the range -1 - -4095 */

From hch at infradead.org Thu Oct 21 01:29:57 2004
From: hch at infradead.org (Christoph Hellwig)
Date: Wed, 20 Oct 2004 16:29:57 +0100
Subject: [PATCH] Add key management syscalls to non-i386 archs
In-Reply-To: <3506.1098283455@redhat.com>
References: <3506.1098283455@redhat.com>
Message-ID: <20041020152957.GA21774@infradead.org>

> Hi Linus, Andrew,
>
> The attached patch adds syscalls for almost all archs (everything barring
> m68knommu which is in a real mess, and i386 which already has it).
>
> It also adds 32->64 compatibility where appropriate.

Umm, that patch added the damn multiplexer that had been vetoed multiple
times. Why did this happen?

From matthew at wil.cx Thu Oct 21 01:49:22 2004
From: matthew at wil.cx (Matthew Wilcox)
Date: Wed, 20 Oct 2004 16:49:22 +0100
Subject: [parisc-linux] [PATCH] Add key management syscalls to non-i386 archs
In-Reply-To: <3506.1098283455@redhat.com>
References: <3506.1098283455@redhat.com>
Message-ID: <20041020154922.GV16153@parcelfarce.linux.theplanet.co.uk>

On Wed, Oct 20, 2004 at 03:44:15PM +0100, David Howells wrote:
> The attached patch adds syscalls for almost all archs (everything barring
> m68knommu which is in a real mess, and i386 which already has it).
>
> It also adds 32->64 compatibility where appropriate.
> --- linux-2.6.9-bk4/arch/parisc/kernel/syscall_table.S 2004-06-18 13:43:47.000000000 +0100
> +++ linux-2.6.9-bk4-keys/arch/parisc/kernel/syscall_table.S 2004-10-20 14:58:51.533643420 +0100
> @@ -341,5 +341,7 @@
> 	ENTRY_SAME(mq_timedreceive)
> 	ENTRY_SAME(mq_notify)
> 	ENTRY_SAME(mq_getsetattr)
> -	/* Nothing yet */ /* 235 */
> +	ENTRY_SAME(add_key) /* 235 */
> +	ENTRY_SAME(request_key)
> +	ENTRY_SAME(keyctl)

Um, no. Should be ENTRY_COMP() if there's compat syscalls. And those
particular syscall numbers have already been assigned (blame Linus for
dropping the PA-RISC patch on the floor instead of including it in 2.6.9).

-- 
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain

From dhowells at redhat.com Thu Oct 21 02:16:17 2004
From: dhowells at redhat.com (David Howells)
Date: Wed, 20 Oct 2004 17:16:17 +0100
Subject: [parisc-linux] [PATCH] Add key management syscalls to non-i386 archs
In-Reply-To: <20041020154922.GV16153@parcelfarce.linux.theplanet.co.uk>
References: <20041020154922.GV16153@parcelfarce.linux.theplanet.co.uk> <3506.1098283455@redhat.com>
Message-ID: <7779.1098288977@redhat.com>

> Um, no. Should be ENTRY_COMP() if there's compat syscalls.

Not all archs (of which PA-Risc is an example) seem to require the same
fixups on the same syscalls. In some instances, the upper half of the
register is implicitly zero on 32-bit syscall entry to a 64-bit kernel. In
such cases, none of my syscalls require fixing up, assuming the pointers
are automatically correct.
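
The difference between an argument that needs no fixup and one that does can be sketched in C. This is only an illustrative model, not code from the patch: the helper names are hypothetical, and it assumes the 32-bit caller's value arrives in the low half of a 64-bit register whose upper half is zero, as described above.

```c
#include <stdint.h>

/* Hypothetical helpers modelling what a 32->64 compat wrapper does to one
 * argument register. "reg" is the low 32 bits supplied by the 32-bit task. */

/* Zero extension: correct for unsigned values such as a length. If the
 * upper half of the register is already zero, this is a no-op. */
uint64_t widen_unsigned(uint32_t reg)
{
	return (uint64_t)reg;
}

/* Sign extension: needed when the 32-bit task passed a signed int.
 * Flipping the sign bit and subtracting its weight avoids relying on
 * implementation-defined unsigned-to-signed conversion. */
int64_t widen_signed(uint32_t reg)
{
	return (int64_t)(reg ^ 0x80000000u) - (int64_t)0x80000000u;
}
```

For reg = 0xFFFFFFFF, widen_unsigned() yields 4294967295 while widen_signed() yields -1, so an argument that is really a signed int cannot rely on the upper half being implicitly zero.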
> And those particular syscall numbers have already been assigned (blame Linus
> for dropping the PA-RISC patch on the floor instead of including it in
> 2.6.9).

There's not a lot I can do about that, except wave a patch under Linus's
nose and see who complains. Can you allocate three syscall numbers for me
for parisc?

David

From johnrose at austin.ibm.com Thu Oct 21 02:35:32 2004
From: johnrose at austin.ibm.com (John Rose)
Date: Wed, 20 Oct 2004 11:35:32 -0500
Subject: [PATCH] __ioremap_explicit() criterion change
Message-ID: <1098290132.15425.7.camel@sinatra.austin.ibm.com>

The function __ioremap_explicit() misses a possible (obscure) case when
reserving the imalloc area for the new region. This can result in the
unexpected DLPAR-add failure for an I/O slot. The failure will be
characterized by a kernel message resembling "could not obtain imalloc
area for ea 0x..."

Here's an explanation:

At boot time, imalloc regions are created for the ranges of all PHBs. Upon
removal of a child slot for one of these PHBs, the imalloc region is split
so that the region for the child slot can be removed.

A GFW testcase revealed the following scenario. A PHB is remapped at boot
for virtual address range A through C. At boot, the partition owns a slot
that spans from A to B. This slot is DLPAR-removed, leaving an imalloc
region from B to C. At this point, the user DLPAR adds an EADS slot that
was not present at boot, but is a child of the PHB. The new slot happens to
have a range that directly matches the leftover PHB range, from B to C. The
existing code does not expect this, so the operation fails.
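
The scenario above can be modelled with a toy region classifier. This is a sketch under stated assumptions, not the kernel's imalloc implementation: struct area, classify(), request_ok() and the REGION_* values are all hypothetical, and the addresses in the note below are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical criteria flags for this sketch only; the values and the
 * classification rules are assumptions, not taken from the kernel. */
enum {
	REGION_UNUSED   = 1 << 0, /* request overlaps no existing area */
	REGION_SUBSET   = 1 << 1, /* request lies inside a larger existing area */
	REGION_EXISTS   = 1 << 2, /* request matches an existing area exactly */
	REGION_CONFLICT = 1 << 3  /* partial overlap */
};

struct area {
	uint64_t start;
	uint64_t size;
};

/* Classify the request [ea, ea + size) against one existing area. */
int classify(const struct area *a, uint64_t ea, uint64_t size)
{
	uint64_t a_end = a->start + a->size;
	uint64_t r_end = ea + size;

	assert(size > 0 && a->size > 0);

	if (r_end <= a->start || ea >= a_end)
		return REGION_UNUSED;
	if (ea == a->start && r_end == a_end)
		return REGION_EXISTS;
	if (ea >= a->start && r_end <= a_end)
		return REGION_SUBSET;
	return REGION_CONFLICT;
}

/* A request succeeds only if its classification is one the caller accepts. */
int request_ok(const struct area *a, uint64_t ea, uint64_t size, int criteria)
{
	return (classify(a, ea, size) & criteria) != 0;
}
```

Taking A = 0x1000, B = 0x2000 and C = 0x3000, the leftover area after the slot removal is { 0x2000, 0x1000 }. A request for exactly B to C classifies as REGION_EXISTS, so a caller that accepts only REGION_UNUSED | REGION_SUBSET is refused, while one that also accepts REGION_EXISTS succeeds.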
Signed-off-by: John Rose diff -Nru a/arch/ppc64/mm/init.c b/arch/ppc64/mm/init.c --- a/arch/ppc64/mm/init.c Wed Oct 20 11:17:47 2004 +++ b/arch/ppc64/mm/init.c Wed Oct 20 11:17:47 2004 @@ -263,7 +263,8 @@ */ ; } else { - area = im_get_area(ea, size, IM_REGION_UNUSED|IM_REGION_SUBSET); + area = im_get_area(ea, size, + IM_REGION_UNUSED|IM_REGION_SUBSET|IM_REGION_EXISTS); if (area == NULL) { printk(KERN_ERR "could not obtain imalloc area for ea 0x%lx\n", ea); return 1; From cchaney at us.ibm.com Thu Oct 21 03:04:20 2004 From: cchaney at us.ibm.com (Craig Chaney) Date: Wed, 20 Oct 2004 13:04:20 -0400 Subject: 2.6.9-rc4 kernel -- "cannot find space for TCE table" In-Reply-To: <1098229131.5792.9.camel@gaston> References: <1097887510.6487.23.camel@gaston> <20041019230054.GA3807@kevlar.burdell.org> <1098229131.5792.9.camel@gaston> Message-ID: <20041020170420.GA8345@sage.raleigh.ibm.com> On Wed, Oct 20, 2004 at 09:38:52AM +1000, Benjamin Herrenschmidt wrote: > On Wed, 2004-10-20 at 09:00, Sonny Rao wrote: > > > Ben, I'm still seeing this issue with 2.6.9 final, do you need > > anything else? I'm sure you're very busy, but please let me know if I > > can help. > > Well, I can't reproduce here, but it seem basically that one of the > calls to alloc_down() is failing, you may want to trace a bit. I'll > try to find by myself too & let you know. > > Ben. I can reproduce this on a p615 as well. I did a little bit of superficial tracking. The call to alloc_down fails because (RELOC(alloc_top) == RELOC(rmo_top)) is false. On LPAR platforms, alloc_top is set to rmo_top in prom_init_mem. However, for the p615, prom_find_machine_type() returns PLATFORM_PSERIES, which causes the logic in prom_init_mem to set alloc_top to 0x40000000. I can work around this by modifying prom_init_mem to set alloc_top to rmo_top if of_platform is either PLATFORM_PSERIES_LPAR or PLATFORM_PSERIES. This allows me to boot a 2.6.9-rc4 kernel on a p615. Hope this helps. 
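The workaround described above can be sketched as a stand-alone C fragment. This is only an illustration under stated assumptions, not the prom_init code: the `SK_PLATFORM_*` constants, `sk_min()` and both function names are hypothetical, the 2GB values in the demo are invented, and only the choice of alloc_top is modelled.

```c
#include <assert.h>

/* Hypothetical stand-ins for the platform types named in the mail. */
enum sk_platform {
    SK_PLATFORM_PSERIES,
    SK_PLATFORM_PSERIES_LPAR,
    SK_PLATFORM_OTHER
};

static unsigned long sk_min(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

/* Behaviour as described: only LPAR ties alloc_top to rmo_top. */
static unsigned long alloc_top_as_described(enum sk_platform p,
                                            unsigned long rmo_top,
                                            unsigned long ram_top)
{
    if (p == SK_PLATFORM_PSERIES_LPAR)
        return rmo_top;
    return sk_min(0x40000000UL, ram_top);
}

/* Workaround: treat plain pSeries the same way as LPAR. */
static unsigned long alloc_top_workaround(enum sk_platform p,
                                          unsigned long rmo_top,
                                          unsigned long ram_top)
{
    if (p == SK_PLATFORM_PSERIES_LPAR || p == SK_PLATFORM_PSERIES)
        return rmo_top;
    return sk_min(0x40000000UL, ram_top);
}

/* Made-up numbers: 2GB of RAM, rmo_top also at 2GB. */
static void alloc_top_sketch_demo(void)
{
    const unsigned long rmo_top = 0x80000000UL, ram_top = 0x80000000UL;

    /* As described, alloc_top ends up at 1GB and differs from rmo_top. */
    assert(alloc_top_as_described(SK_PLATFORM_PSERIES, rmo_top, ram_top)
           != rmo_top);

    /* With the workaround the two values agree again. */
    assert(alloc_top_workaround(SK_PLATFORM_PSERIES, rmo_top, ram_top)
           == rmo_top);
}
```

With the two values equal, the (alloc_top == rmo_top) comparison mentioned above holds on plain pSeries as well as on LPAR.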
-Craig From arnd at arndb.de Thu Oct 21 03:08:17 2004 From: arnd at arndb.de (Arnd Bergmann) Date: Wed, 20 Oct 2004 19:08:17 +0200 Subject: [PATCH] Add key management syscalls to non-i386 archs In-Reply-To: <3506.1098283455@redhat.com> References: <3506.1098283455@redhat.com> Message-ID: <200410201908.18273.arnd@arndb.de> On Wednesday 20 October 2004 16:44, David Howells wrote: > diff -uNrp linux-2.6.9-bk4/arch/s390/kernel/compat_wrapper.S linux-2.6.9-bk4-keys/arch/s390/kernel/compat_wrapper.S > --- linux-2.6.9-bk4/arch/s390/kernel/compat_wrapper.S 2004-06-18 13:43:49.000000000 +0100 > +++ linux-2.6.9-bk4-keys/arch/s390/kernel/compat_wrapper.S 2004-10-20 15:08:00.071403677 +0100 > @@ -1406,3 +1406,29 @@ compat_sys_mq_getsetattr_wrapper: > llgtr %r3,%r3 # struct compat_mq_attr * > llgtr %r4,%r4 # struct compat_mq_attr * > jg compat_sys_mq_getsetattr > + > + .globl sys32_add_key_wrapper > +sys32_add_key_wrapper: > + lgfr %r2,%r2 # const char * > + llgfr %r3,%r3 # const char * > + llgfr %r4,%r4 # const void * > + llgfr %r5,%r5 # size_t > + llgfr %r6,%r6 # key_serial_t > + jg sys_add_key # branch to system call > + > + .globl sys32_request_key_wrapper > +sys32_request_key_wrapper: > + lgfr %r2,%r2 # const char * > + llgfr %r3,%r3 # const char * > + llgfr %r4,%r4 # const char * > + llgfr %r5,%r5 # key_serial_t > + jg sys_request_key # branch to system call > + > + .globl sys32_keyctl_wrapper > +sys32_keyctl_wrapper: > + lgfr %r2,%r2 # int > + llgfr %r3,%r3 # unsigned long > + llgfr %r4,%r4 # unsigned long > + llgfr %r5,%r5 # unsigned long > + llgfr %r6,%r6 # unsigned long > + jg sys_keyctl # branch to system call The comments don't match the code. Please use the correct lgfr/llgfr/llgtr opcodes for signed/unsigned/pointer extension. Note that for keyctl_wrapper, the actual conversion is not static but depends on the value of %r2. You probably want to code that conversion in C. Arnd <><
From akpm at osdl.org Thu Oct 21 03:50:27 2004 From: akpm at osdl.org (Andrew Morton) Date: Wed, 20 Oct 2004 10:50:27 -0700 Subject: [PATCH] Add key management syscalls to non-i386 archs In-Reply-To: <20041020152957.GA21774@infradead.org> References: <3506.1098283455@redhat.com> <20041020152957.GA21774@infradead.org> Message-ID: <20041020105027.54bf9e89.akpm@osdl.org> Christoph Hellwig wrote: > > > Hi Linus, Andrew, > > > > The attached patch adds syscalls for almost all archs (everything barring > > m68knommu which is in a real mess, and i386 which already has it). > > > > It also adds 32->64 compatibility where appropriate. > > Umm, that patch added the damn multiplexer that had been vetoed multiple > times. Why did this happen? Fifteen new syscalls was judged excessive and the keyfs interface was judged slow and bloaty. From hch at infradead.org Thu Oct 21 04:18:50 2004 From: hch at infradead.org (Christoph Hellwig) Date: Wed, 20 Oct 2004 19:18:50 +0100 Subject: [PATCH] Add key management syscalls to non-i386 archs In-Reply-To: <20041020105027.54bf9e89.akpm@osdl.org> References: <3506.1098283455@redhat.com> <20041020152957.GA21774@infradead.org> <20041020105027.54bf9e89.akpm@osdl.org> Message-ID: <20041020181850.GA23979@infradead.org> On Wed, Oct 20, 2004 at 10:50:27AM -0700, Andrew Morton wrote: > Christoph Hellwig wrote: > > > > > Hi Linus, Andrew, > > > > > > The attached patch adds syscalls for almost all archs (everything barring > > > m68knommu which is in a real mess, and i386 which already has it). > > > > > > It also adds 32->64 compatibility where appropriate. > > > > Umm, that patch added the damn multiplexer that had been vetoed multiple > > times. Why did this happen? > > Fifteen new syscalls was judged excessive and the keyfs interface was > judged slow and bloaty.
Maybe 15 syscalls just means the API is goddamn awful and we certainly shouldn't merge it as-is. From linas at austin.ibm.com Thu Oct 21 04:45:01 2004 From: linas at austin.ibm.com (Linas Vepstas) Date: Wed, 20 Oct 2004 13:45:01 -0500 Subject: status of ppc64 patches In-Reply-To: <20041020010301.GA29579@4> References: <41754644.1010003@austin.ibm.com> <1098231748.7493.114.camel@pants.austin.ibm.com> <20041020010301.GA29579@4> Message-ID: <20041020184501.GF10026@austin.ibm.com> On Tue, Oct 19, 2004 at 08:03:01PM -0500, Olof Johansson was heard to remark: > > Also: Regarding re-basing patches: It has to be the duty of the developer > of the patch to re-base it to current trees if it will no longer apply > cleanly. I think this misses the point. I've re-based some of my patches more than half-a-dozen times, and this has gotten so tedious that I've just sort of stopped bothering sending in patches. Excessive delays in moving patches upstream just kill the development process. Patches need to be handled in a timely manner, while they are still 'fresh', so that they don't need to be rebased. Put it another way: it is, at this time, impossible for me to rebase, because I know that my patches will conflict with others in the un-applied patch queue. So all I can do is wait for the patch queue to shrink, wait till the others get into the Torvalds tree, then bk pull, then hurry, hurry, rebase, test, submit, and hope I get in before someone else does and wrecks it again. The turn-around time for "getting lucky" like this is over a month, and if one doesn't get lucky the first month, one has to wait a whole 'nother month for one's next shot.
--linas From jschopp at austin.ibm.com Thu Oct 21 05:28:35 2004 From: jschopp at austin.ibm.com (Joel Schopp) Date: Wed, 20 Oct 2004 14:28:35 -0500 Subject: status of ppc64 patches In-Reply-To: <16758.21948.795730.268143@cargo.ozlabs.ibm.com> References: <41754644.1010003@austin.ibm.com> <16758.21948.795730.268143@cargo.ozlabs.ibm.com> Message-ID: <4176BC63.8000700@austin.ibm.com> > As far as your patches are concerned, I am aware of two patches that > change things so that we have __boot variants of __pa etc. However, > your explanation didn't really get me excited about the change. You > said something about "moving towards hotplug memory" but you didn't > explain why these changes would help with that, or how I should choose > which function to use when I'm making changes in future (that should > actually go in a file somewhere under the Documentation directory), or > why those changes need to go in now. The direct answer is that this is a big part of the size of the CONFIG_NONLINEAR patch, without the controversial part that actually does CONFIG_NONLINEAR. CONFIG_NONLINEAR allows us to have big holes in physical memory and to grow physical memory after boot. These changes will be necessary for whatever ends up filling the role CONFIG_NONLINEAR currently does in our hotplug memory tree. So even if you hate CONFIG_NONLINEAR these patches will be necessary for memory hotplug because we will have to differentiate early boot memory from normal memory. We have a tree that does memory add, and is part of the way to doing remove. http://sprucegoose.sr71.net/patches It has 76 patches currently. It is a real job to continue to forward port it. We are trying to get it all upstream. But of course it would be insane to merge 76 very complex patches at once, especially when a few of them are still buggy. These changes need to go in now because they don't hurt anything and they help us a great deal on a project most everybody agrees is a good idea (memory hotplug). 
If we didn't have a continuous development model they could be ignored until 2.7, but to get large features into a kernel that is always stable it is necessary to merge things a bit at a time. Even if those bits are only worthwhile in the context of the yet unmerged bits. And I apologize for not making this all clear in my initial message. From paulus at samba.org Thu Oct 21 07:30:56 2004 From: paulus at samba.org (Paul Mackerras) Date: Thu, 21 Oct 2004 07:30:56 +1000 Subject: [PATCH 1/1] rtas_flash_4gig In-Reply-To: <200410041942.i94Jg4WA154540@westrelay04.boulder.ibm.com> References: <200410041942.i94Jg4WA154540@westrelay04.boulder.ibm.com> Message-ID: <16758.55568.809557.670513@cargo.ozlabs.ibm.com> Jake, > We should probably check to make sure that all of the flash > list headers are above 4gig. Not just the first one. Why is the limit 4GB rather than the RMO size? Paul. From moilanen at austin.ibm.com Thu Oct 21 08:08:17 2004 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Wed, 20 Oct 2004 17:08:17 -0500 Subject: [PATCH 1/1] rtas_flash_4gig In-Reply-To: <16758.55568.809557.670513@cargo.ozlabs.ibm.com> References: <200410041942.i94Jg4WA154540@westrelay04.boulder.ibm.com> <16758.55568.809557.670513@cargo.ozlabs.ibm.com> Message-ID: <20041020170817.0ee49b64@localhost> > > We should probably check to make sure that all of the flash > > list headers are above 4gig. Not just the first one. > > Why is the limit 4GB rather than the RMO size? According to the RPA (item E7-41 to be exact), the block-list can be anywhere under 4 gigs. RTAS will make hypervisor calls to access this memory. I would infer the reason why they want to allow the block-list outside the RMO is otherwise it may have been difficult for the OS to get an entire flash image under the RMO boundary. Thanks, Jake From davem at davemloft.net Thu Oct 21 08:01:49 2004 From: davem at davemloft.net (David S. 
Miller) Date: Wed, 20 Oct 2004 15:01:49 -0700 Subject: [PATCH] Add key management syscalls to non-i386 archs In-Reply-To: <3506.1098283455@redhat.com> References: <3506.1098283455@redhat.com> Message-ID: <20041020150149.7be06d6d.davem@davemloft.net> David, I applaud your effort to take care of this. However, this patch will conflict with what I've sent into Linus already for Sparc. I also had to add the sys_altroot syscall entry as well. I've mentioned several times that perhaps the best way to deal with this problem is to purposefully break the build of platforms when new system calls are added. Simply adding a: #error new syscall entries for X and Y needed to include/asm-*/unistd.h would handle this just fine I think. That way it won't be missed, and if the platform maintainer wants to just ignore the new syscall they can choose to do that as well. From olof at austin.ibm.com Thu Oct 21 08:26:41 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Wed, 20 Oct 2004 17:26:41 -0500 Subject: [PATCH] create iommu_free_table() In-Reply-To: <1097171661.7087.1.camel@sinatra.austin.ibm.com> References: <1097171661.7087.1.camel@sinatra.austin.ibm.com> Message-ID: <4176E621.3040607@austin.ibm.com> John Rose wrote: > The patch below creates iommu_free_table(). Iommu tables are not currently > freed in PPC64. This could cause a memory leak for DLPAR of an EADS slot. The > function verifies that there are no outstanding TCE entries for the range of > the table before freeing it. I added a call to iommu_free_table() to the code > that dynamically removes a device node. This should be fairly symmetrical with > the table allocation, which happens during dynamic addition of a device node. > > Comments welcome. Looks good, just a couple of minor nitpicks below. 
-Olof > Signed-off-by: John Rose > > diff -Nru a/arch/ppc64/kernel/pSeries_iommu.c b/arch/ppc64/kernel/pSeries_iommu.c > --- a/arch/ppc64/kernel/pSeries_iommu.c Thu Oct 7 11:08:19 2004 > +++ b/arch/ppc64/kernel/pSeries_iommu.c Thu Oct 7 11:08:19 2004 > @@ -412,6 +412,38 @@ > dn->iommu_table = iommu_init_table(tbl); > } > > +void iommu_free_table(struct device_node *dn) > +{ > + struct iommu_table *tbl = dn->iommu_table; > + unsigned long bitmap_sz, i; > + unsigned int order; > + > + if (!tbl || !tbl->it_map) { whitespace above looks wrong (or below?) > + printk(KERN_ERR "%s: expected TCE map for %s\n", __FUNCTION__, > + dn->full_name); > + return; > + } > + > + /* verify that table contains no entries */ > + /* it_mapsize is in entries, and we're examining 64 at a time */ > + for (i = 0; i < (tbl->it_mapsize/64); i++) { > + if (tbl->it_map[i] != 0) { > + printk(KERN_WARNING "%s: Unexpected TCEs for %s\n", > + __FUNCTION__, dn->full_name); > + break; > + } Could this get spammy? It could be nice to see a WARN_ON(1) too, so the call stack is dumped. If that's added, a printk_ratelimit() would definitely be warranted around both the printk and the WARN_ON().
> + } > + > + /* calculate bitmap size in bytes */ > + bitmap_sz = (tbl->it_mapsize + 7) / 8; > + > + /* free bitmap */ > + order = get_order(bitmap_sz); > + free_pages((unsigned long) tbl->it_map, order); > + > + /* free table */ > + kfree(tbl); whitespace > +} > > void iommu_setup_pSeries(void) > { > diff -Nru a/arch/ppc64/kernel/prom.c b/arch/ppc64/kernel/prom.c > --- a/arch/ppc64/kernel/prom.c Thu Oct 7 11:08:19 2004 > +++ b/arch/ppc64/kernel/prom.c Thu Oct 7 11:08:19 2004 > @@ -1818,6 +1818,9 @@ > return -EBUSY; > } > > + if (np->iommu_table) > + iommu_free_table(np); > + > write_lock(&devtree_lock); > OF_MARK_STALE(np); > remove_node_proc_entries(np); > diff -Nru a/include/asm-ppc64/iommu.h b/include/asm-ppc64/iommu.h > --- a/include/asm-ppc64/iommu.h Thu Oct 7 11:08:19 2004 > +++ b/include/asm-ppc64/iommu.h Thu Oct 7 11:08:19 2004 > @@ -113,6 +113,9 @@ > /* Creates table for an individual device node */ > extern void iommu_devnode_init(struct device_node *dn); > > +/* Frees table for an individual device node */ > +extern void iommu_free_table(struct device_node *dn); > + > #endif /* CONFIG_PPC_MULTIPLATFORM */ > > #ifdef CONFIG_PPC_ISERIES From davem at davemloft.net Thu Oct 21 09:04:50 2004 From: davem at davemloft.net (David S. Miller) Date: Wed, 20 Oct 2004 16:04:50 -0700 Subject: [discuss] Re: [PATCH] Add key management syscalls to non-i386 archs In-Reply-To: <20041020225625.GD995@wotan.suse.de> References: <3506.1098283455@redhat.com> <20041020150149.7be06d6d.davem@davemloft.net> <20041020225625.GD995@wotan.suse.de> Message-ID: <20041020160450.0914270b.davem@davemloft.net> On Thu, 21 Oct 2004 00:56:25 +0200 Andi Kleen wrote: > I don't think that's a good idea.
Normally new system calls > are relatively obscure and the system works fine without them, > so urgent action is not needed. > > And I think we can trust architecture maintainers to regularly > sync the system calls with i386. I disagree quite strongly. One major frustration for users of non-x86 platforms is that functionality is often missing for some time that we can make trivial to keep in sync. I religiously watch what goes into Linus's tree for this purpose, but that is kind of a ridiculous burden to expect every platform maintainer to do. It's not just system calls, we have signal handling bug fixes, trap handling infrastructure, and now the nice generic IRQ handling subsystem as other examples. Simply put, if you're not watching the tree in painstaking detail every day, you miss all of these enhancements. The knowledge should come from the person putting the changes into the tree, therefore it gets done once and this makes it so that the other platform maintainers will find out about it automatically next time they update their tree. From ak at suse.de Thu Oct 21 09:25:09 2004 From: ak at suse.de (Andi Kleen) Date: Thu, 21 Oct 2004 01:25:09 +0200 Subject: [discuss] Re: [PATCH] Add key management syscalls to non-i386 archs In-Reply-To: <20041020160450.0914270b.davem@davemloft.net> References: <3506.1098283455@redhat.com> <20041020150149.7be06d6d.davem@davemloft.net> <20041020225625.GD995@wotan.suse.de> <20041020160450.0914270b.davem@davemloft.net> Message-ID: <20041020232509.GF995@wotan.suse.de> On Wed, Oct 20, 2004 at 04:04:50PM -0700, David S. Miller wrote: > On Thu, 21 Oct 2004 00:56:25 +0200 > Andi Kleen wrote: > > > I don't think that's a good idea. Normally new system calls > > are relatively obscure and the system works fine without them, > > so urgent action is not needed. > > > > And I think we can trust architecture maintainers to regularly > > sync the system calls with i386. > > I disagree quite strongly.
One major frustration for users of > non-x86 platforms is that functionality is often missing for some > time that we can make trivial to keep in sync. I'm not sure really if the users of some embedded platform are all cheering for key management system calls... I guess they will prefer just something that compiles. > > I religiously watch what goes into Linus's tree for this purpose, > but that is kind of a ridiculous burden to expect every platform > maintainer to do. It's not just system calls, we have signal handling > bug fixes, trap handling infrastructure, and now the nice generic > IRQ handling subsystem as other examples. Most of that is optional. When the arch maintainer chooses not to use it you have just unnecessarily broken the build. IMHO breaking the build unnecessarily is extremely bad because it will prevent all testing. And would you really want to hold up the whole Linux testing machinery just for some obscure system call? IMHO not a good tradeoff. > > Simply put, if you're not watching the tree in painstaking detail > every day, you miss all of these enhancements. I would assume the other maintainers go at least from time to time through the i386 diffs and check if they miss anything (that is what I do). For system calls they do definitely, although it may take some time. > > The knowledge should come from the person putting the changes into > the tree, therefore it gets done once and this makes it so that > the other platform maintainers will find out about it automatically > next time they update their tree. And causing merging headaches and all kinds of other problems.
-Andi From ak at suse.de Thu Oct 21 08:56:25 2004 From: ak at suse.de (Andi Kleen) Date: Thu, 21 Oct 2004 00:56:25 +0200 Subject: [discuss] Re: [PATCH] Add key management syscalls to non-i386 archs In-Reply-To: <20041020150149.7be06d6d.davem@davemloft.net> References: <3506.1098283455@redhat.com> <20041020150149.7be06d6d.davem@davemloft.net> Message-ID: <20041020225625.GD995@wotan.suse.de> On Wed, Oct 20, 2004 at 03:01:49PM -0700, David S. Miller wrote: > > David, I applaud your effort to take care of this. > However, this patch will conflict with what I've > sent into Linus already for Sparc. I also had to > add the sys_altroot syscall entry as well. > > I've mentioned several times that perhaps the best > way to deal with this problem is to purposefully > break the build of platforms when new system calls > are added. > > Simply adding a: > > #error new syscall entries for X and Y needed > > to include/asm-*/unistd.h would handle this just > fine I think. I don't think that's a good idea. Normally new system calls are relatively obscure and the system works fine without them, so urgent action is not needed. And I think we can trust architecture maintainers to regularly sync the system calls with i386. -Andi From davem at davemloft.net Thu Oct 21 09:41:44 2004 From: davem at davemloft.net (David S. Miller) Date: Wed, 20 Oct 2004 16:41:44 -0700 Subject: [discuss] Re: [PATCH] Add key management syscalls to non-i386 archs In-Reply-To: <20041020232509.GF995@wotan.suse.de> References: <3506.1098283455@redhat.com> <20041020150149.7be06d6d.davem@davemloft.net> <20041020225625.GD995@wotan.suse.de> <20041020160450.0914270b.davem@davemloft.net> <20041020232509.GF995@wotan.suse.de> Message-ID: <20041020164144.3457eafe.davem@davemloft.net> On Thu, 21 Oct 2004 01:25:09 +0200 Andi Kleen wrote: > IMHO breaking the build unnecessarily is extremely bad because > it will prevent all testing.
And would you really want to hold > up the whole linux testing machinery just for some obscure > system call? IMHO not a good tradeoff. Then change the unistd.h cookie from "#error" to a "#warning". It accomplishes both of our goals. From ak at suse.de Thu Oct 21 10:10:42 2004 From: ak at suse.de (Andi Kleen) Date: Thu, 21 Oct 2004 02:10:42 +0200 Subject: [discuss] Re: [PATCH] Add key management syscalls to non-i386 archs In-Reply-To: <20041020164144.3457eafe.davem@davemloft.net> References: <3506.1098283455@redhat.com> <20041020150149.7be06d6d.davem@davemloft.net> <20041020225625.GD995@wotan.suse.de> <20041020160450.0914270b.davem@davemloft.net> <20041020232509.GF995@wotan.suse.de> <20041020164144.3457eafe.davem@davemloft.net> Message-ID: <20041021001041.GI995@wotan.suse.de> On Wed, Oct 20, 2004 at 04:41:44PM -0700, David S. Miller wrote: > On Thu, 21 Oct 2004 01:25:09 +0200 > Andi Kleen wrote: > > > IMHO breaking the build unnecessarily is extremely bad because > > it will prevent all testing. And would you really want to hold > > up the whole linux testing machinery just for some obscure > > system call? IMHO not a good tradeoff. > > Then change the unistd.h cookie from "#error" to a "#warning". It > accomplishes both of our goals. #warnings would be fine for me. -Andi From benh at kernel.crashing.org Thu Oct 21 11:30:59 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 21 Oct 2004 11:30:59 +1000 Subject: 2.6.9-rc4 kernel -- "cannot find space for TCE table" In-Reply-To: <20041020170420.GA8345@sage.raleigh.ibm.com> References: <1097887510.6487.23.camel@gaston> <20041019230054.GA3807@kevlar.burdell.org> <1098229131.5792.9.camel@gaston> <20041020170420.GA8345@sage.raleigh.ibm.com> Message-ID: <1098322258.4183.15.camel@gaston> On Thu, 2004-10-21 at 03:04, Craig Chaney wrote: > which causes the logic in prom_init_mem to set alloc_top to 0x40000000.
> > I can work around this by modifying prom_init_mem to set alloc_top to rmo_top > if of_platform is either PLATFORM_PSERIES_LPAR or PLATFORM_PSERIES. This > allows me to boot a 2.6.9-rc4 kernel on a p615. Yes, alloc_top and rmo_top should be both "clamped". Can you try that patch and let me know ? Index: linux-work/arch/ppc64/kernel/prom_init.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/prom_init.c 2004-10-20 18:38:08.911500096 +1000 +++ linux-work/arch/ppc64/kernel/prom_init.c 2004-10-21 11:30:23.570248584 +1000 @@ -675,7 +675,7 @@ if ( RELOC(of_platform) == PLATFORM_PSERIES_LPAR ) RELOC(alloc_top) = RELOC(rmo_top); else - RELOC(alloc_top) = min(0x40000000ul, RELOC(ram_top)); + RELOC(alloc_top) = RELOC(rmo_top) = min(0x40000000ul, RELOC(ram_top)); RELOC(alloc_bottom) = PAGE_ALIGN(RELOC(klimit) - offset + 0x4000); RELOC(alloc_top_high) = RELOC(ram_top); From david at gibson.dropbear.id.au Thu Oct 21 11:32:07 2004 From: david at gibson.dropbear.id.au (David Gibson) Date: Thu, 21 Oct 2004 11:32:07 +1000 Subject: [PPC64] Don't build virtual IO drivers for PowerMac Message-ID: <20041021013207.GH17760@zax> Andrew, please apply: Only compile vio.c on iSeries and pSeries, since other PPC64 platforms (PowerMac) don't use virtual IO. The resulting #ifdefs in dma.c are kind of ugly, but at least contained, and I can't see a nicer way of doing it for the time being. 
Signed-off-by: David Gibson Index: working-2.6/arch/ppc64/kernel/Makefile =================================================================== --- working-2.6.orig/arch/ppc64/kernel/Makefile 2004-09-28 10:22:13.000000000 +1000 +++ working-2.6/arch/ppc64/kernel/Makefile 2004-10-05 15:47:16.541962864 +1000 @@ -11,7 +11,7 @@ udbg.o binfmt_elf32.o sys_ppc32.o ioctl32.o \ ptrace32.o signal32.o rtc.o init_task.o \ lmb.o cputable.o cpu_setup_power4.o idle_power4.o \ - iommu.o sysfs.o vio.o + iommu.o sysfs.o obj-$(CONFIG_PPC_OF) += of_device.o @@ -45,6 +45,7 @@ obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o obj-$(CONFIG_BOOTX_TEXT) += btext.o obj-$(CONFIG_HVCS) += hvcserver.o +obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_PPC_PMAC) += pmac_setup.o pmac_feature.o pmac_pci.o \ pmac_time.o pmac_nvram.o pmac_low_i2c.o \ Index: working-2.6/arch/ppc64/Kconfig =================================================================== --- working-2.6.orig/arch/ppc64/Kconfig 2004-09-28 10:22:13.000000000 +1000 +++ working-2.6/arch/ppc64/Kconfig 2004-10-05 15:47:16.541962864 +1000 @@ -110,6 +110,11 @@ processors, that is, which share physical processors between two or more partitions. 
+config IBMVIO + depends on PPC_PSERIES || PPC_ISERIES + bool + default y + config U3_DART bool depends on PPC_MULTIPLATFORM Index: working-2.6/arch/ppc64/kernel/dma.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/dma.c 2004-08-09 09:51:38.000000000 +1000 +++ working-2.6/arch/ppc64/kernel/dma.c 2004-10-05 16:02:01.372034952 +1000 @@ -17,8 +17,10 @@ { if (dev->bus == &pci_bus_type) return pci_dma_supported(to_pci_dev(dev), mask); +#ifdef CONFIG_IBMVIO if (dev->bus == &vio_bus_type) return vio_dma_supported(to_vio_dev(dev), mask); +#endif /* CONFIG_IBMVIO */ BUG(); return 0; } @@ -28,8 +30,10 @@ { if (dev->bus == &pci_bus_type) return pci_set_dma_mask(to_pci_dev(dev), dma_mask); +#ifdef CONFIG_IBMVIO if (dev->bus == &vio_bus_type) return vio_set_dma_mask(to_vio_dev(dev), dma_mask); +#endif /* CONFIG_IBMVIO */ BUG(); return 0; } @@ -40,8 +44,10 @@ { if (dev->bus == &pci_bus_type) return pci_alloc_consistent(to_pci_dev(dev), size, dma_handle); +#ifdef CONFIG_IBMVIO if (dev->bus == &vio_bus_type) return vio_alloc_consistent(to_vio_dev(dev), size, dma_handle); +#endif /* CONFIG_IBMVIO */ BUG(); return NULL; } @@ -52,8 +58,10 @@ { if (dev->bus == &pci_bus_type) pci_free_consistent(to_pci_dev(dev), size, cpu_addr, dma_handle); +#ifdef CONFIG_IBMVIO else if (dev->bus == &vio_bus_type) vio_free_consistent(to_vio_dev(dev), size, cpu_addr, dma_handle); +#endif /* CONFIG_IBMVIO */ else BUG(); } @@ -64,8 +72,10 @@ { if (dev->bus == &pci_bus_type) return pci_map_single(to_pci_dev(dev), cpu_addr, size, (int)direction); +#ifdef CONFIG_IBMVIO if (dev->bus == &vio_bus_type) return vio_map_single(to_vio_dev(dev), cpu_addr, size, direction); +#endif /* CONFIG_IBMVIO */ BUG(); return (dma_addr_t)0; } @@ -76,8 +86,10 @@ { if (dev->bus == &pci_bus_type) pci_unmap_single(to_pci_dev(dev), dma_addr, size, (int)direction); +#ifdef CONFIG_IBMVIO else if (dev->bus == &vio_bus_type) vio_unmap_single(to_vio_dev(dev), dma_addr, size, 
direction); +#endif /* CONFIG_IBMVIO */ else BUG(); } @@ -89,8 +101,10 @@ { if (dev->bus == &pci_bus_type) return pci_map_page(to_pci_dev(dev), page, offset, size, (int)direction); +#ifdef CONFIG_IBMVIO if (dev->bus == &vio_bus_type) return vio_map_page(to_vio_dev(dev), page, offset, size, direction); +#endif /* CONFIG_IBMVIO */ BUG(); return (dma_addr_t)0; } @@ -101,8 +115,10 @@ { if (dev->bus == &pci_bus_type) pci_unmap_page(to_pci_dev(dev), dma_address, size, (int)direction); +#ifdef CONFIG_IBMVIO else if (dev->bus == &vio_bus_type) vio_unmap_page(to_vio_dev(dev), dma_address, size, direction); +#endif /* CONFIG_IBMVIO */ else BUG(); } @@ -113,8 +129,10 @@ { if (dev->bus == &pci_bus_type) return pci_map_sg(to_pci_dev(dev), sg, nents, (int)direction); +#ifdef CONFIG_IBMVIO if (dev->bus == &vio_bus_type) return vio_map_sg(to_vio_dev(dev), sg, nents, direction); +#endif /* CONFIG_IBMVIO */ BUG(); return 0; } @@ -125,8 +143,10 @@ { if (dev->bus == &pci_bus_type) pci_unmap_sg(to_pci_dev(dev), sg, nhwentries, (int)direction); +#ifdef CONFIG_IBMVIO else if (dev->bus == &vio_bus_type) vio_unmap_sg(to_vio_dev(dev), sg, nhwentries, direction); +#endif /* CONFIG_IBMVIO */ else BUG(); } -- David Gibson | For every complex problem there is a david AT gibson.dropbear.id.au | solution which is simple, neat and | wrong. http://www.ozlabs.org/people/dgibson From david at gibson.dropbear.id.au Thu Oct 21 11:35:49 2004 From: david at gibson.dropbear.id.au (David Gibson) Date: Thu, 21 Oct 2004 11:35:49 +1000 Subject: [PPC64] Trivial sparse cleanups Message-ID: <20041021013549.GI17760@zax> Andrew, please apply: This patch squashes a handful of assorted sparse warnings in the ppc64 code. Should be pretty much trivial and self explanatory. 
Signed-off-by: David Gibson Index: working-2.6/arch/ppc64/kernel/nvram.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/nvram.c 2004-09-24 10:14:09.000000000 +1000 +++ working-2.6/arch/ppc64/kernel/nvram.c 2004-10-21 11:34:39.057902952 +1000 @@ -77,7 +77,7 @@ } -static ssize_t dev_nvram_read(struct file *file, char *buf, +static ssize_t dev_nvram_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) { ssize_t len; @@ -117,7 +117,7 @@ } -static ssize_t dev_nvram_write(struct file *file, const char *buf, +static ssize_t dev_nvram_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { ssize_t len; Index: working-2.6/arch/ppc64/kernel/setup.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/setup.c 2004-10-05 10:08:10.000000000 +1000 +++ working-2.6/arch/ppc64/kernel/setup.c 2004-10-21 11:34:39.059902648 +1000 @@ -1111,7 +1111,7 @@ { /* ensure xmon is enabled */ xmon_init(); - debugger(0); + debugger(NULL); return 0; } Index: working-2.6/arch/ppc64/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/ppc64/mm/hugetlbpage.c 2004-10-20 10:52:39.000000000 +1000 +++ working-2.6/arch/ppc64/mm/hugetlbpage.c 2004-10-21 11:34:39.060902496 +1000 @@ -249,7 +249,7 @@ { if (within_hugepage_high_range(addr, len)) return 0; - else if ((addr < 0x100000000) && ((addr+len) < 0x100000000)) { + else if ((addr < 0x100000000UL) && ((addr+len) < 0x100000000UL)) { int err; /* Yes, we need both tests, in case addr+len overflows * 64-bit arithmetic */ Index: working-2.6/arch/ppc64/mm/hash_utils.c =================================================================== --- working-2.6.orig/arch/ppc64/mm/hash_utils.c 2004-09-28 10:22:13.000000000 +1000 +++ working-2.6/arch/ppc64/mm/hash_utils.c 2004-10-21 11:34:39.060902496 +1000 @@ -401,7 +401,7 @@ info.si_signo = SIGBUS; info.si_errno = 
0;
 		info.si_code = BUS_ADRERR;
-		info.si_addr = (void *)address;
+		info.si_addr = (void __user *)address;
 		force_sig_info(SIGBUS, &info, current);
 		return;
 	}

-- 
David Gibson			| For every complex problem there is a
david AT gibson.dropbear.id.au	| solution which is simple, neat and
				| wrong.
http://www.ozlabs.org/people/dgibson

From benh at kernel.crashing.org Thu Oct 21 11:51:10 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Thu, 21 Oct 2004 11:51:10 +1000
Subject: status of ppc64 patches
In-Reply-To: <20041020184501.GF10026@austin.ibm.com>
References: <41754644.1010003@austin.ibm.com>
	<1098231748.7493.114.camel@pants.austin.ibm.com>
	<20041020010301.GA29579@4>
	<20041020184501.GF10026@austin.ibm.com>
Message-ID: <1098323469.20954.27.camel@gaston>

On Thu, 2004-10-21 at 04:45, Linas Vepstas wrote:

> Put it another way: it is, at this time, impossible for me to rebase,
> because I know that my patches will conflict with others in the
> un-applied patch queue. So all I can do is wait for the patch queue to
> shrink, wait till the others get into the Torvalds tree, then bk pull,
> then hurry, hurry, rebase, test, submit, and hope I get in before
> someone else does and wrecks it again. The turn-around time for
> "getting lucky" like this is over a month, and if one doesn't get
> lucky the first month, one has to wait a whole 'nother month for
> one's next shot.

For some reason, it seems other people have a lot more luck than you
do ...

Ben.

From benh at kernel.crashing.org Thu Oct 21 11:55:32 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Thu, 21 Oct 2004 11:55:32 +1000
Subject: [discuss] Re: [PATCH] Add key management syscalls to non-i386 archs
In-Reply-To: <20041020160450.0914270b.davem@davemloft.net>
References: <3506.1098283455@redhat.com>
	<20041020150149.7be06d6d.davem@davemloft.net>
	<20041020225625.GD995@wotan.suse.de>
	<20041020160450.0914270b.davem@davemloft.net>
Message-ID: <1098323732.20955.31.camel@gaston>

On Thu, 2004-10-21 at 09:04, David S. Miller wrote:
> On Thu, 21 Oct 2004 00:56:25 +0200
> Andi Kleen wrote:
>
> > I don't think that's a good idea. Normally new system calls
> > are relatively obscure and the system works fine without them,
> > so urgent action is not needed.
> >
> > And I think we can trust architecture maintainers to regularly
> > sync the system calls with i386.
>
> I disagree quite strongly. One major frustration for users of
> non-x86 platforms is that functionality is often missing for some
> time that we can make trivial to keep in sync.

I agree with David here. It's also easy for arch/platform maintainers to
"miss" a new syscall too ... for various reasons, we can't all read
_everything_ that gets posted to lkml and we all do occasionally miss
some csets going upstream, which means we can very well totally "forget"
about adding the new syscall to the arch ... until somebody complains,
which can be 1 or 2 releases later !

> I religiously watch what goes into Linus's tree for this purpose,
> but that is kind of a ridiculous burden to expect every platform
> maintainer to do. It's not just system calls, we have signal handling
> bug fixes, trap handling infrastructure, and now the nice generic
> IRQ handling subsystem as other examples.

Right.

> Simply put, if you're not watching the tree in painstaking detail
> every day, you miss all of these enhancements.
> > The knowledge should come from the person putting the changes into > the tree, therefore it gets done once and this makes it so that > the other platform maintainers will find out about it automatically > next time they update their tree. Agreed, Ben. From david at gibson.dropbear.id.au Thu Oct 21 13:36:17 2004 From: david at gibson.dropbear.id.au (David Gibson) Date: Thu, 21 Oct 2004 13:36:17 +1000 Subject: [PPC64] xmon sparse cleanups Message-ID: <20041021033617.GK17760@zax> Andrew, please apply: This patch removes many sparse warnings from the xmon code. Mostly K&R function declarations and 0-instead-of-NULLs. There are still a whole bunch of warnings in xmon/ppc-opc.c, which is a copy of a file from binutils. Signed-off-by: David Gibson Index: working-2.6/arch/ppc64/xmon/xmon.c =================================================================== --- working-2.6.orig/arch/ppc64/xmon/xmon.c 2004-09-24 10:14:09.000000000 +1000 +++ working-2.6/arch/ppc64/xmon/xmon.c 2004-10-05 16:31:01.822963256 +1000 @@ -645,7 +645,7 @@ for (i = 0; i < NBPTS; ++i, ++bp) if (bp->enabled && pc == bp->address) return bp; - return 0; + return NULL; } static struct bpt *in_breakpoint_table(unsigned long nip, unsigned long *offp) @@ -1582,7 +1582,7 @@ extern char dec_exc; void -super_regs() +super_regs(void) { int cmd; unsigned long val; @@ -1816,7 +1816,7 @@ ""; void -memex() +memex(void) { int cmd, inc, i, nslash; unsigned long n; @@ -1967,7 +1967,7 @@ } int -bsesc() +bsesc(void) { int c; @@ -1985,7 +1985,7 @@ || ('a' <= (c) && (c) <= 'f') \ || ('A' <= (c) && (c) <= 'F')) void -dump() +dump(void) { int c; @@ -2150,7 +2150,7 @@ static unsigned mask; void -memlocate() +memlocate(void) { unsigned a, n; unsigned char val[4]; @@ -2183,7 +2183,7 @@ static unsigned long mlim = 0xffffffff; void -memzcan() +memzcan(void) { unsigned char v; unsigned a; @@ -2212,7 +2212,7 @@ /* Input scanning routines */ int -skipbl() +skipbl(void) { int c; @@ -2237,8 +2237,7 @@ }; int -scanhex(vp) -unsigned 
long *vp;
+scanhex(unsigned long *vp)
 {
 	int c, d;
 	unsigned long v;
@@ -2322,7 +2321,7 @@
 }
 
 void
-scannl()
+scannl(void)
 {
 	int c;
 
@@ -2365,13 +2364,13 @@
 static char *lineptr;
 
 void
-flush_input()
+flush_input(void)
 {
 	lineptr = NULL;
 }
 
 int
-inchar()
+inchar(void)
 {
 	if (lineptr == NULL || *lineptr == 0) {
 		if (fgets(line, sizeof(line), stdin) == NULL) {
@@ -2384,8 +2383,7 @@
 }
 
 void
-take_input(str)
-char *str;
+take_input(char *str)
 {
 	lineptr = str;
 }
Index: working-2.6/arch/ppc64/xmon/start.c
===================================================================
--- working-2.6.orig/arch/ppc64/xmon/start.c	2004-08-09 09:51:38.000000000 +1000
+++ working-2.6/arch/ppc64/xmon/start.c	2004-10-05 16:33:50.355028808 +1000
@@ -173,7 +173,7 @@
 		c = xmon_getchar();
 		if (c == -1) {
 			if (p == str)
-				return 0;
+				return NULL;
 			break;
 		}
 		*p++ = c;

-- 
David Gibson			| For every complex problem there is a
david AT gibson.dropbear.id.au	| solution which is simple, neat and
				| wrong.
http://www.ozlabs.org/people/dgibson

From wjfast at yahoo.com Thu Oct 21 16:33:30 2004
From: wjfast at yahoo.com (Wjeeha Tahir)
Date: Wed, 20 Oct 2004 23:33:30 -0700 (PDT)
Subject: Booting Linux from HardDisk on iSeries
Message-ID: <20041021063330.34212.qmail@web14926.mail.yahoo.com>

Hi,

This is my first email on this group, and I am really hopeful to find a
solution to my problem here. I was installing Linux on iSeries in my
office but was getting problems.

I have installed RedHat Linux 9 on an iSeries machine in LPAR. The
version of kernel as given by the uname -a command is 2.4.21-4.EL

However, after installation is complete I want to boot from disk rather
than the cd drive. I think there is some need to copy some boot image
onto the disk. I looked at the Technical FAQ for Linux on iSeries:
http://www-1.ibm.com/servers/eserver/iseries/linux/tech_faq.html#kernel
and performed the following steps.

I executed the command fdisk -l and the output was as follows:

Disk /dev/iseries/vda: 4194 MB, 4194892800 bytes
255 heads, 63 sectors/track, 510 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

           Device Boot    Start       End    Blocks   Id  System
/dev/iseries/vda1   *         1         2     16033+  41  PPC PReP Boot
/dev/iseries/vda2             3       384   3068415   83  Linux
/dev/iseries/vda3           385       510   1012095   82  Linux swap

Hence my PReP partition is /dev/iseries/vda1

Next the implementation document tells me to execute the command

dd if=/boot/vmlinux/good of=/dev/iseries/vda1 bs=4k

However, the problem is that there is no file by the name of
vmlinux.good in the boot directory. I'll show you the listing of the
boot directory.

[root at TestLinux /]# cd /boot
[root at TestLinux boot]# ls
cmdline-2.4.21-4.EL     kernel.h    System.map-2.4.21-4.EL
config-2.4.21-4.EL      message     vmlinitrd-2.4.21-4.EL
grub                    message.ja  vmlinux-2.4.21-4.EL
initrd-2.4.21-4.EL.img  System.map

Now I am at a loss as to what should be the input file for the dd
command. I tried the command:

dd if=/boot/vmlinitrd-2.4.21-4.EL of=/dev/iseries/vda1 bs=4k

but when I booted from "IPL Source" = *NWSSTG ,"Stream file" = *NONE,
"IPL parameters" = 'root=/dev/iseries/vda1," , the Linux doesnt boot and
I get the following error:

Partition check:
 iseries/vda: iseries/vda1 iseries/vda2 iseries/vda3
iSeries virtual I/O: viod: Disk 00 size 4000M, sectors 63, heads 255, cylinders 510, sectsize 512
iSeries virtual I/O: viod: Disk 00 partition 01 start sector 63, # sector 32067
iSeries virtual I/O: viod: Disk 00 partition 02 start sector 32130, # sector 6136830
iSeries virtual I/O: viod: Disk 00 partition 03 start sector 6168960, # sector 2024190
Loading jbd.o module
Journalled Block Device driver loaded
Loading ext3.o module
Mounting /proc filesystem
Creating block devices
Creating root device
Mounting root filesystem
VFS: Can't find ext3 filesystem on dev viod(112,1).
mount: error 22 mounting ext3
pivotroot: pivot_root(/sysroot,/sysroot/initrd) failed: 2
umount /initrd/proc failed: 2
Freeing unused kernel memory: 156k init
Kernel panic: No init found. Try passing init= option to kernel.
Rebooting in 180 seconds..

Can anyone tell me the exact command specifying what to copy from where
and to where. I'll be very thankful if you could help me in this.

Kind Regards,
Wjeeha Tahir

From sfr at canb.auug.org.au Thu Oct 21 18:05:46 2004
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Thu, 21 Oct 2004 18:05:46 +1000
Subject: Booting Linux from HardDisk on iSeries
In-Reply-To: <20041021063330.34212.qmail@web14926.mail.yahoo.com>
References: <20041021063330.34212.qmail@web14926.mail.yahoo.com>
Message-ID: <20041021180546.780f3090.sfr@canb.auug.org.au>

On Wed, 20 Oct 2004 23:33:30 -0700 (PDT) Wjeeha Tahir wrote:
>
> but when I booted from "IPL Source" = *NWSSTG ,"Stream file" = *NONE,
> "IPL parameters" = 'root=/dev/iseries/vda1," , the Linux doesnt boot and
                                        ^^^^
This should be vda2 ...

Linux did boot, it just could not find its root file system ...

-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

From wjfast at yahoo.com Thu Oct 21 18:30:41 2004
From: wjfast at yahoo.com (Wjeeha Tahir)
Date: Thu, 21 Oct 2004 01:30:41 -0700 (PDT)
Subject: Booting Linux from HardDisk on iSeries
In-Reply-To: <20041021180546.780f3090.sfr@canb.auug.org.au>
Message-ID: <20041021083041.82774.qmail@web14921.mail.yahoo.com>

I changed to vda2 but now Linux isn't booting at all. When the console
connects to iSeries then the screen is blank. The errors that were
being given initially are not appearing now. I am totally stuck. Please
help in this regard.

Stephen Rothwell wrote:
On Wed, 20 Oct 2004 23:33:30 -0700 (PDT) Wjeeha Tahir wrote:
>
> but when I booted from "IPL Source" = *NWSSTG ,"Stream file" = *NONE,
> "IPL parameters" = 'root=/dev/iseries/vda1," , the Linux doesnt boot and
                                        ^^^^
This should be vda2 ...

Linux did boot, it just could not find its root file system ...

-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

From jbglaw at lug-owl.de Thu Oct 21 18:47:29 2004
From: jbglaw at lug-owl.de (Jan-Benedict Glaw)
Date: Thu, 21 Oct 2004 10:47:29 +0200
Subject: [discuss] Re: [PATCH] Add key management syscalls to non-i386 archs
In-Reply-To: <20041020160450.0914270b.davem@davemloft.net>
References: <3506.1098283455@redhat.com>
	<20041020150149.7be06d6d.davem@davemloft.net>
	<20041020225625.GD995@wotan.suse.de>
	<20041020160450.0914270b.davem@davemloft.net>
Message-ID: <20041021084728.GA5033@lug-owl.de>

On Wed, 2004-10-20 16:04:50 -0700, David S. Miller
wrote in message <20041020160450.0914270b.davem at davemloft.net>:
> On Thu, 21 Oct 2004 00:56:25 +0200
> Andi Kleen wrote:

*VAX hacker's hat on*

> I disagree quite strongly. One major frustration for users of
> non-x86 platforms is that functionality is often missing for some
> time that we can make trivial to keep in sync.

Full ACK.

> Simply put, if you're not watching the tree in painstaking detail
> every day, you miss all of these enhancements.

Right; and these missing enhancements will cause extra pain when they're
used some time later from core code. That is, you missed the feature
while it was discussed/accepted and need to put it in place later on. So
you've got to do extra searching etc.

> The knowledge should come from the person putting the changes into
> the tree, therefore it gets done once and this makes it so that
> the other platform maintainers will find out about it automatically
> next time they update their tree.

Here's my proposal:

$ mkdir ./Documentation/new_enhancements_to_implement
$ cat ./Documentation/new_enhancements_to_implement/new_key_syscalls << EOF
> Dear Architecture Maintainers,
>
> please add these four new cryptographic key functions to your syscall
> table. It's quite easy; just extend the ./include/arch-xxx/unistd.h
> for four new defines and then add them to your ./arch/xxx/kernel/entry.S
> file.
For reference, here's my i386 patch doing this:
>
> diff -Nurp
> --- path-old/to/file/one
> +++ path-new/to/file/one
> text
> -del
> +add
> more text
>
>
> Thanks, your keychain hacker:-)
> EOF
$

This way, all arch maintainers just *see* what needs to be done and get
a small introduction on how to do that.

I'd *really* like to see that! That would particularly help those that
cannot do full-time hacking on their port (like us VAX hackers:-)

MfG, JBG

-- 
Jan-Benedict Glaw       jbglaw at lug-owl.de    . +49-172-7608481             _ O _
"Eine Freie Meinung in  einem Freien Kopf    | Gegen Zensur | Gegen Krieg  _ _ O
 fuer einen Freien Staat voll Freier Bürger" | im Internet! |   im Irak!   O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));

From wjfast at yahoo.com Thu Oct 21 19:33:32 2004
From: wjfast at yahoo.com (Wjeeha Tahir)
Date: Thu, 21 Oct 2004 02:33:32 -0700 (PDT)
Subject: Fwd: Re: Booting Linux from HardDisk on iSeries
Message-ID: <20041021093332.48090.qmail@web14927.mail.yahoo.com>

Just a correction.. after i changed to vda2 the following errors appear
on the console:

mf.c: Preparing to bounce...
LINUXRH : Console connected.
pty: 2048 Unix98 ptys configured
NET4: Frame Diverter 0.46
RAMDISK driver initialized: 256 RAM disks of 8192K size 1024 blocksize
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
Initializing Cryptographic API
NET4: Linux TCP/IP 1.0 for NET4.0
IP: routing cache hash table of 2048 buckets, 32Kbytes
TCP: Hash tables configured (established 16384 bind 16384)
Linux IP multicast router 0.06 plus PIM-SM
Initializing IPsec netlink socket
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 788k freed
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 156k init
Kernel panic: No init found. Try passing init= option to kernel.
Rebooting in 180 seconds..

What to do now??

Thanks and Kind Regards,
Wjeeha Tahir

Wjeeha Tahir wrote:
Date: Thu, 21 Oct 2004 01:30:41 -0700 (PDT)
From: Wjeeha Tahir
Subject: Re: Booting Linux from HardDisk on iSeries
To: Stephen Rothwell
CC: linuxppc64-dev at ozlabs.org

I changed to vda2 but now Linux isn't booting at all. When the console
connects to iSeries then the screen is blank. The errors that were
being given initially are not appearing now. I am totally stuck. Please
help in this regard.

Stephen Rothwell wrote:
On Wed, 20 Oct 2004 23:33:30 -0700 (PDT) Wjeeha Tahir wrote:
>
> but when I booted from "IPL Source" = *NWSSTG ,"Stream file" = *NONE,
> "IPL parameters" = 'root=/dev/iseries/vda1," , the Linux doesnt boot and
                                        ^^^^
This should be vda2 ...

Linux did boot, it just could not find its root file system ...

-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

From cchaney at us.ibm.com Thu Oct 21 23:20:54 2004
From: cchaney at us.ibm.com (Craig Chaney)
Date: Thu, 21 Oct 2004 09:20:54 -0400
Subject: 2.6.9-rc4 kernel -- "cannot find space for TCE table"
In-Reply-To: <1098322258.4183.15.camel@gaston>
References: <1097887510.6487.23.camel@gaston>
	<20041019230054.GA3807@kevlar.burdell.org>
	<1098229131.5792.9.camel@gaston>
	<20041020170420.GA8345@sage.raleigh.ibm.com>
	<1098322258.4183.15.camel@gaston>
Message-ID: <20041021132054.GA15732@sage.raleigh.ibm.com>

On Thu, Oct 21, 2004 at 11:30:59AM +1000, Benjamin Herrenschmidt wrote:
> Yes, alloc_top and rmo_top should be both "clamped". Can you try that
> patch and let me know ?

Yup, it worked. Your patch allows 2.6.9-rc4 to boot on a p615.

Thanks,
Craig

From sonny at burdell.org Fri Oct 22 01:41:04 2004
From: sonny at burdell.org (Sonny Rao)
Date: Thu, 21 Oct 2004 11:41:04 -0400
Subject: 2.6.9-rc4 kernel -- "cannot find space for TCE table"
In-Reply-To: <20041021132054.GA15732@sage.raleigh.ibm.com>
References: <1097887510.6487.23.camel@gaston>
	<20041019230054.GA3807@kevlar.burdell.org>
	<1098229131.5792.9.camel@gaston>
	<20041020170420.GA8345@sage.raleigh.ibm.com>
	<1098322258.4183.15.camel@gaston>
	<20041021132054.GA15732@sage.raleigh.ibm.com>
Message-ID: <20041021154104.GA15267@kevlar.burdell.org>

On Thu, Oct 21, 2004 at 09:20:54AM -0400, Craig Chaney wrote:
> On Thu, Oct 21, 2004 at 11:30:59AM +1000, Benjamin Herrenschmidt wrote:
> > Yes, alloc_top and rmo_top should be both "clamped". Can you try that
> > patch and let me know ?
>
> Yup, it worked. Your patch allows 2.6.9-rc4 to boot on a p615.

Also worked on 2.6.9 final, thanks guys.

Sonny

From mjr at us.ibm.com Fri Oct 22 01:17:44 2004
From: mjr at us.ibm.com (Mike Ranweiler)
Date: Thu, 21 Oct 2004 10:17:44 -0500
Subject: Fwd: Re: Booting Linux from HardDisk on iSeries
In-Reply-To: <20041021093332.48090.qmail@web14927.mail.yahoo.com>
References: <20041021093332.48090.qmail@web14927.mail.yahoo.com>
Message-ID: <200410211017.45047.mjr@us.ibm.com>

On Thursday 21 October 2004 04:33, Wjeeha Tahir wrote:
> Just a correction.. after i changed to vda2 the following errors appear on
> the console:

I thought RHEL3 usually used something like 'ro root=LABEL=/' for a cmdline.
The easiest way to do this is to boot from the B side and then put whatever's
in /proc/cmdline from that boot into your IPL Parameters. You can also see
that from strsst, 5, 1, 11, F10.

Mike

From benh at kernel.crashing.org Fri Oct 22 11:02:44 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 22 Oct 2004 11:02:44 +1000
Subject: [PATCH] ppc64: Fix boot on some non-LPAR pSeries
Message-ID: <1098406963.6008.13.camel@gaston>

Hi !

This patch fixes a problem when allocating the TCE tables (iommu) during
early boot on some non-LPAR machines with a lot of memory.

Signed-off-by: Benjamin Herrenschmidt

--- linux-work.orig/arch/ppc64/kernel/prom_init.c	2004-10-20 18:38:08.911500096 +1000
+++ linux-work/arch/ppc64/kernel/prom_init.c	2004-10-21 11:30:23.570248584 +1000
@@ -675,7 +675,7 @@
 	if ( RELOC(of_platform) == PLATFORM_PSERIES_LPAR )
 		RELOC(alloc_top) = RELOC(rmo_top);
 	else
-		RELOC(alloc_top) = min(0x40000000ul, RELOC(ram_top));
+		RELOC(alloc_top) = RELOC(rmo_top) = min(0x40000000ul, RELOC(ram_top));
 	RELOC(alloc_bottom) = PAGE_ALIGN(RELOC(klimit) - offset + 0x4000);
 	RELOC(alloc_top_high) = RELOC(ram_top);
 

From paulus at samba.org Fri Oct 22 11:59:17 2004
From: paulus at samba.org (Paul Mackerras)
Date: Fri, 22 Oct 2004 11:59:17 +1000
Subject: [PATCH] add syslog printing to xmon debugger.
In-Reply-To: <20040916230647.GN9645@austin.ibm.com>
References: <20040916230647.GN9645@austin.ibm.com>
Message-ID: <16760.26997.131687.456670@cargo.ozlabs.ibm.com>

Linas,

> Andrew,
>
> Please apply at least the kernel/printk.c part of the patch,
> if you are feeling at all charitable.

Did you ever get any reaction to that?

Paul.

From paulus at samba.org Fri Oct 22 13:49:48 2004
From: paulus at samba.org (Paul Mackerras)
Date: Fri, 22 Oct 2004 13:49:48 +1000
Subject: [PATCH 1/1] ppc64: Block config accesses during BIST
In-Reply-To: <200409012158.i81LwRGY176052@northrelay04.pok.ibm.com>
References: <200409012158.i81LwRGY176052@northrelay04.pok.ibm.com>
Message-ID: <16760.33628.666087.631340@cargo.ozlabs.ibm.com>

Brian King writes:

> Some PCI adapters on pSeries and iSeries hardware (ipr scsi adapters)
> have an exposure today in that they issue BIST to the adapter to reset
> the card. If, during the time it takes to complete BIST, userspace attempts
> to access PCI config space, the host bus bridge will master abort the access
> since the ipr adapter does not respond on the PCI bus for a brief period of
> time when running BIST. This master abort results in the host PCI bridge
> isolating that PCI device from the rest of the system, making the device
> unusable until Linux is rebooted. This patch is an attempt to close that
> exposure by introducing some blocking code in the arch specific PCI code.
> The intent is to have the ipr device driver invoke these routines to
> prevent userspace PCI accesses from occurring during this window.
>
> It has been tested by running BIST on an ipr adapter while running a
> script which looped reading the config space of that adapter through sysfs.
> Without the patch, an EEH error occurs. With the patch there is no EEH
> error. Tested on Power 5 and iSeries Power 4.

The general idea seems fine to me.
There are a couple of things I don't like about the patch though:

(1) I don't see why we need separate implementations of
    pci_block_config_io, pci_unblock_config_io and pci_start_bist for
    iSeries and for the rest. (Maybe that just points up that we
    still have gratuitous differences between the iSeries and
    non-iSeries PCI code.)

(2) I don't think we need to add a spinlock to the device node
    structure. A single global spinlock should suffice, particularly
    since we get serialized on the RTAS call anyway, and therefore
    there is no incentive to try to provide parallelism at the higher
    levels.

Comments?

Paul.

From nathanl at austin.ibm.com Fri Oct 22 19:19:56 2004
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Fri, 22 Oct 2004 04:19:56 -0500
Subject: [PATCH] ppc64: cpu hotplug notifier for numa
Message-ID: <1098436795.17305.22.camel@biclops>

The NUMA properties of all "possible" cpus are not necessarily
available at boot time on pSeries LPAR. Only the properties for
present cpus are known.

This patch modifies the ppc64 numa code to map a cpu to its node right
before it is brought up -- this means that secondary cpus are now
mapped to their nodes during smp_init() (regardless of whether
CONFIG_HOTPLUG_CPU=y). Cpus are removed from their nodes after they
have gone offline.

Also some minor cleanups:

- Stash the "minimum common depth" in a global at boot time, so we
  don't have to rediscover it every time something changes.

- Remove unnecessary variable from of_get_associativity() which is
  accessed while possibly uninitialized.

- Remove the cpu portion from dump_numa_topology() since it will show
  only the boot cpu now. We could display this information from
  smp_cpus_done() if necessary.

Tested on a 4-way 2-node Power5 system.

Signed-off-by: Nathan Lynch --- numa.c | 192 +++++++++++++++++++++++++-------------------- 1 files changed, 108 insertions(+), 84 deletions(-) diff -puN arch/ppc64/mm/numa.c~ppc64-numa-cpu-hotplug-notifier arch/ppc64/mm/numa.c --- 2.6.9-bk6/arch/ppc64/mm/numa.c~ppc64-numa-cpu-hotplug-notifier 2004-10-22 01:37:04.000000000 -0500 +++ 2.6.9-bk6-nathanl/arch/ppc64/mm/numa.c 2004-10-22 01:37:04.000000000 -0500 @@ -15,6 +15,8 @@ #include #include #include +#include +#include #include #include #include @@ -39,6 +41,7 @@ int nr_cpus_in_node[MAX_NUMNODES] = { [0 struct pglist_data *node_data[MAX_NUMNODES]; bootmem_data_t __initdata plat_node_bdata[MAX_NUMNODES]; static unsigned long node0_io_hole_size; +static int min_common_depth; /* * We need somewhere to store start/span for each node until we have @@ -64,7 +67,24 @@ static inline void map_cpu_to_node(int c } } -static struct device_node * __init find_cpu_node(unsigned int cpu) +#ifdef CONFIG_HOTPLUG_CPU +static void unmap_cpu_from_node(unsigned long cpu) +{ + int node = numa_cpu_lookup_table[cpu]; + + dbg("removing cpu %lu from node %d\n", cpu, node); + + if (cpu_isset(cpu, numa_cpumask_lookup_table[node])) { + cpu_clear(cpu, numa_cpumask_lookup_table[node]); + nr_cpus_in_node[node]--; + } else { + printk(KERN_ERR "WARNING: cpu %lu not found in node %d\n", + cpu, node); + } +} +#endif /* CONFIG_HOTPLUG_CPU */ + +static struct device_node * __devinit find_cpu_node(unsigned int cpu) { unsigned int hw_cpuid = get_hard_smp_processor_id(cpu); struct device_node *cpu_node = NULL; @@ -96,26 +116,21 @@ static struct device_node * __init find_ /* must hold reference to node during call */ static int *of_get_associativity(struct device_node *dev) - { - unsigned int *result; - int len; - - result = (unsigned int *)get_property(dev, "ibm,associativity", &len); - - if (len <= 0) - return NULL; - - return result; +{ + return (unsigned int *)get_property(dev, "ibm,associativity", NULL); } -static int of_node_numa_domain(struct 
device_node *device, int depth) +static int of_node_numa_domain(struct device_node *device) { int numa_domain; unsigned int *tmp; + if (min_common_depth == -1) + return 0; + tmp = of_get_associativity(device); - if (tmp && (tmp[0] >= depth)) { - numa_domain = tmp[depth]; + if (tmp && (tmp[0] >= min_common_depth)) { + numa_domain = tmp[min_common_depth]; } else { dbg("WARNING: no NUMA information for %s\n", device->full_name); @@ -138,7 +153,7 @@ static int of_node_numa_domain(struct de * * - Dave Hansen */ -static int find_min_common_depth(void) +static int __init find_min_common_depth(void) { int depth; unsigned int *ref_points; @@ -185,11 +200,72 @@ static unsigned long read_cell_ul(struct return result; } +/* + * Figure out to which domain a cpu belongs and stick it there. + * Return the id of the domain used. + */ +static int numa_setup_cpu(unsigned long lcpu) +{ + int numa_domain = 0; + struct device_node *cpu = find_cpu_node(lcpu); + + if (!cpu) { + WARN_ON(1); + goto out; + } + + numa_domain = of_node_numa_domain(cpu); + + if (numa_domain >= MAX_NUMNODES) { + /* + * POWER4 LPAR uses 0xffff as invalid node, + * dont warn in this case. 
+ */ + if (numa_domain != 0xffff) + printk(KERN_ERR "WARNING: cpu %ld " + "maps to invalid NUMA node %d\n", + lcpu, numa_domain); + numa_domain = 0; + } +out: + node_set_online(numa_domain); + + map_cpu_to_node(lcpu, numa_domain); + + of_node_put(cpu); + + return numa_domain; +} + +static int cpu_numa_callback(struct notifier_block *nfb, + unsigned long action, + void *hcpu) +{ + unsigned long lcpu = (unsigned long)hcpu; + int ret = NOTIFY_DONE; + + switch (action) { + case CPU_UP_PREPARE: + if (min_common_depth == -1 || !numa_enabled) + map_cpu_to_node(lcpu, 0); + else + numa_setup_cpu(lcpu); + ret = NOTIFY_OK; + break; +#ifdef CONFIG_HOTPLUG_CPU + case CPU_DEAD: + case CPU_UP_CANCELED: + unmap_cpu_from_node(lcpu); + break; + ret = NOTIFY_OK; +#endif + } + return ret; +} + static int __init parse_numa_properties(void) { - struct device_node *cpu = NULL; struct device_node *memory = NULL; - int depth; int max_domain = 0; long entries = lmb_end_of_DRAM() >> MEMORY_INCREMENT_SHIFT; unsigned long i; @@ -206,44 +282,13 @@ static int __init parse_numa_properties( for (i = 0; i < entries ; i++) numa_memory_lookup_table[i] = ARRAY_INITIALISER; - depth = find_min_common_depth(); - - dbg("NUMA associativity depth for CPU/Memory: %d\n", depth); - if (depth < 0) - return depth; - - for_each_cpu(i) { - int numa_domain; - - cpu = find_cpu_node(i); - - if (cpu) { - numa_domain = of_node_numa_domain(cpu, depth); - of_node_put(cpu); - - if (numa_domain >= MAX_NUMNODES) { - /* - * POWER4 LPAR uses 0xffff as invalid node, - * dont warn in this case. 
- */ - if (numa_domain != 0xffff) - printk(KERN_ERR "WARNING: cpu %ld " - "maps to invalid NUMA node %d\n", - i, numa_domain); - numa_domain = 0; - } - } else { - dbg("WARNING: no NUMA information for cpu %ld\n", i); - numa_domain = 0; - } - - node_set_online(numa_domain); + min_common_depth = find_min_common_depth(); - if (max_domain < numa_domain) - max_domain = numa_domain; + dbg("NUMA associativity depth for CPU/Memory: %d\n", min_common_depth); + if (min_common_depth < 0) + return min_common_depth; - map_cpu_to_node(i, numa_domain); - } + max_domain = numa_setup_cpu(boot_cpuid); memory = NULL; while ((memory = of_find_node_by_type(memory, "memory")) != NULL) { @@ -267,7 +312,7 @@ new_range: start = _ALIGN_DOWN(start, MEMORY_INCREMENT); size = _ALIGN_UP(size, MEMORY_INCREMENT); - numa_domain = of_node_numa_domain(memory, depth); + numa_domain = of_node_numa_domain(memory); if (numa_domain >= MAX_NUMNODES) { if (numa_domain != 0xffff) @@ -341,8 +386,7 @@ static void __init setup_nonnuma(void) numa_memory_lookup_table[i] = ARRAY_INITIALISER; } - for (i = 0; i < NR_CPUS; i++) - map_cpu_to_node(i, 0); + map_cpu_to_node(boot_cpuid, 0); node_set_online(0); @@ -358,35 +402,10 @@ static void __init setup_nonnuma(void) static void __init dump_numa_topology(void) { unsigned int node; - unsigned int cpu, count; + unsigned int count; - for (node = 0; node < MAX_NUMNODES; node++) { - if (!node_online(node)) - continue; - - printk(KERN_INFO "Node %d CPUs:", node); - - count = 0; - /* - * If we used a CPU iterator here we would miss printing - * the holes in the cpumap. 
-		 */
-		for (cpu = 0; cpu < NR_CPUS; cpu++) {
-			if (cpu_isset(cpu, numa_cpumask_lookup_table[node])) {
-				if (count == 0)
-					printk(" %u", cpu);
-				++count;
-			} else {
-				if (count > 1)
-					printk("-%u", cpu - 1);
-				count = 0;
-			}
-		}
-
-		if (count > 1)
-			printk("-%u", NR_CPUS - 1);
-		printk("\n");
-	}
+	if (min_common_depth == -1 || !numa_enabled)
+		return;
 
 	for (node = 0; node < MAX_NUMNODES; node++) {
 		unsigned long i;
@@ -414,6 +433,7 @@ static void __init dump_numa_topology(vo
 			printk("-0x%lx", i);
 		printk("\n");
 	}
+	return;
 }
 
 /*
@@ -469,6 +489,10 @@ void __init do_init_bootmem(void)
 		setup_nonnuma();
 	else
 		dump_numa_topology();
+	/*
+	 * This must run before the sched domains notifier.
+	 */
+	hotcpu_notifier(cpu_numa_callback, 1);
 
 	for (nid = 0; nid < numnodes; nid++) {
 		unsigned long start_paddr, end_paddr;
_

From johnrose at austin.ibm.com Sat Oct 23 04:15:39 2004
From: johnrose at austin.ibm.com (John Rose)
Date: Fri, 22 Oct 2004 13:15:39 -0500
Subject: [PATCH] create iommu_free_table()
In-Reply-To: <4176E621.3040607@austin.ibm.com>
References: <1097171661.7087.1.camel@sinatra.austin.ibm.com>
	<4176E621.3040607@austin.ibm.com>
Message-ID: <1098468939.31847.14.camel@sinatra.austin.ibm.com>

Thanks for the comments and help... responses below.

On Wed, 2004-10-20 at 17:26, Olof Johansson wrote:
> > +	for (i = 0; i < (tbl->it_mapsize/64); i++) {
> > +		if (tbl->it_map[i] != 0) {
> > +			printk(KERN_WARNING "%s: Unexpected TCEs for %s\n",
> > +				__FUNCTION__, dn->full_name);
> > +			break;
> > +		}
>
> Could this get spammy? It could be nice to see a WARN_ON(1) too, so the
> call stack is dumped. If that's added, a printk_ratelimit() would
> definitely be warranted around both the printk and the WARN_ON().

I'd have to disagree here. Since the stack trace will always involve
the removal of a device node prompted by a write to /proc, it doesn't
reveal any useful info. The printk above includes the OF path of the
device, so any offending driver can be tracked down.
Here's a patch without the whitespace problems you pointed out. Thanks- John Signed-off-by: John Rose diff -Nru a/arch/ppc64/kernel/pSeries_iommu.c b/arch/ppc64/kernel/pSeries_iommu.c --- a/arch/ppc64/kernel/pSeries_iommu.c Fri Oct 22 13:03:21 2004 +++ b/arch/ppc64/kernel/pSeries_iommu.c Fri Oct 22 13:03:21 2004 @@ -412,6 +412,38 @@ dn->iommu_table = iommu_init_table(tbl); } +void iommu_free_table(struct device_node *dn) +{ + struct iommu_table *tbl = dn->iommu_table; + unsigned long bitmap_sz, i; + unsigned int order; + + if (!tbl || !tbl->it_map) { + printk(KERN_ERR "%s: expected TCE map for %s\n", __FUNCTION__, + dn->full_name); + return; + } + + /* verify that table contains no entries */ + /* it_mapsize is in entries, and we're examining 64 at a time */ + for (i = 0; i < (tbl->it_mapsize/64); i++) { + if (tbl->it_map[i] != 0) { + printk(KERN_WARNING "%s: Unexpected TCEs for %s\n", + __FUNCTION__, dn->full_name); + break; + } + } + + /* calculate bitmap size in bytes */ + bitmap_sz = (tbl->it_mapsize + 7) / 8; + + /* free bitmap */ + order = get_order(bitmap_sz); + free_pages((unsigned long) tbl->it_map, order); + + /* free table */ + kfree(tbl); +} void iommu_setup_pSeries(void) { diff -Nru a/arch/ppc64/kernel/prom.c b/arch/ppc64/kernel/prom.c --- a/arch/ppc64/kernel/prom.c Fri Oct 22 13:03:21 2004 +++ b/arch/ppc64/kernel/prom.c Fri Oct 22 13:03:21 2004 @@ -1818,6 +1818,9 @@ return -EBUSY; } + if (np->iommu_table) + iommu_free_table(np); + write_lock(&devtree_lock); OF_MARK_STALE(np); remove_node_proc_entries(np); diff -Nru a/include/asm-ppc64/iommu.h b/include/asm-ppc64/iommu.h --- a/include/asm-ppc64/iommu.h Fri Oct 22 13:03:21 2004 +++ b/include/asm-ppc64/iommu.h Fri Oct 22 13:03:21 2004 @@ -113,6 +113,9 @@ /* Creates table for an individual device node */ extern void iommu_devnode_init(struct device_node *dn); +/* Frees table for an individual device node */ +extern void iommu_free_table(struct device_node *dn); + #endif /* CONFIG_PPC_MULTIPLATFORM */ 
#ifdef CONFIG_PPC_ISERIES From brking at us.ibm.com Sat Oct 23 06:27:59 2004 From: brking at us.ibm.com (brking at us.ibm.com) Date: Fri, 22 Oct 2004 15:27:59 -0500 Subject: [PATCH 2/2] ipr_block_config_io_during_bist Message-ID: <200410222028.i9MKRxvC024092@d03av02.boulder.ibm.com> Change ipr to use new ppc64 pci APIs to block PCI config space accesses when running BIST to prevent PCI master aborts. Signed-off-by: Brian King --- linux-2.6.9-bk7-bjking1/drivers/scsi/ipr.c | 5 ++++- linux-2.6.9-bk7-bjking1/drivers/scsi/ipr.h | 7 +++++++ 2 files changed, 11 insertions(+), 1 deletion(-) diff -puN drivers/scsi/ipr.c~ipr_block_config_io_during_bist drivers/scsi/ipr.c --- linux-2.6.9-bk7/drivers/scsi/ipr.c~ipr_block_config_io_during_bist 2004-10-22 15:25:07.000000000 -0500 +++ linux-2.6.9-bk7-bjking1/drivers/scsi/ipr.c 2004-10-22 15:25:07.000000000 -0500 @@ -4935,6 +4935,7 @@ static int ipr_reset_restore_cfg_space(s int rc; ENTER; + pci_unblock_config_io(ioa_cfg->pdev); rc = pci_restore_state(ioa_cfg->pdev); if (rc != PCIBIOS_SUCCESSFUL) { @@ -4989,9 +4990,11 @@ static int ipr_reset_start_bist(struct i int rc; ENTER; - rc = pci_write_config_byte(ioa_cfg->pdev, PCI_BIST, PCI_BIST_START); + pci_block_config_io(ioa_cfg->pdev); + rc = pci_start_bist(ioa_cfg->pdev); if (rc != PCIBIOS_SUCCESSFUL) { + pci_unblock_config_io(ioa_cfg->pdev); ipr_cmd->ioasa.ioasc = cpu_to_be32(IPR_IOASC_PCI_ACCESS_ERROR); rc = IPR_RC_JOB_CONTINUE; } else { diff -puN drivers/scsi/ipr.h~ipr_block_config_io_during_bist drivers/scsi/ipr.h --- linux-2.6.9-bk7/drivers/scsi/ipr.h~ipr_block_config_io_during_bist 2004-10-22 15:25:07.000000000 -0500 +++ linux-2.6.9-bk7-bjking1/drivers/scsi/ipr.h 2004-10-22 15:25:07.000000000 -0500 @@ -1112,6 +1112,13 @@ __FUNCTION__, __LINE__, ioa_cfg #define ipr_remove_dump_file(kobj, attr) do { } while(0) #endif +#if !defined(CONFIG_PPC_PSERIES) && !defined(CONFIG_PPC_ISERIES) +#define pci_block_config_io(dev) do { } while(0) +#define pci_unblock_config_io(dev) do { } 
while(0) +#define pci_start_bist(dev) \ + pci_write_config_byte(dev, PCI_BIST, PCI_BIST_START) +#endif + /* * Error logging macros */ _ From brking at us.ibm.com Sat Oct 23 06:27:51 2004 From: brking at us.ibm.com (brking at us.ibm.com) Date: Fri, 22 Oct 2004 15:27:51 -0500 Subject: [PATCH 1/2] ppc64: Block config accesses during BIST (revised) Message-ID: <200410222027.i9MKRroN023754@d03av02.boulder.ibm.com> Some PCI adapters on pSeries and iSeries hardware (ipr SCSI adapters) have an exposure today in that they issue BIST to the adapter to reset the card. If, during the time it takes to complete BIST, userspace attempts to access PCI config space, the host bus bridge will master abort the access since the ipr adapter does not respond on the PCI bus for a brief period of time when running BIST. This master abort results in the host PCI bridge isolating that PCI device from the rest of the system, making the device unusable until Linux is rebooted. This patch is an attempt to close that exposure by introducing some blocking code in the arch-specific PCI code. The intent is to have the ipr device driver invoke these routines to prevent userspace PCI accesses from occurring during this window. It has been tested by running BIST on an ipr adapter while running a script which looped reading the config space of that adapter through sysfs. Without the patch, an EEH error occurs. With the patch there is no EEH error. Tested on Power 5 and iSeries Power 4.
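For readers following the description above, here is a minimal user-space sketch of the blocking behaviour as the patch's kernel-doc comments describe it: while a device is blocked, config reads succeed but return all 1's, and config writes are dropped and reported as successful. This is an illustration only, not code from the patch. Every name in it (model_dev, model_block, model_read, and so on) is hypothetical, the "config space" is a plain array, and the spinlock that the real code takes around the flag test is left out.

```c
#include <stdint.h>

#define MODEL_CFG_WORDS 64
#define MODEL_SUCCESS   0

/* Hypothetical stand-in for a device node; not the kernel's device_node. */
struct model_dev {
	int cfgio_blocked;              /* models the "config I/O blocked" flag */
	uint32_t cfg[MODEL_CFG_WORDS];  /* fake config space, one word per slot */
};

/* Models pci_block_config_io(): set the flag before starting BIST. */
static void model_block(struct model_dev *d)
{
	d->cfgio_blocked = 1;
}

/* Models pci_unblock_config_io(): clear the flag once BIST is done. */
static void model_unblock(struct model_dev *d)
{
	d->cfgio_blocked = 0;
}

/* A blocked read never touches the device and returns all 1's data. */
static int model_read(struct model_dev *d, unsigned int idx, uint32_t *val)
{
	if (d->cfgio_blocked)
		*val = 0xffffffffu;
	else
		*val = d->cfg[idx % MODEL_CFG_WORDS];
	return MODEL_SUCCESS;
}

/* A blocked write is silently dropped but still reported as successful. */
static int model_write(struct model_dev *d, unsigned int idx, uint32_t val)
{
	if (!d->cfgio_blocked)
		d->cfg[idx % MODEL_CFG_WORDS] = val;
	return MODEL_SUCCESS;
}
```

In this model a driver would call model_block(), start BIST through a path that bypasses the flag check, and call model_unblock() once the adapter responds again. Returning all 1's for a blocked read mirrors what software normally sees when no device answers a config cycle, so a userspace reader polling sysfs gets a harmless value instead of triggering a master abort.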
Signed-off-by: Brian King --- linux-2.6.9-bk7-bjking1/arch/ppc64/kernel/iSeries_pci.c | 128 +++++++++- linux-2.6.9-bk7-bjking1/arch/ppc64/kernel/pSeries_pci.c | 103 +++++++- linux-2.6.9-bk7-bjking1/include/asm-ppc64/iSeries/iSeries_pci.h | 1 linux-2.6.9-bk7-bjking1/include/asm-ppc64/pci.h | 6 linux-2.6.9-bk7-bjking1/include/asm-ppc64/prom.h | 4 5 files changed, 226 insertions(+), 16 deletions(-) diff -puN include/asm-ppc64/prom.h~ppc64_block_cfg_io_during_bist include/asm-ppc64/prom.h --- linux-2.6.9-bk7/include/asm-ppc64/prom.h~ppc64_block_cfg_io_during_bist 2004-10-22 10:13:40.000000000 -0500 +++ linux-2.6.9-bk7-bjking1/include/asm-ppc64/prom.h 2004-10-22 10:13:40.000000000 -0500 @@ -210,11 +210,15 @@ extern struct device_node *of_chosen; /* flag descriptions */ #define OF_STALE 0 /* node is slated for deletion */ #define OF_DYNAMIC 1 /* node and properties were allocated via kmalloc */ +#define OF_NO_CFGIO 2 /* config space accesses should fail */ #define OF_IS_STALE(x) test_bit(OF_STALE, &x->_flags) #define OF_MARK_STALE(x) set_bit(OF_STALE, &x->_flags) #define OF_IS_DYNAMIC(x) test_bit(OF_DYNAMIC, &x->_flags) #define OF_MARK_DYNAMIC(x) set_bit(OF_DYNAMIC, &x->_flags) +#define OF_IS_CFGIO_BLOCKED(x) test_bit(OF_NO_CFGIO, &x->_flags) +#define OF_UNBLOCK_CFGIO(x) clear_bit(OF_NO_CFGIO, &x->_flags) +#define OF_BLOCK_CFGIO(x) set_bit(OF_NO_CFGIO, &x->_flags) /* * Until 32-bit ppc can add proc_dir_entries to its device_node diff -puN arch/ppc64/kernel/pSeries_pci.c~ppc64_block_cfg_io_during_bist arch/ppc64/kernel/pSeries_pci.c --- linux-2.6.9-bk7/arch/ppc64/kernel/pSeries_pci.c~ppc64_block_cfg_io_during_bist 2004-10-22 10:13:40.000000000 -0500 +++ linux-2.6.9-bk7-bjking1/arch/ppc64/kernel/pSeries_pci.c 2004-10-22 10:13:40.000000000 -0500 @@ -30,6 +30,7 @@ #include #include #include +#include #include #include @@ -52,17 +53,16 @@ static int ibm_read_pci_config; static int ibm_write_pci_config; static int s7a_workaround; +static spinlock_t config_lock = 
SPIN_LOCK_UNLOCKED; extern unsigned long pci_probe_only; -static int rtas_read_config(struct device_node *dn, int where, int size, u32 *val) +static int __rtas_read_config(struct device_node *dn, int where, int size, u32 *val) { int returnval = -1; unsigned long buid, addr; int ret; - if (!dn) - return PCIBIOS_DEVICE_NOT_FOUND; if (where & (size - 1)) return PCIBIOS_BAD_REGISTER_NUMBER; @@ -86,6 +86,23 @@ static int rtas_read_config(struct devic return PCIBIOS_SUCCESSFUL; } +static int rtas_read_config(struct device_node *dn, int where, int size, u32 *val) +{ + unsigned long flags; + int ret = 0; + + if (!dn) + return PCIBIOS_DEVICE_NOT_FOUND; + + spin_lock_irqsave(&config_lock, flags); + if (OF_IS_CFGIO_BLOCKED(dn)) + *val = -1; + else + ret = __rtas_read_config(dn, where, size, val); + spin_unlock_irqrestore(&config_lock, flags); + return ret; +} + static int rtas_pci_read_config(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val) @@ -104,13 +121,11 @@ static int rtas_pci_read_config(struct p return PCIBIOS_DEVICE_NOT_FOUND; } -static int rtas_write_config(struct device_node *dn, int where, int size, u32 val) +static int __rtas_write_config(struct device_node *dn, int where, int size, u32 val) { unsigned long buid, addr; int ret; - if (!dn) - return PCIBIOS_DEVICE_NOT_FOUND; if (where & (size - 1)) return PCIBIOS_BAD_REGISTER_NUMBER; @@ -128,6 +143,21 @@ static int rtas_write_config(struct devi return PCIBIOS_SUCCESSFUL; } +static int rtas_write_config(struct device_node *dn, int where, int size, u32 val) +{ + unsigned long flags; + int ret = 0; + + if (!dn) + return PCIBIOS_DEVICE_NOT_FOUND; + + spin_lock_irqsave(&config_lock, flags); + if (!OF_IS_CFGIO_BLOCKED(dn)) + ret = __rtas_write_config(dn, where, size, val); + spin_unlock_irqrestore(&config_lock, flags); + return ret; +} + static int rtas_pci_write_config(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val) @@ -151,6 +181,67 @@ struct pci_ops rtas_pci_ops = { 
rtas_pci_write_config }; +/** + * pci_block_config_io - Block PCI config reads/writes + * @pdev: pci device struct + * + * This function blocks any PCI config accesses from occurring. + * Device drivers may call this prior to running BIST if the + * adapter cannot handle PCI config reads or writes when + * running BIST. When blocked, any writes will be ignored and + * treated as successful and any reads will return all 1's data. + * + * Return value: + * nothing + **/ +void pci_block_config_io(struct pci_dev *pdev) +{ + struct device_node *dn = pci_device_to_OF_node(pdev); + unsigned long flags; + + spin_lock_irqsave(&config_lock, flags); + OF_BLOCK_CFGIO(dn); + spin_unlock_irqrestore(&config_lock, flags); +} +EXPORT_SYMBOL(pci_block_config_io); + +/** + * pci_unblock_config_io - Unblock PCI config reads/writes + * @pdev: pci device struct + * + * This function allows PCI config accesses to resume. + * + * Return value: + * nothing + **/ +void pci_unblock_config_io(struct pci_dev *pdev) +{ + struct device_node *dn = pci_device_to_OF_node(pdev); + unsigned long flags; + + spin_lock_irqsave(&config_lock, flags); + OF_UNBLOCK_CFGIO(dn); + spin_unlock_irqrestore(&config_lock, flags); +} +EXPORT_SYMBOL(pci_unblock_config_io); + +/** + * pci_start_bist - Start BIST on a PCI device + * @pdev: pci device struct + * + * This function allows a device driver to start BIST + * when PCI config accesses are disabled. 
+ * + * Return value: + * nothing + **/ +int pci_start_bist(struct pci_dev *pdev) +{ + struct device_node *dn = pci_device_to_OF_node(pdev); + return __rtas_write_config(dn, PCI_BIST, 1, PCI_BIST_START); +} +EXPORT_SYMBOL(pci_start_bist); + static void python_countermeasures(unsigned long addr) { void *chip_regs; diff -puN include/asm-ppc64/pci.h~ppc64_block_cfg_io_during_bist include/asm-ppc64/pci.h --- linux-2.6.9-bk7/include/asm-ppc64/pci.h~ppc64_block_cfg_io_during_bist 2004-10-22 10:13:40.000000000 -0500 +++ linux-2.6.9-bk7-bjking1/include/asm-ppc64/pci.h 2004-10-22 10:13:40.000000000 -0500 @@ -235,6 +235,12 @@ extern int pci_read_irq_line(struct pci_ extern void pcibios_add_platform_entries(struct pci_dev *dev); +extern void pci_block_config_io(struct pci_dev *dev); + +extern void pci_unblock_config_io(struct pci_dev *dev); + +extern int pci_start_bist(struct pci_dev *dev); + #endif /* __KERNEL__ */ #endif /* __PPC64_PCI_H */ diff -puN include/asm-ppc64/iSeries/iSeries_pci.h~ppc64_block_cfg_io_during_bist include/asm-ppc64/iSeries/iSeries_pci.h --- linux-2.6.9-bk7/include/asm-ppc64/iSeries/iSeries_pci.h~ppc64_block_cfg_io_during_bist 2004-10-22 10:13:40.000000000 -0500 +++ linux-2.6.9-bk7-bjking1/include/asm-ppc64/iSeries/iSeries_pci.h 2004-10-22 10:13:40.000000000 -0500 @@ -91,6 +91,7 @@ struct iSeries_Device_Node { int ReturnCode; /* Return Code Holder */ int IoRetry; /* Current Retry Count */ int Flags; /* Possible flags(disable/bist)*/ +#define ISERIES_CFGIO_BLOCKED 1 u16 Vendor; /* Vendor ID */ u8 LogicalSlot; /* Hv Slot Index for Tces */ struct iommu_table* iommu_table;/* Device TCE Table */ diff -puN arch/ppc64/kernel/iSeries_pci.c~ppc64_block_cfg_io_during_bist arch/ppc64/kernel/iSeries_pci.c --- linux-2.6.9-bk7/arch/ppc64/kernel/iSeries_pci.c~ppc64_block_cfg_io_during_bist 2004-10-22 10:13:40.000000000 -0500 +++ linux-2.6.9-bk7-bjking1/arch/ppc64/kernel/iSeries_pci.c 2004-10-22 10:13:40.000000000 -0500 @@ -29,6 +29,7 @@ #include #include #include 
+#include #include #include @@ -86,6 +87,7 @@ static int Pci_Retry_Max = 3; /* Only re static int Pci_Error_Flag = 1; /* Set Retry Error on. */ static struct pci_ops iSeries_pci_ops; +static spinlock_t config_lock = SPIN_LOCK_UNLOCKED; /* * Log Error infor in Flight Recorder to system Console. @@ -510,16 +512,12 @@ static u64 hv_cfg_write_func[4] = { /* * Read PCI config space */ -static int iSeries_pci_read_config(struct pci_bus *bus, unsigned int devfn, +static int __iSeries_pci_read_config(struct iSeries_Device_Node *node, int offset, int size, u32 *val) { - struct iSeries_Device_Node *node = find_Device_Node(bus->number, devfn); u64 fn; struct HvCallPci_LoadReturn ret; - if (node == NULL) - return PCIBIOS_DEVICE_NOT_FOUND; - fn = hv_cfg_read_func[(size - 1) & 3]; HvCall3Ret16(fn, &ret, node->DsaAddr.DsaAddr, offset, 0); @@ -532,20 +530,36 @@ static int iSeries_pci_read_config(struc return 0; } +static int iSeries_pci_read_config(struct pci_bus *bus, unsigned int devfn, + int offset, int size, u32 *val) +{ + struct iSeries_Device_Node *node = find_Device_Node(bus->number, devfn); + int ret = PCIBIOS_DEVICE_NOT_FOUND; + unsigned long flags; + + if (node) { + ret = 0; + spin_lock_irqsave(&config_lock, flags); + if (node->Flags & ISERIES_CFGIO_BLOCKED) + *val = -1; + else + ret = __iSeries_pci_read_config(node, offset, size, val); + spin_unlock_irqrestore(&config_lock, flags); + } + + return ret; +} + /* * Write PCI config space */ -static int iSeries_pci_write_config(struct pci_bus *bus, unsigned int devfn, +static int __iSeries_pci_write_config(struct iSeries_Device_Node *node, int offset, int size, u32 val) { - struct iSeries_Device_Node *node = find_Device_Node(bus->number, devfn); u64 fn; u64 ret; - if (node == NULL) - return PCIBIOS_DEVICE_NOT_FOUND; - fn = hv_cfg_write_func[(size - 1) & 3]; ret = HvCall4(fn, node->DsaAddr.DsaAddr, offset, val, 0); @@ -555,6 +569,23 @@ static int iSeries_pci_write_config(stru return 0; } +static int 
iSeries_pci_write_config(struct pci_bus *bus, unsigned int devfn, + int offset, int size, u32 val) +{ + struct iSeries_Device_Node *node = find_Device_Node(bus->number, devfn); + int ret = PCIBIOS_DEVICE_NOT_FOUND; + unsigned long flags; + + if (node) { + spin_lock_irqsave(&config_lock, flags); + if (!(node->Flags & ISERIES_CFGIO_BLOCKED)) + ret = __iSeries_pci_write_config(node, offset, size, val); + spin_unlock_irqrestore(&config_lock, flags); + } + + return ret; +} + static struct pci_ops iSeries_pci_ops = { .read = iSeries_pci_read_config, .write = iSeries_pci_write_config @@ -817,3 +848,80 @@ void iSeries_Write_Long(u32 data, volati } while (CheckReturnCode("WWL", DevNode, rc) != 0); } EXPORT_SYMBOL(iSeries_Write_Long); + +/** + * pci_block_config_io - Block PCI config reads/writes + * @pdev: pci device struct + * + * This function blocks any PCI config accesses from occurring. + * Device drivers may call this prior to running BIST if the + * adapter cannot handle PCI config reads or writes when + * running BIST. When blocked, any writes will be ignored and + * treated as successful and any reads will return all 1's data. + * + * Return value: + * nothing + **/ +void pci_block_config_io(struct pci_dev *pdev) +{ + struct iSeries_Device_Node *node; + unsigned long flags; + + node = find_Device_Node(pdev->bus->number, pdev->devfn); + + if (node == NULL) + return; + + spin_lock_irqsave(&config_lock, flags); + node->Flags |= ISERIES_CFGIO_BLOCKED; + spin_unlock_irqrestore(&config_lock, flags); +} +EXPORT_SYMBOL(pci_block_config_io); + +/** + * pci_unblock_config_io - Unblock PCI config reads/writes + * @pdev: pci device struct + * + * This function allows PCI config accesses to resume. 
+ * + * Return value: + * nothing + **/ +void pci_unblock_config_io(struct pci_dev *pdev) +{ + struct iSeries_Device_Node *node; + unsigned long flags; + + node = find_Device_Node(pdev->bus->number, pdev->devfn); + + if (node == NULL) + return; + + spin_lock_irqsave(&config_lock, flags); + node->Flags &= ~ISERIES_CFGIO_BLOCKED; + spin_unlock_irqrestore(&config_lock, flags); +} +EXPORT_SYMBOL(pci_unblock_config_io); + +/** + * pci_start_bist - Start BIST on a PCI device + * @pdev: pci device struct + * + * This function allows a device driver to start BIST + * when PCI config accesses are disabled. + * + * Return value: + * nothing + **/ +int pci_start_bist(struct pci_dev *pdev) +{ + struct iSeries_Device_Node *node; + + node = find_Device_Node(pdev->bus->number, pdev->devfn); + + if (node == NULL) + return PCIBIOS_DEVICE_NOT_FOUND; + + return __iSeries_pci_write_config(node, PCI_BIST, 1, PCI_BIST_START); +} +EXPORT_SYMBOL(pci_start_bist); _ From paulus at samba.org Sat Oct 23 18:19:00 2004 From: paulus at samba.org (Paul Mackerras) Date: Sat, 23 Oct 2004 18:19:00 +1000 Subject: [PATCH] Kprobes for ppc64 In-Reply-To: <20041018095229.GA7394@in.ibm.com> References: <20041018095229.GA7394@in.ibm.com> Message-ID: <16762.5108.282382.603502@cargo.ozlabs.ibm.com> Ananth N Mavinakayanahalli writes: > Here is kprobes for ppc64. The patch applies on 2.6.9-rc4/2.6.9-final > and provides the kprobes + jprobes functionality. > 1. The current implementation uses xmon's emulate_step() and hence > requires xmon to be compiled in. We can move emulate_step out to arch/ppc64/lib/step.c (and take out the printfs). > 2. arch_prepare_kprobe() now returns an int. I have made the necessary > changes to i386 and sparc64 kprobes files, but is untested. Are you going to send this upstream? > + * Interrupts are disabled on entry as trap3 is an interrupt gate and they > + * remain disabled thorough out this function. 
> + */ > +static inline int kprobe_handler(struct pt_regs *regs) Comments about "trap3" and "interrupt gate" don't help me understand this function on ppc64. :) At present interrupts are enabled in a program check exception handler but disabled in a single-step handler. When does this function get called? > @@ -96,6 +97,9 @@ int do_page_fault(struct pt_regs *regs, > BUG_ON((trap == 0x380) || (trap == 0x480)); > > if (trap == 0x300) { > + if (notify_die(DIE_PAGE_FAULT, "page_fault", regs, error_code, > + 11, SIGSEGV) == NOTIFY_STOP) > + return 0; Hmmm, this seems a bit heavyweight for adding to the page fault path. Have you done any benchmarks with vs. without kprobes? On the whole the patch looks OK. I haven't checked the kprobe_handler code to see if I think it's all SMP- and preempt-safe, but I assume you have done it similarly on x86 and checked it there. Paul. From paulus at samba.org Sat Oct 23 18:20:37 2004 From: paulus at samba.org (Paul Mackerras) Date: Sat, 23 Oct 2004 18:20:37 +1000 Subject: [PPC64] xmon sparse cleanups In-Reply-To: <20041021033617.GK17760@zax> References: <20041021033617.GK17760@zax> Message-ID: <16762.5205.563634.564951@cargo.ozlabs.ibm.com> David Gibson writes: > This patch removes many sparse warnings from the xmon code. Mostly > K&R function declarations and 0-instead-of-NULLs. There are still a > whole bunch of warnings in xmon/ppc-opc.c, which is a copy of a file > from binutils. > > Signed-off-by: David Gibson Acked-by: Paul Mackerras From paulus at samba.org Sat Oct 23 18:22:29 2004 From: paulus at samba.org (Paul Mackerras) Date: Sat, 23 Oct 2004 18:22:29 +1000 Subject: [PPC64] Trivial sparse cleanups In-Reply-To: <20041021013549.GI17760@zax> References: <20041021013549.GI17760@zax> Message-ID: <16762.5317.444887.668294@cargo.ozlabs.ibm.com> David Gibson writes: > This patch squashes a handful of assorted sparse warnings in the ppc64 > code. Should be pretty much trivial and self explanatory. 
> > Signed-off-by: David Gibson Acked-by: Paul Mackerras From paulus at samba.org Sat Oct 23 18:40:00 2004 From: paulus at samba.org (Paul Mackerras) Date: Sat, 23 Oct 2004 18:40:00 +1000 Subject: [PATCH] ppc64: cpu hotplug notifier for numa In-Reply-To: <1098436795.17305.22.camel@biclops> References: <1098436795.17305.22.camel@biclops> Message-ID: <16762.6368.569207.26902@cargo.ozlabs.ibm.com> Nathan Lynch writes: > This patch modifies the ppc64 numa code to map a cpu to its node right > before it is brought up -- this means that secondary cpus are now > mapped to their nodes during smp_init() (regardless of whether > CONFIG_HOTPLUG_CPU=y). Cpus are removed from their nodes after they > have gone offline. I get this when compiling with CONFIG_NUMA=n: arch/ppc64/mm/numa.c:243: warning: `cpu_numa_callback' defined but not used Only a small point, but it would be nicer if that was fixed. Paul. From nathanl at austin.ibm.com Sun Oct 24 05:02:08 2004 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Sat, 23 Oct 2004 14:02:08 -0500 Subject: [PATCH] ppc64: cpu hotplug notifier for numa In-Reply-To: <16762.6368.569207.26902@cargo.ozlabs.ibm.com> References: <1098436795.17305.22.camel@biclops> <16762.6368.569207.26902@cargo.ozlabs.ibm.com> Message-ID: <1098558128.23102.2.camel@biclops> On Sat, 2004-10-23 at 03:40, Paul Mackerras wrote: > Nathan Lynch writes: > > > This patch modifies the ppc64 numa code to map a cpu to its node right > > before it is brought up -- this means that secondary cpus are now > > mapped to their nodes during smp_init() (regardless of whether > > CONFIG_HOTPLUG_CPU=y). Cpus are removed from their nodes after they > > have gone offline. > > I get this when compiling with CONFIG_NUMA=n: > > arch/ppc64/mm/numa.c:243: warning: `cpu_numa_callback' defined but not used > > Only a small point, but it would be nicer if that was fixed. I assume you meant CONFIG_HOTPLUG_CPU=n. 
That warning actually indicates a bug; I'm registering the notifier in the wrong way. Will send a corrected patch. Nathan From geert at linux-m68k.org Thu Oct 21 18:03:25 2004 From: geert at linux-m68k.org (Geert Uytterhoeven) Date: Thu, 21 Oct 2004 10:03:25 +0200 (MEST) Subject: [discuss] Re: [PATCH] Add key management syscalls to non-i386 archs In-Reply-To: <20041020164144.3457eafe.davem@davemloft.net> References: <3506.1098283455@redhat.com> <20041020150149.7be06d6d.davem@davemloft.net> <20041020225625.GD995@wotan.suse.de> <20041020160450.0914270b.davem@davemloft.net> <20041020232509.GF995@wotan.suse.de> <20041020164144.3457eafe.davem@davemloft.net> Message-ID: On Wed, 20 Oct 2004, David S. Miller wrote: > On Thu, 21 Oct 2004 01:25:09 +0200 > Andi Kleen wrote: > > > IMHO breaking the build unnecessarily is extremly bad because > > it will prevent all testing. And would you really want to hold > > up the whole linux testing machinery just for some obscure > > system call? IMHO not a good tradeoff. > > Then change the unistd.h cookie from "#error" to a "#warning". It > accomplishes both of our goals. Please do so! And not only for syscalls, but also for other things. That way we can procmail all mails sent to lkml or bk-commits-head that add #warnings to arch// or include/asm-/. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. 
-- Linus Torvalds From nathanl at austin.ibm.com Sun Oct 24 07:36:43 2004 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Sat, 23 Oct 2004 16:36:43 -0500 Subject: [PATCH] ppc64: cpu hotplug notifier for numa (take 2) In-Reply-To: <1098558128.23102.2.camel@biclops> References: <1098436795.17305.22.camel@biclops> <16762.6368.569207.26902@cargo.ozlabs.ibm.com> <1098558128.23102.2.camel@biclops> Message-ID: <1098567403.23102.28.camel@biclops> The NUMA properties of all "possible" cpus are not necessarily available at boot time on ppc64 LPAR. Only the properties for present cpus are known. This patch modifies the ppc64 numa code to map a cpu to its node right before it is brought up -- this means that secondary cpus are now mapped to their nodes during smp_init(). Cpus are removed from their nodes after they have gone offline. Also some minor cleanups: - Stash the "minimum common depth" in a global at boot time, so we don't have to rediscover it every time something changes. - Remove unnecessary variable from of_get_associativity() which is accessed while possibly uninitialized. - Remove the cpu portion from dump_numa_topology() since it will show only the boot cpu now. We could display this information from smp_cpus_done() if necessary. Tested on a 4-way 2-node Power5 system. 
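As an illustration of the flow described above, the following user-space sketch models the bookkeeping: a cpu is added to its node when it is about to come up, and removed again when it dies or its bring-up is cancelled. It is a simplified model under stated assumptions, not the kernel code. All names are hypothetical, the node of each cpu is passed in by the caller instead of being read from the device tree, and the return values 1 and 0 merely stand in for "handled" and "ignored".

```c
#include <stdint.h>

#define MODEL_MAX_NODES 4
#define MODEL_MAX_CPUS  64

/* Hypothetical event codes standing in for the hotplug notifier actions. */
enum model_action {
	MODEL_CPU_UP_PREPARE,
	MODEL_CPU_DEAD,
	MODEL_CPU_UP_CANCELED
};

static int      model_cpu_to_node[MODEL_MAX_CPUS];
static uint64_t model_node_cpumask[MODEL_MAX_NODES];
static int      model_nr_cpus_in_node[MODEL_MAX_NODES];

/* Record the cpu in its node; count it only once. */
static void model_map_cpu(unsigned int cpu, int node)
{
	uint64_t bit = UINT64_C(1) << cpu;

	model_cpu_to_node[cpu] = node;
	if (!(model_node_cpumask[node] & bit)) {
		model_node_cpumask[node] |= bit;
		model_nr_cpus_in_node[node]++;
	}
}

/* Drop the cpu from whichever node it was last mapped to. */
static void model_unmap_cpu(unsigned int cpu)
{
	uint64_t bit = UINT64_C(1) << cpu;
	int node = model_cpu_to_node[cpu];

	if (model_node_cpumask[node] & bit) {
		model_node_cpumask[node] &= ~bit;
		model_nr_cpus_in_node[node]--;
	}
}

/* Returns 1 if the event was handled, 0 if it was ignored. */
static int model_cpu_event(enum model_action action, unsigned int cpu,
			   int node_of_cpu)
{
	if (cpu >= MODEL_MAX_CPUS ||
	    node_of_cpu < 0 || node_of_cpu >= MODEL_MAX_NODES)
		return 0;

	switch (action) {
	case MODEL_CPU_UP_PREPARE:
		model_map_cpu(cpu, node_of_cpu);
		return 1;
	case MODEL_CPU_DEAD:
	case MODEL_CPU_UP_CANCELED:
		model_unmap_cpu(cpu);
		return 1;
	}
	return 0;
}
```

Handling the mapping at "up prepare" time is what lets the boot path map only the boot cpu: every other cpu, whether started during smp_init() or hotplugged later, goes through the same event path.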
Signed-off-by: Nathan Lynch --- --- diff -puN arch/ppc64/mm/numa.c~ppc64-numa-cpu-hotplug-notifier arch/ppc64/mm/numa.c --- 2.6.10-rc1/arch/ppc64/mm/numa.c~ppc64-numa-cpu-hotplug-notifier 2004-10-23 15:10:39.000000000 -0500 +++ 2.6.10-rc1-nathanl/arch/ppc64/mm/numa.c 2004-10-23 16:28:58.000000000 -0500 @@ -15,6 +15,8 @@ #include #include #include +#include +#include #include #include #include @@ -39,6 +41,7 @@ int nr_cpus_in_node[MAX_NUMNODES] = { [0 struct pglist_data *node_data[MAX_NUMNODES]; bootmem_data_t __initdata plat_node_bdata[MAX_NUMNODES]; static unsigned long node0_io_hole_size; +static int min_common_depth; /* * We need somewhere to store start/span for each node until we have @@ -64,7 +67,24 @@ static inline void map_cpu_to_node(int c } } -static struct device_node * __init find_cpu_node(unsigned int cpu) +#ifdef CONFIG_HOTPLUG_CPU +static void unmap_cpu_from_node(unsigned long cpu) +{ + int node = numa_cpu_lookup_table[cpu]; + + dbg("removing cpu %lu from node %d\n", cpu, node); + + if (cpu_isset(cpu, numa_cpumask_lookup_table[node])) { + cpu_clear(cpu, numa_cpumask_lookup_table[node]); + nr_cpus_in_node[node]--; + } else { + printk(KERN_ERR "WARNING: cpu %lu not found in node %d\n", + cpu, node); + } +} +#endif /* CONFIG_HOTPLUG_CPU */ + +static struct device_node * __devinit find_cpu_node(unsigned int cpu) { unsigned int hw_cpuid = get_hard_smp_processor_id(cpu); struct device_node *cpu_node = NULL; @@ -96,26 +116,21 @@ static struct device_node * __init find_ /* must hold reference to node during call */ static int *of_get_associativity(struct device_node *dev) - { - unsigned int *result; - int len; - - result = (unsigned int *)get_property(dev, "ibm,associativity", &len); - - if (len <= 0) - return NULL; - - return result; +{ + return (unsigned int *)get_property(dev, "ibm,associativity", NULL); } -static int of_node_numa_domain(struct device_node *device, int depth) +static int of_node_numa_domain(struct device_node *device) { int numa_domain; 
unsigned int *tmp; + if (min_common_depth == -1) + return 0; + tmp = of_get_associativity(device); - if (tmp && (tmp[0] >= depth)) { - numa_domain = tmp[depth]; + if (tmp && (tmp[0] >= min_common_depth)) { + numa_domain = tmp[min_common_depth]; } else { dbg("WARNING: no NUMA information for %s\n", device->full_name); @@ -138,7 +153,7 @@ static int of_node_numa_domain(struct de * * - Dave Hansen */ -static int find_min_common_depth(void) +static int __init find_min_common_depth(void) { int depth; unsigned int *ref_points; @@ -185,11 +200,72 @@ static unsigned long read_cell_ul(struct return result; } +/* + * Figure out to which domain a cpu belongs and stick it there. + * Return the id of the domain used. + */ +static int numa_setup_cpu(unsigned long lcpu) +{ + int numa_domain = 0; + struct device_node *cpu = find_cpu_node(lcpu); + + if (!cpu) { + WARN_ON(1); + goto out; + } + + numa_domain = of_node_numa_domain(cpu); + + if (numa_domain >= MAX_NUMNODES) { + /* + * POWER4 LPAR uses 0xffff as invalid node, + * dont warn in this case. 
+ */ + if (numa_domain != 0xffff) + printk(KERN_ERR "WARNING: cpu %ld " + "maps to invalid NUMA node %d\n", + lcpu, numa_domain); + numa_domain = 0; + } +out: + node_set_online(numa_domain); + + map_cpu_to_node(lcpu, numa_domain); + + of_node_put(cpu); + + return numa_domain; +} + +static int cpu_numa_callback(struct notifier_block *nfb, + unsigned long action, + void *hcpu) +{ + unsigned long lcpu = (unsigned long)hcpu; + int ret = NOTIFY_DONE; + + switch (action) { + case CPU_UP_PREPARE: + if (min_common_depth == -1 || !numa_enabled) + map_cpu_to_node(lcpu, 0); + else + numa_setup_cpu(lcpu); + ret = NOTIFY_OK; + break; +#ifdef CONFIG_HOTPLUG_CPU + case CPU_DEAD: + case CPU_UP_CANCELED: + unmap_cpu_from_node(lcpu); + ret = NOTIFY_OK; + break; +#endif + } + return ret; +} + static int __init parse_numa_properties(void) { - struct device_node *cpu = NULL; struct device_node *memory = NULL; - int depth; int max_domain = 0; long entries = lmb_end_of_DRAM() >> MEMORY_INCREMENT_SHIFT; unsigned long i; @@ -206,44 +282,13 @@ static int __init parse_numa_properties( for (i = 0; i < entries ; i++) numa_memory_lookup_table[i] = ARRAY_INITIALISER; - depth = find_min_common_depth(); - - dbg("NUMA associativity depth for CPU/Memory: %d\n", depth); - if (depth < 0) - return depth; - - for_each_cpu(i) { - int numa_domain; - - cpu = find_cpu_node(i); - - if (cpu) { - numa_domain = of_node_numa_domain(cpu, depth); - of_node_put(cpu); - - if (numa_domain >= MAX_NUMNODES) { - /* - * POWER4 LPAR uses 0xffff as invalid node, - * dont warn in this case.
- */ - if (numa_domain != 0xffff) - printk(KERN_ERR "WARNING: cpu %ld " - "maps to invalid NUMA node %d\n", - i, numa_domain); - numa_domain = 0; - } - } else { - dbg("WARNING: no NUMA information for cpu %ld\n", i); - numa_domain = 0; - } - - node_set_online(numa_domain); + min_common_depth = find_min_common_depth(); - if (max_domain < numa_domain) - max_domain = numa_domain; + dbg("NUMA associativity depth for CPU/Memory: %d\n", min_common_depth); + if (min_common_depth < 0) + return min_common_depth; - map_cpu_to_node(i, numa_domain); - } + max_domain = numa_setup_cpu(boot_cpuid); memory = NULL; while ((memory = of_find_node_by_type(memory, "memory")) != NULL) { @@ -267,7 +312,7 @@ new_range: start = _ALIGN_DOWN(start, MEMORY_INCREMENT); size = _ALIGN_UP(size, MEMORY_INCREMENT); - numa_domain = of_node_numa_domain(memory, depth); + numa_domain = of_node_numa_domain(memory); if (numa_domain >= MAX_NUMNODES) { if (numa_domain != 0xffff) @@ -341,8 +386,7 @@ static void __init setup_nonnuma(void) numa_memory_lookup_table[i] = ARRAY_INITIALISER; } - for (i = 0; i < NR_CPUS; i++) - map_cpu_to_node(i, 0); + map_cpu_to_node(boot_cpuid, 0); node_set_online(0); @@ -358,35 +402,10 @@ static void __init setup_nonnuma(void) static void __init dump_numa_topology(void) { unsigned int node; - unsigned int cpu, count; + unsigned int count; - for (node = 0; node < MAX_NUMNODES; node++) { - if (!node_online(node)) - continue; - - printk(KERN_INFO "Node %d CPUs:", node); - - count = 0; - /* - * If we used a CPU iterator here we would miss printing - * the holes in the cpumap. 
- */ - for (cpu = 0; cpu < NR_CPUS; cpu++) { - if (cpu_isset(cpu, numa_cpumask_lookup_table[node])) { - if (count == 0) - printk(" %u", cpu); - ++count; - } else { - if (count > 1) - printk("-%u", cpu - 1); - count = 0; - } - } - - if (count > 1) - printk("-%u", NR_CPUS - 1); - printk("\n"); - } + if (min_common_depth == -1 || !numa_enabled) + return; for (node = 0; node < MAX_NUMNODES; node++) { unsigned long i; @@ -414,6 +433,7 @@ static void __init dump_numa_topology(vo printk("-0x%lx", i); printk("\n"); } + return; } /* @@ -460,6 +480,10 @@ static unsigned long careful_allocation( void __init do_init_bootmem(void) { int nid; + static struct notifier_block ppc64_numa_nb = { + .notifier_call = cpu_numa_callback, + .priority = 1 /* Must run before sched domains notifier. */ + }; min_low_pfn = 0; max_low_pfn = lmb_end_of_DRAM() >> PAGE_SHIFT; @@ -470,6 +494,8 @@ void __init do_init_bootmem(void) else dump_numa_topology(); + register_cpu_notifier(&ppc64_numa_nb); + for (nid = 0; nid < numnodes; nid++) { unsigned long start_paddr, end_paddr; int i; _ From dwm at austin.ibm.com Sun Oct 24 10:55:36 2004 From: dwm at austin.ibm.com (Doug Maxey) Date: Sat, 23 Oct 2004 19:55:36 -0500 Subject: [PATCH 1/1] build modular usb isd200 with modular ide Message-ID: <200410240055.i9O0taCf006206@falcon10.austin.ibm.com> Name: inline ide_fix_driveid() Rationale: This is a fix for bugme.osdl 3819. With any of the 2.6.9 release flavors (vanilla, mm1, ac3), one cannot build the usb isd200 module due to the dependency on ide_fix_driveid() being exported from ide-iops. Description: When building IDE modular, the current ide_fix_driveid() is exported from ide-iops.c. This patch makes the function an inline. Status: compile tested on ppc64. Other issues prevent run test. 
Signed-off-by: Doug Maxey ChangeLog: ++doug drivers/ide/ide-iops.c | 98 ------------------------------------------------- include/linux/ide.h | 48 +++++++++++++++++++++++- 2 files changed, 47 insertions(+), 99 deletions(-) diff -Nwupa lk-2.6.9-mm1/drivers/ide/ide-iops.c lk-2.6.9-mm1.edit/drivers/ide/ide-iops.c --- lk-2.6.9-mm1/drivers/ide/ide-iops.c 2004-10-22 15:10:30.465342832 -0500 +++ lk-2.6.9-mm1.edit/drivers/ide/ide-iops.c 2004-10-23 00:22:16.901355000 -0500 @@ -352,104 +352,6 @@ EXPORT_SYMBOL(atapi_output_bytes); /* * Beginning of Taskfile OPCODE Library and feature sets. */ -void ide_fix_driveid (struct hd_driveid *id) -{ -#ifndef __LITTLE_ENDIAN -# ifdef __BIG_ENDIAN - int i; - u16 *stringcast; - - id->config = __le16_to_cpu(id->config); - id->cyls = __le16_to_cpu(id->cyls); - id->reserved2 = __le16_to_cpu(id->reserved2); - id->heads = __le16_to_cpu(id->heads); - id->track_bytes = __le16_to_cpu(id->track_bytes); - id->sector_bytes = __le16_to_cpu(id->sector_bytes); - id->sectors = __le16_to_cpu(id->sectors); - id->vendor0 = __le16_to_cpu(id->vendor0); - id->vendor1 = __le16_to_cpu(id->vendor1); - id->vendor2 = __le16_to_cpu(id->vendor2); - stringcast = (u16 *)&id->serial_no[0]; - for (i = 0; i < (20/2); i++) - stringcast[i] = __le16_to_cpu(stringcast[i]); - id->buf_type = __le16_to_cpu(id->buf_type); - id->buf_size = __le16_to_cpu(id->buf_size); - id->ecc_bytes = __le16_to_cpu(id->ecc_bytes); - stringcast = (u16 *)&id->fw_rev[0]; - for (i = 0; i < (8/2); i++) - stringcast[i] = __le16_to_cpu(stringcast[i]); - stringcast = (u16 *)&id->model[0]; - for (i = 0; i < (40/2); i++) - stringcast[i] = __le16_to_cpu(stringcast[i]); - id->dword_io = __le16_to_cpu(id->dword_io); - id->reserved50 = __le16_to_cpu(id->reserved50); - id->field_valid = __le16_to_cpu(id->field_valid); - id->cur_cyls = __le16_to_cpu(id->cur_cyls); - id->cur_heads = __le16_to_cpu(id->cur_heads); - id->cur_sectors = __le16_to_cpu(id->cur_sectors); - id->cur_capacity0 = 
__le16_to_cpu(id->cur_capacity0); - id->cur_capacity1 = __le16_to_cpu(id->cur_capacity1); - id->lba_capacity = __le32_to_cpu(id->lba_capacity); - id->dma_1word = __le16_to_cpu(id->dma_1word); - id->dma_mword = __le16_to_cpu(id->dma_mword); - id->eide_pio_modes = __le16_to_cpu(id->eide_pio_modes); - id->eide_dma_min = __le16_to_cpu(id->eide_dma_min); - id->eide_dma_time = __le16_to_cpu(id->eide_dma_time); - id->eide_pio = __le16_to_cpu(id->eide_pio); - id->eide_pio_iordy = __le16_to_cpu(id->eide_pio_iordy); - for (i = 0; i < 2; ++i) - id->words69_70[i] = __le16_to_cpu(id->words69_70[i]); - for (i = 0; i < 4; ++i) - id->words71_74[i] = __le16_to_cpu(id->words71_74[i]); - id->queue_depth = __le16_to_cpu(id->queue_depth); - for (i = 0; i < 4; ++i) - id->words76_79[i] = __le16_to_cpu(id->words76_79[i]); - id->major_rev_num = __le16_to_cpu(id->major_rev_num); - id->minor_rev_num = __le16_to_cpu(id->minor_rev_num); - id->command_set_1 = __le16_to_cpu(id->command_set_1); - id->command_set_2 = __le16_to_cpu(id->command_set_2); - id->cfsse = __le16_to_cpu(id->cfsse); - id->cfs_enable_1 = __le16_to_cpu(id->cfs_enable_1); - id->cfs_enable_2 = __le16_to_cpu(id->cfs_enable_2); - id->csf_default = __le16_to_cpu(id->csf_default); - id->dma_ultra = __le16_to_cpu(id->dma_ultra); - id->trseuc = __le16_to_cpu(id->trseuc); - id->trsEuc = __le16_to_cpu(id->trsEuc); - id->CurAPMvalues = __le16_to_cpu(id->CurAPMvalues); - id->mprc = __le16_to_cpu(id->mprc); - id->hw_config = __le16_to_cpu(id->hw_config); - id->acoustic = __le16_to_cpu(id->acoustic); - id->msrqs = __le16_to_cpu(id->msrqs); - id->sxfert = __le16_to_cpu(id->sxfert); - id->sal = __le16_to_cpu(id->sal); - id->spg = __le32_to_cpu(id->spg); - id->lba_capacity_2 = __le64_to_cpu(id->lba_capacity_2); - for (i = 0; i < 22; i++) - id->words104_125[i] = __le16_to_cpu(id->words104_125[i]); - id->last_lun = __le16_to_cpu(id->last_lun); - id->word127 = __le16_to_cpu(id->word127); - id->dlf = __le16_to_cpu(id->dlf); - id->csfo = 
__le16_to_cpu(id->csfo); - for (i = 0; i < 26; i++) - id->words130_155[i] = __le16_to_cpu(id->words130_155[i]); - id->word156 = __le16_to_cpu(id->word156); - for (i = 0; i < 3; i++) - id->words157_159[i] = __le16_to_cpu(id->words157_159[i]); - id->cfa_power = __le16_to_cpu(id->cfa_power); - for (i = 0; i < 14; i++) - id->words161_175[i] = __le16_to_cpu(id->words161_175[i]); - for (i = 0; i < 31; i++) - id->words176_205[i] = __le16_to_cpu(id->words176_205[i]); - for (i = 0; i < 48; i++) - id->words206_254[i] = __le16_to_cpu(id->words206_254[i]); - id->integrity_word = __le16_to_cpu(id->integrity_word); -# else -# error "Please fix " -# endif -#endif -} - -EXPORT_SYMBOL(ide_fix_driveid); void ide_fixstring (u8 *s, const int bytecount, const int byteswap) { diff -Nwupa lk-2.6.9-mm1/include/linux/ide.h lk-2.6.9-mm1.edit/include/linux/ide.h --- lk-2.6.9-mm1/include/linux/ide.h 2004-10-22 15:10:36.748318728 -0500 +++ lk-2.6.9-mm1.edit/include/linux/ide.h 2004-10-23 15:28:52.635380680 -0500 @@ -1204,7 +1204,53 @@ extern ide_startstop_t ide_abort(ide_dri */ extern void ide_cmd(ide_drive_t *, u8, u8, ide_handler_t *); -extern void ide_fix_driveid(struct hd_driveid *); +/* + * ide_fix_driveid - fix IDENTIFY DEVICE data for big endian machines. + * @id - pointer to data from drive. + * + * Could be a one liner except for the 3 x 32 bit and 2 x 64 bit + * fields. Offsets are from d1532v1r4. + */ +static inline void ide_fix_driveid (struct hd_driveid *id) +{ +#ifndef __LITTLE_ENDIAN +# ifdef __BIG_ENDIAN + u16 *sp = (u16*)id; + + for (; sp < ((u16*)id) + 61; sp++) *sp = __le16_to_cpu(*sp); + + /* lba_capacity */ + *((u32*)sp) = __le32_to_cpu(*((u32*)sp)); + sp += 2; + + for (; sp < ((u16*)id) + 98; sp++) + *sp = __le16_to_cpu(*sp); + + /* Streaming Perfomance Granularity. words 98-99 */ + *((u32*)sp) = __le32_to_cpu(*((u32*)sp)); + sp += 2; /* word 100 */ + + /* lba_capacity2. 
words 100-103 */
+	*((u64*)sp) = __le64_to_cpu(*((u64*)sp));
+	sp += 4; /* word 104 */
+
+	for (; sp < ((u16*)id) + 117; sp++)
+		*sp = __le16_to_cpu(*sp);
+
+
+	/* Words per Logical Sector. words 117-118 */
+	*((u32*)sp) = __le32_to_cpu(*((u32*)sp));
+	sp += 2; /* word 119 */
+
+	for (; sp < ((u16*)id) + 256; sp++)
+		*sp = __le16_to_cpu(*sp);
+
+# else
+# error "Please fix "
+# endif
+#endif
+}
+
 /*
  * ide_fixstring() cleans up and (optionally) byte-swaps a text string,
  * removing leading/trailing blanks and compressing internal blanks.

From hch at lst.de Sun Oct 24 20:03:19 2004
From: hch at lst.de (Christoph Hellwig)
Date: Sun, 24 Oct 2004 12:03:19 +0200
Subject: [PATCH 1/1] build modular usb isd200 with modular ide
In-Reply-To: <200410240055.i9O0taCf006206@falcon10.austin.ibm.com>
References: <200410240055.i9O0taCf006206@falcon10.austin.ibm.com>
Message-ID: <20041024100319.GA17183@lst.de>

On Sat, Oct 23, 2004 at 07:55:36PM -0500, Doug Maxey wrote:
>
> Name: inline ide_fix_driveid()
>
> Rationale:
> This is a fix for bugme.osdl 3819.

bugme.osdl.org doesn't know of a bug #3819.

> With any of the 2.6.9 release flavors (vanilla, mm1, ac3), one
> cannot build the usb isd200 module due to the dependency on
> ide_fix_driveid() being exported from ide-iops.
>
> Description:
> When building IDE modular, the current ide_fix_driveid() is
> exported from ide-iops.c. This patch makes the function an inline.

Still doesn't make any sense. ide_fix_driveid is properly exported from
ide-iops.c, so you can use it from other modules. The only case that
doesn't work is modular ide and builtin usb-storage, and the BLK_DEV_IDE
dependency should fix that one.

If you think that dependency is ugly (I do) just copy the routine to
isd200.c, it's a) too large to inline but b) just a trivial byteswap
that should not need many changes over time.
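
To make the "trivial byteswap" remark concrete, here is a minimal sketch of
what a driver-local helper could look like. It is not the routine from
ide-iops.c or from the patch: instead of converting struct fields in place,
it reads little-endian values straight out of the raw IDENTIFY buffer, so it
behaves the same on either host byte order. The names id_le16() and
id_le32() are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not taken from isd200.c. */

/* Read the 16-bit little-endian IDENTIFY word at index `word`. */
static uint16_t id_le16(const uint8_t *buf, size_t word)
{
	return (uint16_t)(buf[2 * word] | (buf[2 * word + 1] << 8));
}

/* Read a 32-bit little-endian value that starts at word index `word`. */
static uint32_t id_le32(const uint8_t *buf, size_t word)
{
	return (uint32_t)id_le16(buf, word) |
	       ((uint32_t)id_le16(buf, word + 1) << 16);
}
```

Reading by word index keeps the 16-bit and 32-bit cases explicit at each call
site, at the cost of not filling in struct hd_driveid for other users.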
From bzolnier at gmail.com Sun Oct 24 22:45:53 2004
From: bzolnier at gmail.com (Bartlomiej Zolnierkiewicz)
Date: Sun, 24 Oct 2004 14:45:53 +0200
Subject: [PATCH 1/1] build modular usb isd200 with modular ide
In-Reply-To: <20041024100319.GA17183@lst.de>
References: <200410240055.i9O0taCf006206@falcon10.austin.ibm.com> <20041024100319.GA17183@lst.de>
Message-ID: <58cb370e041024054575c09679@mail.gmail.com>

On Sun, 24 Oct 2004 12:03:19 +0200, Christoph Hellwig wrote:
> On Sat, Oct 23, 2004 at 07:55:36PM -0500, Doug Maxey wrote:
> >
> > Name: inline ide_fix_driveid()
> >
> > Rationale:
> > This is a fix for bugme.osdl 3819.
>
> bugme.osdl.org doesn't know of a bug #3819.
>
> > With any of the 2.6.9 release flavors (vanilla, mm1, ac3), one
> > cannot build the usb isd200 module due to the dependency on
> > ide_fix_driveid() being exported from ide-iops.
> >
> > Description:
> > When building IDE modular, the current ide_fix_driveid() is
> > exported from ide-iops.c. This patch makes the function an inline.
>
> Still doesn't make any sense. ide_fix_driveid is properly exported from
> ide-iops.c, so you can use it from other modules. The only case that
> doesn't work is modular ide and builtin usb-storage, and the BLK_DEV_IDE
> dependency should fix that one.

The new ide_fix_driveid function seems buggy, i.e. it byte-swaps
id->max_multsect with id->vendor3.

> If you think that dependency is ugly (I do) just copy the routine to
> isd200.c, it's a) too large to inline but b) just a trivial byteswap
> that should not need many changes over time.

The dependency is a bug; it is for the IDE driver only.

Doug, if you kill the debugging code in isd200.c then only:

	id->command_set_1
	id->model
	id->fw_rev
	id->capability
	id->lba_capacity
	id->heads
	id->cyls
	id->sectors
	id->command_set_2

need to be byte-swapped.
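
The max_multsect/vendor3 problem noted above can be reproduced outside the
kernel. The sketch below assumes the two fields are adjacent single-byte
members sharing one 16-bit IDENTIFY word, which is what the remark implies;
struct word47, swab16_sketch() and blanket_swap() are hypothetical names for
this illustration only. The swap is done unconditionally, standing in for
what __le16_to_cpu() does on a big-endian host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one IDENTIFY word that the C struct
 * splits into two byte-wide fields (cf. max_multsect / vendor3). */
struct word47 {
	uint8_t max_multsect;
	uint8_t vendor3;
};

/* Unconditional 16-bit byte swap, i.e. the effect of a little-endian
 * to CPU conversion when the CPU is big endian. */
static uint16_t swab16_sketch(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Treat the storage as a single u16 and swap it, the way a loop that
 * walks the whole structure word by word would. */
static void blanket_swap(struct word47 *w)
{
	uint16_t raw;

	memcpy(&raw, w, sizeof(raw));
	raw = swab16_sketch(raw);
	memcpy(w, &raw, sizeof(raw));
}
```

Byte-wide members are already in the right place as read from the drive, so a
word-walking loop would have to skip any word that the struct declares as two
bytes; after blanket_swap() the two members have simply traded values.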
From dwm at austin.ibm.com Mon Oct 25 09:11:12 2004
From: dwm at austin.ibm.com (Doug Maxey)
Date: Sun, 24 Oct 2004 18:11:12 -0500
Subject: [PATCH 1/1] build modular usb isd200 with modular ide
In-Reply-To: <20041024100319.GA17183@lst.de>
Message-ID: <200410242311.i9ONBCKp019869@falcon10.austin.ibm.com>

On Sun, 24 Oct 2004 12:03:19 +0200, Christoph Hellwig wrote:
>On Sat, Oct 23, 2004 at 07:55:36PM -0500, Doug Maxey wrote:
>>
>> Name: inline ide_fix_driveid()
>>
>> Rationale:
>> This is a fix for bugme.osdl 3819.
>
>bugme.osdl.org doesn't know of a bug #3819.

Uh Oh. Should be 3618. Have no idea where 3819 came from.

>
>> With any of the 2.6.9 release flavors (vanilla, mm1, ac3), one
>> cannot build the usb isd200 module due to the dependency on
>> ide_fix_driveid() being exported from ide-iops.
>>
>> Description:
>> When building IDE modular, the current ide_fix_driveid() is
>> exported from ide-iops.c. This patch makes the function an inline.
>
>Still doesn't make any sense. ide_fix_driveid is properly exported from
>ide-iops.c, so you can use it from other modules. The only case that
>doesn't work is modular ide and builtin usb-storage, and the BLK_DEV_IDE
>dependency should fix that one.
>
>If you think that dependency is ugly (I do) just copy the routine to

What happened to common code that may have more uses than originally
intended? Do it right once in one place, and make it available.

>isd200.c, it's a) too large to inline but b) just a trivial byteswap
>that should not need many changes over time.

Except for those few 32 and 64 bit quantities that need word (16 bit)
swaps, I agree completely.

The points I was trying to make were that
 1) This is called in only a few places.
 2) It is never on a fast path.
 3) The sequence of the named elements was a little bit much. The
    meanings of the words change, and quite a few of the fields no
    longer have the original meaning or definition.
 4) Having a singular (even if somewhat large) inline handles all
    current (and future) uses.
++doug From benh at kernel.crashing.org Mon Oct 25 10:47:09 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 25 Oct 2004 10:47:09 +1000 Subject: [PATCH] ppc64: cleanups of ppc64 pci.c Message-ID: <1098665227.16132.11.camel@gaston> Hi ! This patch applies on top of previously posted "ppc64: Move PCI IO mapping from pSeries_pci.c to pci.c". It does cosmetic cleanups & add some debug macros to pci.c without actually changing any functionality. Further patches against ppc64 pci.c that I'll post will be against a file already patched with this one. Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/ppc64/kernel/pci.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/pci.c 2004-10-25 10:30:34.841855848 +1000 +++ linux-work/arch/ppc64/kernel/pci.c 2004-10-25 10:36:50.724712968 +1000 @@ -11,6 +11,8 @@ * 2 of the License, or (at your option) any later version. */ +#undef DEBUG + #include #include #include @@ -39,6 +41,12 @@ #include "pci.h" +#ifdef DEBUG +#define DBG(fmt...) udbg_printf(fmt) +#else +#define DBG(fmt...) +#endif + unsigned long pci_probe_only = 1; unsigned long pci_assign_all_buses = 0; @@ -106,11 +114,11 @@ dev->resource[i].flags &= ~IORESOURCE_IO; } } -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_WINBOND, PCI_DEVICE_ID_WINBOND_82C105, fixup_windbond_82c105); +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_WINBOND, PCI_DEVICE_ID_WINBOND_82C105, + fixup_windbond_82c105); -void -pcibios_resource_to_bus(struct pci_dev *dev, struct pci_bus_region *region, - struct resource *res) +void pcibios_resource_to_bus(struct pci_dev *dev, struct pci_bus_region *region, + struct resource *res) { unsigned long offset = 0; struct pci_controller *hose = PCI_GET_PHB_PTR(dev); @@ -215,8 +223,7 @@ /* * Allocate pci_controller(phb) initialized common variables. 
*/ -struct pci_controller * __init -pci_alloc_pci_controller(enum phb_types controller_type) +struct pci_controller * __init pci_alloc_pci_controller(enum phb_types controller_type) { struct pci_controller *hose; @@ -246,8 +253,7 @@ /* * Dymnamically allocate pci_controller(phb), initialize common variables. */ -struct pci_controller * -pci_alloc_phb_dynamic(enum phb_types controller_type) +struct pci_controller * pci_alloc_phb_dynamic(enum phb_types controller_type) { struct pci_controller *hose; @@ -430,9 +436,9 @@ * * Returns negative error code on failure, zero on success. */ -static __inline__ int -__pci_mmap_make_offset(struct pci_dev *dev, struct vm_area_struct *vma, - enum pci_mmap_state mmap_state) +static __inline__ int __pci_mmap_make_offset(struct pci_dev *dev, + struct vm_area_struct *vma, + enum pci_mmap_state mmap_state) { struct pci_controller *hose = PCI_GET_PHB_PTR(dev); unsigned long offset = vma->vm_pgoff << PAGE_SHIFT; @@ -487,9 +493,9 @@ * Set vm_flags of VMA, as appropriate for this architecture, for a pci device * mapping. */ -static __inline__ void -__pci_mmap_set_flags(struct pci_dev *dev, struct vm_area_struct *vma, - enum pci_mmap_state mmap_state) +static __inline__ void __pci_mmap_set_flags(struct pci_dev *dev, + struct vm_area_struct *vma, + enum pci_mmap_state mmap_state) { vma->vm_flags |= VM_SHM | VM_LOCKED | VM_IO; } @@ -498,9 +504,10 @@ * Set vm_page_prot of VMA, as appropriate for this architecture, for a pci * device mapping. 
*/ -static __inline__ void -__pci_mmap_set_pgprot(struct pci_dev *dev, struct vm_area_struct *vma, - enum pci_mmap_state mmap_state, int write_combine) +static __inline__ void __pci_mmap_set_pgprot(struct pci_dev *dev, + struct vm_area_struct *vma, + enum pci_mmap_state mmap_state, + int write_combine) { long prot = pgprot_val(vma->vm_page_prot); @@ -613,7 +620,7 @@ } void __devinit pci_process_bridge_OF_ranges(struct pci_controller *hose, - struct device_node *dev) + struct device_node *dev) { unsigned int *ranges; unsigned long size; @@ -654,6 +661,8 @@ res = &hose->io_resource; res->flags = IORESOURCE_IO; res->start = pci_addr; + DBG("phb%d: IO 0x%lx -> 0x%lx\n", hose->global_number, + res->start, res->start + size - 1); break; case 2: /* memory space */ memno = 0; @@ -666,6 +675,8 @@ res = &hose->mem_resources[memno]; res->flags = IORESOURCE_MEM; res->start = cpu_phys_addr; + DBG("phb%d: MEM 0x%lx -> 0x%lx\n", hose->global_number, + res->start, res->start + size - 1); } break; } @@ -873,7 +884,8 @@ for (i = 0; i < PCI_NUM_RESOURCES; i++) { if (dev->resource[i].flags & IORESOURCE_IO) { - unsigned long offset = (unsigned long)hose->io_base_virt - pci_io_base; + unsigned long offset = (unsigned long)hose->io_base_virt + - pci_io_base; unsigned long start, end, mask; start = dev->resource[i].start += offset; From benh at kernel.crashing.org Mon Oct 25 11:26:30 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 25 Oct 2004 11:26:30 +1000 Subject: [PATCH] ppc64: Rework PCI <-> OF node matching Message-ID: <1098667590.26695.1.camel@gaston> This patch reworks the code that deals with matching PCI devices with Open Firmware device nodes. This code made several incorrect assumptions and can be simplified significantly. The main functional difference now is that PHBs are no longer special cased, but that shouldn't cause any specific problem. 
It also fixes a problem where u3_iommu.c wouldn't work for PCI devices that lacked a matching OF device node. Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/ppc64/kernel/u3_iommu.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/u3_iommu.c 2004-10-17 12:07:07.000000000 +1000 +++ linux-work/arch/ppc64/kernel/u3_iommu.c 2004-10-25 11:12:22.000000000 +1000 @@ -267,6 +267,7 @@ void iommu_setup_u3(void) { + struct pci_controller *phb, *tmp; struct pci_dev *dev = NULL; struct device_node *dn; @@ -299,6 +300,11 @@ if (dn) dn->iommu_table = &iommu_table_u3; } + /* We also make sure we set all PHBs ... */ + list_for_each_entry_safe(phb, tmp, &hose_list, list_node) { + dn = (struct device_node *)phb->arch_data; + dn->iommu_table = &iommu_table_u3; + } } void __init alloc_u3_dart_table(void) Index: linux-work/arch/ppc64/kernel/pci_dn.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/pci_dn.c 2004-10-20 13:01:00.000000000 +1000 +++ linux-work/arch/ppc64/kernel/pci_dn.c 2004-10-25 11:15:30.000000000 +1000 @@ -46,29 +46,13 @@ { struct pci_controller *phb = data; u32 *regs; - char *device_type = get_property(dn, "device_type", NULL); - char *model; dn->phb = phb; - if (device_type && (strcmp(device_type, "pci") == 0) && - (get_property(dn, "class-code", NULL) == 0)) { - /* special case for PHB's. Sigh. 
*/ - regs = (u32 *)get_property(dn, "bus-range", NULL); - dn->busno = regs[0]; - - model = (char *)get_property(dn, "model", NULL); - - if (strstr(model, "U3")) - dn->devfn = -1; - else - dn->devfn = 0; /* assumption */ - } else { - regs = (u32 *)get_property(dn, "reg", NULL); - if (regs) { - /* First register entry is addr (00BBSS00) */ - dn->busno = (regs[0] >> 16) & 0xff; - dn->devfn = (regs[0] >> 8) & 0xff; - } + regs = (u32 *)get_property(dn, "reg", NULL); + if (regs) { + /* First register entry is addr (00BBSS00) */ + dn->busno = (regs[0] >> 16) & 0xff; + dn->devfn = (regs[0] >> 8) & 0xff; } return NULL; } @@ -97,20 +81,25 @@ struct device_node *dn, *nextdn; void *ret; - if (pre && ((ret = pre(start, data)) != NULL)) - return ret; + /* We started with a phb, iterate all childs */ for (dn = start->child; dn; dn = nextdn) { + u32 *classp, class; + nextdn = NULL; - if (get_property(dn, "class-code", NULL)) { - if (pre && ((ret = pre(dn, data)) != NULL)) - return ret; - if (dn->child) - /* Depth first...do children */ - nextdn = dn->child; - else if (dn->sibling) - /* ok, try next sibling instead. */ - nextdn = dn->sibling; - } + classp = (u32 *)get_property(dn, "class-code", NULL); + class = classp ? *classp : 0; + + if (pre && ((ret = pre(dn, data)) != NULL)) + return ret; + + /* If we are a PCI bridge, go down */ + if (dn->child && ((class >> 8) == PCI_CLASS_BRIDGE_PCI || + (class >> 8) == PCI_CLASS_BRIDGE_CARDBUS)) + /* Depth first...do children */ + nextdn = dn->child; + else if (dn->sibling) + /* ok, try next sibling instead. */ + nextdn = dn->sibling; if (!nextdn) { /* Walk up to next valid sibling. */ do { @@ -124,26 +113,16 @@ return NULL; } -/* - * Same as traverse_pci_devices except this does it for all phbs. 
- */ -static void *traverse_all_pci_devices(traverse_func pre) +void __devinit pci_devs_phb_init_dynamic(struct pci_controller *phb) { - struct pci_controller *phb, *tmp; - void *ret; + struct device_node * dn = (struct device_node *) phb->arch_data; - list_for_each_entry_safe(phb, tmp, &hose_list, list_node) - if ((ret = traverse_pci_devices(phb->arch_data, pre, phb)) - != NULL) - return ret; - return NULL; -} + /* PHB nodes themselves must not match */ + dn->devfn = dn->busno = -1; + dn->phb = phb; -void __devinit pci_devs_phb_init_dynamic(struct pci_controller *phb) -{ /* Update dn->phb ptrs for new phb and children devices */ - traverse_pci_devices((struct device_node *)phb->arch_data, - update_dn_pci_info, phb); + traverse_pci_devices(dn, update_dn_pci_info, phb); } /* @@ -154,6 +133,7 @@ { int busno = ((unsigned long)data >> 8) & 0xff; int devfn = ((unsigned long)data) & 0xff; + return ((devfn == dn->devfn) && (busno == dn->busno)) ? dn : NULL; } @@ -180,10 +160,8 @@ phb_dn = phb->arch_data; dn = traverse_pci_devices(phb_dn, is_devfn_node, (void *)searchval); - if (dn) { + if (dn) dev->sysdata = dn; - /* ToDo: call some device init hook here */ - } return dn; } EXPORT_SYMBOL(fetch_dev_dn); @@ -195,8 +173,11 @@ */ void __init pci_devs_phb_init(void) { + struct pci_controller *phb, *tmp; + /* This must be done first so the device nodes have valid pci info! */ - traverse_all_pci_devices(update_dn_pci_info); + list_for_each_entry_safe(phb, tmp, &hose_list, list_node) + pci_devs_phb_init_dynamic(phb); } -- Benjamin Herrenschmidt From benh at kernel.crashing.org Mon Oct 25 11:50:30 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 25 Oct 2004 11:50:30 +1000 Subject: [PATCH] ppc64: cleanup/split SMP code Message-ID: <1098669030.30012.8.camel@gaston> Hi ! This patch depends at least on two previously posted ones (and not yet merged). 
[PATCH] ppc64: Fix pSeries secondary CPU setup [PATCH] ppc64: Rewrite the openpic driver Splits arch/ppc64/kernel/smp.c into 3 different files, smp.c, pSeries_smp.c and iSeries_smp.c, thus removing most of the #define mess in those files and making it easier to add a new platform. Signed-off-by: Benjamin Herrenschmidt Index: linux-work/arch/ppc64/kernel/pSeries_smp.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/pSeries_smp.c 2004-10-25 11:29:38.804091696 +1000 @@ -0,0 +1,393 @@ +/* + * SMP support for pSeries machines. + * + * Dave Engebretsen, Peter Bergner, and + * Mike Corrigan {engebret|bergner|mikec}@us.ibm.com + * + * Plus various changes from other IBM teams... + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#undef DEBUG + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "mpic.h" + +#ifdef DEBUG +#define DBG(fmt...) udbg_printf(fmt) +#else +#define DBG(fmt...) +#endif + +extern void pseries_secondary_smp_init(unsigned long); + +static void vpa_init(int cpu) +{ + unsigned long flags, pcpu = get_hard_smp_processor_id(cpu); + + /* Register the Virtual Processor Area (VPA) */ + flags = 1UL << (63 - 18); + register_vpa(flags, pcpu, __pa((unsigned long)&(paca[cpu].lppaca))); +} + + +/* Get state of physical CPU. 
+ * Return codes: + * 0 - The processor is in the RTAS stopped state + * 1 - stop-self is in progress + * 2 - The processor is not in the RTAS stopped state + * -1 - Hardware Error + * -2 - Hardware Busy, Try again later. + */ +static int query_cpu_stopped(unsigned int pcpu) +{ + int cpu_status; + int status, qcss_tok; + + qcss_tok = rtas_token("query-cpu-stopped-state"); + if (qcss_tok == RTAS_UNKNOWN_SERVICE) + return -1; + status = rtas_call(qcss_tok, 1, 2, &cpu_status, pcpu); + if (status != 0) { + printk(KERN_ERR + "RTAS query-cpu-stopped-state failed: %i\n", status); + return status; + } + + return cpu_status; +} + + +#ifdef CONFIG_HOTPLUG_CPU + +int __cpu_disable(void) +{ + /* FIXME: go put this in a header somewhere */ + extern void xics_migrate_irqs_away(void); + + systemcfg->processorCount--; + + /*fix boot_cpuid here*/ + if (smp_processor_id() == boot_cpuid) + boot_cpuid = any_online_cpu(cpu_online_map); + + /* FIXME: abstract this to not be platform specific later on */ + xics_migrate_irqs_away(); + return 0; +} + +void __cpu_die(unsigned int cpu) +{ + int tries; + int cpu_status; + unsigned int pcpu = get_hard_smp_processor_id(cpu); + + for (tries = 0; tries < 25; tries++) { + cpu_status = query_cpu_stopped(pcpu); + if (cpu_status == 0 || cpu_status == -1) + break; + set_current_state(TASK_UNINTERRUPTIBLE); + schedule_timeout(HZ/5); + } + if (cpu_status != 0) { + printk("Querying DEAD? cpu %i (%i) shows %i\n", + cpu, pcpu, cpu_status); + } + + /* Isolation and deallocation are definatly done by + * drslot_chrp_cpu. If they were not they would be + * done here. Change isolate state to Isolate and + * change allocation-state to Unusable. + */ + paca[cpu].cpu_start = 0; +} + +/* Search all cpu device nodes for an offline logical cpu. If a + * device node has a "ibm,my-drc-index" property (meaning this is an + * LPAR), paranoid-check whether we own the cpu. 
For each "thread" + * of a cpu, if it is offline and has the same hw index as before, + * grab that in preference. + */ +static unsigned int find_physical_cpu_to_start(unsigned int old_hwindex) +{ + struct device_node *np = NULL; + unsigned int best = -1U; + + while ((np = of_find_node_by_type(np, "cpu"))) { + int nr_threads, len; + u32 *index = (u32 *)get_property(np, "ibm,my-drc-index", NULL); + u32 *tid = (u32 *) + get_property(np, "ibm,ppc-interrupt-server#s", &len); + + if (!tid) + tid = (u32 *)get_property(np, "reg", &len); + + if (!tid) + continue; + + /* If there is a drc-index, make sure that we own + * the cpu. + */ + if (index) { + int state; + int rc = rtas_get_sensor(9003, *index, &state); + if (rc != 0 || state != 1) + continue; + } + + nr_threads = len / sizeof(u32); + + while (nr_threads--) { + if (0 == query_cpu_stopped(tid[nr_threads])) { + best = tid[nr_threads]; + if (best == old_hwindex) + goto out; + } + } + } +out: + of_node_put(np); + return best; +} + +/** + * smp_startup_cpu() - start the given cpu + * + * At boot time, there is nothing to do. At run-time, call RTAS with + * the appropriate start location, if the cpu is in the RTAS stopped + * state. + * + * Returns: + * 0 - failure + * 1 - success + */ +static inline int __devinit smp_startup_cpu(unsigned int lcpu) +{ + int status; + unsigned long start_here = __pa((u32)*((unsigned long *) + pseries_secondary_smp_init)); + unsigned int pcpu; + + /* At boot time the cpus are already spinning in hold + * loops, so nothing to do. */ + if (system_state < SYSTEM_RUNNING) + return 1; + + pcpu = find_physical_cpu_to_start(get_hard_smp_processor_id(lcpu)); + if (pcpu == -1U) { + printk(KERN_INFO "No more cpus available, failing\n"); + return 0; + } + + /* Fixup atomic count: it exited inside IRQ handler. */ + paca[lcpu].__current->thread_info->preempt_count = 0; + + /* At boot this is done in prom.c. 
*/ + paca[lcpu].hw_cpu_id = pcpu; + + status = rtas_call(rtas_token("start-cpu"), 3, 1, NULL, + pcpu, start_here, lcpu); + if (status != 0) { + printk(KERN_ERR "start-cpu failed: %i\n", status); + return 0; + } + return 1; +} +#else /* ... CONFIG_HOTPLUG_CPU */ +static inline int __devinit smp_startup_cpu(unsigned int lcpu) +{ + return 1; +} +#endif /* CONFIG_HOTPLUG_CPU */ + +static inline void smp_xics_do_message(int cpu, int msg) +{ + set_bit(msg, &xics_ipi_message[cpu].value); + mb(); + xics_cause_IPI(cpu); +} + +static void smp_xics_message_pass(int target, int msg) +{ + unsigned int i; + + if (target < NR_CPUS) { + smp_xics_do_message(target, msg); + } else { + for_each_online_cpu(i) { + if (target == MSG_ALL_BUT_SELF + && i == smp_processor_id()) + continue; + smp_xics_do_message(i, msg); + } + } +} + +extern void xics_request_IPIs(void); + +static int __init smp_xics_probe(void) +{ + xics_request_IPIs(); + + return cpus_weight(cpu_possible_map); +} + +static void __devinit smp_xics_setup_cpu(int cpu) +{ + if (cpu != boot_cpuid) + xics_setup_cpu(); +} + +static spinlock_t timebase_lock = SPIN_LOCK_UNLOCKED; +static unsigned long timebase = 0; + +static void __devinit pSeries_give_timebase(void) +{ + spin_lock(&timebase_lock); + rtas_call(rtas_token("freeze-time-base"), 0, 1, NULL); + timebase = get_tb(); + spin_unlock(&timebase_lock); + + while (timebase) + barrier(); + rtas_call(rtas_token("thaw-time-base"), 0, 1, NULL); +} + +static void __devinit pSeries_take_timebase(void) +{ + while (!timebase) + barrier(); + spin_lock(&timebase_lock); + set_tb(timebase >> 32, timebase & 0xffffffff); + timebase = 0; + spin_unlock(&timebase_lock); +} + +static void __devinit pSeries_late_setup_cpu(int cpu) +{ + extern unsigned int default_distrib_server; + + if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) { + vpa_init(cpu); + } + +#ifdef CONFIG_IRQ_ALL_CPUS + /* Put the calling processor into the GIQ. 
This is really only + * necessary from a secondary thread as the OF start-cpu interface + * performs this function for us on primary threads. + */ + /* TODO: 9005 is #defined in rtas-proc.c -- move to a header */ + rtas_set_indicator(9005, default_distrib_server, 1); +#endif +} + + +void __devinit smp_pSeries_kick_cpu(int nr) +{ + BUG_ON(nr < 0 || nr >= NR_CPUS); + + if (!smp_startup_cpu(nr)) + return; + + /* + * The processor is currently spinning, waiting for the + * cpu_start field to become non-zero After we set cpu_start, + * the processor will continue on to secondary_start + */ + paca[nr].cpu_start = 1; +} + +static struct smp_ops_t pSeries_mpic_smp_ops = { + .message_pass = smp_mpic_message_pass, + .probe = smp_mpic_probe, + .kick_cpu = smp_pSeries_kick_cpu, + .setup_cpu = smp_mpic_setup_cpu, + .late_setup_cpu = pSeries_late_setup_cpu, +}; + +static struct smp_ops_t pSeries_xics_smp_ops = { + .message_pass = smp_xics_message_pass, + .probe = smp_xics_probe, + .kick_cpu = smp_pSeries_kick_cpu, + .setup_cpu = smp_xics_setup_cpu, + .late_setup_cpu = pSeries_late_setup_cpu, +}; + +/* This is called very early */ +void __init smp_init_pSeries(void) +{ + int ret, i; + + DBG(" -> smp_init_pSeries()\n"); + + if (naca->interrupt_controller == IC_OPEN_PIC) + smp_ops = &pSeries_mpic_smp_ops; + else + smp_ops = &pSeries_xics_smp_ops; + + /* Start secondary threads on SMT systems; primary threads + * are already in the running state. 
+ */ + for_each_present_cpu(i) { + if (query_cpu_stopped(get_hard_smp_processor_id(i)) == 0) { + printk("%16.16x : starting thread\n", i); + DBG("%16.16x : starting thread\n", i); + rtas_call(rtas_token("start-cpu"), 3, 1, &ret, + get_hard_smp_processor_id(i), + __pa((u32)*((unsigned long *) + pseries_secondary_smp_init)), + i); + } + } + + if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) + vpa_init(boot_cpuid); + + /* Non-lpar has additional take/give timebase */ + if (systemcfg->platform == PLATFORM_PSERIES) { + smp_ops->give_timebase = pSeries_give_timebase; + smp_ops->take_timebase = pSeries_take_timebase; + } + + + DBG(" <- smp_init_pSeries()\n"); +} + Index: linux-work/arch/ppc64/kernel/smp.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/smp.c 2004-10-25 10:24:50.000000000 +1000 +++ linux-work/arch/ppc64/kernel/smp.c 2004-10-25 11:29:38.849084856 +1000 @@ -43,19 +43,14 @@ #include #include #include -#include -#include -#include #include #include #include -#include #include #include +#include #include "mpic.h" -#include -#include #ifdef DEBUG #define DBG(fmt...) 
udbg_printf(fmt) @@ -89,110 +84,6 @@ /* Low level assembly function used to backup CPU 0 state */ extern void __save_cpu_setup(void); -extern void pseries_secondary_smp_init(unsigned long); - -#ifdef CONFIG_PPC_ISERIES -static unsigned long iSeries_smp_message[NR_CPUS]; - -void iSeries_smp_message_recv( struct pt_regs * regs ) -{ - int cpu = smp_processor_id(); - int msg; - - if ( num_online_cpus() < 2 ) - return; - - for ( msg = 0; msg < 4; ++msg ) - if ( test_and_clear_bit( msg, &iSeries_smp_message[cpu] ) ) - smp_message_recv( msg, regs ); -} - -static inline void smp_iSeries_do_message(int cpu, int msg) -{ - set_bit(msg, &iSeries_smp_message[cpu]); - HvCall_sendIPI(&(paca[cpu])); -} - -static void smp_iSeries_message_pass(int target, int msg) -{ - int i; - - if (target < NR_CPUS) - smp_iSeries_do_message(target, msg); - else { - for_each_online_cpu(i) { - if (target == MSG_ALL_BUT_SELF - && i == smp_processor_id()) - continue; - smp_iSeries_do_message(i, msg); - } - } -} - -static int smp_iSeries_numProcs(void) -{ - unsigned np, i; - - np = 0; - for (i=0; i < NR_CPUS; ++i) { - if (paca[i].lppaca.xDynProcStatus < 2) { - cpu_set(i, cpu_possible_map); - cpu_set(i, cpu_present_map); - ++np; - } - } - return np; -} - -static int smp_iSeries_probe(void) -{ - unsigned i; - unsigned np = 0; - - for (i=0; i < NR_CPUS; ++i) { - if (paca[i].lppaca.xDynProcStatus < 2) { - /*paca[i].active = 1;*/ - ++np; - } - } - - return np; -} - -static void smp_iSeries_kick_cpu(int nr) -{ - BUG_ON(nr < 0 || nr >= NR_CPUS); - - /* Verify that our partition has a processor nr */ - if (paca[nr].lppaca.xDynProcStatus >= 2) - return; - - /* The processor is currently spinning, waiting - * for the cpu_start field to become non-zero - * After we set cpu_start, the processor will - * continue on to secondary_start in iSeries_head.S - */ - paca[nr].cpu_start = 1; -} - -static void __devinit smp_iSeries_setup_cpu(int nr) -{ -} - -static struct smp_ops_t iSeries_smp_ops = { - .message_pass = 
smp_iSeries_message_pass, - .probe = smp_iSeries_probe, - .kick_cpu = smp_iSeries_kick_cpu, - .setup_cpu = smp_iSeries_setup_cpu, -}; - -/* This is called very early. */ -void __init smp_init_iSeries(void) -{ - smp_ops = &iSeries_smp_ops; - systemcfg->processorCount = smp_iSeries_numProcs(); -} -#endif #ifdef CONFIG_PPC_MULTIPLATFORM void smp_mpic_message_pass(int target, int msg) @@ -238,213 +129,20 @@ mpic_setup_this_cpu(); } -#endif /* CONFIG_PPC_MULTIPLATFORM */ - -#ifdef CONFIG_PPC_PSERIES - -/* Get state of physical CPU. - * Return codes: - * 0 - The processor is in the RTAS stopped state - * 1 - stop-self is in progress - * 2 - The processor is not in the RTAS stopped state - * -1 - Hardware Error - * -2 - Hardware Busy, Try again later. - */ -int query_cpu_stopped(unsigned int pcpu) -{ - int cpu_status; - int status, qcss_tok; - - DBG(" -> query_cpu_stopped(%d)\n", pcpu); - qcss_tok = rtas_token("query-cpu-stopped-state"); - if (qcss_tok == RTAS_UNKNOWN_SERVICE) - return -1; - status = rtas_call(qcss_tok, 1, 2, &cpu_status, pcpu); - if (status != 0) { - printk(KERN_ERR - "RTAS query-cpu-stopped-state failed: %i\n", status); - return status; - } - - DBG(" <- query_cpu_stopped(), status: %d\n", cpu_status); - - return cpu_status; -} - -#ifdef CONFIG_HOTPLUG_CPU - -int __cpu_disable(void) -{ - /* FIXME: go put this in a header somewhere */ - extern void xics_migrate_irqs_away(void); - - systemcfg->processorCount--; - - /*fix boot_cpuid here*/ - if (smp_processor_id() == boot_cpuid) - boot_cpuid = any_online_cpu(cpu_online_map); - - /* FIXME: abstract this to not be platform specific later on */ - xics_migrate_irqs_away(); - return 0; -} - -void __cpu_die(unsigned int cpu) -{ - int tries; - int cpu_status; - unsigned int pcpu = get_hard_smp_processor_id(cpu); - - for (tries = 0; tries < 25; tries++) { - cpu_status = query_cpu_stopped(pcpu); - if (cpu_status == 0 || cpu_status == -1) - break; - set_current_state(TASK_UNINTERRUPTIBLE); - schedule_timeout(HZ/5); - 
} - if (cpu_status != 0) { - printk("Querying DEAD? cpu %i (%i) shows %i\n", - cpu, pcpu, cpu_status); - } - - /* Isolation and deallocation are definatly done by - * drslot_chrp_cpu. If they were not they would be - * done here. Change isolate state to Isolate and - * change allocation-state to Unusable. - */ - paca[cpu].cpu_start = 0; - - /* So we can recognize if it fails to come up next time. */ - cpu_callin_map[cpu] = 0; -} - -/* Kill this cpu */ -void cpu_die(void) -{ - local_irq_disable(); - /* Some hardware requires clearing the CPPR, while other hardware does not - * it is safe either way - */ - pSeriesLP_cppr_info(0, 0); - rtas_stop_self(); - /* Should never get here... */ - BUG(); - for(;;); -} - -/* Search all cpu device nodes for an offline logical cpu. If a - * device node has a "ibm,my-drc-index" property (meaning this is an - * LPAR), paranoid-check whether we own the cpu. For each "thread" - * of a cpu, if it is offline and has the same hw index as before, - * grab that in preference. - */ -static unsigned int find_physical_cpu_to_start(unsigned int old_hwindex) -{ - struct device_node *np = NULL; - unsigned int best = -1U; - - while ((np = of_find_node_by_type(np, "cpu"))) { - int nr_threads, len; - u32 *index = (u32 *)get_property(np, "ibm,my-drc-index", NULL); - u32 *tid = (u32 *) - get_property(np, "ibm,ppc-interrupt-server#s", &len); - - if (!tid) - tid = (u32 *)get_property(np, "reg", &len); - - if (!tid) - continue; - - /* If there is a drc-index, make sure that we own - * the cpu. - */ - if (index) { - int state; - int rc = rtas_get_sensor(9003, *index, &state); - if (rc != 0 || state != 1) - continue; - } - - nr_threads = len / sizeof(u32); - - while (nr_threads--) { - if (0 == query_cpu_stopped(tid[nr_threads])) { - best = tid[nr_threads]; - if (best == old_hwindex) - goto out; - } - } - } -out: - of_node_put(np); - return best; -} - -/** - * smp_startup_cpu() - start the given cpu - * - * At boot time, there is nothing to do. 
At run-time, call RTAS with - * the appropriate start location, if the cpu is in the RTAS stopped - * state. - * - * Returns: - * 0 - failure - * 1 - success - */ -static inline int __devinit smp_startup_cpu(unsigned int lcpu) -{ - int status; - unsigned long start_here = __pa((u32)*((unsigned long *) - pseries_secondary_smp_init)); - unsigned int pcpu; - - /* At boot time the cpus are already spinning in hold - * loops, so nothing to do. */ - if (system_state < SYSTEM_RUNNING) - return 1; - - pcpu = find_physical_cpu_to_start(get_hard_smp_processor_id(lcpu)); - if (pcpu == -1U) { - printk(KERN_INFO "No more cpus available, failing\n"); - return 0; - } - - /* Fixup atomic count: it exited inside IRQ handler. */ - paca[lcpu].__current->thread_info->preempt_count = 0; - - /* At boot this is done in prom.c. */ - paca[lcpu].hw_cpu_id = pcpu; - - status = rtas_call(rtas_token("start-cpu"), 3, 1, NULL, - pcpu, start_here, lcpu); - if (status != 0) { - printk(KERN_ERR "start-cpu failed: %i\n", status); - return 0; - } - return 1; -} -#else /* ... 
CONFIG_HOTPLUG_CPU */ -static inline int __devinit smp_startup_cpu(unsigned int lcpu) -{ - return 1; -} -#endif /* CONFIG_HOTPLUG_CPU */ - -static void smp_pSeries_kick_cpu(int nr) +void __devinit smp_generic_kick_cpu(int nr) { BUG_ON(nr < 0 || nr >= NR_CPUS); - if (!smp_startup_cpu(nr)) - return; - /* * The processor is currently spinning, waiting for the * cpu_start field to become non-zero After we set cpu_start, * the processor will continue on to secondary_start */ paca[nr].cpu_start = 1; + mb(); } -#endif /* CONFIG_PPC_PSERIES */ + +#endif /* CONFIG_PPC_MULTIPLATFORM */ static void __init smp_space_timers(unsigned int max_cpus) { @@ -461,136 +159,6 @@ } } -#ifdef CONFIG_PPC_PSERIES -static void vpa_init(int cpu) -{ - unsigned long flags, pcpu = get_hard_smp_processor_id(cpu); - - /* Register the Virtual Processor Area (VPA) */ - flags = 1UL << (63 - 18); - register_vpa(flags, pcpu, __pa((unsigned long)&(paca[cpu].lppaca))); -} - -static inline void smp_xics_do_message(int cpu, int msg) -{ - set_bit(msg, &xics_ipi_message[cpu].value); - mb(); - xics_cause_IPI(cpu); -} - -static void smp_xics_message_pass(int target, int msg) -{ - unsigned int i; - - if (target < NR_CPUS) { - smp_xics_do_message(target, msg); - } else { - for_each_online_cpu(i) { - if (target == MSG_ALL_BUT_SELF - && i == smp_processor_id()) - continue; - smp_xics_do_message(i, msg); - } - } -} - -extern void xics_request_IPIs(void); - -static int __init smp_xics_probe(void) -{ -#ifdef CONFIG_SMP - xics_request_IPIs(); -#endif - - return cpus_weight(cpu_possible_map); -} - -static void __devinit smp_xics_setup_cpu(int cpu) -{ - if (cpu != boot_cpuid) - xics_setup_cpu(); -} - -static spinlock_t timebase_lock = SPIN_LOCK_UNLOCKED; -static unsigned long timebase = 0; - -static void __devinit pSeries_give_timebase(void) -{ - spin_lock(&timebase_lock); - rtas_call(rtas_token("freeze-time-base"), 0, 1, NULL); - timebase = get_tb(); - spin_unlock(&timebase_lock); - - while (timebase) - barrier(); - 
rtas_call(rtas_token("thaw-time-base"), 0, 1, NULL); -} - -static void __devinit pSeries_take_timebase(void) -{ - while (!timebase) - barrier(); - spin_lock(&timebase_lock); - set_tb(timebase >> 32, timebase & 0xffffffff); - timebase = 0; - spin_unlock(&timebase_lock); -} - -static struct smp_ops_t pSeries_mpic_smp_ops = { - .message_pass = smp_mpic_message_pass, - .probe = smp_mpic_probe, - .kick_cpu = smp_pSeries_kick_cpu, - .setup_cpu = smp_mpic_setup_cpu, -}; - -static struct smp_ops_t pSeries_xics_smp_ops = { - .message_pass = smp_xics_message_pass, - .probe = smp_xics_probe, - .kick_cpu = smp_pSeries_kick_cpu, - .setup_cpu = smp_xics_setup_cpu, -}; - -/* This is called very early */ -void __init smp_init_pSeries(void) -{ - int ret, i; - - DBG(" -> smp_init_pSeries()\n"); - - if (naca->interrupt_controller == IC_OPEN_PIC) - smp_ops = &pSeries_mpic_smp_ops; - else - smp_ops = &pSeries_xics_smp_ops; - - /* Start secondary threads on SMT systems; primary threads - * are already in the running state. - */ - for_each_present_cpu(i) { - if (query_cpu_stopped(get_hard_smp_processor_id(i)) == 0) { - printk("%16.16x : starting thread\n", i); - DBG("%16.16x : starting thread\n", i); - rtas_call(rtas_token("start-cpu"), 3, 1, &ret, - get_hard_smp_processor_id(i), - __pa((u32)*((unsigned long *) - pseries_secondary_smp_init)), - i); - } - } - - if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) - vpa_init(boot_cpuid); - - /* Non-lpar has additional take/give timebase */ - if (systemcfg->platform == PLATFORM_PSERIES) { - smp_ops->give_timebase = pSeries_give_timebase; - smp_ops->take_timebase = pSeries_take_timebase; - } - - - DBG(" <- smp_init_pSeries()\n"); -} -#endif /* CONFIG_PPC_PSERIES */ - void smp_local_timer_interrupt(struct pt_regs * regs) { update_process_times(user_mode(regs)); @@ -813,6 +381,8 @@ { unsigned int cpu; + DBG("smp_prepare_cpus\n"); + /* * setup_cpu may need to be called on the boot cpu. We havent * spun any cpus up but lets be paranoid. 
@@ -877,6 +447,11 @@ paca[cpu].stab_real = virt_to_abs(tmp); } + /* Make sure callin-map entry is 0 (can be leftover a CPU + * hotplug + */ + cpu_callin_map[cpu] = 0; + /* The information for processor bringup must * be written out to main store before we release * the processor. @@ -884,6 +459,7 @@ mb(); /* wake up cpus */ + DBG("smp: kicking cpu %d\n", cpu); smp_ops->kick_cpu(cpu); /* @@ -923,7 +499,7 @@ return 0; } -extern unsigned int default_distrib_server; + /* Activate a secondary processor. */ int __devinit start_secondary(void *unused) { @@ -940,20 +516,8 @@ if (smp_ops->take_timebase) smp_ops->take_timebase(); -#ifdef CONFIG_PPC_PSERIES - if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) { - vpa_init(cpu); - } - -#ifdef CONFIG_IRQ_ALL_CPUS - /* Put the calling processor into the GIQ. This is really only - * necessary from a secondary thread as the OF start-cpu interface - * performs this function for us on primary threads. - */ - /* TODO: 9005 is #defined in rtas-proc.c -- move to a header */ - rtas_set_indicator(9005, default_distrib_server, 1); -#endif -#endif + if (smp_ops->late_setup_cpu) + smp_ops->late_setup_cpu(cpu); spin_lock(&call_lock); cpu_set(cpu, cpu_online_map); Index: linux-work/include/asm-ppc64/machdep.h =================================================================== --- linux-work.orig/include/asm-ppc64/machdep.h 2004-10-25 10:24:50.000000000 +1000 +++ linux-work/include/asm-ppc64/machdep.h 2004-10-25 11:29:38.890078624 +1000 @@ -28,6 +28,7 @@ int (*probe)(void); void (*kick_cpu)(int nr); void (*setup_cpu)(int nr); + void (*late_setup_cpu)(int nr); void (*take_timebase)(void); void (*give_timebase)(void); }; @@ -86,6 +87,7 @@ void (*power_off)(void); void (*halt)(void); void (*panic)(char *str); + void (*cpu_die)(void); int (*set_rtc_time)(struct rtc_time *); void (*get_rtc_time)(struct rtc_time *); Index: linux-work/include/asm-ppc64/smp.h =================================================================== --- 
linux-work.orig/include/asm-ppc64/smp.h 2004-10-25 10:24:50.000000000 +1000 +++ linux-work/include/asm-ppc64/smp.h 2004-10-25 11:29:38.894078016 +1000 @@ -28,6 +28,8 @@ extern int boot_cpuid; +extern void cpu_die(void) __attribute__((noreturn)); + #ifdef CONFIG_SMP extern void smp_send_debugger_break(int cpu); @@ -57,9 +59,7 @@ extern int __cpu_disable(void); extern void __cpu_die(unsigned int cpu); -extern void cpu_die(void) __attribute__((noreturn)); -extern int query_cpu_stopped(unsigned int pcpu); -#endif /* !(CONFIG_SMP) */ +#endif /* CONFIG_SMP */ #define get_hard_smp_processor_id(CPU) (paca[(CPU)].hw_cpu_id) #define set_hard_smp_processor_id(CPU, VAL) \ @@ -70,6 +70,12 @@ extern int smp_mpic_probe(void); extern void smp_mpic_setup_cpu(int cpu); extern void smp_mpic_message_pass(int target, int msg); +extern void smp_generic_kick_cpu(int nr); + +extern void smp_generic_give_timebase(void); +extern void smp_generic_take_timebase(void); + +extern struct smp_ops_t *smp_ops; #endif /* __ASSEMBLY__ */ Index: linux-work/arch/ppc64/kernel/iSeries_smp.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-work/arch/ppc64/kernel/iSeries_smp.c 2004-10-25 11:29:38.896077712 +1000 @@ -0,0 +1,151 @@ +/* + * SMP support for iSeries machines. + * + * Dave Engebretsen, Peter Bergner, and + * Mike Corrigan {engebret|bergner|mikec}@us.ibm.com + * + * Plus various changes from other IBM teams... + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#undef DEBUG + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static unsigned long iSeries_smp_message[NR_CPUS]; + +void iSeries_smp_message_recv( struct pt_regs * regs ) +{ + int cpu = smp_processor_id(); + int msg; + + if ( num_online_cpus() < 2 ) + return; + + for ( msg = 0; msg < 4; ++msg ) + if ( test_and_clear_bit( msg, &iSeries_smp_message[cpu] ) ) + smp_message_recv( msg, regs ); +} + +static inline void smp_iSeries_do_message(int cpu, int msg) +{ + set_bit(msg, &iSeries_smp_message[cpu]); + HvCall_sendIPI(&(paca[cpu])); +} + +static void smp_iSeries_message_pass(int target, int msg) +{ + int i; + + if (target < NR_CPUS) + smp_iSeries_do_message(target, msg); + else { + for_each_online_cpu(i) { + if (target == MSG_ALL_BUT_SELF + && i == smp_processor_id()) + continue; + smp_iSeries_do_message(i, msg); + } + } +} + +static int smp_iSeries_numProcs(void) +{ + unsigned np, i; + + np = 0; + for (i=0; i < NR_CPUS; ++i) { + if (paca[i].lppaca.xDynProcStatus < 2) { + cpu_set(i, cpu_possible_map); + cpu_set(i, cpu_present_map); + ++np; + } + } + return np; +} + +static int smp_iSeries_probe(void) +{ + unsigned i; + unsigned np = 0; + + for (i=0; i < NR_CPUS; ++i) { + if (paca[i].lppaca.xDynProcStatus < 2) { + /*paca[i].active = 1;*/ + ++np; + } + } + + return np; +} + +static void smp_iSeries_kick_cpu(int nr) +{ + BUG_ON(nr < 0 || nr >= NR_CPUS); + + /* Verify that our partition has a processor nr */ + if (paca[nr].lppaca.xDynProcStatus >= 2) + return; + + /* The processor is currently spinning, waiting + * for the cpu_start field to become non-zero + * After we set cpu_start, the processor will + * continue on to secondary_start in iSeries_head.S + */ + paca[nr].cpu_start = 1; +} + 
+static void __devinit smp_iSeries_setup_cpu(int nr) +{ +} + +static struct smp_ops_t iSeries_smp_ops = { + .message_pass = smp_iSeries_message_pass, + .probe = smp_iSeries_probe, + .kick_cpu = smp_iSeries_kick_cpu, + .setup_cpu = smp_iSeries_setup_cpu, +}; + +/* This is called very early. */ +void __init smp_init_iSeries(void) +{ + smp_ops = &iSeries_smp_ops; + systemcfg->processorCount = smp_iSeries_numProcs(); +} + Index: linux-work/arch/ppc64/kernel/pmac_smp.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/pmac_smp.c 2004-10-25 10:24:50.000000000 +1000 +++ linux-work/arch/ppc64/kernel/pmac_smp.c 2004-10-25 11:29:38.909075736 +1000 @@ -21,6 +21,9 @@ * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. */ + +#undef DEBUG + #include #include #include @@ -51,6 +54,11 @@ #include "mpic.h" +#ifdef DEBUG +#define DBG(fmt...) udbg_printf(fmt) +#else +#define DBG(fmt...) 
+#endif extern void pmac_secondary_start_1(void); extern void pmac_secondary_start_2(void); @@ -102,15 +110,16 @@ * b .pmac_secondary_start - KERNELBASE */ switch(nr) { - case 1: - new_vector = (unsigned long)pmac_secondary_start_1; - break; - case 2: - new_vector = (unsigned long)pmac_secondary_start_2; - break; - case 3: - new_vector = (unsigned long)pmac_secondary_start_3; - break; + case 1: + new_vector = (unsigned long)pmac_secondary_start_1; + break; + case 2: + new_vector = (unsigned long)pmac_secondary_start_2; + break; + case 3: + default: + new_vector = (unsigned long)pmac_secondary_start_3; + break; } *vector = 0x48000002 + (new_vector - KERNELBASE); @@ -149,13 +158,10 @@ */ if (num_online_cpus() < 2) g5_phy_disable_cpu1(); - if (ppc_md.progress) ppc_md.progress("core99_setup_cpu 0 done", 0x349); + if (ppc_md.progress) ppc_md.progress("smp_core99_setup_cpu 0 done", 0x349); } } -extern void smp_generic_give_timebase(void); -extern void smp_generic_take_timebase(void); - struct smp_ops_t core99_smp_ops __pmacdata = { .message_pass = smp_mpic_message_pass, .probe = smp_core99_probe, Index: linux-work/arch/ppc64/kernel/pSeries_setup.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/pSeries_setup.c 2004-10-25 10:24:50.000000000 +1000 +++ linux-work/arch/ppc64/kernel/pSeries_setup.c 2004-10-25 11:29:38.911075432 +1000 @@ -321,6 +321,20 @@ } } +static void pSeries_cpu_die(void) +{ + local_irq_disable(); + /* Some hardware requires clearing the CPPR, while other hardware does not + * it is safe either way + */ + pSeriesLP_cppr_info(0, 0); + rtas_stop_self(); + /* Should never get here... */ + BUG(); + for(;;); +} + + /* * Early initialization. 
Relocation is on but do not reference unbolted pages */ @@ -588,6 +602,7 @@ .power_off = rtas_power_off, .halt = rtas_halt, .panic = rtas_os_term, + .cpu_die = pSeries_cpu_die, .get_boot_time = pSeries_get_boot_time, .get_rtc_time = pSeries_get_rtc_time, .set_rtc_time = pSeries_set_rtc_time, Index: linux-work/arch/ppc64/kernel/setup.c =================================================================== --- linux-work.orig/arch/ppc64/kernel/setup.c 2004-10-25 10:24:50.000000000 +1000 +++ linux-work/arch/ppc64/kernel/setup.c 2004-10-25 11:29:38.923073608 +1000 @@ -1308,3 +1308,10 @@ early_param("xmon", early_xmon); #endif +void cpu_die(void) +{ + if (ppc_md.cpu_die) + ppc_md.cpu_die(); + local_irq_disable(); + for (;;); +} Index: linux-work/arch/ppc64/kernel/Makefile =================================================================== --- linux-work.orig/arch/ppc64/kernel/Makefile 2004-10-25 10:24:50.000000000 +1000 +++ linux-work/arch/ppc64/kernel/Makefile 2004-10-25 11:29:38.932072240 +1000 @@ -53,6 +53,8 @@ ifdef CONFIG_SMP obj-$(CONFIG_PPC_PMAC) += pmac_smp.o smp-tbsync.o +obj-$(CONFIG_PPC_ISERIES) += iSeries_smp.o +obj-$(CONFIG_PPC_PSERIES) += pSeries_smp.o endif obj-$(CONFIG_ALTIVEC) += vecemu.o vector.o From sfr at canb.auug.org.au Mon Oct 25 17:35:24 2004 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Mon, 25 Oct 2004 17:35:24 +1000 Subject: [PATCH] iSeries console: cleanup after tty_write user copies removal Message-ID: <20041025173524.43932e3e.sfr@canb.auug.org.au> Hi Andrew, This patch just removes more of the infrastructure in the PPC64 iSeries console driver that is no longer needed since we no longer need to do copies from user mode in the tty drivers. 
Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN 2.6.10-rc1-bk2/drivers/char/viocons.c 2.6.10-rc1-bk2-viocons.1/drivers/char/viocons.c --- 2.6.10-rc1-bk2/drivers/char/viocons.c 2004-10-25 15:37:13.000000000 +1000 +++ 2.6.10-rc1-bk2-viocons.1/drivers/char/viocons.c 2004-10-25 17:03:17.000000000 +1000 @@ -83,15 +83,6 @@ u8 data[VIOCHAR_MAX_DATA]; }; -/* - * This is a place where we handle the distribution of memory - * for copy_from_user() calls. The buffer_available array is to - * help us determine which buffer to use. - */ -#define VIOCHAR_NUM_CFU_BUFFERS 7 -static struct viocharlpevent viocons_cfu_buffer[VIOCHAR_NUM_CFU_BUFFERS]; -static atomic_t viocons_cfu_buffer_available[VIOCHAR_NUM_CFU_BUFFERS]; - #define VIOCHAR_WINDOW 10 #define VIOCHAR_HIGHWATERMARK 3 @@ -207,50 +198,6 @@ } /* - * This function should ONLY be called once from viocons_init2 - */ -static void viocons_init_cfu_buffer(void) -{ - int i; - - for (i = 1; i < VIOCHAR_NUM_CFU_BUFFERS; i++) - atomic_set(&viocons_cfu_buffer_available[i], 1); -} - -static struct viocharlpevent *viocons_get_cfu_buffer(void) -{ - int i; - - /* - * Grab the first available buffer. It doesn't matter if we - * are interrupted during this array traversal as long as we - * get an available space. 
- */ - for (i = 0; i < VIOCHAR_NUM_CFU_BUFFERS; i++) - if (atomic_dec_if_positive(&viocons_cfu_buffer_available[i]) - == 0 ) - return &viocons_cfu_buffer[i]; - hvlog("\n\rviocons: viocons_get_cfu_buffer : no free buffers found"); - return NULL; -} - -static void viocons_free_cfu_buffer(struct viocharlpevent *buffer) -{ - int i; - - i = buffer - &viocons_cfu_buffer[0]; - if (i >= (sizeof(viocons_cfu_buffer) / sizeof(viocons_cfu_buffer[0]))) { - hvlog("\n\rviocons: viocons_free_cfu_buffer : buffer pointer not found in list."); - return; - } - if (atomic_read(&viocons_cfu_buffer_available[i]) != 0) { - hvlog("\n\rviocons: WARNING : returning unallocated cfu buffer."); - return; - } - atomic_set(&viocons_cfu_buffer_available[i], 1); -} - -/* * Add data to our pending-send buffers. * * NOTE: Don't use printk in here because it gets nastily recursive. @@ -438,15 +385,14 @@ * NOTE: Don't use printk in here because it gets nastily recursive. hvlog * can be used to log to the hypervisor buffer */ -static int internal_write(struct port_info *pi, const char *buf, - size_t len, struct viocharlpevent *viochar) +static int internal_write(struct port_info *pi, const char *buf, size_t len) { HvLpEvent_Rc hvrc; size_t bleft; size_t curlen; const char *curbuf; unsigned long flags; - int copy_needed = (viochar == NULL); + struct viocharlpevent *viochar; /* * Write to the hvlog of inbound data are now done prior to @@ -462,25 +408,13 @@ spin_lock_irqsave(&consolelock, flags); - /* - * If the internal_write() was passed a pointer to a - * viocharlpevent then we don't need to allocate a new one - * (this is the case where we are internal_writing user space - * data). If we aren't writing user space data then we need - * to get an event from viopath. 
- */ - if (copy_needed) { - /* This one is fetched from the viopath data structure */ - viochar = (struct viocharlpevent *) - vio_get_event_buffer(viomajorsubtype_chario); - /* Make sure we got a buffer */ - if (viochar == NULL) { - spin_unlock_irqrestore(&consolelock, flags); - hvlog("\n\rviocons: Can't get viochar buffer in internal_write()."); - return -EAGAIN; - } - initDataEvent(viochar, pi->lp); + viochar = vio_get_event_buffer(viomajorsubtype_chario); + if (viochar == NULL) { + spin_unlock_irqrestore(&consolelock, flags); + hvlog("\n\rviocons: Can't get vio buffer in internal_write()."); + return -EAGAIN; } + initDataEvent(viochar, pi->lp); curbuf = buf; bleft = len; @@ -493,25 +427,16 @@ curlen = bleft; viochar->event.xCorrelationToken = pi->seq++; - - if (copy_needed) { - memcpy(viochar->data, curbuf, curlen); - viochar->len = curlen; - } - + memcpy(viochar->data, curbuf, curlen); + viochar->len = curlen; viochar->event.xSizeMinus1 = offsetof(struct viocharlpevent, data) + curlen; hvrc = HvCallEvent_signalLpEvent(&viochar->event); if (hvrc) { - spin_unlock_irqrestore(&consolelock, flags); - if (copy_needed) - vio_free_event_buffer(viomajorsubtype_chario, viochar); - hvlog("viocons: error sending event! %d\n", (int)hvrc); - return len - bleft; + goto out; } - curbuf += curlen; bleft -= curlen; } @@ -519,14 +444,9 @@ /* If we didn't send it all, buffer as much of it as we can. */ if (bleft > 0) bleft -= buffer_add(pi, curbuf, bleft); - /* - * Since we grabbed it from the viopath data structure, return - * it to the data structure. 
- */ - if (copy_needed) - vio_free_event_buffer(viomajorsubtype_chario, viochar); +out: + vio_free_event_buffer(viomajorsubtype_chario, viochar); spin_unlock_irqrestore(&consolelock, flags); - return len - bleft; } @@ -603,18 +523,8 @@ hvlogOutput(s, count); - if (!viopath_isactive(pi->lp)) { - /* - * This is a VERY noisy trace message in the case where the - * path manager is not active or in the case where this - * function is called prior to viocons initialization. It is - * being commented out for the sake of a clear trace buffer. - */ -#if 0 - hvlog("\n\rviocons_write: path not active to lp %d", pi->lp); -#endif + if (!viopath_isactive(pi->lp)) return; - } /* * Any newline character found will cause a @@ -627,17 +537,16 @@ * Newline found. Print everything up to and * including the newline */ - internal_write(pi, &s[begin], index - begin + 1, - NULL); + internal_write(pi, &s[begin], index - begin + 1); begin = index + 1; /* Emit a carriage return as well */ - internal_write(pi, &cr, 1, NULL); + internal_write(pi, &cr, 1); } } /* If any characters left to write, write them now */ if ((index - begin) > 0) - internal_write(pi, &s[begin], index - begin, NULL); + internal_write(pi, &s[begin], index - begin); } /* @@ -721,11 +630,9 @@ /* * TTY Write method */ -static int viotty_write(struct tty_struct *tty, - const unsigned char *buf, int count) +static int viotty_write(struct tty_struct *tty, const unsigned char *buf, + int count) { - int ret; - int total = 0; struct port_info *pi; pi = get_port_data(tty); @@ -746,16 +653,10 @@ * viotty_write call and, since the viopath isn't active to this * partition, return count. */ - if (!viopath_isactive(pi->lp)) { - /* Noisy trace. Commented unless needed. 
*/ -#if 0 - hvlog("\n\rviotty_write: viopath NOT active for lp %d.",pi->lp); -#endif + if (!viopath_isactive(pi->lp)) return count; - } - total = internal_write(pi, buf, count, NULL); - return total; + return internal_write(pi, buf, count); } /* @@ -774,7 +675,7 @@ hvlogOutput(&ch, 1); if (viopath_isactive(pi->lp)) - internal_write(pi, &ch, 1, NULL); + internal_write(pi, &ch, 1); } /* @@ -1270,8 +1171,6 @@ viotty_driver = NULL; } - viocons_init_cfu_buffer(); - unregister_console(&viocons_early); register_console(&viocons); -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041025/1ce92fe4/attachment.pgp From dwm at austin.ibm.com Tue Oct 26 08:55:50 2004 From: dwm at austin.ibm.com (Doug Maxey) Date: Mon, 25 Oct 2004 17:55:50 -0500 Subject: [PATCH 1/1] build modular usb isd200 with modular ide In-Reply-To: <58cb370e041024054575c09679@mail.gmail.com> Message-ID: <200410252255.i9PMto6B024865@falcon10.austin.ibm.com> On Sun, 24 Oct 2004 14:45:53 +0200, Bartlomiej Zolnierkiewicz wrote: ... > >The new ide_fix_driveid function seems buggy, >ie. it byte-swaps id->max_multsect with id->vendor3. Ok, lets look at those vars. Both are defined in hdreg.h as bytes. No fields in the data from the device are bytes, but are 16 bit. On big endian, the relative positions for an LE u16 are swapped. If the swap is not done on those, then one replaces the other when read. Probably not what was intended. It appears that another bug is being fixed here. Do you not agree that all reads when doing IDENTIFY xxx DEVICE are retrieved as u16? If not, then the current ide_fix_driveid() code is wrong also. 
Backup data, taken from the raw bits on the wire via datatransit:

$ hexdump -C eio/ata/041025-2.6.9-rc3-wcd-4-dwm.data.bin
00000000  40 00 ff 3f 37 c8 10 00  00 00 00 00 3f 00 00 00  |@..?7.......?...|
00000010  00 00 00 00 20 20 20 20  20 20 20 20 20 20 36 20  |....          6 |
00000020  54 34 30 33 32 30 41 32  00 00 00 00 30 00 42 50  |T40320A2....0.BP|
00000030  30 31 45 33 20 20 4f 54  48 53 42 49 20 41 4b 4d  |01E3  OTHSBI AKM|
00000040  30 34 36 32 41 47 42 58  20 20 20 20 20 20 20 20  |0462AGBX        |
00000050  20 20 20 20 20 20 20 20  20 20 20 20 20 20 10 80  |              ..|
00000060  00 00 00 2f 00 40 00 02  00 00 07 00 ff 3f 10 00  |.../.@.......?..|
00000070  3f 00 10 fc fb 00 10 01  00 53 a8 04 07 00 07 00  |?........S......|
00000080  03 00 78 00 78 00 78 00  78 00 00 00 00 00 00 00  |..x.x.x.x.......|
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000a0  7e 00 00 00 6b 7c 08 59  03 40 49 7c 08 18 03 40  |~...k|.Y.@I|...@|
000000b0  3f 20 0f 00 00 00 80 00  fe ff 4b 60 00 00 00 00  |? ........K`....|
000000c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100  01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 a5 89  |................|
00000200

Note that bytes 5E-5F are '10 80'.

Per d1532, table 15, in the version I am looking at:

  47  M  F  15-8  80h
         F  7-0   00h = Reserved
         F        01h-FFh = Maximum number of sectors that shall be
                  transferred per interrupt on READ/WRITE MULTIPLE
                  commands

To match max_multsect and vendor3, the bytes must be swapped.

	unsigned char	max_multsect;	/* 0=not_implemented */
	unsigned char	vendor3;	/* vendor unique */

Ouch!  Oh man.  Depending on LE byte ordering in a u16, but only for
certain vars.  Should this be ifdef'd in hdreg.h?  And, and, oh jeez...

What is the solution here?  Preserve the definitely non-arch neutral
format in hdreg.h?  All the char values are troubling.
Or copy and rename the entire ide_fix_driveid() into isd200?  This would
be Christoph's choice.

...
>The dependency is a bug, is for IDE driver only.

The isd200 _is_ a bridge to ATA/ATAPI devices.  Does this mean it cannot use
common code, just because it is not in drivers/ide?

>
>Doug, if you kill debugging code in isd200.c then only:
>
>id->command_set_1
>id->model
>id->fw_rev
>id->capability
>id->lba_capacity
>id->heads
>id->cyls
>id->sectors
>id->command_set_2
>
>need to be byte-swapped.
>

I don't plan on killing any debug code.

From paulus at samba.org Tue Oct 26 09:12:28 2004
From: paulus at samba.org (Paul Mackerras)
Date: Tue, 26 Oct 2004 09:12:28 +1000
Subject: [PATCH 1/1] build modular usb isd200 with modular ide
In-Reply-To: <200410252255.i9PMto6B024865@falcon10.austin.ibm.com>
References: <58cb370e041024054575c09679@mail.gmail.com>
	<200410252255.i9PMto6B024865@falcon10.austin.ibm.com>
Message-ID: <16765.34908.93713.977225@cargo.ozlabs.ibm.com>

Doug Maxey writes:

> Ok, lets look at those vars.  Both are defined in hdreg.h as bytes.
> No fields in the data from the device are bytes, but are 16 bit.  On big
> endian, the relative positions for an LE u16 are swapped.  If the swap is
> not done on those, then one replaces the other when read.  Probably not
> what was intended.  It appears that another bug is being fixed here.

No.  The only sane way to do things is to transfer data from the
device to memory as a byte stream, in other words, preserving the
ordering of the individual bytes.  That is what we do on PPC and PPC64
platforms.  That ordering is preserved (and must be preserved)
irrespective of whether the transfer is actually done in 8, 16 or 32
bit chunks.

That means that 16-bit quantities might need to be byte-swapped to be
interpreted in host byte order, but single-byte fields should always
be in their correct sequence.

Paul.
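
The byte-stream rule described above can be sketched in plain C.  This is
only an illustration, not code from the patch or from the IDE layer: the
helper names load_le16(), swap_bytes_in_word() and
identify_byte_order_demo() are hypothetical, and the two byte pairs are
copied from offsets 0x02 and 0x5E of the hexdump earlier in this thread.
It assumes the buffer was filled as a byte stream, so wire order is
preserved in memory.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes copied from the hexdump in this thread, in wire order. */
static const uint8_t wire_word1[2]  = { 0xff, 0x3f };	/* offset 0x02: cyls */
static const uint8_t wire_word47[2] = { 0x10, 0x80 };	/* offset 0x5E */

/*
 * Hypothetical helper: read a little-endian 16-bit field out of a byte
 * stream.  The result is the same on LE and BE hosts because it never
 * reinterprets memory as a native u16.
 */
static uint16_t load_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Hypothetical helper: what a blanket 16-bit swap does to a word that
 * really holds two independent single-byte fields.
 */
static void swap_bytes_in_word(const uint8_t in[2], uint8_t out[2])
{
	out[0] = in[1];
	out[1] = in[0];
}

/* Walk through both cases; asserts if an expectation does not hold. */
static void identify_byte_order_demo(void)
{
	uint8_t swapped[2];

	/* 16-bit field: interpreted as LE it is 0x3fff (16383). */
	assert(load_le16(wire_word1) == 0x3fff);

	/* Single-byte fields read straight from the stream: no swap. */
	assert(wire_word47[0] == 0x10);	/* first byte, e.g. max_multsect */
	assert(wire_word47[1] == 0x80);	/* second byte, e.g. vendor3 */

	/* After a blanket swap the two byte fields trade places. */
	swap_bytes_in_word(wire_word47, swapped);
	assert(swapped[0] == 0x80 && swapped[1] == 0x10);
}
```

Calling identify_byte_order_demo() from a test program passes on either
endianness, which is the point: only fields wider than one byte need a
conversion, and that conversion can be written without reference to the
host byte order.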
From bzolnier at gmail.com Tue Oct 26 09:20:08 2004 From: bzolnier at gmail.com (Bartlomiej Zolnierkiewicz) Date: Tue, 26 Oct 2004 01:20:08 +0200 Subject: [PATCH 1/1] build modular usb isd200 with modular ide In-Reply-To: <200410252255.i9PMto6B024865@falcon10.austin.ibm.com> References: <58cb370e041024054575c09679@mail.gmail.com> <200410252255.i9PMto6B024865@falcon10.austin.ibm.com> Message-ID: <58cb370e0410251620279fb0ee@mail.gmail.com> On Mon, 25 Oct 2004 17:55:50 -0500, Doug Maxey wrote: > >The dependency is a bug, is for IDE driver only. > > The isd200 _is_ a bridge to ATA/ATAPI devices. Does this mean it cannot use > common code, just because it is not in drivers/ide? no but the common ATA/ATAPI code resides in hdreg.h or/and ata.h, ide.h is for IDE driver _only_ > >Doug, if you kill debugging code in isd200.c then only: > > > >id->command_set_1 > >id->model > >id->fw_rev > >id->capability > >id->lba_capacity > >id->heads > >id->cyls > >id->sectors > >id->command_set_2 > > > >need to be byte-swapped. > > > > I don't plan on killing any debug code. I do :) From dwm at austin.ibm.com Tue Oct 26 09:55:47 2004 From: dwm at austin.ibm.com (Doug Maxey) Date: Mon, 25 Oct 2004 18:55:47 -0500 Subject: [PATCH 1/1] build modular usb isd200 with modular ide In-Reply-To: <16765.34908.93713.977225@cargo.ozlabs.ibm.com> Message-ID: <200410252355.i9PNtlSp025091@falcon10.austin.ibm.com> On Tue, 26 Oct 2004 09:12:28 +1000, Paul Mackerras wrote: >Doug Maxey writes: > >> Ok, lets look at those vars. Both are defined in hdreg.h as bytes. >> No fields in the data from the device are bytes, but are 16 bit. On big >> endian, the relative positions for an LE u16 are swapped. If the swap is >> not done on those, then one replaces the other when read. Probably not >> what was intended. It appears that another bug is being fixed here. > >No. 
The only sane way to do things is to transfer data from the >device to memory as a byte stream, in other words, preserving the >ordering of the individual bytes. That is what we do on PPC and PPC64 >platforms. That ordering is preserved (and must be preserved) >irrespective of whether the transfer is actually done in 8, 16 or 32 >bit chunks. Oh yes, I am aware. Just happen to be working on PPC64. Have been writing drivers for this base for several years. It's the olde LE device vs BE host. The transfers are done as a 16 bit quantity, PIO. And yes, I understand, "we have always done it this way". Works well when you only have to deal with single arch. Possibly I am not making point very well, that one is preserving the correct byte order and let the structures reflect to native location. Strings get swapped, 16, 32, and 64 bit fields likewise. I just missed the LE order that is is being preserved for *some* few fields only. > >That means that 16-bit quantities might need to be byte-swapped to be >interpreted in host byte order, but single-byte fields should always >be in their correct sequence. There is not a single reference to byte field in the ATA spec for IDENTIFY DEVICE. It just happens that some of the fields are 8 bits long. Or 32 or 64. > >Paul. > ++doug From paulus at samba.org Tue Oct 26 12:05:29 2004 From: paulus at samba.org (Paul Mackerras) Date: Tue, 26 Oct 2004 12:05:29 +1000 Subject: [PATCH 1/1] build modular usb isd200 with modular ide In-Reply-To: <200410252355.i9PNtlSp025091@falcon10.austin.ibm.com> References: <16765.34908.93713.977225@cargo.ozlabs.ibm.com> <200410252355.i9PNtlSp025091@falcon10.austin.ibm.com> Message-ID: <16765.45289.684525.732044@cargo.ozlabs.ibm.com> Doug Maxey writes: > Oh yes, I am aware. Just happen to be working on PPC64. Have been > writing drivers for this base for several years. It's the olde LE For Linux or some other OS? > device vs BE host. The transfers are done as a 16 bit quantity, PIO. 
> And yes, I understand, "we have always done it this way". Works well > when you only have to deal with single arch. No, we haven't always done it this way on PPC. :) Various different ways have been tried over the years and this is the only way that doesn't suck. > Possibly I am not making point very well, that one is preserving the > correct byte order and let the structures reflect to native location. I can't parse that sentence unambiguously... > Strings get swapped, 16, 32, and 64 bit fields likewise. I just missed the > LE order that is is being preserved for *some* few fields only. Strings shouldn't get swapped, or at least, strings should only need to be swapped on a BE platform if they also need to be swapped on an LE platform. > There is not a single reference to byte field in the ATA spec for > IDENTIFY DEVICE. It just happens that some of the fields are 8 bits long. Or > 32 or 64. And your point is... ? Paul. From benh at kernel.crashing.org Tue Oct 26 17:16:46 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 26 Oct 2004 17:16:46 +1000 Subject: problems with iommu_free_table() In-Reply-To: <1097171661.7087.1.camel@sinatra.austin.ibm.com> References: <1097171661.7087.1.camel@sinatra.austin.ibm.com> Message-ID: <1098775007.6916.10.camel@gaston> On Thu, 2004-10-07 at 12:54 -0500, John Rose wrote: > The patch below creates iommu_free_table(). Iommu tables are not currently > freed in PPC64. This could cause a memory leak for DLPAR of an EADS slot. The > function verifies that there are no outstanding TCE entries for the range of > the table before freeing it. I added a call to iommu_free_table() to the code > that dynamically removes a device node. This should be fairly symmetrical with > the table allocation, which happens during dynamic addition of a device node. Ouch, I should have commented earlier... now it went in and has problems: - It breaks build without CONFIG_PPC_PSERIES (try a pmac-only build). 
There is, more generally, a tendency at calling things in pSeries_iommu.c with the prefix "iommu_" without any mention of "pSeries" in the name. Hey guys ! pSeries isn't alone anymore ! So please call those pSeries-specific things pSeries_* or tce_* or whatever, but don't add back confusion where I had such a hard time splitting things. - It seems that any call to of_remove_node() will call iommu_free_table() on np->iommu_table. That sounds bad. The iommu_table pointer is copied at init time from the parent to all child nodes. So if we add a phb, and then remove a device from that bus, we end up disposing of the phb's iommu table ... I'll send a patch fixing G5 build by renaming iommu_free_table to tce_free_table() and putting the call in #ifdef CONFIG_PPC_PSERIES for now, but if you start hooking too much between prom.c and the higher level, you should start thinking about doing things differently. That is have of_remove_node() stay what it should have been from the beginning: a low level function removing the node and just that, and have the _caller_ to the grunt work of knowing what else need to be removed/freed/etc... Ben. From benh at kernel.crashing.org Tue Oct 26 17:21:26 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 26 Oct 2004 17:21:26 +1000 Subject: problems with iommu_free_table() In-Reply-To: <1098775007.6916.10.camel@gaston> References: <1097171661.7087.1.camel@sinatra.austin.ibm.com> <1098775007.6916.10.camel@gaston> Message-ID: <1098775287.6898.14.camel@gaston> On Tue, 2004-10-26 at 17:16 +1000, Benjamin Herrenschmidt wrote: > I'll send a patch fixing G5 build by renaming iommu_free_table to > tce_free_table() and putting the call in #ifdef CONFIG_PPC_PSERIES for > now, but if you start hooking too much between prom.c and the higher > level, you should start thinking about doing things differently. 
That is > have of_remove_node() stay what it should have been from the beginning: > a low level function removing the node and just that, and have the > _caller_ to the grunt work of knowing what else need to be > removed/freed/etc... Ok, I'm keeping the name for now, just doing ifdef's plus adding a fat comment to iommu.h If you want iommu's in general to have the ability to add/remove tables, then those calls (iommu_devnode_init, iommu_free_table, ...) should end up beeing ppc_md. hooks so the actual implementation of the iommu knows how to deal with them. If that is to remain a pSeries-only API (I don't mind at this point), then rename those to tce_* something or pSeries_iommu_* or whatever making it clear they are pSeries only, and be careful of not breaking build with non-pSeries. Ben. From ananth at in.ibm.com Tue Oct 26 18:47:38 2004 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Tue, 26 Oct 2004 14:17:38 +0530 Subject: [PATCH] Kprobes for ppc64 In-Reply-To: <16762.5108.282382.603502@cargo.ozlabs.ibm.com> References: <20041018095229.GA7394@in.ibm.com> <16762.5108.282382.603502@cargo.ozlabs.ibm.com> Message-ID: <20041026084738.GA7425@in.ibm.com> On Sat, Oct 23, 2004 at 06:19:00PM +1000, Paul Mackerras wrote: > Ananth N Mavinakayanahalli writes: > > > 2. arch_prepare_kprobe() now returns an int. I have made the necessary > > changes to i386 and sparc64 kprobes files, but is untested. > > Are you going to send this upstream? Prasanna has a set of changes which he will be pushing to akpm shortly. This will be part of the set. > > + * Interrupts are disabled on entry as trap3 is an interrupt gate and they > > + * remain disabled thorough out this function. > > + */ > > +static inline int kprobe_handler(struct pt_regs *regs) > > Comments about "trap3" and "interrupt gate" don't help me understand > this function on ppc64. :) At present interrupts are enabled in a > program check exception handler but disabled in a single-step handler. 
> When does this function get called? Ah, I missed the comment .. my bad :( kprobe_handler() gets invoked from ProgramCheckException(). > > @@ -96,6 +97,9 @@ int do_page_fault(struct pt_regs *regs, > > BUG_ON((trap == 0x380) || (trap == 0x480)); > > > > if (trap == 0x300) { > > + if (notify_die(DIE_PAGE_FAULT, "page_fault", regs, error_code, > > + 11, SIGSEGV) == NOTIFY_STOP) > > + return 0; > > Hmmm, this seems a bit heavyweight for adding to the page fault path. > Have you done any benchmarks with vs. without kprobes? Hmm no, not yet. > On the whole the patch looks OK. I haven't checked the kprobe_handler > code to see if I think it's all SMP- and preempt-safe, but I assume > you have done it similarly on x86 and checked it there. Yes - the port is based off the initial x86 code. It is SMP and preempt safe. Thanks for your comments! I will rework the patch a bit and post it soon. Thanks, Ananth From olof at austin.ibm.com Tue Oct 26 23:45:46 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Tue, 26 Oct 2004 08:45:46 -0500 Subject: problems with iommu_free_table() In-Reply-To: <1098775007.6916.10.camel@gaston> References: <1097171661.7087.1.camel@sinatra.austin.ibm.com> <1098775007.6916.10.camel@gaston> Message-ID: <417E550A.1040400@austin.ibm.com> Benjamin Herrenschmidt wrote: > Ouch, I should have commented earlier... now it went in and has > problems: > > - It breaks build without CONFIG_PPC_PSERIES (try a pmac-only build). > There is, more generally, a tendency at calling things in > pSeries_iommu.c with the prefix "iommu_" without any mention of > "pSeries" in the name. Hey guys ! pSeries isn't alone anymore ! So > please call those pSeries-specific things pSeries_* or tce_* or > whatever, but don't add back confusion where I had such a hard time > splitting things. Actually, you're wrong. :) It's not pSeries-specific, see below. > - It seems that any call to of_remove_node() will call > iommu_free_table() on np->iommu_table. That sounds bad. 
The iommu_table > pointer is copied at init time from the parent to all child nodes. So if > we add a phb, and then remove a device from that bus, we end up > disposing of the phb's iommu table ... Yep, you're right. There's two ways to fix this: Add reference counting to the iommu tables and do automatic deallocation, or only delete the tables for PHB deallocation. The second option would be preferred, since it should be the right way to solve the layering violation. > I'll send a patch fixing G5 build by renaming iommu_free_table to > tce_free_table() and putting the call in #ifdef CONFIG_PPC_PSERIES for > now, This is the wrong solution. iommu_free_table is a companion to iommu_init_table, and it _is_ generic code, it just ended up in the wrong file (I didn't catch that myself, sorry about that). -Olof From johnrose at austin.ibm.com Wed Oct 27 02:41:35 2004 From: johnrose at austin.ibm.com (John Rose) Date: Tue, 26 Oct 2004 11:41:35 -0500 Subject: [PATCH] ppc64: Fix g5-only build In-Reply-To: <1098775712.6897.17.camel@gaston> References: <1098775712.6897.17.camel@gaston> Message-ID: <1098808895.32293.23.camel@sinatra.austin.ibm.com> Forgive me for the cross-post, but I'm trying to answer two list messages on the same topic. I think it's more productive to just fix the bug than to commit a giant comment pointing out a small bug, so I've attached an alternate fix (build tested for g5 :). > - It breaks build without CONFIG_PPC_PSERIES (try a pmac-only build). > There is, more generally, a tendency at calling things in > pSeries_iommu.c with the prefix "iommu_" without any mention of > "pSeries" in the name. Hey guys ! pSeries isn't alone anymore ! So > please call those pSeries-specific things pSeries_* or tce_* or > whatever, but don't add back confusion where I had such a hard time > splitting things. Apologies for the build break. I mistakenly placed the function in a pSeries file. 
In our view, this is a generic function, complementary to iommu_init_table(), so I've moved it to iommu.c. > - It seems that any call to of_remove_node() will call > iommu_free_table() on np->iommu_table. That sounds bad. The iommu_table > pointer is copied at init time from the parent to all child nodes. So if > we add a phb, and then remove a device from that bus, we end up > disposing of the phb's iommu table ... Good catch, although table allocation doesn't always happen at the PHB level. On POWER5, it happens at the EADS level. My fix checks for the ibm,dma-window property before calling the free function. This is the criterion for which the table is alloc'ed in the first place. Thanks- John Signed-off-by: John Rose diff -Nru a/arch/ppc64/kernel/iommu.c b/arch/ppc64/kernel/iommu.c --- a/arch/ppc64/kernel/iommu.c Tue Oct 26 11:36:42 2004 +++ b/arch/ppc64/kernel/iommu.c Tue Oct 26 11:36:42 2004 @@ -425,6 +425,39 @@ return tbl; } +void iommu_free_table(struct device_node *dn) +{ + struct iommu_table *tbl = dn->iommu_table; + unsigned long bitmap_sz, i; + unsigned int order; + + if (!tbl || !tbl->it_map) { + printk(KERN_ERR "%s: expected TCE map for %s\n", __FUNCTION__, + dn->full_name); + return; + } + + /* verify that table contains no entries */ + /* it_mapsize is in entries, and we're examining 64 at a time */ + for (i = 0; i < (tbl->it_mapsize/64); i++) { + if (tbl->it_map[i] != 0) { + printk(KERN_WARNING "%s: Unexpected TCEs for %s\n", + __FUNCTION__, dn->full_name); + break; + } + } + + /* calculate bitmap size in bytes */ + bitmap_sz = (tbl->it_mapsize + 7) / 8; + + /* free bitmap */ + order = get_order(bitmap_sz); + free_pages((unsigned long) tbl->it_map, order); + + /* free table */ + kfree(tbl); +} + /* Creates TCEs for a user provided buffer. The user buffer must be * contiguous real kernel storage (not vmalloc). The address of the buffer * passed here is the kernel (virtual) address of the buffer. 
The buffer diff -Nru a/arch/ppc64/kernel/pSeries_iommu.c b/arch/ppc64/kernel/pSeries_iommu.c --- a/arch/ppc64/kernel/pSeries_iommu.c Tue Oct 26 11:36:42 2004 +++ b/arch/ppc64/kernel/pSeries_iommu.c Tue Oct 26 11:36:42 2004 @@ -412,39 +412,6 @@ dn->iommu_table = iommu_init_table(tbl); } -void iommu_free_table(struct device_node *dn) -{ - struct iommu_table *tbl = dn->iommu_table; - unsigned long bitmap_sz, i; - unsigned int order; - - if (!tbl || !tbl->it_map) { - printk(KERN_ERR "%s: expected TCE map for %s\n", __FUNCTION__, - dn->full_name); - return; - } - - /* verify that table contains no entries */ - /* it_mapsize is in entries, and we're examining 64 at a time */ - for (i = 0; i < (tbl->it_mapsize/64); i++) { - if (tbl->it_map[i] != 0) { - printk(KERN_WARNING "%s: Unexpected TCEs for %s\n", - __FUNCTION__, dn->full_name); - break; - } - } - - /* calculate bitmap size in bytes */ - bitmap_sz = (tbl->it_mapsize + 7) / 8; - - /* free bitmap */ - order = get_order(bitmap_sz); - free_pages((unsigned long) tbl->it_map, order); - - /* free table */ - kfree(tbl); -} - void iommu_setup_pSeries(void) { struct pci_dev *dev = NULL; diff -Nru a/arch/ppc64/kernel/prom.c b/arch/ppc64/kernel/prom.c --- a/arch/ppc64/kernel/prom.c Tue Oct 26 11:36:42 2004 +++ b/arch/ppc64/kernel/prom.c Tue Oct 26 11:36:42 2004 @@ -1818,8 +1818,9 @@ return -EBUSY; } - if (np->iommu_table) + if ((np->iommu_table) && get_property(np, "ibm,dma-window", NULL)) { iommu_free_table(np); + } write_lock(&devtree_lock); OF_MARK_STALE(np); From johnrose at austin.ibm.com Wed Oct 27 04:03:01 2004 From: johnrose at austin.ibm.com (John Rose) Date: Tue, 26 Oct 2004 13:03:01 -0500 Subject: [PATCH] iommu fixes, round 2 In-Reply-To: <1098808895.32293.23.camel@sinatra.austin.ibm.com> References: <1098775712.6897.17.camel@gaston> <1098808895.32293.23.camel@sinatra.austin.ibm.com> Message-ID: <1098813781.32293.40.camel@sinatra.austin.ibm.com> Ben's patch went in before my note went out, so please disregard my 
previous post. All this might have been more easily addressed in one note on one list. As opposed to posting three msgs on two lists and committing a patch that creates code comments on proposed reorgs. Might I humbly request that our patches/reorg ideas sit on the ppc64 list for a bit before pushing to Linus? Here's a patch that fixes the original build break, and removes the ifdef's and comments that were added by Ben's patch. We feel that iommu_free_table() is generic so we've moved it to iommu.c. This fixes the build break (sorry g5 :). It's complementary to iommu_init_table(), which is generic. Secondly, the attempt to free the table in of_remove_node() is "as symmetric as possible" with of_finish_node_dynamic(), where the table is allocated. If that's what the comment means by layering violation, I humbly disagree. Thirdly, iommu_devnode_init() also has an iSeries implementation, so it's not pSeries-specific. No need to rename it, as suggested in the comment. Thanks- John Signed-off-by: John Rose diff -Nru a/arch/ppc64/kernel/iommu.c b/arch/ppc64/kernel/iommu.c --- a/arch/ppc64/kernel/iommu.c Tue Oct 26 12:51:42 2004 +++ b/arch/ppc64/kernel/iommu.c Tue Oct 26 12:51:42 2004 @@ -425,6 +425,39 @@ return tbl; } +void iommu_free_table(struct device_node *dn) +{ + struct iommu_table *tbl = dn->iommu_table; + unsigned long bitmap_sz, i; + unsigned int order; + + if (!tbl || !tbl->it_map) { + printk(KERN_ERR "%s: expected TCE map for %s\n", __FUNCTION__, + dn->full_name); + return; + } + + /* verify that table contains no entries */ + /* it_mapsize is in entries, and we're examining 64 at a time */ + for (i = 0; i < (tbl->it_mapsize/64); i++) { + if (tbl->it_map[i] != 0) { + printk(KERN_WARNING "%s: Unexpected TCEs for %s\n", + __FUNCTION__, dn->full_name); + break; + } + } + + /* calculate bitmap size in bytes */ + bitmap_sz = (tbl->it_mapsize + 7) / 8; + + /* free bitmap */ + order = get_order(bitmap_sz); + free_pages((unsigned long) tbl->it_map, order); + + /* 
free table */ + kfree(tbl); +} + /* Creates TCEs for a user provided buffer. The user buffer must be * contiguous real kernel storage (not vmalloc). The address of the buffer * passed here is the kernel (virtual) address of the buffer. The buffer diff -Nru a/arch/ppc64/kernel/pSeries_iommu.c b/arch/ppc64/kernel/pSeries_iommu.c --- a/arch/ppc64/kernel/pSeries_iommu.c Tue Oct 26 12:51:42 2004 +++ b/arch/ppc64/kernel/pSeries_iommu.c Tue Oct 26 12:51:42 2004 @@ -412,39 +412,6 @@ dn->iommu_table = iommu_init_table(tbl); } -void iommu_free_table(struct device_node *dn) -{ - struct iommu_table *tbl = dn->iommu_table; - unsigned long bitmap_sz, i; - unsigned int order; - - if (!tbl || !tbl->it_map) { - printk(KERN_ERR "%s: expected TCE map for %s\n", __FUNCTION__, - dn->full_name); - return; - } - - /* verify that table contains no entries */ - /* it_mapsize is in entries, and we're examining 64 at a time */ - for (i = 0; i < (tbl->it_mapsize/64); i++) { - if (tbl->it_map[i] != 0) { - printk(KERN_WARNING "%s: Unexpected TCEs for %s\n", - __FUNCTION__, dn->full_name); - break; - } - } - - /* calculate bitmap size in bytes */ - bitmap_sz = (tbl->it_mapsize + 7) / 8; - - /* free bitmap */ - order = get_order(bitmap_sz); - free_pages((unsigned long) tbl->it_map, order); - - /* free table */ - kfree(tbl); -} - void iommu_setup_pSeries(void) { struct pci_dev *dev = NULL; diff -Nru a/arch/ppc64/kernel/prom.c b/arch/ppc64/kernel/prom.c --- a/arch/ppc64/kernel/prom.c Tue Oct 26 12:51:42 2004 +++ b/arch/ppc64/kernel/prom.c Tue Oct 26 12:51:42 2004 @@ -1818,13 +1818,8 @@ return -EBUSY; } - /* XXX This is a layering violation, should be moved to the caller - * --BenH. 
- */ -#ifdef CONFIG_PPC_PSERIES - if (np->iommu_table) + if ((np->iommu_table) && get_property(np, "ibm,dma-window", NULL)) iommu_free_table(np); -#endif /* CONFIG_PPC_PSERIES */ write_lock(&devtree_lock); OF_MARK_STALE(np); diff -Nru a/include/asm-ppc64/iommu.h b/include/asm-ppc64/iommu.h --- a/include/asm-ppc64/iommu.h Tue Oct 26 12:51:42 2004 +++ b/include/asm-ppc64/iommu.h Tue Oct 26 12:51:42 2004 @@ -111,17 +111,9 @@ extern void iommu_setup_u3(void); /* Creates table for an individual device node */ -/* XXX: This isn't generic, please name it accordingly or add - * some ppc_md. hooks for iommu implementations to do what they - * need to do. --BenH. - */ extern void iommu_devnode_init(struct device_node *dn); /* Frees table for an individual device node */ -/* XXX: This isn't generic, please name it accordingly or add - * some ppc_md. hooks for iommu implementations to do what they - * need to do. --BenH. - */ extern void iommu_free_table(struct device_node *dn); #endif /* CONFIG_PPC_MULTIPLATFORM */ From dwm at austin.ibm.com Wed Oct 27 04:05:12 2004 From: dwm at austin.ibm.com (Doug Maxey) Date: Tue, 26 Oct 2004 13:05:12 -0500 Subject: [PATCH 1/1] build modular usb isd200 with modular ide In-Reply-To: <16765.45289.684525.732044@cargo.ozlabs.ibm.com> Message-ID: <200410261805.i9QI5CKM029850@falcon10.austin.ibm.com> On Tue, 26 Oct 2004 12:05:29 +1000, Paul Mackerras wrote: >For Linux or some other OS? Linux for little over a year, 2.6 for about 3 months, some _other_ OS for several years. > >> device vs BE host. The transfers are done as a 16 bit quantity, PIO. >> And yes, I understand, "we have always done it this way". Works well >> when you only have to deal with single arch. > >No, we haven't always done it this way on PPC. :) Various different >ways have been tried over the years and this is the only way that >doesn't suck. 
> >> Possibly I am not making point very well, that one is preserving the >> correct byte order and let the structures reflect to native location. > >I can't parse that sentence unambiguously... s/to native location/the normalized (for the host) layout/ > >> Strings get swapped, 16, 32, and 64 bit fields likewise. I just missed the >> LE order that is is being preserved for *some* few fields only. > >Strings shouldn't get swapped, or at least, strings should only need >to be swapped on a BE platform if they also need to be swapped on an >LE platform. > >> There is not a single reference to byte field in the ATA spec for >> IDENTIFY DEVICE. It just happens that some of the fields are 8 bits long. Or >> 32 or 64. > >And your point is... ? To me, and I do seem to be in the minority, it seems that normalizing the entire bytestream is the right thing (tm). But I can see the point that leaving certain parts non-normalized is cheaper. It was my mistake missing the use of the char fields. GIITD. In any event, with 2.6.10-rc1 the problem seems to be solved in spite of my meddling. :-) ++doug From benh at kernel.crashing.org Wed Oct 27 09:32:35 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 27 Oct 2004 09:32:35 +1000 Subject: problems with iommu_free_table() In-Reply-To: <417E550A.1040400@austin.ibm.com> References: <1097171661.7087.1.camel@sinatra.austin.ibm.com> <1098775007.6916.10.camel@gaston> <417E550A.1040400@austin.ibm.com> Message-ID: <1098833556.6916.44.camel@gaston> On Tue, 2004-10-26 at 08:45 -0500, Olof Johansson wrote: > Benjamin Herrenschmidt wrote: > Actually, you're wrong. :) It's not pSeries-specific, see below. Well, it's implemented in pSeries_iommu.c ... > Yep, you're right. There's two ways to fix this: Add reference counting > to the iommu tables and do automatic deallocation, or only delete the > tables for PHB deallocation. 
The second option would be preferred, since > it should be the right way to solve the layering violation. Agreed. > > I'll send a patch fixing G5 build by renaming iommu_free_table to > > tce_free_table() and putting the call in #ifdef CONFIG_PPC_PSERIES for > > now, > > This is the wrong solution. iommu_free_table is a companion to > iommu_init_table, and it _is_ generic code, it just ended up in the > wrong file (I didn't catch that myself, sorry about that). It's the right fix for now until you or John do something better :) Besides, I don't fully agree with iommu_free_table() beeing the 'pending' of iommu_init_table() since it does kfree etc... it makes assumptions on how the caller allocated the tables... not _that_ bad but don't even try calling that on the U3 ones :) From pbadari at us.ibm.com Wed Oct 27 09:33:18 2004 From: pbadari at us.ibm.com (Badari Pulavarty) Date: 26 Oct 2004 16:33:18 -0700 Subject: 2.6.9 iommu_alloc failures on PPC64 Message-ID: <1098833598.20643.116.camel@dyn318077bld.beaverton.ibm.com> Hi, When I run IO tests with 2.6.9 kernel on PPC64, I get hundreds of following messages and eventually get OOPS from qlogic driver. Is this a known problems ? BTW, this happens only with JFS not ext3. 
Thanks,
Badari

iommu_alloc failed, tbl c0000000e3fe1f00 vaddr c0000000d1c80000 npages 10
iommu_alloc failed, tbl c0000000e3fe1f00 vaddr c0000000b56c0000 npages 10
iommu_alloc failed, tbl c0000000e3fe1f00 vaddr c0000000de3b8000 npages 8
iommu_alloc failed, tbl c0000000e3fe1f00 vaddr c0000001b16e8000 npages 6
iommu_alloc failed, tbl c0000000e3fe1f00 vaddr c0000000ddd88000 npages 8
iommu_alloc failed, tbl c0000000e3fe1f00 vaddr c0000001aead0000 npages e
iommu_alloc failed, tbl c0000000e3fe1f00 vaddr c0000001bd2a0000 npages 6
iommu_alloc failed, tbl c0000000e3fe1f00 vaddr c0000001b89e0000 npages e
iommu_alloc failed, tbl c0000000e3fe1f00 vaddr c0000001b8350000 npages 6
iommu_alloc failed, tbl c0000000e3fe1f00 vaddr c0000000cc440000 npages 10

From benh at kernel.crashing.org Wed Oct 27 09:46:57 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 27 Oct 2004 09:46:57 +1000
Subject: [PATCH] iommu fixes, round 2
In-Reply-To: <1098813781.32293.40.camel@sinatra.austin.ibm.com>
References: <1098775712.6897.17.camel@gaston>
 <1098808895.32293.23.camel@sinatra.austin.ibm.com>
 <1098813781.32293.40.camel@sinatra.austin.ibm.com>
Message-ID: <1098834417.6916.62.camel@gaston>

> We feel that iommu_free_table() is generic so we've moved it to iommu.c.
> This fixes the build break (sorry g5 :). It's complementary to
> iommu_init_table(), which is generic.
>
> Secondly, the attempt to free the table in of_remove_node() is "as
> symmetric as possible" with of_finish_node_dynamic(), where the table is
> allocated. If that's what the comment means by layering violation, I
> humbly disagree.

Nope. All the "finish" node routines are high level routines that parse
the device-tree to fill various additional things in the device nodes.
There are some remote plans of getting rid of them in the long run...

of_remove_node() is a low level routine that is responsible for
removing the node from the tree, and dealing with the /proc things, and
that should be all.
If you want to keep the iommu removal in prom.c, then you should create an of_finish_dynamic_node() or something like that, that does that kind of high level stuff before calling of_remove_node(). > Thirdly, iommu_devnode_init() also has an iSeries implementation, so > it's not pSeries-specific. No need to rename it, as suggested in the > comment. Then let's move it, but the fact that it does a kfree() and that sort of things means it actually makes assumptions on how the iommu table was allocated in the first place, which is not under control of the generic code at this point. From olof at austin.ibm.com Wed Oct 27 09:56:41 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Tue, 26 Oct 2004 18:56:41 -0500 Subject: problems with iommu_free_table() In-Reply-To: <1098833556.6916.44.camel@gaston> References: <1097171661.7087.1.camel@sinatra.austin.ibm.com> <1098775007.6916.10.camel@gaston> <417E550A.1040400@austin.ibm.com> <1098833556.6916.44.camel@gaston> Message-ID: <417EE439.3080206@austin.ibm.com> Benjamin Herrenschmidt wrote: > It's the right fix for now until you or John do something better :) > Besides, I don't fully agree with iommu_free_table() beeing the > 'pending' of iommu_init_table() since it does kfree etc... it makes > assumptions on how the caller allocated the tables... not _that_ bad but > don't even try calling that on the U3 ones :) Right, we discussed it today. The whole iommu table init code flow is a bit awkward today with alloc/setup/init. It's nonintuitive. 
:(

-Olof

From benh at kernel.crashing.org Wed Oct 27 09:49:43 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 27 Oct 2004 09:49:43 +1000
Subject: [PATCH] ppc64: Fix g5-only build
In-Reply-To: <1098808895.32293.23.camel@sinatra.austin.ibm.com>
References: <1098775712.6897.17.camel@gaston>
 <1098808895.32293.23.camel@sinatra.austin.ibm.com>
Message-ID: <1098834583.6898.64.camel@gaston>

On Tue, 2004-10-26 at 11:41 -0500, John Rose wrote:
> Forgive me for the cross-post, but I'm trying to answer two list
> messages on the same topic. I think it's more productive to just fix
> the bug than to commit a giant comment pointing out a small bug, so I've
> attached an alternate fix (build tested for g5 :).

I replied on the other list, let's stop this thread here.

Ben.

From benh at kernel.crashing.org Wed Oct 27 09:56:55 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 27 Oct 2004 09:56:55 +1000
Subject: List message size limit
Message-ID: <1098835015.6917.69.camel@gaston>

Hi !

The message size limit on this list is about 40k. This is too small
for a lot of patches. I'd like it to be pumped to 128k or even more,
though that needs to be discussed first in case a majority of
subscribers disagree.

Ben.

From benh at kernel.crashing.org Wed Oct 27 10:05:31 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 27 Oct 2004 10:05:31 +1000
Subject: 2.6.9 iommu_alloc failures on PPC64
In-Reply-To: <1098833598.20643.116.camel@dyn318077bld.beaverton.ibm.com>
References: <1098833598.20643.116.camel@dyn318077bld.beaverton.ibm.com>
Message-ID: <1098835531.6917.76.camel@gaston>

On Tue, 2004-10-26 at 16:33 -0700, Badari Pulavarty wrote:
> Hi,
>
> When I run IO tests with 2.6.9 kernel on PPC64, I get hundreds of
> following messages and eventually get OOPS from qlogic driver.
> Is this a known problems ?
>
> BTW, this happens only with JFS not ext3.
I suppose JFS is flooding the driver with so many large requests that the small table on your machine gets full (what machine is this precisely ?). The qlogic driver should be fixed to handle iommu failures more gracefully. It should be possible in most cases to just wait for pending IOs to complete & try again. I don't know if it's possible to ask the upper layer to breakup the request. Ben. From jk at ozlabs.org Wed Oct 27 10:10:44 2004 From: jk at ozlabs.org (Jeremy Kerr) Date: Wed, 27 Oct 2004 10:10:44 +1000 Subject: List message size limit In-Reply-To: <1098835015.6917.69.camel@gaston> References: <1098835015.6917.69.camel@gaston> Message-ID: <200410271010.45081.jk@ozlabs.org> Hi all, > The limit of messages sizes on this list is about 40k. This is too small > for a lot of patches. I'd like it to be pumped to 128k or even more, > though that needs to be discussed first in case a majority of > subscribers disagree.. Just a side-note here: patches that are provided by a URL (ie, those too large to be attached) will not be picked up by the patch tracking system at present. However, I could extend it to check URLs that appear in a message, possibly with some special syntax to let the parser know that it should follow the link (to reduce unnecessary downloads). Any suggestions? Jeremy From olof at austin.ibm.com Wed Oct 27 11:02:13 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Tue, 26 Oct 2004 20:02:13 -0500 Subject: List message size limit In-Reply-To: <200410271010.45081.jk@ozlabs.org> References: <1098835015.6917.69.camel@gaston> <200410271010.45081.jk@ozlabs.org> Message-ID: <20041027010213.GA23655@4> On Wed, Oct 27, 2004 at 10:10:44AM +1000, Jeremy Kerr wrote: > Just a side-note here: patches that are provided by a URL (ie, those too large > to be attached) will not be picked up by the patch tracking system at > present. 
> However, I could extend it to check URLs that appear in a message,
> possibly with some special syntax to let the parser know that it should
> follow the link (to reduce unnecessary downloads).
>
> Any suggestions?

I say let's just up the size high enough for it to not be a concern
(1MB?). The amount of spam coming across is very low (none as far as
I've been able to tell), and if it turns out to be a problem it can be
lowered again.

-Olof

From pbadari at us.ibm.com Wed Oct 27 10:10:50 2004
From: pbadari at us.ibm.com (Badari Pulavarty)
Date: 26 Oct 2004 17:10:50 -0700
Subject: 2.6.9 iommu_alloc failures on PPC64
In-Reply-To: <1098835531.6917.76.camel@gaston>
References: <1098833598.20643.116.camel@dyn318077bld.beaverton.ibm.com> <1098835531.6917.76.camel@gaston>
Message-ID: <1098835849.20643.122.camel@dyn318077bld.beaverton.ibm.com>

On Tue, 2004-10-26 at 17:05, Benjamin Herrenschmidt wrote:
> On Tue, 2004-10-26 at 16:33 -0700, Badari Pulavarty wrote:
> > Hi,
> >
> > When I run IO tests with 2.6.9 kernel on PPC64, I get hundreds of
> > following messages and eventually get OOPS from qlogic driver.
> > Is this a known problem ?
> >
> > BTW, this happens only with JFS not ext3.
>
> I suppose JFS is flooding the driver with so many large requests that
> the small table on your machine gets full (what machine is this
> precisely ?).

It's my latest P-570.
Thanks,
Badari

From benh at kernel.crashing.org Wed Oct 27 12:05:53 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 27 Oct 2004 12:05:53 +1000
Subject: List message size limit
In-Reply-To: <20041027010213.GA23655@4>
References: <1098835015.6917.69.camel@gaston> <200410271010.45081.jk@ozlabs.org> <20041027010213.GA23655@4>
Message-ID: <1098842753.610.1.camel@gaston>

On Tue, 2004-10-26 at 20:02 -0500, Olof Johansson wrote:
> On Wed, Oct 27, 2004 at 10:10:44AM +1000, Jeremy Kerr wrote:
>
> > Just a side-note here: patches that are provided by a URL (ie, those too large
> > to be attached) will not be picked up by the patch tracking system at
> > present. However, I could extend it to check URLs that appear in a message,
> > possibly with some special syntax to let the parser know that it should
> > follow the link (to reduce unnecessary downloads).
> >
> > Any suggestions?
>
> I say let's just up the size high enough for it to not be a concern
> (1MB?). The amount of spam coming across is very low (none as far as
> I've been able to tell), and if it turns out to be a problem it can be
> lowered again.

1MB is probably too big for archives, and even my monster patch was only
about 350K :) I think 256K would be a good limit.

(Let's start the who gets the best random number game now :)

Ben.

From dhowells at redhat.com Wed Oct 27 19:47:08 2004
From: dhowells at redhat.com (David Howells)
Date: Wed, 27 Oct 2004 10:47:08 +0100
Subject: List message size limit
In-Reply-To: <20041027010213.GA23655@4>
References: <20041027010213.GA23655@4> <1098835015.6917.69.camel@gaston> <200410271010.45081.jk@ozlabs.org>
Message-ID: <26685.1098870428@redhat.com>

>
> I say let's just up the size high enough for it to not be a concern
> (1MB?). The amount of spam coming across is very low (none as far as
> I've been able to tell), and if it turns out to be a problem it can be
> lowered again.

Make the limit larger only for list subscribers.
David

From pbadari at us.ibm.com Thu Oct 28 01:31:34 2004
From: pbadari at us.ibm.com (Badari Pulavarty)
Date: 27 Oct 2004 08:31:34 -0700
Subject: 2.6.9 iommu_alloc failures on PPC64
In-Reply-To: <1098835531.6917.76.camel@gaston>
References: <1098833598.20643.116.camel@dyn318077bld.beaverton.ibm.com> <1098835531.6917.76.camel@gaston>
Message-ID: <1098891094.20643.134.camel@dyn318077bld.beaverton.ibm.com>

Ben,

SLES9, which has qlogic driver version 8.00.00b14, seems to work fine.
2.6.9 with qlogic driver version 8.00.00b15-k is having problems.

FYI.

Thanks,
Badari

On Tue, 2004-10-26 at 17:05, Benjamin Herrenschmidt wrote:
> On Tue, 2004-10-26 at 16:33 -0700, Badari Pulavarty wrote:
> > Hi,
> >
> > When I run IO tests with 2.6.9 kernel on PPC64, I get hundreds of
> > following messages and eventually get OOPS from qlogic driver.
> > Is this a known problem ?
> >
> > BTW, this happens only with JFS not ext3.
>
> I suppose JFS is flooding the driver with so many large requests that
> the small table on your machine gets full (what machine is this
> precisely ?).
>
> The qlogic driver should be fixed to handle iommu failures more
> gracefully. It should be possible in most cases to just wait for pending
> IOs to complete & try again. I don't know if it's possible to ask the
> upper layer to break up the request.
>
> Ben.
>
>
>

From hollisb at us.ibm.com Wed Oct 27 23:38:08 2004
From: hollisb at us.ibm.com (Hollis Blanchard)
Date: Wed, 27 Oct 2004 13:38:08 +0000
Subject: [resend patch] HVSI early boot console
Message-ID: <1098884287.3486.5.camel@localhost>

Hi Linus, I've retested this with the current BK tree as you requested.

This patch adds support for the udbg early console interfaces when using
an HVSI console.

Please apply.
Signed-off-by: Hollis Blanchard -- Hollis Blanchard IBM Linux Technology Center --- arch/ppc64/kernel/pSeries_lpar.c.orig Tue Sep 21 23:40:30 2004 +++ arch/ppc64/kernel/pSeries_lpar.c Thu Oct 7 10:52:23 2004 @@ -59,6 +59,74 @@ int vtermno; /* virtual terminal# for udbg */ +#define __ALIGNED__ __attribute__((__aligned__(sizeof(long)))) +static void udbg_hvsi_putc(unsigned char c) +{ + /* packet's seqno isn't used anyways */ + uint8_t packet[] __ALIGNED__ = { 0xff, 5, 0, 0, c }; + int rc; + + if (c == '\n') + udbg_hvsi_putc('\r'); + + do { + rc = plpar_put_term_char(vtermno, sizeof(packet), packet); + } while (rc == H_Busy); +} + +static long hvsi_udbg_buf_len; +static uint8_t hvsi_udbg_buf[256]; + +static int udbg_hvsi_getc_poll(void) +{ + unsigned char ch; + int rc, i; + + if (hvsi_udbg_buf_len == 0) { + rc = plpar_get_term_char(vtermno, &hvsi_udbg_buf_len, hvsi_udbg_buf); + if (rc != H_Success || hvsi_udbg_buf[0] != 0xff) { + /* bad read or non-data packet */ + hvsi_udbg_buf_len = 0; + } else { + /* remove the packet header */ + for (i = 4; i < hvsi_udbg_buf_len; i++) + hvsi_udbg_buf[i-4] = hvsi_udbg_buf[i]; + hvsi_udbg_buf_len -= 4; + } + } + + if (hvsi_udbg_buf_len <= 0 || hvsi_udbg_buf_len > 256) { + /* no data ready */ + hvsi_udbg_buf_len = 0; + return -1; + } + + ch = hvsi_udbg_buf[0]; + /* shift remaining data down */ + for (i = 1; i < hvsi_udbg_buf_len; i++) { + hvsi_udbg_buf[i-1] = hvsi_udbg_buf[i]; + } + hvsi_udbg_buf_len--; + + return ch; +} + +static unsigned char udbg_hvsi_getc(void) +{ + int ch; + for (;;) { + ch = udbg_hvsi_getc_poll(); + if (ch == -1) { + /* This shouldn't be needed...but... 
*/ + volatile unsigned long delay; + for (delay=0; delay < 2000000; delay++) + ; + } else { + return ch; + } + } +} + static void udbg_putcLP(unsigned char c) { char buf[16]; @@ -167,11 +235,15 @@ ppc_md.udbg_getc_poll = udbg_getc_pollLP; found = 1; } - } else { - /* XXX implement udbg_putcLP_vtty for hvterm-protocol1 case */ - printk(KERN_WARNING "%s doesn't speak hvterm1; " - "can't print udbg messages\n", - stdout_node->full_name); + } else if (device_is_compatible(stdout_node, "hvterm-protocol")) { + termno = (u32 *)get_property(stdout_node, "reg", NULL); + if (termno) { + vtermno = termno[0]; + ppc_md.udbg_putc = udbg_hvsi_putc; + ppc_md.udbg_getc = udbg_hvsi_getc; + ppc_md.udbg_getc_poll = udbg_hvsi_getc_poll; + found = 1; + } } } else if (strncmp(name, "serial", 6)) { /* XXX fix ISA serial console */ From hollisb at us.ibm.com Wed Oct 27 23:40:04 2004 From: hollisb at us.ibm.com (Hollis Blanchard) Date: Wed, 27 Oct 2004 13:40:04 +0000 Subject: [resend patch] HVSI reset support Message-ID: <1098884404.3484.11.camel@localhost> Hi Linus, I've retested this with current BK as you requested. This patch adds support for when the service processor (the other end of the console) resets due to a critical error; we can resume the connection when it comes back. Please apply. Signed-off-by: Hollis Blanchard -- Hollis Blanchard IBM Linux Technology Center --- drivers/char/hvsi.c.orig Mon Sep 13 19:23:15 2004 +++ drivers/char/hvsi.c Wed Oct 20 17:10:34 2004 @@ -29,11 +29,6 @@ * the OS cannot change the speed of the port through this protocol. 
*/ -/* TODO: - * test FSP reset - * add udbg support for xmon/kdb - */ - #undef DEBUG #include @@ -54,6 +49,7 @@ #include #include #include +#include #define HVSI_MAJOR 229 #define HVSI_MINOR 128 @@ -74,6 +70,7 @@ struct hvsi_struct { struct work_struct writer; + struct work_struct handshaker; wait_queue_head_t emptyq; /* woken when outbuf is emptied */ wait_queue_head_t stateq; /* woken when HVSI state changes */ spinlock_t lock; @@ -109,6 +106,7 @@ HVSI_WAIT_FOR_VER_QUERY, HVSI_OPEN, HVSI_WAIT_FOR_MCTRL_RESPONSE, + HVSI_FSP_DIED, }; #define HVSI_CONSOLE 0x1 @@ -172,6 +170,13 @@ } u; } __attribute__((packed)); + + +static inline int is_console(struct hvsi_struct *hp) +{ + return hp->flags & HVSI_CONSOLE; +} + static inline int is_open(struct hvsi_struct *hp) { /* if we're waiting for an mctrl then we're already open */ @@ -188,6 +193,7 @@ "HVSI_WAIT_FOR_VER_QUERY", "HVSI_OPEN", "HVSI_WAIT_FOR_MCTRL_RESPONSE", + "HVSI_FSP_DIED", }; const char *name = state_names[hp->state]; @@ -296,14 +302,9 @@ return 0; } -/* - * we can't call tty_hangup() directly here because we need to call that - * outside of our lock - */ -static struct tty_struct *hvsi_recv_control(struct hvsi_struct *hp, - uint8_t *packet) +static void hvsi_recv_control(struct hvsi_struct *hp, uint8_t *packet, + struct tty_struct **to_hangup, struct hvsi_struct **to_handshake) { - struct tty_struct *to_hangup = NULL; struct hvsi_control *header = (struct hvsi_control *)packet; switch (header->verb) { @@ -313,15 +314,14 @@ pr_debug("hvsi%i: CD dropped\n", hp->index); hp->mctrl &= TIOCM_CD; if (!(hp->tty->flags & CLOCAL)) - to_hangup = hp->tty; + *to_hangup = hp->tty; } break; case VSV_CLOSE_PROTOCOL: - printk(KERN_DEBUG - "hvsi%i: service processor closed connection!\n", hp->index); - __set_state(hp, HVSI_CLOSED); - to_hangup = hp->tty; - hp->tty = NULL; + pr_debug("hvsi%i: service processor came back\n", hp->index); + if (hp->state != HVSI_CLOSED) { + *to_handshake = hp; + } break; default: 
printk(KERN_WARNING "hvsi%i: unknown HVSI control packet: ", @@ -329,8 +329,6 @@ dump_packet(packet); break; } - - return to_hangup; } static void hvsi_recv_response(struct hvsi_struct *hp, uint8_t *packet) @@ -388,8 +386,8 @@ switch (hp->state) { case HVSI_WAIT_FOR_VER_QUERY: - __set_state(hp, HVSI_OPEN); hvsi_version_respond(hp, query->seqno); + __set_state(hp, HVSI_OPEN); break; default: printk(KERN_ERR "hvsi%i: unexpected query: ", hp->index); @@ -467,17 +465,20 @@ * incoming data). */ static int hvsi_load_chunk(struct hvsi_struct *hp, struct tty_struct **flip, - struct tty_struct **hangup) + struct tty_struct **hangup, struct hvsi_struct **handshake) { uint8_t *packet = hp->inbuf; int chunklen; *flip = NULL; *hangup = NULL; + *handshake = NULL; chunklen = hvsi_read(hp, hp->inbuf_end, HVSI_MAX_READ); - if (chunklen == 0) + if (chunklen == 0) { + pr_debug("%s: 0-length read\n", __FUNCTION__); return 0; + } pr_debug("%s: got %i bytes\n", __FUNCTION__, chunklen); dbg_dump_hex(hp->inbuf_end, chunklen); @@ -509,7 +510,7 @@ *flip = hvsi_recv_data(hp, packet); break; case VS_CONTROL_PACKET_HEADER: - *hangup = hvsi_recv_control(hp, packet); + hvsi_recv_control(hp, packet, hangup, handshake); break; case VS_QUERY_RESPONSE_PACKET_HEADER: hvsi_recv_response(hp, packet); @@ -526,8 +527,8 @@ packet += len_packet(packet); - if (*hangup) { - pr_debug("%s: hangup\n", __FUNCTION__); + if (*hangup || *handshake) { + pr_debug("%s: hangup or handshake\n", __FUNCTION__); /* * we need to send the hangup now before receiving any more data. 
* If we get "data, hangup, data", we can't deliver the second @@ -560,16 +561,15 @@ struct hvsi_struct *hp = (struct hvsi_struct *)arg; struct tty_struct *flip; struct tty_struct *hangup; + struct hvsi_struct *handshake; unsigned long flags; - irqreturn_t handled = IRQ_NONE; int again = 1; pr_debug("%s\n", __FUNCTION__); while (again) { spin_lock_irqsave(&hp->lock, flags); - again = hvsi_load_chunk(hp, &flip, &hangup); - handled = IRQ_HANDLED; + again = hvsi_load_chunk(hp, &flip, &hangup, &handshake); spin_unlock_irqrestore(&hp->lock, flags); /* @@ -587,6 +587,11 @@ if (hangup) { tty_hangup(hangup); } + + if (handshake) { + pr_debug("hvsi%i: attempting re-handshake\n", handshake->index); + schedule_work(&handshake->handshaker); + } } spin_lock_irqsave(&hp->lock, flags); @@ -603,7 +608,7 @@ tty_flip_buffer_push(flip); } - return handled; + return IRQ_HANDLED; } /* for boot console, before the irq handler is running */ @@ -757,6 +762,23 @@ return 0; } +static void hvsi_handshaker(void *arg) +{ + struct hvsi_struct *hp = (struct hvsi_struct *)arg; + + if (hvsi_handshake(hp) >= 0) + return; + + printk(KERN_ERR "hvsi%i: re-handshaking failed\n", hp->index); + if (is_console(hp)) { + /* + * ttys will re-attempt the handshake via hvsi_open, but + * the console will not. 
+ */ + printk(KERN_ERR "hvsi%i: lost console!\n", hp->index); + } +} + static int hvsi_put_chars(struct hvsi_struct *hp, const char *buf, int count) { struct hvsi_data packet __ALIGNED__; @@ -808,6 +830,10 @@ tty->driver_data = hp; tty->low_latency = 1; /* avoid throttle/tty_flip_buffer_push race */ + mb(); + if (hp->state == HVSI_FSP_DIED) + return -EIO; + spin_lock_irqsave(&hp->lock, flags); hp->tty = tty; hp->count++; @@ -815,7 +841,7 @@ h_vio_signal(hp->vtermno, VIO_IRQ_ENABLE); spin_unlock_irqrestore(&hp->lock, flags); - if (hp->flags & HVSI_CONSOLE) + if (is_console(hp)) return 0; /* this has already been handshaked as the console */ ret = hvsi_handshake(hp); @@ -889,7 +915,7 @@ hp->inbuf_end = hp->inbuf; /* discard remaining partial packets */ /* only close down connection if it is not the console */ - if (!(hp->flags & HVSI_CONSOLE)) { + if (!is_console(hp)) { h_vio_signal(hp->vtermno, VIO_IRQ_DISABLE); /* no more irqs */ __set_state(hp, HVSI_CLOSED); /* @@ -943,12 +969,13 @@ return; n = hvsi_put_chars(hp, hp->outbuf, hp->n_outbuf); - if (n != 0) { - /* - * either all data was sent or there was an error, and we throw away - * data on error. - */ + if (n > 0) { + /* success */ + pr_debug("%s: wrote %i chars\n", __FUNCTION__, n); hp->n_outbuf = 0; + } else if (n == -EIO) { + __set_state(hp, HVSI_FSP_DIED); + printk(KERN_ERR "hvsi%i: service processor died\n", hp->index); } } @@ -966,6 +993,19 @@ spin_lock_irqsave(&hp->lock, flags); + pr_debug("%s: %i chars in buffer\n", __FUNCTION__, hp->n_outbuf); + + if (!is_open(hp)) { + /* + * We could have a non-open connection if the service processor died + * while we were busily scheduling ourselves. In that case, it could + * be minutes before the service processor comes back, so only try + * again once a second. 
+ */ + schedule_delayed_work(&hp->writer, HZ); + goto out; + } + hvsi_push(hp); if (hp->n_outbuf > 0) schedule_delayed_work(&hp->writer, 10); @@ -982,6 +1022,7 @@ wake_up_interruptible(&hp->tty->write_wait); } +out: spin_unlock_irqrestore(&hp->lock, flags); } @@ -1022,6 +1063,8 @@ spin_lock_irqsave(&hp->lock, flags); + pr_debug("%s: %i chars in buffer\n", __FUNCTION__, hp->n_outbuf); + if (!is_open(hp)) { /* we're either closing or not yet open; don't accept data */ pr_debug("%s: not open\n", __FUNCTION__); @@ -1294,6 +1337,7 @@ hp = &hvsi_ports[hvsi_count]; INIT_WORK(&hp->writer, hvsi_write_worker, hp); + INIT_WORK(&hp->handshaker, hvsi_handshaker, hp); init_waitqueue_head(&hp->emptyq); init_waitqueue_head(&hp->stateq); hp->lock = SPIN_LOCK_UNLOCKED; From dhowells at redhat.com Thu Oct 28 05:08:41 2004 From: dhowells at redhat.com (David Howells) Date: Wed, 27 Oct 2004 20:08:41 +0100 Subject: [PATCH] Make key management syscalls work on PPC/PPC64 Message-ID: <24857.1098904121@redhat.com> The attached patch permits my key management stuff to be used on PPC, PPC64 and PPC on PPC64. Syscall numbers were allocated by Paul Mackerras. 
I've updated my keyctl utility to work on PPC/PPC64 too: http://people.redhat.com/~dhowells/keys/keyctl.c Signed-Off-By: David Howells --- warthog>diffstat keys-269bk5.diff arch/ppc/kernel/misc.S | 3 + arch/ppc64/Kconfig | 5 ++ arch/ppc64/kernel/misc.S | 6 +++ arch/ppc64/kernel/sys_ppc32.c | 18 +++++++++ include/asm-ppc/unistd.h | 5 ++ include/asm-ppc64/unistd.h | 5 ++ include/linux/compat.h | 2 + security/keys/Makefile | 1 security/keys/compat.c | 78 ++++++++++++++++++++++++++++++++++++++++++ security/keys/internal.h | 20 ++++++++++ security/keys/keyctl.c | 54 +++++++++++++---------------- 11 files changed, 166 insertions(+), 31 deletions(-) diff -uNrp linux-2.6.9-bk5/arch/ppc/kernel/misc.S linux-2.6.9-bk5-keys/arch/ppc/kernel/misc.S --- linux-2.6.9-bk5/arch/ppc/kernel/misc.S 2004-10-19 10:41:46.000000000 +0100 +++ linux-2.6.9-bk5-keys/arch/ppc/kernel/misc.S 2004-10-22 10:27:40.000000000 +0100 @@ -1447,3 +1447,6 @@ _GLOBAL(sys_call_table) .long sys_mq_notify .long sys_mq_getsetattr .long sys_ni_syscall /* 268 reserved for sys_kexec_load */ + .long sys_add_key + .long sys_request_key /* 270 */ + .long sys_keyctl diff -uNrp linux-2.6.9-bk5/arch/ppc64/Kconfig linux-2.6.9-bk5-keys/arch/ppc64/Kconfig --- linux-2.6.9-bk5/arch/ppc64/Kconfig 2004-10-21 11:21:45.000000000 +0100 +++ linux-2.6.9-bk5-keys/arch/ppc64/Kconfig 2004-10-22 14:01:30.000000000 +0100 @@ -356,6 +356,11 @@ source "arch/ppc64/Kconfig.debug" source "security/Kconfig" +config KEYS_COMPAT + bool + depends on COMPAT && KEYS + default y + source "crypto/Kconfig" source "lib/Kconfig" diff -uNrp linux-2.6.9-bk5/arch/ppc64/kernel/misc.S linux-2.6.9-bk5-keys/arch/ppc64/kernel/misc.S --- linux-2.6.9-bk5/arch/ppc64/kernel/misc.S 2004-10-21 11:21:45.000000000 +0100 +++ linux-2.6.9-bk5-keys/arch/ppc64/kernel/misc.S 2004-10-22 11:08:44.000000000 +0100 @@ -963,6 +963,9 @@ _GLOBAL(sys_call_table32) .llong .compat_sys_mq_notify .llong .compat_sys_mq_getsetattr .llong .sys_ni_syscall /* 268 reserved for sys_kexec_load */ 
+ .llong .sys32_add_key + .llong .sys32_request_key + .llong .compat_keyctl .balign 8 _GLOBAL(sys_call_table) @@ -1235,3 +1238,6 @@ _GLOBAL(sys_call_table) .llong .sys_mq_notify .llong .sys_mq_getsetattr .llong .sys_ni_syscall /* 268 reserved for sys_kexec_load */ + .llong .sys_add_key + .llong .sys_request_key /* 270 */ + .llong .sys_keyctl diff -uNrp linux-2.6.9-bk5/arch/ppc64/kernel/sys_ppc32.c linux-2.6.9-bk5-keys/arch/ppc64/kernel/sys_ppc32.c --- linux-2.6.9-bk5/arch/ppc64/kernel/sys_ppc32.c 2004-10-21 11:21:45.000000000 +0100 +++ linux-2.6.9-bk5-keys/arch/ppc64/kernel/sys_ppc32.c 2004-10-22 13:55:56.000000000 +0100 @@ -1328,3 +1328,21 @@ long ppc32_timer_create(clockid_t clock, return err; } + +asmlinkage long sys32_add_key(const char __user *_type, + const char __user *_description, + const void __user *_payload, + u32 plen, + u32 ringid) +{ + return sys_add_key(_type, _description, _payload, plen, ringid); +} + +asmlinkage long sys32_request_key(const char __user *_type, + const char __user *_description, + const char __user *_callout_info, + u32 destringid) +{ + return sys_request_key(_type, _description, _callout_info, destringid); +} + diff -uNrp linux-2.6.9-bk5/include/asm-ppc/unistd.h linux-2.6.9-bk5-keys/include/asm-ppc/unistd.h --- linux-2.6.9-bk5/include/asm-ppc/unistd.h 2004-06-18 13:44:05.000000000 +0100 +++ linux-2.6.9-bk5-keys/include/asm-ppc/unistd.h 2004-10-22 10:27:40.000000000 +0100 @@ -273,8 +273,11 @@ #define __NR_mq_notify 266 #define __NR_mq_getsetattr 267 #define __NR_kexec_load 268 +#define __NR_add_key 269 +#define __NR_request_key 270 +#define __NR_keyctl 271 -#define __NR_syscalls 269 +#define __NR_syscalls 272 #define __NR(n) #n diff -uNrp linux-2.6.9-bk5/include/asm-ppc64/unistd.h linux-2.6.9-bk5-keys/include/asm-ppc64/unistd.h --- linux-2.6.9-bk5/include/asm-ppc64/unistd.h 2004-10-19 10:42:14.000000000 +0100 +++ linux-2.6.9-bk5-keys/include/asm-ppc64/unistd.h 2004-10-22 10:27:40.000000000 +0100 @@ -279,8 +279,11 @@ #define 
__NR_mq_notify 266 #define __NR_mq_getsetattr 267 #define __NR_kexec_load 268 +#define __NR_add_key 269 +#define __NR_request_key 270 +#define __NR_keyctl 271 -#define __NR_syscalls 269 +#define __NR_syscalls 272 #ifdef __KERNEL__ #define NR_syscalls __NR_syscalls #endif diff -uNrp linux-2.6.9-bk5/include/linux/compat.h linux-2.6.9-bk5-keys/include/linux/compat.h --- linux-2.6.9-bk5/include/linux/compat.h 2004-10-19 10:42:16.000000000 +0100 +++ linux-2.6.9-bk5-keys/include/linux/compat.h 2004-10-22 11:02:14.000000000 +0100 @@ -119,6 +119,8 @@ long compat_sys_shmat(int first, int sec long compat_sys_shmctl(int first, int second, void __user *uptr); long compat_sys_semtimedop(int semid, struct sembuf __user *tsems, unsigned nsems, const struct compat_timespec __user *timeout); +asmlinkage long compat_keyctl(u32 option, + u32 arg2, u32 arg3, u32 arg4, u32 arg5); asmlinkage ssize_t compat_sys_readv(unsigned long fd, const struct compat_iovec __user *vec, unsigned long vlen); diff -uNrp linux-2.6.9-bk5/security/keys/compat.c linux-2.6.9-bk5-keys/security/keys/compat.c --- linux-2.6.9-bk5/security/keys/compat.c 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.9-bk5-keys/security/keys/compat.c 2004-10-22 14:02:07.000000000 +0100 @@ -0,0 +1,78 @@ +/* compat.c: 32-bit compatibility syscall for 64-bit systems + * + * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells at redhat.com) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#include +#include +#include +#include +#include "internal.h" + +/*****************************************************************************/ +/* + * the key control system call, 32-bit compatibility version for 64-bit archs + * - this should only be called if the 64-bit arch uses weird pointers in + * 32-bit mode or doesn't guarantee that the top 32-bits of the argument + * registers on taking a 32-bit syscall are zero + * - if you can, you should call sys_keyctl directly + */ +asmlinkage long compat_keyctl(u32 option, + u32 arg2, u32 arg3, u32 arg4, u32 arg5) +{ + switch (option) { + case KEYCTL_GET_KEYRING_ID: + return keyctl_get_keyring_ID(arg2, arg3); + + case KEYCTL_JOIN_SESSION_KEYRING: + return keyctl_join_session_keyring(compat_ptr(arg3)); + + case KEYCTL_UPDATE: + return keyctl_update_key(arg2, compat_ptr(arg3), arg4); + + case KEYCTL_REVOKE: + return keyctl_revoke_key(arg2); + + case KEYCTL_DESCRIBE: + return keyctl_describe_key(arg2, compat_ptr(arg3), arg4); + + case KEYCTL_CLEAR: + return keyctl_keyring_clear(arg2); + + case KEYCTL_LINK: + return keyctl_keyring_link(arg2, arg3); + + case KEYCTL_UNLINK: + return keyctl_keyring_unlink(arg2, arg3); + + case KEYCTL_SEARCH: + return keyctl_keyring_search(arg2, compat_ptr(arg3), + compat_ptr(arg4), arg5); + + case KEYCTL_READ: + return keyctl_read_key(arg2, compat_ptr(arg3), arg4); + + case KEYCTL_CHOWN: + return keyctl_chown_key(arg2, arg3, arg4); + + case KEYCTL_SETPERM: + return keyctl_setperm_key(arg2, arg3); + + case KEYCTL_INSTANTIATE: + return keyctl_instantiate_key(arg2, compat_ptr(arg3), arg4, + arg5); + + case KEYCTL_NEGATE: + return keyctl_negate_key(arg2, arg3, arg4); + + default: + return -EOPNOTSUPP; + } + +} /* end compat_keyctl() */ diff -uNrp linux-2.6.9-bk5/security/keys/internal.h linux-2.6.9-bk5-keys/security/keys/internal.h --- linux-2.6.9-bk5/security/keys/internal.h 2004-10-21 11:22:11.000000000 +0100 +++ linux-2.6.9-bk5-keys/security/keys/internal.h 2004-10-21 
11:39:25.000000000 +0100 @@ -81,6 +81,26 @@ extern struct key *find_keyring_by_name( extern int install_thread_keyring(struct task_struct *tsk); +/* + * keyctl functions + */ +extern long keyctl_get_keyring_ID(key_serial_t, int); +extern long keyctl_join_session_keyring(const char __user *); +extern long keyctl_update_key(key_serial_t, const void __user *, size_t); +extern long keyctl_revoke_key(key_serial_t); +extern long keyctl_keyring_clear(key_serial_t); +extern long keyctl_keyring_link(key_serial_t, key_serial_t); +extern long keyctl_keyring_unlink(key_serial_t, key_serial_t); +extern long keyctl_describe_key(key_serial_t, char __user *, size_t); +extern long keyctl_keyring_search(key_serial_t, const char __user *, + const char __user *, key_serial_t); +extern long keyctl_read_key(key_serial_t, char __user *, size_t); +extern long keyctl_chown_key(key_serial_t, uid_t, gid_t); +extern long keyctl_setperm_key(key_serial_t, key_perm_t); +extern long keyctl_instantiate_key(key_serial_t, const void __user *, + size_t, key_serial_t); +extern long keyctl_negate_key(key_serial_t, unsigned, key_serial_t); + /* * debugging key validation diff -uNrp linux-2.6.9-bk5/security/keys/keyctl.c linux-2.6.9-bk5-keys/security/keys/keyctl.c --- linux-2.6.9-bk5/security/keys/keyctl.c 2004-10-21 11:22:11.000000000 +0100 +++ linux-2.6.9-bk5-keys/security/keys/keyctl.c 2004-10-21 11:54:48.000000000 +0100 @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -231,7 +232,7 @@ asmlinkage long sys_request_key(const ch * - the keyring must have search permission to be found * - implements keyctl(KEYCTL_GET_KEYRING_ID) */ -static long keyctl_get_keyring_ID(key_serial_t id, int create) +long keyctl_get_keyring_ID(key_serial_t id, int create) { struct key *key; long ret; @@ -254,7 +255,7 @@ static long keyctl_get_keyring_ID(key_se * join the session keyring * - implements keyctl(KEYCTL_JOIN_SESSION_KEYRING) */ -static long keyctl_join_session_keyring(const char 
__user *_name) +long keyctl_join_session_keyring(const char __user *_name) { char *name; long nlen, ret; @@ -297,9 +298,9 @@ static long keyctl_join_session_keyring( * - the key must be writable * - implements keyctl(KEYCTL_UPDATE) */ -static long keyctl_update_key(key_serial_t id, - const void __user *_payload, - size_t plen) +long keyctl_update_key(key_serial_t id, + const void __user *_payload, + size_t plen) { struct key *key; void *payload; @@ -346,7 +347,7 @@ static long keyctl_update_key(key_serial * - the key must be writable * - implements keyctl(KEYCTL_REVOKE) */ -static long keyctl_revoke_key(key_serial_t id) +long keyctl_revoke_key(key_serial_t id) { struct key *key; long ret; @@ -372,7 +373,7 @@ static long keyctl_revoke_key(key_serial * - the keyring must be writable * - implements keyctl(KEYCTL_CLEAR) */ -static long keyctl_keyring_clear(key_serial_t ringid) +long keyctl_keyring_clear(key_serial_t ringid) { struct key *keyring; long ret; @@ -398,7 +399,7 @@ static long keyctl_keyring_clear(key_ser * - the key must be linkable * - implements keyctl(KEYCTL_LINK) */ -static long keyctl_keyring_link(key_serial_t id, key_serial_t ringid) +long keyctl_keyring_link(key_serial_t id, key_serial_t ringid) { struct key *keyring, *key; long ret; @@ -432,7 +433,7 @@ static long keyctl_keyring_link(key_seri * - we don't need any permissions on the key * - implements keyctl(KEYCTL_UNLINK) */ -static long keyctl_keyring_unlink(key_serial_t id, key_serial_t ringid) +long keyctl_keyring_unlink(key_serial_t id, key_serial_t ringid) { struct key *keyring, *key; long ret; @@ -470,9 +471,9 @@ static long keyctl_keyring_unlink(key_se * type;uid;gid;perm;description * - implements keyctl(KEYCTL_DESCRIBE) */ -static long keyctl_describe_key(key_serial_t keyid, - char __user *buffer, - size_t buflen) +long keyctl_describe_key(key_serial_t keyid, + char __user *buffer, + size_t buflen) { struct key *key; char *tmpbuf; @@ -532,10 +533,10 @@ static long 
keyctl_describe_key(key_seri * there's one specified * - implements keyctl(KEYCTL_SEARCH) */ -static long keyctl_keyring_search(key_serial_t ringid, - const char __user *_type, - const char __user *_description, - key_serial_t destringid) +long keyctl_keyring_search(key_serial_t ringid, + const char __user *_type, + const char __user *_description, + key_serial_t destringid) { struct key_type *ktype; struct key *keyring, *key, *dest; @@ -649,9 +650,7 @@ static int keyctl_read_key_same(const st * irrespective of how much we may have copied * - implements keyctl(KEYCTL_READ) */ -static long keyctl_read_key(key_serial_t keyid, - char __user *buffer, - size_t buflen) +long keyctl_read_key(key_serial_t keyid, char __user *buffer, size_t buflen) { struct key *key, *skey; long ret; @@ -711,7 +710,7 @@ static long keyctl_read_key(key_serial_t * - if the uid or gid is -1, then that parameter is not changed * - implements keyctl(KEYCTL_CHOWN) */ -static long keyctl_chown_key(key_serial_t id, uid_t uid, gid_t gid) +long keyctl_chown_key(key_serial_t id, uid_t uid, gid_t gid) { struct key *key; long ret; @@ -770,7 +769,7 @@ static long keyctl_chown_key(key_serial_ * - the keyring owned by the changer * - implements keyctl(KEYCTL_SETPERM) */ -static long keyctl_setperm_key(key_serial_t id, key_perm_t perm) +long keyctl_setperm_key(key_serial_t id, key_perm_t perm) { struct key *key; long ret; @@ -814,10 +813,10 @@ static long keyctl_setperm_key(key_seria * instantiate the key with the specified payload, and, if one is given, link * the key into the keyring */ -static long keyctl_instantiate_key(key_serial_t id, - const void __user *_payload, - size_t plen, - key_serial_t ringid) +long keyctl_instantiate_key(key_serial_t id, + const void __user *_payload, + size_t plen, + key_serial_t ringid) { struct key *key, *keyring; void *payload; @@ -877,9 +876,7 @@ static long keyctl_instantiate_key(key_s * negatively instantiate the key with the given timeout (in seconds), and, if * one 
is given, link the key into the keyring */ -static long keyctl_negate_key(key_serial_t id, - unsigned timeout, - key_serial_t ringid) +long keyctl_negate_key(key_serial_t id, unsigned timeout, key_serial_t ringid) { struct key *key, *keyring; long ret; @@ -916,7 +913,6 @@ static long keyctl_negate_key(key_serial /*****************************************************************************/ /* * the key control system call - * - currently invoked through prctl() */ asmlinkage long sys_keyctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5) diff -uNrp linux-2.6.9-bk5/security/keys/Makefile linux-2.6.9-bk5-keys/security/keys/Makefile --- linux-2.6.9-bk5/security/keys/Makefile 2004-10-21 11:22:11.000000000 +0100 +++ linux-2.6.9-bk5-keys/security/keys/Makefile 2004-10-22 10:49:39.000000000 +0100 @@ -10,4 +10,5 @@ obj-y := \ user_defined.o \ request_key.o +obj-$(CONFIG_KEYS_COMPAT) += compat.o obj-$(CONFIG_PROC_FS) += proc.o From paulus at samba.org Thu Oct 28 09:08:17 2004 From: paulus at samba.org (Paul Mackerras) Date: Thu, 28 Oct 2004 09:08:17 +1000 Subject: [PATCH] iommu fixes, round 2 In-Reply-To: <1098813781.32293.40.camel@sinatra.austin.ibm.com> References: <1098775712.6897.17.camel@gaston> <1098808895.32293.23.camel@sinatra.austin.ibm.com> <1098813781.32293.40.camel@sinatra.austin.ibm.com> Message-ID: <16768.10849.741580.850491@cargo.ozlabs.ibm.com> John Rose writes: > Thirdly, iommu_devnode_init() also has an iSeries implementation, so > it's not pSeries-specific. No need to rename it, as suggested in the > comment. I would rather we didn't have two functions with the same name, as we do for iommu_devnode_init (with iSeries and pSeries implementations), because that is one more obstacle to eventually making a single kernel binary that can run on iSeries and on other machines. That goal is still some distance off but we shouldn't make it harder to reach if possible. 
That's the motivation for having a function pointer in
ppc_md for it.

Paul.

From dwm at austin.ibm.com Thu Oct 28 09:16:29 2004
From: dwm at austin.ibm.com (Doug Maxey)
Date: Wed, 27 Oct 2004 18:16:29 -0500
Subject: 2.6.10.-rc1 on ppc64 - returning from prom_init hang
Message-ID: <200410272316.i9RNGTpj005995@falcon10.austin.ibm.com>

Anyone have any thoughts on this? I have xmon=on, but it stops before
it gets there... For this iteration, I added console=hvsi1, which did not
change anything.

This is libata-dev-2.6 on a power5 system.

Config file read, 1024 bytes
Welcome
Welcome to yaboot version 1.3.12
Enter "help" to get some basic usage information
boot: 2.6.10-rc1-ata-1 * linux
boot: 2.6.10-rc1-ata-1
Please wait, loading kernel...
Elf64 kernel loaded...
Loading ramdisk...
ramdisk loaded at 02300000, size: 1306 Kbytes
OF stdout device is: /vdevice/vty@30000001
Hypertas detected, assuming LPAR !
command line: root=/dev/VolGroup00/LogVol00 ro rhgb quiet console=hvsi1 xmon=on
memory layout at init:
  alloc_bottom : 0000000002447000
  alloc_top    : 0000000008000000
  alloc_top_hi : 0000000075000000
  rmo_top      : 0000000008000000
  ram_top      : 0000000075000000
Looking for displays
found display : /pci@800000020000002/pci@2,2/pci@1/display@0, opening ... done
instantiating rtas at 0x00000000077d9000... done
0000000000000000 : boot cpu 0000000000000000
0000000000000002 : starting cpu hw idx 0000000000000002... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000002748000 -> 0x00000000027492f8
Device tree struct 0x000000000274a000 -> 0x000000000275a000
Calling quiesce ...
returning from prom_init ++doug From johnrose at austin.ibm.com Thu Oct 28 09:25:47 2004 From: johnrose at austin.ibm.com (John Rose) Date: Wed, 27 Oct 2004 18:25:47 -0500 Subject: [PATCH] iommu fixes, round 2 In-Reply-To: <16768.10849.741580.850491@cargo.ozlabs.ibm.com> References: <1098775712.6897.17.camel@gaston> <1098808895.32293.23.camel@sinatra.austin.ibm.com> <1098813781.32293.40.camel@sinatra.austin.ibm.com> <16768.10849.741580.850491@cargo.ozlabs.ibm.com> Message-ID: <1098919547.18158.4.camel@sinatra.austin.ibm.com> On Wed, 2004-10-27 at 18:08, Paul Mackerras wrote: > John Rose writes: > > > Thirdly, iommu_devnode_init() also has an iSeries implementation, so > > it's not pSeries-specific. No need to rename it, as suggested in the > > comment. > > I would rather we didn't have two functions with the same name, as we > do for iommu_devnode_init (with iSeries and pSeries implementations), > because that is one more obstacle to eventually making a single kernel > binary that can run on iSeries and on other machines. That goal is > still some distance off but we shouldn't make it harder to reach if > possible. That's the motivation for having a function pointer in > ppc_md for it. Good point. To contradict my earlier statement, let's rename them :) None of the other functions in [i,p]Series_iommu.c share names. The two implementations are called from i and p-specific locations anyway, so renaming won't be a problem. Will post a patch tmw. 
Thanks- John From paulus at samba.org Thu Oct 28 13:59:30 2004 From: paulus at samba.org (Paul Mackerras) Date: Thu, 28 Oct 2004 13:59:30 +1000 Subject: [PATCH 1/1] rtas_flash_4gig In-Reply-To: <20041020170817.0ee49b64@localhost> References: <200410041942.i94Jg4WA154540@westrelay04.boulder.ibm.com> <16758.55568.809557.670513@cargo.ozlabs.ibm.com> <20041020170817.0ee49b64@localhost> Message-ID: <16768.28322.583827.9327@cargo.ozlabs.ibm.com> Jake Moilanen writes: > According to the RPA (item E7-41 to be exact), the block-list can be > anywhere under 4 gigs. RTAS will make hypervisor calls to access this > memory. OK, but I don't see that we make any attempt at all to try to make sure the memory for the block list pages is below 4G. I also don't see where we check the ibm,flash-block-version property (to see if we can in fact use a linked list of headers) or where we check that the pages we are using don't overlap OF's memory (i.e. real-size bytes starting at real-base). Since this is happening at reboot time, I suggest we copy the block list into rtas_rmo_buf. That is big enough to accommodate up to 8k entries, which will do for up to 32MB of flash image, which should be enough for now, shouldn't it? If not we can just make rtas_rmo_buf a bit bigger. As for not overlapping OF, we just need a little allocator function that keeps on allocating pages until it gets one that doesn't overlap with OF, and then frees all the extra pages it had to allocate. Those pages could be linked together so we don't have to maintain a big array of page pointers. The common case will be that we get a page we can use (i.e. which doesn't overlap OF) on the first try. Regards, Paul. 
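
[Editorial sketch, not part of the original thread.] The "little allocator" from the rtas_flash_4gig message above could look roughly like the following. This is a userspace illustration only, under stated assumptions: union sketch_page, struct of_range, alloc_page_outside_of() and the demo_* helpers are hypothetical names invented here, the page size is assumed to be 4 KB, and the page_alloc/page_free callbacks stand in for whatever page allocator the kernel code would really use. Rejected pages are chained through their own first bytes, so no array of page pointers is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this sketch */

/* One page. While a page sits on the reject chain, its first bytes
 * hold the link to the previously rejected page. */
union sketch_page {
	union sketch_page *next_reject;
	unsigned char bytes[SKETCH_PAGE_SIZE];
};

/* Hypothetical stand-in for OF's memory: real_size bytes starting at
 * real_base. */
struct of_range {
	uintptr_t real_base;
	uintptr_t real_size;
};

/* True if any byte of the page falls inside the OF range. */
static int overlaps_of(const union sketch_page *page,
		       const struct of_range *of)
{
	uintptr_t start = (uintptr_t)page;
	uintptr_t end = start + sizeof(*page);

	return start < of->real_base + of->real_size && end > of->real_base;
}

/* Keep allocating until a page does not overlap OF, then free every
 * page that had to be rejected along the way. Returns NULL if the
 * allocator runs dry first. */
union sketch_page *
alloc_page_outside_of(const struct of_range *of,
		      union sketch_page *(*page_alloc)(void),
		      void (*page_free)(union sketch_page *))
{
	union sketch_page *rejects = NULL;
	union sketch_page *page;

	assert(of && page_alloc && page_free);

	while ((page = page_alloc()) != NULL && overlaps_of(page, of)) {
		/* Unusable: push it onto the reject chain. */
		page->next_reject = rejects;
		rejects = page;
	}

	while (rejects) {
		union sketch_page *next = rejects->next_reject;

		page_free(rejects);
		rejects = next;
	}

	return page;
}

/* Demo allocator: hands out four pages of a static arena in order and
 * only counts frees. */
static union sketch_page demo_arena[4];
static size_t demo_next;
static size_t demo_frees;

union sketch_page *demo_alloc(void)
{
	return demo_next < 4 ? &demo_arena[demo_next++] : NULL;
}

void demo_free(union sketch_page *page)
{
	(void)page;
	demo_frees++;
}
```

With an OF range covering the first two arena pages, the sketch rejects two pages, returns the third, and frees the two rejects. In the common case described in the message, the first page is usable and the reject chain stays empty. The sizing figure in the same message is consistent with one 4 KB page per entry (an assumption here): 8192 entries * 4096 bytes = 32 MB.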
From paulus at samba.org Thu Oct 28 14:44:59 2004 From: paulus at samba.org (Paul Mackerras) Date: Thu, 28 Oct 2004 14:44:59 +1000 Subject: module.viomap support for ppc64 In-Reply-To: <20040813094040.GA1769@suse.de> References: <20040812173751.GA30564@suse.de> <1092339278.19137.8.camel@localhost> <1092354195.25196.11.camel@bach> <20040813094040.GA1769@suse.de> Message-ID: <16768.31051.268932.927382@cargo.ozlabs.ibm.com> Olaf Hering writes: > A hack for 2.6.8-rc4 is below. Can I read the alias file via > while read a b c ; do : done < modules.alias ? > Is b supposed to contain not spaces? What special delimiter chars are > allowed? The 'name' and 'compat' property can contain almost any char. > I used '^' for the time being. Olaf, do you still want these changes made? I rebased your patch on current BK (see below). Dave, any comments on this patch? Paul. diff -urN linux-2.5/arch/ppc64/kernel/vio.c test/arch/ppc64/kernel/vio.c --- linux-2.5/arch/ppc64/kernel/vio.c 2004-09-24 15:23:06.000000000 +1000 +++ test/arch/ppc64/kernel/vio.c 2004-10-28 14:22:59.791014944 +1000 @@ -143,7 +143,7 @@ { DBGENTER(); - while (ids->type) { + while (ids->type[0]) { if ((strncmp(dev->type, ids->type, strlen(ids->type)) == 0) && device_is_compatible(dev->dev.platform_data, ids->compat)) return ids; diff -urN linux-2.5/drivers/block/viodasd.c test/drivers/block/viodasd.c --- linux-2.5/drivers/block/viodasd.c 2004-06-30 15:40:03.000000000 +1000 +++ test/drivers/block/viodasd.c 2004-10-28 14:21:06.962994664 +1000 @@ -778,7 +778,7 @@ */ static struct vio_device_id viodasd_device_table[] __devinitdata = { { "viodasd", "" }, - { 0, } + { "", "" } }; MODULE_DEVICE_TABLE(vio, viodasd_device_table); diff -urN linux-2.5/drivers/cdrom/viocd.c test/drivers/cdrom/viocd.c --- linux-2.5/drivers/cdrom/viocd.c 2004-08-24 07:22:47.000000000 +1000 +++ test/drivers/cdrom/viocd.c 2004-10-28 14:21:27.959005552 +1000 @@ -693,7 +693,7 @@ */ static struct vio_device_id viocd_device_table[] __devinitdata = { { 
"viocd", "" }, - { 0, } + { "", "" } }; MODULE_DEVICE_TABLE(vio, viocd_device_table); diff -urN linux-2.5/drivers/char/hvc_console.c test/drivers/char/hvc_console.c --- linux-2.5/drivers/char/hvc_console.c 2004-10-22 07:00:21.000000000 +1000 +++ test/drivers/char/hvc_console.c 2004-10-28 14:21:36.504029776 +1000 @@ -581,7 +581,7 @@ static struct vio_device_id hvc_driver_table[] __devinitdata= { {"serial", "hvterm1"}, - { NULL, } + { "", "" } }; MODULE_DEVICE_TABLE(vio, hvc_driver_table); diff -urN linux-2.5/drivers/char/hvcs.c test/drivers/char/hvcs.c --- linux-2.5/drivers/char/hvcs.c 2004-10-22 07:00:21.000000000 +1000 +++ test/drivers/char/hvcs.c 2004-10-28 14:17:51.265058720 +1000 @@ -527,7 +527,7 @@ static struct vio_device_id hvcs_driver_table[] __devinitdata= { {"serial-server", "hvterm2"}, - { NULL, } + { "", "" } }; MODULE_DEVICE_TABLE(vio, hvcs_driver_table); diff -urN linux-2.5/drivers/char/viotape.c test/drivers/char/viotape.c --- linux-2.5/drivers/char/viotape.c 2004-06-30 15:40:03.000000000 +1000 +++ test/drivers/char/viotape.c 2004-10-28 14:22:59.446934232 +1000 @@ -991,7 +991,7 @@ */ static struct vio_device_id viotape_device_table[] __devinitdata = { { "viotape", "" }, - { 0, } + { "", "" } }; MODULE_DEVICE_TABLE(vio, viotape_device_table); diff -urN linux-2.5/drivers/net/ibmveth.c test/drivers/net/ibmveth.c --- linux-2.5/drivers/net/ibmveth.c 2004-09-16 21:51:58.000000000 +1000 +++ test/drivers/net/ibmveth.c 2004-10-28 14:16:32.795007496 +1000 @@ -1125,7 +1125,7 @@ static struct vio_device_id ibmveth_device_table[] __devinitdata= { { "network", "IBM,l-lan"}, - { 0,} + { "",""} }; MODULE_DEVICE_TABLE(vio, ibmveth_device_table); diff -urN linux-2.5/drivers/net/iseries_veth.c test/drivers/net/iseries_veth.c --- linux-2.5/drivers/net/iseries_veth.c 2004-10-20 21:20:19.000000000 +1000 +++ test/drivers/net/iseries_veth.c 2004-10-28 14:22:59.046995032 +1000 @@ -1353,7 +1353,7 @@ */ static struct vio_device_id veth_device_table[] __devinitdata = { { 
"vlan", "" }, - { NULL, NULL } + { "", "" } }; MODULE_DEVICE_TABLE(vio, veth_device_table); diff -urN linux-2.5/drivers/scsi/ibmvscsi/ibmvscsi.c test/drivers/scsi/ibmvscsi/ibmvscsi.c --- linux-2.5/drivers/scsi/ibmvscsi/ibmvscsi.c 2004-07-29 07:33:14.000000000 +1000 +++ test/drivers/scsi/ibmvscsi/ibmvscsi.c 2004-10-28 14:22:45.765019224 +1000 @@ -1368,7 +1368,7 @@ */ static struct vio_device_id ibmvscsi_device_table[] __devinitdata = { {"vscsi", "IBM,v-scsi"}, - {0,} + { "", "" } }; MODULE_DEVICE_TABLE(vio, ibmvscsi_device_table); diff -urN linux-2.5/include/asm-ppc64/vio.h test/include/asm-ppc64/vio.h --- linux-2.5/include/asm-ppc64/vio.h 2004-06-30 15:40:04.000000000 +1000 +++ test/include/asm-ppc64/vio.h 2004-10-28 14:16:32.797007192 +1000 @@ -86,9 +86,10 @@ extern struct bus_type vio_bus_type; +#define VIO_DEVTABLE_PROPERTY_LENGTH 32 struct vio_device_id { - char *type; - char *compat; + char type[VIO_DEVTABLE_PROPERTY_LENGTH]; + char compat[VIO_DEVTABLE_PROPERTY_LENGTH]; }; struct vio_driver { diff -urN linux-2.5/include/linux/mod_devicetable.h test/include/linux/mod_devicetable.h --- linux-2.5/include/linux/mod_devicetable.h 2004-02-09 18:25:16.000000000 +1100 +++ test/include/linux/mod_devicetable.h 2004-10-28 14:16:32.798007040 +1000 @@ -164,5 +164,10 @@ } devs[PNP_MAX_DEVICES]; }; +#define VIO_DEVTABLE_PROPERTY_LENGTH 32 +struct VIO_device_id { + char name[VIO_DEVTABLE_PROPERTY_LENGTH]; + char compat[VIO_DEVTABLE_PROPERTY_LENGTH]; +}; #endif /* LINUX_MOD_DEVICETABLE_H */ From david at gibson.dropbear.id.au Thu Oct 28 16:01:51 2004 From: david at gibson.dropbear.id.au (David Gibson) Date: Thu, 28 Oct 2004 16:01:51 +1000 Subject: [PPC64] Rework ppc64 hugepage code Message-ID: <20041028060151.GA1680@zax> Andrew, please apply: Rework the ppc64 hugepage code. Instead of using specially marked pmd entries in the normal pagetables to represent hugepages, use normal pte_t entries, in a special set of pagetables used for hugepages only. 
Using pte_t instead of a special hugepte_t makes the code more similar to that for other architectures, allowing more possibilities for consolidating the hugepage code. Using independent pagetables for the hugepages is also a prerequisite for moving the hugepages into their own region well outside the normal user address space. The restrictions imposed by the powerpc mmu's segment design mean we probably want to do that in the fairly near future. Signed-off-by: David Gibson Index: working-2.6/include/asm-ppc64/pgtable.h =================================================================== --- working-2.6.orig/include/asm-ppc64/pgtable.h 2004-10-21 11:55:01.000000000 +1000 +++ working-2.6/include/asm-ppc64/pgtable.h 2004-10-27 12:06:02.635023544 +1000 @@ -98,6 +98,7 @@ #define _PAGE_BUSY 0x0800 /* software: PTE & hash are busy */ #define _PAGE_SECONDARY 0x8000 /* software: HPTE is in secondary group */ #define _PAGE_GROUP_IX 0x7000 /* software: HPTE index within group */ +#define _PAGE_HUGE 0x10000 /* 16MB page */ /* Bits 0x7000 identify the index within an HPT Group */ #define _PAGE_HPTEFLAGS (_PAGE_BUSY | _PAGE_HASHPTE | _PAGE_SECONDARY | _PAGE_GROUP_IX) /* PAGE_MASK gives the right answer below, but only by accident */ @@ -157,19 +158,19 @@ #endif /* __ASSEMBLY__ */ /* shift to put page number into pte */ -#define PTE_SHIFT (16) +#define PTE_SHIFT (17) /* We allow 2^41 bytes of real memory, so we need 29 bits in the PMD * to give the PTE page number. The bottom two bits are for flags.
*/ #define PMD_TO_PTEPAGE_SHIFT (2) #ifdef CONFIG_HUGETLB_PAGE -#define _PMD_HUGEPAGE 0x00000001U -#define HUGEPTE_BATCH_SIZE (1<<(HPAGE_SHIFT-PMD_SHIFT)) #ifndef __ASSEMBLY__ int hash_huge_page(struct mm_struct *mm, unsigned long access, unsigned long ea, unsigned long vsid, int local); + +void hugetlb_mm_free_pgd(struct mm_struct *mm); #endif /* __ASSEMBLY__ */ #define HAVE_ARCH_UNMAPPED_AREA @@ -177,7 +178,7 @@ #else #define hash_huge_page(mm,a,ea,vsid,local) -1 -#define _PMD_HUGEPAGE 0 +#define hugetlb_mm_free_pgd(mm) do {} while (0) #endif @@ -213,10 +214,8 @@ #define pmd_set(pmdp, ptep) \ (pmd_val(*(pmdp)) = (__ba_to_bpn(ptep) << PMD_TO_PTEPAGE_SHIFT)) #define pmd_none(pmd) (!pmd_val(pmd)) -#define pmd_hugepage(pmd) (!!(pmd_val(pmd) & _PMD_HUGEPAGE)) -#define pmd_bad(pmd) (((pmd_val(pmd)) == 0) || pmd_hugepage(pmd)) -#define pmd_present(pmd) ((!pmd_hugepage(pmd)) \ - && (pmd_val(pmd) & ~_PMD_HUGEPAGE) != 0) +#define pmd_bad(pmd) (pmd_val(pmd) == 0) +#define pmd_present(pmd) (pmd_val(pmd) != 0) #define pmd_clear(pmdp) (pmd_val(*(pmdp)) = 0) #define pmd_page_kernel(pmd) \ (__bpn_to_ba(pmd_val(pmd) >> PMD_TO_PTEPAGE_SHIFT)) @@ -269,6 +268,7 @@ static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY;} static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED;} static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE;} +static inline int pte_huge(pte_t pte) { return pte_val(pte) & _PAGE_HUGE;} static inline void pte_uncache(pte_t pte) { pte_val(pte) |= _PAGE_NO_CACHE; } static inline void pte_cache(pte_t pte) { pte_val(pte) &= ~_PAGE_NO_CACHE; } @@ -294,6 +294,8 @@ pte_val(pte) |= _PAGE_DIRTY; return pte; } static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= _PAGE_ACCESSED; return pte; } +static inline pte_t pte_mkhuge(pte_t pte) { + pte_val(pte) |= _PAGE_HUGE; return pte; } /* Atomic PTE updates */ static inline unsigned long pte_update(pte_t *p, unsigned long clr) @@ -464,6 +466,10 @@ extern void 
paging_init(void); +struct mmu_gather; +void hugetlb_free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *prev, + unsigned long start, unsigned long end); + /* * This gets called at the end of handling a page fault, when * the kernel has put a new PTE into the page table for the process. Index: working-2.6/arch/ppc64/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/ppc64/mm/hugetlbpage.c 2004-10-27 10:43:46.000000000 +1000 +++ working-2.6/arch/ppc64/mm/hugetlbpage.c 2004-10-27 12:06:02.637023240 +1000 @@ -27,116 +27,143 @@ #include -/* HugePTE layout: - * - * 31 30 ... 15 14 13 12 10 9 8 7 6 5 4 3 2 1 0 - * PFN>>12..... - - - - - - HASH_IX.... 2ND HASH RW - HG=1 - */ +#define HUGEPGDIR_SHIFT (HPAGE_SHIFT + PAGE_SHIFT - 3) +#define HUGEPGDIR_SIZE (1UL << HUGEPGDIR_SHIFT) +#define HUGEPGDIR_MASK (~(HUGEPGDIR_SIZE-1)) + +#define HUGEPTE_INDEX_SIZE 9 +#define HUGEPGD_INDEX_SIZE 10 + +#define PTRS_PER_HUGEPTE (1 << HUGEPTE_INDEX_SIZE) +#define PTRS_PER_HUGEPGD (1 << HUGEPGD_INDEX_SIZE) -#define HUGEPTE_SHIFT 15 -#define _HUGEPAGE_PFN 0xffff8000 -#define _HUGEPAGE_BAD 0x00007f00 -#define _HUGEPAGE_HASHPTE 0x00000008 -#define _HUGEPAGE_SECONDARY 0x00000010 -#define _HUGEPAGE_GROUP_IX 0x000000e0 -#define _HUGEPAGE_HPTEFLAGS (_HUGEPAGE_HASHPTE | _HUGEPAGE_SECONDARY | \ - _HUGEPAGE_GROUP_IX) -#define _HUGEPAGE_RW 0x00000004 - -typedef struct {unsigned int val;} hugepte_t; -#define hugepte_val(hugepte) ((hugepte).val) -#define __hugepte(x) ((hugepte_t) { (x) } ) -#define hugepte_pfn(x) \ - ((unsigned long)(hugepte_val(x)>>HUGEPTE_SHIFT) << HUGETLB_PAGE_ORDER) -#define mk_hugepte(page,wr) __hugepte( \ - ((page_to_pfn(page)>>HUGETLB_PAGE_ORDER) << HUGEPTE_SHIFT ) \ - | (!!(wr) * _HUGEPAGE_RW) | _PMD_HUGEPAGE ) - -#define hugepte_bad(x) ( !(hugepte_val(x) & _PMD_HUGEPAGE) || \ - (hugepte_val(x) & _HUGEPAGE_BAD) ) -#define hugepte_page(x) pfn_to_page(hugepte_pfn(x)) -#define hugepte_none(x) (!(hugepte_val(x) & 
_HUGEPAGE_PFN)) - - -static void flush_hash_hugepage(mm_context_t context, unsigned long ea, - hugepte_t pte, int local); - -static inline unsigned int hugepte_update(hugepte_t *p, unsigned int clr, - unsigned int set) -{ - unsigned int old, tmp; - - __asm__ __volatile__( - "1: lwarx %0,0,%3 # pte_update\n\ - andc %1,%0,%4 \n\ - or %1,%1,%5 \n\ - stwcx. %1,0,%3 \n\ - bne- 1b" - : "=&r" (old), "=&r" (tmp), "=m" (*p) - : "r" (p), "r" (clr), "r" (set), "m" (*p) - : "cc" ); - return old; +static inline int hugepgd_index(unsigned long addr) +{ + return (addr & ~REGION_MASK) >> HUGEPGDIR_SHIFT; } -static inline void set_hugepte(hugepte_t *ptep, hugepte_t pte) +static pgd_t *hugepgd_offset(struct mm_struct *mm, unsigned long addr) { - hugepte_update(ptep, ~_HUGEPAGE_HPTEFLAGS, - hugepte_val(pte) & ~_HUGEPAGE_HPTEFLAGS); + int index; + + if (! mm->context.huge_pgdir) + return NULL; + + + index = hugepgd_index(addr); + BUG_ON(index >= PTRS_PER_HUGEPGD); + return mm->context.huge_pgdir + index; } -static hugepte_t *hugepte_alloc(struct mm_struct *mm, unsigned long addr) +static inline pte_t *hugepte_offset(pgd_t *dir, unsigned long addr) { - pgd_t *pgd; - pmd_t *pmd = NULL; + int index; - BUG_ON(!in_hugepage_area(mm->context, addr)); + if (pgd_none(*dir)) + return NULL; - pgd = pgd_offset(mm, addr); - pmd = pmd_alloc(mm, pgd, addr); + index = (addr >> HPAGE_SHIFT) % PTRS_PER_HUGEPTE; + return (pte_t *)pgd_page(*dir) + index; +} - /* We shouldn't find a (normal) PTE page pointer here */ - BUG_ON(!pmd_none(*pmd) && !pmd_hugepage(*pmd)); - - return (hugepte_t *)pmd; +static pgd_t *hugepgd_alloc(struct mm_struct *mm, unsigned long addr) +{ + BUG_ON(! in_hugepage_area(mm->context, addr)); + + if (! 
mm->context.huge_pgdir) { + pgd_t *new; + spin_unlock(&mm->page_table_lock); + /* Don't use pgd_alloc(), because we want __GFP_REPEAT */ + new = kmem_cache_alloc(zero_cache, GFP_KERNEL | __GFP_REPEAT); + BUG_ON(memcmp(new, empty_zero_page, PAGE_SIZE)); + spin_lock(&mm->page_table_lock); + + /* + * Because we dropped the lock, we should re-check the + * entry, as somebody else could have populated it.. + */ + if (mm->context.huge_pgdir) + pgd_free(new); + else + mm->context.huge_pgdir = new; + } + return hugepgd_offset(mm, addr); } -static hugepte_t *hugepte_offset(struct mm_struct *mm, unsigned long addr) +static pte_t *hugepte_alloc(struct mm_struct *mm, pgd_t *dir, + unsigned long addr) { - pgd_t *pgd; - pmd_t *pmd = NULL; + if (! pgd_present(*dir)) { + pte_t *new; - BUG_ON(!in_hugepage_area(mm->context, addr)); + spin_unlock(&mm->page_table_lock); + new = kmem_cache_alloc(zero_cache, GFP_KERNEL | __GFP_REPEAT); + BUG_ON(memcmp(new, empty_zero_page, PAGE_SIZE)); + spin_lock(&mm->page_table_lock); + /* + * Because we dropped the lock, we should re-check the + * entry, as somebody else could have populated it.. + */ + if (pgd_present(*dir)) { + if (new) + kmem_cache_free(zero_cache, new); + } else { + struct page *ptepage; - pgd = pgd_offset(mm, addr); - if (pgd_none(*pgd)) - return NULL; + if (! new) + return NULL; + ptepage = virt_to_page(new); + ptepage->mapping = (void *) mm; + ptepage->index = addr & HUGEPGDIR_MASK; + pgd_populate(mm, dir, new); + } + } - pmd = pmd_offset(pgd, addr); + return hugepte_offset(dir, addr); +} + +static pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) +{ + pgd_t *pgd; - /* We shouldn't find a (normal) PTE page pointer here */ - BUG_ON(!pmd_none(*pmd) && !pmd_hugepage(*pmd)); + BUG_ON(! in_hugepage_area(mm->context, addr)); - return (hugepte_t *)pmd; + pgd = hugepgd_offset(mm, addr); + if (! 
pgd) + return NULL; + + return hugepte_offset(pgd, addr); } -static void setup_huge_pte(struct mm_struct *mm, struct page *page, - hugepte_t *ptep, int write_access) +static pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr) { - hugepte_t entry; - int i; + pgd_t *pgd; - mm->rss += (HPAGE_SIZE / PAGE_SIZE); - entry = mk_hugepte(page, write_access); - for (i = 0; i < HUGEPTE_BATCH_SIZE; i++) - set_hugepte(ptep+i, entry); + BUG_ON(! in_hugepage_area(mm->context, addr)); + + pgd = hugepgd_alloc(mm, addr); + if (! pgd) + return NULL; + + return hugepte_alloc(mm, pgd, addr); } -static void teardown_huge_pte(hugepte_t *ptep) +static void set_huge_pte(struct mm_struct *mm, struct vm_area_struct *vma, + struct page *page, pte_t *ptep, int write_access) { - int i; + pte_t entry; - for (i = 0; i < HUGEPTE_BATCH_SIZE; i++) - pmd_clear((pmd_t *)(ptep+i)); + mm->rss += (HPAGE_SIZE / PAGE_SIZE); + if (write_access) { + entry = + pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot))); + } else { + entry = pte_wrprotect(mk_pte(page, vma->vm_page_prot)); + } + entry = pte_mkyoung(entry); + entry = pte_mkhuge(entry); + + set_pte(ptep, entry); } /* @@ -268,34 +295,31 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, struct vm_area_struct *vma) { - hugepte_t *src_pte, *dst_pte, entry; + pte_t *src_pte, *dst_pte, entry; struct page *ptepage; unsigned long addr = vma->vm_start; unsigned long end = vma->vm_end; + int err = -ENOMEM; while (addr < end) { - BUG_ON(! in_hugepage_area(src->context, addr)); - BUG_ON(! 
in_hugepage_area(dst->context, addr)); - - dst_pte = hugepte_alloc(dst, addr); + dst_pte = huge_pte_alloc(dst, addr); if (!dst_pte) - return -ENOMEM; + goto out; - src_pte = hugepte_offset(src, addr); + src_pte = huge_pte_offset(src, addr); entry = *src_pte; - if ((addr % HPAGE_SIZE) == 0) { - /* This is the first hugepte in a batch */ - ptepage = hugepte_page(entry); - get_page(ptepage); - dst->rss += (HPAGE_SIZE / PAGE_SIZE); - } - set_hugepte(dst_pte, entry); - + ptepage = pte_page(entry); + get_page(ptepage); + dst->rss += (HPAGE_SIZE / PAGE_SIZE); + set_pte(dst_pte, entry); - addr += PMD_SIZE; + addr += HPAGE_SIZE; } - return 0; + + err = 0; + out: + return err; } int @@ -310,18 +334,16 @@ vpfn = vaddr/PAGE_SIZE; while (vaddr < vma->vm_end && remainder) { - BUG_ON(!in_hugepage_area(mm->context, vaddr)); - if (pages) { - hugepte_t *pte; + pte_t *pte; struct page *page; - pte = hugepte_offset(mm, vaddr); + pte = huge_pte_offset(mm, vaddr); /* hugetlb should be locked, and hence, prefaulted */ - WARN_ON(!pte || hugepte_none(*pte)); + WARN_ON(!pte || pte_none(*pte)); - page = &hugepte_page(*pte)[vpfn % (HPAGE_SIZE/PAGE_SIZE)]; + page = &pte_page(*pte)[vpfn % (HPAGE_SIZE/PAGE_SIZE)]; WARN_ON(!PageCompound(page)); @@ -347,26 +369,31 @@ struct page * follow_huge_addr(struct mm_struct *mm, unsigned long address, int write) { - return ERR_PTR(-EINVAL); + pte_t *ptep; + struct page *page; + + if (! in_hugepage_area(mm->context, address)) + return ERR_PTR(-EINVAL); + + ptep = huge_pte_offset(mm, address); + page = pte_page(*ptep); + if (page) + page += (address % HPAGE_SIZE) / PAGE_SIZE; + + return page; } int pmd_huge(pmd_t pmd) { - return pmd_hugepage(pmd); + return 0; } struct page * follow_huge_pmd(struct mm_struct *mm, unsigned long address, pmd_t *pmd, int write) { - struct page *page; - - BUG_ON(! 
pmd_hugepage(*pmd)); - - page = hugepte_page(*(hugepte_t *)pmd); - if (page) - page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT); - return page; + BUG(); + return NULL; } void unmap_hugepage_range(struct vm_area_struct *vma, @@ -374,44 +401,38 @@ { struct mm_struct *mm = vma->vm_mm; unsigned long addr; - hugepte_t *ptep; + pte_t *ptep; struct page *page; - int cpu; - int local = 0; - cpumask_t tmp; WARN_ON(!is_vm_hugetlb_page(vma)); BUG_ON((start % HPAGE_SIZE) != 0); BUG_ON((end % HPAGE_SIZE) != 0); - /* XXX are there races with checking cpu_vm_mask? - Anton */ - cpu = get_cpu(); - tmp = cpumask_of_cpu(cpu); - if (cpus_equal(vma->vm_mm->cpu_vm_mask, tmp)) - local = 1; - for (addr = start; addr < end; addr += HPAGE_SIZE) { - hugepte_t pte; - - BUG_ON(!in_hugepage_area(mm->context, addr)); + pte_t pte; - ptep = hugepte_offset(mm, addr); - if (!ptep || hugepte_none(*ptep)) + ptep = huge_pte_offset(mm, addr); + if (!ptep || pte_none(*ptep)) continue; pte = *ptep; - page = hugepte_page(pte); - teardown_huge_pte(ptep); - - if (hugepte_val(pte) & _HUGEPAGE_HASHPTE) - flush_hash_hugepage(mm->context, addr, - pte, local); + page = pte_page(pte); + pte_clear(ptep); put_page(page); } - put_cpu(); - mm->rss -= (end - start) >> PAGE_SHIFT; + flush_tlb_pending(); +} + +void hugetlb_free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *prev, + unsigned long start, unsigned long end) +{ + /* Because the huge pgtables are only 2 level, they can take + * at most around 4M, much less than one hugepage which the + * process is presumably entitled to use. So we don't bother + * freeing up the pagetables on unmap, and wait until + * destroy_context() to clean up the lot. 
*/ } int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma) @@ -427,16 +448,14 @@ spin_lock(&mm->page_table_lock); for (addr = vma->vm_start; addr < vma->vm_end; addr += HPAGE_SIZE) { unsigned long idx; - hugepte_t *pte = hugepte_alloc(mm, addr); + pte_t *pte = huge_pte_alloc(mm, addr); struct page *page; - BUG_ON(!in_hugepage_area(mm->context, addr)); - if (!pte) { ret = -ENOMEM; goto out; } - if (!hugepte_none(*pte)) + if (! pte_none(*pte)) continue; idx = ((addr - vma->vm_start) >> HPAGE_SHIFT) @@ -463,7 +482,7 @@ goto out; } } - setup_huge_pte(mm, page, pte, vma->vm_flags & VM_WRITE); + set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE); } out: spin_unlock(&mm->page_table_lock); @@ -717,20 +736,55 @@ } } +void hugetlb_mm_free_pgd(struct mm_struct *mm) +{ + int i; + pgd_t *pgdir; + + spin_lock(&mm->page_table_lock); + + pgdir = mm->context.huge_pgdir; + if (! pgdir) + return; + + mm->context.huge_pgdir = NULL; + + /* cleanup any hugepte pages leftover */ + for (i = 0; i < PTRS_PER_HUGEPGD; i++) { + pgd_t *pgd = pgdir + i; + + if (! 
pgd_none(*pgd)) { + pte_t *pte = (pte_t *)pgd_page(*pgd); + struct page *ptepage = virt_to_page(pte); + + ptepage->mapping = NULL; + + BUG_ON(memcmp(pte, empty_zero_page, PAGE_SIZE)); + kmem_cache_free(zero_cache, pte); + } + pgd_clear(pgd); + } + + BUG_ON(memcmp(pgdir, empty_zero_page, PAGE_SIZE)); + kmem_cache_free(zero_cache, pgdir); + + spin_unlock(&mm->page_table_lock); +} + int hash_huge_page(struct mm_struct *mm, unsigned long access, unsigned long ea, unsigned long vsid, int local) { - hugepte_t *ptep; + pte_t *ptep; unsigned long va, vpn; int is_write; - hugepte_t old_pte, new_pte; - unsigned long hpteflags, prpn, flags; + pte_t old_pte, new_pte; + unsigned long hpteflags, prpn; long slot; + int err = 1; + + spin_lock(&mm->page_table_lock); - /* We have to find the first hugepte in the batch, since - * that's the one that will store the HPTE flags */ - ea &= HPAGE_MASK; - ptep = hugepte_offset(mm, ea); + ptep = huge_pte_offset(mm, ea); /* Search the Linux page table for a match with va */ va = (vsid << 28) | (ea & 0x0fffffff); @@ -740,19 +794,18 @@ * If no pte found or not present, send the problem up to * do_page_fault */ - if (unlikely(!ptep || hugepte_none(*ptep))) - return 1; + if (unlikely(!ptep || pte_none(*ptep))) + goto out; - BUG_ON(hugepte_bad(*ptep)); +/* BUG_ON(pte_bad(*ptep)); */ /* * Check the user's access rights to the page. If access should be * prevented then send the problem up to do_page_fault. */ is_write = access & _PAGE_RW; - if (unlikely(is_write && !(hugepte_val(*ptep) & _HUGEPAGE_RW))) - return 1; - + if (unlikely(is_write && !(pte_val(*ptep) & _PAGE_RW))) + goto out; /* * At this point, we have a pte (old_pte) which can be used to build * or update an HPTE. There are 2 cases: @@ -765,41 +818,40 @@ * page is currently not DIRTY. */ - spin_lock_irqsave(&mm->page_table_lock, flags); old_pte = *ptep; new_pte = old_pte; - hpteflags = 0x2 | (! (hugepte_val(new_pte) & _HUGEPAGE_RW)); + hpteflags = 0x2 | (! 
(pte_val(new_pte) & _PAGE_RW)); /* Check if pte already has an hpte (case 2) */ - if (unlikely(hugepte_val(old_pte) & _HUGEPAGE_HASHPTE)) { + if (unlikely(pte_val(old_pte) & _PAGE_HASHPTE)) { /* There MIGHT be an HPTE for this pte */ unsigned long hash, slot; hash = hpt_hash(vpn, 1); - if (hugepte_val(old_pte) & _HUGEPAGE_SECONDARY) + if (pte_val(old_pte) & _PAGE_SECONDARY) hash = ~hash; slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP; - slot += (hugepte_val(old_pte) & _HUGEPAGE_GROUP_IX) >> 5; + slot += (pte_val(old_pte) & _PAGE_GROUP_IX) >> 12; if (ppc_md.hpte_updatepp(slot, hpteflags, va, 1, local) == -1) - hugepte_val(old_pte) &= ~_HUGEPAGE_HPTEFLAGS; + pte_val(old_pte) &= ~_PAGE_HPTEFLAGS; } - if (likely(!(hugepte_val(old_pte) & _HUGEPAGE_HASHPTE))) { + if (likely(!(pte_val(old_pte) & _PAGE_HASHPTE))) { unsigned long hash = hpt_hash(vpn, 1); unsigned long hpte_group; - prpn = hugepte_pfn(old_pte); + prpn = pte_pfn(old_pte); repeat: hpte_group = ((hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL; /* Update the linux pte with the HPTE slot */ - hugepte_val(new_pte) &= ~_HUGEPAGE_HPTEFLAGS; - hugepte_val(new_pte) |= _HUGEPAGE_HASHPTE; + pte_val(new_pte) &= ~_PAGE_HPTEFLAGS; + pte_val(new_pte) |= _PAGE_HASHPTE; /* Add in WIMG bits */ /* XXX We should store these in the pte */ @@ -810,7 +862,7 @@ /* Primary is full, try the secondary */ if (unlikely(slot == -1)) { - hugepte_val(new_pte) |= _HUGEPAGE_SECONDARY; + pte_val(new_pte) |= _PAGE_SECONDARY; hpte_group = ((~hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL; slot = ppc_md.hpte_insert(hpte_group, va, prpn, @@ -827,39 +879,20 @@ if (unlikely(slot == -2)) panic("hash_huge_page: pte_insert failed\n"); - hugepte_val(new_pte) |= (slot<<5) & _HUGEPAGE_GROUP_IX; + pte_val(new_pte) |= (slot<<12) & _PAGE_GROUP_IX; /* * No need to use ldarx/stdcx here because all who * might be updating the pte will hold the - * page_table_lock or the hash_table_lock - * (we hold both) + * 
page_table_lock */ *ptep = new_pte; } - spin_unlock_irqrestore(&mm->page_table_lock, flags); - - return 0; -} - -static void flush_hash_hugepage(mm_context_t context, unsigned long ea, - hugepte_t pte, int local) -{ - unsigned long vsid, vpn, va, hash, slot; - - BUG_ON(hugepte_bad(pte)); - BUG_ON(!in_hugepage_area(context, ea)); - - vsid = get_vsid(context.id, ea); + err = 0; - va = (vsid << 28) | (ea & 0x0fffffff); - vpn = va >> HPAGE_SHIFT; - hash = hpt_hash(vpn, 1); - if (hugepte_val(pte) & _HUGEPAGE_SECONDARY) - hash = ~hash; - slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP; - slot += (hugepte_val(pte) & _HUGEPAGE_GROUP_IX) >> 5; + out: + spin_unlock(&mm->page_table_lock); - ppc_md.hpte_invalidate(slot, va, 1, local); + return err; } Index: working-2.6/include/asm-ppc64/mmu.h =================================================================== --- working-2.6.orig/include/asm-ppc64/mmu.h 2004-10-05 10:08:10.000000000 +1000 +++ working-2.6/include/asm-ppc64/mmu.h 2004-10-27 12:06:02.638023088 +1000 @@ -24,6 +24,7 @@ typedef struct { mm_context_id_t id; #ifdef CONFIG_HUGETLB_PAGE + pgd_t *huge_pgdir; u16 htlb_segs; /* bitmask */ #endif } mm_context_t; Index: working-2.6/include/asm-ppc64/page.h =================================================================== --- working-2.6.orig/include/asm-ppc64/page.h 2004-09-20 10:12:50.000000000 +1000 +++ working-2.6/include/asm-ppc64/page.h 2004-10-27 12:06:02.638023088 +1000 @@ -64,7 +64,6 @@ #define is_hugepage_only_range(addr, len) \ (touches_hugepage_high_range((addr), (len)) || \ touches_hugepage_low_range((addr), (len))) -#define hugetlb_free_pgtables free_pgtables #define HAVE_ARCH_HUGETLB_UNMAPPED_AREA #define in_hugepage_area(context, addr) \ Index: working-2.6/arch/ppc64/mm/init.c =================================================================== --- working-2.6.orig/arch/ppc64/mm/init.c 2004-10-27 10:43:46.000000000 +1000 +++ working-2.6/arch/ppc64/mm/init.c 2004-10-27 12:06:02.639022936 +1000 @@ 
-478,6 +478,12 @@ int index; int err; +#ifdef CONFIG_HUGETLB_PAGE + /* We leave htlb_segs as it was, but for a fork, we need to + * clear the huge_pgdir. */ + mm->context.huge_pgdir = NULL; +#endif + again: if (!idr_pre_get(&mmu_context_idr, GFP_KERNEL)) return -ENOMEM; @@ -508,6 +514,8 @@ spin_unlock(&mmu_context_lock); mm->context.id = NO_CONTEXT; + + hugetlb_mm_free_pgd(mm); } static int __init mmu_context_init(void) Index: working-2.6/arch/ppc64/mm/hash_utils.c =================================================================== --- working-2.6.orig/arch/ppc64/mm/hash_utils.c 2004-10-27 10:43:46.000000000 +1000 +++ working-2.6/arch/ppc64/mm/hash_utils.c 2004-10-27 12:06:02.640022784 +1000 @@ -341,9 +341,7 @@ int local) { unsigned long vsid, vpn, va, hash, secondary, slot; - - /* XXX fix for large ptes */ - unsigned long large = 0; + unsigned long huge = pte_huge(pte); if ((ea >= USER_START) && (ea <= USER_END)) vsid = get_vsid(context, ea); @@ -351,18 +349,18 @@ vsid = get_kernel_vsid(ea); va = (vsid << 28) | (ea & 0x0fffffff); - if (large) + if (huge) vpn = va >> HPAGE_SHIFT; else vpn = va >> PAGE_SHIFT; - hash = hpt_hash(vpn, large); + hash = hpt_hash(vpn, huge); secondary = (pte_val(pte) & _PAGE_SECONDARY) >> 15; if (secondary) hash = ~hash; slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP; slot += (pte_val(pte) & _PAGE_GROUP_IX) >> 12; - ppc_md.hpte_invalidate(slot, va, large, local); + ppc_md.hpte_invalidate(slot, va, huge, local); } void flush_hash_range(unsigned long context, unsigned long number, int local) -- David Gibson | For every complex problem there is a david AT gibson.dropbear.id.au | solution which is simple, neat and | wrong. 
http://www.ozlabs.org/people/dgibson From akpm at osdl.org Thu Oct 28 16:41:09 2004 From: akpm at osdl.org (Andrew Morton) Date: Wed, 27 Oct 2004 23:41:09 -0700 Subject: [PATCH] Make key management syscalls work on PPC/PPC64 In-Reply-To: <24857.1098904121@redhat.com> References: <24857.1098904121@redhat.com> Message-ID: <20041027234109.19b39e93.akpm@osdl.org> David Howells wrote: > > The attached patch permits my key management stuff to be used on PPC, PPC64 > and PPC on PPC64. Please remember to test your patches with CONFIG_KEYS=n --- 25-power4/kernel/sys.c~ppc-ppc64-make-key-management-syscalls-work-fix 2004-10-27 23:26:16.330512080 -0700 +++ 25-power4-akpm/kernel/sys.c 2004-10-27 23:27:04.516186744 -0700 @@ -286,6 +286,7 @@ cond_syscall(compat_set_mempolicy) cond_syscall(sys_add_key) cond_syscall(sys_request_key) cond_syscall(sys_keyctl) +cond_syscall(compat_keyctl) cond_syscall(compat_sys_socketcall) /* arch-specific weak syscall entries */ _ From kaos at sgi.com Thu Oct 28 16:30:15 2004 From: kaos at sgi.com (Keith Owens) Date: Thu, 28 Oct 2004 16:30:15 +1000 Subject: [PATCH] add syslog printing to xmon debugger. In-Reply-To: Your message of "Fri, 22 Oct 2004 11:59:17 +1000." <16760.26997.131687.456670@cargo.ozlabs.ibm.com> Message-ID: <5227.1098945015@kao2.melbourne.sgi.com> On Fri, 22 Oct 2004 11:59:17 +1000, Paul Mackerras wrote: >Linas, > >> Andrew, >> >> Please apply at least the kernel/printk.c part of the patch, >> if you are feeling at all charitable. > >Did you ever get any reaction to that? I see that the printk.c patch was lifted straight from kdb - without any mention of kdb. It even has the same bug as kdb, which was corrected in kdb-v4.4-2.6.9-common-2. 
The current kdb patch to printk.c is :- Index: linux/kernel/printk.c =================================================================== --- linux.orig/kernel/printk.c Tue Oct 19 07:55:35 2004 +++ linux/kernel/printk.c Thu Oct 21 18:06:28 2004 @@ -373,6 +373,20 @@ out: return error; } +#ifdef CONFIG_KDB +/* kdb dmesg command needs access to the syslog buffer. do_syslog() uses locks + * so it cannot be used during debugging. Just tell kdb where the start and + * end of the physical and logical logs are. This is equivalent to do_syslog(3). + */ +void kdb_syslog_data(char *syslog_data[4]) +{ + syslog_data[0] = log_buf; + syslog_data[1] = log_buf + log_buf_len; + syslog_data[2] = log_buf + log_end - (logged_chars < log_buf_len ? logged_chars : log_buf_len); + syslog_data[3] = log_buf + log_end; +} +#endif /* CONFIG_KDB */ + asmlinkage long sys_syslog(int type, char __user * buf, int len) { return do_syslog(type, buf, len); From sfr at canb.auug.org.au Thu Oct 28 18:23:58 2004 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Thu, 28 Oct 2004 18:23:58 +1000 Subject: [PATCH] ppc64 iSeries: fix for generic irq changes Message-ID: <20041028182358.6b69eeac.sfr@canb.auug.org.au> Hi Andrew, The generic irq patches broke pci irqs on ppc64 iSeries. Signed-off-by: Stephen Rothwell Please merge and send to Linus. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN 2.6.10-rc1-bk6/arch/ppc64/kernel/iSeries_irq.c 2.6.10-rc1-bk6-irq.1/arch/ppc64/kernel/iSeries_irq.c --- 2.6.10-rc1-bk6/arch/ppc64/kernel/iSeries_irq.c 2004-05-10 15:31:04.000000000 +1000 +++ 2.6.10-rc1-bk6-irq.1/arch/ppc64/kernel/iSeries_irq.c 2004-10-28 18:06:30.000000000 +1000 @@ -110,6 +110,7 @@ /* Unmask bridge interrupts in the FISR */ mask = 0x01010000 << function; HvCallPci_unmaskFisr(bus, subBus, deviceId, mask); + iSeries_enable_IRQ(irq); return 0; }
From sfr at canb.auug.org.au Fri Oct 29 02:42:51 2004 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Fri, 29 Oct 2004 02:42:51 +1000 Subject: [PATCH] ppc64 iSeries pci cleanups Message-ID: <20041029024251.4cf06de2.sfr@canb.auug.org.au> Hi Andrew, This patch removes two files (iSeries_IoMmTable.[ch]) by merging them into iSeries_pci.c. This allowed quite a few more things to become declared static. It then does some fairly mechanical cleanups in iSeries_pci.c (replacing studly caps, removing the last of the PCIFR() macros and removing a couple of empty or unused routines). There are no semantic changes. Signed-off-by: Stephen Rothwell Please apply and send to Linus. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN 2.6.10-rc1-bk6-irq.1/arch/ppc64/kernel/Makefile 2.6.10-rc1-bk6-cleanup.1/arch/ppc64/kernel/Makefile --- 2.6.10-rc1-bk6-irq.1/arch/ppc64/kernel/Makefile 2004-10-28 14:18:05.000000000 +1000 +++ 2.6.10-rc1-bk6-cleanup.1/arch/ppc64/kernel/Makefile 2004-10-28 14:50:05.000000000 +1000 @@ -15,8 +15,7 @@ obj-$(CONFIG_PPC_OF) += of_device.o -pci-obj-$(CONFIG_PPC_ISERIES) += iSeries_pci.o iSeries_pci_reset.o \ - iSeries_IoMmTable.o +pci-obj-$(CONFIG_PPC_ISERIES) += iSeries_pci.o iSeries_pci_reset.o pci-obj-$(CONFIG_PPC_MULTIPLATFORM) += pci_dn.o pci_dma_direct.o obj-$(CONFIG_PCI) += pci.o pci_iommu.o iomap.o $(pci-obj-y) diff -ruN 2.6.10-rc1-bk6-irq.1/arch/ppc64/kernel/iSeries_IoMmTable.c 2.6.10-rc1-bk6-cleanup.1/arch/ppc64/kernel/iSeries_IoMmTable.c --- 2.6.10-rc1-bk6-irq.1/arch/ppc64/kernel/iSeries_IoMmTable.c 2004-02-04 17:24:34.000000000 +1100 +++ 2.6.10-rc1-bk6-cleanup.1/arch/ppc64/kernel/iSeries_IoMmTable.c 1970-01-01 10:00:00.000000000 +1000 @@ -1,169 +0,0 @@ -#define PCIFR(...)
-/************************************************************************/ -/* This module supports the iSeries I/O Address translation mapping */ -/* Copyright (C) 20yy */ -/* */ -/* This program is free software; you can redistribute it and/or modify */ -/* it under the terms of the GNU General Public License as published by */ -/* the Free Software Foundation; either version 2 of the License, or */ -/* (at your option) any later version. */ -/* */ -/* This program is distributed in the hope that it will be useful, */ -/* but WITHOUT ANY WARRANTY; without even the implied warranty of */ -/* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the */ -/* GNU General Public License for more details. */ -/* */ -/* You should have received a copy of the GNU General Public License */ -/* along with this program; if not, write to the: */ -/* Free Software Foundation, Inc., */ -/* 59 Temple Place, Suite 330, */ -/* Boston, MA 02111-1307 USA */ -/************************************************************************/ -/* Change Activity: */ -/* Created, December 14, 2000 */ -/* Added Bar table for IoMm performance. */ -/* Ported to ppc64 */ -/* Added dynamic table allocation */ -/* End Change Activity */ -/************************************************************************/ -#include -#include -#include -#include -#include -#include -#include - -#include "iSeries_IoMmTable.h" -#include "pci.h" - -/* - * Table defines - * Each Entry size is 4 MB * 1024 Entries = 4GB I/O address space. - */ -#define Max_Entries 1024 -unsigned long iSeries_IoMmTable_Entry_Size = 0x0000000000400000; -unsigned long iSeries_Base_Io_Memory = 0xE000000000000000; -unsigned long iSeries_Max_Io_Memory = 0xE000000000000000; -static long iSeries_CurrentIndex = 0; - -/* - * Lookup Tables. 
- */ -struct iSeries_Device_Node **iSeries_IoMmTable; -u8 *iSeries_IoBarTable; - -/* - * Static and Global variables - */ -static char *iSeriesPciIoText = "iSeries PCI I/O"; -static spinlock_t iSeriesIoMmTableLock = SPIN_LOCK_UNLOCKED; - -/* - * iSeries_IoMmTable_Initialize - * - * Allocates and initalizes the Address Translation Table and Bar - * Tables to get them ready for use. Must be called before any - * I/O space is handed out to the device BARs. - * A follow up method,iSeries_IoMmTable_Status can be called to - * adjust the table after the device BARs have been assiged to - * resize the table. - */ -void iSeries_IoMmTable_Initialize(void) -{ - spin_lock(&iSeriesIoMmTableLock); - iSeries_IoMmTable = kmalloc(sizeof(void *) * Max_Entries, GFP_KERNEL); - iSeries_IoBarTable = kmalloc(sizeof(u8) * Max_Entries, GFP_KERNEL); - spin_unlock(&iSeriesIoMmTableLock); - PCIFR("IoMmTable Initialized 0x%p", iSeries_IoMmTable); - if ((iSeries_IoMmTable == NULL) || (iSeries_IoBarTable == NULL)) - panic("PCI: I/O tables allocation failed.\n"); -} - -/* - * iSeries_IoMmTable_AllocateEntry - * - * Adds pci_dev entry in address translation table - * - * - Allocates the number of entries required in table base on BAR - * size. - * - Allocates starting at iSeries_Base_Io_Memory and increases. - * - The size is round up to be a multiple of entry size. - * - CurrentIndex is incremented to keep track of the last entry. - * - Builds the resource entry for allocated BARs. - */ -static void iSeries_IoMmTable_AllocateEntry(struct pci_dev *PciDev, - int BarNumber) -{ - struct resource *BarResource = &PciDev->resource[BarNumber]; - long BarSize = pci_resource_len(PciDev, BarNumber); - - /* - * No space to allocate, quick exit, skip Allocation. - */ - if (BarSize == 0) - return; - /* - * Set Resource values. 
- */ - spin_lock(&iSeriesIoMmTableLock); - BarResource->name = iSeriesPciIoText; - BarResource->start = - iSeries_IoMmTable_Entry_Size * iSeries_CurrentIndex; - BarResource->start += iSeries_Base_Io_Memory; - BarResource->end = BarResource->start+BarSize-1; - /* - * Allocate the number of table entries needed for BAR. - */ - while (BarSize > 0 ) { - *(iSeries_IoMmTable + iSeries_CurrentIndex) = - (struct iSeries_Device_Node *)PciDev->sysdata; - *(iSeries_IoBarTable + iSeries_CurrentIndex) = BarNumber; - BarSize -= iSeries_IoMmTable_Entry_Size; - ++iSeries_CurrentIndex; - } - iSeries_Max_Io_Memory = iSeries_Base_Io_Memory + - (iSeries_IoMmTable_Entry_Size * iSeries_CurrentIndex); - spin_unlock(&iSeriesIoMmTableLock); -} - -/* - * iSeries_allocateDeviceBars - * - * - Allocates ALL pci_dev BAR's and updates the resources with the - * BAR value. BARS with zero length will have the resources - * The HvCallPci_getBarParms is used to get the size of the BAR - * space. It calls iSeries_IoMmTable_AllocateEntry to allocate - * each entry. - * - Loops through The Bar resources(0 - 5) including the ROM - * is resource(6). - */ -void iSeries_allocateDeviceBars(struct pci_dev *PciDev) -{ - struct resource *BarResource; - int BarNumber; - - for (BarNumber = 0; BarNumber <= PCI_ROM_RESOURCE; ++BarNumber) { - BarResource = &PciDev->resource[BarNumber]; - iSeries_IoMmTable_AllocateEntry(PciDev, BarNumber); - } -} - -/* - * Translates the IoAddress to the device that is mapped to IoSpace. - * This code is inlined, see the iSeries_pci.c file for the replacement. 
- */ -struct iSeries_Device_Node *iSeries_xlateIoMmAddress(void *IoAddress) -{ - return NULL; -} - -/* - * Status hook for IoMmTable - */ -void iSeries_IoMmTable_Status(void) -{ - PCIFR("IoMmTable......: 0x%p", iSeries_IoMmTable); - PCIFR("IoMmTable Range: 0x%p to 0x%p", iSeries_Base_Io_Memory, - iSeries_Max_Io_Memory); -} diff -ruN 2.6.10-rc1-bk6-irq.1/arch/ppc64/kernel/iSeries_IoMmTable.h 2.6.10-rc1-bk6-cleanup.1/arch/ppc64/kernel/iSeries_IoMmTable.h --- 2.6.10-rc1-bk6-irq.1/arch/ppc64/kernel/iSeries_IoMmTable.h 2004-02-04 17:24:34.000000000 +1100 +++ 2.6.10-rc1-bk6-cleanup.1/arch/ppc64/kernel/iSeries_IoMmTable.h 1970-01-01 10:00:00.000000000 +1000 @@ -1,85 +0,0 @@ -#ifndef _ISERIES_IOMMTABLE_H -#define _ISERIES_IOMMTABLE_H -/************************************************************************/ -/* File iSeries_IoMmTable.h created by Allan Trautman on Dec 12 2001. */ -/************************************************************************/ -/* Interfaces for the write/read Io address translation table. */ -/* Copyright (C) 20yy Allan H Trautman, IBM Corporation */ -/* */ -/* This program is free software; you can redistribute it and/or modify */ -/* it under the terms of the GNU General Public License as published by */ -/* the Free Software Foundation; either version 2 of the License, or */ -/* (at your option) any later version. */ -/* */ -/* This program is distributed in the hope that it will be useful, */ -/* but WITHOUT ANY WARRANTY; without even the implied warranty of */ -/* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the */ -/* GNU General Public License for more details. 
*/ -/* */ -/* You should have received a copy of the GNU General Public License */ -/* along with this program; if not, write to the: */ -/* Free Software Foundation, Inc., */ -/* 59 Temple Place, Suite 330, */ -/* Boston, MA 02111-1307 USA */ -/************************************************************************/ -/* Change Activity: */ -/* Created December 12, 2000 */ -/* Ported to ppc64, August 30, 2001 */ -/* End Change Activity */ -/************************************************************************/ - -struct pci_dev; -struct iSeries_Device_Node; - -extern struct iSeries_Device_Node **iSeries_IoMmTable; -extern u8 *iSeries_IoBarTable; -extern unsigned long iSeries_Base_Io_Memory; -extern unsigned long iSeries_Max_Io_Memory; -extern unsigned long iSeries_Base_Io_Memory; -extern unsigned long iSeries_IoMmTable_Entry_Size; -/* - * iSeries_IoMmTable_Initialize - * - * - Initalizes the Address Translation Table and get it ready for use. - * Must be called before any client calls any of the other methods. - * - * Parameters: None. - * - * Return: None. - */ -extern void iSeries_IoMmTable_Initialize(void); -extern void iSeries_IoMmTable_Status(void); - -/* - * iSeries_allocateDeviceBars - * - * - Allocates ALL pci_dev BAR's and updates the resources with the BAR - * value. BARS with zero length will not have the resources. The - * HvCallPci_getBarParms is used to get the size of the BAR space. - * It calls iSeries_IoMmTable_AllocateEntry to allocate each entry. - * - * Parameters: - * pci_dev = Pointer to pci_dev structure that will be mapped to pseudo - * I/O Address. - * - * Return: - * The pci_dev I/O resources updated with pseudo I/O Addresses. - */ -extern void iSeries_allocateDeviceBars(struct pci_dev *); - -/* - * iSeries_xlateIoMmAddress - * - * - Translates an I/O Memory address to Device Node that has been the - * allocated the psuedo I/O Address. - * - * Parameters: - * IoAddress = I/O Memory Address. 
- * - * Return: - * An iSeries_Device_Node to the device mapped to the I/O address. The - * BarNumber and BarOffset are valid if the Device Node is returned. - */ -extern struct iSeries_Device_Node *iSeries_xlateIoMmAddress(void *IoAddress); - -#endif /* _ISERIES_IOMMTABLE_H */ diff -ruN 2.6.10-rc1-bk6-irq.1/arch/ppc64/kernel/iSeries_pci.c 2.6.10-rc1-bk6-cleanup.1/arch/ppc64/kernel/iSeries_pci.c --- 2.6.10-rc1-bk6-irq.1/arch/ppc64/kernel/iSeries_pci.c 2004-10-25 15:37:12.000000000 +1000 +++ 2.6.10-rc1-bk6-cleanup.1/arch/ppc64/kernel/iSeries_pci.c 2004-10-27 18:43:41.000000000 +1000 @@ -1,4 +1,3 @@ -#define PCIFR(...) /* * iSeries_pci.c * @@ -47,27 +46,19 @@ #include #include -#include "iSeries_IoMmTable.h" #include "pci.h" extern int panic_timeout; -extern unsigned long iSeries_Base_Io_Memory; - -extern struct iommu_table *tceTables[256]; extern unsigned long io_page_mask; -extern void iSeries_MmIoTest(void); - /* * Forward declares of prototypes. */ static struct iSeries_Device_Node *find_Device_Node(int bus, int devfn); -static void iSeries_Scan_PHBs_Slots(struct pci_controller *Phb); -static void iSeries_Scan_EADs_Bridge(HvBusNumber Bus, HvSubBusNumber SubBus, - int IdSel); -static int iSeries_Scan_Bridge_Slot(HvBusNumber Bus, - struct HvCallPci_BridgeInfo *Info); +static void scan_PHB_slots(struct pci_controller *Phb); +static void scan_EADS_bridge(HvBusNumber Bus, HvSubBusNumber SubBus, int IdSel); +static int scan_bridge_slot(HvBusNumber Bus, struct HvCallPci_BridgeInfo *Info); LIST_HEAD(iSeries_Global_Device_List); @@ -88,7 +79,116 @@ static struct pci_ops iSeries_pci_ops; /* - * Log Error infor in Flight Recorder to system Console. + * Table defines + * Each Entry size is 4 MB * 1024 Entries = 4GB I/O address space. 
+ */ +#define IOMM_TABLE_MAX_ENTRIES 1024 +#define IOMM_TABLE_ENTRY_SIZE 0x0000000000400000UL +#define BASE_IO_MEMORY 0xE000000000000000UL + +static unsigned long max_io_memory = 0xE000000000000000UL; +static long current_iomm_table_entry; + +/* + * Lookup Tables. + */ +static struct iSeries_Device_Node **iomm_table; +static u8 *iobar_table; + +/* + * Static and Global variables + */ +static char *pci_io_text = "iSeries PCI I/O"; +static spinlock_t iomm_table_lock = SPIN_LOCK_UNLOCKED; + +/* + * iomm_table_initialize + * + * Allocates and initalizes the Address Translation Table and Bar + * Tables to get them ready for use. Must be called before any + * I/O space is handed out to the device BARs. + */ +static void iomm_table_initialize(void) +{ + spin_lock(&iomm_table_lock); + iomm_table = kmalloc(sizeof(*iomm_table) * IOMM_TABLE_MAX_ENTRIES, + GFP_KERNEL); + iobar_table = kmalloc(sizeof(*iobar_table) * IOMM_TABLE_MAX_ENTRIES, + GFP_KERNEL); + spin_unlock(&iomm_table_lock); + if ((iomm_table == NULL) || (iobar_table == NULL)) + panic("PCI: I/O tables allocation failed.\n"); +} + +/* + * iomm_table_allocate_entry + * + * Adds pci_dev entry in address translation table + * + * - Allocates the number of entries required in table base on BAR + * size. + * - Allocates starting at BASE_IO_MEMORY and increases. + * - The size is round up to be a multiple of entry size. + * - CurrentIndex is incremented to keep track of the last entry. + * - Builds the resource entry for allocated BARs. + */ +static void iomm_table_allocate_entry(struct pci_dev *dev, int bar_num) +{ + struct resource *bar_res = &dev->resource[bar_num]; + long bar_size = pci_resource_len(dev, bar_num); + + /* + * No space to allocate, quick exit, skip Allocation. + */ + if (bar_size == 0) + return; + /* + * Set Resource values. 
+ */ + spin_lock(&iomm_table_lock); + bar_res->name = pci_io_text; + bar_res->start = + IOMM_TABLE_ENTRY_SIZE * current_iomm_table_entry; + bar_res->start += BASE_IO_MEMORY; + bar_res->end = bar_res->start + bar_size - 1; + /* + * Allocate the number of table entries needed for BAR. + */ + while (bar_size > 0 ) { + iomm_table[current_iomm_table_entry] = dev->sysdata; + iobar_table[current_iomm_table_entry] = bar_num; + bar_size -= IOMM_TABLE_ENTRY_SIZE; + ++current_iomm_table_entry; + } + max_io_memory = BASE_IO_MEMORY + + (IOMM_TABLE_ENTRY_SIZE * current_iomm_table_entry); + spin_unlock(&iomm_table_lock); +} + +/* + * allocate_device_bars + * + * - Allocates ALL pci_dev BAR's and updates the resources with the + * BAR value. BARS with zero length will have the resources + * The HvCallPci_getBarParms is used to get the size of the BAR + * space. It calls iomm_table_allocate_entry to allocate + * each entry. + * - Loops through The Bar resources(0 - 5) including the ROM + * is resource(6). + */ +static void allocate_device_bars(struct pci_dev *dev) +{ + struct resource *bar_res; + int bar_num; + + for (bar_num = 0; bar_num <= PCI_ROM_RESOURCE; ++bar_num) { + bar_res = &dev->resource[bar_num]; + iomm_table_allocate_entry(dev, bar_num); + } +} + +/* + * Log error information to system console. * Filter out the device not there errors. 
* PCI: EADs Connect Failed 0x18.58.10 Rc: 0x00xx * PCI: Read Vendor Failed 0x18.58.10 Rc: 0x00xx @@ -99,7 +199,6 @@ { if (HvRc == 0x0302) return; - printk(KERN_ERR "PCI: %s Failed: 0x%02X.%02X.%02X Rc: 0x%04X", Error_Text, Bus, SubBus, AgentId, HvRc); } @@ -133,8 +232,6 @@ node->DevFn = PCI_DEVFN(ISERIES_ENCODE_DEVICE(AgentId), Function); node->IoRetry = 0; iSeries_Get_Location_Code(node); - PCIFR("Device 0x%02X.%2X, Node:0x%p ", ISERIES_BUS(node), - ISERIES_DEVFUN(node), node); return node; } @@ -160,10 +257,8 @@ if (ret == 0) { printk("bus %d appears to exist\n", bus); phb = pci_alloc_pci_controller(phb_type_hypervisor); - if (phb == NULL) { - PCIFR("Allocate pci_controller failed."); + if (phb == NULL) return -1; - } phb->pci_mem_offset = phb->local_number = bus; phb->first_busno = bus; phb->last_busno = bus; @@ -171,10 +266,9 @@ PPCDBG(PPCDBG_BUSWALK, "PCI:Create iSeries pci_controller(%p), Bus: %04X\n", phb, bus); - PCIFR("Create iSeries PHB controller: %04X", bus); /* Find and connect the devices. */ - iSeries_Scan_PHBs_Slots(phb); + scan_PHB_slots(phb); } /* * Check for Unexpected Return code, a clue that something @@ -195,7 +289,7 @@ void iSeries_pcibios_init(void) { PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_init Entry.\n"); - iSeries_IoMmTable_Initialize(); + iomm_table_initialize(); find_and_init_phbs(); io_page_mask = -1; /* pci_assign_all_busses = 0; SFRXXX*/ @@ -231,7 +325,7 @@ PPCDBG(PPCDBG_BUSWALK, "pdev 0x%p <==> DevNode 0x%p\n", pdev, node); - iSeries_allocateDeviceBars(pdev); + allocate_device_bars(pdev); iSeries_Device_Information(pdev, Buffer, sizeof(Buffer)); printk("%d. %s\n", DeviceCount, Buffer); @@ -241,7 +335,6 @@ (unsigned long)pdev); pdev->irq = node->Irq; } - iSeries_IoMmTable_Status(); iSeries_activate_IRQs(); mf_displaySrc(0xC9000200); } @@ -260,7 +353,7 @@ /* * Loop through each node function to find usable EADs bridges. 
*/ -static void iSeries_Scan_PHBs_Slots(struct pci_controller *Phb) +static void scan_PHB_slots(struct pci_controller *Phb) { struct HvCallPci_DeviceInfo *DevInfo; HvBusNumber bus = Phb->local_number; /* System Bus */ @@ -283,7 +376,7 @@ sizeof(struct HvCallPci_DeviceInfo)); if (HvRc == 0) { if (DevInfo->deviceType == HvCallPci_NodeDevice) - iSeries_Scan_EADs_Bridge(bus, SubBus, IdSel); + scan_EADS_bridge(bus, SubBus, IdSel); else printk("PCI: Invalid System Configuration(0x%02X)" " for bus 0x%02x id 0x%02x.\n", @@ -295,7 +388,7 @@ kfree(DevInfo); } -static void iSeries_Scan_EADs_Bridge(HvBusNumber bus, HvSubBusNumber SubBus, +static void scan_EADS_bridge(HvBusNumber bus, HvSubBusNumber SubBus, int IdSel) { struct HvCallPci_BridgeInfo *BridgeInfo; @@ -340,7 +433,7 @@ if (BridgeInfo->busUnitInfo.deviceType == HvCallPci_BridgeDevice) { /* Scan_Bridge_Slot...: 0x18.00.12 */ - iSeries_Scan_Bridge_Slot(bus, BridgeInfo); + scan_bridge_slot(bus, BridgeInfo); } else printk("PCI: Invalid Bridge Configuration(0x%02X)", BridgeInfo->busUnitInfo.deviceType); @@ -355,7 +448,7 @@ /* * This assumes that the node slot is always on the primary bus! */ -static int iSeries_Scan_Bridge_Slot(HvBusNumber Bus, +static int scan_bridge_slot(HvBusNumber Bus, struct HvCallPci_BridgeInfo *BridgeInfo) { struct iSeries_Device_Node *node; @@ -593,12 +686,8 @@ return -1; /* Retry Try */ } /* If retry was in progress, log success and rest retry count */ - if (DevNode->IoRetry > 0) { - PCIFR("%s: Device 0x%04X:%02X Retry Successful(%2d).", - TextHdr, DevNode->DsaAddr.Dsa.busNumber, DevNode->DevFn, - DevNode->IoRetry); + if (DevNode->IoRetry > 0) DevNode->IoRetry = 0; - } return 0; } @@ -607,8 +696,9 @@ * Note: Make sure the passed variable end up on the stack to avoid * the exposure of being device global. 
*/ -static inline struct iSeries_Device_Node *xlateIoMmAddress(const volatile void __iomem *IoAddress, - u64 *dsaptr, u64 *BarOffsetPtr) +static inline struct iSeries_Device_Node *xlate_iomm_address( + const volatile void __iomem *IoAddress, + u64 *dsaptr, u64 *BarOffsetPtr) { unsigned long OrigIoAddr; unsigned long BaseIoAddr; @@ -616,17 +706,16 @@ struct iSeries_Device_Node *DevNode; OrigIoAddr = (unsigned long __force)IoAddress; - if ((OrigIoAddr < iSeries_Base_Io_Memory) || - (OrigIoAddr >= iSeries_Max_Io_Memory)) + if ((OrigIoAddr < BASE_IO_MEMORY) || (OrigIoAddr >= max_io_memory)) return NULL; - BaseIoAddr = OrigIoAddr - iSeries_Base_Io_Memory; - TableIndex = BaseIoAddr / iSeries_IoMmTable_Entry_Size; - DevNode = iSeries_IoMmTable[TableIndex]; + BaseIoAddr = OrigIoAddr - BASE_IO_MEMORY; + TableIndex = BaseIoAddr / IOMM_TABLE_ENTRY_SIZE; + DevNode = iomm_table[TableIndex]; if (DevNode != NULL) { - int barnum = iSeries_IoBarTable[TableIndex]; + int barnum = iobar_table[TableIndex]; *dsaptr = DevNode->DsaAddr.DsaAddr | (barnum << 24); - *BarOffsetPtr = BaseIoAddr % iSeries_IoMmTable_Entry_Size; + *BarOffsetPtr = BaseIoAddr % IOMM_TABLE_ENTRY_SIZE; } else panic("PCI: Invalid PCI IoAddress detected!\n"); return DevNode; @@ -647,7 +736,7 @@ u64 dsa; struct HvCallPci_LoadReturn ret; struct iSeries_Device_Node *DevNode = - xlateIoMmAddress(IoAddress, &dsa, &BarOffset); + xlate_iomm_address(IoAddress, &dsa, &BarOffset); if (DevNode == NULL) { static unsigned long last_jiffies; @@ -676,7 +765,7 @@ u64 dsa; struct HvCallPci_LoadReturn ret; struct iSeries_Device_Node *DevNode = - xlateIoMmAddress(IoAddress, &dsa, &BarOffset); + xlate_iomm_address(IoAddress, &dsa, &BarOffset); if (DevNode == NULL) { static unsigned long last_jiffies; @@ -706,7 +795,7 @@ u64 dsa; struct HvCallPci_LoadReturn ret; struct iSeries_Device_Node *DevNode = - xlateIoMmAddress(IoAddress, &dsa, &BarOffset); + xlate_iomm_address(IoAddress, &dsa, &BarOffset); if (DevNode == NULL) { static unsigned 
long last_jiffies; @@ -743,7 +832,7 @@ u64 dsa; u64 rc; struct iSeries_Device_Node *DevNode = - xlateIoMmAddress(IoAddress, &dsa, &BarOffset); + xlate_iomm_address(IoAddress, &dsa, &BarOffset); if (DevNode == NULL) { static unsigned long last_jiffies; @@ -770,7 +859,7 @@ u64 dsa; u64 rc; struct iSeries_Device_Node *DevNode = - xlateIoMmAddress(IoAddress, &dsa, &BarOffset); + xlate_iomm_address(IoAddress, &dsa, &BarOffset); if (DevNode == NULL) { static unsigned long last_jiffies; @@ -797,7 +886,7 @@ u64 dsa; u64 rc; struct iSeries_Device_Node *DevNode = - xlateIoMmAddress(IoAddress, &dsa, &BarOffset); + xlate_iomm_address(IoAddress, &dsa, &BarOffset); if (DevNode == NULL) { static unsigned long last_jiffies; From jschopp at austin.ibm.com Fri Oct 29 02:51:39 2004 From: jschopp at austin.ibm.com (Joel Schopp) Date: Thu, 28 Oct 2004 11:51:39 -0500 Subject: [PPC64] Rework ppc64 hugepage code In-Reply-To: <20041028060151.GA1680@zax> References: <20041028060151.GA1680@zax> Message-ID: <4181239B.5020307@austin.ibm.com> > Andrew, please apply: > > Rework the ppc64 hugepage code. Instead of using specially marked pmd > entries in the normal pagetables to represent hugepages, use normal > pte_t entries, in a special set of pagetables used for hugepages only. > > Using pte_t instead of a special hugepte_t makes the code more similar > to that for other architectures, allowing more possibilities for > consolidating the hugepage code. > > Using independent pagetables for the hugepages is also a prerequisite > for moving the hugepages into their own region well outside the normal > user address space. 
The restrictions imposed by the powerpc mmu's > segment design mean we probably want to do that in the fairly near > future. > Besides making the code more like other architectures and being a prerequisite for moving hugepages into their own region this patch has another use. It is on the list of prerequisites for memory hotplug remove on ppc64, because it unifies the method for flushing hardware page table entries of both large and normal sized pages. When David originally wrote this patch I tested it on some Power4 & Power5 hardware and it worked flawlessly for me. > Signed-off-by: David Gibson Acked-by: Joel Schopp From olof at austin.ibm.com Fri Oct 29 04:44:47 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Thu, 28 Oct 2004 13:44:47 -0500 Subject: [PATCH] [PPC64] Setup cpu_sibling_map on iSeries Message-ID: <20041028184447.GA30644@4> Hi, Nathan Lynch pointed this out: The CPU sibling map is never initialized on iSeries. This makes the scheduler very unhappy if CONFIG_SCHED_SMT is enabled, causing an oops in find_busiest_group during boot. Below patch adds the expected init. Please apply. 
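For readers following along, here is a minimal user-space sketch of the invariant this fix restores: every usable CPU should appear in its own sibling map. It is an illustration only, not kernel code. The 64-bit masks, the NR_CPUS value of 8, and the dyn_proc_status[] table are stand-ins invented for this sketch; the kernel itself uses cpumask_t, cpu_set() and paca[i].lppaca.xDynProcStatus.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8	/* assumption: small toy system */

/* Toy replacements for the kernel's cpumask_t maps: one bit per CPU. */
static uint64_t cpu_possible_map;
static uint64_t cpu_sibling_map[NR_CPUS];

/* Hypothetical status table; values below 2 count as usable, mirroring
 * the xDynProcStatus < 2 test in the patch. */
static const int dyn_proc_status[NR_CPUS] = { 0, 1, 2, 0, 3, 1, 2, 0 };

/* Mark usable CPUs and put each one into its own sibling map.
 * Returns the number of usable CPUs found. */
static int init_cpu_maps(void)
{
	int i, np = 0;

	for (i = 0; i < NR_CPUS; i++) {
		if (dyn_proc_status[i] < 2) {
			cpu_possible_map |= 1ULL << i;
			cpu_sibling_map[i] |= 1ULL << i;	/* the line the fix adds */
			++np;
		}
	}
	return np;
}

/* Check the invariant: a possible CPU is always one of its own siblings. */
static void verify_sibling_maps(void)
{
	int i;

	for (i = 0; i < NR_CPUS; i++)
		if (cpu_possible_map & (1ULL << i))
			assert(cpu_sibling_map[i] & (1ULL << i));
}
```

Calling init_cpu_maps() and then verify_sibling_maps() passes as written. Dropping the cpu_sibling_map line makes the assert fire, which is a rough analogue of code that expects a non-empty sibling mask being handed an empty one.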
Signed-off-by: Olof Johansson --- linux-2.5-olof/arch/ppc64/kernel/iSeries_smp.c | 1 + 1 files changed, 1 insertion(+) diff -puN arch/ppc64/kernel/iSeries_smp.c~iseries-sibling-map arch/ppc64/kernel/iSeries_smp.c --- linux-2.5/arch/ppc64/kernel/iSeries_smp.c~iseries-sibling-map 2004-10-28 13:24:03.063642740 -0500 +++ linux-2.5-olof/arch/ppc64/kernel/iSeries_smp.c 2004-10-28 13:28:06.592330464 -0500 @@ -94,6 +94,7 @@ static int smp_iSeries_numProcs(void) if (paca[i].lppaca.xDynProcStatus < 2) { cpu_set(i, cpu_possible_map); cpu_set(i, cpu_present_map); + cpu_set(i, cpu_sibling_map[i]); ++np; } } _ From johnrose at austin.ibm.com Fri Oct 29 05:27:45 2004 From: johnrose at austin.ibm.com (John Rose) Date: Thu, 28 Oct 2004 14:27:45 -0500 Subject: [PATCH 1/1] rtas_flash_4gig In-Reply-To: <16768.28322.583827.9327@cargo.ozlabs.ibm.com> References: <200410041942.i94Jg4WA154540@westrelay04.boulder.ibm.com> <16758.55568.809557.670513@cargo.ozlabs.ibm.com> <20041020170817.0ee49b64@localhost> <16768.28322.583827.9327@cargo.ozlabs.ibm.com> Message-ID: <1098991665.692.17.camel@sinatra.austin.ibm.com> On Wed, 2004-10-27 at 22:59, Paul Mackerras wrote: > Since this is happening at reboot time, I suggest we copy the block > list into rtas_rmo_buf. It's less complex to use rtas_rmo_buf exclusively for userspace. I'm against introducing kernel use of rtas_rmo_buf, even at reboot time. If we did, it would be proper to add a kernel lock to synchronize access to it, but then userspace apps have no way to take that lock when mmap()'ing /dev/mem. This seems like overkill for a situation we've never actually encountered, imho. 
John From johnrose at austin.ibm.com Fri Oct 29 07:28:36 2004 From: johnrose at austin.ibm.com (John Rose) Date: Thu, 28 Oct 2004 16:28:36 -0500 Subject: [PATCH] iommu fixes, round 3 In-Reply-To: <16768.10849.741580.850491@cargo.ozlabs.ibm.com> References: <1098775712.6897.17.camel@gaston> <1098808895.32293.23.camel@sinatra.austin.ibm.com> <1098813781.32293.40.camel@sinatra.austin.ibm.com> <16768.10849.741580.850491@cargo.ozlabs.ibm.com> Message-ID: <1098998916.692.20.camel@sinatra.austin.ibm.com> This patch changes the following iommu-related things: - Renames the [i,p]series versions of iommu_devnode_init(), to keep things logically separate where possible. - Moves iommu_free_table() to generic iommu.c - Creates of_cleanup_node(), which will directly precede the dynamic removal of any device node Comments welcome. Thanks- John Signed-off-by: John Rose diff -puN arch/ppc64/kernel/iSeries_iommu.c~iommu_free_table_fix4 arch/ppc64/kernel/iSeries_iommu.c --- 2_6_ketchup/arch/ppc64/kernel/iSeries_iommu.c~iommu_free_table_fix4 2004-10-28 16:16:13.000000000 -0500 +++ 2_6_ketchup-johnrose/arch/ppc64/kernel/iSeries_iommu.c 2004-10-28 16:16:13.000000000 -0500 @@ -171,7 +171,7 @@ static void iommu_table_getparms(struct } -void iommu_devnode_init(struct iSeries_Device_Node *dn) { +void iommu_devnode_init_iSeries(struct iSeries_Device_Node *dn) { struct iommu_table *tbl; tbl = (struct iommu_table *)kmalloc(sizeof(struct iommu_table), GFP_KERNEL); diff -puN arch/ppc64/kernel/iSeries_pci.c~iommu_free_table_fix4 arch/ppc64/kernel/iSeries_pci.c --- 2_6_ketchup/arch/ppc64/kernel/iSeries_pci.c~iommu_free_table_fix4 2004-10-28 16:16:13.000000000 -0500 +++ 2_6_ketchup-johnrose/arch/ppc64/kernel/iSeries_pci.c 2004-10-28 16:16:13.000000000 -0500 @@ -235,7 +235,7 @@ void __init iSeries_pci_final_fixup(void iSeries_Device_Information(pdev, Buffer, sizeof(Buffer)); printk("%d. 
%s\n", DeviceCount, Buffer); - iommu_devnode_init(node); + iommu_devnode_init_iSeries(node); } else printk("PCI: Device Tree not found for 0x%016lX\n", (unsigned long)pdev); diff -puN arch/ppc64/kernel/iommu.c~iommu_free_table_fix4 arch/ppc64/kernel/iommu.c --- 2_6_ketchup/arch/ppc64/kernel/iommu.c~iommu_free_table_fix4 2004-10-28 16:16:13.000000000 -0500 +++ 2_6_ketchup-johnrose/arch/ppc64/kernel/iommu.c 2004-10-28 16:16:13.000000000 -0500 @@ -425,6 +425,39 @@ struct iommu_table *iommu_init_table(str return tbl; } +void iommu_free_table(struct device_node *dn) +{ + struct iommu_table *tbl = dn->iommu_table; + unsigned long bitmap_sz, i; + unsigned int order; + + if (!tbl || !tbl->it_map) { + printk(KERN_ERR "%s: expected TCE map for %s\n", __FUNCTION__, + dn->full_name); + return; + } + + /* verify that table contains no entries */ + /* it_mapsize is in entries, and we're examining 64 at a time */ + for (i = 0; i < (tbl->it_mapsize/64); i++) { + if (tbl->it_map[i] != 0) { + printk(KERN_WARNING "%s: Unexpected TCEs for %s\n", + __FUNCTION__, dn->full_name); + break; + } + } + + /* calculate bitmap size in bytes */ + bitmap_sz = (tbl->it_mapsize + 7) / 8; + + /* free bitmap */ + order = get_order(bitmap_sz); + free_pages((unsigned long) tbl->it_map, order); + + /* free table */ + kfree(tbl); +} + /* Creates TCEs for a user provided buffer. The user buffer must be * contiguous real kernel storage (not vmalloc). The address of the buffer * passed here is the kernel (virtual) address of the buffer. 
The buffer diff -puN arch/ppc64/kernel/pSeries_iommu.c~iommu_free_table_fix4 arch/ppc64/kernel/pSeries_iommu.c --- 2_6_ketchup/arch/ppc64/kernel/pSeries_iommu.c~iommu_free_table_fix4 2004-10-28 16:16:13.000000000 -0500 +++ 2_6_ketchup-johnrose/arch/ppc64/kernel/pSeries_iommu.c 2004-10-28 16:16:13.000000000 -0500 @@ -276,7 +276,7 @@ static void iommu_buses_init(void) first_phb = 0; for (dn = first_dn; dn != NULL; dn = dn->sibling) - iommu_devnode_init(dn); + iommu_devnode_init_pSeries(dn); } } @@ -298,7 +298,7 @@ static void iommu_buses_init_lpar(struct * Do it now because iommu_table_setparms_lpar needs it. */ busdn->bussubno = bus->number; - iommu_devnode_init(busdn); + iommu_devnode_init_pSeries(busdn); } /* look for a window on a bridge even if the PHB had one */ @@ -397,7 +397,7 @@ static void iommu_table_setparms_lpar(st } -void iommu_devnode_init(struct device_node *dn) +void iommu_devnode_init_pSeries(struct device_node *dn) { struct iommu_table *tbl; @@ -412,39 +412,6 @@ void iommu_devnode_init(struct device_no dn->iommu_table = iommu_init_table(tbl); } -void iommu_free_table(struct device_node *dn) -{ - struct iommu_table *tbl = dn->iommu_table; - unsigned long bitmap_sz, i; - unsigned int order; - - if (!tbl || !tbl->it_map) { - printk(KERN_ERR "%s: expected TCE map for %s\n", __FUNCTION__, - dn->full_name); - return; - } - - /* verify that table contains no entries */ - /* it_mapsize is in entries, and we're examining 64 at a time */ - for (i = 0; i < (tbl->it_mapsize/64); i++) { - if (tbl->it_map[i] != 0) { - printk(KERN_WARNING "%s: Unexpected TCEs for %s\n", - __FUNCTION__, dn->full_name); - break; - } - } - - /* calculate bitmap size in bytes */ - bitmap_sz = (tbl->it_mapsize + 7) / 8; - - /* free bitmap */ - order = get_order(bitmap_sz); - free_pages((unsigned long) tbl->it_map, order); - - /* free table */ - kfree(tbl); -} - void iommu_setup_pSeries(void) { struct pci_dev *dev = NULL; @@ -469,7 +436,6 @@ void iommu_setup_pSeries(void) } } - /* 
These are called very early. */ void tce_init_pSeries(void) { diff -puN arch/ppc64/kernel/prom.c~iommu_free_table_fix4 arch/ppc64/kernel/prom.c --- 2_6_ketchup/arch/ppc64/kernel/prom.c~iommu_free_table_fix4 2004-10-28 16:16:13.000000000 -0500 +++ 2_6_ketchup-johnrose/arch/ppc64/kernel/prom.c 2004-10-28 16:17:06.000000000 -0500 @@ -1740,7 +1740,7 @@ static int of_finish_dynamic_node(struct if (strcmp(node->name, "pci") == 0 && get_property(node, "ibm,dma-window", NULL)) { node->bussubno = node->busno; - iommu_devnode_init(node); + iommu_devnode_init_pSeries(node); } else node->iommu_table = parent->iommu_table; #endif /* CONFIG_PPC_PSERIES */ @@ -1802,6 +1802,15 @@ int of_add_node(const char *path, struct } /* + * Prepare an OF node for removal from system + */ +static void of_cleanup_node(struct device_node *np) +{ + if (np->iommu_table && get_property(np, "ibm,dma-window", NULL)) + iommu_free_table(np); +} + +/* * Remove an OF device node from the system. * Caller should have already "gotten" np. */ @@ -1818,13 +1827,7 @@ int of_remove_node(struct device_node *n return -EBUSY; } - /* XXX This is a layering violation, should be moved to the caller - * --BenH. - */ -#ifdef CONFIG_PPC_PSERIES - if (np->iommu_table) - iommu_free_table(np); -#endif /* CONFIG_PPC_PSERIES */ + of_cleanup_node(np); write_lock(&devtree_lock); OF_MARK_STALE(np); diff -puN include/asm-ppc64/iommu.h~iommu_free_table_fix4 include/asm-ppc64/iommu.h --- 2_6_ketchup/include/asm-ppc64/iommu.h~iommu_free_table_fix4 2004-10-28 16:16:13.000000000 -0500 +++ 2_6_ketchup-johnrose/include/asm-ppc64/iommu.h 2004-10-28 16:16:13.000000000 -0500 @@ -110,22 +110,18 @@ struct scatterlist; extern void iommu_setup_pSeries(void); extern void iommu_setup_u3(void); -/* Creates table for an individual device node */ -/* XXX: This isn't generic, please name it accordingly or add - * some ppc_md. hooks for iommu implementations to do what they - * need to do. --BenH. 
- */ -extern void iommu_devnode_init(struct device_node *dn); - /* Frees table for an individual device node */ -/* XXX: This isn't generic, please name it accordingly or add - * some ppc_md. hooks for iommu implementations to do what they - * need to do. --BenH. - */ extern void iommu_free_table(struct device_node *dn); #endif /* CONFIG_PPC_MULTIPLATFORM */ +#ifdef CONFIG_PPC_PSERIES + +/* Creates table for an individual device node */ +extern void iommu_devnode_init_pSeries(struct device_node *dn); + +#endif /* CONFIG_PPC_PSERIES */ + #ifdef CONFIG_PPC_ISERIES /* Walks all buses and creates iommu tables */ @@ -136,7 +132,7 @@ extern void __init iommu_vio_init(void); struct iSeries_Device_Node; /* Creates table for an individual device node */ -extern void iommu_devnode_init(struct iSeries_Device_Node *dn); +extern void iommu_devnode_init_iSeries(struct iSeries_Device_Node *dn); #endif /* CONFIG_PPC_ISERIES */ _ From benh at kernel.crashing.org Fri Oct 29 09:28:41 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 29 Oct 2004 09:28:41 +1000 Subject: 2.6.10.-rc1 on ppc64 - returning from prom_init hang In-Reply-To: <200410272316.i9RNGTpj005995@falcon10.austin.ibm.com> References: <200410272316.i9RNGTpj005995@falcon10.austin.ibm.com> Message-ID: <1099006121.29690.81.camel@gaston> On Wed, 2004-10-27 at 18:16 -0500, Doug Maxey wrote: > Anyone have any thoughts on this? I have xmon=on, but it stops before it gets > there... for this iteration, added console=hvsi1, did not change anything. > > This libata-dev-2.6 on power5 system. If your system supports old-style "HVC console" HV calls to output things, then you can enable the early debug stuff for that in setup.c. If not, then you'll have to write an HVSI style early debug stuff... 
> Config file read, 1024 bytes > Welcome > Welcome to yaboot version 1.3.12 > Enter "help" to get some basic usage information > boot: > 2.6.10-rc1-ata-1 * linux > boot: 2.6.10-rc1-ata-1 > Please wait, loading kernel... > Elf64 kernel loaded... > Loading ramdisk... > ramdisk loaded at 02300000, size: 1306 Kbytes > OF stdout device is: /vdevice/vty@30000001 > Hypertas detected, assuming LPAR ! > command line: root=/dev/VolGroup00/LogVol00 ro rhgb quiet console=hvsi1 xmon=on > memory layout at init: > alloc_bottom : 0000000002447000 > alloc_top : 0000000008000000 > alloc_top_hi : 0000000075000000 > rmo_top : 0000000008000000 > ram_top : 0000000075000000 > Looking for displays > found display : /pci@800000020000002/pci@2,2/pci@1/display@0, opening ... done > instantiating rtas at 0x00000000077d9000... done > 0000000000000000 : boot cpu 0000000000000000 > 0000000000000002 : starting cpu hw idx 0000000000000002... done > copying OF device tree ... > Building dt strings... > Building dt structure... > Device tree strings 0x0000000002748000 -> 0x00000000027492f8 > Device tree struct 0x000000000274a000 -> 0x000000000275a000 > Calling quiesce ... > returning from prom_init > > ++doug > > > _______________________________________________ > Linuxppc64-dev mailing list > Linuxppc64-dev at ozlabs.org > https://ozlabs.org/cgi-bin/mailman/listinfo/linuxppc64-dev -- Benjamin Herrenschmidt From wli at holomorphy.com Fri Oct 29 13:48:17 2004 From: wli at holomorphy.com (William Lee Irwin III) Date: Thu, 28 Oct 2004 20:48:17 -0700 Subject: [RFC] Consolidate lots of hugepage code In-Reply-To: <20041029033708.GF12247@zax> References: <20041029033708.GF12247@zax> Message-ID: <20041029034817.GY12934@holomorphy.com> On Fri, Oct 29, 2004 at 01:37:08PM +1000, David Gibson wrote: > A lot of the code in arch/*/mm/hugetlbpage.c is quite similar. This > patch attempts to consolidate a lot of the code across the arch's, > putting the combined version in mm/hugetlb.c. 
There are a couple of > uglyish hacks in order to cover all the hugepage archs, but the result > is a very large reduction in the total amount of code. It also means > things like hugepage lazy allocation could be implemented in one > place, instead of six. > As yet this is entirely untested, except on ppc64. Comments? > Objections? Testing acks? > Notes: > - this patch changes the meaning of set_huge_pte() to be more > analogous to set_pte() > - does SH4 need special huge_ptep_get_and_clear()?? Further consolidation is premature given that outstanding hugetlb bugs have the implication that architectures' needs are not being served by the current arch/core split. I have at least two relatively major hugetlb bugs outstanding, the lack of a flush_dcache_page() analogue first, and another (soon to be reported to affected distros) less well-understood. Unless consolidation patches are directed toward restoring hugetlb to a sound state, they're counterproductive to merge before patches doing so. -- wli From david at gibson.dropbear.id.au Fri Oct 29 13:37:08 2004 From: david at gibson.dropbear.id.au (David Gibson) Date: Fri, 29 Oct 2004 13:37:08 +1000 Subject: [RFC] Consolidate lots of hugepage code Message-ID: <20041029033708.GF12247@zax> A lot of the code in arch/*/mm/hugetlbpage.c is quite similar. This patch attempts to consolidate a lot of the code across the arch's, putting the combined version in mm/hugetlb.c. There are a couple of uglyish hacks in order to cover all the hugepage archs, but the result is a very large reduction in the total amount of code. It also means things like hugepage lazy allocation could be implemented in one place, instead of six. As yet this is entirely untested, except on ppc64. Comments? Objections? Testing acks? Notes: - this patch changes the meaning of set_huge_pte() to be more analogous to set_pte() - does SH4 need special huge_ptep_get_and_clear()?? 
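As a rough illustration of the notes above, here is a minimal user-space sketch (not kernel code) of the split being aimed for: one helper builds the entry, set_huge_pte() only stores it, and huge_ptep_get_and_clear() hands back the old entry while clearing it. Every toy_* name, the flag bits and the four-slot hugepage are assumptions made up for this example; the slot replication mimics what the sh variants appear to do and is not taken from any real header.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: all names and values below are hypothetical stand-ins. */
#define TOY_PAGE_SHIFT     12
#define TOY_PAGE_SIZE      (1UL << TOY_PAGE_SHIFT)
#define TOY_HUGE_ORDER     2                       /* assumed: 4 small ptes per hugepage */
#define TOY_PTES_PER_HPAGE (1UL << TOY_HUGE_ORDER)

#define TOY_PTE_PRESENT 0x01UL
#define TOY_PTE_WRITE   0x02UL
#define TOY_PTE_DIRTY   0x04UL
#define TOY_PTE_YOUNG   0x08UL
#define TOY_PTE_HUGE    0x10UL

typedef struct { unsigned long val; } toy_pte_t;

int toy_pte_none(toy_pte_t pte)
{
	return pte.val == 0;
}

/* Policy: decide what the entry looks like (cf. make_huge_pte() in the patch). */
toy_pte_t toy_make_huge_pte(unsigned long phys, int writable)
{
	toy_pte_t entry;

	entry.val = (phys & ~(TOY_PAGE_SIZE - 1)) |
		    TOY_PTE_PRESENT | TOY_PTE_YOUNG | TOY_PTE_HUGE;
	if (writable)
		entry.val |= TOY_PTE_WRITE | TOY_PTE_DIRTY;
	return entry;
}

/*
 * Mechanism: store an already-built entry, like set_pte() does.
 * This toy arch backs one hugepage with several small ptes, so the
 * entry is replicated with the address advanced by one small page.
 */
void toy_set_huge_pte(toy_pte_t *ptep, toy_pte_t entry)
{
	unsigned long i;

	assert(ptep != NULL);
	for (i = 0; i < TOY_PTES_PER_HPAGE; i++) {
		ptep[i] = entry;
		entry.val += TOY_PAGE_SIZE;
	}
}

/* Return the first entry and clear every slot backing the hugepage. */
toy_pte_t toy_huge_ptep_get_and_clear(toy_pte_t *ptep)
{
	toy_pte_t entry;
	unsigned long i;

	assert(ptep != NULL);
	entry = ptep[0];
	for (i = 0; i < TOY_PTES_PER_HPAGE; i++)
		ptep[i].val = 0;
	return entry;
}
```

With this shape the generic code owns the rss accounting and entry construction, and an arch with one pte per hugepage could presumably reduce both helpers to plain set_pte()/ptep_get_and_clear() wrappers.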
Index: working-2.6/mm/hugetlb.c =================================================================== --- working-2.6.orig/mm/hugetlb.c 2004-09-07 10:38:00.000000000 +1000 +++ working-2.6/mm/hugetlb.c 2004-10-29 11:38:27.132145776 +1000 @@ -7,9 +7,13 @@ #include #include #include -#include #include #include +#include +#include +#include + +#include const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL; static unsigned long nr_huge_pages, free_huge_pages; @@ -248,6 +252,75 @@ .nopage = hugetlb_nopage, }; +pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr); +pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr); + +pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page) +{ + pte_t entry; + + if (vma->vm_flags & VM_WRITE) { + entry = + pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot))); + } else { + entry = pte_wrprotect(mk_pte(page, vma->vm_page_prot)); + } + entry = pte_mkyoung(entry); + entry = pte_mkhuge(entry); + + return entry; +} + +int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, + struct vm_area_struct *vma) +{ + pte_t *src_pte, *dst_pte, entry; + struct page *ptepage; + unsigned long addr = vma->vm_start; + unsigned long end = vma->vm_end; + + while (addr < end) { + dst_pte = huge_pte_alloc(dst, addr); + if (!dst_pte) + goto nomem; + src_pte = huge_pte_offset(src, addr); + BUG_ON(!src_pte || pte_none(*src_pte)); /* prefaulted */ + entry = *src_pte; + ptepage = pte_page(entry); + get_page(ptepage); + set_huge_pte(dst_pte, entry); + dst->rss += (HPAGE_SIZE / PAGE_SIZE); + addr += HPAGE_SIZE; + } + return 0; + +nomem: + return -ENOMEM; +} + +void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, + unsigned long end) +{ + struct mm_struct *mm = vma->vm_mm; + unsigned long address; + pte_t pte; + struct page *page; + + WARN_ON(!is_vm_hugetlb_page(vma)); + BUG_ON(start & ~HPAGE_MASK); + BUG_ON(end & ~HPAGE_MASK); + + for (address = start; address < end; address += 
HPAGE_SIZE) { + pte = huge_ptep_get_and_clear(huge_pte_offset(mm, address)); + if (pte_none(pte)) + continue; + page = pte_page(pte); + put_page(page); + } + mm->rss -= (end - start) >> PAGE_SHIFT; + flush_tlb_range(vma, start, end); +} + void zap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigned long length) { @@ -257,3 +330,106 @@ unmap_hugepage_range(vma, start, start + length); spin_unlock(&mm->page_table_lock); } + +int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma) +{ + struct mm_struct *mm = current->mm; + unsigned long addr; + int ret = 0; + + WARN_ON(!is_vm_hugetlb_page(vma)); + BUG_ON(vma->vm_start & ~HPAGE_MASK); + BUG_ON(vma->vm_end & ~HPAGE_MASK); + + spin_lock(&mm->page_table_lock); + for (addr = vma->vm_start; addr < vma->vm_end; addr += HPAGE_SIZE) { + unsigned long idx; + pte_t *pte = huge_pte_alloc(mm, addr); + struct page *page; + + if (!pte) { + ret = -ENOMEM; + goto out; + } + if (! pte_none(*pte)) + hugetlb_clean_stale_pgtable(pte); + + idx = ((addr - vma->vm_start) >> HPAGE_SHIFT) + + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); + page = find_get_page(mapping, idx); + if (!page) { + /* charge the fs quota first */ + if (hugetlb_get_quota(mapping)) { + ret = -ENOMEM; + goto out; + } + page = alloc_huge_page(); + if (!page) { + hugetlb_put_quota(mapping); + ret = -ENOMEM; + goto out; + } + ret = add_to_page_cache(page, mapping, idx, GFP_ATOMIC); + if (! 
ret) { + unlock_page(page); + } else { + hugetlb_put_quota(mapping); + free_huge_page(page); + goto out; + } + } + mm->rss += (HPAGE_SIZE / PAGE_SIZE); + set_huge_pte(pte, make_huge_pte(vma, page)); + } +out: + spin_unlock(&mm->page_table_lock); + return ret; +} + +int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, + struct page **pages, struct vm_area_struct **vmas, + unsigned long *position, int *length, int i) +{ + unsigned long vpfn, vaddr = *position; + int remainder = *length; + + BUG_ON(!is_vm_hugetlb_page(vma)); + + vpfn = vaddr/PAGE_SIZE; + while (vaddr < vma->vm_end && remainder) { + + if (pages) { + pte_t *pte; + struct page *page; + + /* Some archs (sparc64, sh*) have multiple + * pte_ts to each hugepage. We have to make + * sure we get the first, for the page + * indexing below to work. */ + pte = huge_pte_offset(mm, vaddr & HPAGE_MASK); + + /* hugetlb should be locked, and hence, prefaulted */ + WARN_ON(!pte || pte_none(*pte)); + + page = &pte_page(*pte)[vpfn % (HPAGE_SIZE/PAGE_SIZE)]; + + WARN_ON(!PageCompound(page)); + + get_page(page); + pages[i] = page; + } + + if (vmas) + vmas[i] = vma; + + vaddr += PAGE_SIZE; + ++vpfn; + --remainder; + ++i; + } + + *length = remainder; + *position = vaddr; + + return i; +} Index: working-2.6/arch/ppc64/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/ppc64/mm/hugetlbpage.c 2004-10-29 11:37:48.139082848 +1000 +++ working-2.6/arch/ppc64/mm/hugetlbpage.c 2004-10-29 11:38:27.133145624 +1000 @@ -122,7 +122,7 @@ return hugepte_offset(dir, addr); } -static pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) { pgd_t *pgd; @@ -135,7 +135,7 @@ return hugepte_offset(pgd, addr); } -static pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr) { pgd_t *pgd; @@ -148,24 +148,6 @@ return 
hugepte_alloc(mm, pgd, addr); } -static void set_huge_pte(struct mm_struct *mm, struct vm_area_struct *vma, - struct page *page, pte_t *ptep, int write_access) -{ - pte_t entry; - - mm->rss += (HPAGE_SIZE / PAGE_SIZE); - if (write_access) { - entry = - pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot))); - } else { - entry = pte_wrprotect(mk_pte(page, vma->vm_page_prot)); - } - entry = pte_mkyoung(entry); - entry = pte_mkhuge(entry); - - set_pte(ptep, entry); -} - /* * This function checks for proper alignment of input addr and len parameters. */ @@ -292,80 +274,6 @@ return -EINVAL; } -int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, - struct vm_area_struct *vma) -{ - pte_t *src_pte, *dst_pte, entry; - struct page *ptepage; - unsigned long addr = vma->vm_start; - unsigned long end = vma->vm_end; - int err = -ENOMEM; - - while (addr < end) { - dst_pte = huge_pte_alloc(dst, addr); - if (!dst_pte) - goto out; - - src_pte = huge_pte_offset(src, addr); - entry = *src_pte; - - ptepage = pte_page(entry); - get_page(ptepage); - dst->rss += (HPAGE_SIZE / PAGE_SIZE); - set_pte(dst_pte, entry); - - addr += HPAGE_SIZE; - } - - err = 0; - out: - return err; -} - -int -follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, - struct page **pages, struct vm_area_struct **vmas, - unsigned long *position, int *length, int i) -{ - unsigned long vpfn, vaddr = *position; - int remainder = *length; - - WARN_ON(!is_vm_hugetlb_page(vma)); - - vpfn = vaddr/PAGE_SIZE; - while (vaddr < vma->vm_end && remainder) { - if (pages) { - pte_t *pte; - struct page *page; - - pte = huge_pte_offset(mm, vaddr); - - /* hugetlb should be locked, and hence, prefaulted */ - WARN_ON(!pte || pte_none(*pte)); - - page = &pte_page(*pte)[vpfn % (HPAGE_SIZE/PAGE_SIZE)]; - - WARN_ON(!PageCompound(page)); - - get_page(page); - pages[i] = page; - } - - if (vmas) - vmas[i] = vma; - - vaddr += PAGE_SIZE; - ++vpfn; - --remainder; - ++i; - } - - *length = remainder; - 
*position = vaddr; - - return i; -} - struct page * follow_huge_addr(struct mm_struct *mm, unsigned long address, int write) { @@ -396,35 +304,6 @@ return NULL; } -void unmap_hugepage_range(struct vm_area_struct *vma, - unsigned long start, unsigned long end) -{ - struct mm_struct *mm = vma->vm_mm; - unsigned long addr; - pte_t *ptep; - struct page *page; - - WARN_ON(!is_vm_hugetlb_page(vma)); - BUG_ON((start % HPAGE_SIZE) != 0); - BUG_ON((end % HPAGE_SIZE) != 0); - - for (addr = start; addr < end; addr += HPAGE_SIZE) { - pte_t pte; - - ptep = huge_pte_offset(mm, addr); - if (!ptep || pte_none(*ptep)) - continue; - - pte = *ptep; - page = pte_page(pte); - pte_clear(ptep); - - put_page(page); - } - mm->rss -= (end - start) >> PAGE_SHIFT; - flush_tlb_pending(); -} - void hugetlb_free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *prev, unsigned long start, unsigned long end) { @@ -435,60 +314,6 @@ * destroy_context() to clean up the lot. */ } -int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma) -{ - struct mm_struct *mm = current->mm; - unsigned long addr; - int ret = 0; - - WARN_ON(!is_vm_hugetlb_page(vma)); - BUG_ON((vma->vm_start % HPAGE_SIZE) != 0); - BUG_ON((vma->vm_end % HPAGE_SIZE) != 0); - - spin_lock(&mm->page_table_lock); - for (addr = vma->vm_start; addr < vma->vm_end; addr += HPAGE_SIZE) { - unsigned long idx; - pte_t *pte = huge_pte_alloc(mm, addr); - struct page *page; - - if (!pte) { - ret = -ENOMEM; - goto out; - } - if (! pte_none(*pte)) - continue; - - idx = ((addr - vma->vm_start) >> HPAGE_SHIFT) - + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); - page = find_get_page(mapping, idx); - if (!page) { - /* charge the fs quota first */ - if (hugetlb_get_quota(mapping)) { - ret = -ENOMEM; - goto out; - } - page = alloc_huge_page(); - if (!page) { - hugetlb_put_quota(mapping); - ret = -ENOMEM; - goto out; - } - ret = add_to_page_cache(page, mapping, idx, GFP_ATOMIC); - if (! 
ret) { - unlock_page(page); - } else { - hugetlb_put_quota(mapping); - free_huge_page(page); - goto out; - } - } - set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE); - } -out: - spin_unlock(&mm->page_table_lock); - return ret; -} - /* Because we have an exclusive hugepage region which lies within the * normal user address space, we have to take special measures to make * non-huge mmap()s evade the hugepage reserved regions. */ Index: working-2.6/arch/ia64/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/ia64/mm/hugetlbpage.c 2004-08-09 09:51:26.000000000 +1000 +++ working-2.6/arch/ia64/mm/hugetlbpage.c 2004-10-29 11:38:27.134145472 +1000 @@ -24,7 +24,7 @@ unsigned int hpage_shift=HPAGE_SHIFT_DEFAULT; -static pte_t * +pte_t * huge_pte_alloc (struct mm_struct *mm, unsigned long addr) { unsigned long taddr = htlbpage_to_page(addr); @@ -39,7 +39,7 @@ return pte; } -static pte_t * +pte_t * huge_pte_offset (struct mm_struct *mm, unsigned long addr) { unsigned long taddr = htlbpage_to_page(addr); @@ -57,25 +57,6 @@ return pte; } -#define mk_pte_huge(entry) { pte_val(entry) |= _PAGE_P; } - -static void -set_huge_pte (struct mm_struct *mm, struct vm_area_struct *vma, - struct page *page, pte_t * page_table, int write_access) -{ - pte_t entry; - - mm->rss += (HPAGE_SIZE / PAGE_SIZE); - if (write_access) { - entry = - pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot))); - } else - entry = pte_wrprotect(mk_pte(page, vma->vm_page_prot)); - entry = pte_mkyoung(entry); - mk_pte_huge(entry); - set_pte(page_table, entry); - return; -} /* * This function checks for proper alignment of input addr and len parameters. 
*/ @@ -91,68 +72,6 @@ return 0; } -int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, - struct vm_area_struct *vma) -{ - pte_t *src_pte, *dst_pte, entry; - struct page *ptepage; - unsigned long addr = vma->vm_start; - unsigned long end = vma->vm_end; - - while (addr < end) { - dst_pte = huge_pte_alloc(dst, addr); - if (!dst_pte) - goto nomem; - src_pte = huge_pte_offset(src, addr); - entry = *src_pte; - ptepage = pte_page(entry); - get_page(ptepage); - set_pte(dst_pte, entry); - dst->rss += (HPAGE_SIZE / PAGE_SIZE); - addr += HPAGE_SIZE; - } - return 0; -nomem: - return -ENOMEM; -} - -int -follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, - struct page **pages, struct vm_area_struct **vmas, - unsigned long *st, int *length, int i) -{ - pte_t *ptep, pte; - unsigned long start = *st; - unsigned long pstart; - int len = *length; - struct page *page; - - do { - pstart = start & HPAGE_MASK; - ptep = huge_pte_offset(mm, start); - pte = *ptep; - -back1: - page = pte_page(pte); - if (pages) { - page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT); - get_page(page); - pages[i] = page; - } - if (vmas) - vmas[i] = vma; - i++; - len--; - start += PAGE_SIZE; - if (((start & HPAGE_MASK) == pstart) && len && - (start < vma->vm_end)) - goto back1; - } while (len && start < vma->vm_end); - *length = len; - *st = start; - return i; -} - struct page *follow_huge_addr(struct mm_struct *mm, unsigned long addr, int write) { struct page *page; @@ -231,81 +150,6 @@ } } -void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, unsigned long end) -{ - struct mm_struct *mm = vma->vm_mm; - unsigned long address; - pte_t *pte; - struct page *page; - - BUG_ON(start & (HPAGE_SIZE - 1)); - BUG_ON(end & (HPAGE_SIZE - 1)); - - for (address = start; address < end; address += HPAGE_SIZE) { - pte = huge_pte_offset(mm, address); - if (pte_none(*pte)) - continue; - page = pte_page(*pte); - put_page(page); - pte_clear(pte); - } - mm->rss -= (end - 
start) >> PAGE_SHIFT; - flush_tlb_range(vma, start, end); -} - -int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma) -{ - struct mm_struct *mm = current->mm; - unsigned long addr; - int ret = 0; - - BUG_ON(vma->vm_start & ~HPAGE_MASK); - BUG_ON(vma->vm_end & ~HPAGE_MASK); - - spin_lock(&mm->page_table_lock); - for (addr = vma->vm_start; addr < vma->vm_end; addr += HPAGE_SIZE) { - unsigned long idx; - pte_t *pte = huge_pte_alloc(mm, addr); - struct page *page; - - if (!pte) { - ret = -ENOMEM; - goto out; - } - if (!pte_none(*pte)) - continue; - - idx = ((addr - vma->vm_start) >> HPAGE_SHIFT) - + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); - page = find_get_page(mapping, idx); - if (!page) { - /* charge the fs quota first */ - if (hugetlb_get_quota(mapping)) { - ret = -ENOMEM; - goto out; - } - page = alloc_huge_page(); - if (!page) { - hugetlb_put_quota(mapping); - ret = -ENOMEM; - goto out; - } - ret = add_to_page_cache(page, mapping, idx, GFP_ATOMIC); - if (! 
ret) { - unlock_page(page); - } else { - hugetlb_put_quota(mapping); - page_cache_release(page); - goto out; - } - } - set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE); - } -out: - spin_unlock(&mm->page_table_lock); - return ret; -} - unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags) { Index: working-2.6/arch/i386/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/i386/mm/hugetlbpage.c 2004-10-27 10:43:46.000000000 +1000 +++ working-2.6/arch/i386/mm/hugetlbpage.c 2004-10-29 11:44:43.541035816 +1000 @@ -18,7 +18,7 @@ #include #include -static pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr) { pgd_t *pgd; pmd_t *pmd = NULL; @@ -28,7 +28,7 @@ return (pte_t *) pmd; } -static pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) { pgd_t *pgd; pmd_t *pmd = NULL; @@ -38,21 +38,6 @@ return (pte_t *) pmd; } -static void set_huge_pte(struct mm_struct *mm, struct vm_area_struct *vma, struct page *page, pte_t * page_table, int write_access) -{ - pte_t entry; - - mm->rss += (HPAGE_SIZE / PAGE_SIZE); - if (write_access) { - entry = - pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot))); - } else - entry = pte_wrprotect(mk_pte(page, vma->vm_page_prot)); - entry = pte_mkyoung(entry); - mk_pte_huge(entry); - set_pte(page_table, entry); -} - /* * This function checks for proper alignment of input addr and len parameters. 
*/ @@ -65,77 +50,6 @@ return 0; } -int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, - struct vm_area_struct *vma) -{ - pte_t *src_pte, *dst_pte, entry; - struct page *ptepage; - unsigned long addr = vma->vm_start; - unsigned long end = vma->vm_end; - - while (addr < end) { - dst_pte = huge_pte_alloc(dst, addr); - if (!dst_pte) - goto nomem; - src_pte = huge_pte_offset(src, addr); - entry = *src_pte; - ptepage = pte_page(entry); - get_page(ptepage); - set_pte(dst_pte, entry); - dst->rss += (HPAGE_SIZE / PAGE_SIZE); - addr += HPAGE_SIZE; - } - return 0; - -nomem: - return -ENOMEM; -} - -int -follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, - struct page **pages, struct vm_area_struct **vmas, - unsigned long *position, int *length, int i) -{ - unsigned long vpfn, vaddr = *position; - int remainder = *length; - - WARN_ON(!is_vm_hugetlb_page(vma)); - - vpfn = vaddr/PAGE_SIZE; - while (vaddr < vma->vm_end && remainder) { - - if (pages) { - pte_t *pte; - struct page *page; - - pte = huge_pte_offset(mm, vaddr); - - /* hugetlb should be locked, and hence, prefaulted */ - WARN_ON(!pte || pte_none(*pte)); - - page = &pte_page(*pte)[vpfn % (HPAGE_SIZE/PAGE_SIZE)]; - - WARN_ON(!PageCompound(page)); - - get_page(page); - pages[i] = page; - } - - if (vmas) - vmas[i] = vma; - - vaddr += PAGE_SIZE; - ++vpfn; - --remainder; - ++i; - } - - *length = remainder; - *position = vaddr; - - return i; -} - #if 0 /* This is just for testing */ struct page * follow_huge_addr(struct mm_struct *mm, unsigned long address, int write) @@ -200,87 +114,15 @@ } #endif -void unmap_hugepage_range(struct vm_area_struct *vma, - unsigned long start, unsigned long end) +void hugetlb_clean_stale_pgtable(pte_t *pte) { - struct mm_struct *mm = vma->vm_mm; - unsigned long address; - pte_t pte; + pmd_t *pmd = (pmd_t *) pte; struct page *page; - BUG_ON(start & (HPAGE_SIZE - 1)); - BUG_ON(end & (HPAGE_SIZE - 1)); - - for (address = start; address < end; address += 
HPAGE_SIZE) { - pte = ptep_get_and_clear(huge_pte_offset(mm, address)); - if (pte_none(pte)) - continue; - page = pte_page(pte); - put_page(page); - } - mm->rss -= (end - start) >> PAGE_SHIFT; - flush_tlb_range(vma, start, end); -} - -int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma) -{ - struct mm_struct *mm = current->mm; - unsigned long addr; - int ret = 0; - - BUG_ON(vma->vm_start & ~HPAGE_MASK); - BUG_ON(vma->vm_end & ~HPAGE_MASK); - - spin_lock(&mm->page_table_lock); - for (addr = vma->vm_start; addr < vma->vm_end; addr += HPAGE_SIZE) { - unsigned long idx; - pte_t *pte = huge_pte_alloc(mm, addr); - struct page *page; - - if (!pte) { - ret = -ENOMEM; - goto out; - } - - if (!pte_none(*pte)) { - pmd_t *pmd = (pmd_t *) pte; - - page = pmd_page(*pmd); - pmd_clear(pmd); - mm->nr_ptes--; - dec_page_state(nr_page_table_pages); - page_cache_release(page); - } - - idx = ((addr - vma->vm_start) >> HPAGE_SHIFT) - + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); - page = find_get_page(mapping, idx); - if (!page) { - /* charge the fs quota first */ - if (hugetlb_get_quota(mapping)) { - ret = -ENOMEM; - goto out; - } - page = alloc_huge_page(); - if (!page) { - hugetlb_put_quota(mapping); - ret = -ENOMEM; - goto out; - } - ret = add_to_page_cache(page, mapping, idx, GFP_ATOMIC); - if (! 
ret) { - unlock_page(page); - } else { - hugetlb_put_quota(mapping); - free_huge_page(page); - goto out; - } - } - set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE); - } -out: - spin_unlock(&mm->page_table_lock); - return ret; + page = pmd_page(*pmd); + pmd_clear(pmd); + dec_page_state(nr_page_table_pages); + page_cache_release(page); } /* x86_64 also uses this file */ Index: working-2.6/arch/sh64/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/sh64/mm/hugetlbpage.c 2004-08-09 09:51:41.000000000 +1000 +++ working-2.6/arch/sh64/mm/hugetlbpage.c 2004-10-29 11:38:27.137145016 +1000 @@ -24,7 +24,7 @@ #include #include -static pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr) { pgd_t *pgd; pmd_t *pmd; @@ -39,7 +39,7 @@ return pte; } -static pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) { pgd_t *pgd; pmd_t *pmd; @@ -54,23 +54,9 @@ return pte; } -#define mk_pte_huge(entry) do { pte_val(entry) |= _PAGE_SZHUGE; } while (0) - -static void set_huge_pte(struct mm_struct *mm, struct vm_area_struct *vma, - struct page *page, pte_t * page_table, int write_access) +void set_huge_pte(pte_t *page_table, pte_t entry) { unsigned long i; - pte_t entry; - - mm->rss += (HPAGE_SIZE / PAGE_SIZE); - - if (write_access) - entry = pte_mkwrite(pte_mkdirty(mk_pte(page, - vma->vm_page_prot))); - else - entry = pte_wrprotect(mk_pte(page, vma->vm_page_prot)); - entry = pte_mkyoung(entry); - mk_pte_huge(entry); for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { set_pte(page_table, entry); @@ -80,6 +66,20 @@ } } +pte_t huge_ptep_get_and_clear(pte_t *ptep) +{ + pte_t entry = *ptep; + unsigned long i; + + /* clear every pte backing this hugepage */ + for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { + pte_clear(ptep); + ptep++; + } + + return entry; +} + /* * This function checks for proper alignment of input addr and len 
parameters. */ @@ -92,79 +92,6 @@ return 0; } -int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, - struct vm_area_struct *vma) -{ - pte_t *src_pte, *dst_pte, entry; - struct page *ptepage; - unsigned long addr = vma->vm_start; - unsigned long end = vma->vm_end; - int i; - - while (addr < end) { - dst_pte = huge_pte_alloc(dst, addr); - if (!dst_pte) - goto nomem; - src_pte = huge_pte_offset(src, addr); - BUG_ON(!src_pte || pte_none(*src_pte)); - entry = *src_pte; - ptepage = pte_page(entry); - get_page(ptepage); - for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { - set_pte(dst_pte, entry); - pte_val(entry) += PAGE_SIZE; - dst_pte++; - } - dst->rss += (HPAGE_SIZE / PAGE_SIZE); - addr += HPAGE_SIZE; - } - return 0; - -nomem: - return -ENOMEM; -} - -int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, - struct page **pages, struct vm_area_struct **vmas, - unsigned long *position, int *length, int i) -{ - unsigned long vaddr = *position; - int remainder = *length; - - WARN_ON(!is_vm_hugetlb_page(vma)); - - while (vaddr < vma->vm_end && remainder) { - if (pages) { - pte_t *pte; - struct page *page; - - pte = huge_pte_offset(mm, vaddr); - - /* hugetlb should be locked, and hence, prefaulted */ - BUG_ON(!pte || pte_none(*pte)); - - page = pte_page(*pte); - - WARN_ON(!PageCompound(page)); - - get_page(page); - pages[i] = page; - } - - if (vmas) - vmas[i] = vma; - - vaddr += PAGE_SIZE; - --remainder; - ++i; - } - - *length = remainder; - *position = vaddr; - - return i; -} - struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address, int write) { @@ -181,84 +108,3 @@ { return NULL; } - -void unmap_hugepage_range(struct vm_area_struct *vma, - unsigned long start, unsigned long end) -{ - struct mm_struct *mm = vma->vm_mm; - unsigned long address; - pte_t *pte; - struct page *page; - int i; - - BUG_ON(start & (HPAGE_SIZE - 1)); - BUG_ON(end & (HPAGE_SIZE - 1)); - - for (address = start; address < end; address += 
HPAGE_SIZE) { - pte = huge_pte_offset(mm, address); - BUG_ON(!pte); - if (pte_none(*pte)) - continue; - page = pte_page(*pte); - put_page(page); - for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { - pte_clear(pte); - pte++; - } - } - mm->rss -= (end - start) >> PAGE_SHIFT; - flush_tlb_range(vma, start, end); -} - -int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma) -{ - struct mm_struct *mm = current->mm; - unsigned long addr; - int ret = 0; - - BUG_ON(vma->vm_start & ~HPAGE_MASK); - BUG_ON(vma->vm_end & ~HPAGE_MASK); - - spin_lock(&mm->page_table_lock); - for (addr = vma->vm_start; addr < vma->vm_end; addr += HPAGE_SIZE) { - unsigned long idx; - pte_t *pte = huge_pte_alloc(mm, addr); - struct page *page; - - if (!pte) { - ret = -ENOMEM; - goto out; - } - if (!pte_none(*pte)) - continue; - - idx = ((addr - vma->vm_start) >> HPAGE_SHIFT) - + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); - page = find_get_page(mapping, idx); - if (!page) { - /* charge the fs quota first */ - if (hugetlb_get_quota(mapping)) { - ret = -ENOMEM; - goto out; - } - page = alloc_huge_page(); - if (!page) { - hugetlb_put_quota(mapping); - ret = -ENOMEM; - goto out; - } - ret = add_to_page_cache(page, mapping, idx, GFP_ATOMIC); - if (! 
ret) { - unlock_page(page); - } else { - hugetlb_put_quota(mapping); - free_huge_page(page); - goto out; - } - } - set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE); - } -out: - spin_unlock(&mm->page_table_lock); - return ret; -} Index: working-2.6/arch/sh/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/sh/mm/hugetlbpage.c 2004-08-09 09:51:40.000000000 +1000 +++ working-2.6/arch/sh/mm/hugetlbpage.c 2004-10-29 11:38:27.138144864 +1000 @@ -24,7 +24,7 @@ #include #include -static pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr) { pgd_t *pgd; pmd_t *pmd; @@ -39,7 +39,7 @@ return pte; } -static pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) { pgd_t *pgd; pmd_t *pmd; @@ -54,23 +54,9 @@ return pte; } -#define mk_pte_huge(entry) do { pte_val(entry) |= _PAGE_SZHUGE; } while (0) - -static void set_huge_pte(struct mm_struct *mm, struct vm_area_struct *vma, - struct page *page, pte_t * page_table, int write_access) +void set_huge_pte(pte_t *page_table, pte_t entry) { unsigned long i; - pte_t entry; - - mm->rss += (HPAGE_SIZE / PAGE_SIZE); - - if (write_access) - entry = pte_mkwrite(pte_mkdirty(mk_pte(page, - vma->vm_page_prot))); - else - entry = pte_wrprotect(mk_pte(page, vma->vm_page_prot)); - entry = pte_mkyoung(entry); - mk_pte_huge(entry); for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { set_pte(page_table, entry); @@ -80,6 +66,20 @@ } } +pte_t huge_ptep_get_and_clear(pte_t *ptep) +{ + pte_t entry; + + entry = *ptep; + + for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { + pte_clear(pte); + pte++; + } + + return entry; +} + /* * This function checks for proper alignment of input addr and len parameters. 
*/ @@ -92,79 +92,6 @@ return 0; } -int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, - struct vm_area_struct *vma) -{ - pte_t *src_pte, *dst_pte, entry; - struct page *ptepage; - unsigned long addr = vma->vm_start; - unsigned long end = vma->vm_end; - int i; - - while (addr < end) { - dst_pte = huge_pte_alloc(dst, addr); - if (!dst_pte) - goto nomem; - src_pte = huge_pte_offset(src, addr); - BUG_ON(!src_pte || pte_none(*src_pte)); - entry = *src_pte; - ptepage = pte_page(entry); - get_page(ptepage); - for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { - set_pte(dst_pte, entry); - pte_val(entry) += PAGE_SIZE; - dst_pte++; - } - dst->rss += (HPAGE_SIZE / PAGE_SIZE); - addr += HPAGE_SIZE; - } - return 0; - -nomem: - return -ENOMEM; -} - -int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, - struct page **pages, struct vm_area_struct **vmas, - unsigned long *position, int *length, int i) -{ - unsigned long vaddr = *position; - int remainder = *length; - - WARN_ON(!is_vm_hugetlb_page(vma)); - - while (vaddr < vma->vm_end && remainder) { - if (pages) { - pte_t *pte; - struct page *page; - - pte = huge_pte_offset(mm, vaddr); - - /* hugetlb should be locked, and hence, prefaulted */ - BUG_ON(!pte || pte_none(*pte)); - - page = pte_page(*pte); - - WARN_ON(!PageCompound(page)); - - get_page(page); - pages[i] = page; - } - - if (vmas) - vmas[i] = vma; - - vaddr += PAGE_SIZE; - --remainder; - ++i; - } - - *length = remainder; - *position = vaddr; - - return i; -} - struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address, int write) { @@ -181,84 +108,3 @@ { return NULL; } - -void unmap_hugepage_range(struct vm_area_struct *vma, - unsigned long start, unsigned long end) -{ - struct mm_struct *mm = vma->vm_mm; - unsigned long address; - pte_t *pte; - struct page *page; - int i; - - BUG_ON(start & (HPAGE_SIZE - 1)); - BUG_ON(end & (HPAGE_SIZE - 1)); - - for (address = start; address < end; address += HPAGE_SIZE) { - 
pte = huge_pte_offset(mm, address); - BUG_ON(!pte); - if (pte_none(*pte)) - continue; - page = pte_page(*pte); - put_page(page); - for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { - pte_clear(pte); - pte++; - } - } - mm->rss -= (end - start) >> PAGE_SHIFT; - flush_tlb_range(vma, start, end); -} - -int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma) -{ - struct mm_struct *mm = current->mm; - unsigned long addr; - int ret = 0; - - BUG_ON(vma->vm_start & ~HPAGE_MASK); - BUG_ON(vma->vm_end & ~HPAGE_MASK); - - spin_lock(&mm->page_table_lock); - for (addr = vma->vm_start; addr < vma->vm_end; addr += HPAGE_SIZE) { - unsigned long idx; - pte_t *pte = huge_pte_alloc(mm, addr); - struct page *page; - - if (!pte) { - ret = -ENOMEM; - goto out; - } - if (!pte_none(*pte)) - continue; - - idx = ((addr - vma->vm_start) >> HPAGE_SHIFT) - + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); - page = find_get_page(mapping, idx); - if (!page) { - /* charge the fs quota first */ - if (hugetlb_get_quota(mapping)) { - ret = -ENOMEM; - goto out; - } - page = alloc_huge_page(); - if (!page) { - hugetlb_put_quota(mapping); - ret = -ENOMEM; - goto out; - } - ret = add_to_page_cache(page, mapping, idx, GFP_ATOMIC); - if (! 
ret) { - unlock_page(page); - } else { - hugetlb_put_quota(mapping); - free_huge_page(page); - goto out; - } - } - set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE); - } -out: - spin_unlock(&mm->page_table_lock); - return ret; -} Index: working-2.6/arch/sparc64/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/sparc64/mm/hugetlbpage.c 2004-08-09 09:51:42.000000000 +1000 +++ working-2.6/arch/sparc64/mm/hugetlbpage.c 2004-10-29 11:38:27.138144864 +1000 @@ -21,7 +21,7 @@ #include #include -static pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr) { pgd_t *pgd; pmd_t *pmd; @@ -36,7 +36,7 @@ return pte; } -static pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) { pgd_t *pgd; pmd_t *pmd; @@ -51,23 +51,9 @@ return pte; } -#define mk_pte_huge(entry) do { pte_val(entry) |= _PAGE_SZHUGE; } while (0) - -static void set_huge_pte(struct mm_struct *mm, struct vm_area_struct *vma, - struct page *page, pte_t * page_table, int write_access) +void set_huge_pte(pte_t *page_table, pte_t entry) { unsigned long i; - pte_t entry; - - mm->rss += (HPAGE_SIZE / PAGE_SIZE); - - if (write_access) - entry = pte_mkwrite(pte_mkdirty(mk_pte(page, - vma->vm_page_prot))); - else - entry = pte_wrprotect(mk_pte(page, vma->vm_page_prot)); - entry = pte_mkyoung(entry); - mk_pte_huge(entry); for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { set_pte(page_table, entry); @@ -77,6 +63,20 @@ } } +pte_t huge_ptep_get_and_clear(pte_t *ptep) +{ + pte_t entry; + + entry = *ptep; + + for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { + pte_clear(pte); + pte++; + } + + return entry; +} + /* * This function checks for proper alignment of input addr and len parameters. 
*/ @@ -89,79 +89,6 @@ return 0; } -int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, - struct vm_area_struct *vma) -{ - pte_t *src_pte, *dst_pte, entry; - struct page *ptepage; - unsigned long addr = vma->vm_start; - unsigned long end = vma->vm_end; - int i; - - while (addr < end) { - dst_pte = huge_pte_alloc(dst, addr); - if (!dst_pte) - goto nomem; - src_pte = huge_pte_offset(src, addr); - BUG_ON(!src_pte || pte_none(*src_pte)); - entry = *src_pte; - ptepage = pte_page(entry); - get_page(ptepage); - for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { - set_pte(dst_pte, entry); - pte_val(entry) += PAGE_SIZE; - dst_pte++; - } - dst->rss += (HPAGE_SIZE / PAGE_SIZE); - addr += HPAGE_SIZE; - } - return 0; - -nomem: - return -ENOMEM; -} - -int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, - struct page **pages, struct vm_area_struct **vmas, - unsigned long *position, int *length, int i) -{ - unsigned long vaddr = *position; - int remainder = *length; - - WARN_ON(!is_vm_hugetlb_page(vma)); - - while (vaddr < vma->vm_end && remainder) { - if (pages) { - pte_t *pte; - struct page *page; - - pte = huge_pte_offset(mm, vaddr); - - /* hugetlb should be locked, and hence, prefaulted */ - BUG_ON(!pte || pte_none(*pte)); - - page = pte_page(*pte); - - WARN_ON(!PageCompound(page)); - - get_page(page); - pages[i] = page; - } - - if (vmas) - vmas[i] = vma; - - vaddr += PAGE_SIZE; - --remainder; - ++i; - } - - *length = remainder; - *position = vaddr; - - return i; -} - struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address, int write) { @@ -178,84 +105,3 @@ { return NULL; } - -void unmap_hugepage_range(struct vm_area_struct *vma, - unsigned long start, unsigned long end) -{ - struct mm_struct *mm = vma->vm_mm; - unsigned long address; - pte_t *pte; - struct page *page; - int i; - - BUG_ON(start & (HPAGE_SIZE - 1)); - BUG_ON(end & (HPAGE_SIZE - 1)); - - for (address = start; address < end; address += HPAGE_SIZE) { - 
pte = huge_pte_offset(mm, address); - BUG_ON(!pte); - if (pte_none(*pte)) - continue; - page = pte_page(*pte); - put_page(page); - for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { - pte_clear(pte); - pte++; - } - } - mm->rss -= (end - start) >> PAGE_SHIFT; - flush_tlb_range(vma, start, end); -} - -int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma) -{ - struct mm_struct *mm = current->mm; - unsigned long addr; - int ret = 0; - - BUG_ON(vma->vm_start & ~HPAGE_MASK); - BUG_ON(vma->vm_end & ~HPAGE_MASK); - - spin_lock(&mm->page_table_lock); - for (addr = vma->vm_start; addr < vma->vm_end; addr += HPAGE_SIZE) { - unsigned long idx; - pte_t *pte = huge_pte_alloc(mm, addr); - struct page *page; - - if (!pte) { - ret = -ENOMEM; - goto out; - } - if (!pte_none(*pte)) - continue; - - idx = ((addr - vma->vm_start) >> HPAGE_SHIFT) - + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); - page = find_get_page(mapping, idx); - if (!page) { - /* charge the fs quota first */ - if (hugetlb_get_quota(mapping)) { - ret = -ENOMEM; - goto out; - } - page = alloc_huge_page(); - if (!page) { - hugetlb_put_quota(mapping); - ret = -ENOMEM; - goto out; - } - ret = add_to_page_cache(page, mapping, idx, GFP_ATOMIC); - if (! 
ret) { - unlock_page(page); - } else { - hugetlb_put_quota(mapping); - free_huge_page(page); - goto out; - } - } - set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE); - } -out: - spin_unlock(&mm->page_table_lock); - return ret; -} Index: working-2.6/include/asm-sh/pgtable.h =================================================================== --- working-2.6.orig/include/asm-sh/pgtable.h 2004-10-29 10:15:21.000000000 +1000 +++ working-2.6/include/asm-sh/pgtable.h 2004-10-29 11:38:27.139144712 +1000 @@ -194,6 +194,7 @@ static inline pte_t pte_mkdirty(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_DIRTY)); return pte; } static inline pte_t pte_mkyoung(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_ACCESSED)); return pte; } static inline pte_t pte_mkwrite(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_RW)); return pte; } +static inline pte_t pte_mkhuge(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_SZHUGE)); return pte; } /* * Macro and implementation to make a page protection as uncachable. Index: working-2.6/include/asm-ia64/pgtable.h =================================================================== --- working-2.6.orig/include/asm-ia64/pgtable.h 2004-10-29 10:15:20.000000000 +1000 +++ working-2.6/include/asm-ia64/pgtable.h 2004-10-29 11:38:27.140144560 +1000 @@ -281,6 +281,7 @@ #define pte_mkyoung(pte) (__pte(pte_val(pte) | _PAGE_A)) #define pte_mkclean(pte) (__pte(pte_val(pte) & ~_PAGE_D)) #define pte_mkdirty(pte) (__pte(pte_val(pte) | _PAGE_D)) +#define pte_mkhuge(entry) (__pte(pte_val(pte) | _PAGE_P)) /* * Macro to a page protection value as "uncacheable". 
Note that "protection" is really a Index: working-2.6/include/asm-i386/pgtable.h =================================================================== --- working-2.6.orig/include/asm-i386/pgtable.h 2004-10-21 11:55:01.000000000 +1000 +++ working-2.6/include/asm-i386/pgtable.h 2004-10-29 11:38:27.141144408 +1000 @@ -236,6 +236,7 @@ static inline pte_t pte_mkdirty(pte_t pte) { (pte).pte_low |= _PAGE_DIRTY; return pte; } static inline pte_t pte_mkyoung(pte_t pte) { (pte).pte_low |= _PAGE_ACCESSED; return pte; } static inline pte_t pte_mkwrite(pte_t pte) { (pte).pte_low |= _PAGE_RW; return pte; } +static inline pte_t pte_mkhuge(pte_t pte) { (pte).pte_low |= _PAGE_PRESENT | _PAGE_PSE; return pte; } #ifdef CONFIG_X86_PAE # include @@ -273,7 +274,6 @@ */ #define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot)) -#define mk_pte_huge(entry) ((entry).pte_low |= _PAGE_PRESENT | _PAGE_PSE) static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { Index: working-2.6/include/asm-sparc64/page.h =================================================================== --- working-2.6.orig/include/asm-sparc64/page.h 2004-08-09 09:52:58.000000000 +1000 +++ working-2.6/include/asm-sparc64/page.h 2004-10-29 11:38:27.141144408 +1000 @@ -93,6 +93,7 @@ #define HPAGE_SIZE (_AC(1,UL) << HPAGE_SHIFT) #define HPAGE_MASK (~(HPAGE_SIZE - 1UL)) #define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) +#define ARCH_HAS_SETCLEAR_HUGE_PTE #endif #define TASK_UNMAPPED_BASE (test_thread_flag(TIF_32BIT) ? 
\ Index: working-2.6/include/asm-sparc64/pgtable.h =================================================================== --- working-2.6.orig/include/asm-sparc64/pgtable.h 2004-08-11 10:28:33.000000000 +1000 +++ working-2.6/include/asm-sparc64/pgtable.h 2004-10-29 11:38:27.142144256 +1000 @@ -302,6 +302,7 @@ #define pte_mkyoung(pte) (__pte(pte_val(pte) | _PAGE_ACCESSED | _PAGE_R)) #define pte_mkwrite(pte) (__pte(pte_val(pte) | _PAGE_WRITE)) #define pte_mkdirty(pte) (__pte(pte_val(pte) | _PAGE_MODIFIED | _PAGE_W)) +#define pte_mkhuge(pte) (__pte(pte_val(pte) | _PAGE_SZHUGE)) /* to find an entry in a page-table-directory. */ #define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD)) Index: working-2.6/include/asm-sh/page.h =================================================================== --- working-2.6.orig/include/asm-sh/page.h 2004-10-19 17:17:04.000000000 +1000 +++ working-2.6/include/asm-sh/page.h 2004-10-29 11:38:27.142144256 +1000 @@ -31,6 +31,7 @@ #define HPAGE_SIZE (1UL << HPAGE_SHIFT) #define HPAGE_MASK (~(HPAGE_SIZE-1)) #define HUGETLB_PAGE_ORDER (HPAGE_SHIFT-PAGE_SHIFT) +#define ARCH_HAS_SETCLEAR_HUGE_PTE #endif #ifdef __KERNEL__ Index: working-2.6/include/asm-sh64/pgtable.h =================================================================== --- working-2.6.orig/include/asm-sh64/pgtable.h 2004-10-21 11:55:01.000000000 +1000 +++ working-2.6/include/asm-sh64/pgtable.h 2004-10-29 11:38:27.143144104 +1000 @@ -429,6 +429,8 @@ extern inline pte_t pte_mkexec(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_EXECUTE)); return pte; } extern inline pte_t pte_mkdirty(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_DIRTY)); return pte; } extern inline pte_t pte_mkyoung(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_ACCESSED)); return pte; } +extern inline pte_t pte_mkhuge(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_SZHUGE)); return pte; } + /* * Conversion functions: convert a page and protection to a page entry. 
Index: working-2.6/include/asm-sh64/page.h =================================================================== --- working-2.6.orig/include/asm-sh64/page.h 2004-08-09 09:52:55.000000000 +1000 +++ working-2.6/include/asm-sh64/page.h 2004-10-29 11:38:27.144143952 +1000 @@ -41,6 +41,7 @@ #define HPAGE_SIZE (1UL << HPAGE_SHIFT) #define HPAGE_MASK (~(HPAGE_SIZE-1)) #define HUGETLB_PAGE_ORDER (HPAGE_SHIFT-PAGE_SHIFT) +#define ARCH_HAS_SETCLEAR_HUGE_PTE #endif #ifdef __KERNEL__ Index: working-2.6/include/linux/hugetlb.h =================================================================== --- working-2.6.orig/include/linux/hugetlb.h 2004-08-09 09:53:01.000000000 +1000 +++ working-2.6/include/linux/hugetlb.h 2004-10-29 11:38:27.144143952 +1000 @@ -47,6 +47,20 @@ int prepare_hugepage_range(unsigned long addr, unsigned long len); #endif +#ifndef ARCH_HAS_SETCLEAR_HUGE_PTE +#define set_huge_pte(ptep, pte) set_pte(ptep, pte) +#define huge_ptep_get_and_clear(ptep) ptep_get_and_clear(ptep) +#else +void set_huge_pte(pte_t *ptep, pte_t pte); +pte_t huge_ptep_get_and_clear(pte_t *ptep); +#endif + +#ifndef ARCH_HAS_HUGETLB_CLEAN_STALE_PGTABLE +#define hugetlb_clean_stale_pgtable(pte) BUG() +#else +void hugetlb_clean_stale_pgtable(pte_t *pte); +#endif + #else /* !CONFIG_HUGETLB_PAGE */ static inline int is_vm_hugetlb_page(struct vm_area_struct *vma) Index: working-2.6/include/asm-i386/page.h =================================================================== --- working-2.6.orig/include/asm-i386/page.h 2004-10-27 10:43:47.000000000 +1000 +++ working-2.6/include/asm-i386/page.h 2004-10-29 11:39:01.817064456 +1000 @@ -64,6 +64,7 @@ #define HPAGE_MASK (~(HPAGE_SIZE - 1)) #define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) #define HAVE_ARCH_HUGETLB_UNMAPPED_AREA +#define ARCH_HAS_HUGETLB_CLEAN_STALE_PGTABLE #endif -- David Gibson | For every complex problem there is a david AT gibson.dropbear.id.au | solution which is simple, neat and | wrong. 
http://www.ozlabs.org/people/dgibson

From Darren.Sheppard at ncode.com  Fri Oct 29 18:33:54 2004
From: Darren.Sheppard at ncode.com (Darren Sheppard)
Date: Fri, 29 Oct 2004 09:33:54 +0100
Subject: Shared Libraries and Exceptions on PSeries
Message-ID:

I am new to this site so apologies if I have inadvertently broken any
rules.

We are having trouble catching Exceptions within a shared library built
on IBM PSeries running SUSE Linux 8.0 using 32bit gcc compiler. We have
created a very simple test application which demonstrates this.

We are pretty experienced with porting code to unix platforms but have
never come across this before. The code sample works on all of our Unix
and Linux platforms and Windows.

There is no possibility of upgrading to SUSE 9 as the project we are
working on is for a large multinational company who won't upgrade for
another 2 years.

Here is the code

MAIN.CPP

#include <stdio.h>

int main(int argc, char *argv[])
{
    printf ("In main\n");
    void shared_func();

    try
    {
        throw 1;
    }
    catch(int)
    {
        printf ("Catch in main ok\n");
    }

    try
    {
        shared_func();
    }
    catch(...)
    {
        printf ("Caught shared exception in main - ERROR\n");
    }

    return 0;
}

SHARED.CPP

#include <stdio.h>

void shared_func()
{
    try
    {
        printf ("Throwing in shared\n");
        throw 1;
    }
    catch(...)
    {
        printf ("Caught in shared\n");
    }
}

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041029/2c358065/attachment.htm

From dwm at austin.ibm.com  Sat Oct 30 05:55:41 2004
From: dwm at austin.ibm.com (Doug Maxey)
Date: Fri, 29 Oct 2004 14:55:41 -0500
Subject: 2.6.10-rc1-mm2
In-Reply-To: <20041029014930.21ed5b9a.akpm@osdl.org>
Message-ID: <200410291955.i9TJtfaj014056@falcon10.austin.ibm.com>

Andrew,

 having some troubles on ppc64.  It looks like the changes in
 the scripts/Makefile.{clean,build} are expecting include/asm to
 exist in the source tree.  I don't see any related file except the
 include/asm-$ARCH/Kbuild

Below is output from a hacked-up attempt to add a $(srctree) check to
fix scripts/Makefile.build.  It invokes an added $(warning) at the top
of the file:

=============================
cmd=={make -j4 O=/build/dwm/build/lk-2.6.10-rc1-mm2.edit/ppc64 zImage}
Using /build/dwm/linux/lk-2.6.10-rc1-mm2.edit as source for kernel
  CHK     include/linux/version.h
  GEN    /build/dwm/build/lk-2.6.10-rc1-mm2.edit/ppc64/Makefile
/build/dwm/linux/lk-2.6.10-rc1-mm2.edit/scripts/Makefile.build:13: kbuild: obj=scripts/basic/Kbuild srctree=/build/dwm/linux/lk-2.6.10-rc1-mm2.edit/scripts/basic/Kbuild, make=scripts/basic/Makefile!
  GEN    /build/dwm/build/lk-2.6.10-rc1-mm2.edit/ppc64/Makefile
/build/dwm/linux/lk-2.6.10-rc1-mm2.edit/scripts/Makefile.build:13: kbuild: obj=scripts/kconfig/Kbuild srctree=/build/dwm/linux/lk-2.6.10-rc1-mm2.edit/scripts/kconfig/Kbuild, make=scripts/kconfig/Makefile!
scripts/kconfig/conf -s arch/ppc64/Kconfig
#
# using defaults found in .config
#
  SPLIT   include/linux/autoconf.h -> include/config/*
/build/dwm/linux/lk-2.6.10-rc1-mm2.edit/scripts/Makefile.build:13: kbuild: obj=scripts/basic/Kbuild srctree=/build/dwm/linux/lk-2.6.10-rc1-mm2.edit/scripts/basic/Kbuild, make=scripts/basic/Makefile!
/build/dwm/linux/lk-2.6.10-rc1-mm2.edit/scripts/Makefile.build:13: kbuild: obj=/build/dwm/linux/lk-2.6.10-rc1-mm2.edit/include/asm/Kbuild srctree=/build/dwm/linux/lk-2.6.10-rc1-mm2.edit//build/dwm/linux/lk-2.6.10-rc1-mm2.edit/include/asm/Kbuild, make=/build/dwm/linux/lk-2.6.10-rc1-mm2.edit/include/asm/Makefile!
/build/dwm/linux/lk-2.6.10-rc1-mm2.edit/scripts/Makefile.build:14: /build/dwm/linux/lk-2.6.10-rc1-mm2.edit/include/asm/Makefile: No such file or directory
/build/dwm/linux/lk-2.6.10-rc1-mm2.edit/scripts/Makefile.build:13: kbuild: obj=scripts/Kbuild srctree=/build/dwm/linux/lk-2.6.10-rc1-mm2.edit/scripts/Kbuild, make=scripts/Makefile!
make[2]: *** No rule to make target `/build/dwm/linux/lk-2.6.10-rc1-mm2.edit/include/asm/Makefile'.  Stop.
make[1]: *** [prepare0] Error 2
make[1]: *** Waiting for unfinished jobs....
/build/dwm/linux/lk-2.6.10-rc1-mm2.edit/scripts/Makefile.build:13: kbuild: obj=scripts/genksyms/Kbuild srctree=/build/dwm/linux/lk-2.6.10-rc1-mm2.edit/scripts/genksyms/Kbuild, make=scripts/genksyms/Makefile!
/build/dwm/linux/lk-2.6.10-rc1-mm2.edit/scripts/Makefile.build:13: kbuild: obj=scripts/mod/Kbuild srctree=/build/dwm/linux/lk-2.6.10-rc1-mm2.edit/scripts/mod/Kbuild, make=scripts/mod/Makefile!
make: *** [zImage] Error 2
=============================

diff from vanilla scripts/Makefile.{build,clean}
=============================
diff -Nwupa libata-dev-2.6/scripts/Makefile.build lk-2.6.10-rc1-mm2.edit/scripts/Makefile.build
--- libata-dev-2.6/scripts/Makefile.build	2004-10-27 15:38:46.972904640 -0500
+++ lk-2.6.10-rc1-mm2.edit/scripts/Makefile.build	2004-10-29 12:50:35.766986000 -0500
@@ -10,7 +10,7 @@ __build:
 # Read .config if it exist, otherwise ignore
 -include .config

-include $(obj)/Makefile
+include $(if $(wildcard $(obj)/Kbuild), $(obj)/Kbuild, $(obj)/Makefile)

 include scripts/Makefile.lib

diff -Nwupa libata-dev-2.6/scripts/Makefile.clean lk-2.6.10-rc1-mm2.edit/scripts/Makefile.clean
--- libata-dev-2.6/scripts/Makefile.clean	2004-10-27 15:38:46.972904640 -0500
+++ lk-2.6.10-rc1-mm2.edit/scripts/Makefile.clean	2004-10-29 12:50:35.766986000 -0500
@@ -7,7 +7,7 @@ src := $(obj)
 .PHONY: __clean
 __clean:

-include $(obj)/Makefile
+include $(if $(wildcard $(obj)/Kbuild), $(obj)/Kbuild, $(obj)/Makefile)

 # Figure out what we need to build from the various variables

From sam at ravnborg.org  Sat Oct 30 08:13:07 2004
From: sam at ravnborg.org (Sam Ravnborg)
Date: Sat, 30 Oct 2004 00:13:07 +0200
Subject: 2.6.10-rc1-mm2
In-Reply-To: <200410291955.i9TJtfaj014056@falcon10.austin.ibm.com>
References: <20041029014930.21ed5b9a.akpm@osdl.org> <200410291955.i9TJtfaj014056@falcon10.austin.ibm.com>
Message-ID: <20041029221307.GB11016@mars.ravnborg.org>

On Fri, Oct 29, 2004 at 02:55:41PM -0500, Doug Maxey wrote:
>
> Andrew,
>
>  having some troubles on ppc64.  It looks like the changes in
>  the scripts/Makefile.{clean,build} are expecting include/asm to
>  exist in the source tree.  I don't see any related file except the
>  include/asm-$ARCH/Kbuild

Fix attached.

	Sam

===== Makefile 1.546 vs edited =====
--- 1.546/Makefile	2004-10-27 23:00:25 +02:00
+++ edited/Makefile	2004-10-29 23:05:42 +02:00
@@ -761,7 +761,7 @@ prepare1: prepare2 outputmakefile

 prepare0: prepare1 include/linux/version.h include/asm include/config/MARKER
-	$(Q)$(MAKE) $(build)=$(srctree)/include/asm
+	$(Q)$(MAKE) $(build)=include/asm-$(ARCH)
 ifneq ($(KBUILD_MODULES),)
 	$(Q)rm -rf $(MODVERDIR)
 	$(Q)mkdir -p $(MODVERDIR)
===== include/asm-i386/Kbuild 1.1 vs edited =====
--- 1.1/include/asm-i386/Kbuild	2004-10-27 23:06:50 +02:00
+++ edited/include/asm-i386/Kbuild	2004-10-29 01:44:08 +02:00
@@ -11,7 +11,7 @@
 always  := offsets.h
 targets := offsets.s

-CFLAGS_offsets.o := -I arch/i386/kernel
+CFLAGS_offsets.o := -Iarch/i386/kernel

 $(obj)/offsets.h: $(obj)/offsets.s FORCE
 	$(call filechk,gen-asm-offsets, < $<)
===== scripts/Makefile.build 1.51 vs edited =====
--- 1.51/scripts/Makefile.build	2004-10-27 22:49:53 +02:00
+++ edited/scripts/Makefile.build	2004-10-29 23:04:40 +02:00
@@ -10,7 +10,7 @@
 # Read .config if it exist, otherwise ignore
 -include .config

-include $(if $(wildcard $(obj)/Kbuild), $(obj)/Kbuild, $(obj)/Makefile)
+include $(if $(wildcard $(srctree)/$(obj)/Kbuild), $(obj)/Kbuild, $(obj)/Makefile)

 include scripts/Makefile.lib

===== scripts/Makefile.clean 1.17 vs edited =====
--- 1.17/scripts/Makefile.clean	2004-10-27 22:49:53 +02:00
+++ edited/scripts/Makefile.clean	2004-10-29 23:22:26 +02:00
@@ -7,7 +7,7 @@
 .PHONY: __clean
 __clean:

-include $(if $(wildcard $(obj)/Kbuild), $(obj)/Kbuild, $(obj)/Makefile)
+include $(if $(wildcard $(srctree)/$(obj)/Kbuild), $(obj)/Kbuild, $(obj)/Makefile)

 # Figure out what we need to build from the various variables
 # ==========================================================================

From dwm at austin.ibm.com  Sat Oct 30 07:24:11 2004
From: dwm at austin.ibm.com (Doug Maxey)
Date: Fri, 29 Oct 2004 16:24:11 -0500
Subject: 2.6.10-rc1-mm2
In-Reply-To: <20041029221307.GB11016@mars.ravnborg.org>
Message-ID: <200410292124.i9TLOBIe014728@falcon10.austin.ibm.com>

On Sat, 30 Oct 2004 00:13:07 +0200, Sam Ravnborg wrote:
>On Fri, Oct 29, 2004 at 02:55:41PM -0500, Doug Maxey wrote:
>>
>> Andrew,
>>
>>  having some troubles on ppc64.  It looks like the changes in
>>  the scripts/Makefile.{clean,build} are expecting include/asm to
>>  exist in the source tree.  I don't see any related file except the
>>  include/asm-$ARCH/Kbuild
>
>Fix attached.

Worked, thanks!

From dwm at austin.ibm.com  Sat Oct 30 08:09:03 2004
From: dwm at austin.ibm.com (Doug Maxey)
Date: Fri, 29 Oct 2004 17:09:03 -0500
Subject: [PATCH 1/1] ppc64 install outside of source tree
Message-ID: <200410292209.i9TM937o014943@falcon10.austin.ibm.com>

Sam,

please apply.  Having been using this for a while.

Name: arch/ppc64/boot install outside of source tree
Rationale: When building outside source tree, install.sh is looked for in the obj side.
Status: tested on ppc64 builds
Signed-off-by: Doug Maxey

ChangeLog:
 * have ppc64 ability to run install.sh from build outside srctree.

++doug
IBM Linux Technology Center

===== arch/ppc64/boot/Makefile 1.25 vs edited =====
--- 1.25/arch/ppc64/boot/Makefile	2004-10-03 12:23:50 -05:00
+++ edited/arch/ppc64/boot/Makefile	2004-10-11 14:15:58 -05:00
@@ -118,6 +118,6 @@
 		>> $(obj)/imagesize.c

 install: $(CONFIGURE) $(obj)/$(BOOTIMAGE)
-	sh -x $(src)/install.sh "$(KERNELRELEASE)" "$(obj)/$(BOOTIMAGE)" "$(INSTALL_PATH)"
+	sh -x $(srctree)/$(src)/install.sh "$(KERNELRELEASE)" "$(obj)/$(BOOTIMAGE)" "$(INSTALL_PATH)"

 clean-files := $(addprefix $(objtree)/, $(obj-boot) vmlinux.strip)

From sam at ravnborg.org  Sun Oct 31 09:12:59 2004
From: sam at ravnborg.org (Sam Ravnborg)
Date: Sun, 31 Oct 2004 00:12:59 +0200
Subject: [PATCH 1/1] ppc64 install outside of source tree
In-Reply-To: <200410292209.i9TM937o014943@falcon10.austin.ibm.com>
References: <200410292209.i9TM937o014943@falcon10.austin.ibm.com>
Message-ID: <20041030221259.GA9592@mars.ravnborg.org>

On Fri, Oct 29, 2004 at 05:09:03PM -0500, Doug Maxey wrote:
>
> Sam,
> please apply.  Having been using this for a while.

Applied.

	Sam

From hpa at zytor.com  Sun Oct 31 10:37:40 2004
From: hpa at zytor.com (H. Peter Anvin)
Date: Sat, 30 Oct 2004 16:37:40 -0700
Subject: PATCH: Altivec support for RAID-6
Message-ID: <418425C4.1020900@zytor.com>

This patch allows the RAID-6 code to use Altivec on ppc/ppc64
processors.  Note that it uses gcc support, so it might require a fairly
recent gcc -- but I haven't been able to get a clear answer on *how* new.

It also changes -mcpu=power4 to -mcpu=970 when CONFIG_ALTIVEC is
enabled, since -mcpu=power4 doesn't allow -maltivec to be specified with
it :(

The results are *impressive*, however; on a PowerMac G5 I get 6.1 GB/s
(on one CPU!); this is close to the 7.8 GB/s for RAID-5, and almost 2x
what my 3 GHz Pentium4 gets.

	-hpa
-------------- next part --------------
A non-text attachment was scrubbed...
Name: raid6altivec.diff
Type: text/x-patch
Size: 7285 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041030/1f500436/attachment.bin

From hpa at zytor.com  Sun Oct 31 10:37:59 2004
From: hpa at zytor.com (H. Peter Anvin)
Date: Sat, 30 Oct 2004 16:37:59 -0700
Subject: PATCH: Altivec support for RAID-6
Message-ID: <418425D7.1050602@zytor.com>

This patch allows the RAID-6 code to use Altivec on ppc/ppc64
processors.  Note that it uses gcc support, so it might require a fairly
recent gcc -- but I haven't been able to get a clear answer on *how* new.

It also changes -mcpu=power4 to -mcpu=970 when CONFIG_ALTIVEC is
enabled, since -mcpu=power4 doesn't allow -maltivec to be specified with
it :(

The results are *impressive*, however; on a PowerMac G5 I get 6.1 GB/s
(on one CPU!); this is close to the 7.8 GB/s for RAID-5, and almost 2x
what my 3 GHz Pentium4 gets.

	-hpa

Signed-Off-By: H. Peter Anvin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: raid6altivec.diff
Type: text/x-patch
Size: 7285 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041030/65ffa544/attachment.bin

From olh at suse.de  Thu Oct 28 17:25:59 2004
From: olh at suse.de (Olaf Hering)
Date: Thu, 28 Oct 2004 09:25:59 +0200
Subject: module.viomap support for ppc64
In-Reply-To: <16768.31051.268932.927382@cargo.ozlabs.ibm.com>
References: <20040812173751.GA30564@suse.de> <1092339278.19137.8.camel@localhost> <1092354195.25196.11.camel@bach> <20040813094040.GA1769@suse.de> <16768.31051.268932.927382@cargo.ozlabs.ibm.com>
Message-ID: <20041028072559.GA4977@suse.de>

 On Thu, Oct 28, Paul Mackerras wrote:

> Olaf Hering writes:
>
> > A hack for 2.6.8-rc4 is below. Can I read the alias file via
> > while read a b c ; do : done < modules.alias ?
> > Is b supposed to contain not spaces? What special delimiter chars are
> > allowed? The 'name' and 'compat' property can contain almost any char.
> > I used '^' for the time being.
>
> Olaf, do you still want these changes made?  I rebased your patch on
> current BK (see below).

Yes, but how to implement it in detail was the question.

-- 
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG