From johnrose at austin.ibm.com Fri Oct 1 01:45:15 2004
From: johnrose at austin.ibm.com (John Rose)
Date: Thu, 30 Sep 2004 10:45:15 -0500
Subject: Why do we map PCI IO space so late ?
In-Reply-To: <1096532573.32754.13.camel@gaston>
References: <1096532573.32754.13.camel@gaston>
Message-ID: <1096559115.27021.33.camel@sinatra.austin.ibm.com>

Hi Ben-

Good questions :) First let me clear something up, and forgive me if
I'm telling you stuff you already know. The ioremap()'s that we do at
boot are _exclusively_ done for PHBs. This creates mappings that span
the ranges for their children buses. Why do we do this when drivers can
themselves use ioremap()? Because some drivers still use inb()/outb(),
etc, without remapping their own space.

The short answer to your questions is that I/O DLPAR required these PHB
ioremap()'s to be moved to a later chronological point during boot, so
that imalloc records would be kept.

Here's the long answer. To dynamically remove a bus (EADS or PHB), we
need to iounmap() the range associated with it. The iounmap() function
is prototyped in generic code to take one argument, the virtual address
in question. In order to know the size of the region to unmap, we need
to keep some records of what was ioremap()'ed originally. The imalloc
subsystem exists to keep these records.

The ppc64 ioremap() implementation has the limitation that if one calls
it before mem_init_done, no imalloc records are left behind. If we
remap the PHBs early in boot, we have no way to unmap them (or their
children) at DLPAR remove time. Does this make sense?

As a side note, we didn't similarly defer the remap for ISA, b/c we
assumed that we'd never want to unmap this range. I wrote the function
that remaps for ISA, and it's a hack, you're right :) Suggestions are
welcome. I would ask why your ISA node doesn't have a ranges property,
b/c I thought it was mandatory from some spec.

You asked about ioremap_explicit(). This is used in two ways.
First during boot, to remap the necessary regions for PHBs after
mem_init_done. We've saved off the "physical" range info from the ofdt
early in boot, and now we explicitly remap starting at virtual addr
PHBS_IO_BASE. Second, we use it to remap the range of a newly
DLPAR-added bus. You can imagine that in the case of adding an EADS
slot, we need the mappings to exist at exact virtual addresses relative
to its parent PHB, etc. Hence the creation of ioremap_explicit().

Suggestions on improvements are welcome. Hope this helps, it's before
lunch and I'm being wordy. :)

Thanks-
John

On Thu, 2004-09-30 at 03:22, Benjamin Herrenschmidt wrote:
> Hi John !
>
> I was going through some of the PCI setup code while working on
> some bringup stuff, and had an issue which was related to the way
> we do the ioremap'ing of the PCI IO space.
>
> So the current scenario is:
>
>  - early (setup_arch() time basically), we ioremap_explicit the ISA
>    space and that only
>
>  - later (pcibios_fixup time), we scan all busses and ioremap_explicit
>    their various IO spaces.
>
> I have two problems with that at the moment.
>
> First is, I'm annoyed that during the actual PCI probing, the IO space
> is not mapped. That means that any quirk that needs IO accesses to the
> device will not work. I wonder also in which conditions we might end up
> instantiating a PCI driver as early as the PCI probing and thus crash.
> Also, this is all after console_initcalls(), so that leaves a gap of
> code that runs with PCI IO space not mapped. So far, it ended up being
> mostly ok because our console uses legacy serial drivers that use the
> ISA space which happen to be mapped early, but that sounds fragile &
> bogus to me. (For the short story, I found that while working on a board
> for which the "isa" node didn't have a "ranges" property, so we failed
> to early map it, thus the serial driver would crash doing IO cycles).
> Why can't we do the ioremap_explicit right after setting up the PHBs ?
>
> The second thing that annoys me is that it seems we are also doing an
> ioremap_explicit for each p2p bridge IO space, aren't we ? I don't fully
> understand the logic here. Aren't those supposed to be fully enclosed by
> their parent PHB IO space, and thus mapped by those ?
>
> Thanks for enlightening me,
> Ben.
>
>
>

From segher at kernel.crashing.org Fri Oct 1 02:38:20 2004
From: segher at kernel.crashing.org (Segher Boessenkool)
Date: Thu, 30 Sep 2004 11:38:20 -0500
Subject: reading files in /proc/device-tree
In-Reply-To: <1096546849.3081.2.camel@gaston>
References: <20040929101700.GA2623@in.ibm.com> <1096546849.3081.2.camel@gaston>
Message-ID: <23A68A84-12FF-11D9-8370-000A95A4DC02@kernel.crashing.org>

>> Also, the format of the entries is dependent on the
>> #address-cells and #size-cells properties.
>
> ... of the parent node :) read the OF spec for more details

Of the first (not necessarily immediate) parent that has those
properties, yes.

As memory is a child of the root node, it will be its direct
parent, yes.

Segher

From david at gibson.dropbear.id.au Fri Oct 1 14:03:25 2004
From: david at gibson.dropbear.id.au (David Gibson)
Date: Fri, 1 Oct 2004 14:03:25 +1000
Subject: mapping memory in 0xb space
In-Reply-To:
References: <20040929014017.GC5470@zax>
Message-ID: <20041001040325.GB12890@zax>

On Wed, Sep 29, 2004 at 12:14:08AM -0500, Igor Grobman wrote:
> On Wed, 29 Sep 2004, David Gibson wrote:
>
> > On Tue, Sep 28, 2004 at 01:52:16PM -0500, Igor Grobman wrote:
> > > On Tue, 28 Sep 2004, David Gibson wrote:
> > >
> > > > Recent kernels don't even
> > > > have VSIDs allocated for the 0xb... region.
> > >
> > > Looking at both 2.6.8 and 2.4.21, I don't see a difference in
> > > get_kernel_vsid() code.
> >
> > Ok, *very* recent kernels. The new VSID algorithm has gone into the
> > BK tree since 2.6.8.
>
> From the description I read, I might be better off using 0xfff.. addresses
> with that algorithm. Not a big deal.

Perhaps.
However, there are issues there as well: older kernels have
the same 41-bit address restriction (maybe somewhat extendable) in the
0xf region, just like 0xb. The new VSID algo gives VSIDs for every
address above 0xc000000000000000 *except* the very last segment,
0xfffffffff0000000-0xffffffffffffffff.

> > > This leaves segments. Both
> > > DataAccess_common and DataAccessSLB_common call
> > > do_stab_bolted/do_slb_bolted when confronted with an address in 0xb
> > > region.
> >
> > Oh, so it does. That, I think is a 2.4 thing, long gone in 2.6 (even
> > before the SLB rewrite, I'm pretty sure do_slb_bolted was only called
> > for 0xc addresses).
>
> In my 2.4.21 source, do_slb_bolted does get called for 0xb addresses.
> And thanks for letting me know about power4 being SLB. I was clueless on
> the issue.

> > > Presumably, this will fault in the segments I am interested in.
> >
> > Yes, actually, it should. Ok, I guess the problem is deeper than I
> > thought.
>
> Or is it?
>
> > > Also, I narrowed it down to
> > > working (or appearing to work) as long as the highest 5 bits of the page
> > > index (those that end up as partial index in the HPTE) are zero. This may
> > > just be a weird coincidence.
> >
> > Could be.
> >
> > > > Why on earth do you want to do this?
> > >
> > > Good question ;-). A long long time ago, I posted on this list and
> > > explained. Since then, I found what appeared to be a solution, except
> > > that it appears power4 breaks it. I am building a tool that allows
> > > dynamic splicing of code into a running kernel (see
> > > http://www.paradyn.org/html/kerninst.html). In order for this to work, I
> > > need to be able to overwrite a single instruction with a jump to
> > > spliced-in code. The target of the jump needs to be within the range (26
> > > bits). Therefore, I have a choice of 0xbfff.. addresses with backward
> > > jumps from 0xc region, or the 0xff.. addresses for absolute jumps.
> > > I chose 0xbff.., because I found already-working code, originally written
> > > for the performance counter interface. Am I making more sense now?
> >
> > Aha! But this does actually explain the problem - there are only
> > VSIDs assigned for the first 2^41 bits of each region - so although
> > there are vsids for 0xb000000000000000-0xb00001ffffffffff, there
> > aren't any for 0xbff... addresses. Likewise the Linux pagetables only
> > cover a 41-bit address range, but that won't matter if you're creating
> > HPTEs directly.
>
> And this is why I avoided explaining fully in my first email :-). I'd
> like to solve one problem at a time. What I said in my initial email
> is accurate. Even within the valid VSID range, if the highest 5 bits of
> the page index are not zero, I get a crash on access (e.g.
> 0xb00001FFFFF00000, but works on 0xb00001FFF0000000).

Hrm. Ok. I'm not sure why that would be.

> As for why I thought 0xbff would work, I reasoned that
> since the highest bits are masked out in get_kernel_vsid(), and since
> nobody else is using the 0xb region, it doesn't matter if I get a VSID
> that is the same as some other VSID in 0xb region. However, I did not
> consider the bug in do_slb_bolted that you describe below.

Yes, with that bug the collision can be with a segment anywhere, not
just in the 0xb region.

> > You may have seen the comment in do_slb_bolted which claims to permit
> > a full 32-bits of ESID - it's wrong. The code doesn't mask the ESID
> > down to 13 bits as get_kernel_vsid() does, but it probably should - an
> > overlarge ESID will cause collisions with VSIDs from entirely
> > different address places, which would be a Bad Thing.
>
> This must be happening, although I would still like to know why it
> misbehaves even within the valid VSID range.
>
> >
> > Actually, you should be able to allow ESIDs of up to 21 bits there (36
> > bit VSID - 15 bits of "context").
> > But you will need to make sure
> > get_kernel_vsid(), or whatever you're using to calculate the VAs for
> > the hash HPTEs is updated to match - at the moment I think it will
> > mask down to 13 bits. I'm not sure if that will get you sufficiently
> > close to 0xc0... for your purposes.
>
> No, it's not close enough--I really must have that very last segment.
> It sounds like I was simply getting lucky on the power3 machine.
> Without the mask, I must have been getting random pages, and
> happily overwriting them.
>
> Any ideas on how I might map that very last segment of 0xb, or for
> that matter the very last segment of 0xf ? It need not be pretty,
> but it cannot involve modifying the kernel source, though it can rely on
> whatever dirty tricks a kernel module might get away with. I don't
> want to modify the source, because I would like the tool to work on
> unmodified kernels.

Um... right. You know, I'm really not sure it's possible without
changing the kernel source, short of binary patching the do_slb_bolted
code from a module. Sorry. The segment code's just really not set up
to handle this.

Though, come to that, you do only need one segment, so it might not be
that hard to binary patch in a branch to some code of your own which
provides a VSID for that one segment.

> It's starting to sound like an impossible task (at least on non-recent
> kernels). I think I might go with a backup suboptimal solution, which
> involves extra jumps, but at least it might work.

That may be a better idea.

--
David Gibson                    | For every complex problem there is a
david AT gibson.dropbear.id.au  | solution which is simple, neat and
                                | wrong.
http://www.ozlabs.org/people/dgibson

From benh at kernel.crashing.org Fri Oct 1 17:21:04 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 01 Oct 2004 17:21:04 +1000
Subject: Why do we map PCI IO space so late ?
In-Reply-To: <1096559115.27021.33.camel@sinatra.austin.ibm.com>
References: <1096532573.32754.13.camel@gaston> <1096559115.27021.33.camel@sinatra.austin.ibm.com>
Message-ID: <1096615264.11463.93.camel@gaston>

On Fri, 2004-10-01 at 01:45, John Rose wrote:
> Hi Ben-
>
> Good questions :) First let me clear something up, and forgive me if
> I'm telling you stuff you already know. The ioremap()'s that we do at
> boot are _exclusively_ done for PHBs. This creates mappings that span
> the ranges for their children buses. Why do we do this when drivers can
> themselves use ioremap()? Because some drivers still use inb()/outb(),
> etc, without remapping their own space.

Yah, that at least is obvious :)

> The short answer to your questions is that I/O DLPAR required these PHB
> ioremap()'s to be moved to a later chronological point during boot, so
> that imalloc records would be kept.

Okay, that makes more sense to me now.

> Here's the long answer. To dynamically remove a bus (EADS or PHB), we
> need to iounmap() the range associated with it. The iounmap() function
> is prototyped in generic code to take one argument, the virtual address
> in question. In order to know the size of the region to unmap, we need
> to keep some records of what was ioremap()'ed originally. The imalloc
> subsystem exists to keep these records.

Right.

> The ppc64 ioremap() implementation has the limitation that if one calls
> it before mem_init_done, no imalloc records are left behind. If we
> remap the PHBs early in boot, we have no way to unmap them (or their
> children) at DLPAR remove time. Does this make sense?

Yup.

> As a side note, we didn't similarly defer the remap for ISA, b/c we
> assumed that we'd never want to unmap this range. I wrote the function
> that remaps for ISA, and it's a hack, you're right :) Suggestions are
> welcome. I would ask why your ISA node doesn't have a ranges property,
> b/c I thought it was mandatory from some spec.

The OF tree of this board is still a work in progress. It has to be
mapped early anyway for other reasons, like the console serial driver
which will be initialized before we do the real mapping.

> You asked about ioremap_explicit(). This is used in two ways. First
> during boot, to remap the necessary regions for PHBs after
> mem_init_done. We've saved off the "physical" range info from the ofdt
> early in boot, and now we explicitly remap starting at virtual addr
> PHBS_IO_BASE. Second, we use it to remap the range of a newly
> DLPAR-added bus. You can imagine that in the case of adding an EADS
> slot, we need the mappings to exist at exact virtual addresses relative
> to its parent PHB, etc. Hence the creation of ioremap_explicit().
>
> Suggestions on improvements are welcome. Hope this helps, it's before
> lunch and I'm being wordy. :)

Thanks, it's enough for now, I need to think of alternative (read:
simpler) ways to deal with that in the future, but for now, it's fine.

Ben.

From david at gibson.dropbear.id.au Fri Oct 1 18:45:14 2004
From: david at gibson.dropbear.id.au (David Gibson)
Date: Fri, 1 Oct 2004 18:45:14 +1000
Subject: [PPC64] Change bad choice of VSID_MULTIPLIER
Message-ID: <20041001084514.GB19046@zax>

Andrew/Linus, please apply:

We recently changed the VSID allocation on PPC64 to use a new scheme
based on a multiplicative hash. It turns out our choice of multiplier
(the largest 28-bit prime) wasn't so great: with large contiguous
mappings, we can get very poor hash scattering. In particular earlier
machines (without 16M pages) which had a reasonable amount of RAM (>2G
or so) wouldn't boot, because the linear mapping overflowed some hash
buckets.

This patch changes the multiplier to something which seems to work
better (it is, rather arbitrarily, the median of the primes between
2^27 and 2^28). Some more theory should almost certainly go into the
choice of this constant, to avoid more pathological cases.
But for now, this choice fixes a serious bug, and seems to do at least
as well at scattering as the old choice on a handful of simple
testcases.

Signed-off-by: David Gibson

Index: working-2.6/include/asm-ppc64/mmu_context.h
===================================================================
--- working-2.6.orig/include/asm-ppc64/mmu_context.h 2004-09-20 10:12:50.000000000 +1000
+++ working-2.6/include/asm-ppc64/mmu_context.h 2004-10-01 18:28:01.565963320 +1000
@@ -108,11 +108,10 @@
  *
  * This scramble is only well defined for proto-VSIDs below
  * 0xFFFFFFFFF, so both proto-VSID and actual VSID 0xFFFFFFFFF are
- * reserved. VSID_MULTIPLIER is prime (the largest 28-bit prime, in
- * fact), so in particular it is co-prime to VSID_MODULUS, making this
- * a 1:1 scrambling function. Because the modulus is 2^n-1 we can
- * compute it efficiently without a divide or extra multiply (see
- * below).
+ * reserved. VSID_MULTIPLIER is prime, so in particular it is
+ * co-prime to VSID_MODULUS, making this a 1:1 scrambling function.
+ * Because the modulus is 2^n-1 we can compute it efficiently without
+ * a divide or extra multiply (see below).
  *
  * This scheme has several advantages over older methods:
  *
Index: working-2.6/include/asm-ppc64/mmu.h
===================================================================
--- working-2.6.orig/include/asm-ppc64/mmu.h 2004-09-20 10:12:50.000000000 +1000
+++ working-2.6/include/asm-ppc64/mmu.h 2004-10-01 18:28:01.566963168 +1000
@@ -202,7 +202,7 @@
 #define SLB_VSID_KERNEL (SLB_VSID_KP|SLB_VSID_C)
 #define SLB_VSID_USER (SLB_VSID_KP|SLB_VSID_KS)
 
-#define VSID_MULTIPLIER ASM_CONST(268435399) /* largest 28-bit prime */
+#define VSID_MULTIPLIER ASM_CONST(200730139) /* 28-bit prime */
 #define VSID_BITS 36
 #define VSID_MODULUS ((1UL<>SID_SHIFT)
-        .llong 0x40bffffd5 /* KERNELBASE VSID */
+        .llong 0x408f92c94 /* KERNELBASE VSID */
         /* We have to list the bolted VMALLOC segment here, too, so that it
          * will be restored on shared processor switch */
         .llong (VMALLOCBASE>>SID_SHIFT)
-        .llong 0xb0cffffd1 /* VMALLOCBASE VSID */
+        .llong 0xf09b89af5 /* VMALLOCBASE VSID */
         .llong 8192 /* # pages to map (32 MB) */
         .llong 0 /* Offset from start of loadarea to start of map */
-        .llong 0x40bffffd50000 /* VPN of first page to map */
+        .llong 0x408f92c940000 /* VPN of first page to map */
 
         . = 0x6100

--
David Gibson                    | For every complex problem there is a
david AT gibson.dropbear.id.au  | solution which is simple, neat and
                                | wrong.
http://www.ozlabs.org/people/dgibson

From grave at ipno.in2p3.fr Sat Oct 2 02:04:14 2004
From: grave at ipno.in2p3.fr (grave)
Date: Fri, 01 Oct 2004 16:04:14 +0000
Subject: XServe Node running a debian with only one processor
In-Reply-To: <1096548321l.32616l.0l@ipnnarval> (from grave@ipno.in2p3.fr on Thu Sep 30 14:45:21 2004)
References: <1096546729l.32147l.0l@ipnnarval> <1096548321l.32616l.0l@ipnnarval>
Message-ID: <1096646654l.2901l.2l@ipnnarval>

Got the xserve booting
(thanks to http://ozlabs.org/ppc64-patches/patch.pl?id=59)

But I can only run a single CPU kernel, does somebody know how to get
the second CPU on ?

The kernel is a ppc64 one with smp compiled in but only able to boot
with nosmp option
(kernel from kernel.org + patch to setup.c and pmac_features.c)

Thanks in advance for any hint...

xavier

From igor at cs.wisc.edu Sat Oct 2 04:05:12 2004
From: igor at cs.wisc.edu (Igor Grobman)
Date: Fri, 1 Oct 2004 13:05:12 -0500 (CDT)
Subject: mapping memory in 0xb space
In-Reply-To: <20041001040325.GB12890@zax>
References: <20040929014017.GC5470@zax> <20041001040325.GB12890@zax>
Message-ID:

A question for the rest of you, who haven't been following this thread.
Is there publicly available documentation on the power4 extensions,
specifically the large page support, how it affects the HPT hashing, and
the SLB, including the new instructions for maintaining it in software?
I haven't been able to find anything yet.

On Fri, 1 Oct 2004, David Gibson wrote:

> On Wed, Sep 29, 2004 at 12:14:08AM -0500, Igor Grobman wrote:
> > On Wed, 29 Sep 2004, David Gibson wrote:
> >
> > > On Tue, Sep 28, 2004 at 01:52:16PM -0500, Igor Grobman wrote:
> > > > On Tue, 28 Sep 2004, David Gibson wrote:
> > > >
> > > > > Recent kernels don't even
> > > > > have VSIDs allocated for the 0xb... region.
> > > >
> > > > Looking at both 2.6.8 and 2.4.21, I don't see a difference in
> > > > get_kernel_vsid() code.
> > >
> > > Ok, *very* recent kernels. The new VSID algorithm has gone into the
> > > BK tree since 2.6.8.
> >
> > From the description I read, I might be better off using 0xfff.. addresses
> > with that algorithm. Not a big deal.
>
> Perhaps. However, there are issues there as well: older kernels have
> the same 41-bit address restriction (maybe somewhat extendable) in the
> 0xf region, just like 0xb. The new VSID algo gives VSIDs for every
> address above 0xc000000000000000 *except* the very last segment,
> 0xfffffffff0000000-0xffffffffffffffff.

Lucky me! I'll take a look at what the VSID for the last segment
conflicts with, maybe it will be something unused.
Or I'll have to think of something else clever. Right now, I still want
my 2.4.21 implementation to work.

> > > > Also, I narrowed it down to
> > > > working (or appearing to work) as long as the highest 5 bits of the page
> > > > index (those that end up as partial index in the HPTE) are zero. This may
> > > > just be a weird coincidence.
> > >
> > > Could be.
> > >
> > > > > Why on earth do you want to do this?
> > > >
> > > > Good question ;-). A long long time ago, I posted on this list and
> > > > explained. Since then, I found what appeared to be a solution, except
> > > > that it appears power4 breaks it. I am building a tool that allows
> > > > dynamic splicing of code into a running kernel (see
> > > > http://www.paradyn.org/html/kerninst.html). In order for this to work, I
> > > > need to be able to overwrite a single instruction with a jump to
> > > > spliced-in code. The target of the jump needs to be within the range (26
> > > > bits). Therefore, I have a choice of 0xbfff.. addresses with backward
> > > > jumps from 0xc region, or the 0xff.. addresses for absolute jumps. I
> > > > chose 0xbff.., because I found already-working code, originally written
> > > > for the performance counter interface. Am I making more sense now?
> > >
> > > Aha! But this does actually explain the problem - there are only
> > > VSIDs assigned for the first 2^41 bits of each region - so although
> > > there are vsids for 0xb000000000000000-0xb00001ffffffffff, there
> > > aren't any for 0xbff... addresses. Likewise the Linux pagetables only
> > > cover a 41-bit address range, but that won't matter if you're creating
> > > HPTEs directly.
> >
> > And this is why I avoided explaining fully in my first email :-). I'd
> > like to solve one problem at a time. What I said in my initial email
> > is accurate. Even within the valid VSID range, if the highest 5 bits of
> > the page index are not zero, I get a crash on access (e.g.
> > 0xb00001FFFFF00000, but works on 0xb00001FFF0000000).
>
> Hrm. Ok. I'm not sure why that would be.

Here is some more background. Maybe it will help you think of what's
going wrong here. I noticed that if I write to the remapped
0xb00001FFF0000000, the changes do not show up at the physical address I
mapped it to. At this point, I noticed that get_free_page() returns a
4K page frame above 256MB, which means that in reality, it's an address
within a large page. SLB entry created by do_slb_bolted likewise has the
large page bit set. I changed my code to create an HPTE mapping for the
large page, and finally I get a sensible result: changes to the remapped
page show up on the physical page. Note that even though I create a
mapping for the whole large page, I only write to the 4K chunk that
corresponds to the address returned by get_free_page() -- I do not want
to clobber random memory.

In summary, mapping the first large page of the 0xb00001FFF segment
works, but mapping any other within that segment causes a kernel crash.
There must be something I don't understand about how large pages fit
into the HPT. Could you point me to documentation on the large page
extensions of power4, and, while we are at it, documentation on the SLB?
So far, I simply guessed on how it works, based on the code I see in the
kernel.

For what it's worth, here is (roughly) the relevant code I am using:

frame = get_free_page(GFP_KERNEL);
pa = (unsigned long)__v2a(frame) & 0xFFFFFFFFFF000000;
//want physical address to point to the corresponding large page.
ea = 0xb00001FFFF000000;
vsid = get_kernel_vsid(ea);
va = ( vsid << 28 ) | ( ea & 0xfffffff );
vpn = va >> PAGE_SHIFT;
rpn = pa >> PAGE_SHIFT;
hpteflags = _PAGE_ACCESSED|_PAGE_COHERENT|PP_RWXX;
slot = ppc_md->hpte_insert(vpn, rpn, hpteflags, 1, 1);
smallpage_offset = ( (unsigned long) __v2a(frame) - pa);
return ea + smallpage_offset;
//only access the relevant 4K chunk within the large page

>
> > As for why I thought 0xbff would work, I reasoned that
> > since the highest bits are masked out in get_kernel_vsid(), and since
> > nobody else is using the 0xb region, it doesn't matter if I get a VSID
> > that is the same as some other VSID in 0xb region. However, I did not
> > consider the bug in do_slb_bolted that you describe below.
>
> Yes, with that bug the collision can be with a segment anywhere, not
> just in the 0xb region.

OK, I will deal with this, somehow. Binary patch idea might just work.

> Though, come to that, you do only need one segment, so it might not be
> that hard to binary patch in a branch to some code of your own which
> provides a VSID for that one segment.
>
> > It's starting to sound like an impossible task (at least on non-recent
> > kernels). I think I might go with a backup suboptimal solution, which
> > involves extra jumps, but at least it might work.
>
> That may be a better idea.

I'd like to avoid this, but if I only have to incur this for the binary
patch to do_slb_bolted, I might be fine.

Thanks,
Igor

From jschopp at austin.ibm.com Sat Oct 2 06:41:55 2004
From: jschopp at austin.ibm.com (Joel Schopp)
Date: Fri, 01 Oct 2004 15:41:55 -0500
Subject: [PATCH][0/2] ppc64 pre/post boot memory macros
Message-ID: <415DC113.1080007@austin.ibm.com>

I'm sending two patches for review and passing upstream. The basic idea
is that these patches put in place some macros such that memory
management can be easily split into pre and post boot. This is based on
the work of Mike Kravetz and Dave Hansen.
It should be harmless, as the new macros are currently defined to the
same thing the old macros were. It is also isolated to ppc64 files, so
the other arch guys don't need to worry.

Ultimately my motivation is to move toward hotplug memory. Acceptance of
these patches will allow us to carry smaller patches out of mainline and
ease our development greatly.

Comments/feedback/flames welcome.

Patches are against 2.6.9-rc3 and have been boot tested on Power5 LPAR.

From jschopp at austin.ibm.com Sat Oct 2 06:43:27 2004
From: jschopp at austin.ibm.com (Joel Schopp)
Date: Fri, 01 Oct 2004 15:43:27 -0500
Subject: [PATCH][1/2] ppc64 pre/post boot memory macros
In-Reply-To: <415DC113.1080007@austin.ibm.com>
References: <415DC113.1080007@austin.ibm.com>
Message-ID: <415DC16F.7030402@austin.ibm.com>

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ppc64-daveh.patch
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041001/14bca17c/attachment.txt

From jschopp at austin.ibm.com Sat Oct 2 06:43:56 2004
From: jschopp at austin.ibm.com (Joel Schopp)
Date: Fri, 01 Oct 2004 15:43:56 -0500
Subject: [PATCH][2/2] ppc64 pre/post boot memory macros
In-Reply-To: <415DC113.1080007@austin.ibm.com>
References: <415DC113.1080007@austin.ibm.com>
Message-ID: <415DC18C.6040501@austin.ibm.com>

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ppc64-dave-hmore.patch
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041001/52e74dd2/attachment.txt

From schwab at suse.de Sat Oct 2 07:40:04 2004
From: schwab at suse.de (Andreas Schwab)
Date: Fri, 01 Oct 2004 23:40:04 +0200
Subject: Machine check during PCI scan on PMac G5
Message-ID:

Has anyone been able to get 2.6.9-rc3 running on the new PMacs
(PowerMac7,3)? I'm getting a machine check during PCI scan in
u3_ht_read_config while doing in_8 on 0xe00000008094800e.

Andreas.

--
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

From benh at kernel.crashing.org Sat Oct 2 21:17:01 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 02 Oct 2004 21:17:01 +1000
Subject: Machine check during PCI scan on PMac G5
In-Reply-To:
References:
Message-ID: <1096715821.26913.35.camel@gaston>

On Sat, 2004-10-02 at 07:40, Andreas Schwab wrote:
> Has anyone been able to get 2.6.9-rc3 running on the new PMacs
> (PowerMac7,3)? I'm getting a machine check during PCI scan in
> u3_ht_read_config while doing in_8 on 0xe00000008094800e.

Argh... again ! Looks like the box doesn't like us to probe the
PCI device that is there. Can you print out the precise devfn
bus number & offset where the machine check happens ?

I wonder if it's something that is turned off by the firmware
like one of the K2 internal USB1 controllers that are unused
on this machine. K2 is notoriously allergic to us probing things
that are turned off.

This patch should help by preventing the config space accesses to
occur on those devices that aren't in the device-tree, I'll push it
to Linus as a temporary fix if you confirm it works.

Ben.

===== arch/ppc64/kernel/pmac_pci.c 1.5 vs edited =====
--- 1.5/arch/ppc64/kernel/pmac_pci.c 2004-07-25 14:51:52 +10:00
+++ edited/arch/ppc64/kernel/pmac_pci.c 2004-08-04 10:26:07 +10:00
@@ -271,7 +271,7 @@
                                  int offset, int len, u32 *val)
 {
         struct pci_controller *hose;
-        struct device_node *busdn;
+        struct device_node *busdn, *dn;
         unsigned long addr;
 
         if (bus->self)
@@ -282,6 +282,16 @@
                 return PCIBIOS_DEVICE_NOT_FOUND;
         hose = busdn->phb;
         if (hose == NULL)
+                return PCIBIOS_DEVICE_NOT_FOUND;
+
+        /* We only allow config cycles to devices that are in OF device-tree
+         * as we are apparently having some weird things going on with some
+         * revs of K2 on recent G5s
+         */
+        for (dn = busdn->child; dn; dn = dn->sibling)
+                if (dn->devfn == devfn)
+                        break;
+        if (dn == NULL)
                 return PCIBIOS_DEVICE_NOT_FOUND;
 
         addr = u3_ht_cfg_access(hose, bus->number, devfn, offset);

From benh at kernel.crashing.org Sat Oct 2 21:21:57 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sat, 02 Oct 2004 21:21:57 +1000
Subject: XServe Node running a debian with only one processor
In-Reply-To: <1096646654l.2901l.2l@ipnnarval>
References: <1096546729l.32147l.0l@ipnnarval> <1096548321l.32616l.0l@ipnnarval> <1096646654l.2901l.2l@ipnnarval>
Message-ID: <1096716117.3634.40.camel@gaston>

On Sat, 2004-10-02 at 02:04, grave wrote:
> Got the xserve booting
> (thanks to http://ozlabs.org/ppc64-patches/patch.pl?id=59)
>
> But I can only run a single CPU kernel, does somebody know how to get
> the second CPU on ?
>
> The kernel is a ppc64 one with smp compiled in but only able to boot
> with nosmp option
> (kernel from kernel.org + patch to setup.c and pmac_features.c)
>
> Thanks in advance for any hint...

What exact version ? what patches ? What happens (last printed on
serial console) if you try to boot SMP ?

Ben.

From schwab at suse.de Sun Oct 3 05:50:54 2004
From: schwab at suse.de (Andreas Schwab)
Date: Sat, 02 Oct 2004 21:50:54 +0200
Subject: Machine check during PCI scan on PMac G5
In-Reply-To: <1096715821.26913.35.camel@gaston> (Benjamin Herrenschmidt's message of "Sat, 02 Oct 2004 21:17:01 +1000")
References: <1096715821.26913.35.camel@gaston>
Message-ID:

Benjamin Herrenschmidt writes:

> Argh... again ! Looks like the box doesn't like us to probe the
> PCI device that is there. Can you print out the precise devfn
> bus number & offset where the machine check happens ?

The first occurrence is devfn 48, bus number 0, offset 14.

> This patch should help by preventing the config space accesses to
> occur on those devices that aren't in the device-tree, I'll push it
> to Linus as a temporary fix if you confirm it works.

Thanks, I can confirm that it works.

Andreas.

--
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

From benh at kernel.crashing.org Sun Oct 3 10:38:38 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sun, 03 Oct 2004 10:38:38 +1000
Subject: [PATCH] Fix booting on some recent G5s
Message-ID: <1096763918.26914.63.camel@gaston>

Hi !

Some recent G5s have a problem with PCI/HT probing. They crash (machine
check) during the probe of some slot numbers, it seems to be related to
some functions being disabled by the firmware inside the K2 ASIC.

This patch limits the config space accesses to devices that are present
in the OF device-tree. This fixes the problem and shouldn't "add" any
limitation. If you plug a "random" PCI card with no OF driver, the
firmware will still build a node for it with the default set of
properties created from the config space.

Ben.
Signed-off-by: Benjamin Herrenschmidt --- 1.5/arch/ppc64/kernel/pmac_pci.c 2004-07-25 14:51:52 +10:00 +++ edited/arch/ppc64/kernel/pmac_pci.c 2004-08-04 10:26:07 +10:00 @@ -271,7 +271,7 @@ int offset, int len, u32 *val) { struct pci_controller *hose; - struct device_node *busdn; + struct device_node *busdn, *dn; unsigned long addr; if (bus->self) @@ -282,6 +282,16 @@ return PCIBIOS_DEVICE_NOT_FOUND; hose = busdn->phb; if (hose == NULL) + return PCIBIOS_DEVICE_NOT_FOUND; + + /* We only allow config cycles to devices that are in OF device-tree + * as we are apparently having some weird things going on with some + * revs of K2 on recent G5s + */ + for (dn = busdn->child; dn; dn = dn->sibling) + if (dn->devfn == devfn) + break; + if (dn == NULL) return PCIBIOS_DEVICE_NOT_FOUND; addr = u3_ht_cfg_access(hose, bus->number, devfn, offset); --- 1.21/arch/ppc/platforms/pmac_pci.c 2004-07-29 14:58:35 +10:00 +++ edited/arch/ppc/platforms/pmac_pci.c 2004-08-17 14:18:09 +10:00 @@ -315,6 +315,10 @@ unsigned int addr; int i; + struct device_node *np = pci_busdev_to_OF_node(bus, devfn); + if (np == NULL) + return PCIBIOS_DEVICE_NOT_FOUND; + /* * When a device in K2 is powered down, we die on config * cycle accesses. Fix that here. @@ -362,6 +366,9 @@ unsigned int addr; int i; + struct device_node *np = pci_busdev_to_OF_node(bus, devfn); + if (np == NULL) + return PCIBIOS_DEVICE_NOT_FOUND; /* * When a device in K2 is powered down, we die on config * cycle accesses. Fix that here. From benh at kernel.crashing.org Sun Oct 3 10:51:46 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 03 Oct 2004 10:51:46 +1000 Subject: Machine check during PCI scan on PMac G5 In-Reply-To: <4231083A-14D4-11D9-AE7A-000A95A4DC02@kernel.crashing.org> References: <1096715821.26913.35.camel@gaston> <4231083A-14D4-11D9-AE7A-000A95A4DC02@kernel.crashing.org> Message-ID: <1096764706.11996.77.camel@gaston> On Sun, 2004-10-03 at 10:36, Segher Boessenkool wrote: > >> Argh... again ! 
Looks like the box doesn't like us to probe the
> >> PCI device that is there. Can you print out the precise devfn
> >> bus number & offset where the machine check happens ?
> >
> > The first occurrence is devfn 48, bus number 0, offset 14.
>
> That's the "header type" field on the GEM shim.
>
> I'd rather not have this fixed by the device-tree check, for
> various reasons; note that this issue probably is related to the
> "config space not readable while GEM is in sleep mode" problem
> on older Macs. Is the GEM powered on during boot, on these boxes?

I'm surprised, I'm not sure it's actually GEM (Andreas, is the Sungem
properly functioning on this box after this fix ?). I think the
numbering of the shims can change from firmware to firmware, it's
more probably one of the USBs. There is code in pmac_feature.c to
power up the GEM (but only if it has a device-node).

I think the proper solution is the filter from the device-tree on
Apple G5s, at least for now, though OF itself probably has a property
somewhere that tells it which slots to probe and not to probe, I need
to find it.

Ben.

From segher at kernel.crashing.org Sun Oct 3 10:36:26 2004
From: segher at kernel.crashing.org (Segher Boessenkool)
Date: Sat, 2 Oct 2004 19:36:26 -0500
Subject: Machine check during PCI scan on PMac G5
In-Reply-To:
References: <1096715821.26913.35.camel@gaston>
Message-ID: <4231083A-14D4-11D9-AE7A-000A95A4DC02@kernel.crashing.org>

>> Argh... again ! Looks like the box doesn't like us to probe the
>> PCI device that is there. Can you print out the precise devfn
>> bus number & offset where the machine check happens ?
>
> The first occurrence is devfn 48, bus number 0, offset 14.

That's the "header type" field on the GEM shim.

I'd rather not have this fixed by the device-tree check, for
various reasons; note that this issue probably is related to the
"config space not readable while GEM is in sleep mode" problem
on older Macs. Is the GEM powered on during boot, on these boxes?

Segher

From benh at kernel.crashing.org Sun Oct 3 13:44:44 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sun, 03 Oct 2004 13:44:44 +1000
Subject: Machine check during PCI scan on PMac G5
In-Reply-To: <1096764706.11996.77.camel@gaston>
References: <1096715821.26913.35.camel@gaston> <4231083A-14D4-11D9-AE7A-000A95A4DC02@kernel.crashing.org> <1096764706.11996.77.camel@gaston>
Message-ID: <1096775084.9539.4.camel@gaston>

> I'm surprised, I'm not sure it's actually GEM (Andreas, is the Sungem
> properly functioning on this box after this fix ?). I think the
> numbering of the shims can change from firmware to firmware, it's
> more probably one of the USBs. There is code in pmac_feature.c to
> power up the GEM (but only if it has a device-node).

Ok, after digging in the OF code, it seems that on machines without a
PCI-X bridge, shim 6 is just not used and the stuff is really upset
when we probe it. K2 is a weird beast that needs care...

Ben.

From schwab at suse.de Sun Oct 3 21:52:54 2004
From: schwab at suse.de (Andreas Schwab)
Date: Sun, 03 Oct 2004 13:52:54 +0200
Subject: Machine check during PCI scan on PMac G5
In-Reply-To: <1096764706.11996.77.camel@gaston> (Benjamin Herrenschmidt's message of "Sun, 03 Oct 2004 10:51:46 +1000")
References: <1096715821.26913.35.camel@gaston> <4231083A-14D4-11D9-AE7A-000A95A4DC02@kernel.crashing.org> <1096764706.11996.77.camel@gaston>
Message-ID:

Benjamin Herrenschmidt writes:

> I'm surprised, I'm not sure it's actually GEM (Andreas, is the Sungem
> properly functioning on this box after this fix ?).

It appears to be.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
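
For readers following the thread above: under the standard PCI encoding, devfn 48 decodes to slot 6, function 0, which lines up with the "shim 6" remark, and offset 14 (0x0e) is the header type byte. The sketch below illustrates that decoding together with the kind of device-tree filter the patch applies before issuing a config cycle. It is only an illustration, not the kernel code: struct of_node and the helpers devfn_slot(), devfn_func() and devfn_in_tree() are hypothetical names, and the real code walks struct device_node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct device_node. */
struct of_node {
	unsigned int devfn;       /* encoded slot/function of this node */
	struct of_node *child;    /* first child node */
	struct of_node *sibling;  /* next node at the same level */
};

/* devfn packs the slot number in bits 7..3 and the function in bits 2..0. */
static unsigned int devfn_slot(unsigned int devfn)
{
	return (devfn >> 3) & 0x1f;
}

static unsigned int devfn_func(unsigned int devfn)
{
	return devfn & 0x07;
}

/*
 * Return 1 if one of the direct children of 'bus' carries the given
 * devfn, 0 otherwise. A caller would refuse the config access (the
 * patch returns PCIBIOS_DEVICE_NOT_FOUND) when this returns 0.
 */
static int devfn_in_tree(const struct of_node *bus, unsigned int devfn)
{
	const struct of_node *dn;

	assert(bus != NULL);
	for (dn = bus->child; dn != NULL; dn = dn->sibling)
		if (dn->devfn == devfn)
			return 1;
	return 0;
}
```

With a bus node whose children sit at devfn 0x08 and 0x18, devfn_in_tree() accepts those two and rejects devfn 48, so the unused shim would never be probed.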

From schwab at suse.de Sun Oct 3 21:59:47 2004
From: schwab at suse.de (Andreas Schwab)
Date: Sun, 03 Oct 2004 13:59:47 +0200
Subject: PM72 works also on PowerMac7,3
Message-ID:

The therm_pm72 driver appears to work fine on the PowerMac7,3.

Andreas.

Signed-off-by: Andreas Schwab

--- linux-2.6/drivers/macintosh/therm_pm72.c.~1~	2004-08-19 11:31:30.000000000 +0200
+++ linux-2.6/drivers/macintosh/therm_pm72.c	2004-10-03 13:55:22.361631501 +0200
@@ -1301,7 +1301,8 @@ static int __init therm_pm72_init(void)
 {
 	struct device_node *np;
 
-	if (!machine_is_compatible("PowerMac7,2"))
+	if (!machine_is_compatible("PowerMac7,2") &&
+	    !machine_is_compatible("PowerMac7,3"))
 		return -ENODEV;
 
 	printk(KERN_INFO "PowerMac G5 Thermal control driver %s\n", VERSION);

-- 
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

From benh at kernel.crashing.org Sun Oct 3 22:06:45 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Sun, 03 Oct 2004 22:06:45 +1000
Subject: PM72 works also on PowerMac7,3
In-Reply-To:
References:
Message-ID: <1096805205.9514.17.camel@gaston>

On Sun, 2004-10-03 at 21:59, Andreas Schwab wrote:
> The therm_pm72 driver appears to work fine on the PowerMac7,3.

Before committing this, I'd rather make sure the code & fan IDs are
actually the same in Darwin. Also, just allowing the 7,3 may enable
the code on the new water-cooled machines. Before doing so, I'd rather
make sure we get that right too. I'm waiting for one of these to be
delivered by Apple, they seem to take ages, but hopefully, it should
be there soon.

Ben.

From schwab at suse.de Sun Oct 3 22:20:43 2004
From: schwab at suse.de (Andreas Schwab)
Date: Sun, 03 Oct 2004 14:20:43 +0200
Subject: Properly recognize PowerMac7,3
Message-ID:

Make the PowerMac7,3 no longer unknown.

Andreas.
Signed-off-by: Andreas Schwab --- linux-2.6/arch/ppc64/kernel/pmac_feature.c.~1~ 2004-09-28 00:28:34.000000000 +0200 +++ linux-2.6/arch/ppc64/kernel/pmac_feature.c 2004-10-03 14:17:03.458461540 +0200 @@ -343,6 +343,10 @@ static struct pmac_mb_def pmac_mb_defs[] PMAC_TYPE_POWERMAC_G5, g5_features, 0, }, + { "PowerMac7,3", "PowerMac G5", + PMAC_TYPE_POWERMAC_G5, g5_features, + 0, + }, { "RackMac3,1", "XServe G5", PMAC_TYPE_POWERMAC_G5, g5_features, 0, -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux AG, Maxfeldstra?e 5, 90409 N?rnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From grave at ipno.in2p3.fr Mon Oct 4 17:59:44 2004 From: grave at ipno.in2p3.fr (grave) Date: Mon, 04 Oct 2004 07:59:44 +0000 Subject: =?iso-8859-1?q?Re=A0=3A?= XServe Node running a debian with only one processor In-Reply-To: <415D85D1.4040701@austin.ibm.com> (from olof@austin.ibm.com on Fri Oct 1 18:29:05 2004) References: <1096546729l.32147l.0l@ipnnarval> <1096548321l.32616l.0l@ipnnarval> <1096646654l.2901l.2l@ipnnarval> <415D85D1.4040701@austin.ibm.com> Message-ID: <1096876784l.19627l.4l@ipnnarval> Here are a few informations : console output in the joined file I use a cross compiler ppc32 -> ppc64 gcc-3.4.1 GNU ld version 2.15 kernel from ftp.kernel.org 2.6.6 patch (had to apply it reversed because of the initial diff I think) : diff -ur linux-2.6.6-working/arch/ppc64/kernel/pmac_feature.c linux-2.6.6/arch/ppc64/kernel/pmac_feature.c --- linux-2.6.6-working/arch/ppc64/kernel/pmac_feature.c 2004-05-13 17:00:12.000000000 -0600 +++ linux-2.6.6/arch/ppc64/kernel/pmac_feature.c 2004-05-09 20:32:54.000000000 -0600 @@ -343,10 +343,6 @@ PMAC_TYPE_POWERMAC_G5, g5_features, 0, }, - { "RackMac3,1", "XServe G5", - PMAC_TYPE_POWERMAC_G5, g5_features, - 0, - }, }; /* diff -ur linux-2.6.6-working/arch/ppc64/kernel/setup.c linux-2.6.6/ arch/ppc64/kernel/setup.c --- linux-2.6.6-working/arch/ppc64/kernel/setup.c 
2004-05-13 16:06:33.000000000 -0600 +++ linux-2.6.6/arch/ppc64/kernel/setup.c 2004-05-09 20:32:29.000000000 -0600 @@ -547,7 +547,7 @@ int __init ppc_init(void) { /* clear the progress line */ - if(ppc_md.progress) ppc_md.progress(" ", 0xffff); + ppc_md.progress(" ", 0xffff); if (ppc_md.init != NULL) { ppc_md.init(); -------------- next part -------------- Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes) Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes) Mount-cache hash table entries: 256 (order: 0, 4096 bytes) POSIX conformance testing by UNIFIX PowerMac SMP probe found 2 cpus Processor 1 found. Synchronizing timebase Got ack score 299, offset 1000 score 299, offset 500 score -299, offset 250 score 299, offset 375 score -299, offset 312 score -299, offset 343 score -299, offset 359 score -299, offset 367 score -283, offset 371 score -247, offset 373 score 133, offset 374 score -239, offset 373 Min 373 (score -237), Max 374 (score 129) Final offset: 374 (127/300) Brought up 2 CPUs a few seconds and : [c0000000000172f4] .kernel_thread+0x4c/0x68 <0>Kernel panic: Attempted to k00025ef1c0[1] 'swapper' THREAD: c0000000025e8000 CPU: 0 GPR00: C000000000077E90 C0000000025EBAE0 C00000000045EA58 FFFFFFFFFFFFFFFF GPR04: 0000000000000DE7 FFFFFFFFFFFFFFFF C000000000397200 C000000000397218 GPR08: 0000000000000000 C000000000359180 0000000000000001 C0000000004AE730 GPR12: 0000000088004044 C000000000308000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000001000 GPR20: C0000000004B16E0 000000000000007F 0000000000000008 000000000000001B GPR24: C00000007DFCDD80 C0000000025EBBE0 0000000000000036 0000000000000080 GPR28: C0000000025EBBE0 C0000000004364F0 C0000000003A5628 0000000000000000 NIP [c000000000077e9c] .smp_call_function_all_cpus+0x7c/0x98 LR [c000000000077e90] .smp_call_function_all_cpus+0x70/0x98 Call Trace: [c000000000079c28] .do_tune_cpucache+0xb4/0x3fc [c00000000007a050] 
.enable_cpucache+0xe0/0x118 [c00000000007a7b0] .kmem_cache_create+0x728/0x79c [c0000000002f5448] .sk_init+0x30/0xdc [c0000000002f539c] .sock_init+0x3c/0xb8 [c00000000000c6ec] .init+0x238/0x43c [c0000000000172f4] .kernel_thread+0x4c/0x68 <0>Kernel panic: Attempted to kill init! smp_call_function on cpu 0: other cpus not responding (0) Rebooting in 180 seconds.. From grave at ipno.in2p3.fr Mon Oct 4 18:48:26 2004 From: grave at ipno.in2p3.fr (grave) Date: Mon, 04 Oct 2004 08:48:26 +0000 Subject: discovered the patch pages and how it work on penguinppc64.org sorry for the previous mail... Message-ID: <1096879706l.20867l.0l@ipnnarval> http://ozlabs.org/ppc64-patches/patch.pl?id=62 make the all things going right ! One more time sorry... From benh at kernel.crashing.org Mon Oct 4 18:55:29 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 04 Oct 2004 18:55:29 +1000 Subject: discovered the patch pages and how it work on penguinppc64.org sorry for the previous mail... In-Reply-To: <1096879706l.20867l.0l@ipnnarval> References: <1096879706l.20867l.0l@ipnnarval> Message-ID: <1096880129.9514.70.camel@gaston> On Mon, 2004-10-04 at 18:48, grave wrote: > http://ozlabs.org/ppc64-patches/patch.pl?id=62 make the all things > going right ! > > One more time sorry... Hrm, that should be in Linus tree already... Ben. From grave at ipno.in2p3.fr Mon Oct 4 22:31:21 2004 From: grave at ipno.in2p3.fr (grave) Date: Mon, 04 Oct 2004 12:31:21 +0000 Subject: =?iso-8859-1?q?Re=A0=3A?= discovered the patch pages and how it work on penguinppc64.org sorry for the previous mail... 
In-Reply-To: <1096880129.9514.70.camel@gaston> (from benh@kernel.crashing.org on Mon Oct 4 10:55:29 2004) References: <1096879706l.20867l.0l@ipnnarval> <1096880129.9514.70.camel@gaston> Message-ID: <1096893081l.23876l.0l@ipnnarval> On 04.10.2004 10:55:29, Benjamin Herrenschmidt wrote: > On Mon, 2004-10-04 at 18:48, grave wrote: > > http://ozlabs.org/ppc64-patches/patch.pl?id=62 make the all things > > going right ! > > > > One more time sorry... > > Hrm, that should be in Linus tree already... Not in the 2.6.6 tree from www.kernel.org It's present in 2.6.8.1 but this one crash at boot (see attached file). This kernel also crash if I use the nosmp option xavier -------------- next part -------------- Min 8 (score -13), Max 9 (score 51) Final offset: 8 (9/300) Brought up 2 CPUs NET: Registered protocol family 16 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2 POWERMAC NIP: C0000000002DA15C XER: 0000000000000000 LR: C00000000000C600 REGS: c0000000027e7be0 TRAP: 0300 Not tainted (2.6.8.1) MSR: 9000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 DAR: 0000000000000000, DSISR: 0000000008000000 TASK: c0000000027e1200[1] 'swapper' THREAD: c0000000027e4000 CPU: 0 GPR00: C00000000000C600 C0000000027E7E60 C000000000437E78 C0000000002AEC28 GPR04: 000000000000FFFF 0000000000000000 C000000000493C48 C00000007DE5BD78 GPR08: 0000000000000002 0000000000000000 0000000000000002 0000000000000000 GPR12: 0000000028000042 C000000000304000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000220000 0000000000230000 0000000001400000 GPR24: C000000000304000 C000000000435008 C0000000002F4348 C0000000002F8268 GPR28: 0000000000000000 C000000000436420 C000000000364260 C0000000002F7F30 NIP [c0000000002da15c] .ppc_init+0x30/0xa4 LR [c00000000000c600] .init+0x234/0x428 Call Trace: [c0000000027e7e60] [c0000000027e7ef0] 0xc0000000027e7ef0 (unreliable) [c0000000027e7ef0] [c00000000000c600] 
.init+0x234/0x428 [c0000000027e7f90] [c000000000017734] .kernel_thread+0x4c/0x68 <0>Kernel panic: Attempted to kill init! <0>Rebooting in 180 seconds.. From benh at kernel.crashing.org Mon Oct 4 23:32:23 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Mon, 04 Oct 2004 23:32:23 +1000 Subject: =?iso-8859-1?q?Re=3A_Re=C2=A0=3A_discovered_the_patch_pages_and_?= =?iso-8859-1?q?how_it_work_on=0D=0A=09penguinppc64=2Eorg_sorry_for_th?= =?iso-8859-1?q?e_previous_mail=2E=2E=2E?= In-Reply-To: <1096893081l.23876l.0l@ipnnarval> References: <1096879706l.20867l.0l@ipnnarval> <1096880129.9514.70.camel@gaston> <1096893081l.23876l.0l@ipnnarval> Message-ID: <1096896743.9516.84.camel@gaston> On Mon, 2004-10-04 at 22:31, grave wrote: > On 04.10.2004 10:55:29, Benjamin Herrenschmidt wrote: > > On Mon, 2004-10-04 at 18:48, grave wrote: > > > http://ozlabs.org/ppc64-patches/patch.pl?id=62 make the all things > > > going right ! > > > > > > One more time sorry... > > > > Hrm, that should be in Linus tree already... > Not in the 2.6.6 tree from www.kernel.org > > It's present in 2.6.8.1 but this one crash at boot (see attached file). > This kernel also crash if I use the nosmp option Can you try 2.6.9-rc3 and let me know ? Or beter, the current bk snapshot of 2.6.9 Ben. From grave at ipno.in2p3.fr Mon Oct 4 23:48:28 2004 From: grave at ipno.in2p3.fr (grave) Date: Mon, 04 Oct 2004 13:48:28 +0000 Subject: =?iso-8859-1?q?Re=A0=3A_Re=A0=3A?= discovered the patch pages and how it work on penguinppc64.org sorry for the previous mail... In-Reply-To: <1096896743.9516.84.camel@gaston> (from benh@kernel.crashing.org on Mon Oct 4 15:32:23 2004) References: <1096879706l.20867l.0l@ipnnarval> <1096880129.9514.70.camel@gaston> <1096893081l.23876l.0l@ipnnarval> <1096896743.9516.84.camel@gaston> Message-ID: <1096897708l.24855l.1l@ipnnarval> > Can you try 2.6.9-rc3 and let me know ? 
Or better, the current bk
> snapshot of 2.6.9

I also tried with the bk tree (2.6.9-rc1-ames), it also crashed...
I'll retry in order to send you a log of the crash...

Where can I get the 2.6.9-rc3 tree ? I didn't find where it is ?

I'm trying to have something "better" than 2.6.6 in order to have
thermal management.

xavier

From benh at kernel.crashing.org Mon Oct 4 23:45:50 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Mon, 04 Oct 2004 23:45:50 +1000
Subject: Re: Re : Re : discovered the patch pages and how it work on penguinppc64.org sorry for the previous mail...
In-Reply-To: <1096897708l.24855l.1l@ipnnarval>
References: <1096879706l.20867l.0l@ipnnarval> <1096880129.9514.70.camel@gaston> <1096893081l.23876l.0l@ipnnarval> <1096896743.9516.84.camel@gaston> <1096897708l.24855l.1l@ipnnarval>
Message-ID: <1096897549.23141.93.camel@gaston>

On Mon, 2004-10-04 at 23:48, grave wrote:
> > Can you try 2.6.9-rc3 and let me know ? Or better, the current bk
> > snapshot of 2.6.9
>
> I also tried with the bk tree (2.6.9-rc1-ames), it also crashed...
> I'll retry in order to send you a log of the crash...

ames ? just use mainstream

> Where can I get the 2.6.9-rc3 tree ? I didn't find where it is ?

kernel.org ?

> I'm trying to have something "better" than 2.6.6 in order to have
> thermal management.
>
> xavier
-- 
Benjamin Herrenschmidt

From benh at kernel.crashing.org Mon Oct 4 23:46:54 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Mon, 04 Oct 2004 23:46:54 +1000
Subject: Re : Re : discovered the patch pages and how it work on penguinppc64.org sorry for the previous mail...
In-Reply-To: <1096897549.23141.93.camel@gaston>
References: <1096879706l.20867l.0l@ipnnarval> <1096880129.9514.70.camel@gaston> <1096893081l.23876l.0l@ipnnarval> <1096896743.9516.84.camel@gaston> <1096897708l.24855l.1l@ipnnarval> <1096897549.23141.93.camel@gaston>
Message-ID: <1096897613.9539.95.camel@gaston>

On Mon, 2004-10-04 at 23:45, Benjamin Herrenschmidt wrote:
> > I'm trying to have something "better" than 2.6.6 in order to have
> > thermal management.

BTW. Thermal control isn't there yet for xserve's ... soon hopefully

Ben.

From moilanen at austin.ibm.com Tue Oct 5 05:43:05 2004
From: moilanen at austin.ibm.com (moilanen at austin.ibm.com)
Date: Mon, 4 Oct 2004 14:43:05 -0500
Subject: [PATCH 1/1] rtas_flash_4gig
Message-ID: <200410041942.i94Jg4WA154540@westrelay04.boulder.ibm.com>

We should probably check that none of the flash list headers are above
4gig, not just the first one. We could see this situation happen if we
are low on memory and get a page allocated that's over the 4 gig
boundary.

Jake

Signed-off-by: Jake Moilanen

---

diff -puN arch/ppc64/kernel/rtas.c~rtas_flash_4gig arch/ppc64/kernel/rtas.c
--- linux-2.6-bk/arch/ppc64/kernel/rtas.c~rtas_flash_4gig	Mon Oct 4 10:46:46 2004
+++ linux-2.6-bk-moilanen/arch/ppc64/kernel/rtas.c	Mon Oct 4 14:22:31 2004
@@ -338,6 +338,12 @@ rtas_flash_firmware(void)
 			f->next = (struct flash_block_list *)virt_to_abs(f->next);
 		else
 			f->next = NULL;
+
+		if (f->next >= 4UL*1024*1024*1024) {
+			printk(KERN_ALERT "FLASH: aborted...flash list header addr above 4GB\n");
+			return;
+		}
+
 		/* make num_blocks into the version/length field */
 		f->num_blocks = (FLASH_BLOCK_LIST_VERSION << 56) | ((f->num_blocks+1)*16);
 	}
_

From schwab at suse.de Tue Oct 5 06:51:55 2004
From: schwab at suse.de (Andreas Schwab)
Date: Mon, 04 Oct 2004 22:51:55 +0200
Subject: Sound on G5
Message-ID:

Is anyone already working on sound support for the PowerMac G5, by chance?
That's actually the only thing still missing.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

From benh at kernel.crashing.org Tue Oct 5 11:25:03 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 05 Oct 2004 11:25:03 +1000
Subject: Sound on G5
In-Reply-To:
References:
Message-ID: <1096939502.24584.6.camel@gaston>

On Tue, 2004-10-05 at 06:51, Andreas Schwab wrote:
> Is anyone already working on sound support for the PowerMac G5, by chance?
> That's actually the only thing still missing.

Nobody really seriously ATM. One of the main issues is that the darwin
driver abuses apple "do-platform-*" shit. It's a mechanism they invented
to put sort-of "scripts" (in binary form) in the device-tree that can
contain elementary ops such as write GPIOs, I2C, etc... This is
extremely messy and difficult to parse. I have written the basis for
parsing them, but interpreting them is even more shitty as the actual
implementation of each op sort-of depends on the target object. It's
really a piece-of-shit imho.

So we could go that way and complete my "interpreter" or just hard code
all of the GPIOs we need in the driver hoping apple don't shuffle them
too much in upcoming models...

Ben.

From david at gibson.dropbear.id.au Tue Oct 5 13:13:41 2004
From: david at gibson.dropbear.id.au (David Gibson)
Date: Tue, 5 Oct 2004 13:13:41 +1000
Subject: [PPC64] Squash EEH warnings
Message-ID: <20041005031341.GA3695@zax>

Andrew, please apply:

A slightly non-ideal version of the recent patch which fixed EEH being
a no-op went in. The srcsave variable in eeh_memcpy_fromio() is now
never referenced on non-pSeries machines, and so spews hundreds of
warnings. The variable doesn't actually accomplish anything, so this
patch gets rid of it.

Signed-off-by: David Gibson

Index: working-2.6/include/asm-ppc64/eeh.h
===================================================================
--- working-2.6.orig/include/asm-ppc64/eeh.h	2004-10-05 10:08:10.000000000 +1000
+++ working-2.6/include/asm-ppc64/eeh.h	2004-10-05 13:09:24.730992368 +1000
@@ -196,7 +196,6 @@
 static inline void eeh_memcpy_fromio(void *dest, const volatile void __iomem *src, unsigned long n) {
 	void *vsrc = (void __force *) src;
 	void *destsave = dest;
-	const volatile void __iomem *srcsave = src;
 	unsigned long nsave = n;
 
 	while(n && (!EEH_CHECK_ALIGN(vsrc, 4) || !EEH_CHECK_ALIGN(dest, 4))) {
@@ -227,7 +226,7 @@
 	 */
 	if ((nsave >= 4) &&
 	    (EEH_POSSIBLE_ERROR((*((u32 *) destsave+nsave-4)), u32))) {
-		eeh_check_failure(srcsave, (*((u32 *) destsave+nsave-4)));
+		eeh_check_failure(src, (*((u32 *) destsave+nsave-4)));
 	}
 }
 

-- 
David Gibson			| For every complex problem there is a
david AT gibson.dropbear.id.au	| solution which is simple, neat and
				| wrong.
http://www.ozlabs.org/people/dgibson

From david at gibson.dropbear.id.au Tue Oct 5 15:26:27 2004
From: david at gibson.dropbear.id.au (David Gibson)
Date: Tue, 5 Oct 2004 15:26:27 +1000
Subject: [TRIVIAL, PPC64] Remove redundant #ifdef CONFIG_ALTIVEC
Message-ID: <20041005052627.GD3695@zax>

Andrew, please apply:

arch/ppc64/kernel/process.c has an #ifdef CONFIG_ALTIVEC within an
#ifdef CONFIG_ALTIVEC. This patch removes the inner one.

Signed-off-by: David Gibson

Index: working-2.6/arch/ppc64/kernel/process.c
===================================================================
--- working-2.6.orig/arch/ppc64/kernel/process.c	2004-10-05 10:08:10.000000000 +1000
+++ working-2.6/arch/ppc64/kernel/process.c	2004-10-05 15:18:56.581996496 +1000
@@ -147,7 +147,6 @@
  */
 void flush_altivec_to_thread(struct task_struct *tsk)
 {
-#ifdef CONFIG_ALTIVEC
 	if (tsk->thread.regs) {
 		preempt_disable();
 		if (tsk->thread.regs->msr & MSR_VEC) {
@@ -158,7 +157,6 @@
 		}
 		preempt_enable();
 	}
-#endif
 }
 
 int dump_task_altivec(struct pt_regs *regs, elf_vrregset_t *vrregs)

-- 
David Gibson			| For every complex problem there is a
david AT gibson.dropbear.id.au	| solution which is simple, neat and
				| wrong.
http://www.ozlabs.org/people/dgibson

From david at gibson.dropbear.id.au Tue Oct 5 16:42:56 2004
From: david at gibson.dropbear.id.au (David Gibson)
Date: Tue, 5 Oct 2004 16:42:56 +1000
Subject: [PPC64] xmon sparse cleanups
Message-ID: <20041005064255.GF3695@zax>

Andrew, please apply:

This patch removes many sparse warnings from the xmon code. Mostly K&R
function declarations and 0-instead-of-NULLs. I believe this removes
all save one sparse error in xmon, excepting those inherited from
header files.
Signed-off-by: David Gibson Index: working-2.6/arch/ppc64/xmon/xmon.c =================================================================== --- working-2.6.orig/arch/ppc64/xmon/xmon.c 2004-09-24 10:14:09.000000000 +1000 +++ working-2.6/arch/ppc64/xmon/xmon.c 2004-10-05 16:31:01.822963256 +1000 @@ -645,7 +645,7 @@ for (i = 0; i < NBPTS; ++i, ++bp) if (bp->enabled && pc == bp->address) return bp; - return 0; + return NULL; } static struct bpt *in_breakpoint_table(unsigned long nip, unsigned long *offp) @@ -1582,7 +1582,7 @@ extern char dec_exc; void -super_regs() +super_regs(void) { int cmd; unsigned long val; @@ -1816,7 +1816,7 @@ ""; void -memex() +memex(void) { int cmd, inc, i, nslash; unsigned long n; @@ -1967,7 +1967,7 @@ } int -bsesc() +bsesc(void) { int c; @@ -1985,7 +1985,7 @@ || ('a' <= (c) && (c) <= 'f') \ || ('A' <= (c) && (c) <= 'F')) void -dump() +dump(void) { int c; @@ -2150,7 +2150,7 @@ static unsigned mask; void -memlocate() +memlocate(void) { unsigned a, n; unsigned char val[4]; @@ -2183,7 +2183,7 @@ static unsigned long mlim = 0xffffffff; void -memzcan() +memzcan(void) { unsigned char v; unsigned a; @@ -2212,7 +2212,7 @@ /* Input scanning routines */ int -skipbl() +skipbl(void) { int c; @@ -2237,8 +2237,7 @@ }; int -scanhex(vp) -unsigned long *vp; +scanhex(unsigned long *vp) { int c, d; unsigned long v; @@ -2322,7 +2321,7 @@ } void -scannl() +scannl(void) { int c; @@ -2365,13 +2364,13 @@ static char *lineptr; void -flush_input() +flush_input(void) { lineptr = NULL; } int -inchar() +inchar(void) { if (lineptr == NULL || *lineptr == 0) { if (fgets(line, sizeof(line), stdin) == NULL) { @@ -2384,8 +2383,7 @@ } void -take_input(str) -char *str; +take_input(char *str) { lineptr = str; } Index: working-2.6/arch/ppc64/xmon/ppc-opc.c =================================================================== --- working-2.6.orig/arch/ppc64/xmon/ppc-opc.c 2004-08-09 09:51:38.000000000 +1000 +++ working-2.6/arch/ppc64/xmon/ppc-opc.c 2004-10-05 16:41:20.355047248 +1000 
@@ -20,6 +20,7 @@ Software Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ +#include #include "nonstdio.h" #include "ppc.h" @@ -110,12 +111,12 @@ /* The zero index is used to indicate the end of the list of operands. */ #define UNUSED 0 - { 0, 0, 0, 0, 0 }, + { 0, 0, NULL, NULL, 0 }, /* The BA field in an XL form instruction. */ #define BA UNUSED + 1 #define BA_MASK (0x1f << 16) - { 5, 16, 0, 0, PPC_OPERAND_CR }, + { 5, 16, NULL, NULL, PPC_OPERAND_CR }, /* The BA field in an XL form instruction when it must be the same as the BT field in the same instruction. */ @@ -125,7 +126,7 @@ /* The BB field in an XL form instruction. */ #define BB BAT + 1 #define BB_MASK (0x1f << 11) - { 5, 11, 0, 0, PPC_OPERAND_CR }, + { 5, 11, NULL, NULL, PPC_OPERAND_CR }, /* The BB field in an XL form instruction when it must be the same as the BA field in the same instruction. */ @@ -168,21 +169,21 @@ /* The BF field in an X or XL form instruction. */ #define BF BDPA + 1 - { 3, 23, 0, 0, PPC_OPERAND_CR }, + { 3, 23, NULL, NULL, PPC_OPERAND_CR }, /* An optional BF field. This is used for comparison instructions, in which an omitted BF field is taken as zero. */ #define OBF BF + 1 - { 3, 23, 0, 0, PPC_OPERAND_CR | PPC_OPERAND_OPTIONAL }, + { 3, 23, NULL, NULL, PPC_OPERAND_CR | PPC_OPERAND_OPTIONAL }, /* The BFA field in an X or XL form instruction. */ #define BFA OBF + 1 - { 3, 18, 0, 0, PPC_OPERAND_CR }, + { 3, 18, NULL, NULL, PPC_OPERAND_CR }, /* The BI field in a B form or XL form instruction. */ #define BI BFA + 1 #define BI_MASK (0x1f << 16) - { 5, 16, 0, 0, PPC_OPERAND_CR }, + { 5, 16, NULL, NULL, PPC_OPERAND_CR }, /* The BO field in a B form instruction. Certain values are illegal. */ @@ -197,36 +198,36 @@ /* The BT field in an X or XL form instruction. */ #define BT BOE + 1 - { 5, 21, 0, 0, PPC_OPERAND_CR }, + { 5, 21, NULL, NULL, PPC_OPERAND_CR }, /* The condition register number portion of the BI field in a B form or XL form instruction. 
This is used for the extended conditional branch mnemonics, which set the lower two bits of the BI field. This field is optional. */ #define CR BT + 1 - { 3, 18, 0, 0, PPC_OPERAND_CR | PPC_OPERAND_OPTIONAL }, + { 3, 18, NULL, NULL, PPC_OPERAND_CR | PPC_OPERAND_OPTIONAL }, /* The CRB field in an X form instruction. */ #define CRB CR + 1 - { 5, 6, 0, 0, 0 }, + { 5, 6, NULL, NULL, 0 }, /* The CRFD field in an X form instruction. */ #define CRFD CRB + 1 - { 3, 23, 0, 0, PPC_OPERAND_CR }, + { 3, 23, NULL, NULL, PPC_OPERAND_CR }, /* The CRFS field in an X form instruction. */ #define CRFS CRFD + 1 - { 3, 0, 0, 0, PPC_OPERAND_CR }, + { 3, 0, NULL, NULL, PPC_OPERAND_CR }, /* The CT field in an X form instruction. */ #define CT CRFS + 1 - { 5, 21, 0, 0, PPC_OPERAND_OPTIONAL }, + { 5, 21, NULL, NULL, PPC_OPERAND_OPTIONAL }, /* The D field in a D form instruction. This is a displacement off a register, and implies that the next operand is a register in parentheses. */ #define D CT + 1 - { 16, 0, 0, 0, PPC_OPERAND_PARENS | PPC_OPERAND_SIGNED }, + { 16, 0, NULL, NULL, PPC_OPERAND_PARENS | PPC_OPERAND_SIGNED }, /* The DE field in a DE form instruction. This is like D, but is 12 bits only. */ @@ -252,40 +253,40 @@ /* The E field in a wrteei instruction. */ #define E DS + 1 - { 1, 15, 0, 0, 0 }, + { 1, 15, NULL, NULL, 0 }, /* The FL1 field in a POWER SC form instruction. */ #define FL1 E + 1 - { 4, 12, 0, 0, 0 }, + { 4, 12, NULL, NULL, 0 }, /* The FL2 field in a POWER SC form instruction. */ #define FL2 FL1 + 1 - { 3, 2, 0, 0, 0 }, + { 3, 2, NULL, NULL, 0 }, /* The FLM field in an XFL form instruction. */ #define FLM FL2 + 1 - { 8, 17, 0, 0, 0 }, + { 8, 17, NULL, NULL, 0 }, /* The FRA field in an X or A form instruction. */ #define FRA FLM + 1 #define FRA_MASK (0x1f << 16) - { 5, 16, 0, 0, PPC_OPERAND_FPR }, + { 5, 16, NULL, NULL, PPC_OPERAND_FPR }, /* The FRB field in an X or A form instruction. 
*/ #define FRB FRA + 1 #define FRB_MASK (0x1f << 11) - { 5, 11, 0, 0, PPC_OPERAND_FPR }, + { 5, 11, NULL, NULL, PPC_OPERAND_FPR }, /* The FRC field in an A form instruction. */ #define FRC FRB + 1 #define FRC_MASK (0x1f << 6) - { 5, 6, 0, 0, PPC_OPERAND_FPR }, + { 5, 6, NULL, NULL, PPC_OPERAND_FPR }, /* The FRS field in an X form instruction or the FRT field in a D, X or A form instruction. */ #define FRS FRC + 1 #define FRT FRS - { 5, 21, 0, 0, PPC_OPERAND_FPR }, + { 5, 21, NULL, NULL, PPC_OPERAND_FPR }, /* The FXM field in an XFX instruction. */ #define FXM FRS + 1 @@ -298,11 +299,11 @@ /* The L field in a D or X form instruction. */ #define L FXM4 + 1 - { 1, 21, 0, 0, PPC_OPERAND_OPTIONAL }, + { 1, 21, NULL, NULL, PPC_OPERAND_OPTIONAL }, /* The LEV field in a POWER SC form instruction. */ #define LEV L + 1 - { 7, 5, 0, 0, 0 }, + { 7, 5, NULL, NULL, 0 }, /* The LI field in an I form instruction. The lower two bits are forced to zero. */ @@ -316,24 +317,24 @@ /* The LS field in an X (sync) form instruction. */ #define LS LIA + 1 - { 2, 21, 0, 0, PPC_OPERAND_OPTIONAL }, + { 2, 21, NULL, NULL, PPC_OPERAND_OPTIONAL }, /* The MB field in an M form instruction. */ #define MB LS + 1 #define MB_MASK (0x1f << 6) - { 5, 6, 0, 0, 0 }, + { 5, 6, NULL, NULL, 0 }, /* The ME field in an M form instruction. */ #define ME MB + 1 #define ME_MASK (0x1f << 1) - { 5, 1, 0, 0, 0 }, + { 5, 1, NULL, NULL, 0 }, /* The MB and ME fields in an M form instruction expressed a single operand which is a bitmask indicating which bits to select. This is a two operand form using PPC_OPERAND_NEXT. See the description in opcode/ppc.h for what this means. */ #define MBE ME + 1 - { 5, 6, 0, 0, PPC_OPERAND_OPTIONAL | PPC_OPERAND_NEXT }, + { 5, 6, NULL, NULL, PPC_OPERAND_OPTIONAL | PPC_OPERAND_NEXT }, { 32, 0, insert_mbe, extract_mbe, 0 }, /* The MB or ME field in an MD or MDS form instruction. The high @@ -345,7 +346,7 @@ /* The MO field in an mbar instruction. 
*/ #define MO MB6 + 1 - { 5, 21, 0, 0, 0 }, + { 5, 21, NULL, NULL, 0 }, /* The NB field in an X form instruction. The value 32 is stored as 0. */ @@ -361,34 +362,34 @@ /* The RA field in an D, DS, DQ, X, XO, M, or MDS form instruction. */ #define RA NSI + 1 #define RA_MASK (0x1f << 16) - { 5, 16, 0, 0, PPC_OPERAND_GPR }, + { 5, 16, NULL, NULL, PPC_OPERAND_GPR }, /* The RA field in the DQ form lq instruction, which has special value restrictions. */ #define RAQ RA + 1 - { 5, 16, insert_raq, 0, PPC_OPERAND_GPR }, + { 5, 16, insert_raq, NULL, PPC_OPERAND_GPR }, /* The RA field in a D or X form instruction which is an updating load, which means that the RA field may not be zero and may not equal the RT field. */ #define RAL RAQ + 1 - { 5, 16, insert_ral, 0, PPC_OPERAND_GPR }, + { 5, 16, insert_ral, NULL, PPC_OPERAND_GPR }, /* The RA field in an lmw instruction, which has special value restrictions. */ #define RAM RAL + 1 - { 5, 16, insert_ram, 0, PPC_OPERAND_GPR }, + { 5, 16, insert_ram, NULL, PPC_OPERAND_GPR }, /* The RA field in a D or X form instruction which is an updating store or an updating floating point load, which means that the RA field may not be zero. */ #define RAS RAM + 1 - { 5, 16, insert_ras, 0, PPC_OPERAND_GPR }, + { 5, 16, insert_ras, NULL, PPC_OPERAND_GPR }, /* The RB field in an X, XO, M, or MDS form instruction. */ #define RB RAS + 1 #define RB_MASK (0x1f << 11) - { 5, 11, 0, 0, PPC_OPERAND_GPR }, + { 5, 11, NULL, NULL, PPC_OPERAND_GPR }, /* The RB field in an X form instruction when it must be the same as the RS field in the instruction. This is used for extended @@ -402,22 +403,22 @@ #define RS RBS + 1 #define RT RS #define RT_MASK (0x1f << 21) - { 5, 21, 0, 0, PPC_OPERAND_GPR }, + { 5, 21, NULL, NULL, PPC_OPERAND_GPR }, /* The RS field of the DS form stq instruction, which has special value restrictions. 
*/ #define RSQ RS + 1 - { 5, 21, insert_rsq, 0, PPC_OPERAND_GPR }, + { 5, 21, insert_rsq, NULL, PPC_OPERAND_GPR }, /* The RT field of the DQ form lq instruction, which has special value restrictions. */ #define RTQ RSQ + 1 - { 5, 21, insert_rtq, 0, PPC_OPERAND_GPR }, + { 5, 21, insert_rtq, NULL, PPC_OPERAND_GPR }, /* The SH field in an X or M form instruction. */ #define SH RTQ + 1 #define SH_MASK (0x1f << 11) - { 5, 11, 0, 0, 0 }, + { 5, 11, NULL, NULL, 0 }, /* The SH field in an MD form instruction. This is split. */ #define SH6 SH + 1 @@ -426,12 +427,12 @@ /* The SI field in a D form instruction. */ #define SI SH6 + 1 - { 16, 0, 0, 0, PPC_OPERAND_SIGNED }, + { 16, 0, NULL, NULL, PPC_OPERAND_SIGNED }, /* The SI field in a D form instruction when we accept a wide range of positive values. */ #define SISIGNOPT SI + 1 - { 16, 0, 0, 0, PPC_OPERAND_SIGNED | PPC_OPERAND_SIGNOPT }, + { 16, 0, NULL, NULL, PPC_OPERAND_SIGNED | PPC_OPERAND_SIGNOPT }, /* The SPR field in an XFX form instruction. This is flipped--the lower 5 bits are stored in the upper 5 and vice- versa. */ @@ -443,25 +444,25 @@ /* The BAT index number in an XFX form m[ft]ibat[lu] instruction. */ #define SPRBAT SPR + 1 #define SPRBAT_MASK (0x3 << 17) - { 2, 17, 0, 0, 0 }, + { 2, 17, NULL, NULL, 0 }, /* The SPRG register number in an XFX form m[ft]sprg instruction. */ #define SPRG SPRBAT + 1 #define SPRG_MASK (0x3 << 16) - { 2, 16, 0, 0, 0 }, + { 2, 16, NULL, NULL, 0 }, /* The SR field in an X form instruction. */ #define SR SPRG + 1 - { 4, 16, 0, 0, 0 }, + { 4, 16, NULL, NULL, 0 }, /* The STRM field in an X AltiVec form instruction. */ #define STRM SR + 1 #define STRM_MASK (0x3 << 21) - { 2, 21, 0, 0, 0 }, + { 2, 21, NULL, NULL, 0 }, /* The SV field in a POWER SC form instruction. */ #define SV STRM + 1 - { 14, 2, 0, 0, 0 }, + { 14, 2, NULL, NULL, 0 }, /* The TBR field in an XFX form instruction. This is like the SPR field, but it is optional. 
*/ @@ -471,52 +472,52 @@ /* The TO field in a D or X form instruction. */ #define TO TBR + 1 #define TO_MASK (0x1f << 21) - { 5, 21, 0, 0, 0 }, + { 5, 21, NULL, NULL, 0 }, /* The U field in an X form instruction. */ #define U TO + 1 - { 4, 12, 0, 0, 0 }, + { 4, 12, NULL, NULL, 0 }, /* The UI field in a D form instruction. */ #define UI U + 1 - { 16, 0, 0, 0, 0 }, + { 16, 0, NULL, NULL, 0 }, /* The VA field in a VA, VX or VXR form instruction. */ #define VA UI + 1 #define VA_MASK (0x1f << 16) - { 5, 16, 0, 0, PPC_OPERAND_VR }, + { 5, 16, NULL, NULL, PPC_OPERAND_VR }, /* The VB field in a VA, VX or VXR form instruction. */ #define VB VA + 1 #define VB_MASK (0x1f << 11) - { 5, 11, 0, 0, PPC_OPERAND_VR }, + { 5, 11, NULL, NULL, PPC_OPERAND_VR }, /* The VC field in a VA form instruction. */ #define VC VB + 1 #define VC_MASK (0x1f << 6) - { 5, 6, 0, 0, PPC_OPERAND_VR }, + { 5, 6, NULL, NULL, PPC_OPERAND_VR }, /* The VD or VS field in a VA, VX, VXR or X form instruction. */ #define VD VC + 1 #define VS VD #define VD_MASK (0x1f << 21) - { 5, 21, 0, 0, PPC_OPERAND_VR }, + { 5, 21, NULL, NULL, PPC_OPERAND_VR }, /* The SIMM field in a VX form instruction. */ #define SIMM VD + 1 - { 5, 16, 0, 0, PPC_OPERAND_SIGNED}, + { 5, 16, NULL, NULL, PPC_OPERAND_SIGNED}, /* The UIMM field in a VX form instruction. */ #define UIMM SIMM + 1 - { 5, 16, 0, 0, 0 }, + { 5, 16, NULL, NULL, 0 }, /* The SHB field in a VA form instruction. */ #define SHB UIMM + 1 - { 4, 6, 0, 0, 0 }, + { 4, 6, NULL, NULL, 0 }, /* The other UIMM field in a EVX form instruction. */ #define EVUIMM SHB + 1 - { 5, 11, 0, 0, 0 }, + { 5, 11, NULL, NULL, 0 }, /* The other UIMM field in a half word EVX form instruction. */ #define EVUIMM_2 EVUIMM + 1 @@ -533,11 +534,11 @@ /* The WS field. 
*/ #define WS EVUIMM_8 + 1 #define WS_MASK (0x7 << 11) - { 3, 11, 0, 0, 0 }, + { 3, 11, NULL, NULL, 0 }, /* The L field in an mtmsrd instruction */ #define MTMSRD_L WS + 1 - { 1, 16, 0, 0, PPC_OPERAND_OPTIONAL }, + { 1, 16, NULL, NULL, PPC_OPERAND_OPTIONAL }, }; Index: working-2.6/arch/ppc64/xmon/start.c =================================================================== --- working-2.6.orig/arch/ppc64/xmon/start.c 2004-08-09 09:51:38.000000000 +1000 +++ working-2.6/arch/ppc64/xmon/start.c 2004-10-05 16:33:50.355028808 +1000 @@ -173,7 +173,7 @@ c = xmon_getchar(); if (c == -1) { if (p == str) - return 0; + return NULL; break; } *p++ = c; -- David Gibson | For every complex problem there is a david AT gibson.dropbear.id.au | solution which is simple, neat and | wrong. http://www.ozlabs.org/people/dgibson From grave at ipno.in2p3.fr Tue Oct 5 18:41:04 2004 From: grave at ipno.in2p3.fr (grave) Date: Tue, 05 Oct 2004 08:41:04 +0000 Subject: xserve and 2.6.9-rc3 and 2.6.9-rc3-bk4 Message-ID: <1096965664l.7230l.0l@ipnnarval> Hi, I've tryed both kernel and got crashes (see attached files). Do I missed a patch ? xavier PS:2.6.6 + smp patch run fine -------------- next part -------------- PCI: Probing PCI hardware done SCSI subsystem initialized usbcore: registered new driver usbfs usbcore: registered new driver hub nvram_init: Could not find nvram partition for nvram buffered error logging. 
rtasd: no RTAS on system devfs: 2004-01-31 Richard Gooch (rgooch at atnf.csiro.au) devfs: boot_options: 0x0 Oops: Machine check, sig: 0 [#1] SMP NR_CPUS=2 POWERMAC NIP: C00000000014A640 XER: 0000000000000000 LR: C00000000014A614 REGS: c000000001a17a50 TRAP: 0200 Not tainted (2.6.9-rc3-bk4) MSR: 9000000000101032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11 TASK: c000000001a110c0[1] 'swapper' THREAD: c000000001a14000 CPU: 0 GPR00: FFFFFFFFFFFFFFFF C000000001A17CD0 C00000000043C390 00000000000000FF GPR04: C00000000FEFB400 0000000000000010 C0000000002AAAA0 C000000000468298 GPR08: C000000000468268 E0000000828CD000 C00000000045AD5C 9000000000009032 GPR12: 0000000028000042 C000000000355780 0000000000000000 0000000000000000 GPR16: 0000000001400000 00000000016FB720 00000000016FB720 BFFFFFFFFEC00000 GPR20: 000000000023FD58 0000000000000000 0000000001A6A020 00000000016FB998 GPR24: 9000000000009032 0000000000000032 C00000000043F730 C000000000352D58 GPR28: C00000000043F728 0000000000000000 C0000000003D5718 C00000000043F730 NIP [c00000000014a640] .i8042_flush+0x6c/0x15c LR [c00000000014a614] .i8042_flush+0x40/0x15c Call Trace: [c000000001a17cd0] [c000000000355780] 0xc000000000355780 (unreliable) [c000000001a17d80] [c00000000014b240] .i8042_controller_init+0x1c/0x1e4 [c000000001a17e10] [c0000000002f4164] .i8042_init+0xe8/0x64c [c000000001a17ef0] [c00000000000c688] .init+0x234/0x440 [c000000001a17f90] [c0000000000172b8] .kernel_thread+0x4c/0x6c <0>Kernel panic - not syncing: Attempted to kill init! <0>Rebooting in 180 seconds.. -------------- next part -------------- PCI: Probing PCI hardware done SCSI subsystem initialized usbcore: registered new driver usbfs usbcore: registered new driver hub nvram_init: Could not find nvram partition for nvram buffered error logging. 
rtasd: no RTAS on system devfs: 2004-01-31 Richard Gooch (rgooch at atnf.csiro.au) devfs: boot_options: 0x0 Oops: Machine check, sig: 0 [#1] SMP NR_CPUS=2 POWERMAC NIP: C00000000014A244 XER: 0000000000000000 LR: C00000000014A218 REGS: c000000001a17a50 TRAP: 0200 Not tainted (2.6.9-rc3) MSR: 9000000000101032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11 TASK: c000000001a110c0[1] 'swapper' THREAD: c000000001a14000 CPU: 0 GPR00: FFFFFFFFFFFFFFFF C000000001A17CD0 C0000000004383A8 00000000000000FF GPR04: C00000000FEED3C0 0000000000000010 C0000000002A7A48 C000000000464298 GPR08: C000000000464268 E0000000828CD000 C000000000456D64 9000000000009032 GPR12: 0000000028000042 C000000000351780 0000000000000000 0000000000000000 GPR16: 0000000001400000 00000000016F8720 00000000016F8720 BFFFFFFFFEC00000 GPR20: 000000000023FD58 0000000000000000 0000000001A66020 00000000016F8998 GPR24: 9000000000009032 0000000000000032 C00000000043B730 C00000000034ED58 GPR28: C00000000043B728 0000000000000000 C0000000003D1728 C00000000043B730 NIP [c00000000014a244] .i8042_flush+0x6c/0x15c LR [c00000000014a218] .i8042_flush+0x40/0x15c Call Trace: [c000000001a17cd0] [c000000000351780] 0xc000000000351780 (unreliable) [c000000001a17d80] [c00000000014ae44] .i8042_controller_init+0x1c/0x1e4 [c000000001a17e10] [c0000000002f1164] .i8042_init+0xe8/0x64c [c000000001a17ef0] [c00000000000c688] .init+0x234/0x440 [c000000001a17f90] [c0000000000172b8] .kernel_thread+0x4c/0x6c <0>Kernel panic - not syncing: Attempted to kill init! <0>Rebooting in 180 seconds.. From benh at kernel.crashing.org Tue Oct 5 18:46:43 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 05 Oct 2004 18:46:43 +1000 Subject: xserve and 2.6.9-rc3 and 2.6.9-rc3-bk4 In-Reply-To: <1096965664l.7230l.0l@ipnnarval> References: <1096965664l.7230l.0l@ipnnarval> Message-ID: <1096966003.24535.48.camel@gaston> On Tue, 2004-10-05 at 18:41, grave wrote: > Hi, > > I've tryed both kernel and got crashes (see attached files). 
>
> Do I missed a patch ?
>
> xavier
> PS:2.6.6 + smp patch run fine

That's your .config

You have enabled the legacy x86 keyboard support ! :)

Use a g5_defconfig

I'm working on a fix so that this driver stops crashing though.

Ben.

From grave at ipno.in2p3.fr Tue Oct 5 19:25:20 2004
From: grave at ipno.in2p3.fr (grave)
Date: Tue, 05 Oct 2004 09:25:20 +0000
Subject: Re: xserve and 2.6.9-rc3 and 2.6.9-rc3-bk4
In-Reply-To: <1096966003.24535.48.camel@gaston> (from benh@kernel.crashing.org on Tue Oct 5 10:46:43 2004)
References: <1096965664l.7230l.0l@ipnnarval> <1096966003.24535.48.camel@gaston>
Message-ID: <1096968320l.7230l.2l@ipnnarval>

On 05.10.2004 10:46:43, Benjamin Herrenschmidt wrote:
> On Tue, 2004-10-05 at 18:41, grave wrote:
> > Hi,
> >
> > I've tryed both kernel and got crashes (see attached files).
> >
> > Do I missed a patch ?
> >
> > xavier
> > PS:2.6.6 + smp patch run fine
>
> That's your .config
>
> You have enabled the legacy x86 keyboard support ! :)
>
> Use a g5_defconfig

It works now... Thanks one more time !

From schwab at suse.de Tue Oct 5 19:50:44 2004
From: schwab at suse.de (Andreas Schwab)
Date: Tue, 05 Oct 2004 11:50:44 +0200
Subject: Sound on G5
In-Reply-To: <1096939502.24584.6.camel@gaston> (Benjamin Herrenschmidt's message of "Tue, 05 Oct 2004 11:25:03 +1000")
References: <1096939502.24584.6.camel@gaston>
Message-ID:

Benjamin Herrenschmidt writes:

> So we could go that way and complete my "interpreter" or just hard code
> all of the GPIOs we need in the driver hoping apple don't shuffle them
> too much in upcoming models...

I would be happy to test anything that is available.

Thanks, Andreas.

--
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
From paulus at samba.org Tue Oct 5 20:38:52 2004 From: paulus at samba.org (Paul Mackerras) Date: Tue, 5 Oct 2004 20:38:52 +1000 Subject: [PPC64] xmon sparse cleanups In-Reply-To: <20041005064255.GF3695@zax> References: <20041005064255.GF3695@zax> Message-ID: <16738.31164.464250.638432@cargo.ozlabs.ibm.com> David Gibson writes: > Andrew, please apply: > > This patch removes many sparse warnings from the xmon code. Mostly > K&R function declarations and 0-instead-of-NULLs. The trouble with this patch is that it makes ppc-opc.c diverge from the version in binutils, which is where it came from. I'd rather keep it as close as possible to that version. I have no problem with the changes to the other files. Paul. From igor at cs.wisc.edu Wed Oct 6 03:46:53 2004 From: igor at cs.wisc.edu (Igor Grobman) Date: Tue, 5 Oct 2004 12:46:53 -0500 (CDT) Subject: mapping memory in 0xb space In-Reply-To: <3337F539-14B0-11D9-AE7A-000A95A4DC02@kernel.crashing.org> References: <20040929014017.GC5470@zax> <20041001040325.GB12890@zax> <3337F539-14B0-11D9-AE7A-000A95A4DC02@kernel.crashing.org> Message-ID: On Sat, 2 Oct 2004, Segher Boessenkool wrote: > > A question for the rest of you, who haven't been following this thread. > > Is there publicly available documentation on the power4 extensions, > > specifically the large page support, how it effects the HPT hashing, > > and > > the SLB, including the new instructions for maintaining it in software? > > I haven't been able to find anything yet. > > http://www-106.ibm.com/developerworks/eserver/pdfs/archpub3.pdf > > has some info, don't know if that is enough for you -- nothing > much POWER4 specific in there, but large pages are part of the > architecture, so it does talk about the instructions to handle > them etc. Thanks, this is what I was looking for. 
-Igor From igor at cs.wisc.edu Wed Oct 6 03:45:47 2004 From: igor at cs.wisc.edu (Igor Grobman) Date: Tue, 5 Oct 2004 12:45:47 -0500 (CDT) Subject: mapping memory in 0xb space In-Reply-To: <20041001040325.GB12890@zax> References: <20040929014017.GC5470@zax> <20041001040325.GB12890@zax> Message-ID: One more followup on this issue, since I do have the base code working now. The problem was in the fact that do_slb_bolted code sets the large page bit in the SLB entry, but my code (and particularly hpte_insert code) did not insert a proper large page mapping. On Fri, 1 Oct 2004, David Gibson wrote: > On Wed, Sep 29, 2004 at 12:14:08AM -0500, Igor Grobman wrote: > > On Wed, 29 Sep 2004, David Gibson wrote: > > > > > On Tue, Sep 28, 2004 at 01:52:16PM -0500, Igor Grobman wrote: > > > > On Tue, 28 Sep 2004, David Gibson wrote: > > As for why I thought 0xbff would work, I reasoned that > > since the highest bits are masked out in get_kernel_vsid(), and since > > nobody else is using the 0xb region, it doesn't matter if I get a VSID > > that is the same as some other VSID in 0xb region. However, I did not > > consider the bug in do_slb_bolted that you describe below. > > Yes, with that bug the collision can be with a segment anywhere, not > just in the 0xb region. > I am not convinced anymore. The lower 36 bits of the ordinal are still the same in do_slb_bolted and get_kernel_vsid. Multiplying the ordinal by the 36-bit randomizer should produce the same lower 36 bits whether or not the upper bits are different. do_slb_bolted eventually clears the upper 28 bits, before using the VSID. I no longer think there can be a conflict outside the 0xb region. Is my reasoning correct? > > > You may have seen the comment in do_slb_bolted which claims to permit > > > a full 32-bits of ESID - it's wrong. 
The code doesn't mask the ESID > > > down to 13 bits as get_kernel_vsid() does, but it probably should - an > > > overlarge ESID will cause collisions with VSIDs from entirely > > > different address places, which would be a Bad Thing. > > > > This must be happening, although I would still like to know why it > > misbehaves even within the valid VSID range. > > > > > > > > Actually, you should be able to allow ESIDs of up to 21 bits there (36 > > > bit VSID - 15 bits of "context"). But you will need to make sure > > > get_kernel_vsid(), or whatever you're using to calculate the VAs for > > > the hash HPTEs is updated to match - at the moment I think it will > > > mask down to 13 bits. I'm not sure if that will get you sufficiently > > > close to 0xc0... for your purposes. > > Thanks, Igor From caveman at boxacle.net Wed Oct 6 04:24:25 2004 From: caveman at boxacle.net (CAVEMAN) Date: Tue, 5 Oct 2004 13:24:25 -0500 Subject: Sound on G5 In-Reply-To: <1096939502.24584.6.camel@gaston> References: <1096939502.24584.6.camel@gaston> Message-ID: <200410051324.25817@laptop> On Monday 04 October 2004 20:25, Benjamin Herrenschmidt wrote: > On Tue, 2004-10-05 at 06:51, Andreas Schwab wrote: > > Is anyone already working on sound support for the PowerMac G5, by > > chance? That's actually the only thing still missing. > > Nobody really seriously ATM. One of the main issue is that the darwin > driver abuses apple "do-platform-*" shit. It's a mecanism they invented > to put sort-of "scripts" (in binary form) in the device-tree that can > contains elementary ops such as write GPIOs, I2C, etc... > > This is extremely messy and difficult to parse. I have written the > basis for parsing them, but interpreting them is even more shitty as > the actual implementation of each ops sort-of depends on the target > object. > > It's really a piece-of-shit imho. 
> > So we could go that way and complete my "interpreter" or just hard code > all of the GPIOs we need in the driver hoping apple don't shuffle them > too much in upcoming models... I'd be willing to do some work and/or testing on this, where can I get the code? Regards, caveman From rmk+lkml at arm.linux.org.uk Wed Oct 6 17:26:59 2004 From: rmk+lkml at arm.linux.org.uk (Russell King) Date: Wed, 6 Oct 2004 08:26:59 +0100 Subject: [RFC][PATCH] Way for platforms to alter built-in serial ports In-Reply-To: <1096534248.32721.36.camel@gaston>; from benh@kernel.crashing.org on Thu, Sep 30, 2004 at 06:50:48PM +1000 References: <1096534248.32721.36.camel@gaston> Message-ID: <20041006082658.A18379@flint.arm.linux.org.uk> On Thu, Sep 30, 2004 at 06:50:48PM +1000, Benjamin Herrenschmidt wrote: > +#ifndef ARCH_HAS_GET_LEGACY_SERIAL_PORTS > static struct old_serial_port old_serial_port[] = { > SERIAL_PORT_DFNS /* defined in asm/serial.h */ > }; > - > +static inline struct old_serial_port *get_legacy_serial_ports(unsigned int *count) > +{ > + *count = ARRAY_SIZE(old_serial_port); > + return old_serial_port; > +} > #define UART_NR (ARRAY_SIZE(old_serial_port) + CONFIG_SERIAL_8250_NR_UARTS) > +#endif /* ARCH_HAS_GET_LEGACY_SERIAL_PORTS */ > + What happens if 8250.c is built as a module and ARCH_HAS_GET_LEGACY_SERIAL_PORTS is defined? 
> diff -urN linux-2.5/include/linux/serial.h linux-maple/include/linux/serial.h > --- linux-2.5/include/linux/serial.h 2004-09-30 18:31:55.867785437 +1000 > +++ linux-maple/include/linux/serial.h 2004-09-30 15:36:57.981697919 +1000 > @@ -14,6 +14,21 @@ > #include > > /* > + * Definition of a legacy serial port > + */ > +struct old_serial_port { > + unsigned int uart; > + unsigned int baud_base; > + unsigned int port; > + unsigned int irq; > + unsigned int flags; > + unsigned char hub6; > + unsigned char io_type; > + unsigned char *iomem_base; > + unsigned short iomem_reg_shift; > +}; > + > +/* > * Counters of the input lines (CTS, DSR, RI, CD) interrupts > */ serial.h is used by userspace programs. We should not expose this structure to those programs. Instead, maybe creating an 8250.h header, or even moving the existing 8250.h header ? -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core From benh at kernel.crashing.org Wed Oct 6 18:15:11 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 06 Oct 2004 18:15:11 +1000 Subject: [RFC][PATCH] Way for platforms to alter built-in serial ports In-Reply-To: <20041006082658.A18379@flint.arm.linux.org.uk> References: <1096534248.32721.36.camel@gaston> <20041006082658.A18379@flint.arm.linux.org.uk> Message-ID: <1097050508.21132.15.camel@gaston> On Wed, 2004-10-06 at 17:26, Russell King wrote: > On Thu, Sep 30, 2004 at 06:50:48PM +1000, Benjamin Herrenschmidt wrote: > > +#ifndef ARCH_HAS_GET_LEGACY_SERIAL_PORTS > > static struct old_serial_port old_serial_port[] = { > > SERIAL_PORT_DFNS /* defined in asm/serial.h */ > > }; > > - > > +static inline struct old_serial_port *get_legacy_serial_ports(unsigned int *count) > > +{ > > + *count = ARRAY_SIZE(old_serial_port); > > + return old_serial_port; > > +} > > #define UART_NR (ARRAY_SIZE(old_serial_port) + CONFIG_SERIAL_8250_NR_UARTS) > > +#endif /* 
ARCH_HAS_GET_LEGACY_SERIAL_PORTS */
> > +
>
> What happens if 8250.c is built as a module and
> ARCH_HAS_GET_LEGACY_SERIAL_PORTS is defined?

It will call get_legacy_serial_ports(), which is hopefully exported by the arch code.

> serial.h is used by userspace programs. We should not expose this
> structure to those programs. Instead, maybe creating an 8250.h
> header, or even moving the existing 8250.h header ?

Hrm... ok. Or adding a #ifdef __KERNEL__ (sic !) :)

I'll send you a new patch later today, as I had to make another fix: we apparently tend to "force" register_console() even when we have nothing to register, because we set "ops" on all ports, even those that were never configured, and we test "ops" to decide whether to succeed or fail in the console setup() callback.

Ben.

From benh at kernel.crashing.org Wed Oct 6 19:07:44 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 06 Oct 2004 19:07:44 +1000
Subject: [RFC][PATCH] Way for platforms to alter built-in serial ports
In-Reply-To: <20041006082658.A18379@flint.arm.linux.org.uk>
References: <1096534248.32721.36.camel@gaston> <20041006082658.A18379@flint.arm.linux.org.uk>
Message-ID: <1097053663.21132.56.camel@gaston>

On Wed, 2004-10-06 at 17:26, Russell King wrote:

> serial.h is used by userspace programs. We should not expose this
> structure to those programs. Instead, maybe creating an 8250.h
> header, or even moving the existing 8250.h header ?

Here's a new version of that patch that moves 8250.h to include/linux, moves the definition of struct old_serial_port there, and also corrects the problem I told you about with the serial console.

Let me know if I can send it to Andrew...

Ben.
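
For readers skimming the thread, the idea behind the patch can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the kernel code: the struct is trimmed down, the table entries are made-up example values, and `NR_EXTRA_UARTS`, `struct uart_port_sketch`, `init_ports()` and `register_ports()` are hypothetical names introduced here. It shows a default table accessor that a platform could replace by defining ARCH_HAS_GET_LEGACY_SERIAL_PORTS, and a registration loop that skips ports that were never configured.

```c
#include <stddef.h>

/* Trimmed-down stand-in for the struct moved by the patch (assumption). */
struct old_serial_port {
    unsigned int baud_base;
    unsigned int port;   /* I/O base; 0 is treated as "not configured" */
    unsigned int irq;
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define NR_EXTRA_UARTS 2 /* hypothetical stand-in for CONFIG_SERIAL_8250_NR_UARTS */

#ifndef ARCH_HAS_GET_LEGACY_SERIAL_PORTS
/* Default case: a fixed table (example values only). */
static struct old_serial_port old_serial_port[] = {
    { 115200, 0x3f8, 4 },
    { 115200, 0x2f8, 3 },
};

/* Default accessor; a platform defining the macro supplies its own version. */
static struct old_serial_port *get_legacy_serial_ports(unsigned int *count)
{
    *count = ARRAY_SIZE(old_serial_port);
    return old_serial_port;
}
#define UART_NR (ARRAY_SIZE(old_serial_port) + NR_EXTRA_UARTS)
#endif

/* Hypothetical, simplified per-port state. */
struct uart_port_sketch {
    unsigned int iobase;
    unsigned int irq;
    unsigned int uartclk;
    int registered;
};

struct uart_port_sketch ports[UART_NR];

/* Copy the legacy table into the port array; returns how many were filled. */
unsigned int init_ports(void)
{
    unsigned int count = 0, i;
    struct old_serial_port *old = get_legacy_serial_ports(&count);

    if (old == NULL)
        return 0;

    for (i = 0; i < count && i < UART_NR; i++) {
        ports[i].iobase = old[i].port;
        ports[i].irq = old[i].irq;
        ports[i].uartclk = old[i].baud_base * 16;
    }
    return i;
}

/* Mark only configured ports as registered; returns how many there were. */
unsigned int register_ports(void)
{
    unsigned int i, n = 0;

    for (i = 0; i < UART_NR; i++) {
        if (!ports[i].iobase)
            continue; /* never configured: leave it alone */
        ports[i].registered = 1;
        n++;
    }
    return n;
}
```

Calling init_ports() and then register_ports() leaves the spare slots beyond the table untouched, which is the behaviour the console fix is after. Note that in this sketch, as in the patch, UART_NR lives inside the #ifndef, so a platform overriding the accessor would presumably have to provide its own UART_NR as well.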
diff -urN linux-2.5/drivers/serial/8250.c linux-maple/drivers/serial/8250.c --- linux-2.5/drivers/serial/8250.c 2004-09-30 18:31:42.000000000 +1000 +++ linux-maple/drivers/serial/8250.c 2004-10-06 19:05:13.042342513 +1000 @@ -41,7 +41,7 @@ #endif #include -#include "8250.h" +#include /* * Configuration: @@ -112,11 +112,18 @@ #define SERIAL_PORT_DFNS #endif +#ifndef ARCH_HAS_GET_LEGACY_SERIAL_PORTS static struct old_serial_port old_serial_port[] = { SERIAL_PORT_DFNS /* defined in asm/serial.h */ }; - +static inline struct old_serial_port *get_legacy_serial_ports(unsigned int *count) +{ + *count = ARRAY_SIZE(old_serial_port); + return old_serial_port; +} #define UART_NR (ARRAY_SIZE(old_serial_port) + CONFIG_SERIAL_8250_NR_UARTS) +#endif /* ARCH_HAS_DYNAMIC_LEGACY_SERIAL_PORTS */ + #ifdef CONFIG_SERIAL_8250_RSA @@ -1839,22 +1846,28 @@ { struct uart_8250_port *up; static int first = 1; + struct old_serial_port *old_ports; + int count; int i; if (!first) return; first = 0; - for (i = 0, up = serial8250_ports; i < ARRAY_SIZE(old_serial_port); + old_ports = get_legacy_serial_ports(&count); + if (old_ports == NULL) + return; + + for (i = 0, up = serial8250_ports; i < count; i++, up++) { - up->port.iobase = old_serial_port[i].port; - up->port.irq = irq_canonicalize(old_serial_port[i].irq); - up->port.uartclk = old_serial_port[i].baud_base * 16; - up->port.flags = old_serial_port[i].flags; - up->port.hub6 = old_serial_port[i].hub6; - up->port.membase = old_serial_port[i].iomem_base; - up->port.iotype = old_serial_port[i].io_type; - up->port.regshift = old_serial_port[i].iomem_reg_shift; + up->port.iobase = old_ports[i].port; + up->port.irq = irq_canonicalize(old_ports[i].irq); + up->port.uartclk = old_ports[i].baud_base * 16; + up->port.flags = old_ports[i].flags; + up->port.hub6 = old_ports[i].hub6; + up->port.membase = old_ports[i].iomem_base; + up->port.iotype = old_ports[i].io_type; + up->port.regshift = old_ports[i].iomem_reg_shift; up->port.ops = &serial8250_pops; if 
(share_irqs) up->port.flags |= UPF_SHARE_IRQ; @@ -1870,6 +1883,9 @@ for (i = 0; i < UART_NR; i++) { struct uart_8250_port *up = &serial8250_ports[i]; + if (!up->port.iobase) + continue; + up->port.line = i; up->port.ops = &serial8250_pops; init_timer(&up->timer); diff -urN linux-2.5/drivers/serial/8250.h linux-maple/drivers/serial/8250.h --- linux-2.5/drivers/serial/8250.h 2004-09-30 18:31:42.000000000 +1000 +++ /dev/null 2004-10-05 22:10:47.391719208 +1000 @@ -1,71 +0,0 @@ -/* - * linux/drivers/char/8250.h - * - * Driver for 8250/16550-type serial ports - * - * Based on drivers/char/serial.c, by Linus Torvalds, Theodore Ts'o. - * - * Copyright (C) 2001 Russell King. - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * $Id: 8250.h,v 1.8 2002/07/21 21:32:30 rmk Exp $ - */ - -#include - -void serial8250_get_irq_map(unsigned int *map); -void serial8250_suspend_port(int line); -void serial8250_resume_port(int line); - -struct old_serial_port { - unsigned int uart; - unsigned int baud_base; - unsigned int port; - unsigned int irq; - unsigned int flags; - unsigned char hub6; - unsigned char io_type; - unsigned char *iomem_base; - unsigned short iomem_reg_shift; -}; - -/* - * This replaces serial_uart_config in include/linux/serial.h - */ -struct serial8250_config { - const char *name; - unsigned int fifo_size; - unsigned int tx_loadsz; - unsigned int flags; -}; - -#define UART_CAP_FIFO (1 << 8) /* UART has FIFO */ -#define UART_CAP_EFR (1 << 9) /* UART has EFR */ -#define UART_CAP_SLEEP (1 << 10) /* UART has IER sleep */ - -#undef SERIAL_DEBUG_PCI - -#if defined(__i386__) && (defined(CONFIG_M386) || defined(CONFIG_M486)) -#define SERIAL_INLINE -#endif - -#ifdef SERIAL_INLINE -#define _INLINE_ inline -#else -#define _INLINE_ -#endif - -#define PROBE_RSA (1 << 
0) -#define PROBE_ANY (~0) - -#define HIGH_BITS_OFFSET ((sizeof(long)-sizeof(int))*8) - -#ifdef CONFIG_SERIAL_8250_SHARE_IRQ -#define SERIAL8250_SHARE_IRQS 1 -#else -#define SERIAL8250_SHARE_IRQS 0 -#endif diff -urN linux-2.5/drivers/serial/8250_pci.c linux-maple/drivers/serial/8250_pci.c --- linux-2.5/drivers/serial/8250_pci.c 2004-09-30 18:31:42.000000000 +1000 +++ linux-maple/drivers/serial/8250_pci.c 2004-10-06 19:05:41.301674308 +1000 @@ -25,13 +25,12 @@ #include #include #include +#include #include #include #include -#include "8250.h" - /* * Definitions for PCI support. */ diff -urN linux-2.5/drivers/serial/8250_pnp.c linux-maple/drivers/serial/8250_pnp.c --- linux-2.5/drivers/serial/8250_pnp.c 2004-09-30 18:31:42.000000000 +1000 +++ linux-maple/drivers/serial/8250_pnp.c 2004-10-06 19:05:55.788749883 +1000 @@ -25,13 +25,12 @@ #include #include #include +#include #include #include #include -#include "8250.h" - #define UNKNOWN_DEV 0x3000 diff -urN linux-2.5/drivers/serial/au1x00_uart.c linux-maple/drivers/serial/au1x00_uart.c --- linux-2.5/drivers/serial/au1x00_uart.c 2004-09-30 18:31:42.000000000 +1000 +++ linux-maple/drivers/serial/au1x00_uart.c 2004-10-06 19:07:39.461032916 +1000 @@ -40,7 +40,7 @@ #endif #include -#include "8250.h" +#include /* * Debugging. 
diff -urN linux-2.5/drivers/serial/serial_cs.c linux-maple/drivers/serial/serial_cs.c --- linux-2.5/drivers/serial/serial_cs.c 2004-09-30 18:31:42.000000000 +1000 +++ linux-maple/drivers/serial/serial_cs.c 2004-10-06 19:07:35.059700476 +1000 @@ -44,6 +44,7 @@ #include #include #include +#include #include #include @@ -55,8 +56,6 @@ #include #include -#include "8250.h" - #ifdef PCMCIA_DEBUG static int pc_debug = PCMCIA_DEBUG; MODULE_PARM(pc_debug, "i"); diff -urN linux-2.5/include/linux/8250.h linux-maple/include/linux/8250.h --- /dev/null 2004-10-05 22:10:47.391719208 +1000 +++ linux-maple/include/linux/8250.h 2004-10-06 19:06:45.680713598 +1000 @@ -0,0 +1,74 @@ +/* + * linux/drivers/char/8250.h + * + * Driver for 8250/16550-type serial ports + * + * Based on drivers/char/serial.c, by Linus Torvalds, Theodore Ts'o. + * + * Copyright (C) 2001 Russell King. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + * $Id: 8250.h,v 1.8 2002/07/21 21:32:30 rmk Exp $ + */ + +#include + +void serial8250_get_irq_map(unsigned int *map); +void serial8250_suspend_port(int line); +void serial8250_resume_port(int line); + +/* + * This replaces serial_uart_config in include/linux/serial.h + */ +struct serial8250_config { + const char *name; + unsigned int fifo_size; + unsigned int tx_loadsz; + unsigned int flags; +}; + +#define UART_CAP_FIFO (1 << 8) /* UART has FIFO */ +#define UART_CAP_EFR (1 << 9) /* UART has EFR */ +#define UART_CAP_SLEEP (1 << 10) /* UART has IER sleep */ + +/* + * Definition of a legacy serial port + */ +struct old_serial_port { + unsigned int uart; + unsigned int baud_base; + unsigned int port; + unsigned int irq; + unsigned int flags; + unsigned char hub6; + unsigned char io_type; + unsigned char *iomem_base; + unsigned short iomem_reg_shift; +}; + +#undef SERIAL_DEBUG_PCI + +#if defined(__i386__) && (defined(CONFIG_M386) || defined(CONFIG_M486)) +#define SERIAL_INLINE +#endif + +#ifdef SERIAL_INLINE +#define _INLINE_ inline +#else +#define _INLINE_ +#endif + +#define PROBE_RSA (1 << 0) +#define PROBE_ANY (~0) + +#define HIGH_BITS_OFFSET ((sizeof(long)-sizeof(int))*8) + +#ifdef CONFIG_SERIAL_8250_SHARE_IRQ +#define SERIAL8250_SHARE_IRQS 1 +#else +#define SERIAL8250_SHARE_IRQS 0 +#endif From clmason at gmail.com Thu Oct 7 00:58:56 2004 From: clmason at gmail.com (Chris L. Mason) Date: Wed, 6 Oct 2004 11:58:56 -0300 Subject: iMac G5 available for testing Message-ID: <610e346604100607581144298e@mail.gmail.com> Hi all, I have a new iMac G5/1.8 GHz/17-inch system that I would like to make available for testing/debugging. If you have anything you would like me to try booting, checking in open firmware, etc., let me know. 
Thanks, Chris From benh at kernel.crashing.org Thu Oct 7 08:36:09 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 07 Oct 2004 08:36:09 +1000 Subject: iMac G5 available for testing In-Reply-To: <610e346604100607581144298e@mail.gmail.com> References: <610e346604100607581144298e@mail.gmail.com> Message-ID: <1097102169.8448.14.camel@gaston> On Thu, 2004-10-07 at 00:58, Chris L. Mason wrote: > Hi all, > > I have a new iMac G5/1.8 GHz/17-inch system that I would like to make > available for testing/debugging. If you have anything you would like > me to try booting, checking in open firmware, etc., let me know. We have ordered one here. It will require some reverse engineering work since it's a new rev of the chipset and the good old PMU chip was finally, years later, replaced by a new "SMU" that is totally undocumented of course... Ben. From clmason at gmail.com Thu Oct 7 09:12:04 2004 From: clmason at gmail.com (Chris L. Mason) Date: Wed, 6 Oct 2004 20:12:04 -0300 Subject: iMac G5 available for testing In-Reply-To: <1097102169.8448.14.camel@gaston> References: <610e346604100607581144298e@mail.gmail.com> <1097102169.8448.14.camel@gaston> Message-ID: <610e34660410061612379af1c8@mail.gmail.com> On Thu, 07 Oct 2004 08:36:09 +1000, Benjamin Herrenschmidt wrote: > > > On Thu, 2004-10-07 at 00:58, Chris L. Mason wrote: > > Hi all, > > > > I have a new iMac G5/1.8 GHz/17-inch system that I would like to make > > available for testing/debugging. If you have anything you would like > > me to try booting, checking in open firmware, etc., let me know. > > We have ordered one here. It will require some reverse engineering work > since it's a new rev of the chipset and the good old PMU chip was finally, > years later, replaced by a new "SMU" that is totally undocumented of course... > Ah, wonderful. 
:) The good news is that with tgall's latest debug kernel, I do at least get to boot as far the ata drive detection before it freezes, although it gets kernel error too right after the tux logo. Here's an image of my boot attempt: http://homepage.mac.com/clmason/imacboot.jpg (Sorry for the bad quality of the image) Segher also told me how to use the romgrabber. I have a copy up at: http://homepage.mac.com/clmason/OF-5.2.2f1-2004-08-18 Chris From david at gibson.dropbear.id.au Thu Oct 7 11:01:54 2004 From: david at gibson.dropbear.id.au (David Gibson) Date: Thu, 7 Oct 2004 11:01:54 +1000 Subject: mapping memory in 0xb space In-Reply-To: References: <20040929014017.GC5470@zax> <20041001040325.GB12890@zax> Message-ID: <20041007010154.GC25012@zax> On Tue, Oct 05, 2004 at 12:45:47PM -0500, Igor Grobman wrote: > One more followup on this issue, since I do have the base code working > now. The problem was in the fact that do_slb_bolted code sets the large > page bit in the SLB entry, but my code (and particularly hpte_insert code) > did not insert a proper large page mapping. > > > On Fri, 1 Oct 2004, David Gibson wrote: > > On Wed, Sep 29, 2004 at 12:14:08AM -0500, Igor Grobman wrote: > > > On Wed, 29 Sep 2004, David Gibson wrote: > > > > > > > On Tue, Sep 28, 2004 at 01:52:16PM -0500, Igor Grobman wrote: > > > > > On Tue, 28 Sep 2004, David Gibson wrote: > > > As for why I thought 0xbff would work, I reasoned that > > > since the highest bits are masked out in get_kernel_vsid(), and since > > > nobody else is using the 0xb region, it doesn't matter if I get a VSID > > > that is the same as some other VSID in 0xb region. However, I did not > > > consider the bug in do_slb_bolted that you describe below. > > > > Yes, with that bug the collision can be with a segment anywhere, not > > just in the 0xb region. > > > > I am not convinced anymore. The lower 36 bits of the ordinal are still > the same in do_slb_bolted and get_kernel_vsid. 
Multiplying the ordinal > by the 36-bit randomizer should produce the same lower 36 bits whether or > not the upper bits are different. do_slb_bolted eventually clears the > upper 28 bits, before using the VSID. I no longer think there can be > a conflict outside the 0xb region. Is my reasoning correct? Ah, yes, I think it is. Sorry, I guess I wasn't thinking very clearly when I decided the collisions could be anywhere. -- David Gibson | For every complex problem there is a david AT gibson.dropbear.id.au | solution which is simple, neat and | wrong. http://www.ozlabs.org/people/dgibson From benh at kernel.crashing.org Thu Oct 7 18:30:09 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 07 Oct 2004 18:30:09 +1000 Subject: Gothic horrors in pci_dn.c Message-ID: <1097137808.4894.73.camel@gaston> Hi ! To all those who had to deal with the guts of the PCI layer on ppc64, I'd like your comments about these and what do you think I may break. Currently, the code in pci_dn.c does 2 things articulated around a single function. That function is traverse_pci_devices() and is supposed to traverse the PCI tree exposed by Open Firmware and call back a function passed as an argument for each node in the tree. The 2 things it does are - Setting up the "devfn" and "busno" fields of the device nodes in the tree in an initial traversal pass at boot - "Finding" a device node for a given pci_dev at any time However, the current code does a number of assumptions and is bogus in many cases. Among the issues are: - The tree traversal goes all the way down the tree only skipping things that don't have a "class code". That means potentially walking on subtrees of a PCI device that aren't PCI themselves (USB ? FireWire ?) and we have no guarantee that those busses have no "class-code" property, though we are sure to misinterpret anything we find here. 
- We try to manipulate host bridge nodes as if they were PCI devices, which leads us to various funny and totally bogus special cases. First, in update_dn_pci_info(), where we have an "interesting" (at least) heuristic to find out if a node is a host bridge or not, with a horrible special case for avoiding setting the devfn 0 on U3 on blades, and then we "use" those devfn and busno of the host bridge property in is_devfn_node() later on when trying to match, which is why we have to do the above bogus workaround.
- Our firmware (and Apple's too in some cases) is broken in the sense that it doesn't show the host bridge in the tree as a PCI device. Host bridges that are themselves visible as devices on their own PCI bus should have an additional node in the PCI domain named "host" that represents them.
The solution to this however is very simple, but I need to make sure I won't break anything else by doing so. It's based on a few facts:
- The "node" of the host bridge is _NOT_ a PCI node, and thus should not be traversed by traverse_pci_devices(). This is very easy to do without any assumption due to the way this function works, just remove 2 lines near the beginning before the for loop.
- The result for the update_dn_pci_info() pass is that we can rip out the workaround completely. busno and devfn in the host bridge node are undefined and that is how they should be, as they won't be traversed. There is no "driver" for the host bridge that should make use of them.
- Same thing with is_devfn_node().
- We initialize "sysdata" of all pci_dev to point to the host bridge by default. So if the host bridge happens to have an associated pci_dev, and no "specific" node (as explained above), then we'll point to the root node of that pci tree which is exactly what we want, cool !
- Now the only remaining problem is the test if (dn->devfn == dev->devfn && dn->busno == (dev->bus->number&0xff)) Which will result in incorrect result if the host bridge has undefined (and typically 0) values in devfn and busno fields and the device we are looking for happens to really be 0:00.0. This is fixed by forcing those fields on all PHB nodes to -1. (No special U3 case, all of them). Here's a patch (untested, it's getting late here) implementing those, I need to know if it will work at all. Comments welcome :) Note to Anton & Milton: Pretty much nothing relies anymore on the device nodes for PCI devices to exist. The only mandatory ones are PHBs, but you can easily statically lay them out in a static device-tree blob for BM. By default, all pci_dev point to the PHB. I have a couple of fixes coming in for u3_iommu to properly setup iommu_table for PHB nodes (it forgot to do it) and I confirm it works with no OF nodes for the devices themselves. Config space accesses never need the OF node neither except when you have RTAS, but then you don't care since you have real nodes for everything. I added a simple helper to my tree (will be pushed after 2.6.9) that gives you the pci_controller* from the pci_dev* without doing a full device-tree walk, and I use that for pmac & maple. You should do the same for PM. Ben. ===== arch/ppc64/kernel/pci_dn.c 1.18 vs edited ===== --- 1.18/arch/ppc64/kernel/pci_dn.c 2004-10-05 17:24:47 +10:00 +++ edited/arch/ppc64/kernel/pci_dn.c 2004-10-07 18:35:41 +10:00 @@ -46,28 +46,13 @@ { struct pci_controller *phb = data; u32 *regs; - char *device_type = get_property(dn, "device_type", NULL); - char *model; dn->phb = phb; - if (device_type && (strcmp(device_type, "pci") == 0) && - (get_property(dn, "class-code", NULL) == 0)) { - /* special case for PHB's. Sigh. 
*/ - regs = (u32 *)get_property(dn, "bus-range", NULL); - dn->busno = regs[0]; - - model = (char *)get_property(dn, "model", NULL); - if (model && strstr(model, "U3")) - dn->devfn = -1; - else - dn->devfn = 0; /* assumption */ - } else { - regs = (u32 *)get_property(dn, "reg", NULL); - if (regs) { - /* First register entry is addr (00BBSS00) */ - dn->busno = (regs[0] >> 16) & 0xff; - dn->devfn = (regs[0] >> 8) & 0xff; - } + regs = (u32 *)get_property(dn, "reg", NULL); + if (regs) { + /* First register entry is addr (00BBSS00) */ + dn->busno = (regs[0] >> 16) & 0xff; + dn->devfn = (regs[0] >> 8) & 0xff; } return NULL; } @@ -96,20 +81,25 @@ struct device_node *dn, *nextdn; void *ret; - if (pre && ((ret = pre(start, data)) != NULL)) - return ret; + /* We started with a phb, iterate all childs */ for (dn = start->child; dn; dn = nextdn) { + u32 *classp, class; + nextdn = NULL; - if (get_property(dn, "class-code", NULL)) { - if (pre && ((ret = pre(dn, data)) != NULL)) - return ret; - if (dn->child) - /* Depth first...do children */ - nextdn = dn->child; - else if (dn->sibling) - /* ok, try next sibling instead. */ - nextdn = dn->sibling; - } + classp = (u32 *)get_property(dn, "class-code", NULL); + class = classp ? *classp : 0; + + if (pre && ((ret = pre(dn, data)) != NULL)) + return ret; + + /* If we are a PCI bridge, go down */ + if (dn->child && (class >> 8) == PCI_CLASS_BRIDGE_PCI && + (class >> 8) == PCI_CLASS_BRIDGE_CARDBUS) + /* Depth first...do children */ + nextdn = dn->child; + else if (dn->sibling) + /* ok, try next sibling instead. */ + nextdn = dn->sibling; if (!nextdn) { /* Walk up to next valid sibling. */ do { @@ -123,21 +113,6 @@ return NULL; } -/* - * Same as traverse_pci_devices except this does it for all phbs. 
- */ -static void *traverse_all_pci_devices(traverse_func pre) -{ - struct pci_controller *phb, *tmp; - void *ret; - - list_for_each_entry_safe(phb, tmp, &hose_list, list_node) - if ((ret = traverse_pci_devices(phb->arch_data, pre, phb)) - != NULL) - return ret; - return NULL; -} - /* * Traversal func that looks for a value. @@ -147,6 +122,7 @@ { int busno = ((unsigned long)data >> 8) & 0xff; int devfn = ((unsigned long)data) & 0xff; + return ((devfn == dn->devfn) && (busno == dn->busno)) ? dn : NULL; } @@ -173,10 +149,8 @@ phb_dn = phb->arch_data; dn = traverse_pci_devices(phb_dn, is_devfn_node, (void *)searchval); - if (dn) { + if (dn) dev->sysdata = dn; - /* ToDo: call some device init hook here */ - } return dn; } EXPORT_SYMBOL(fetch_dev_dn); @@ -188,8 +162,16 @@ */ void __init pci_devs_phb_init(void) { + struct pci_controller *phb, *tmp; + /* This must be done first so the device nodes have valid pci info! */ - traverse_all_pci_devices(update_dn_pci_info); + list_for_each_entry_safe(phb, tmp, &hose_list, list_node) { + struct device_node * dn = (struct device_node *) phb->arch_data; + /* PHB nodes themselves must not match */ + dn->devfn = dn->busno = -1; + dn->phb = phb; + traverse_pci_devices(phb->arch_data, update_dn_pci_info, phb); + } } From hollisb at us.ibm.com Thu Oct 7 20:40:27 2004 From: hollisb at us.ibm.com (Hollis Blanchard) Date: Thu, 7 Oct 2004 10:40:27 +0000 Subject: [patch] HVSI udbg Message-ID: <200410071040.27907.hollisb@us.ibm.com> This fixes a long-standing omission in HVSI support: dropping to xmon would basically hang your system, as there was no udbg code to read/write chars from xmon. It's based on the existing "LP" routines. Could we get this pushed upstream soon? 
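As a rough illustration of the framing the patch below relies on: each character goes out as a five-byte packet (type 0xff, length 5, two sequence-number bytes, then the character), and on input the four-byte header is stripped before the data is handed on. The following is a minimal user-space sketch of just that byte layout, not the kernel code; the names hvsi_build_data_packet() and hvsi_strip_header() are hypothetical, and no hypervisor call is made.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HVSI_DATA_TYPE 0xff /* type byte the patch uses for data packets */
#define HVSI_HDR_LEN   4    /* type, length, two sequence-number bytes */

/*
 * Hypothetical helper: build a one-character data packet laid out as
 * { type, total length, seqno, seqno, c }, matching the array in the patch.
 * Returns the number of bytes written, or 0 if the buffer is too small.
 */
static size_t hvsi_build_data_packet(uint8_t *out, size_t out_len, uint8_t c)
{
	assert(out != NULL);
	if (out_len < HVSI_HDR_LEN + 1)
		return 0;
	out[0] = HVSI_DATA_TYPE;
	out[1] = HVSI_HDR_LEN + 1; /* total packet length in bytes */
	out[2] = 0;                /* sequence number is not used here */
	out[3] = 0;
	out[4] = c;
	return HVSI_HDR_LEN + 1;
}

/*
 * Hypothetical helper: drop the header of a received packet in place.
 * Returns the payload length, or -1 for a short or non-data packet.
 */
static long hvsi_strip_header(uint8_t *buf, long len)
{
	assert(buf != NULL);
	if (len <= HVSI_HDR_LEN || buf[0] != HVSI_DATA_TYPE)
		return -1;
	memmove(buf, buf + HVSI_HDR_LEN, (size_t)(len - HVSI_HDR_LEN));
	return len - HVSI_HDR_LEN;
}
```

Building a packet for 'A' and passing it back through hvsi_strip_header() should leave a single payload byte 'A' at the start of the buffer.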
-- Hollis Blanchard IBM Linux Technology Center ===== arch/ppc64/kernel/pSeries_lpar.c 1.41 vs edited ===== --- 1.41/arch/ppc64/kernel/pSeries_lpar.c Tue Sep 21 23:40:30 2004 +++ edited/arch/ppc64/kernel/pSeries_lpar.c Thu Oct 7 10:52:23 2004 @@ -59,6 +59,74 @@ int vtermno; /* virtual terminal# for udbg */ +#define __ALIGNED__ __attribute__((__aligned__(sizeof(long)))) +static void udbg_hvsi_putc(unsigned char c) +{ + /* packet's seqno isn't used anyways */ + uint8_t packet[] __ALIGNED__ = { 0xff, 5, 0, 0, c }; + int rc; + + if (c == '\n') + udbg_hvsi_putc('\r'); + + do { + rc = plpar_put_term_char(vtermno, sizeof(packet), packet); + } while (rc == H_Busy); +} + +static long hvsi_udbg_buf_len; +static uint8_t hvsi_udbg_buf[256]; + +static int udbg_hvsi_getc_poll(void) +{ + unsigned char ch; + int rc, i; + + if (hvsi_udbg_buf_len == 0) { + rc = plpar_get_term_char(vtermno, &hvsi_udbg_buf_len, hvsi_udbg_buf); + if (rc != H_Success || hvsi_udbg_buf[0] != 0xff) { + /* bad read or non-data packet */ + hvsi_udbg_buf_len = 0; + } else { + /* remove the packet header */ + for (i = 4; i < hvsi_udbg_buf_len; i++) + hvsi_udbg_buf[i-4] = hvsi_udbg_buf[i]; + hvsi_udbg_buf_len -= 4; + } + } + + if (hvsi_udbg_buf_len <= 0 || hvsi_udbg_buf_len > 256) { + /* no data ready */ + hvsi_udbg_buf_len = 0; + return -1; + } + + ch = hvsi_udbg_buf[0]; + /* shift remaining data down */ + for (i = 1; i < hvsi_udbg_buf_len; i++) { + hvsi_udbg_buf[i-1] = hvsi_udbg_buf[i]; + } + hvsi_udbg_buf_len--; + + return ch; +} + +static unsigned char udbg_hvsi_getc(void) +{ + int ch; + for (;;) { + ch = udbg_hvsi_getc_poll(); + if (ch == -1) { + /* This shouldn't be needed...but... 
*/ + volatile unsigned long delay; + for (delay=0; delay < 2000000; delay++) + ; + } else { + return ch; + } + } +} + static void udbg_putcLP(unsigned char c) { char buf[16]; @@ -167,11 +235,15 @@ ppc_md.udbg_getc_poll = udbg_getc_pollLP; found = 1; } - } else { - /* XXX implement udbg_putcLP_vtty for hvterm-protocol1 case */ - printk(KERN_WARNING "%s doesn't speak hvterm1; " - "can't print udbg messages\n", - stdout_node->full_name); + } else if (device_is_compatible(stdout_node, "hvterm-protocol")) { + termno = (u32 *)get_property(stdout_node, "reg", NULL); + if (termno) { + vtermno = termno[0]; + ppc_md.udbg_putc = udbg_hvsi_putc; + ppc_md.udbg_getc = udbg_hvsi_getc; + ppc_md.udbg_getc_poll = udbg_hvsi_getc_poll; + found = 1; + } } } else if (strncmp(name, "serial", 6)) { /* XXX fix ISA serial console */ From johnrose at austin.ibm.com Fri Oct 8 03:54:21 2004 From: johnrose at austin.ibm.com (John Rose) Date: Thu, 07 Oct 2004 12:54:21 -0500 Subject: [PATCH] create iommu_free_table() Message-ID: <1097171661.7087.1.camel@sinatra.austin.ibm.com> The patch below creates iommu_free_table(). Iommu tables are not currently freed in PPC64. This could cause a memory leak for DLPAR of an EADS slot. The function verifies that there are no outstanding TCE entries for the range of the table before freeing it. I added a call to iommu_free_table() to the code that dynamically removes a device node. This should be fairly symmetrical with the table allocation, which happens during dynamic addition of a device node. Comments welcome. 
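The "no outstanding entries" check described above amounts to scanning an allocation bitmap for any set bit and computing the bitmap size in bytes from the entry count. A minimal user-space sketch of that idea follows; it is not the kernel function, the helper names are hypothetical, and it assumes one bit per entry with entry i stored in word i / BITS_PER_WORD at bit i % BITS_PER_WORD. Unlike the loop in the patch, it also checks a partial trailing word.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: return 1 if none of the first nbits bits are set. */
static int bitmap_is_empty(const unsigned long *map, size_t nbits)
{
	size_t full_words = nbits / BITS_PER_WORD;
	size_t rem = nbits % BITS_PER_WORD;
	size_t i;

	assert(map != NULL);

	/* Whole words: any non-zero word means an entry is still in use. */
	for (i = 0; i < full_words; i++)
		if (map[i] != 0)
			return 0;

	/* Partial last word: only the low rem bits belong to the table. */
	if (rem != 0) {
		unsigned long mask = (1UL << rem) - 1;

		if (map[full_words] & mask)
			return 0;
	}
	return 1;
}

/* Hypothetical helper: bitmap size in bytes for nbits entries, rounded up. */
static size_t bitmap_size_bytes(size_t nbits)
{
	return (nbits + 7) / 8;
}
```

The byte count is what would then be turned into a page order before the bitmap is released, as the patch does with get_order().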
Thanks- John Signed-off-by: John Rose diff -Nru a/arch/ppc64/kernel/pSeries_iommu.c b/arch/ppc64/kernel/pSeries_iommu.c --- a/arch/ppc64/kernel/pSeries_iommu.c Thu Oct 7 11:08:19 2004 +++ b/arch/ppc64/kernel/pSeries_iommu.c Thu Oct 7 11:08:19 2004 @@ -412,6 +412,38 @@ dn->iommu_table = iommu_init_table(tbl); } +void iommu_free_table(struct device_node *dn) +{ + struct iommu_table *tbl = dn->iommu_table; + unsigned long bitmap_sz, i; + unsigned int order; + + if (!tbl || !tbl->it_map) { + printk(KERN_ERR "%s: expected TCE map for %s\n", __FUNCTION__, + dn->full_name); + return; + } + + /* verify that table contains no entries */ + /* it_mapsize is in entries, and we're examining 64 at a time */ + for (i = 0; i < (tbl->it_mapsize/64); i++) { + if (tbl->it_map[i] != 0) { + printk(KERN_WARNING "%s: Unexpected TCEs for %s\n", + __FUNCTION__, dn->full_name); + break; + } + } + + /* calculate bitmap size in bytes */ + bitmap_sz = (tbl->it_mapsize + 7) / 8; + + /* free bitmap */ + order = get_order(bitmap_sz); + free_pages((unsigned long) tbl->it_map, order); + + /* free table */ + kfree(tbl); +} void iommu_setup_pSeries(void) { diff -Nru a/arch/ppc64/kernel/prom.c b/arch/ppc64/kernel/prom.c --- a/arch/ppc64/kernel/prom.c Thu Oct 7 11:08:19 2004 +++ b/arch/ppc64/kernel/prom.c Thu Oct 7 11:08:19 2004 @@ -1818,6 +1818,9 @@ return -EBUSY; } + if (np->iommu_table) + iommu_free_table(np); + write_lock(&devtree_lock); OF_MARK_STALE(np); remove_node_proc_entries(np); diff -Nru a/include/asm-ppc64/iommu.h b/include/asm-ppc64/iommu.h --- a/include/asm-ppc64/iommu.h Thu Oct 7 11:08:19 2004 +++ b/include/asm-ppc64/iommu.h Thu Oct 7 11:08:19 2004 @@ -113,6 +113,9 @@ /* Creates table for an individual device node */ extern void iommu_devnode_init(struct device_node *dn); +/* Frees table for an individual device node */ +extern void iommu_free_table(struct device_node *dn); + #endif /* CONFIG_PPC_MULTIPLATFORM */ #ifdef CONFIG_PPC_ISERIES From linas at austin.ibm.com Fri Oct 8 04:13:35 
2004 From: linas at austin.ibm.com (Linas Vepstas) Date: Thu, 7 Oct 2004 13:13:35 -0500 Subject: [linas: [PATCH] PPC64: crash during firmware flash update] Message-ID: <20041007181335.GA21633@austin.ibm.com> Sent to the wrong mailing list :) ----- Forwarded message from linas ----- To: paulus at samba.org, anton at samba.org Cc: linuxppc64-dev at lists.linuxppc.org Subject: [PATCH] PPC64: crash during firmware flash update Race conditions during system shutdown after a firmware flash can sometimes lead to an invalid pointer deref (deref to freed memory). This patch fixes this. In addition, it makes sure that the proc entries created by the firmware flash module are removed when the module is unloaded. Signed-off-by: Linas Vepstas --- a/arch/ppc64/kernel/rtas_flash.c.orig 2004-09-20 11:59:18.000000000 -0500 +++ b/arch/ppc64/kernel/rtas_flash.c 2004-10-06 11:19:45.000000000 -0500 @@ -562,6 +562,7 @@ static int validate_flash_release(struct validate_flash(args_buf); } + /* The matching atomic_inc was in rtas_excl_open() */ atomic_dec(&dp->count); return 0; @@ -572,7 +573,8 @@ static void remove_flash_pde(struct proc if (dp) { if (dp->data != NULL) kfree(dp->data); - remove_proc_entry(dp->name, NULL); + dp->owner = NULL; + remove_proc_entry(dp->name, dp->parent); } } ----- End forwarded message ----- From benh at kernel.crashing.org Fri Oct 8 12:27:07 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 08 Oct 2004 12:27:07 +1000 Subject: Gothic horrors in pci_dn.c In-Reply-To: <1097137808.4894.73.camel@gaston> References: <1097137808.4894.73.camel@gaston> Message-ID: <1097202427.846.102.camel@gaston> On Thu, 2004-10-07 at 18:30, Benjamin Herrenschmidt wrote: > + /* If we are a PCI bridge, go down */ > + if (dn->child && (class >> 8) == PCI_CLASS_BRIDGE_PCI && > + (class >> 8) == PCI_CLASS_BRIDGE_CARDBUS) > + /* Depth first...do children */ > + nextdn = dn->child; Of course, that should have been + /* If we are a PCI bridge, go down */ + if 
(dn->child && ((class >> 8) == PCI_CLASS_BRIDGE_PCI || + (class >> 8) == PCI_CLASS_BRIDGE_CARDBUS)) + /* Depth first...do children */ + nextdn = dn->child; Ben. From paulus at samba.org Fri Oct 8 10:44:32 2004 From: paulus at samba.org (Paul Mackerras) Date: Fri, 8 Oct 2004 10:44:32 +1000 Subject: [patch] HVSI udbg In-Reply-To: <200410071040.27907.hollisb@us.ibm.com> References: <200410071040.27907.hollisb@us.ibm.com> Message-ID: <16741.58096.932315.526999@cargo.ozlabs.ibm.com> Hollis, > --- 1.41/arch/ppc64/kernel/pSeries_lpar.c Tue Sep 21 23:40:30 2004 > +++ edited/arch/ppc64/kernel/pSeries_lpar.c Thu Oct 7 10:52:23 2004 > @@ -59,6 +59,74 @@ > > int vtermno; /* virtual terminal# for udbg */ > > +#define __ALIGNED__ __attribute__((__aligned__(sizeof(long)))) > +static void udbg_hvsi_putc(unsigned char c) > +{ > + /* packet's seqno isn't used anyways */ > + uint8_t packet[] __ALIGNED__ = { 0xff, 5, 0, 0, c }; > + int rc; All the tabs in the patch seem to have got changed to spaces. Is it your mailer or is the list software doing something bad? Paul. From arnd at arndb.de Fri Oct 8 16:22:57 2004 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 8 Oct 2004 08:22:57 +0200 Subject: [patch] HVSI udbg In-Reply-To: <16741.58096.932315.526999@cargo.ozlabs.ibm.com> References: <200410071040.27907.hollisb@us.ibm.com> <16741.58096.932315.526999@cargo.ozlabs.ibm.com> Message-ID: <200410080823.03298.arnd@arndb.de> On Freedag 08 Oktober 2004 02:44, Paul Mackerras wrote: > All the tabs in the patch seem to have got changed to spaces. Is it > your mailer or is the list software doing something bad? It's the latest kmail (or Qt) update from Debian Sarge that broke this. I have the same problem here. Attachments appear to be still working. http://bugs.kde.org/show_bug.cgi?id=90688 Arnd <>< -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: signature Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041008/33488d20/attachment.pgp From hollisb at us.ibm.com Fri Oct 8 22:42:13 2004 From: hollisb at us.ibm.com (Hollis Blanchard) Date: Fri, 8 Oct 2004 12:42:13 +0000 Subject: [patch 2] HVSI udbg Message-ID: <200410081242.13486.hollisb@us.ibm.com> This patch (resent as attachment due to mailer troubles) adds support for the udbg early console interfaces when using an HVSI console. -- Hollis Blanchard IBM Linux Technology Center -------------- next part -------------- A non-text attachment was scrubbed... Name: hvsi-udbg.diff Type: text/x-diff Size: 2552 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041008/663aa099/attachment.diff From david at gibson.dropbear.id.au Mon Oct 11 12:11:46 2004 From: david at gibson.dropbear.id.au (David Gibson) Date: Mon, 11 Oct 2004 12:11:46 +1000 Subject: [PPC64] xmon sparse cleanups In-Reply-To: <16738.31164.464250.638432@cargo.ozlabs.ibm.com> References: <20041005064255.GF3695@zax> <16738.31164.464250.638432@cargo.ozlabs.ibm.com> Message-ID: <20041011021146.GA1556@zax> On Tue, Oct 05, 2004 at 08:38:52PM +1000, Paul Mackerras wrote: > David Gibson writes: > > > Andrew, please apply: > > > > This patch removes many sparse warnings from the xmon code. Mostly > > K&R function declarations and 0-instead-of-NULLs. > > The trouble with this patch is that it makes ppc-opc.c diverge from > the version in binutils, which is where it came from. I'd rather keep > it as close as possible to that version. I have no problem with the > changes to the other files. A corresponding patch has now gone into binutils CVS. As it happens there has already been a certain amount of divergence between the versions, presumably because the kernel copy hasn't been updated from binutils in quite a while. 
-- David Gibson | For every complex problem there is a david AT gibson.dropbear.id.au | solution which is simple, neat and | wrong. http://www.ozlabs.org/people/dgibson From schwab at suse.de Tue Oct 12 06:11:42 2004 From: schwab at suse.de (Andreas Schwab) Date: Mon, 11 Oct 2004 22:11:42 +0200 Subject: 2.6.9-rc4: oops during ide probing Message-ID: I'm getting an oops during ide probing on the PMac G5 with 2.6.9-rc4: ide-pmac: cannot find MacIO node for Kauai ATA interface ide0: Found Apple OHare ATA controller, bus ID 0, irq 0 Oops: Kernel access of bad area, sig: 11 [#1] NIP [...] .ide_mm_inb+0x0/0x14 LR [...] .ide_wait_not_busy+0x98/0xf0 (Sorry, I couldn't capture the whole oops.) I've tried also with the patch from , but that didn't help. Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From pbadari at us.ibm.com Tue Oct 12 08:07:32 2004 From: pbadari at us.ibm.com (Badari Pulavarty) Date: 11 Oct 2004 15:07:32 -0700 Subject: 2.6.9-rc4-mm1 doesn't boot on my Power3 box Message-ID: <1097532452.12861.398.camel@dyn318077bld.beaverton.ibm.com> Hi, My Power3 box doesn't boot with 2.6.9-rc4-mm1. I get following OOPs. (2.6.9-rc3-mm3 also same issue). Any fixes ? Thanks, Badari kernel BUG in __flush_tlb_pending at arch/ppc64/mm/tlb.c:125!
Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=128 NUMA PSERIES NIP: C00000000003E344 XER: 0000000020000000 LR: C000000000014DA0 REGS: c000000001963550 TRAP: 0700 Not tainted (2.6.9-rc4-mm1) MSR: a000000000023032 EE: 0 PR: 0 FP: 1 ME: 1 IR/DR: 11 TASK: c00000003f7577e0[1396] 'hotplug' THREAD: c000000001960000 CPU: 0 GPR00: 0000000004000000 C0000000019637D0 C0000000005D29F0 C0000000006B70A0 GPR04: C00000003FB597E0 000000028904198B C0000000005D1008 C0000000004583B0 GPR08: 0000000000260F00 C000000001960000 C0000000005D1008 0000000000000002 GPR12: 0000000022222482 C0000000004B9900 C00000003F757A80 00000030CAC526D0 GPR16: C0000000005D1008 000000000065E4C0 0000000000000000 C00000000F052500 GPR20: C0000000006BAD88 C000000001963990 C00000003FB597E0 C0000000006B9B38 GPR24: C000000001945200 C00000003F7577E0 0000000018221613 C00000003FB597E0 GPR28: 0000000000001260 C00000003F7577E0 0000000000000000 C0000000006B70A0 NIP [c00000000003e344] .__flush_tlb_pending+0x38/0x150 LR [c000000000014da0] .__switch_to+0xb4/0xd8 Call Trace: [c0000000019637d0] [00000000f7fad210] 0xf7fad210 (unreliable) --- Exception: 901 at .copy_page_range+0x218/0x61c LR = .copy_page_range+0x160/0x61c [c000000001963890] [c000000000014da0] .__switch_to+0xb4/0xd8 (unreliable) [c000000001963920] [c00000000039a5dc] .schedule+0x38c/0xc3c [c000000001963a40] [c00000000039b028] .cond_resched+0x4c/0x80 [c000000001963ac0] [c000000000096eb0] .copy_page_range+0x29c/0x61c [c000000001963bd0] [c00000000004fecc] .copy_process+0x8c0/0x148c [c000000001963ce0] [c000000000050b38] .do_fork+0xa0/0x25c [c000000001963dc0] [c000000000014680] .sys_clone+0x5c/0x74 [c000000001963e30] [c000000000010208] .ppc_clone+0x8/0xc From dwmw2 at infradead.org Tue Oct 12 23:51:49 2004 From: dwmw2 at infradead.org (David Woodhouse) Date: Tue, 12 Oct 2004 14:51:49 +0100 Subject: cond_syscall() and new ABI. 
Message-ID: <1097589108.318.425.camel@hades.cambridge.redhat.com> This (in linux/asm-ppc64/unistd.h) doesn't work with the new ABI: /* * "Conditional" syscalls * * What we want is __attribute__((weak,alias("sys_ni_syscall"))), * but it doesn't work on all toolchains, so we just do it by hand */ #define cond_syscall(x) asm(".weak\t." #x "\n\t.set\t." #x ",.sys_ni_syscall"); Two options -- either we ditch older toolchains (before 2002-03-01 probably), by switching to what we say in the comment, or we introduce an ifdef to choose whether to include the '.' in the symbol names... Both attached. Someone who cares can choose one :) -- dwmw2 -------------- next part -------------- ===== include/asm-ppc64/unistd.h 1.34 vs edited ===== --- 1.34/include/asm-ppc64/unistd.h Tue Sep 14 01:23:12 2004 +++ edited/include/asm-ppc64/unistd.h Tue Oct 12 14:49:48 2004 @@ -468,7 +468,11 @@ * What we want is __attribute__((weak,alias("sys_ni_syscall"))), * but it doesn't work on all toolchains, so we just do it by hand */ +#if __GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ > 3) +#define cond_syscall(x) asm(".weak\t" #x "\n\t.set\t" #x ",sys_ni_syscall"); +#else #define cond_syscall(x) asm(".weak\t." #x "\n\t.set\t." #x ",.sys_ni_syscall"); +#endif #endif /* __KERNEL__ */ -------------- next part -------------- ===== include/asm-ppc64/unistd.h 1.34 vs edited ===== --- 1.34/include/asm-ppc64/unistd.h Tue Sep 14 01:23:12 2004 +++ edited/include/asm-ppc64/unistd.h Tue Oct 12 14:48:08 2004 @@ -468,7 +468,7 @@ * What we want is __attribute__((weak,alias("sys_ni_syscall"))), * but it doesn't work on all toolchains, so we just do it by hand */ -#define cond_syscall(x) asm(".weak\t." #x "\n\t.set\t." #x ",.sys_ni_syscall"); +#define cond_syscall(x) void x(void) __attribute__((weak,alias("sys_ni_syscall"))); #endif /* __KERNEL__ */ From hch at lst.de Wed Oct 13 00:26:27 2004 From: hch at lst.de (Christoph Hellwig) Date: Tue, 12 Oct 2004 16:26:27 +0200 Subject: cond_syscall() and new ABI. 
In-Reply-To: <1097589108.318.425.camel@hades.cambridge.redhat.com> References: <1097589108.318.425.camel@hades.cambridge.redhat.com> Message-ID: <20041012142627.GA19091@lst.de> On Tue, Oct 12, 2004 at 02:51:49PM +0100, David Woodhouse wrote: > This (in linux/asm-ppc64/unistd.h) doesn't work with the new ABI: > > /* > * "Conditional" syscalls > * > * What we want is __attribute__((weak,alias("sys_ni_syscall"))), > * but it doesn't work on all toolchains, so we just do it by hand > */ > #define cond_syscall(x) asm(".weak\t." #x "\n\t.set\t." #x ",.sys_ni_syscall"); > > Two options -- either we ditch older toolchains (before 2002-03-01 > probably), by switching to what we say in the comment, or we introduce > an ifdef to choose whether to include the '.' in the symbol names... > > Both attached. Someone who cares can choose one :) > > -- > dwmw2 > ===== include/asm-ppc64/unistd.h 1.34 vs edited ===== > --- 1.34/include/asm-ppc64/unistd.h Tue Sep 14 01:23:12 2004 > +++ edited/include/asm-ppc64/unistd.h Tue Oct 12 14:49:48 2004 > @@ -468,7 +468,11 @@ > * What we want is __attribute__((weak,alias("sys_ni_syscall"))), > * but it doesn't work on all toolchains, so we just do it by hand > */ > +#if __GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ > 3) > +#define cond_syscall(x) asm(".weak\t" #x "\n\t.set\t" #x ",sys_ni_syscall"); > +#else > #define cond_syscall(x) asm(".weak\t." #x "\n\t.set\t." #x ",.sys_ni_syscall"); this is broken. Gcc 3.4 doesn't even have support for the non-dotted ABI, nevermind uses it by default. > ===== include/asm-ppc64/unistd.h 1.34 vs edited ===== > --- 1.34/include/asm-ppc64/unistd.h Tue Sep 14 01:23:12 2004 > +++ edited/include/asm-ppc64/unistd.h Tue Oct 12 14:48:08 2004 > @@ -468,7 +468,7 @@ > * What we want is __attribute__((weak,alias("sys_ni_syscall"))), > * but it doesn't work on all toolchains, so we just do it by hand > */ > -#define cond_syscall(x) asm(".weak\t." #x "\n\t.set\t." 
#x ",.sys_ni_syscall"); > +#define cond_syscall(x) void x(void) __attribute__((weak,alias("sys_ni_syscall"))); this one otoh makes lots of sense - it's what most architectures use. From moilanen at austin.ibm.com Wed Oct 13 00:56:19 2004 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Tue, 12 Oct 2004 09:56:19 -0500 Subject: [PATCH 1/2][RFC] PPC64 no-exec support for user space In-Reply-To: <20041012095248.2b6418c4@localhost> References: <20041012095248.2b6418c4@localhost> Message-ID: <20041012095619.63a38530@localhost> Here is no-exec support for user space. This patch also includes base no-exec support. Once again it requires Ben's signal trampoline in vdso piece. Thanks, Jake Signed-off-by: Jake Moilanen --- diff -puN arch/ppc64/kernel/head.S~nx-user-ppc64 arch/ppc64/kernel/head.S --- linux-2.6-bk/arch/ppc64/kernel/head.S~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/head.S Thu Oct 7 15:23:52 2004 @@ -35,6 +35,7 @@ #include #include #include +#include #include #ifdef CONFIG_PPC_ISERIES @@ -879,6 +880,7 @@ InstructionAccess_common: ld r3,_NIP(r1) andis. r4,r12,0x5820 li r5,0x400 + ori r4,r4,_PAGE_EXEC b .do_hash_page /* Try to handle as hpte fault */ .align 7 @@ -964,11 +966,10 @@ END_FTR_SECTION_IFCLR(CPU_FTR_SLB) * accessing a userspace segment (even from the kernel). We assume * kernel addresses always have the high bit set. 
*/ - rlwinm r4,r4,32-23,29,29 /* DSISR_STORE -> _PAGE_RW */ + rlwinm r4,r4,32-25+9,31-9,31-9 /* DSISR_STORE -> _PAGE_RW */ rotldi r0,r3,15 /* Move high bit into MSR_PR posn */ orc r0,r12,r0 /* MSR_PR | ~high_bit */ rlwimi r4,r0,32-13,30,30 /* becomes _PAGE_USER access bit */ - ori r4,r4,1 /* add _PAGE_PRESENT */ /* * On iSeries, we soft-disable interrupts here, then diff -puN arch/ppc64/mm/fault.c~nx-user-ppc64 arch/ppc64/mm/fault.c --- linux-2.6-bk/arch/ppc64/mm/fault.c~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c Thu Oct 7 15:23:52 2004 @@ -92,6 +92,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long code = SEGV_MAPERR; unsigned long is_write = error_code & 0x02000000; unsigned long trap = TRAP(regs); + unsigned long is_exec = trap == 0x400; BUG_ON((trap == 0x380) || (trap == 0x480)); @@ -191,16 +192,19 @@ int do_page_fault(struct pt_regs *regs, good_area: code = SEGV_ACCERR; + if (is_exec) { + /* protection fault */ + if (error_code & 0x08000000) + goto bad_area; + if (!(vma->vm_flags & VM_EXEC)) + goto bad_area; /* a write */ - if (is_write) { + } else if (is_write) { if (!(vma->vm_flags & VM_WRITE)) goto bad_area; /* a read */ } else { - /* protection fault */ - if (error_code & 0x08000000) - goto bad_area; - if (!(vma->vm_flags & (VM_READ | VM_EXEC))) + if (!(vma->vm_flags & VM_READ)) goto bad_area; } diff -puN arch/ppc64/mm/hash_low.S~nx-user-ppc64 arch/ppc64/mm/hash_low.S --- linux-2.6-bk/arch/ppc64/mm/hash_low.S~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/hash_low.S Thu Oct 7 15:23:52 2004 @@ -89,7 +89,7 @@ _GLOBAL(__hash_page) /* Prepare new PTE value (turn access RW into DIRTY, then * add BUSY,HASHPTE and ACCESSED) */ - rlwinm r30,r4,5,24,24 /* _PAGE_RW -> _PAGE_DIRTY */ + rlwinm r30,r4,32-9+7,31-7,31-7 /* _PAGE_RW -> _PAGE_DIRTY */ or r30,r30,r31 ori r30,r30,_PAGE_BUSY | _PAGE_ACCESSED | _PAGE_HASHPTE /* Write the linux PTE atomically (setting busy) */ @@ -112,11 +112,11 
@@ _GLOBAL(__hash_page) rldicl r5,r5,0,25 /* vsid & 0x0000007fffffffff */ rldicl r0,r3,64-12,48 /* (ea >> 12) & 0xffff */ xor r28,r5,r0 - - /* Convert linux PTE bits into HW equivalents - */ - andi. r3,r30,0x1fa /* Get basic set of flags */ - rlwinm r0,r30,32-2+1,30,30 /* _PAGE_RW -> _PAGE_USER (r0) */ + + /* Convert linux PTE bits into HW equivalents */ + andi. r3,r30,0x1fe /* Get basic set of flags */ + xori r3,r3,HW_NO_EXEC /* _PAGE_EXEC -> NOEXEC */ + rlwinm r0,r30,32-9+1,30,30 /* _PAGE_RW -> _PAGE_USER (r0) */ rlwinm r4,r30,32-7+1,30,30 /* _PAGE_DIRTY -> _PAGE_USER (r4) */ and r0,r0,r4 /* _PAGE_RW & _PAGE_DIRTY -> r0 bit 30 */ andc r0,r30,r0 /* r0 = pte & ~r0 */ diff -puN fs/binfmt_elf.c~nx-user-ppc64 fs/binfmt_elf.c --- linux-2.6-bk/fs/binfmt_elf.c~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/fs/binfmt_elf.c Thu Oct 7 15:23:52 2004 @@ -89,8 +89,11 @@ static int set_brk(unsigned long start, end = ELF_PAGEALIGN(end); if (end > start) { unsigned long addr = do_brk(start, end - start); + if (BAD_ADDR(addr)) return addr; + + sys_mprotect(start, end-start, PROT_READ|PROT_WRITE|PROT_EXEC); } current->mm->start_brk = current->mm->brk = end; return 0; diff -puN include/asm-ppc64/elf.h~nx-user-ppc64 include/asm-ppc64/elf.h --- linux-2.6-bk/include/asm-ppc64/elf.h~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/include/asm-ppc64/elf.h Thu Oct 7 15:23:52 2004 @@ -226,6 +226,13 @@ do { \ else if (current->personality != PER_LINUX32) \ set_personality(PER_LINUX); \ } while (0) + +/* + * An executable for which elf_read_implies_exec() returns TRUE will + * have the READ_IMPLIES_EXEC personality flag set automatically. 
+ */ +#define elf_read_implies_exec_binary(ex, have_pt_gnu_stack) (!(have_pt_gnu_stack)) + #endif /* diff -puN include/asm-ppc64/page.h~nx-user-ppc64 include/asm-ppc64/page.h --- linux-2.6-bk/include/asm-ppc64/page.h~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/include/asm-ppc64/page.h Thu Oct 7 15:23:52 2004 @@ -233,8 +233,25 @@ extern int page_is_ram(unsigned long pfn #define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) -#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_EXEC | \ +#define VM_DATA_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | \ VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_DATA_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | VM_EXEC | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_DATA_DEFAULT_FLAGS \ + (test_thread_flag(TIF_32BIT) ? \ + VM_DATA_DEFAULT_FLAGS32 : VM_DATA_DEFAULT_FLAGS64) + +#define VM_STACK_DEFAULT_FLAGS \ + (test_thread_flag(TIF_32BIT) ? 
\ + VM_STACK_DEFAULT_FLAGS32 : VM_STACK_DEFAULT_FLAGS64) #endif /* __KERNEL__ */ #endif /* _PPC64_PAGE_H */ diff -puN include/asm-ppc64/pgtable.h~nx-user-ppc64 include/asm-ppc64/pgtable.h --- linux-2.6-bk/include/asm-ppc64/pgtable.h~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/include/asm-ppc64/pgtable.h Thu Oct 7 15:23:52 2004 @@ -86,24 +86,25 @@ #define _PAGE_PRESENT 0x0001 /* software: pte contains a translation */ #define _PAGE_USER 0x0002 /* matches one of the PP bits */ #define _PAGE_FILE 0x0002 /* (!present only) software: pte holds file offset */ -#define _PAGE_RW 0x0004 /* software: user write access allowed */ +#define _PAGE_EXEC 0x0004 /* No execute on POWER4 and newer (we invert) */ #define _PAGE_GUARDED 0x0008 #define _PAGE_COHERENT 0x0010 /* M: enforce memory coherence (SMP systems) */ #define _PAGE_NO_CACHE 0x0020 /* I: cache inhibit */ #define _PAGE_WRITETHRU 0x0040 /* W: cache write-through */ #define _PAGE_DIRTY 0x0080 /* C: page changed */ #define _PAGE_ACCESSED 0x0100 /* R: page referenced */ -#define _PAGE_EXEC 0x0200 /* software: i-cache coherence required */ +#define _PAGE_RW 0x0200 /* software: user write access allowed */ #define _PAGE_HASHPTE 0x0400 /* software: pte has an associated HPTE */ #define _PAGE_BUSY 0x0800 /* software: PTE & hash are busy */ #define _PAGE_SECONDARY 0x8000 /* software: HPTE is in secondary group */ #define _PAGE_GROUP_IX 0x7000 /* software: HPTE index within group */ /* Bits 0x7000 identify the index within an HPT Group */ #define _PAGE_HPTEFLAGS (_PAGE_BUSY | _PAGE_HASHPTE | _PAGE_SECONDARY | _PAGE_GROUP_IX) + /* PAGE_MASK gives the right answer below, but only by accident */ /* It should be preserving the high 48 bits and then specifically */ /* preserving _PAGE_SECONDARY | _PAGE_GROUP_IX */ -#define _PAGE_CHG_MASK (PAGE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_HPTEFLAGS) +#define _PAGE_CHG_MASK (_PAGE_GUARDED | _PAGE_COHERENT | _PAGE_NO_CACHE | _PAGE_WRITETHRU | _PAGE_DIRTY | 
_PAGE_ACCESSED | _PAGE_HPTEFLAGS | PAGE_MASK) #define _PAGE_BASE (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_COHERENT) @@ -119,31 +120,32 @@ #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER) #define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) #define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_WRENABLE) -#define PAGE_KERNEL_CI __pgprot(_PAGE_PRESENT | _PAGE_ACCESSED | \ - _PAGE_WRENABLE | _PAGE_NO_CACHE | _PAGE_GUARDED) /* - * The PowerPC can only do execute protection on a segment (256MB) basis, - * not on a page basis. So we consider execute permission the same as read. + * POWER4 and newer have per page execute protection, older chips can only + * do this on a segment (256MB) basis. + * * Also, write permissions imply read permissions. * This is the closest we can get.. + * + * Note due to the way vm flags are laid out, the bits are XWR */ #define __P000 PAGE_NONE -#define __P001 PAGE_READONLY_X +#define __P001 PAGE_READONLY #define __P010 PAGE_COPY -#define __P011 PAGE_COPY_X -#define __P100 PAGE_READONLY +#define __P011 PAGE_COPY +#define __P100 PAGE_READONLY_X #define __P101 PAGE_READONLY_X -#define __P110 PAGE_COPY +#define __P110 PAGE_COPY_X #define __P111 PAGE_COPY_X #define __S000 PAGE_NONE -#define __S001 PAGE_READONLY_X +#define __S001 PAGE_READONLY #define __S010 PAGE_SHARED -#define __S011 PAGE_SHARED_X -#define __S100 PAGE_READONLY +#define __S011 PAGE_SHARED +#define __S100 PAGE_READONLY_X #define __S101 PAGE_READONLY_X -#define __S110 PAGE_SHARED +#define __S110 PAGE_SHARED_X #define __S111 PAGE_SHARED_X #ifndef __ASSEMBLY__ @@ -200,7 +202,8 @@ int hash_huge_page(struct mm_struct *mm, }) #define pte_modify(_pte, newprot) \ - (__pte((pte_val(_pte) & _PAGE_CHG_MASK) | pgprot_val(newprot))) + (__pte((pte_val(_pte) & _PAGE_CHG_MASK) | \ + (pgprot_val(newprot) & ~_PAGE_CHG_MASK))) #define pte_none(pte) ((pte_val(pte) & ~_PAGE_HPTEFLAGS) == 0) #define pte_present(pte) (pte_val(pte) & _PAGE_PRESENT) @@ -270,9 +273,6 @@ static inline int 
pte_dirty(pte_t pte) { static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED;} static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE;} -static inline void pte_uncache(pte_t pte) { pte_val(pte) |= _PAGE_NO_CACHE; } -static inline void pte_cache(pte_t pte) { pte_val(pte) &= ~_PAGE_NO_CACHE; } - static inline pte_t pte_rdprotect(pte_t pte) { pte_val(pte) &= ~_PAGE_USER; return pte; } static inline pte_t pte_exprotect(pte_t pte) { @@ -420,7 +420,7 @@ static inline void set_pte(pte_t *ptep, static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry, int dirty) { unsigned long bits = pte_val(entry) & - (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW); + (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC); unsigned long old, tmp; __asm__ __volatile__( diff -puN arch/ppc64/mm/hugetlbpage.c~nx-user-ppc64 arch/ppc64/mm/hugetlbpage.c --- linux-2.6-bk/arch/ppc64/mm/hugetlbpage.c~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/hugetlbpage.c Thu Oct 7 15:23:52 2004 @@ -29,8 +29,8 @@ /* HugePTE layout: * - * 31 30 ... 15 14 13 12 10 9 8 7 6 5 4 3 2 1 0 - * PFN>>12..... - - - - - - HASH_IX.... 2ND HASH RW - HG=1 + * 31 30 ... 15 14 13 12 10 9 8 7 6 5 4 3 2 1 0 + * PFN>>12..... - - - - - - HASH_IX.... 
2ND HASH !EXEC RW HG=1 */ #define HUGEPTE_SHIFT 15 @@ -41,7 +41,8 @@ #define _HUGEPAGE_GROUP_IX 0x000000e0 #define _HUGEPAGE_HPTEFLAGS (_HUGEPAGE_HASHPTE | _HUGEPAGE_SECONDARY | \ _HUGEPAGE_GROUP_IX) -#define _HUGEPAGE_RW 0x00000004 +#define _HUGEPAGE_RW 0x00000002 +#define _HUGEPAGE_EXEC 0x00000004 /* this is inverted */ typedef struct {unsigned int val;} hugepte_t; #define hugepte_val(hugepte) ((hugepte).val) @@ -722,6 +723,7 @@ int hash_huge_page(struct mm_struct *mm, hugepte_t *ptep; unsigned long va, vpn; int is_write; + int is_exec; hugepte_t old_pte, new_pte; unsigned long hpteflags, prpn, flags; long slot; @@ -752,6 +754,10 @@ int hash_huge_page(struct mm_struct *mm, if (unlikely(is_write && !(hugepte_val(*ptep) & _HUGEPAGE_RW))) return 1; + is_exec = access & _PAGE_EXEC; + if (unlikely(is_exec && !(hugepte_val(*ptep) & _HUGEPAGE_EXEC))) + return 1; + /* * At this point, we have a pte (old_pte) which can be used to build * or update an HPTE. There are 2 cases: @@ -769,7 +775,10 @@ int hash_huge_page(struct mm_struct *mm, old_pte = *ptep; new_pte = old_pte; - hpteflags = 0x2 | (! 
(hugepte_val(new_pte) & _HUGEPAGE_RW)); + /* _HUGEPAGE_EXEC -> HW_NO_EXEC since it's inverted */ + hpteflags = (hugepte_val(new_pte) & _HUGEPAGE_RW) | + (hugepte_val(new_pte) ^ HW_NO_EXEC) | + (!(hugepte_val(new_pte) & _HUGEPAGE_RW)); /* Check if pte already has an hpte (case 2) */ if (unlikely(hugepte_val(old_pte) & _HUGEPAGE_HASHPTE)) { diff -L arch/ppc64/kernel/pSeries_htab.c -puN /dev/null /dev/null diff -puN arch/ppc64/kernel/pSeries_lpar.c~nx-user-ppc64 arch/ppc64/kernel/pSeries_lpar.c --- linux-2.6-bk/arch/ppc64/kernel/pSeries_lpar.c~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/pSeries_lpar.c Thu Oct 7 15:23:52 2004 @@ -384,7 +384,7 @@ static void pSeries_lpar_hpte_updatebolt slot = pSeries_lpar_hpte_find(vpn); BUG_ON(slot == -1); - flags = newpp & 3; + flags = newpp & 7; lpar_rc = plpar_pte_protect(flags, slot, 0); BUG_ON(lpar_rc != H_Success); diff -puN arch/ppc64/kernel/iSeries_htab.c~nx-user-ppc64 arch/ppc64/kernel/iSeries_htab.c --- linux-2.6-bk/arch/ppc64/kernel/iSeries_htab.c~nx-user-ppc64 Thu Oct 7 15:23:52 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/iSeries_htab.c Thu Oct 7 15:23:52 2004 @@ -144,6 +144,10 @@ static long iSeries_hpte_updatepp(unsign HvCallHpt_get(&hpte, slot); if ((hpte.dw0.dw0.avpn == avpn) && (hpte.dw0.dw0.v)) { + /* + * Hypervisor expects bit's as NPPP, which is + * different from how they are mapped in our PP. + */ HvCallHpt_setPp(slot, (newpp & 0x3) | ((newpp & 0x4) << 1)); iSeries_hunlock(slot); return 0; _ From moilanen at austin.ibm.com Wed Oct 13 00:52:48 2004 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Tue, 12 Oct 2004 09:52:48 -0500 Subject: [PATCH 0/2][RFC] PPC64 no-exec support Message-ID: <20041012095248.2b6418c4@localhost> These patches add no exec support to PPC64. It should prohibit executing code out of the stack, or most any non-text segment. 
For distros that compile w/ pt_gnu_stacks, they depend on Ben's signal trampoline changes, or else it will hang on the first signal due to the return code being put on the signal context stack to return to the kernel on the completion of the signal handler. The patches include a base fixup from Anton of the wrong bit being used for no-exec and for read/write on the hardware PTEs. The patch is broken into two parts: 1/2: PPC64 no-exec support for user space: This will prohibit user space apps from executing in segments not marked as executable. The base support is in here as well. 2/2: PPC64 no-exec support for kernel space: This prohibits the kernel from executing non-text code. Thanks, Jake From moilanen at austin.ibm.com Wed Oct 13 00:58:52 2004 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Tue, 12 Oct 2004 09:58:52 -0500 Subject: [PATCH 2/2][RFC] PPC64 no-exec support for kernel space In-Reply-To: <20041012095248.2b6418c4@localhost> References: <20041012095248.2b6418c4@localhost> Message-ID: <20041012095852.29e583a3@localhost> Here is the kernel piece of no-exec. It marks all non-text pages as no-execute. It depends on the no-exec for user-space patch. 
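The core idea of the kernel piece can be sketched outside the kernel as follows. This is an illustration only, not the patch: the text boundaries are passed in as parameters instead of coming from linker symbols, HW_NO_EXEC is given the value 0x4 that _PAGE_EXEC has in the user-space patch, and hpte_mode_for() is a made-up helper name.

```c
/* Assumed value: matches _PAGE_EXEC (0x0004) from the user-space patch. */
#define HW_NO_EXEC 0x4UL

/* Return 1 if addr lies in [text_start, text_end), else 0. */
int is_kernel_text(unsigned long addr,
		   unsigned long text_start, unsigned long text_end)
{
	return addr >= text_start && addr < text_end;
}

/*
 * Hypothetical helper: choose the mode bits for one bolted mapping.
 * Kernel text keeps the caller's mode; everything else gets no-execute.
 */
unsigned long hpte_mode_for(unsigned long addr, unsigned long mode,
			    unsigned long text_start, unsigned long text_end)
{
	if (!is_kernel_text(addr, text_start, text_end))
		return mode | HW_NO_EXEC;
	return mode;
}
```

A caller would compute the mode once per page inside the mapping loop and hand the result to the HPTE insert routine, so only addresses outside the text range pick up the no-execute bit.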
Thanks, Jake Signed-off-by: Jake Moilanen --- diff -puN arch/ppc64/kernel/module.c~nx-kernel-ppc64 arch/ppc64/kernel/module.c --- linux-2.6-bk/arch/ppc64/kernel/module.c~nx-kernel-ppc64 Thu Oct 7 15:23:55 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/module.c Thu Oct 7 15:23:55 2004 @@ -102,7 +102,8 @@ void *module_alloc(unsigned long size) { if (size == 0) return NULL; - return vmalloc(size); + + return vmalloc_exec(size); } /* Free memory returned from module_alloc */ diff -puN arch/ppc64/mm/fault.c~nx-kernel-ppc64 arch/ppc64/mm/fault.c --- linux-2.6-bk/arch/ppc64/mm/fault.c~nx-kernel-ppc64 Thu Oct 7 15:23:55 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c Thu Oct 7 15:23:55 2004 @@ -75,6 +75,21 @@ static int store_updates_sp(struct pt_re return 0; } +pte_t *lookup_address(unsigned long address) +{ + pgd_t *pgd = pgd_offset_k(address); + pmd_t *pmd; + + if (pgd_none(*pgd)) + return NULL; + + pmd = pmd_offset(pgd, address); + if (pmd_none(*pmd)) + return NULL; + + return pte_offset_kernel(pmd, address); +} + /* * The error_code parameter is * - DSISR for a non-SLB data access fault, @@ -93,6 +108,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long is_write = error_code & 0x02000000; unsigned long trap = TRAP(regs); unsigned long is_exec = trap == 0x400; + pte_t *ptep; BUG_ON((trap == 0x380) || (trap == 0x480)); @@ -245,6 +261,15 @@ bad_area_nosemaphore: info.si_addr = (void __user *) address; force_sig_info(SIGSEGV, &info, current); return 0; + } + + ptep = lookup_address(address); + + if (ptep && pte_present(*ptep) && !pte_exec(*ptep)) { + if (printk_ratelimit()) + printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? 
(uid: %d)\n", current->uid); + show_stack(current, (unsigned long *)__get_SP()); + do_exit(SIGKILL); } return SIGSEGV; diff -puN arch/ppc64/mm/hash_utils.c~nx-kernel-ppc64 arch/ppc64/mm/hash_utils.c --- linux-2.6-bk/arch/ppc64/mm/hash_utils.c~nx-kernel-ppc64 Thu Oct 7 15:23:55 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/hash_utils.c Thu Oct 7 15:23:55 2004 @@ -52,6 +52,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -89,12 +90,23 @@ static inline void loop_forever(void) ; } +int is_kernel_text(unsigned long addr) +{ + if (addr >= (unsigned long)_stext && addr < (unsigned long)__init_end) + return 1; + + return 0; +} + + + #ifdef CONFIG_PPC_MULTIPLATFORM static inline void create_pte_mapping(unsigned long start, unsigned long end, unsigned long mode, int large) { unsigned long addr; unsigned int step; + unsigned long tmp_mode; if (large) step = 16*MB; @@ -112,6 +124,13 @@ static inline void create_pte_mapping(un else vpn = va >> PAGE_SHIFT; + + tmp_mode = mode; + + /* Make non-kernel text non-executable */ + if (!is_kernel_text(addr)) + tmp_mode = mode | HW_NO_EXEC; + hash = hpt_hash(vpn, large); hpteg = ((hash & htab_data.htab_hash_mask)*HPTES_PER_GROUP); @@ -120,12 +139,12 @@ static inline void create_pte_mapping(un if (systemcfg->platform & PLATFORM_LPAR) ret = pSeries_lpar_hpte_insert(hpteg, va, virt_to_abs(addr) >> PAGE_SHIFT, - 0, mode, 1, large); + 0, tmp_mode, 1, large); else #endif /* CONFIG_PPC_PSERIES */ ret = native_hpte_insert(hpteg, va, virt_to_abs(addr) >> PAGE_SHIFT, - 0, mode, 1, large); + 0, tmp_mode, 1, large); if (ret == -1) { ppc64_terminate_msg(0x20, "create_pte_mapping"); @@ -239,8 +258,6 @@ unsigned int hash_page_do_lazy_icache(un { struct page *page; -#define PPC64_HWNOEXEC (1 << 2) - if (!pfn_valid(pte_pfn(pte))) return pp; @@ -251,8 +268,8 @@ unsigned int hash_page_do_lazy_icache(un if (trap == 0x400) { __flush_dcache_icache(page_address(page)); set_bit(PG_arch_1, &page->flags); - } else - pp 
|= PPC64_HWNOEXEC; + } else + pp |= HW_NO_EXEC; } return pp; } diff -puN include/asm-ppc64/mmu.h~nx-kernel-ppc64 include/asm-ppc64/mmu.h diff -puN include/asm-ppc64/pgtable.h~nx-kernel-ppc64 include/asm-ppc64/pgtable.h --- linux-2.6-bk/include/asm-ppc64/pgtable.h~nx-kernel-ppc64 Thu Oct 7 15:23:55 2004 +++ linux-2.6-bk-moilanen/include/asm-ppc64/pgtable.h Thu Oct 7 15:23:55 2004 @@ -101,6 +101,12 @@ /* Bits 0x7000 identify the index within an HPT Group */ #define _PAGE_HPTEFLAGS (_PAGE_BUSY | _PAGE_HASHPTE | _PAGE_SECONDARY | _PAGE_GROUP_IX) +#define HW_NO_EXEC _PAGE_EXEC /* This is used when the bit is + * inverted, even though it's the + * same value, hopefully it will be + * clearer in the code what is + * going on. */ + /* PAGE_MASK gives the right answer below, but only by accident */ /* It should be preserving the high 48 bits and then specifically */ /* preserving _PAGE_SECONDARY | _PAGE_GROUP_IX */ @@ -120,6 +126,7 @@ #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER) #define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) #define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_WRENABLE) +#define PAGE_KERNEL_EXEC __pgprot(_PAGE_BASE | _PAGE_WRENABLE | _PAGE_EXEC) /* * POWER4 and newer have per page execute protection, older chips can only @@ -266,6 +273,7 @@ int hash_huge_page(struct mm_struct *mm, * The following only work if pte_present() is true. * Undefined behaviour if not.. 
*/ +static inline int pte_user(pte_t pte) { return pte_val(pte) & _PAGE_USER;} static inline int pte_read(pte_t pte) { return pte_val(pte) & _PAGE_USER;} static inline int pte_write(pte_t pte) { return pte_val(pte) & _PAGE_RW;} static inline int pte_exec(pte_t pte) { return pte_val(pte) & _PAGE_EXEC;} diff -puN arch/ppc64/kernel/iSeries_setup.c~nx-kernel-ppc64 arch/ppc64/kernel/iSeries_setup.c --- linux-2.6-bk/arch/ppc64/kernel/iSeries_setup.c~nx-kernel-ppc64 Thu Oct 7 15:23:55 2004 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/iSeries_setup.c Thu Oct 7 15:23:55 2004 @@ -622,6 +622,7 @@ static void __init iSeries_bolt_kernel(u { unsigned long pa; unsigned long mode_rw = _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX; + unsigned long tmp_mode; HPTE hpte; for (pa = saddr; pa < eaddr ;pa += PAGE_SIZE) { @@ -630,6 +631,12 @@ static void __init iSeries_bolt_kernel(u unsigned long va = (vsid << 28) | (pa & 0xfffffff); unsigned long vpn = va >> PAGE_SHIFT; unsigned long slot = HvCallHpt_findValid(&hpte, vpn); + + tmp_mode = mode_rw; + + /* Make non-kernel text non-executable */ + if (!is_kernel_text(ea)) + tmp_mode = mode_rw | HW_NO_EXEC; if (hpte.dw0.dw0.v) { /* HPTE exists, so just bolt it */ _ From dwmw2 at infradead.org Wed Oct 13 05:08:02 2004 From: dwmw2 at infradead.org (David Woodhouse) Date: Tue, 12 Oct 2004 20:08:02 +0100 Subject: cond_syscall() and new ABI. In-Reply-To: <200410122043.52351.arnd@arndb.de> References: <1097589108.318.425.camel@hades.cambridge.redhat.com> <20041012142627.GA19091@lst.de> <200410122043.52351.arnd@arndb.de> Message-ID: <1097608083.5178.5.camel@localhost.localdomain> On Tue, 2004-10-12 at 20:43 +0200, Arnd Bergmann wrote: > A better solution IMHO would be to include the right headers from sys.c > and have > > #define cond_syscall(x) typeof(x) (x) __attribute__((weak,alias("sys_ni_syscall"))); That's true in theory, yes -- not that I can see any way that having the 'correct' prototype will actually make a difference in practice. 
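For readers who have not met the attribute under discussion, here is a minimal standalone sketch of the weak-alias form of cond_syscall(). It is not the kernel's code: sys_example_optional is a made-up name, the stub returns long rather than void so the result can be checked, and it assumes GCC or a compatible compiler on an ELF target.

```c
#include <errno.h>

/* Fallback for any syscall that has no real implementation. */
long sys_ni_syscall(void)
{
	return -ENOSYS;
}

/*
 * Weak-alias form of cond_syscall(): declare x as a weak symbol that
 * resolves to sys_ni_syscall unless a strong definition of x is linked in.
 */
#define cond_syscall(x) \
	long x(void) __attribute__((weak, alias("sys_ni_syscall")))

/* sys_example_optional is hypothetical; nothing else defines it here. */
cond_syscall(sys_example_optional);
```

With nothing else defining sys_example_optional, a call to it lands in sys_ni_syscall and returns -ENOSYS; linking in an object file with an ordinary (strong) definition of the same name would take precedence over the alias.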
> Also, someone should try to find out which toolchains don't support this > and if anybody is still using those. One issue seems to be the one from > http://seclists.org/lists/linux-kernel/2004/Jan/2474.html, but I'm not > sure if that is the problem that the comment refers to. That happens with both the current inline asm method, and with the 'alias' method which translates to basically the same asm output from gcc, but without the ifdefs. -- dwmw2 From arnd at arndb.de Wed Oct 13 04:43:52 2004 From: arnd at arndb.de (Arnd Bergmann) Date: Tue, 12 Oct 2004 20:43:52 +0200 Subject: cond_syscall() and new ABI. In-Reply-To: <20041012142627.GA19091@lst.de> References: <1097589108.318.425.camel@hades.cambridge.redhat.com> <20041012142627.GA19091@lst.de> Message-ID: <200410122043.52351.arnd@arndb.de> On Tuesday 12 October 2004 16:26, Christoph Hellwig wrote: > > ===== include/asm-ppc64/unistd.h 1.34 vs edited ===== > > --- 1.34/include/asm-ppc64/unistd.h Tue Sep 14 01:23:12 2004 > > +++ edited/include/asm-ppc64/unistd.h Tue Oct 12 14:48:08 2004 > > @@ -468,7 +468,7 @@ > > * What we want is __attribute__((weak,alias("sys_ni_syscall"))), > > * but it doesn't work on all toolchains, so we just do it by hand > > */ > > -#define cond_syscall(x) asm(".weak\t." #x "\n\t.set\t." #x ",.sys_ni_syscall"); > > +#define cond_syscall(x) void x(void) __attribute__((weak,alias("sys_ni_syscall"))); > > > this one otoh makes lots of sense - it's what most architectures use. It's also something that looks suboptimal to me. The syscalls should already have a proper prototype in , which typically is not "void sys_foo(void)". A better solution IMHO would be to include the right headers from sys.c and have #define cond_syscall(x) typeof(x) (x) __attribute__((weak,alias("sys_ni_syscall"))); Also, someone should try to find out which toolchains don't support this and if anybody is still using those.
One issue seems to be the one from http://seclists.org/lists/linux-kernel/2004/Jan/2474.html, but I'm not sure if that is the problem that the comment refers to. Arnd <>< From anton at samba.org Wed Oct 13 05:39:02 2004 From: anton at samba.org (Anton Blanchard) Date: Wed, 13 Oct 2004 05:39:02 +1000 Subject: cond_syscall() and new ABI. In-Reply-To: <1097589108.318.425.camel@hades.cambridge.redhat.com> References: <1097589108.318.425.camel@hades.cambridge.redhat.com> Message-ID: <20041012193902.GB3315@krispykreme.ozlabs.ibm.com> > This (in linux/asm-ppc64/unistd.h) doesn't work with the new ABI: > > /* > * "Conditional" syscalls > * > * What we want is __attribute__((weak,alias("sys_ni_syscall"))), > * but it doesn't work on all toolchains, so we just do it by hand > */ > #define cond_syscall(x) asm(".weak\t." #x "\n\t.set\t." #x ",.sys_ni_syscall"); > > Two options -- either we ditch older toolchains (before 2002-03-01 > probably), by switching to what we say in the comment, or we introduce > an ifdef to choose whether to include the '.' in the symbol names... http://ozlabs.org/ppc64-patches/ has 5 remove -mminimal-toc patches which should fix this mess. The syscall table is currently abusing the ABI, it would be nice to fix it. If there are no complaints I'd like to push this patchset once 2.6.10 opens. Anton From grave at ipno.in2p3.fr Wed Oct 13 17:19:56 2004 From: grave at ipno.in2p3.fr (grave) Date: Wed, 13 Oct 2004 07:19:56 +0000 Subject: libmotovec Message-ID: <1097651996l.1092l.0l@ipnnarval> Hi, Just to know: does the current powerpc kernel benefit from something like libmotovec?
Since the VMX is also on the ppc970 family perhaps we will see more IBM processors in the future with such velocity engine so... xavier From benh at kernel.crashing.org Wed Oct 13 17:44:23 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 13 Oct 2004 17:44:23 +1000 Subject: 2.6.9-rc4: oops during ide probing In-Reply-To: References: Message-ID: <1097653462.5553.43.camel@gaston> On Tue, 2004-10-12 at 06:11, Andreas Schwab wrote: > I'm getting an oops during ide probing on the PMac G5 with 2.6.9-rc4: Can you send me a dump of the whole device-tree ? Ben. From arnd at arndb.de Wed Oct 13 19:19:19 2004 From: arnd at arndb.de (Arnd Bergmann) Date: Wed, 13 Oct 2004 11:19:19 +0200 Subject: cond_syscall() and new ABI. In-Reply-To: <1097608083.5178.5.camel@localhost.localdomain> References: <1097589108.318.425.camel@hades.cambridge.redhat.com> <200410122043.52351.arnd@arndb.de> <1097608083.5178.5.camel@localhost.localdomain> Message-ID: <200410131119.23409.arnd@arndb.de> On Tuesday 12 October 2004 21:08, David Woodhouse wrote: > On Tue, 2004-10-12 at 20:43 +0200, Arnd Bergmann wrote: > > A better solution IMHO would be to include the right headers from sys.c > > and have > > > > #define cond_syscall(x) typeof(x) (x) __attribute__((weak,alias("sys_ni_syscall"))); > > That's true in theory, yes -- not that I can see any way that having the > 'correct' prototype will actually make a difference in practice. Right, my point was mostly about having an implementation that is less surprising to the reader, not about correctness. It might actually become a bug as soon as someone tries to build the kernel with a compiler that does inter-module analysis, but that's not likely to happen soon. Arnd <><
From schwab at suse.de Wed Oct 13 19:48:52 2004 From: schwab at suse.de (Andreas Schwab) Date: Wed, 13 Oct 2004 11:48:52 +0200 Subject: 2.6.9-rc4: oops during ide probing In-Reply-To: <1097653462.5553.43.camel@gaston> (Benjamin Herrenschmidt's message of "Wed, 13 Oct 2004 17:44:23 +1000") References: <1097653462.5553.43.camel@gaston> Message-ID: Benjamin Herrenschmidt writes: > On Tue, 2004-10-12 at 06:11, Andreas Schwab wrote: >> I'm getting an oops during ide probing on the PMac G5 with 2.6.9-rc4: > > Can you send me a dump of the whole device-tree ? By "dump" do you mean ls -R or something more fancy? Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From benh at kernel.crashing.org Thu Oct 14 00:28:53 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 14 Oct 2004 00:28:53 +1000 Subject: 2.6.9-rc4: oops during ide probing In-Reply-To: References: <1097653462.5553.43.camel@gaston> Message-ID: <1097677732.10215.1.camel@gaston> On Wed, 2004-10-13 at 19:48, Andreas Schwab wrote: > Benjamin Herrenschmidt writes: > > > On Tue, 2004-10-12 at 06:11, Andreas Schwab wrote: > >> I'm getting an oops during ide probing on the PMac G5 with 2.6.9-rc4: > > > > Can you send me a dump of the whole device-tree ? > > By "dump" do you mean ls -R or something more fancy? tarball of /proc/device-tree From linas at austin.ibm.com Thu Oct 14 05:23:56 2004 From: linas at austin.ibm.com (Linas Vepstas) Date: Wed, 13 Oct 2004 14:23:56 -0500 Subject: Hardware Watchdog Device in pSeries?
In-Reply-To: <416D6D89.6030300@unix.sh> References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> Message-ID: <20041013192356.GE12237@austin.ibm.com> Hi, I'm copying over to the linuxppc64-dev at ozlabs.org mailing list, which is the right place to discuss this. On Wed, Oct 13, 2004 at 12:01:45PM -0600, Alan Robertson was heard to remark: > Linas Vepstas wrote: > >Hi, > > > >On Wed, Oct 13, 2004 at 09:12:23AM +0800, Zhen Huang was heard to remark: > > > >>Hi, > >> > >>The watchdog I mentioned means such a device: > >>Once we open it we must write to it regularly. > >>Otherwise the whole system will be reset. > >> > >>Many OSes have software implementations of this. > >>But the software watchdog will depend on the health of the OS. > >> > >>I want to know whether there is any hardware implementation in pServer. > > > > > >Yes, there is a hardware watchdog; it's implemented on all pSeries > >machines that have service processors (thus, it goes back to at > >least power3). However, it is not a unix 'device' that a user-land > >process can 'open'; it is only accessible through RTAS calls. The > >kernel daemon rtasd provides the regular heartbeat. > > > >The kernel enables the watchdog function with the 'enable_surveillance()' > >subroutine call (see arch/ppc64/kernel/rtasd.c). > >Once it's enabled, the heartbeat is the 'event-scan' RTAS call, > >which the kernel must call regularly from each CPU. (I guess this > >helps detect hung CPUs on SMP systems). If the event-scan call > >isn't made within the 'surveillance timeout', the SP will reboot > >the OS (or call in a service request, etc.) > > > >I don't know if there is any interest in moving this heartbeat > >watchdog out from kernel space into user space; right now, > >rtasd is a kernel daemon, and it more or less just works. > > > >If it ever is converted to userland, it's not likely it will > >ever be a traditional unix device; instead, functions like > >this are moving to the sysfs file system.
> > This would be a logical equivalent to the well-known and long-standing > 'softdog' device driver which already has a well-known API, which is also > implemented on other hardware devices and architectures. > > So, my suggestion would be that if it were moved to a userspace driver, > that the softdog API be retained. I might have volunteered to hack this up real quick, were it not for Mike Strosaker's correction, that the surveillance features were taken out of Power5. Anyone on this list know why? --linas From strosake at austin.ibm.com Thu Oct 14 05:57:35 2004 From: strosake at austin.ibm.com (Mike Strosaker) Date: Wed, 13 Oct 2004 14:57:35 -0500 Subject: Hardware Watchdog Device in pSeries? In-Reply-To: <20041013192356.GE12237@austin.ibm.com> References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> Message-ID: <416D88AF.1010706@austin.ibm.com> Linas Vepstas wrote: > I might have volunteered to hack this up real quick, were it not for > Mike Strosaker's correction, that the surveillance features were taken > out of Power5. > > Anyone on this list know why? > I sent the reason I got from the hardware RAS folks to this list a while back. Luckily, it's still in my sent mail folder: "Because of the virtualization layer and partitioning, the surveillance requirement was moved to PHYP<->SP. Apparently, this was a hotly contested issue among the platform design folks (especially considering that partitioned power4 systems still have OS<->SP surveillance). I think the logic is: If an OS goes down, it's not likely a server problem, hence no requirement to monitor from the server side. At least the platform gets notified of panics via os-term. I gather that some user space tools are expected to monitor for deadlocks/hangs (maybe clustering tools)."
Thanks, Mike From alanr at unix.sh Thu Oct 14 07:30:02 2004 From: alanr at unix.sh (Alan Robertson) Date: Wed, 13 Oct 2004 15:30:02 -0600 Subject: Hardware Watchdog Device in pSeries? In-Reply-To: <416D88AF.1010706@austin.ibm.com> References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <416D88AF.1010706@austin.ibm.com> Message-ID: <416D9E5A.9080102@unix.sh> Mike Strosaker wrote: > Linas Vepstas wrote: > >> I might have volunteered to hack this up real quick, were it not for >> Mike Strosaker's correction, that the surveillance features were taken >> out of Power5. >> Anyone on this list know why? >> > > I sent the reason I got from the hardware RAS folks to this list a while > back. > Luckily, it's still in my sent mail folder: > > "Because of the virtualization layer and partitioning, the surveillance > requirement was moved to PHYP<->SP. Apparently, this was a hotly > contested issue among the platform design folks (especially considering > that > partitioned power4 systems still have OS<->SP surveillance). I think > the logic > is: If an OS goes down, it's not likely a server problem, hence no > requirement > to monitor from the server side. > > At least the platform gets notified of panics via os-term. I gather > that some user space tools are expected to monitor for deadlocks/hangs > (maybe clustering tools). " This is about half-right. There is one particular circumstance which can ONLY be monitored from a hardware-level monitor. OS hangs. If the OS hangs, then nothing but a hardware timer can bring the machine out of its hung state. Hangs do NOT panic (by definition), and can't be reliably detected any other way. In highly available systems (like telecom systems), hardware level monitors are required. Leaving it out sends the message that "availability isn't important". The normal way to build a highly available system is to have layers (or a hierarchy) of watchers.
At the bottom is the hardware monitor. Above that is an application monitor, above that are resource monitors, etc. But there are certain kinds of faults which cannot be caught without this bottom layer monitor.

--
Alan Robertson

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce

From linas at austin.ibm.com Thu Oct 14 08:12:54 2004
From: linas at austin.ibm.com (Linas Vepstas)
Date: Wed, 13 Oct 2004 17:12:54 -0500
Subject: Hardware Watchdog Device in pSeries?
In-Reply-To: <416D9E5A.9080102@unix.sh>
References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <416D88AF.1010706@austin.ibm.com> <416D9E5A.9080102@unix.sh>
Message-ID: <20041013221254.GF12237@austin.ibm.com>

Hi,

On Wed, Oct 13, 2004 at 03:30:02PM -0600, Alan Robertson was heard to remark:
> Mike Strosaker wrote:
> >Linas Vepstas wrote:
> >
> >>I might have volunteered to hack this up real quick, were it not for
> >>Mike Strosaker's correction, that the surveillance features were taken
> >>out of Power5.
> >>Anyone on this list know why?
> >>
> >
> >I sent the reason I got from the hardware RAS folks to this list a while
> >back.
> >Luckily, it's still in my sent mail folder:
> >
> >"Because of the virtualization layer and partitioning, the surveillance
> >requirement was moved to PHYP<->SP. Apparently, this was a hotly
> >contested issue among the platform design folks (especially considering
> >that
> >partitioned power4 systems still have OS<->SP surveillance). I think
> >the logic
> >is: If an OS goes down, it's not likely a server problem, hence no
> >requirement
> >to monitor from the server side.
> >
> >At least the platform gets notified of panics via os-term. I gather
> >that some user space tools are expected to monitor for deadlocks/hangs
> >(maybe clustering tools). "
>
> This is about half-right.
>
> There is one particular circumstance which can ONLY be monitored from a
> hardware-level monitor.
>
> OS hangs.

Heh. I think I can clarify, after talking to the firmware folks.

The core thinking behind the "platform architecture" was to make sure that the underlying hardware, i.e. the "platform", wasn't hung. They were not concerned about the OS itself; they assumed that OS'es have their own independent mechanisms for detecting hung-ness.

From the platform point of view, they are concerned that they'll have a machine with a dozen different partitions on it (a dozen different OS'es), and a hardware hang will take down all twelve. So they've got the hypervisor and service processor monitoring each other, keeping things humming. If just one partition goes down due to a kernel hang/crash, well, that's too bad, but it's not the end of the world from the platform point of view.

I think Alan's point of view is from the other side of the table: why should someone buy 12 pci-card watchdogs, one for each partition, chewing up 12 pci slots, when the pSeries is already capable of doing watchdog functions? To add insult to injury, the sysadmin now needs to duct-tape each of the watchdog cards to some sort of kill-switch, to reboot a dead partition. The kill-switch needs to then ssh to the fsp or the hmc to start the reboot. So it gets pretty byzantine for something that could have been 'simple' and built-in. Never mind that the reliability goes down: the kill switch could fail, the pci watchdog card could fail (or get EEH'ed out), causing a reboot when no reboot was necessary, etc.

--linas

From jschopp at austin.ibm.com Thu Oct 14 08:32:10 2004
From: jschopp at austin.ibm.com (Joel Schopp)
Date: Wed, 13 Oct 2004 17:32:10 -0500
Subject: Hardware Watchdog Device in pSeries?
In-Reply-To: <20041013221254.GF12237@austin.ibm.com>
References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <416D88AF.1010706@austin.ibm.com> <416D9E5A.9080102@unix.sh> <20041013221254.GF12237@austin.ibm.com>
Message-ID: <416DACEA.1070900@austin.ibm.com>

> I think Alan's point of view is from the other side of the table:
> why should someone buy 12 pci-card watchdogs, one for each partition,
> chewing up 12 pci slots, when the pSeries is already capable of doing
> watchdog functions? To add insult to injury, the sysadmin now needs
> to duct-tape each of the watchdog cards to some sort of kill-switch,
> to reboot a dead partition. The kill-switch needs to then ssh to
> the fsp or the hmc to start the reboot. So it gets pretty byzantine
> for something that could have been 'simple' and built-in. Never mind
> that the reliability goes down: the kill switch could fail, the
> pci watchdog card could fail (or get EEH'ed out), causing a reboot
> when no reboot was necessary, etc.

I will miss the old school hardware watchdog. If I'd had a vote I would have voted to keep it. But since it is not a democracy I can only add a couple points to this argument.

First, if people really care about reliability that much they will be running with hot spares in a HA environment. In that case there are already external monitors that activate the spare on any sign of problems.

Second, this can all be done from the HMC. The HMC is perfectly capable of determining the partition is hung (LED error codes, heartbeat timeouts). It is also perfectly capable of rebooting a partition. I am not aware that there is a way to put the two together right now, so that the HMC automatically reboots the partition if it hangs, but it would certainly be an easy feature to add to the HMC.

From linas at austin.ibm.com Thu Oct 14 08:53:16 2004
From: linas at austin.ibm.com (Linas Vepstas)
Date: Wed, 13 Oct 2004 17:53:16 -0500
Subject: Hardware Watchdog Device in pSeries?
In-Reply-To: <416DACEA.1070900@austin.ibm.com>
References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <416D88AF.1010706@austin.ibm.com> <416D9E5A.9080102@unix.sh> <20041013221254.GF12237@austin.ibm.com> <416DACEA.1070900@austin.ibm.com>
Message-ID: <20041013225316.GH12237@austin.ibm.com>

On Wed, Oct 13, 2004 at 05:32:10PM -0500, Joel Schopp was heard to remark:
>
> >I think Alan's point of view is from the other side of the table:
> >why should someone buy 12 pci-card watchdogs, one for each partition,
> >chewing up 12 pci slots, when the pSeries is already capable of doing
> >watchdog functions? To add insult to injury, the sysadmin now needs
> >to duct-tape each of the watchdog cards to some sort of kill-switch,
> >to reboot a dead partition. The kill-switch needs to then ssh to
> >the fsp or the hmc to start the reboot. So it gets pretty byzantine
> >for something that could have been 'simple' and built-in. Never mind
> >that the reliability goes down: the kill switch could fail, the
> >pci watchdog card could fail (or get EEH'ed out), causing a reboot
> >when no reboot was necessary, etc.
>
> I will miss the old school hardware watchdog. If I'd had a vote I would
> have voted to keep it. But since it is not a democracy I can only add a
> couple points to this argument.
>
> First, if people really care about reliability that much they will be
> running with hot spares in a HA environment. In that case there are
> already external monitors that activate the spare on any sign of problems.

Yes, well, Alan is the guy who designs and builds these systems :) He's trying to figure out how to hook them up to the pSeries.
You can't just cut the power, like you can for PC's :)

http://www.linux-ha.org

> Second, this can all be done from the HMC. The HMC is perfectly capable
> of determining the partition is hung (LED error codes, heartbeat
> timeouts). It is also perfectly capable of rebooting a partition. I am
> not aware that there is a way to put the two together right now, so that
> the HMC automatically reboots the partition if it hangs, but it would
> certainly be an easy feature to add the HMC.

The HMC is a natural place for this. One of Alan's complaints is that (non-pSeries) HMC's tend to be semi-proprietary and mostly unarchitected, with a wide variation from one model to another. The dependence on Java for core functions also makes them untrustworthy.

--linas

From alanr at unix.sh Thu Oct 14 14:41:26 2004
From: alanr at unix.sh (Alan Robertson)
Date: Wed, 13 Oct 2004 22:41:26 -0600
Subject: Hardware Watchdog Device in pSeries?
In-Reply-To: <20041013221254.GF12237@austin.ibm.com>
References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <416D88AF.1010706@austin.ibm.com> <416D9E5A.9080102@unix.sh> <20041013221254.GF12237@austin.ibm.com>
Message-ID: <416E0376.1010500@unix.sh>

Linas Vepstas wrote:
> Hi,
>
> On Wed, Oct 13, 2004 at 03:30:02PM -0600, Alan Robertson was heard to remark:
>
>>Mike Strosaker wrote:
>>
>>>Linas Vepstas wrote:
>>>
>>>
>>>>I might have volunteered to hack this up real quick, were it not for
>>>>Mike Strosaker's correction, that the surveillance features were taken
>>>>out of Power5.
>>>>Anyone on this list know why?
>>>>
>>>
>>>I sent the reason I got from the hardware RAS folks to this list a while
>>>back.
>>>Luckily, it's still in my sent mail folder:
>>>
>>>"Because of the virtualization layer and partitioning, the surveillance
>>>requirement was moved to PHYP<->SP.
Apparently, this was a hotly
>>>contested issue among the platform design folks (especially considering
>>>that
>>>partitioned power4 systems still have OS<->SP surveillance). I think
>>>the logic
>>>is: If an OS goes down, it's not likely a server problem, hence no
>>>requirement
>>>to monitor from the server side.
>>>
>>>At least the platform gets notified of panics via os-term. I gather
>>>that some user space tools are expected to monitor for deadlocks/hangs
>>>(maybe clustering tools). "
>>
>>This is about half-right.
>>
>>There is one particular circumstance which can ONLY be monitored from a
>>hardware-level monitor.
>>
>>OS hangs.
>
>
> Heh. I think I can clarify, after talking to the firmware folks.
>
> The core thinking behind the "platform architecture" was to make
> sure that the underlying hardware, i.e. the "platform" wasn't hung.
> They were not concerned about the OS itself; they assumed that OS'es
> have their own independent mechanisms for detecting hung-ness.
>
> From the platform point of view, they are concerned that they'll
> have a machine with a dozen different partitions on it (a dozen
> different OS'es), and a hardware hang will take down all twelve.
> So they've got the hypervisor and service processor monitoring
> each other, keeping things humming. If just one partition goes
> down due to a kernel hang/crash, well, that's too bad, but it's
> not the end of the world from the platform point of view.

And this is a great set of goals as far as they go. But, not sufficient when looking at the platform as something which actually delivers services, not just runs the hypervisor.

[[I guess I forgot to say that in addition to being the architect for IBM's OSS Linux strategy and product, I worked for 21 years for Bell Labs on highly reliable telecommunications systems before this. So, I have some reasonable knowledge of how these kinds of things work in well-tested, well-proven systems.
Typically, telephone systems are considered extremely reliable - because they follow a well-proven discipline of design. The international telephone system is in effect the world's largest ultra-reliable computer. And, it has been since back when telephone switches were made with discrete transistors - largely because of good HA system design]]

> I think Alan's point of view is from the other side of the table:
> why should someone buy 12 pci-card watchdogs, one for each partition,
> chewing up 12 pci slots, when the pSeries is already capable of doing
> watchdog functions? To add insult to injury, the sysadmin now needs
> to duct-tape each of the watchdog cards to some sort of kill-switch,
> to reboot a dead partition. The kill-switch needs to then ssh to
> the fsp or the hmc to start the reboot. So it gets pretty byzantine
> for something that could have been 'simple' and built-in. Never mind
> that the reliability goes down: the kill switch could fail, the
> pci watchdog card could fail (or get EEH'ed out), causing a reboot
> when no reboot was necessary, etc.

Linas is right about the cost and complexity of the monitoring cards and the whole system. In addition, if we're trying to see pSeries as a premium highly-reliable system better than the competition, it just doesn't send the right message if you tell a customer that this is what they have to do. It looks really Rube Goldberg-ish (to say the least).

In addition, from a technical perspective, there is a basic principle in HA systems which is being ignored here...

A sick system cannot reliably monitor itself.

If you're relying on a system which you believe to be sick to monitor itself, it will be unable to do this reliably under all circumstances - it's sick, and therefore not reliable -- by definition. Crazy people may not think they're insane ;-).

The hardware watchdog timer is a 3rd party monitoring system, and therefore is likely to be reliable when the thing it is watching is sick - because its sanity is uncorrelated to the failure of the thing it is watching.

For example, if by a programming error in the kernel, you halt or loop with interrupts disabled -- you're screwed with no way out. In mainframes I think this is called a disabled wait state. Of course, there are more complex ways to do this, but hopefully one example makes the point.

This is the point of the hierarchy of monitoring I described before. This is very much standard operating procedure for reliable systems in the telecom industry (and many others). In fact, such a watchdog timer is a requirement for Carrier Grade Linux (CGL).

Here is the standard way which highly available systems are architected to work -- and it's consistent with 35-year industry practice in telephony systems, the formal CGL requirements, and the architecture of the Linux-HA system.

The hardware watchdog timer times out when it doesn't get a heartbeat in the allotted time. (duhhh!)

Just before loading the BIOS, the watchdog timer should be set for some "reasonable" amount of time (like a few seconds) for the BIOS to load and begin executing.

The BIOS should set the timer for a reasonable time for the bootstrap program to load. It must tickle it periodically while waiting for input from humans.*

The bootstrap loader should work much the same way. Before it jumps to the OS, it should set the timer for a reasonable amount of time for the OS to take over the tickling.*

When it first comes up, the OS takes over and tickles the watchdog timer.

When the HA monitoring subsystem comes up, it takes over and tickles the watchdog timer.

As HA-aware processes start up, they tickle individual watchdog timers maintained by the HA monitoring subsystem (apphbd). If they die, or hang, they are restarted by the Recovery Manager.
As a special case, apphbd will restart the recovery manager as described below.

The recovery manager registers with the HA monitoring subsystem and receives notification of insane or dead processes. If they're insane it kills them. When they die, it restarts them.

If the recovery manager dies (or goes insane), then apphbd will (kill and) restart the recovery manager.**

When the system panics, then the watchdog timer needs to be tickled while waiting for human input, and while making progress taking a dump. [but only when actually making progress].

When the OS jumps back into the BIOS for any reason then the timer is reset to some value suitable for the BIOS to take over and start tickling it. (~ same as the original value).

Now if the BIOS or OS or bootstrap loader, or dump process craps out and hangs, or the hard disk can't boot, or a peripheral hangs the bus, then this watchdog timer will trigger, and the system will be reset - and you'll get a chance to try it again. [[If you fail too often in too short a period of time, then "phone home" or cry "uncle" or sit and cry if you like. Or, you can just keep persisting...]]

Later on when the HA monitoring system is running, if it (or the scheduler or other piece of the OS) craps out and the HA monitoring system doesn't (or isn't able to) tickle this watchdog timer - for whatever reason - then everything will reboot just like it should.

Notice how many different kinds of errors this one single timer can detect and recover from - and how many of them cannot easily be recovered from at all without it. Note how handy it is in designing the system to know that your underlying hardware has this capability built-in. It eliminates a lot of complexity from several pieces of software, and does a better job too! Without this timer, you can't easily design a truly reliable system. (and maybe not at all).

The lowest level monitor should be the simplest and most reliable. It monitors the OS.
The driver for this in the kernel should also be solid and no-frills. The base-level HA monitoring system (which monitors processes for their health) should also be as simple as possible. Complexity is the enemy of reliability. If any of these components fail, then the system will be rebooted unnecessarily. This is a BadThing(TM).

Now, to use this "right", the thing that any subsystem tickling the timer at the next higher level should do is periodically schedule something to evaluate its internal sanity (data structure consistency or queue lengths or whatever), and tickle the watchdog timer only when it passes whatever its internal sanity measure is.

Then, if you go into an infinite loop, or doubt your own sanity long enough, someone else will eventually do something about it - you'll be killed and restarted (if a process) -- or rebooted (if you're the HA process monitor, or the BIOS, or bootstrap loader or OS). Of course, this doesn't *replace* external monitoring (see the note above about declaring oneself sick), but it is a good orthogonal measure, and simpler to implement for subsystems with limited external interfaces - like the bootstrap loader.

* = Note that these layers may have to deal with bootstrap loaders and/or OSes which won't tickle the watchdog timer - so they have to shut it off (or set it really long) when booting a layer under them which isn't watchdog-aware.

** = The reason why the recovery manager is not part of the apphbd process in our design is because the apphbd process should be as simple as it can be - because its death or insanity would trigger a system restart. Putting it in a separate process lessens the likelihood of an unnecessary system restart. This is not a necessity, but I believe it to be a good design choice - after all it was my design choice ;-)

It is certainly true that we don't have to implement all these things today, or at all, but with the hardware watchdog timer, they're possible. And, without it, they're not.
Even without implementing all these extra HA features, it still monitors the OS more reliably than it can monitor itself. So, I think this is a very worthwhile feature for the platform to have.

Hope this helps!

--
Alan Robertson

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce

From arnd at arndb.de Thu Oct 14 21:35:13 2004
From: arnd at arndb.de (Arnd Bergmann)
Date: Thu, 14 Oct 2004 13:35:13 +0200
Subject: Hardware Watchdog Device in pSeries?
In-Reply-To: <20041013192356.GE12237@austin.ibm.com>
References: <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com>
Message-ID: <200410141335.17333.arnd@arndb.de>

On Middeweken 13 Oktober 2004 21:23, Linas Vepstas wrote:
> On Wed, Oct 13, 2004 at 12:01:45PM -0600, Alan Robertson was heard to remark:
> > This would be a logical equivalent to the well-known and long-standing
> > 'softdog' device driver which already has a well-known API, which is also
> > implemented on other hardware devices and architectures.
> >
> > So, my suggestion would be that if it were moved to a userspace driver,
> > that the softdog API be retained.
>
> I might have volunteered to hack this up real quick, were it not for
> Mike Strosaker's correction, that the surveillance features were taken
> out of Power5.

FWIW, s390 linux has just added support for a hypervisor watchdog [1] that looks like a hardware watchdog to linux, but is implemented with hypercalls ("diag 0x288"). Since Power5 is typically running in hypervisor mode, the watchdog interface could be provided completely by the firmware.

Arnd <><

[1] http://ftp2.de.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/\
2.6.9-rc4/2.6.9-rc4-mm1/broken-out/s390-9-12-z-vm-watchdog-timer.patch
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: signature
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041014/29c36997/attachment.pgp

From alanr at unix.sh Fri Oct 15 01:56:14 2004
From: alanr at unix.sh (Alan Robertson)
Date: Thu, 14 Oct 2004 09:56:14 -0600
Subject: Hardware Watchdog Device in pSeries?
In-Reply-To: <200410141335.17333.arnd@arndb.de>
References: <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <200410141335.17333.arnd@arndb.de>
Message-ID: <416EA19E.40300@unix.sh>

Arnd Bergmann wrote:
> On Middeweken 13 Oktober 2004 21:23, Linas Vepstas wrote:
>
>>On Wed, Oct 13, 2004 at 12:01:45PM -0600, Alan Robertson was heard to remark:
>
>
>>>This would be a logical equivalent to the well-known and long-standing
>>>'softdog' device driver which already has a well-known API, which is also
>>>implemented on other hardware devices and architectures.
>>>
>>>So, my suggestion would be that if it were moved to a userspace driver,
>>>that the softdog API be retained.
>>
>>I might have volunteered to hack this up real quick, were it not for
>>Mike Strosaker's correction, that the surveillance features were taken
>>out of Power5.
>
>
> FWIW, s390 linux has just added support for a hypervisor watchdog [1]
> that looks like a hardware watchdog to linux, but is implemented with
> hypercalls ("diag 0x288"). Since Power5 is typically running in hypervisor
> mode, the watchdog interface could be provided completely by the
> firmware.

The method of implementation isn't that important. Are these hypervisor calls (or equivalent) provided by power5? Is there any disadvantage to running under the hypervisor?

--
Alan Robertson

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions."
- William Wilberforce

From linas at austin.ibm.com Fri Oct 15 02:21:41 2004
From: linas at austin.ibm.com (Linas Vepstas)
Date: Thu, 14 Oct 2004 11:21:41 -0500
Subject: Hardware Watchdog Device in pSeries?
In-Reply-To: <416E0376.1010500@unix.sh>
References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <416D88AF.1010706@austin.ibm.com> <416D9E5A.9080102@unix.sh> <20041013221254.GF12237@austin.ibm.com> <416E0376.1010500@unix.sh>
Message-ID: <20041014162141.GA958@austin.ibm.com>

Hi Alan,

Long emails confuse me ...

On Wed, Oct 13, 2004 at 10:41:26PM -0600, Alan Robertson was heard to remark:
> Linas Vepstas wrote:
> >why should someone buy 12 pci-card watchdogs, one for each partition,
> >chewing up 12 pci slots, when the pSeries is already capable of doing
>
> It looks really Rube Goldberg-ish (to say the least).

[...]

>
> The hardware watchdog timer is a 3rd party
> monitoring system, and therefore is likely to be reliable when the thing it
> is watching is sick -

Not sure where you're going with this; are you saying that 3rd-party watchdog PCI cards, one for each partition, is a good idea, or a bad idea?

Would you rather have the OS monitoring done with
(a) watchdog PCI cards,
(b) with 'surveillance' done by firmware/hypervisor,
(c) or with some other method?

> The bootstrap loader should work much the

I guess I didn't get this exposition either. Although it's nice to know that boot was successful, I see boot as a whole lot less important than monitoring the system once it's gone 'online'. The boot sequence can be monitored much more loosely, with a whole lot less complexity. The hypervisor knows when the OS boot sequence starts. If the OS hasn't completely booted after, say, 10 minutes, then it can call a human to look at the problem. I don't see why one needs to heartbeat once a second during boot; that's hard to do and seems unnecessary.
By contrast, I'd expect to turn on the once-per-second heartbeat just before the system goes 'online' or 'critical'.

--linas

From alanr at unix.sh Fri Oct 15 03:34:48 2004
From: alanr at unix.sh (Alan Robertson)
Date: Thu, 14 Oct 2004 11:34:48 -0600
Subject: Hardware Watchdog Device in pSeries?
In-Reply-To: <20041014162141.GA958@austin.ibm.com>
References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <416D88AF.1010706@austin.ibm.com> <416D9E5A.9080102@unix.sh> <20041013221254.GF12237@austin.ibm.com> <416E0376.1010500@unix.sh> <20041014162141.GA958@austin.ibm.com>
Message-ID: <416EB8B8.8040601@unix.sh>

Linas Vepstas wrote:
> Hi Alan,
>
> Long emails confuse me ...
>
> On Wed, Oct 13, 2004 at 10:41:26PM -0600, Alan Robertson was heard to remark:
>
>>Linas Vepstas wrote:
>>
>>>why should someone buy 12 pci-card watchdogs, one for each partition,
>>>chewing up 12 pci slots, when the pSeries is already capable of doing
>>
>> It looks really Rube Goldberg-ish (to say the least).
>
>
> [...]
>
>>The hardware watchdog timer is a 3rd party
>>monitoring system, and therefore is likely to be reliable when the thing it
>>is watching is sick -
>
>
>
> Not sure where you're going with this; are you saying that
> 3rd-party watchdog PCI cards, one for each partition, is a
> good idea, or a bad idea?
>
> Would you rather have the OS monitoring done with
> (a) watchdog PCI cards,
> (b) with 'surveillance' done by firmware/hypervisor,
> (c) or with some other method?

I would prefer (b). Because the software and address spaces of the firmware/hypervisor are separate, it is effectively a third party reset mechanism. The test I would use is: Does failure of the thing being monitored cause or correlate to failure in the thing doing the monitoring - and the answer is "no" -- therefore it's a third-party reset.

I don't have a (c) method in mind that would work in this environment.

Evaluating (a) and (b):

Method (a):
  + is third party
  - is complex and hard to configure all around (think about configuring
    those cards with passwords and ssh, and ip addresses and partition
    names and so on - also think about how many things could break and
    keep this from working).
  - difficult to support
  - doesn't scale well in any obvious way
  - is relatively expensive for the customer (adds several hundred dollars
    for each partition - maybe as much as $1K)
  - difficult to bring into existence (compared to (b))
  - is ugly, kludgy, and Rube Goldberg-ish.

Method (b):
  + is third party
  + is relatively simple when compared to (a) (i.e., more reliable)
  + requires little/no special configuration to make it work
  + Shows off the advantages of pSeries architecture
  + adds no cost to the customer's solution
  + is comparatively easy to bring into existence (compared to (a))
  + is a natural and clean solution.

>> The bootstrap loader should work much the
>
>
> I guess I didn't get this exposition either.

---- OK -- as I said this is an improvement over the above - but not absolutely critical -- But I'll try explaining it again and see if giving a shorter answer helps -------

> Although it's nice to
> know that boot was successful, I see boot as a whole lot less
> important than monitoring the system once it's gone 'online'. The boot
> sequence can be monitored much more loosely, with a whole lot less
> complexity. The hypervisor knows when the OS boot sequence starts.
> If the OS hasn't completely booted after, say, 10 minutes, then it
> can call a human to look at the problem. I don't see why one needs
> to heartbeat once a second during boot; that's hard to do and seems
> unnecessary.

I didn't say anything about once a second. It could be once every 30 seconds - or even 5 minutes. That gives you lots of time, and you then only have to heartbeat in a couple of select places, and while in input loops waiting for human input.
These aren't so much periodic heartbeats as they are progress reports. If you stop making progress, you get reset.

> By contrast, I'd expect to turn on the once-per-second
> heartbeat just before the system goes 'online' or 'critical'.

This change decreases MTTR. MTTR has an effect on system availability - even in a redundant HA cluster - since MTTR determines the probability of "simultaneous" failures from which the HA system cannot recover.

Calling a human is slow and often expensive (particularly on an emergency basis). It takes minutes to hours and may result in an extra service charge from someone (depending on who gets the call, what time it is, and what arrangements are made, etc.).

A system which doesn't boot isn't providing service. If service isn't being provided, it doesn't matter why it's not being provided (OS, dump, bootstrap, BIOS, etc.)... The OS is not the only possible cause of failure. The OS is by far more likely than these others, but all software has bugs. And, hardware has transient failures as well as permanent ones.

A system with these capabilities will continue to try and provide service in the presence of (transient) errors until it succeeds, or exceeds some retry threshold, meaning a human needs to intervene and fix whatever's wrong. This is essentially autonomic computing for the boot process.

In short: With this architecture, the system will come up and provide service, or it is broken so badly that retrying won't help and a human really is needed. Otherwise, no recovery will be performed for errors which keep the system from coming up (after a crash or otherwise) and some outages may be unnecessarily prolonged.

If your availability is poor, this will make zero difference. If your availability is very good, this helps a little. And, when your availability is very good, it's hard to find things that help even a little...
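
As a rough illustration of the "progress report" idea and the retry threshold described above, here is a minimal sketch in C. It only models the bookkeeping; the struct, the function names and the tick-based clock are all hypothetical and do not correspond to any real firmware, hypervisor or kernel watchdog interface.

```c
/*
 * Hypothetical model of a progress-report watchdog with a retry
 * threshold. Nothing here is a real firmware or kernel API; time is
 * an abstract tick counter supplied by the caller.
 */
struct watchdog {
    unsigned deadline;    /* tick at which the timer fires          */
    unsigned resets;      /* how many times it has already fired    */
    unsigned max_resets;  /* retry threshold before calling a human */
};

enum wd_action { WD_OK, WD_RESET, WD_CALL_HUMAN };

/* Progress report from the monitored layer: push the deadline out. */
void wd_tickle(struct watchdog *wd, unsigned now, unsigned timeout)
{
    wd->deadline = now + timeout;
}

/* Run by the independent monitor; decides what to do at tick `now`. */
enum wd_action wd_check(struct watchdog *wd, unsigned now, unsigned timeout)
{
    if (now < wd->deadline)
        return WD_OK;                 /* progress was reported in time */
    if (wd->resets >= wd->max_resets)
        return WD_CALL_HUMAN;         /* retrying has not helped       */
    wd->resets++;
    wd->deadline = now + timeout;     /* re-arm for the next attempt   */
    return WD_RESET;
}
```

In this sketch the monitored layer calls wd_tickle() only at the few points where it has actually made progress, and the monitor keeps resetting until the (assumed) retry threshold is exhausted, at which point a human is needed.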

Of course, being able to say "autonomic computing wired into the lowest levels of the system" probably has marketing value beyond the small amount of improved availability it provides ;-)

[[If this system is running the air traffic control system while I'm in the air, I vote for adding this feature ;-)]].

--
Alan Robertson

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce

From alanr at unix.sh Fri Oct 15 07:19:17 2004
From: alanr at unix.sh (Alan Robertson)
Date: Thu, 14 Oct 2004 15:19:17 -0600
Subject: My use of the term "3rd party"
In-Reply-To: <416E0376.1010500@unix.sh>
References: <20041013165050.GC12237@austin.ibm.com> <416D6D89.6030300@unix.sh> <20041013192356.GE12237@austin.ibm.com> <416D88AF.1010706@austin.ibm.com> <416D9E5A.9080102@unix.sh> <20041013221254.GF12237@austin.ibm.com> <416E0376.1010500@unix.sh>
Message-ID: <416EED55.7050200@unix.sh>

I just realized that this term has a different meaning to many people than it does to me in this context.

I meant that it was independent of the thing it was monitoring. That is, that its probability of failure is an independent random variable with respect to the thing it is measuring. In other words, the failure of the watchdog timer is uncorrelated to failures of the operating system or other user of the watchdog timer.

I did *not* mean that you had to buy it from a 3rd party hardware manufacturer.

My apologies for what was probably a poor choice of terminology.

--
Alan Robertson

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce

From benh at kernel.crashing.org Fri Oct 15 19:16:32 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 15 Oct 2004 19:16:32 +1000
Subject: Fan control for PowerMac7_3
Message-ID: <1097831790.1131.111.camel@gaston>

Hi !


This is an experimental (read: totally untested) patch to the G5 fan
control code. All I know is that it builds :)

It should add proper support for all desktop G5s including liquid
cooling. I suggest you run it with debug enabled (#undef DEBUG ->
#define DEBUG in the beginning of the .c file) and send me the
output though :)

It does _NOT_ add support for the Xserve yet !

People who have already working cooling don't _need_ to test, they
are welcome to do it though in case I broke something, but only
send me the output if you feel something is wrong ...

Should apply on top of current bk.

Ben.

diff -urN linux-2.5/drivers/macintosh/therm_pm72.c linux-pogo/drivers/macintosh/therm_pm72.c
--- linux-2.5/drivers/macintosh/therm_pm72.c	2004-09-24 14:34:05.000000000 +1000
+++ linux-pogo/drivers/macintosh/therm_pm72.c	2004-10-15 19:09:05.000000000 +1000
@@ -46,6 +46,8 @@
  *	  overtemp conditions so userland can take some policy
  *	  decisions, like slewing down CPUs
  *	- Deal with fan and i2c failures in a better way
+ *	- Maybe do a generic PID based on params used for
+ *	  U3 and Drives ?
  *
  * History:
  *
@@ -73,6 +75,13 @@
  *	  values in the configuration register
  *	- Switch back to use of target fan speed for PID, thus lowering
  *	  pressure on i2c
+ *
+ *  Oct.
15, 2004 : 1.1b1 (beta) + * - Add device-tree lookup for fan IDs, should detect liquid cooling + * pumps when present + * - Enable driver for PowerMac7,3 machines + * - Split the U3/Backside cooling on U3 & U3H versions as Darwin does + * - Add new CPU cooling algorithm for machines with liquid cooling */ #include @@ -101,7 +110,7 @@ #include "therm_pm72.h" -#define VERSION "0.9" +#define VERSION "1.1b1" #undef DEBUG @@ -121,16 +130,100 @@ static struct i2c_adapter * u3_1; static struct i2c_client * fcu; static struct cpu_pid_state cpu_state[2]; +static struct basckside_pid_params backside_params; static struct backside_pid_state backside_state; static struct drives_pid_state drives_state; static int state; static int cpu_count; +static int cpu_pid_type; static pid_t ctrl_task; static struct completion ctrl_complete; static int critical_state; static DECLARE_MUTEX(driver_lock); /* + * We have 2 types of CPU PID control. One is "split" old style control + * for intake & exhaust fans, the other is "combined" control for both + * CPUs that also deals with the pumps when present. To be "compatible" + * with OS X at this point, we only use "COMBINED" on the machines that + * are identified as having the pumps (though that identification is at + * least dodgy). Ultimately, we could probably switch completely to this + * algorithm provided we hack it to deal with the UP case + */ +#define CPU_PID_TYPE_SPLIT 0 +#define CPU_PID_TYPE_COMBINED 1 + +/* + * This table describes all fans in the FCU. The "id" and "type" values + * are defaults valid for all earlier machines. 
Newer machines will + * eventually override the table content based on the device-tree + */ +struct fcu_fan_table +{ + char* loc; /* location code */ + int type; /* 0 = rpm, 1 = pwm, 2 = pump */ + int id; /* id or -1 */ +}; + +#define FCU_FAN_RPM 0 +#define FCU_FAN_PWM 1 + +#define FCU_FAN_ABSENT_ID -1 + +#define FCU_FAN_COUNT ARRAY_SIZE(fcu_fans) + +struct fcu_fan_table fcu_fans[] = { + [BACKSIDE_FAN_PWM_INDEX] = { + .loc = "BACKSIDE", + .type = FCU_FAN_PWM, + .id = BACKSIDE_FAN_PWM_DEFAULT_ID, + }, + [DRIVES_FAN_RPM_INDEX] = { + .loc = "DRIVE BAY", + .type = FCU_FAN_RPM, + .id = DRIVES_FAN_RPM_DEFAULT_ID, + }, + [SLOTS_FAN_PWM_INDEX] = { + .loc = "SLOT", + .type = FCU_FAN_PWM, + .id = SLOTS_FAN_PWM_DEFAULT_ID, + }, + [CPUA_INTAKE_FAN_RPM_INDEX] = { + .loc = "CPU A INTAKE", + .type = FCU_FAN_RPM, + .id = CPUA_INTAKE_FAN_RPM_DEFAULT_ID, + }, + [CPUA_EXHAUST_FAN_RPM_INDEX] = { + .loc = "CPU A EXHAUST", + .type = FCU_FAN_RPM, + .id = CPUA_EXHAUST_FAN_RPM_DEFAULT_ID, + }, + [CPUB_INTAKE_FAN_RPM_INDEX] = { + .loc = "CPU B INTAKE", + .type = FCU_FAN_RPM, + .id = CPUB_INTAKE_FAN_RPM_DEFAULT_ID, + }, + [CPUB_EXHAUST_FAN_RPM_INDEX] = { + .loc = "CPU B EXHAUST", + .type = FCU_FAN_RPM, + .id = CPUB_EXHAUST_FAN_RPM_DEFAULT_ID, + }, + /* pumps aren't present by default, have to be looked up in the + * device-tree + */ + [CPUA_PUMP_RPM_INDEX] = { + .loc = "CPU A PUMP", + .type = FCU_FAN_RPM, + .id = FCU_FAN_ABSENT_ID, + }, + [CPUB_PUMP_RPM_INDEX] = { + .loc = "CPU B PUMP", + .type = FCU_FAN_RPM, + .id = FCU_FAN_ABSENT_ID, + }, +}; + +/* * i2c_driver structure to attach to the host i2c controller */ @@ -331,10 +424,16 @@ return 0; } -static int set_rpm_fan(int fan, int rpm) +static int set_rpm_fan(int fan_index, int rpm) { unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_RPM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; if (rpm < 300) rpm = 300; @@ -342,43 +441,55 @@ rpm = 8191; buf[0] 
= rpm >> 5; buf[1] = rpm << 3; - rc = fan_write_reg(0x10 + (fan * 2), buf, 2); + rc = fan_write_reg(0x10 + (id * 2), buf, 2); if (rc < 0) return -EIO; return 0; } -static int get_rpm_fan(int fan, int programmed) +static int get_rpm_fan(int fan_index, int programmed) { unsigned char failure; unsigned char active; unsigned char buf[2]; - int rc, reg_base; + int rc, id, reg_base; + + if (fcu_fans[fan_index].type != FCU_FAN_RPM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; rc = fan_read_reg(0xb, &failure, 1); if (rc != 1) return -EIO; - if ((failure & (1 << fan)) != 0) + if ((failure & (1 << id)) != 0) return -EFAULT; rc = fan_read_reg(0xd, &active, 1); if (rc != 1) return -EIO; - if ((active & (1 << fan)) == 0) + if ((active & (1 << id)) == 0) return -ENXIO; /* Programmed value or real current speed */ reg_base = programmed ? 0x10 : 0x11; - rc = fan_read_reg(reg_base + (fan * 2), buf, 2); + rc = fan_read_reg(reg_base + (id * 2), buf, 2); if (rc != 2) return -EIO; return (buf[0] << 5) | buf[1] >> 3; } -static int set_pwm_fan(int fan, int pwm) +static int set_pwm_fan(int fan_index, int pwm) { unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_PWM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; if (pwm < 10) pwm = 10; @@ -386,32 +497,38 @@ pwm = 100; pwm = (pwm * 2559) / 1000; buf[0] = pwm; - rc = fan_write_reg(0x30 + (fan * 2), buf, 1); + rc = fan_write_reg(0x30 + (id * 2), buf, 1); if (rc < 0) return rc; return 0; } -static int get_pwm_fan(int fan) +static int get_pwm_fan(int fan_index) { unsigned char failure; unsigned char active; unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_PWM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; rc = fan_read_reg(0x2b, &failure, 1); if (rc != 1) return -EIO; - if ((failure & (1 << fan)) != 0) + if 
((failure & (1 << id)) != 0) return -EFAULT; rc = fan_read_reg(0x2d, &active, 1); if (rc != 1) return -EIO; - if ((active & (1 << fan)) == 0) + if ((active & (1 << id)) == 0) return -ENXIO; /* Programmed value or real current speed */ - rc = fan_read_reg(0x30 + (fan * 2), buf, 1); + rc = fan_read_reg(0x30 + (id * 2), buf, 1); if (rc != 1) return -EIO; @@ -513,80 +630,84 @@ /* * CPUs fans control loop */ -static void do_monitor_cpu(struct cpu_pid_state *state) + +static int do_read_one_cpu_values(struct cpu_pid_state *state, s32 *temp, s32 *power) { - s32 temp, voltage, current_a, power, power_target; - s32 integral, derivative, proportional, adj_in_target, sval; - s64 integ_p, deriv_p, prop_p, sum; - int i, intake, rc; + s32 ltemp, volts, amps; + int rc = 0; - DBG("cpu %d:\n", state->index); + /* Default (in case of error) */ + *temp = state->cur_temp; + *power = state->cur_power; /* Read current fan status */ if (state->index == 0) - rc = get_rpm_fan(CPUA_EXHAUST_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); else - rc = get_rpm_fan(CPUB_EXHAUST_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); if (rc < 0) { - printk(KERN_WARNING "Error %d reading CPU %d exhaust fan !\n", - rc, state->index); - /* XXX What do we do now ? */ - } else + /* XXX What do we do now ? Nothing for now, keep old value, but + * return error upstream + */ + DBG(" cpu %d, fan reading error !\n", state->index); + } else { state->rpm = rc; - DBG(" current rpm: %d\n", state->rpm); + DBG(" cpu %d, exhaust RPM: %d\n", state->rpm); + } /* Get some sensor readings and scale it */ - temp = read_smon_adc(state, 1); - if (temp == -1) { + ltemp = read_smon_adc(state, 1); + if (ltemp == -1) { + /* XXX What do we do now ? 
*/ state->overtemp++; - return; + if (rc == 0) + rc = -EIO; + DBG(" cpu %d, temp reading error !\n", state->index); + } else { + /* Fixup temperature according to diode calibration + */ + DBG(" cpu %d, temp raw: %04x, m_diode: %04x, b_diode: %04x\n", + state->index, + ltemp, state->mpu.mdiode, state->mpu.bdiode); + *temp = ((s32)ltemp * (s32)state->mpu.mdiode + ((s32)state->mpu.bdiode << 12)) >> 2; + state->last_temp = *temp; + DBG(" temp: %d.%03d\n", FIX32TOPRINT((*temp))); } - voltage = read_smon_adc(state, 3); - current_a = read_smon_adc(state, 4); - /* Fixup temperature according to diode calibration + /* + * Read voltage & current and calculate power */ - DBG(" temp raw: %04x, m_diode: %04x, b_diode: %04x\n", - temp, state->mpu.mdiode, state->mpu.bdiode); - temp = ((s32)temp * (s32)state->mpu.mdiode + ((s32)state->mpu.bdiode << 12)) >> 2; - state->last_temp = temp; - DBG(" temp: %d.%03d\n", FIX32TOPRINT(temp)); + volts = read_smon_adc(state, 3); + amps = read_smon_adc(state, 4); - /* Check tmax, increment overtemp if we are there. At tmax+8, we go - * full blown immediately and try to trigger a shutdown - */ - if (temp >= ((state->mpu.tmax + 8) << 16)) { - printk(KERN_WARNING "Warning ! 
CPU %d temperature way above maximum" - " (%d) !\n", - state->index, temp >> 16); - state->overtemp = CPU_MAX_OVERTEMP; - } else if (temp > (state->mpu.tmax << 16)) - state->overtemp++; - else - state->overtemp = 0; - if (state->overtemp >= CPU_MAX_OVERTEMP) - critical_state = 1; - if (state->overtemp > 0) { - state->rpm = state->mpu.rmaxn_exhaust_fan; - state->intake_rpm = intake = state->mpu.rmaxn_intake_fan; - goto do_set_fans; - } - - /* Scale other sensor values according to fixed scales + /* Scale voltage and current raw sensor values according to fixed scales * obtained in Darwin and calculate power from I and V */ - state->voltage = voltage *= ADC_CPU_VOLTAGE_SCALE; - state->current_a = current_a *= ADC_CPU_CURRENT_SCALE; - power = (((u64)current_a) * ((u64)voltage)) >> 16; + volts *= ADC_CPU_VOLTAGE_SCALE; + amps *= ADC_CPU_CURRENT_SCALE; + *power = (((u64)volts) * ((u64)amps)) >> 16; + state->voltage = volts; + state->current_a = amps; + state->last_power = *power; + + DBG(" cpu %d, current: %d.%03d, voltage: %d.%03d, power: %d.%03d W\n", + state->index, FIX32TOPRINT(current_a), FIX32TOPRINT(voltage), + FIX32TOPRINT(*power)); + + return 0; +} + +static void do_cpu_pid(struct cpu_pid_state *state, s32 temp, s32 power) +{ + s32 power_target, integral, derivative, proportional, adj_in_target, sval; + s64 integ_p, deriv_p, prop_p, sum; + int i; /* Calculate power target value (could be done once for all) * and convert to a 16.16 fp number */ power_target = ((u32)(state->mpu.pmaxh - state->mpu.padjmax)) << 16; - - DBG(" current: %d.%03d, voltage: %d.%03d\n", - FIX32TOPRINT(current_a), FIX32TOPRINT(voltage)); - DBG(" power: %d.%03d W, target: %d.%03d, error: %d.%03d\n", FIX32TOPRINT(power), + DBG(" power target: %d.%03d, error: %d.%03d\n", FIX32TOPRINT(power_target), FIX32TOPRINT(power_target - power)); /* Store temperature and power in history array */ @@ -659,6 +780,127 @@ state->rpm = state->mpu.rminn_exhaust_fan; if (state->rpm > 
state->mpu.rmaxn_exhaust_fan) state->rpm = state->mpu.rmaxn_exhaust_fan; +} + +static void do_monitor_cpu_combined(void) +{ + struct cpu_pid_state *state0 = &cpu_state[0]; + struct cpu_pid_state *state1 = &cpu_state[1]; + s32 temp0, power0, temp1, power1; + s32 temp_combi, power_combi; + int rc, intake, pump; + + rc = do_read_one_cpu_values(state0, &temp0, &power0); + if (rc < 0) { + /* XXX What do we do now ? */ + } + state1->overtemp = 0; + rc = do_read_one_cpu_values(state1, &temp1, &power1); + if (rc < 0) { + /* XXX What do we do now ? */ + } + if (state1->overtemp) + state0->overtemp++; + + temp_combi = max(temp0, temp1); + power_combi = max(power0, power1); + + /* Check tmax, increment overtemp if we are there. At tmax+8, we go + * full blown immediately and try to trigger a shutdown + */ + if (temp_combi >= ((state0->mpu.tmax + 8) << 16)) { + printk(KERN_WARNING "Warning ! Temperature way above maximum (%d) !\n", + temp_combi >> 16); + state0->overtemp = CPU_MAX_OVERTEMP; + } else if (temp_combi > (state0->mpu.tmax << 16)) + state0->overtemp++; + else + state0->overtemp = 0; + if (state0->overtemp >= CPU_MAX_OVERTEMP) + critical_state = 1; + if (state0->overtemp > 0) { + state0->rpm = state0->mpu.rmaxn_exhaust_fan; + state0->intake_rpm = intake = state0->mpu.rmaxn_intake_fan; + pump = CPU_PUMP_OUTPUT_MAX; + goto do_set_fans; + } + + /* Do the PID */ + do_cpu_pid(state0, temp_combi, power_combi); + + /* Calculate intake fan speed */ + intake = (state0->rpm * CPU_INTAKE_SCALE) >> 16; + if (intake < state0->mpu.rminn_intake_fan) + intake = state0->mpu.rminn_intake_fan; + if (intake > state0->mpu.rmaxn_intake_fan) + intake = state0->mpu.rmaxn_intake_fan; + state0->intake_rpm = intake; + + /* Calculate pump speed */ + pump = (state0->rpm * CPU_PUMP_OUTPUT_MAX) / + state0->mpu.rmaxn_exhaust_fan; + if (pump > CPU_PUMP_OUTPUT_MAX) + pump = CPU_PUMP_OUTPUT_MAX; + if (pump < CPU_PUMP_OUTPUT_MIN) + pump = CPU_PUMP_OUTPUT_MIN; + + do_set_fans: + /* We copy values from 
state 0 to state 1 for /sysfs */ + state1->rpm = state0->rpm; + state1->intake_rpm = state0->intake_rpm; + + DBG("** CPU %d RPM: %d Ex, %d, Pump: %d, In, overtemp: %d\n", + state->index, (int)state->rpm, intake, pump, state->overtemp); + + /* We should check for errors, shouldn't we ? But then, what + * do we do once the error occurs ? For FCU notified fan + * failures (-EFAULT) we probably want to notify userland + * some way... + */ + set_rpm_fan(CPUA_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, state0->rpm); + set_rpm_fan(CPUB_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, state0->rpm); + + if (fcu_fans[CPUA_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) + set_rpm_fan(CPUA_PUMP_RPM_INDEX, pump); + if (fcu_fans[CPUB_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) + set_rpm_fan(CPUB_PUMP_RPM_INDEX, pump); +} + +static void do_monitor_cpu_split(struct cpu_pid_state *state) +{ + s32 temp, power; + int rc, intake; + + /* Read current fan status */ + rc = do_read_one_cpu_values(state, &temp, &power); + if (rc < 0) { + /* XXX What do we do now ? */ + } + + /* Check tmax, increment overtemp if we are there. At tmax+8, we go + * full blown immediately and try to trigger a shutdown + */ + if (temp >= ((state->mpu.tmax + 8) << 16)) { + printk(KERN_WARNING "Warning ! CPU %d temperature way above maximum" + " (%d) !\n", + state->index, temp >> 16); + state->overtemp = CPU_MAX_OVERTEMP; + } else if (temp > (state->mpu.tmax << 16)) + state->overtemp++; + else + state->overtemp = 0; + if (state->overtemp >= CPU_MAX_OVERTEMP) + critical_state = 1; + if (state->overtemp > 0) { + state->rpm = state->mpu.rmaxn_exhaust_fan; + state->intake_rpm = intake = state->mpu.rmaxn_intake_fan; + goto do_set_fans; + } + + /* Do the PID */ + do_cpu_pid(state, temp, power); intake = (state->rpm * CPU_INTAKE_SCALE) >> 16; if (intake < state->mpu.rminn_intake_fan) @@ -677,11 +919,11 @@ * some way... 
*/ if (state->index == 0) { - set_rpm_fan(CPUA_INTAKE_FAN_RPM_ID, intake); - set_rpm_fan(CPUA_EXHAUST_FAN_RPM_ID, state->rpm); + set_rpm_fan(CPUA_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, state->rpm); } else { - set_rpm_fan(CPUB_INTAKE_FAN_RPM_ID, intake); - set_rpm_fan(CPUB_EXHAUST_FAN_RPM_ID, state->rpm); + set_rpm_fan(CPUB_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, state->rpm); } } @@ -696,6 +938,7 @@ state->overtemp = 0; state->adc_config = 0x00; + if (index == 0) state->monitor = attach_i2c_chip(SUPPLY_MONITOR_ID, "CPU0_monitor"); else if (index == 1) @@ -778,7 +1021,7 @@ DBG("backside:\n"); /* Check fan status */ - rc = get_pwm_fan(BACKSIDE_FAN_PWM_ID); + rc = get_pwm_fan(BACKSIDE_FAN_PWM_INDEX); if (rc < 0) { printk(KERN_WARNING "Error %d reading backside fan !\n", rc); /* XXX What do we do now ? */ @@ -790,12 +1033,12 @@ temp = i2c_smbus_read_byte_data(state->monitor, MAX6690_EXT_TEMP) << 16; state->last_temp = temp; DBG(" temp: %d.%03d, target: %d.%03d\n", FIX32TOPRINT(temp), - FIX32TOPRINT(BACKSIDE_PID_INPUT_TARGET)); + FIX32TOPRINT(backside_params.input_target)); /* Store temperature and error in history array */ state->cur_sample = (state->cur_sample + 1) % BACKSIDE_PID_HISTORY_SIZE; state->sample_history[state->cur_sample] = temp; - state->error_history[state->cur_sample] = temp - BACKSIDE_PID_INPUT_TARGET; + state->error_history[state->cur_sample] = temp - backside_params.input_target; /* If first loop, fill the history table */ if (state->first) { @@ -804,7 +1047,7 @@ BACKSIDE_PID_HISTORY_SIZE; state->sample_history[state->cur_sample] = temp; state->error_history[state->cur_sample] = - temp - BACKSIDE_PID_INPUT_TARGET; + temp - backside_params.input_target; } state->first = 0; } @@ -816,7 +1059,7 @@ integral += state->error_history[i]; integral *= BACKSIDE_PID_INTERVAL; DBG(" integral: %08x\n", integral); - integ_p = ((s64)BACKSIDE_PID_G_r) * (s64)integral; + integ_p = 
((s64)backside_params.G_r) * (s64)integral; DBG(" integ_p: %d\n", (int)(integ_p >> 36)); sum += integ_p; @@ -825,12 +1068,12 @@ state->error_history[(state->cur_sample + BACKSIDE_PID_HISTORY_SIZE - 1) % BACKSIDE_PID_HISTORY_SIZE]; derivative /= BACKSIDE_PID_INTERVAL; - deriv_p = ((s64)BACKSIDE_PID_G_d) * (s64)derivative; + deriv_p = ((s64)backside_params.G_d) * (s64)derivative; DBG(" deriv_p: %d\n", (int)(deriv_p >> 36)); sum += deriv_p; /* Calculate the proportional term */ - prop_p = ((s64)BACKSIDE_PID_G_p) * (s64)(state->error_history[state->cur_sample]); + prop_p = ((s64)backside_params.G_p) * (s64)(state->error_history[state->cur_sample]); DBG(" prop_p: %d\n", (int)(prop_p >> 36)); sum += prop_p; @@ -839,13 +1082,13 @@ DBG(" sum: %d\n", (int)sum); state->pwm += (s32)sum; - if (state->pwm < BACKSIDE_PID_OUTPUT_MIN) - state->pwm = BACKSIDE_PID_OUTPUT_MIN; - if (state->pwm > BACKSIDE_PID_OUTPUT_MAX) - state->pwm = BACKSIDE_PID_OUTPUT_MAX; + if (state->pwm < backside_params.output_min) + state->pwm = backside_params.output_min; + if (state->pwm > backside_params.output_max) + state->pwm = backside_params.output_max; DBG("** BACKSIDE PWM: %d\n", (int)state->pwm); - set_pwm_fan(BACKSIDE_FAN_PWM_ID, state->pwm); + set_pwm_fan(BACKSIDE_FAN_PWM_INDEX, state->pwm); } /* @@ -853,6 +1096,35 @@ */ static int init_backside_state(struct backside_pid_state *state) { + struct device_node *u3; + int u3h = 1; /* conservative by default */ + + /* + * There are different PID params for machines with U3 and machines + * with U3H, pick the right ones now + */ + u3 = of_find_node_by_path("/u3"); + if (u3 != NULL) { + u32 *vers = (u32 *)get_property(u3, "device-rev", NULL); + if (vers) + if (((*vers) & 0x3f) < 0x34) + u3h = 0; + of_node_put(u3); + } + + backside_params.G_p = BACKSIDE_PID_G_p; + backside_params.G_r = BACKSIDE_PID_G_r; + backside_params.output_max = BACKSIDE_PID_OUTPUT_MAX; + if (u3h) { + backside_params.G_d = BACKSIDE_PID_U3H_G_d; + backside_params.input_target = 
BACKSIDE_PID_U3H_INPUT_TARGET; + backside_params.output_min = BACKSIDE_PID_U3H_OUTPUT_MIN; + } else { + backside_params.G_d = BACKSIDE_PID_U3_G_d; + backside_params.input_target = BACKSIDE_PID_U3_INPUT_TARGET; + backside_params.output_min = BACKSIDE_PID_U3_OUTPUT_MIN; + } + state->ticks = 1; state->first = 1; state->pwm = 50; @@ -898,7 +1170,7 @@ DBG("drives:\n"); /* Check fan status */ - rc = get_rpm_fan(DRIVES_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(DRIVES_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); if (rc < 0) { printk(KERN_WARNING "Error %d reading drives fan !\n", rc); /* XXX What do we do now ? */ @@ -965,7 +1237,7 @@ state->rpm = DRIVES_PID_OUTPUT_MAX; DBG("** DRIVES RPM: %d\n", (int)state->rpm); - set_rpm_fan(DRIVES_FAN_RPM_ID, state->rpm); + set_rpm_fan(DRIVES_FAN_RPM_INDEX, state->rpm); } /* @@ -1032,7 +1304,7 @@ } /* Set the PCI fan once for now */ - set_pwm_fan(SLOTS_FAN_PWM_ID, SLOTS_FAN_DEFAULT_PWM); + set_pwm_fan(SLOTS_FAN_PWM_INDEX, SLOTS_FAN_DEFAULT_PWM); /* Initialize ADCs */ initialize_adc(&cpu_state[0]); @@ -1047,9 +1319,13 @@ start = jiffies; down(&driver_lock); - do_monitor_cpu(&cpu_state[0]); - if (cpu_state[1].monitor != NULL) - do_monitor_cpu(&cpu_state[1]); + if (cpu_pid_type == CPU_PID_TYPE_COMBINED) + do_monitor_cpu_combined(); + else { + do_monitor_cpu_split(&cpu_state[0]); + if (cpu_state[1].monitor != NULL) + do_monitor_cpu_split(&cpu_state[1]); + } do_monitor_backside(&backside_state); do_monitor_drives(&drives_state); up(&driver_lock); @@ -1113,6 +1389,19 @@ DBG("counted %d CPUs in the device-tree\n", cpu_count); + /* Decide the type of PID algorithm to use based on the presence of + * the pumps, though that may not be the best way, that is good enough + * for now + */ + if (machine_is_compatible("PowerMac7,3") + && (cpu_count > 1) + && fcu_fans[CPUA_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID + && fcu_fans[CPUB_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) { + printk(KERN_INFO "Liquid cooling pumps detected, using new 
algorithm !\n"); + cpu_pid_type = CPU_PID_TYPE_COMBINED; + } else + cpu_pid_type = CPU_PID_TYPE_SPLIT; + /* Create control loops for everything. If any fail, everything * fails */ @@ -1257,12 +1546,91 @@ return 0; } +static void fcu_lookup_fans(struct device_node *fcu_node) +{ + struct device_node *np = NULL; + int i; + + /* The table is filled by default with values that are suitable + * for the old machines without device-tree informations. We scan + * the device-tree and override those values with whatever is + * there + */ + + DBG("Looking up FCU controls in device-tree...\n"); + + while ((np = of_get_next_child(fcu_node, np)) != NULL) { + int type = -1; + char *loc; + u32 *reg; + + DBG(" control: %s, type: %s\n", np->name, np->type); + + /* Detect control type */ + if (!strcmp(np->type, "fan-rpm-control") || + !strcmp(np->type, "fan-rpm")) + type = FCU_FAN_RPM; + if (!strcmp(np->type, "fan-pwm-control") || + !strcmp(np->type, "fan-pwm")) + type = FCU_FAN_PWM; + /* Only care about fans for now */ + if (type == -1) + continue; + + /* Lookup for a matching location */ + loc = (char *)get_property(np, "location", NULL); + reg = (u32 *)get_property(np, "reg", NULL); + if (loc == NULL || reg == NULL) + continue; + DBG(" matching location: %s, reg: 0x%08x\n", loc, *reg); + + for (i = 0; i < FCU_FAN_COUNT; i++) { + int fan_id; + + if (strcmp(loc, fcu_fans[i].loc)) + continue; + DBG(" location match, index: %d\n", i); + fcu_fans[i].id = FCU_FAN_ABSENT_ID; + if (type != fcu_fans[i].type) { + printk(KERN_WARNING "therm_pm72: Fan type mismatch " + "in device-tree for %s\n", np->full_name); + break; + } + if (type == FCU_FAN_RPM) + fan_id = ((*reg) / 2) - 0x10; + else + fan_id = ((*reg) / 2) - 0x30; + if (fan_id > 7) { + printk(KERN_WARNING "therm_pm72: Can't parse " + "fan ID in device-tree for %s\n", np->full_name); + break; + } + DBG(" fan id -> %d, type -> %d\n", fan_id, type); + fcu_fans[i].id = fan_id; + } + } + + /* Now dump the array */ + printk(KERN_INFO "Detected 
fan controls:\n"); + for (i = 0; i < FCU_FAN_COUNT; i++) { + if (fcu_fans[i].id == FCU_FAN_ABSENT_ID) + continue; + printk(KERN_INFO " %d: %s fan, id %d, location: %s\n", i, + fcu_fans[i].type == FCU_FAN_RPM ? "RPM" : "PWM", + fcu_fans[i].id, fcu_fans[i].loc); + } +} + static int fcu_of_probe(struct of_device* dev, const struct of_match *match) { int rc; state = state_detached; + /* Lookup the fans in the device tree */ + fcu_lookup_fans(dev->node); + + /* Add the driver */ rc = i2c_add_driver(&therm_pm72_driver); if (rc < 0) return rc; @@ -1301,7 +1669,8 @@ { struct device_node *np; - if (!machine_is_compatible("PowerMac7,2")) + if (!machine_is_compatible("PowerMac7,2") && + !machine_is_compatible("PowerMac7,3")) return -ENODEV; printk(KERN_INFO "PowerMac G5 Thermal control driver %s\n", VERSION); diff -urN linux-2.5/drivers/macintosh/therm_pm72.h linux-pogo/drivers/macintosh/therm_pm72.h --- linux-2.5/drivers/macintosh/therm_pm72.h 2004-09-24 14:34:05.000000000 +1000 +++ linux-pogo/drivers/macintosh/therm_pm72.h 2004-10-15 18:58:22.000000000 +1000 @@ -119,18 +119,33 @@ #define ADC_CPU_CURRENT_SCALE 0x1f40 /* _AD4 */ /* - * PID factors for the U3/Backside fan control loop + * PID factors for the U3/Backside fan control loop. 
We have 2 sets + * of values here, one set for U3 and one set for U3H */ -#define BACKSIDE_FAN_PWM_ID 1 -#define BACKSIDE_PID_G_d 0x02800000 +#define BACKSIDE_FAN_PWM_DEFAULT_ID 1 +#define BACKSIDE_FAN_PWM_INDEX 0 +#define BACKSIDE_PID_U3_G_d 0x02800000 +#define BACKSIDE_PID_U3H_G_d 0x01400000 #define BACKSIDE_PID_G_p 0x00500000 #define BACKSIDE_PID_G_r 0x00000000 -#define BACKSIDE_PID_INPUT_TARGET 0x00410000 +#define BACKSIDE_PID_U3_INPUT_TARGET 0x00410000 +#define BACKSIDE_PID_U3H_INPUT_TARGET 0x004b0000 #define BACKSIDE_PID_INTERVAL 5 #define BACKSIDE_PID_OUTPUT_MAX 100 -#define BACKSIDE_PID_OUTPUT_MIN 20 +#define BACKSIDE_PID_U3_OUTPUT_MIN 20 +#define BACKSIDE_PID_U3H_OUTPUT_MIN 30 #define BACKSIDE_PID_HISTORY_SIZE 2 +struct basckside_pid_params +{ + u32 G_d; + u32 G_p; + u32 G_r; + u32 input_target; + u32 output_min; + u32 output_max; +}; + struct backside_pid_state { int ticks; @@ -146,7 +161,8 @@ /* * PID factors for the Drive Bay fan control loop */ -#define DRIVES_FAN_RPM_ID 2 +#define DRIVES_FAN_RPM_DEFAULT_ID 2 +#define DRIVES_FAN_RPM_INDEX 1 #define DRIVES_PID_G_d 0x01e00000 #define DRIVES_PID_G_p 0x00500000 #define DRIVES_PID_G_r 0x00000000 @@ -168,7 +184,8 @@ int first; }; -#define SLOTS_FAN_PWM_ID 2 +#define SLOTS_FAN_PWM_DEFAULT_ID 2 +#define SLOTS_FAN_PWM_INDEX 2 #define SLOTS_FAN_DEFAULT_PWM 50 /* Do better here ! 
*/
 
 /*
@@ -191,10 +208,15 @@
  *   CPU B FAKE POWER    49 (I_V_inputs: 18, 19)
  */
 
-#define CPUA_INTAKE_FAN_RPM_ID			3
-#define CPUA_EXHAUST_FAN_RPM_ID			4
-#define CPUB_INTAKE_FAN_RPM_ID			5
-#define CPUB_EXHAUST_FAN_RPM_ID			6
+#define CPUA_INTAKE_FAN_RPM_DEFAULT_ID		3
+#define CPUA_EXHAUST_FAN_RPM_DEFAULT_ID		4
+#define CPUB_INTAKE_FAN_RPM_DEFAULT_ID		5
+#define CPUB_EXHAUST_FAN_RPM_DEFAULT_ID		6
+
+#define CPUA_INTAKE_FAN_RPM_INDEX		3
+#define CPUA_EXHAUST_FAN_RPM_INDEX		4
+#define CPUB_INTAKE_FAN_RPM_INDEX		5
+#define CPUB_EXHAUST_FAN_RPM_INDEX		6
 
 #define CPU_INTAKE_SCALE			0x0000f852
 #define CPU_TEMP_HISTORY_SIZE			2
@@ -202,6 +224,11 @@
 #define CPU_PID_INTERVAL			1
 #define CPU_MAX_OVERTEMP			30
 
+#define CPUA_PUMP_RPM_INDEX			7
+#define CPUB_PUMP_RPM_INDEX			8
+#define CPU_PUMP_OUTPUT_MAX			3700
+#define CPU_PUMP_OUTPUT_MIN			1000
+
 struct cpu_pid_state
 {
 	int			index;
@@ -219,6 +246,7 @@
 	s32			voltage;
 	s32			current_a;
 	s32			last_temp;
+	s32			last_power;
 	int			first;
 	u8			adc_config;
 };

From benh at kernel.crashing.org Fri Oct 15 19:19:42 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 15 Oct 2004 19:19:42 +1000
Subject: Wrong patch! (Re: Fan control for PowerMac7_3)
In-Reply-To: <1097831790.1131.111.camel@gaston>
References: <1097831790.1131.111.camel@gaston>
Message-ID: <1097831981.1131.113.camel@gaston>

On Fri, 2004-10-15 at 19:16, Benjamin Herrenschmidt wrote:
> Hi !
>
> This is an experimental (read: totally untested) patch to the G5 fan
> control code. All I know is that it builds :)

And I sent a wrong version ... sorry, the good one in a few minutes.

Ben.

From benh at kernel.crashing.org Fri Oct 15 19:20:50 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Fri, 15 Oct 2004 19:20:50 +1000
Subject: Fan control for PowerMac7_3
In-Reply-To: <1097831981.1131.113.camel@gaston>
References: <1097831790.1131.111.camel@gaston>
	<1097831981.1131.113.camel@gaston>
Message-ID: <1097832049.1149.115.camel@gaston>

On Fri, 2004-10-15 at 19:19, Benjamin Herrenschmidt wrote:
> On Fri, 2004-10-15 at 19:16, Benjamin Herrenschmidt wrote:
> > Hi !
> >
> > This is an experimental (read: totally untested) patch to the G5 fan
> > control code. All I know is that it builds :)
>
> And I sent a wrong version ... sorry, the good one in a few minutes.

Here it is:

diff -urN linux-2.5/drivers/macintosh/therm_pm72.c linux-pogo/drivers/macintosh/therm_pm72.c
--- linux-2.5/drivers/macintosh/therm_pm72.c	2004-09-24 14:34:05.000000000 +1000
+++ linux-pogo/drivers/macintosh/therm_pm72.c	2004-10-15 19:20:06.000000000 +1000
@@ -46,6 +46,8 @@
  *	  overtemp conditions so userland can take some policy
  *	  decisions, like slewing down CPUs
  *	- Deal with fan and i2c failures in a better way
+ *	- Maybe do a generic PID based on params used for
+ *	  U3 and Drives ?
  *
  * History:
  *
@@ -73,6 +75,13 @@
  *	  values in the configuration register
  *	- Switch back to use of target fan speed for PID, thus lowering
  *	  pressure on i2c
+ *
+ *  Oct.
15, 2004 : 1.1b1 (beta) + * - Add device-tree lookup for fan IDs, should detect liquid cooling + * pumps when present + * - Enable driver for PowerMac7,3 machines + * - Split the U3/Backside cooling on U3 & U3H versions as Darwin does + * - Add new CPU cooling algorithm for machines with liquid cooling */ #include @@ -101,7 +110,7 @@ #include "therm_pm72.h" -#define VERSION "0.9" +#define VERSION "1.1b1" #undef DEBUG @@ -121,16 +130,100 @@ static struct i2c_adapter * u3_1; static struct i2c_client * fcu; static struct cpu_pid_state cpu_state[2]; +static struct basckside_pid_params backside_params; static struct backside_pid_state backside_state; static struct drives_pid_state drives_state; static int state; static int cpu_count; +static int cpu_pid_type; static pid_t ctrl_task; static struct completion ctrl_complete; static int critical_state; static DECLARE_MUTEX(driver_lock); /* + * We have 2 types of CPU PID control. One is "split" old style control + * for intake & exhaust fans, the other is "combined" control for both + * CPUs that also deals with the pumps when present. To be "compatible" + * with OS X at this point, we only use "COMBINED" on the machines that + * are identified as having the pumps (though that identification is at + * least dodgy). Ultimately, we could probably switch completely to this + * algorithm provided we hack it to deal with the UP case + */ +#define CPU_PID_TYPE_SPLIT 0 +#define CPU_PID_TYPE_COMBINED 1 + +/* + * This table describes all fans in the FCU. The "id" and "type" values + * are defaults valid for all earlier machines. 
Newer machines will + * eventually override the table content based on the device-tree + */ +struct fcu_fan_table +{ + char* loc; /* location code */ + int type; /* 0 = rpm, 1 = pwm, 2 = pump */ + int id; /* id or -1 */ +}; + +#define FCU_FAN_RPM 0 +#define FCU_FAN_PWM 1 + +#define FCU_FAN_ABSENT_ID -1 + +#define FCU_FAN_COUNT ARRAY_SIZE(fcu_fans) + +struct fcu_fan_table fcu_fans[] = { + [BACKSIDE_FAN_PWM_INDEX] = { + .loc = "BACKSIDE", + .type = FCU_FAN_PWM, + .id = BACKSIDE_FAN_PWM_DEFAULT_ID, + }, + [DRIVES_FAN_RPM_INDEX] = { + .loc = "DRIVE BAY", + .type = FCU_FAN_RPM, + .id = DRIVES_FAN_RPM_DEFAULT_ID, + }, + [SLOTS_FAN_PWM_INDEX] = { + .loc = "SLOT", + .type = FCU_FAN_PWM, + .id = SLOTS_FAN_PWM_DEFAULT_ID, + }, + [CPUA_INTAKE_FAN_RPM_INDEX] = { + .loc = "CPU A INTAKE", + .type = FCU_FAN_RPM, + .id = CPUA_INTAKE_FAN_RPM_DEFAULT_ID, + }, + [CPUA_EXHAUST_FAN_RPM_INDEX] = { + .loc = "CPU A EXHAUST", + .type = FCU_FAN_RPM, + .id = CPUA_EXHAUST_FAN_RPM_DEFAULT_ID, + }, + [CPUB_INTAKE_FAN_RPM_INDEX] = { + .loc = "CPU B INTAKE", + .type = FCU_FAN_RPM, + .id = CPUB_INTAKE_FAN_RPM_DEFAULT_ID, + }, + [CPUB_EXHAUST_FAN_RPM_INDEX] = { + .loc = "CPU B EXHAUST", + .type = FCU_FAN_RPM, + .id = CPUB_EXHAUST_FAN_RPM_DEFAULT_ID, + }, + /* pumps aren't present by default, have to be looked up in the + * device-tree + */ + [CPUA_PUMP_RPM_INDEX] = { + .loc = "CPU A PUMP", + .type = FCU_FAN_RPM, + .id = FCU_FAN_ABSENT_ID, + }, + [CPUB_PUMP_RPM_INDEX] = { + .loc = "CPU B PUMP", + .type = FCU_FAN_RPM, + .id = FCU_FAN_ABSENT_ID, + }, +}; + +/* * i2c_driver structure to attach to the host i2c controller */ @@ -331,10 +424,16 @@ return 0; } -static int set_rpm_fan(int fan, int rpm) +static int set_rpm_fan(int fan_index, int rpm) { unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_RPM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; if (rpm < 300) rpm = 300; @@ -342,43 +441,55 @@ rpm = 8191; buf[0] 
= rpm >> 5; buf[1] = rpm << 3; - rc = fan_write_reg(0x10 + (fan * 2), buf, 2); + rc = fan_write_reg(0x10 + (id * 2), buf, 2); if (rc < 0) return -EIO; return 0; } -static int get_rpm_fan(int fan, int programmed) +static int get_rpm_fan(int fan_index, int programmed) { unsigned char failure; unsigned char active; unsigned char buf[2]; - int rc, reg_base; + int rc, id, reg_base; + + if (fcu_fans[fan_index].type != FCU_FAN_RPM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; rc = fan_read_reg(0xb, &failure, 1); if (rc != 1) return -EIO; - if ((failure & (1 << fan)) != 0) + if ((failure & (1 << id)) != 0) return -EFAULT; rc = fan_read_reg(0xd, &active, 1); if (rc != 1) return -EIO; - if ((active & (1 << fan)) == 0) + if ((active & (1 << id)) == 0) return -ENXIO; /* Programmed value or real current speed */ reg_base = programmed ? 0x10 : 0x11; - rc = fan_read_reg(reg_base + (fan * 2), buf, 2); + rc = fan_read_reg(reg_base + (id * 2), buf, 2); if (rc != 2) return -EIO; return (buf[0] << 5) | buf[1] >> 3; } -static int set_pwm_fan(int fan, int pwm) +static int set_pwm_fan(int fan_index, int pwm) { unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_PWM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; if (pwm < 10) pwm = 10; @@ -386,32 +497,38 @@ pwm = 100; pwm = (pwm * 2559) / 1000; buf[0] = pwm; - rc = fan_write_reg(0x30 + (fan * 2), buf, 1); + rc = fan_write_reg(0x30 + (id * 2), buf, 1); if (rc < 0) return rc; return 0; } -static int get_pwm_fan(int fan) +static int get_pwm_fan(int fan_index) { unsigned char failure; unsigned char active; unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_PWM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; rc = fan_read_reg(0x2b, &failure, 1); if (rc != 1) return -EIO; - if ((failure & (1 << fan)) != 0) + if 
((failure & (1 << id)) != 0) return -EFAULT; rc = fan_read_reg(0x2d, &active, 1); if (rc != 1) return -EIO; - if ((active & (1 << fan)) == 0) + if ((active & (1 << id)) == 0) return -ENXIO; /* Programmed value or real current speed */ - rc = fan_read_reg(0x30 + (fan * 2), buf, 1); + rc = fan_read_reg(0x30 + (id * 2), buf, 1); if (rc != 1) return -EIO; @@ -513,80 +630,84 @@ /* * CPUs fans control loop */ -static void do_monitor_cpu(struct cpu_pid_state *state) + +static int do_read_one_cpu_values(struct cpu_pid_state *state, s32 *temp, s32 *power) { - s32 temp, voltage, current_a, power, power_target; - s32 integral, derivative, proportional, adj_in_target, sval; - s64 integ_p, deriv_p, prop_p, sum; - int i, intake, rc; + s32 ltemp, volts, amps; + int rc = 0; - DBG("cpu %d:\n", state->index); + /* Default (in case of error) */ + *temp = state->cur_temp; + *power = state->cur_power; /* Read current fan status */ if (state->index == 0) - rc = get_rpm_fan(CPUA_EXHAUST_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); else - rc = get_rpm_fan(CPUB_EXHAUST_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); if (rc < 0) { - printk(KERN_WARNING "Error %d reading CPU %d exhaust fan !\n", - rc, state->index); - /* XXX What do we do now ? */ - } else + /* XXX What do we do now ? Nothing for now, keep old value, but + * return error upstream + */ + DBG(" cpu %d, fan reading error !\n", state->index); + } else { state->rpm = rc; - DBG(" current rpm: %d\n", state->rpm); + DBG(" cpu %d, exhaust RPM: %d\n", state->rpm); + } /* Get some sensor readings and scale it */ - temp = read_smon_adc(state, 1); - if (temp == -1) { + ltemp = read_smon_adc(state, 1); + if (ltemp == -1) { + /* XXX What do we do now ? 
*/ state->overtemp++; - return; + if (rc == 0) + rc = -EIO; + DBG(" cpu %d, temp reading error !\n", state->index); + } else { + /* Fixup temperature according to diode calibration + */ + DBG(" cpu %d, temp raw: %04x, m_diode: %04x, b_diode: %04x\n", + state->index, + ltemp, state->mpu.mdiode, state->mpu.bdiode); + *temp = ((s32)ltemp * (s32)state->mpu.mdiode + ((s32)state->mpu.bdiode << 12)) >> 2; + state->last_temp = *temp; + DBG(" temp: %d.%03d\n", FIX32TOPRINT((*temp))); } - voltage = read_smon_adc(state, 3); - current_a = read_smon_adc(state, 4); - /* Fixup temperature according to diode calibration + /* + * Read voltage & current and calculate power */ - DBG(" temp raw: %04x, m_diode: %04x, b_diode: %04x\n", - temp, state->mpu.mdiode, state->mpu.bdiode); - temp = ((s32)temp * (s32)state->mpu.mdiode + ((s32)state->mpu.bdiode << 12)) >> 2; - state->last_temp = temp; - DBG(" temp: %d.%03d\n", FIX32TOPRINT(temp)); + volts = read_smon_adc(state, 3); + amps = read_smon_adc(state, 4); - /* Check tmax, increment overtemp if we are there. At tmax+8, we go - * full blown immediately and try to trigger a shutdown - */ - if (temp >= ((state->mpu.tmax + 8) << 16)) { - printk(KERN_WARNING "Warning ! 
CPU %d temperature way above maximum" - " (%d) !\n", - state->index, temp >> 16); - state->overtemp = CPU_MAX_OVERTEMP; - } else if (temp > (state->mpu.tmax << 16)) - state->overtemp++; - else - state->overtemp = 0; - if (state->overtemp >= CPU_MAX_OVERTEMP) - critical_state = 1; - if (state->overtemp > 0) { - state->rpm = state->mpu.rmaxn_exhaust_fan; - state->intake_rpm = intake = state->mpu.rmaxn_intake_fan; - goto do_set_fans; - } - - /* Scale other sensor values according to fixed scales + /* Scale voltage and current raw sensor values according to fixed scales * obtained in Darwin and calculate power from I and V */ - state->voltage = voltage *= ADC_CPU_VOLTAGE_SCALE; - state->current_a = current_a *= ADC_CPU_CURRENT_SCALE; - power = (((u64)current_a) * ((u64)voltage)) >> 16; + volts *= ADC_CPU_VOLTAGE_SCALE; + amps *= ADC_CPU_CURRENT_SCALE; + *power = (((u64)volts) * ((u64)amps)) >> 16; + state->voltage = volts; + state->current_a = amps; + state->last_power = *power; + + DBG(" cpu %d, current: %d.%03d, voltage: %d.%03d, power: %d.%03d W\n", + state->index, FIX32TOPRINT(current_a), FIX32TOPRINT(voltage), + FIX32TOPRINT(*power)); + + return 0; +} + +static void do_cpu_pid(struct cpu_pid_state *state, s32 temp, s32 power) +{ + s32 power_target, integral, derivative, proportional, adj_in_target, sval; + s64 integ_p, deriv_p, prop_p, sum; + int i; /* Calculate power target value (could be done once for all) * and convert to a 16.16 fp number */ power_target = ((u32)(state->mpu.pmaxh - state->mpu.padjmax)) << 16; - - DBG(" current: %d.%03d, voltage: %d.%03d\n", - FIX32TOPRINT(current_a), FIX32TOPRINT(voltage)); - DBG(" power: %d.%03d W, target: %d.%03d, error: %d.%03d\n", FIX32TOPRINT(power), + DBG(" power target: %d.%03d, error: %d.%03d\n", FIX32TOPRINT(power_target), FIX32TOPRINT(power_target - power)); /* Store temperature and power in history array */ @@ -659,6 +780,127 @@ state->rpm = state->mpu.rminn_exhaust_fan; if (state->rpm > 
state->mpu.rmaxn_exhaust_fan) state->rpm = state->mpu.rmaxn_exhaust_fan; +} + +static void do_monitor_cpu_combined(void) +{ + struct cpu_pid_state *state0 = &cpu_state[0]; + struct cpu_pid_state *state1 = &cpu_state[1]; + s32 temp0, power0, temp1, power1; + s32 temp_combi, power_combi; + int rc, intake, pump; + + rc = do_read_one_cpu_values(state0, &temp0, &power0); + if (rc < 0) { + /* XXX What do we do now ? */ + } + state1->overtemp = 0; + rc = do_read_one_cpu_values(state1, &temp1, &power1); + if (rc < 0) { + /* XXX What do we do now ? */ + } + if (state1->overtemp) + state0->overtemp++; + + temp_combi = max(temp0, temp1); + power_combi = max(power0, power1); + + /* Check tmax, increment overtemp if we are there. At tmax+8, we go + * full blown immediately and try to trigger a shutdown + */ + if (temp_combi >= ((state0->mpu.tmax + 8) << 16)) { + printk(KERN_WARNING "Warning ! Temperature way above maximum (%d) !\n", + temp_combi >> 16); + state0->overtemp = CPU_MAX_OVERTEMP; + } else if (temp_combi > (state0->mpu.tmax << 16)) + state0->overtemp++; + else + state0->overtemp = 0; + if (state0->overtemp >= CPU_MAX_OVERTEMP) + critical_state = 1; + if (state0->overtemp > 0) { + state0->rpm = state0->mpu.rmaxn_exhaust_fan; + state0->intake_rpm = intake = state0->mpu.rmaxn_intake_fan; + pump = CPU_PUMP_OUTPUT_MAX; + goto do_set_fans; + } + + /* Do the PID */ + do_cpu_pid(state0, temp_combi, power_combi); + + /* Calculate intake fan speed */ + intake = (state0->rpm * CPU_INTAKE_SCALE) >> 16; + if (intake < state0->mpu.rminn_intake_fan) + intake = state0->mpu.rminn_intake_fan; + if (intake > state0->mpu.rmaxn_intake_fan) + intake = state0->mpu.rmaxn_intake_fan; + state0->intake_rpm = intake; + + /* Calculate pump speed */ + pump = (state0->rpm * CPU_PUMP_OUTPUT_MAX) / + state0->mpu.rmaxn_exhaust_fan; + if (pump > CPU_PUMP_OUTPUT_MAX) + pump = CPU_PUMP_OUTPUT_MAX; + if (pump < CPU_PUMP_OUTPUT_MIN) + pump = CPU_PUMP_OUTPUT_MIN; + + do_set_fans: + /* We copy values from 
state 0 to state 1 for /sysfs */ + state1->rpm = state0->rpm; + state1->intake_rpm = state0->intake_rpm; + + DBG("** CPU %d RPM: %d Ex, %d, Pump: %d, In, overtemp: %d\n", + state->index, (int)state->rpm, intake, pump, state->overtemp); + + /* We should check for errors, shouldn't we ? But then, what + * do we do once the error occurs ? For FCU notified fan + * failures (-EFAULT) we probably want to notify userland + * some way... + */ + set_rpm_fan(CPUA_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, state0->rpm); + set_rpm_fan(CPUB_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, state0->rpm); + + if (fcu_fans[CPUA_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) + set_rpm_fan(CPUA_PUMP_RPM_INDEX, pump); + if (fcu_fans[CPUB_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) + set_rpm_fan(CPUB_PUMP_RPM_INDEX, pump); +} + +static void do_monitor_cpu_split(struct cpu_pid_state *state) +{ + s32 temp, power; + int rc, intake; + + /* Read current fan status */ + rc = do_read_one_cpu_values(state, &temp, &power); + if (rc < 0) { + /* XXX What do we do now ? */ + } + + /* Check tmax, increment overtemp if we are there. At tmax+8, we go + * full blown immediately and try to trigger a shutdown + */ + if (temp >= ((state->mpu.tmax + 8) << 16)) { + printk(KERN_WARNING "Warning ! CPU %d temperature way above maximum" + " (%d) !\n", + state->index, temp >> 16); + state->overtemp = CPU_MAX_OVERTEMP; + } else if (temp > (state->mpu.tmax << 16)) + state->overtemp++; + else + state->overtemp = 0; + if (state->overtemp >= CPU_MAX_OVERTEMP) + critical_state = 1; + if (state->overtemp > 0) { + state->rpm = state->mpu.rmaxn_exhaust_fan; + state->intake_rpm = intake = state->mpu.rmaxn_intake_fan; + goto do_set_fans; + } + + /* Do the PID */ + do_cpu_pid(state, temp, power); intake = (state->rpm * CPU_INTAKE_SCALE) >> 16; if (intake < state->mpu.rminn_intake_fan) @@ -677,11 +919,11 @@ * some way... 
*/ if (state->index == 0) { - set_rpm_fan(CPUA_INTAKE_FAN_RPM_ID, intake); - set_rpm_fan(CPUA_EXHAUST_FAN_RPM_ID, state->rpm); + set_rpm_fan(CPUA_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, state->rpm); } else { - set_rpm_fan(CPUB_INTAKE_FAN_RPM_ID, intake); - set_rpm_fan(CPUB_EXHAUST_FAN_RPM_ID, state->rpm); + set_rpm_fan(CPUB_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, state->rpm); } } @@ -696,6 +938,7 @@ state->overtemp = 0; state->adc_config = 0x00; + if (index == 0) state->monitor = attach_i2c_chip(SUPPLY_MONITOR_ID, "CPU0_monitor"); else if (index == 1) @@ -778,7 +1021,7 @@ DBG("backside:\n"); /* Check fan status */ - rc = get_pwm_fan(BACKSIDE_FAN_PWM_ID); + rc = get_pwm_fan(BACKSIDE_FAN_PWM_INDEX); if (rc < 0) { printk(KERN_WARNING "Error %d reading backside fan !\n", rc); /* XXX What do we do now ? */ @@ -790,12 +1033,12 @@ temp = i2c_smbus_read_byte_data(state->monitor, MAX6690_EXT_TEMP) << 16; state->last_temp = temp; DBG(" temp: %d.%03d, target: %d.%03d\n", FIX32TOPRINT(temp), - FIX32TOPRINT(BACKSIDE_PID_INPUT_TARGET)); + FIX32TOPRINT(backside_params.input_target)); /* Store temperature and error in history array */ state->cur_sample = (state->cur_sample + 1) % BACKSIDE_PID_HISTORY_SIZE; state->sample_history[state->cur_sample] = temp; - state->error_history[state->cur_sample] = temp - BACKSIDE_PID_INPUT_TARGET; + state->error_history[state->cur_sample] = temp - backside_params.input_target; /* If first loop, fill the history table */ if (state->first) { @@ -804,7 +1047,7 @@ BACKSIDE_PID_HISTORY_SIZE; state->sample_history[state->cur_sample] = temp; state->error_history[state->cur_sample] = - temp - BACKSIDE_PID_INPUT_TARGET; + temp - backside_params.input_target; } state->first = 0; } @@ -816,7 +1059,7 @@ integral += state->error_history[i]; integral *= BACKSIDE_PID_INTERVAL; DBG(" integral: %08x\n", integral); - integ_p = ((s64)BACKSIDE_PID_G_r) * (s64)integral; + integ_p = 
((s64)backside_params.G_r) * (s64)integral; DBG(" integ_p: %d\n", (int)(integ_p >> 36)); sum += integ_p; @@ -825,12 +1068,12 @@ state->error_history[(state->cur_sample + BACKSIDE_PID_HISTORY_SIZE - 1) % BACKSIDE_PID_HISTORY_SIZE]; derivative /= BACKSIDE_PID_INTERVAL; - deriv_p = ((s64)BACKSIDE_PID_G_d) * (s64)derivative; + deriv_p = ((s64)backside_params.G_d) * (s64)derivative; DBG(" deriv_p: %d\n", (int)(deriv_p >> 36)); sum += deriv_p; /* Calculate the proportional term */ - prop_p = ((s64)BACKSIDE_PID_G_p) * (s64)(state->error_history[state->cur_sample]); + prop_p = ((s64)backside_params.G_p) * (s64)(state->error_history[state->cur_sample]); DBG(" prop_p: %d\n", (int)(prop_p >> 36)); sum += prop_p; @@ -839,13 +1082,13 @@ DBG(" sum: %d\n", (int)sum); state->pwm += (s32)sum; - if (state->pwm < BACKSIDE_PID_OUTPUT_MIN) - state->pwm = BACKSIDE_PID_OUTPUT_MIN; - if (state->pwm > BACKSIDE_PID_OUTPUT_MAX) - state->pwm = BACKSIDE_PID_OUTPUT_MAX; + if (state->pwm < backside_params.output_min) + state->pwm = backside_params.output_min; + if (state->pwm > backside_params.output_max) + state->pwm = backside_params.output_max; DBG("** BACKSIDE PWM: %d\n", (int)state->pwm); - set_pwm_fan(BACKSIDE_FAN_PWM_ID, state->pwm); + set_pwm_fan(BACKSIDE_FAN_PWM_INDEX, state->pwm); } /* @@ -853,6 +1096,35 @@ */ static int init_backside_state(struct backside_pid_state *state) { + struct device_node *u3; + int u3h = 1; /* conservative by default */ + + /* + * There are different PID params for machines with U3 and machines + * with U3H, pick the right ones now + */ + u3 = of_find_node_by_path("/u3"); + if (u3 != NULL) { + u32 *vers = (u32 *)get_property(u3, "device-rev", NULL); + if (vers) + if (((*vers) & 0x3f) < 0x34) + u3h = 0; + of_node_put(u3); + } + + backside_params.G_p = BACKSIDE_PID_G_p; + backside_params.G_r = BACKSIDE_PID_G_r; + backside_params.output_max = BACKSIDE_PID_OUTPUT_MAX; + if (u3h) { + backside_params.G_d = BACKSIDE_PID_U3H_G_d; + backside_params.input_target = 
BACKSIDE_PID_U3H_INPUT_TARGET; + backside_params.output_min = BACKSIDE_PID_U3H_OUTPUT_MIN; + } else { + backside_params.G_d = BACKSIDE_PID_U3_G_d; + backside_params.input_target = BACKSIDE_PID_U3_INPUT_TARGET; + backside_params.output_min = BACKSIDE_PID_U3_OUTPUT_MIN; + } + state->ticks = 1; state->first = 1; state->pwm = 50; @@ -898,7 +1170,7 @@ DBG("drives:\n"); /* Check fan status */ - rc = get_rpm_fan(DRIVES_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(DRIVES_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); if (rc < 0) { printk(KERN_WARNING "Error %d reading drives fan !\n", rc); /* XXX What do we do now ? */ @@ -965,7 +1237,7 @@ state->rpm = DRIVES_PID_OUTPUT_MAX; DBG("** DRIVES RPM: %d\n", (int)state->rpm); - set_rpm_fan(DRIVES_FAN_RPM_ID, state->rpm); + set_rpm_fan(DRIVES_FAN_RPM_INDEX, state->rpm); } /* @@ -1032,7 +1304,7 @@ } /* Set the PCI fan once for now */ - set_pwm_fan(SLOTS_FAN_PWM_ID, SLOTS_FAN_DEFAULT_PWM); + set_pwm_fan(SLOTS_FAN_PWM_INDEX, SLOTS_FAN_DEFAULT_PWM); /* Initialize ADCs */ initialize_adc(&cpu_state[0]); @@ -1047,9 +1319,13 @@ start = jiffies; down(&driver_lock); - do_monitor_cpu(&cpu_state[0]); - if (cpu_state[1].monitor != NULL) - do_monitor_cpu(&cpu_state[1]); + if (cpu_pid_type == CPU_PID_TYPE_COMBINED) + do_monitor_cpu_combined(); + else { + do_monitor_cpu_split(&cpu_state[0]); + if (cpu_state[1].monitor != NULL) + do_monitor_cpu_split(&cpu_state[1]); + } do_monitor_backside(&backside_state); do_monitor_drives(&drives_state); up(&driver_lock); @@ -1113,6 +1389,19 @@ DBG("counted %d CPUs in the device-tree\n", cpu_count); + /* Decide the type of PID algorithm to use based on the presence of + * the pumps, though that may not be the best way, that is good enough + * for now + */ + if (machine_is_compatible("PowerMac7,3") + && (cpu_count > 1) + && fcu_fans[CPUA_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID + && fcu_fans[CPUB_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) { + printk(KERN_INFO "Liquid cooling pumps detected, using new 
algorithm !\n"); + cpu_pid_type = CPU_PID_TYPE_COMBINED; + } else + cpu_pid_type = CPU_PID_TYPE_SPLIT; + /* Create control loops for everything. If any fail, everything * fails */ @@ -1257,12 +1546,91 @@ return 0; } +static void fcu_lookup_fans(struct device_node *fcu_node) +{ + struct device_node *np = NULL; + int i; + + /* The table is filled by default with values that are suitable + * for the old machines without device-tree informations. We scan + * the device-tree and override those values with whatever is + * there + */ + + DBG("Looking up FCU controls in device-tree...\n"); + + while ((np = of_get_next_child(fcu_node, np)) != NULL) { + int type = -1; + char *loc; + u32 *reg; + + DBG(" control: %s, type: %s\n", np->name, np->type); + + /* Detect control type */ + if (!strcmp(np->type, "fan-rpm-control") || + !strcmp(np->type, "fan-rpm")) + type = FCU_FAN_RPM; + if (!strcmp(np->type, "fan-pwm-control") || + !strcmp(np->type, "fan-pwm")) + type = FCU_FAN_PWM; + /* Only care about fans for now */ + if (type == -1) + continue; + + /* Lookup for a matching location */ + loc = (char *)get_property(np, "location", NULL); + reg = (u32 *)get_property(np, "reg", NULL); + if (loc == NULL || reg == NULL) + continue; + DBG(" matching location: %s, reg: 0x%08x\n", loc, *reg); + + for (i = 0; i < FCU_FAN_COUNT; i++) { + int fan_id; + + if (strcmp(loc, fcu_fans[i].loc)) + continue; + DBG(" location match, index: %d\n", i); + fcu_fans[i].id = FCU_FAN_ABSENT_ID; + if (type != fcu_fans[i].type) { + printk(KERN_WARNING "therm_pm72: Fan type mismatch " + "in device-tree for %s\n", np->full_name); + break; + } + if (type == FCU_FAN_RPM) + fan_id = ((*reg) - 0x10) / 2; + else + fan_id = ((*reg) - 0x30) / 2; + if (fan_id > 7) { + printk(KERN_WARNING "therm_pm72: Can't parse " + "fan ID in device-tree for %s\n", np->full_name); + break; + } + DBG(" fan id -> %d, type -> %d\n", fan_id, type); + fcu_fans[i].id = fan_id; + } + } + + /* Now dump the array */ + printk(KERN_INFO "Detected 
fan controls:\n"); + for (i = 0; i < FCU_FAN_COUNT; i++) { + if (fcu_fans[i].id == FCU_FAN_ABSENT_ID) + continue; + printk(KERN_INFO " %d: %s fan, id %d, location: %s\n", i, + fcu_fans[i].type == FCU_FAN_RPM ? "RPM" : "PWM", + fcu_fans[i].id, fcu_fans[i].loc); + } +} + static int fcu_of_probe(struct of_device* dev, const struct of_match *match) { int rc; state = state_detached; + /* Lookup the fans in the device tree */ + fcu_lookup_fans(dev->node); + + /* Add the driver */ rc = i2c_add_driver(&therm_pm72_driver); if (rc < 0) return rc; @@ -1301,7 +1669,8 @@ { struct device_node *np; - if (!machine_is_compatible("PowerMac7,2")) + if (!machine_is_compatible("PowerMac7,2") && + !machine_is_compatible("PowerMac7,3")) return -ENODEV; printk(KERN_INFO "PowerMac G5 Thermal control driver %s\n", VERSION); diff -urN linux-2.5/drivers/macintosh/therm_pm72.h linux-pogo/drivers/macintosh/therm_pm72.h --- linux-2.5/drivers/macintosh/therm_pm72.h 2004-09-24 14:34:05.000000000 +1000 +++ linux-pogo/drivers/macintosh/therm_pm72.h 2004-10-15 18:58:22.000000000 +1000 @@ -119,18 +119,33 @@ #define ADC_CPU_CURRENT_SCALE 0x1f40 /* _AD4 */ /* - * PID factors for the U3/Backside fan control loop + * PID factors for the U3/Backside fan control loop. 
We have 2 sets + * of values here, one set for U3 and one set for U3H */ -#define BACKSIDE_FAN_PWM_ID 1 -#define BACKSIDE_PID_G_d 0x02800000 +#define BACKSIDE_FAN_PWM_DEFAULT_ID 1 +#define BACKSIDE_FAN_PWM_INDEX 0 +#define BACKSIDE_PID_U3_G_d 0x02800000 +#define BACKSIDE_PID_U3H_G_d 0x01400000 #define BACKSIDE_PID_G_p 0x00500000 #define BACKSIDE_PID_G_r 0x00000000 -#define BACKSIDE_PID_INPUT_TARGET 0x00410000 +#define BACKSIDE_PID_U3_INPUT_TARGET 0x00410000 +#define BACKSIDE_PID_U3H_INPUT_TARGET 0x004b0000 #define BACKSIDE_PID_INTERVAL 5 #define BACKSIDE_PID_OUTPUT_MAX 100 -#define BACKSIDE_PID_OUTPUT_MIN 20 +#define BACKSIDE_PID_U3_OUTPUT_MIN 20 +#define BACKSIDE_PID_U3H_OUTPUT_MIN 30 #define BACKSIDE_PID_HISTORY_SIZE 2 +struct basckside_pid_params +{ + u32 G_d; + u32 G_p; + u32 G_r; + u32 input_target; + u32 output_min; + u32 output_max; +}; + struct backside_pid_state { int ticks; @@ -146,7 +161,8 @@ /* * PID factors for the Drive Bay fan control loop */ -#define DRIVES_FAN_RPM_ID 2 +#define DRIVES_FAN_RPM_DEFAULT_ID 2 +#define DRIVES_FAN_RPM_INDEX 1 #define DRIVES_PID_G_d 0x01e00000 #define DRIVES_PID_G_p 0x00500000 #define DRIVES_PID_G_r 0x00000000 @@ -168,7 +184,8 @@ int first; }; -#define SLOTS_FAN_PWM_ID 2 +#define SLOTS_FAN_PWM_DEFAULT_ID 2 +#define SLOTS_FAN_PWM_INDEX 2 #define SLOTS_FAN_DEFAULT_PWM 50 /* Do better here ! 
 */

 /*
@@ -191,10 +208,15 @@
  * CPU B FAKE POWER	49 (I_V_inputs: 18, 19)
  */

-#define CPUA_INTAKE_FAN_RPM_ID	3
-#define CPUA_EXHAUST_FAN_RPM_ID	4
-#define CPUB_INTAKE_FAN_RPM_ID	5
-#define CPUB_EXHAUST_FAN_RPM_ID	6
+#define CPUA_INTAKE_FAN_RPM_DEFAULT_ID	3
+#define CPUA_EXHAUST_FAN_RPM_DEFAULT_ID	4
+#define CPUB_INTAKE_FAN_RPM_DEFAULT_ID	5
+#define CPUB_EXHAUST_FAN_RPM_DEFAULT_ID	6
+
+#define CPUA_INTAKE_FAN_RPM_INDEX	3
+#define CPUA_EXHAUST_FAN_RPM_INDEX	4
+#define CPUB_INTAKE_FAN_RPM_INDEX	5
+#define CPUB_EXHAUST_FAN_RPM_INDEX	6

 #define CPU_INTAKE_SCALE	0x0000f852
 #define CPU_TEMP_HISTORY_SIZE	2
@@ -202,6 +224,11 @@
 #define CPU_PID_INTERVAL	1
 #define CPU_MAX_OVERTEMP	30

+#define CPUA_PUMP_RPM_INDEX	7
+#define CPUB_PUMP_RPM_INDEX	8
+#define CPU_PUMP_OUTPUT_MAX	3700
+#define CPU_PUMP_OUTPUT_MIN	1000
+
 struct cpu_pid_state
 {
 	int index;
@@ -219,6 +246,7 @@
 	s32 voltage;
 	s32 current_a;
 	s32 last_temp;
+	s32 last_power;
 	int first;
 	u8 adc_config;
 };

From jimix at watson.ibm.com Sat Oct 16 01:53:17 2004
From: jimix at watson.ibm.com (Jimi Xenidis)
Date: Fri, 15 Oct 2004 11:53:17 -0400
Subject: [vHype-discussion] u64 in linux
In-Reply-To: <1097849471.25095.97.camel@brick.watson.ibm.com>
References: <1097849471.25095.97.camel@brick.watson.ibm.com>
Message-ID: <16751.62061.393716.650492@kitch0.watson.ibm.com>

>>>>> "MO" == Michal Ostrowski writes:

 MO> In trying to integrate ppc64 changes into the vhype linux tree, I'm
 MO> coming across a problem with usage of "u64".

 MO> On x86, u64 is "unsigned long long". On ppc64 it is "unsigned long".

*sigh* I thought the hell over size_t unsigned int vs. unsigned long
would have taught everyone.

BTW: a thread starts here:
  http://www.ussg.iu.edu/hypermail/linux/kernel/0402.3/1428.html

After a whole lot of clicking it looks like a dropped patch.

I guess it's the cast, it seems that's the linux way at the moment.
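A minimal userspace sketch of the cast approach, for illustration only. It assumes uint64_t as a stand-in for the kernel's u64 and snprintf in place of printk; the helper name format_u64 is made up here and is not an existing kernel function.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the kernel's u64 (assumption: in-kernel code would use
 * the real u64 typedef and printk instead of snprintf). */
typedef uint64_t u64;

/*
 * Hypothetical helper: format a u64 as decimal.
 * The explicit cast makes the argument match %llu whether u64 is
 * "unsigned long" (as on ppc64) or "unsigned long long" (as on x86).
 * Returns the number of characters snprintf would have written.
 */
int format_u64(char *buf, size_t len, u64 val)
{
	return snprintf(buf, len, "%llu", (unsigned long long)val);
}
```

The cast keeps one format string valid on both architectures, at the price of repeating the cast at every call site.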
-JX

From jimix at watson.ibm.com Sat Oct 16 02:46:56 2004
From: jimix at watson.ibm.com (Jimi Xenidis)
Date: Fri, 15 Oct 2004 12:46:56 -0400
Subject: u64 in linux
In-Reply-To: <16751.62061.393716.650492@kitch0.watson.ibm.com>
References: <1097849471.25095.97.camel@brick.watson.ibm.com> <16751.62061.393716.650492@kitch0.watson.ibm.com>
Message-ID: <16751.65280.234326.437361@kitch0.watson.ibm.com>

>>>>> "JX" == Jimi Xenidis writes:

Forgive the CC to my internal list.

The real question is, what was the result of this thread?

 JX> http://www.ussg.iu.edu/hypermail/linux/kernel/0402.3/1428.html

And is casting the acceptable thing to do?

-JX

From hpa at zytor.com Sat Oct 16 03:33:45 2004
From: hpa at zytor.com (H. Peter Anvin)
Date: Fri, 15 Oct 2004 10:33:45 -0700
Subject: Fan control for PowerMac7_3
In-Reply-To: <1097832049.1149.115.camel@gaston>
References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston>
Message-ID: <417009F9.6080007@zytor.com>

Hi there,

I tried to apply this patch to top-of-tree (bkcvs), but it looks like
the current TOT doesn't compile on ppc64 for unrelated reasons:

.config attached.

arch/ppc64/kernel/built-in.o(.text+0x79f8): In function `.sys_call_table32':
: undefined reference to `.sys_acct'
arch/ppc64/kernel/built-in.o(.text+0x7c78): In function `.sys_call_table32':
: undefined reference to `.sys_quotactl'
arch/ppc64/kernel/built-in.o(.text+0x8078): In function `.sys_call_table32':
: undefined reference to `.compat_mbind'
arch/ppc64/kernel/built-in.o(.text+0x8080): In function `.sys_call_table32':
: undefined reference to `.compat_get_mempolicy'
arch/ppc64/kernel/built-in.o(.text+0x8088): In function `.sys_call_table32':
: undefined reference to `.compat_set_mempolicy'
arch/ppc64/kernel/built-in.o(.text+0x8090): In function `.sys_call_table32':
: undefined reference to `.compat_sys_mq_open'
arch/ppc64/kernel/built-in.o(.text+0x8098): In function `.sys_call_table32':
: undefined reference to `.sys_mq_unlink'
arch/ppc64/kernel/built-in.o(.text+0x80a0): In function `.sys_call_table32':
: undefined reference to `.compat_sys_mq_timedsend'
arch/ppc64/kernel/built-in.o(.text+0x80a8): In function `.sys_call_table32':
: undefined reference to `.compat_sys_mq_timedreceive'
arch/ppc64/kernel/built-in.o(.text+0x80b0): In function `.sys_call_table32':
: undefined reference to `.compat_sys_mq_notify'
arch/ppc64/kernel/built-in.o(.text+0x80b8): In function `.sys_call_table32':
: undefined reference to `.compat_sys_mq_getsetattr'
arch/ppc64/kernel/built-in.o(.text+0x8260): In function `.sys_call_table':
: undefined reference to `.sys_acct'
arch/ppc64/kernel/built-in.o(.text+0x84e0): In function `.sys_call_table':
: undefined reference to `.sys_quotactl'
arch/ppc64/kernel/built-in.o(.text+0x88e0): In function `.sys_call_table':
: undefined reference to `.sys_mbind'
arch/ppc64/kernel/built-in.o(.text+0x88e8): In function `.sys_call_table':
: undefined reference to `.sys_get_mempolicy'
arch/ppc64/kernel/built-in.o(.text+0x88f0): In function `.sys_call_table':
: undefined reference to `.sys_set_mempolicy'
arch/ppc64/kernel/built-in.o(.text+0x88f8): In
function `.sys_call_table':
: undefined reference to `.sys_mq_open'
arch/ppc64/kernel/built-in.o(.text+0x8900): In function `.sys_call_table':
: undefined reference to `.sys_mq_unlink'
arch/ppc64/kernel/built-in.o(.text+0x8908): In function `.sys_call_table':
: undefined reference to `.sys_mq_timedsend'
arch/ppc64/kernel/built-in.o(.text+0x8910): In function `.sys_call_table':
: undefined reference to `.sys_mq_timedreceive'
arch/ppc64/kernel/built-in.o(.text+0x8918): In function `.sys_call_table':
: undefined reference to `.sys_mq_notify'
arch/ppc64/kernel/built-in.o(.text+0x8920): In function `.sys_call_table':
: undefined reference to `.sys_mq_getsetattr'
make: *** [.tmp_vmlinux1] Error 1
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: .config
Url: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041015/6b5d420b/attachment.txt

From arnd at arndb.de Sat Oct 16 04:58:58 2004
From: arnd at arndb.de (Arnd Bergmann)
Date: Fri, 15 Oct 2004 20:58:58 +0200
Subject: [vHype-discussion] u64 in linux
In-Reply-To: <16751.62061.393716.650492@kitch0.watson.ibm.com>
References: <1097849471.25095.97.camel@brick.watson.ibm.com> <16751.62061.393716.650492@kitch0.watson.ibm.com>
Message-ID: <200410152059.03647.arnd@arndb.de>

On Friday 15 October 2004 17:53, Jimi Xenidis wrote:
> BTW: a thread starts here:
>   http://www.ussg.iu.edu/hypermail/linux/kernel/0402.3/1428.html
>
> After a whole lot of clicking it looks like a dropped patch.
>
> I guess it's the cast, it seems that's the linux way at the moment.

Yes, I think there have been some patches to drivers going in that
direction.
An alternative if the warning is in your own code is to use
'unsigned long long' or a user defined 'uval64' directly in
the declaration instead of 'u64'.

C99 also mandates that the macro PRIu64 contains the correct
format string for uint64_t (which afaik is always the same as u64).
It's currently not defined in linux, but could perhaps be added.

	Arnd <><
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: signature
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041015/2b1c6e18/attachment.pgp

From hpa at zytor.com Sat Oct 16 05:21:07 2004
From: hpa at zytor.com (H. Peter Anvin)
Date: Fri, 15 Oct 2004 12:21:07 -0700
Subject: [vHype-discussion] u64 in linux
In-Reply-To: <200410152059.03647.arnd@arndb.de>
References: <1097849471.25095.97.camel@brick.watson.ibm.com> <16751.62061.393716.650492@kitch0.watson.ibm.com> <200410152059.03647.arnd@arndb.de>
Message-ID: <41702323.9010903@zytor.com>

Arnd Bergmann wrote:
> On Friday 15 October 2004 17:53, Jimi Xenidis wrote:
>
>>BTW: a thread starts here:
>>  http://www.ussg.iu.edu/hypermail/linux/kernel/0402.3/1428.html
>>
>>After a whole lot of clicking it looks like a dropped patch.
>>
>>I guess it's the cast, it seems that's the linux way at the moment.
>
>
> Yes, I think there have been some patches to drivers going in that
> direction.
> An alternative if the warning is in your own code is to use
> 'unsigned long long' or a user defined 'uval64' directly in
> the declaration instead of 'u64'.
>
> C99 also mandates that the macro PRIu64 contains the correct
> format string for uint64_t (which afaik is always the same as u64).
> It's currently not defined in linux, but could perhaps be added.
>

Also, in C99, you can print any integer type by casting it to
[u]intmax_t and using %j.

	-hpa

From hpa at zytor.com Sat Oct 16 05:27:58 2004
From: hpa at zytor.com (H.
Peter Anvin)
Date: Fri, 15 Oct 2004 12:27:58 -0700
Subject: [vHype-discussion] u64 in linux
In-Reply-To: <41702323.9010903@zytor.com>
References: <1097849471.25095.97.camel@brick.watson.ibm.com> <16751.62061.393716.650492@kitch0.watson.ibm.com> <200410152059.03647.arnd@arndb.de> <41702323.9010903@zytor.com>
Message-ID: <417024BE.3060008@zytor.com>

H. Peter Anvin wrote:
>
> Also, in C99, you can print any integer type by casting it to
> [u]intmax_t and using %j.
>

By the way, my very firm opinion on this is that we should match and
use as much as possible. Quite frankly actually resolves a lot of
issues that previous attempts at creating these datatypes -- including
the one in Linux -- have ignored. This is a good thing.

Yes, there is ugliness, and I actually would have liked to see the C99
committee adopt the M$ extension %Inn (e.g. %I64d for a 64-bit signed
decimal integer); to make matters worse GNU used %I for a different
purpose so it's not even possible to make it a compatible extension.

	-hpa

From olh at suse.de Sat Oct 16 05:34:00 2004
From: olh at suse.de (Olaf Hering)
Date: Fri, 15 Oct 2004 21:34:00 +0200
Subject: Fan control for PowerMac7_3
In-Reply-To: <417009F9.6080007@zytor.com>
References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <417009F9.6080007@zytor.com>
Message-ID: <20041015193400.GA14307@suse.de>

On Fri, Oct 15, H. Peter Anvin wrote:

> Hi there,
>
> I tried to apply this patch to top-of-tree (bkcvs), but it looks like
> the current TOT doesn't compile on ppc64 for unrelated reasons:
>
> .config attached.

> # Linux kernel version: 2.6.9-rc4

rc4-bk3 builds ok for me with that config.

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

From dwmw2 at infradead.org Sat Oct 16 06:52:42 2004
From: dwmw2 at infradead.org (David Woodhouse)
Date: Fri, 15 Oct 2004 21:52:42 +0100
Subject: Reserve initrd pages.
Message-ID: <1097873562.13633.732.camel@hades.cambridge.redhat.com> We don't mark initrd pages as reserved. If we manage to allocate enough other stuff before using the initrd, we end up eating into the initrd and we don't boot. Signed-Off-By: David Woodhouse ===== arch/ppc64/kernel/setup.c 1.83 vs edited ===== --- 1.83/arch/ppc64/kernel/setup.c 2004-10-04 20:17:37 +01:00 +++ edited/arch/ppc64/kernel/setup.c 2004-10-15 21:02:33 +01:00 @@ -30,6 +30,7 @@ #include #include #include +#include #include #include #include @@ -990,6 +991,9 @@ /* set up the bootmem stuff with available memory */ do_init_bootmem(); + + if (initrd_start) + reserve_bootmem(__pa(initrd_start), initrd_end-initrd_start); /* Select the correct idle loop for the platform. */ idle_setup(); -- dwmw2 From schwab at suse.de Sat Oct 16 07:00:48 2004 From: schwab at suse.de (Andreas Schwab) Date: Fri, 15 Oct 2004 23:00:48 +0200 Subject: 2.6.9-rc4: oops during ide probing In-Reply-To: (Andreas Schwab's message of "Mon, 11 Oct 2004 22:11:42 +0200") References: Message-ID: > I'm getting an oops during ide probing on the PMac G5 with 2.6.9-rc4: > > ide-pmac: cannot find MacIO node for Kauai ATA interface > ide0: Found Apple OHare ATA controller, bus ID 0, irq 0 > Oops: Kernel access of bad area, sig: 11 [#1] > NIP [...] .ide_mm_inb+0x0/0x14 > LR [...] .ide_wait_not_busy+0x98/0xf0 That turned out to be an apparent compiler bug. The kernel is working fine for me now. Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." 
From schwab at suse.de Sat Oct 16 08:16:28 2004 From: schwab at suse.de (Andreas Schwab) Date: Sat, 16 Oct 2004 00:16:28 +0200 Subject: Fan control for PowerMac7_3 In-Reply-To: <1097832049.1149.115.camel@gaston> (Benjamin Herrenschmidt's message of "Fri, 15 Oct 2004 19:20:50 +1000") References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> Message-ID: Benjamin Herrenschmidt writes: > On Fri, 2004-10-15 at 19:19, Benjamin Herrenschmidt wrote: >> On Fri, 2004-10-15 at 19:16, Benjamin Herrenschmidt wrote: >> > Hi ! >> > >> > This is an experimental (read: totally untested) patch to the G5 fan >> > control code. All I know is that it builds :) >> >> And I sent a wrong version ... sorry, the good one in a few minutes. > > Here it is: Here's a patch to make it compile with DEBUG enabled: --- linux-2.6.9-rc4/drivers/macintosh/therm_pm72.c.~1~ 2004-10-16 00:02:36.705511068 +0200 +++ linux-2.6.9-rc4/drivers/macintosh/therm_pm72.c 2004-10-16 00:07:04.815455733 +0200 @@ -652,7 +652,7 @@ static int do_read_one_cpu_values(struct DBG(" cpu %d, fan reading error !\n", state->index); } else { state->rpm = rc; - DBG(" cpu %d, exhaust RPM: %d\n", state->rpm); + DBG(" cpu %d, exhaust RPM: %d\n", state->index, state->rpm); } /* Get some sensor readings and scale it */ @@ -691,8 +691,8 @@ static int do_read_one_cpu_values(struct state->last_power = *power; DBG(" cpu %d, current: %d.%03d, voltage: %d.%03d, power: %d.%03d W\n", - state->index, FIX32TOPRINT(current_a), FIX32TOPRINT(voltage), - FIX32TOPRINT(*power)); + state->index, FIX32TOPRINT(state->current_a), + FIX32TOPRINT(state->voltage), FIX32TOPRINT(*power)); return 0; } @@ -850,7 +850,7 @@ static void do_monitor_cpu_combined(void state1->intake_rpm = state0->intake_rpm; DBG("** CPU %d RPM: %d Ex, %d, Pump: %d, In, overtemp: %d\n", - state->index, (int)state->rpm, intake, pump, state->overtemp); + state1->index, (int)state1->rpm, intake, pump, state1->overtemp); /* We 
should check for errors, shouldn't we ? But then, what * do we do once the error occurs ? For FCU notified fan Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." From raosanth at us.ibm.com Sat Oct 16 07:00:25 2004 From: raosanth at us.ibm.com (Santhosh Rao) Date: Fri, 15 Oct 2004 16:00:25 -0500 Subject: 2.6.9-rc4 kernel -- "cannot find space for TCE table" Message-ID: Ok, it appears we aren't dropping into the open firmware debugger randomly; the kernel seems to give up early in the boot process. Below is the output of an attempted boot of 2.6.9-rc4. Jose, ever seen anything like this? The machine is a p615 power-4 2-CPU box with 2GB of RAM. -- Sonny Output: Elapsed time since release of system processors: 1 mins 23 secs Config file read, 4096 bytes Welcome to yaboot version 1.3.11.SuSE Enter "help" to get some basic usage information boot: autobench Please wait, loading kernel... Elf64 kernel loaded... OF stdout device is: /pci@400000000110/isa@3/serial@i3f8 command line: root=/dev/sda3 elevator=noop elevator=noop memory layout at init: alloc_bottom : 000000000403c000 alloc_top : 0000000040000000 alloc_top_hi : 0000000080000000 rmo_top : 0000000080000000 ram_top : 0000000080000000 Looking for displays ERROR, cannot find space for TCE table. EXIT called ok 0 > From dwmw2 at infradead.org Sat Oct 16 09:15:50 2004 From: dwmw2 at infradead.org (David Woodhouse) Date: Sat, 16 Oct 2004 00:15:50 +0100 Subject: Reserve initrd pages. 
In-Reply-To: <1097873562.13633.732.camel@hades.cambridge.redhat.com> References: <1097873562.13633.732.camel@hades.cambridge.redhat.com> Message-ID: <1097882150.13633.754.camel@hades.cambridge.redhat.com> On Fri, 2004-10-15 at 21:52 +0100, David Woodhouse wrote: > + reserve_bootmem(__pa(initrd_start), initrd_end-initrd_start); That doesn't work if CONFIG_NUMA is set. This one does... --- linux-2.6.8/arch/ppc64/kernel/setup.c~ 2004-10-15 20:59:01.000000000 +0100 +++ linux-2.6.8/arch/ppc64/kernel/setup.c 2004-10-15 23:59:18.082932384 +0100 @@ -533,6 +533,8 @@ if (initrd_start) printk("Found initrd at 0x%lx:0x%lx\n", initrd_start, initrd_end); + lmb_reserve(__pa(initrd_start), initrd_end-initrd_start); + DBG(" <- check_for_initrd()\n"); #endif /* CONFIG_BLK_DEV_INITRD */ } -- dwmw2 From dwmw2 at infradead.org Sat Oct 16 09:35:23 2004 From: dwmw2 at infradead.org (David Woodhouse) Date: Sat, 16 Oct 2004 00:35:23 +0100 Subject: Fan control for PowerMac7_3 In-Reply-To: <417009F9.6080007@zytor.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <417009F9.6080007@zytor.com> Message-ID: <1097883323.13633.757.camel@hades.cambridge.redhat.com> On Fri, 2004-10-15 at 10:33 -0700, H. Peter Anvin wrote: > Hi there, > > I tried to apply this patch to top-of-tree (bkcvs), but it looks like > the current TOT doesn't compile on ppc64 for unrelated reasons: Building with -mcall-aixdesc will work around that. -- dwmw2 From benh at kernel.crashing.org Sat Oct 16 10:39:21 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 10:39:21 +1000 Subject: Fan control for PowerMac7_3 In-Reply-To: <417009F9.6080007@zytor.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <417009F9.6080007@zytor.com> Message-ID: <1097887160.6527.15.camel@gaston> On Sat, 2004-10-16 at 03:33, H. 
Peter Anvin wrote: > Hi there, > > I tried to apply this patch to top-of-tree (bkcvs), but it looks like > the current TOT doesn't compile on ppc64 for unrelated reasons: Weird... could it be cond_syscall not working ? Ben. From benh at kernel.crashing.org Sat Oct 16 10:42:02 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 10:42:02 +1000 Subject: Fan control for PowerMac7_3 In-Reply-To: <1097883323.13633.757.camel@hades.cambridge.redhat.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <417009F9.6080007@zytor.com> <1097883323.13633.757.camel@hades.cambridge.redhat.com> Message-ID: <1097887322.6487.21.camel@gaston> On Sat, 2004-10-16 at 09:35, David Woodhouse wrote: > On Fri, 2004-10-15 at 10:33 -0700, H. Peter Anvin wrote: > > Hi there, > > > > I tried to apply this patch to top-of-tree (bkcvs), but it looks like > > the current TOT doesn't compile on ppc64 for unrelated reasons: > > Building with -mcall-aixdesc will work around that. What is the exact problem ? Ben. From benh at kernel.crashing.org Sat Oct 16 10:45:10 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 10:45:10 +1000 Subject: 2.6.9-rc4 kernel -- "cannot find space for TCE table" In-Reply-To: References: Message-ID: <1097887510.6487.23.camel@gaston> On Sat, 2004-10-16 at 07:00, Santhosh Rao wrote: > Ok, it appears we aren't dropping into the open firmware debugger > randomly, the kernel seems to give up early in the boot process > Below is the output of an attempted boot of 2.6.9-rc4. > > Jose, ever seen anything like this? > > The machine is a p615 power-4 2-CPU box with 2GB of RAM. Can you enable PROM_DEBUG in arch/ppc64/kernel/prom_init.c and send me the output log ? Ben. 
From benh at kernel.crashing.org Sat Oct 16 10:46:18 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 10:46:18 +1000 Subject: Reserve initrd pages. In-Reply-To: <1097873562.13633.732.camel@hades.cambridge.redhat.com> References: <1097873562.13633.732.camel@hades.cambridge.redhat.com> Message-ID: <1097887578.6546.25.camel@gaston> On Sat, 2004-10-16 at 06:52, David Woodhouse wrote: > We don't mark initrd pages as reserved. If we manage to allocate enough > other stuff before using the initrd, we end up eating into the initrd > and we don't boot. Hrm... that should be done in > Signed-Off-By: David Woodhouse > > ===== arch/ppc64/kernel/setup.c 1.83 vs edited ===== > --- 1.83/arch/ppc64/kernel/setup.c 2004-10-04 20:17:37 +01:00 > +++ edited/arch/ppc64/kernel/setup.c 2004-10-15 21:02:33 +01:00 > @@ -30,6 +30,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -990,6 +991,9 @@ > > /* set up the bootmem stuff with available memory */ > do_init_bootmem(); > + > + if (initrd_start) > + reserve_bootmem(__pa(initrd_start), initrd_end-initrd_start); > > /* Select the correct idle loop for the platform. */ > idle_setup(); -- Benjamin Herrenschmidt From benh at kernel.crashing.org Sat Oct 16 10:47:41 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 10:47:41 +1000 Subject: Reserve initrd pages. In-Reply-To: <1097873562.13633.732.camel@hades.cambridge.redhat.com> References: <1097873562.13633.732.camel@hades.cambridge.redhat.com> Message-ID: <1097887661.6487.28.camel@gaston> On Sat, 2004-10-16 at 06:52, David Woodhouse wrote: > We don't mark initrd pages as reserved. If we manage to allocate enough > other stuff before using the initrd, we end up eating into the initrd > and we don't boot. 
That should be done in mm/init.c, do_init_bootmem() itself: /* reserve the sections we're already using */ for (i=0; i < lmb.reserved.cnt; i++) { unsigned long physbase = lmb.reserved.region[i].physbase; unsigned long size = lmb.reserved.region[i].size; reserve_bootmem(physbase, size); } The initrd is part of the "reserved map" passed in by prom_init and thus is put in the list of reserved lmb regions. Ben. From benh at kernel.crashing.org Sat Oct 16 10:48:11 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 10:48:11 +1000 Subject: Reserve initrd pages. In-Reply-To: <1097882150.13633.754.camel@hades.cambridge.redhat.com> References: <1097873562.13633.732.camel@hades.cambridge.redhat.com> <1097882150.13633.754.camel@hades.cambridge.redhat.com> Message-ID: <1097887691.6527.30.camel@gaston> On Sat, 2004-10-16 at 09:15, David Woodhouse wrote: > On Fri, 2004-10-15 at 21:52 +0100, David Woodhouse wrote: > > + reserve_bootmem(__pa(initrd_start), initrd_end-initrd_start); > > That doesn't work if CONFIG_NUMA is set. This one does... Again, it should be already in the LMB reserve map, if not, then there is a bug, but that isn't the right fix. Ben. From dwmw2 at infradead.org Sat Oct 16 10:47:46 2004 From: dwmw2 at infradead.org (David Woodhouse) Date: Sat, 16 Oct 2004 01:47:46 +0100 Subject: Fan control for PowerMac7_3 In-Reply-To: <1097887322.6487.21.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <417009F9.6080007@zytor.com> <1097883323.13633.757.camel@hades.cambridge.redhat.com> <1097887322.6487.21.camel@gaston> Message-ID: <1097887666.5788.2059.camel@baythorne.infradead.org> On Sat, 2004-10-16 at 10:42 +1000, Benjamin Herrenschmidt wrote: > On Sat, 2004-10-16 at 09:35, David Woodhouse wrote: > > On Fri, 2004-10-15 at 10:33 -0700, H. 
Peter Anvin wrote: > > > Hi there, > > > > > > I tried to apply this patch to top-of-tree (bkcvs), but it looks like > > > the current TOT doesn't compile on ppc64 for unrelated reasons: > > > > Building with -mcall-aixdesc will work around that. > > What is the exact problem ? cond_syscall not working due to new ABI. -- dwmw2 From benh at kernel.crashing.org Sat Oct 16 12:23:53 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 12:23:53 +1000 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <1097832049.1149.115.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> Message-ID: <1097893432.6546.37.camel@gaston> Ok, here's a new patch that fixes a few issues. It has been tested on a non-liquid-cooled system and appears to work ok. diff -urN linux-2.5/drivers/macintosh/therm_pm72.c linux-pogo/drivers/macintosh/therm_pm72.c --- linux-2.5/drivers/macintosh/therm_pm72.c 2004-09-24 14:34:05.000000000 +1000 +++ linux-pogo/drivers/macintosh/therm_pm72.c 2004-10-16 12:21:42.000000000 +1000 @@ -46,6 +46,8 @@ * overtemp conditions so userland can take some policy * decisions, like slewing down CPUs * - Deal with fan and i2c failures in a better way + * - Maybe do a generic PID based on params used for + * U3 and Drives ? * * History: * @@ -73,6 +75,14 @@ * values in the configuration register * - Switch back to use of target fan speed for PID, thus lowering * pressure on i2c + * + * Oct. 
16, 2004 : 1.1b2 (beta) + * - Add device-tree lookup for fan IDs, should detect liquid cooling + * pumps when present + * - Enable driver for PowerMac7,3 machines + * - Split the U3/Backside cooling on U3 & U3H versions as Darwin does + * - Add new CPU cooling algorithm for machines with liquid cooling + * - Workaround for some PowerMac7,3 with empty "fan" node in the devtree */ #include @@ -101,7 +111,7 @@ #include "therm_pm72.h" -#define VERSION "0.9" +#define VERSION "1.1b2" #undef DEBUG @@ -121,16 +131,100 @@ static struct i2c_adapter * u3_1; static struct i2c_client * fcu; static struct cpu_pid_state cpu_state[2]; +static struct basckside_pid_params backside_params; static struct backside_pid_state backside_state; static struct drives_pid_state drives_state; static int state; static int cpu_count; +static int cpu_pid_type; static pid_t ctrl_task; static struct completion ctrl_complete; static int critical_state; static DECLARE_MUTEX(driver_lock); /* + * We have 2 types of CPU PID control. One is "split" old style control + * for intake & exhaust fans, the other is "combined" control for both + * CPUs that also deals with the pumps when present. To be "compatible" + * with OS X at this point, we only use "COMBINED" on the machines that + * are identified as having the pumps (though that identification is at + * least dodgy). Ultimately, we could probably switch completely to this + * algorithm provided we hack it to deal with the UP case + */ +#define CPU_PID_TYPE_SPLIT 0 +#define CPU_PID_TYPE_COMBINED 1 + +/* + * This table describes all fans in the FCU. The "id" and "type" values + * are defaults valid for all earlier machines. 
Newer machines will + * eventually override the table content based on the device-tree + */ +struct fcu_fan_table +{ + char* loc; /* location code */ + int type; /* 0 = rpm, 1 = pwm, 2 = pump */ + int id; /* id or -1 */ +}; + +#define FCU_FAN_RPM 0 +#define FCU_FAN_PWM 1 + +#define FCU_FAN_ABSENT_ID -1 + +#define FCU_FAN_COUNT ARRAY_SIZE(fcu_fans) + +struct fcu_fan_table fcu_fans[] = { + [BACKSIDE_FAN_PWM_INDEX] = { + .loc = "BACKSIDE", + .type = FCU_FAN_PWM, + .id = BACKSIDE_FAN_PWM_DEFAULT_ID, + }, + [DRIVES_FAN_RPM_INDEX] = { + .loc = "DRIVE BAY", + .type = FCU_FAN_RPM, + .id = DRIVES_FAN_RPM_DEFAULT_ID, + }, + [SLOTS_FAN_PWM_INDEX] = { + .loc = "SLOT", + .type = FCU_FAN_PWM, + .id = SLOTS_FAN_PWM_DEFAULT_ID, + }, + [CPUA_INTAKE_FAN_RPM_INDEX] = { + .loc = "CPU A INTAKE", + .type = FCU_FAN_RPM, + .id = CPUA_INTAKE_FAN_RPM_DEFAULT_ID, + }, + [CPUA_EXHAUST_FAN_RPM_INDEX] = { + .loc = "CPU A EXHAUST", + .type = FCU_FAN_RPM, + .id = CPUA_EXHAUST_FAN_RPM_DEFAULT_ID, + }, + [CPUB_INTAKE_FAN_RPM_INDEX] = { + .loc = "CPU B INTAKE", + .type = FCU_FAN_RPM, + .id = CPUB_INTAKE_FAN_RPM_DEFAULT_ID, + }, + [CPUB_EXHAUST_FAN_RPM_INDEX] = { + .loc = "CPU B EXHAUST", + .type = FCU_FAN_RPM, + .id = CPUB_EXHAUST_FAN_RPM_DEFAULT_ID, + }, + /* pumps aren't present by default, have to be looked up in the + * device-tree + */ + [CPUA_PUMP_RPM_INDEX] = { + .loc = "CPU A PUMP", + .type = FCU_FAN_RPM, + .id = FCU_FAN_ABSENT_ID, + }, + [CPUB_PUMP_RPM_INDEX] = { + .loc = "CPU B PUMP", + .type = FCU_FAN_RPM, + .id = FCU_FAN_ABSENT_ID, + }, +}; + +/* * i2c_driver structure to attach to the host i2c controller */ @@ -331,10 +425,16 @@ return 0; } -static int set_rpm_fan(int fan, int rpm) +static int set_rpm_fan(int fan_index, int rpm) { unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_RPM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; if (rpm < 300) rpm = 300; @@ -342,43 +442,55 @@ rpm = 8191; buf[0] 
= rpm >> 5; buf[1] = rpm << 3; - rc = fan_write_reg(0x10 + (fan * 2), buf, 2); + rc = fan_write_reg(0x10 + (id * 2), buf, 2); if (rc < 0) return -EIO; return 0; } -static int get_rpm_fan(int fan, int programmed) +static int get_rpm_fan(int fan_index, int programmed) { unsigned char failure; unsigned char active; unsigned char buf[2]; - int rc, reg_base; + int rc, id, reg_base; + + if (fcu_fans[fan_index].type != FCU_FAN_RPM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; rc = fan_read_reg(0xb, &failure, 1); if (rc != 1) return -EIO; - if ((failure & (1 << fan)) != 0) + if ((failure & (1 << id)) != 0) return -EFAULT; rc = fan_read_reg(0xd, &active, 1); if (rc != 1) return -EIO; - if ((active & (1 << fan)) == 0) + if ((active & (1 << id)) == 0) return -ENXIO; /* Programmed value or real current speed */ reg_base = programmed ? 0x10 : 0x11; - rc = fan_read_reg(reg_base + (fan * 2), buf, 2); + rc = fan_read_reg(reg_base + (id * 2), buf, 2); if (rc != 2) return -EIO; return (buf[0] << 5) | buf[1] >> 3; } -static int set_pwm_fan(int fan, int pwm) +static int set_pwm_fan(int fan_index, int pwm) { unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_PWM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; if (pwm < 10) pwm = 10; @@ -386,32 +498,38 @@ pwm = 100; pwm = (pwm * 2559) / 1000; buf[0] = pwm; - rc = fan_write_reg(0x30 + (fan * 2), buf, 1); + rc = fan_write_reg(0x30 + (id * 2), buf, 1); if (rc < 0) return rc; return 0; } -static int get_pwm_fan(int fan) +static int get_pwm_fan(int fan_index) { unsigned char failure; unsigned char active; unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_PWM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; rc = fan_read_reg(0x2b, &failure, 1); if (rc != 1) return -EIO; - if ((failure & (1 << fan)) != 0) + if 
((failure & (1 << id)) != 0) return -EFAULT; rc = fan_read_reg(0x2d, &active, 1); if (rc != 1) return -EIO; - if ((active & (1 << fan)) == 0) + if ((active & (1 << id)) == 0) return -ENXIO; /* Programmed value or real current speed */ - rc = fan_read_reg(0x30 + (fan * 2), buf, 1); + rc = fan_read_reg(0x30 + (id * 2), buf, 1); if (rc != 1) return -EIO; @@ -513,80 +631,84 @@ /* * CPUs fans control loop */ -static void do_monitor_cpu(struct cpu_pid_state *state) + +static int do_read_one_cpu_values(struct cpu_pid_state *state, s32 *temp, s32 *power) { - s32 temp, voltage, current_a, power, power_target; - s32 integral, derivative, proportional, adj_in_target, sval; - s64 integ_p, deriv_p, prop_p, sum; - int i, intake, rc; + s32 ltemp, volts, amps; + int rc = 0; - DBG("cpu %d:\n", state->index); + /* Default (in case of error) */ + *temp = state->cur_temp; + *power = state->cur_power; /* Read current fan status */ if (state->index == 0) - rc = get_rpm_fan(CPUA_EXHAUST_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); else - rc = get_rpm_fan(CPUB_EXHAUST_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); if (rc < 0) { - printk(KERN_WARNING "Error %d reading CPU %d exhaust fan !\n", - rc, state->index); - /* XXX What do we do now ? */ - } else + /* XXX What do we do now ? Nothing for now, keep old value, but + * return error upstream + */ + DBG(" cpu %d, fan reading error !\n", state->index); + } else { state->rpm = rc; - DBG(" current rpm: %d\n", state->rpm); + DBG(" cpu %d, exhaust RPM: %d\n", state->index, state->rpm); + } /* Get some sensor readings and scale it */ - temp = read_smon_adc(state, 1); - if (temp == -1) { + ltemp = read_smon_adc(state, 1); + if (ltemp == -1) { + /* XXX What do we do now ? 
*/ state->overtemp++; - return; + if (rc == 0) + rc = -EIO; + DBG(" cpu %d, temp reading error !\n", state->index); + } else { + /* Fixup temperature according to diode calibration + */ + DBG(" cpu %d, temp raw: %04x, m_diode: %04x, b_diode: %04x\n", + state->index, + ltemp, state->mpu.mdiode, state->mpu.bdiode); + *temp = ((s32)ltemp * (s32)state->mpu.mdiode + ((s32)state->mpu.bdiode << 12)) >> 2; + state->last_temp = *temp; + DBG(" temp: %d.%03d\n", FIX32TOPRINT((*temp))); } - voltage = read_smon_adc(state, 3); - current_a = read_smon_adc(state, 4); - /* Fixup temperature according to diode calibration + /* + * Read voltage & current and calculate power */ - DBG(" temp raw: %04x, m_diode: %04x, b_diode: %04x\n", - temp, state->mpu.mdiode, state->mpu.bdiode); - temp = ((s32)temp * (s32)state->mpu.mdiode + ((s32)state->mpu.bdiode << 12)) >> 2; - state->last_temp = temp; - DBG(" temp: %d.%03d\n", FIX32TOPRINT(temp)); + volts = read_smon_adc(state, 3); + amps = read_smon_adc(state, 4); - /* Check tmax, increment overtemp if we are there. At tmax+8, we go - * full blown immediately and try to trigger a shutdown - */ - if (temp >= ((state->mpu.tmax + 8) << 16)) { - printk(KERN_WARNING "Warning ! 
CPU %d temperature way above maximum" - " (%d) !\n", - state->index, temp >> 16); - state->overtemp = CPU_MAX_OVERTEMP; - } else if (temp > (state->mpu.tmax << 16)) - state->overtemp++; - else - state->overtemp = 0; - if (state->overtemp >= CPU_MAX_OVERTEMP) - critical_state = 1; - if (state->overtemp > 0) { - state->rpm = state->mpu.rmaxn_exhaust_fan; - state->intake_rpm = intake = state->mpu.rmaxn_intake_fan; - goto do_set_fans; - } - - /* Scale other sensor values according to fixed scales + /* Scale voltage and current raw sensor values according to fixed scales * obtained in Darwin and calculate power from I and V */ - state->voltage = voltage *= ADC_CPU_VOLTAGE_SCALE; - state->current_a = current_a *= ADC_CPU_CURRENT_SCALE; - power = (((u64)current_a) * ((u64)voltage)) >> 16; + volts *= ADC_CPU_VOLTAGE_SCALE; + amps *= ADC_CPU_CURRENT_SCALE; + *power = (((u64)volts) * ((u64)amps)) >> 16; + state->voltage = volts; + state->current_a = amps; + state->last_power = *power; + + DBG(" cpu %d, current: %d.%03d, voltage: %d.%03d, power: %d.%03d W\n", + state->index, FIX32TOPRINT(state->current_a), + FIX32TOPRINT(state->voltage), FIX32TOPRINT(*power)); + + return 0; +} + +static void do_cpu_pid(struct cpu_pid_state *state, s32 temp, s32 power) +{ + s32 power_target, integral, derivative, proportional, adj_in_target, sval; + s64 integ_p, deriv_p, prop_p, sum; + int i; /* Calculate power target value (could be done once for all) * and convert to a 16.16 fp number */ power_target = ((u32)(state->mpu.pmaxh - state->mpu.padjmax)) << 16; - - DBG(" current: %d.%03d, voltage: %d.%03d\n", - FIX32TOPRINT(current_a), FIX32TOPRINT(voltage)); - DBG(" power: %d.%03d W, target: %d.%03d, error: %d.%03d\n", FIX32TOPRINT(power), + DBG(" power target: %d.%03d, error: %d.%03d\n", FIX32TOPRINT(power_target), FIX32TOPRINT(power_target - power)); /* Store temperature and power in history array */ @@ -626,7 +748,7 @@ * input target is mpu.ttarget, input max is mpu.tmax */ integ_p = 
((s64)state->mpu.pid_gr) * (s64)integral; - DBG(" integ_p: %d\n", (int)(deriv_p >> 36)); + DBG(" integ_p: %d\n", (int)(integ_p >> 36)); sval = (state->mpu.tmax << 16) - ((integ_p >> 20) & 0xffffffff); adj_in_target = (state->mpu.ttarget << 16); if (adj_in_target > sval) @@ -659,6 +781,127 @@ state->rpm = state->mpu.rminn_exhaust_fan; if (state->rpm > state->mpu.rmaxn_exhaust_fan) state->rpm = state->mpu.rmaxn_exhaust_fan; +} + +static void do_monitor_cpu_combined(void) +{ + struct cpu_pid_state *state0 = &cpu_state[0]; + struct cpu_pid_state *state1 = &cpu_state[1]; + s32 temp0, power0, temp1, power1; + s32 temp_combi, power_combi; + int rc, intake, pump; + + rc = do_read_one_cpu_values(state0, &temp0, &power0); + if (rc < 0) { + /* XXX What do we do now ? */ + } + state1->overtemp = 0; + rc = do_read_one_cpu_values(state1, &temp1, &power1); + if (rc < 0) { + /* XXX What do we do now ? */ + } + if (state1->overtemp) + state0->overtemp++; + + temp_combi = max(temp0, temp1); + power_combi = max(power0, power1); + + /* Check tmax, increment overtemp if we are there. At tmax+8, we go + * full blown immediately and try to trigger a shutdown + */ + if (temp_combi >= ((state0->mpu.tmax + 8) << 16)) { + printk(KERN_WARNING "Warning ! 
Temperature way above maximum (%d) !\n", + temp_combi >> 16); + state0->overtemp = CPU_MAX_OVERTEMP; + } else if (temp_combi > (state0->mpu.tmax << 16)) + state0->overtemp++; + else + state0->overtemp = 0; + if (state0->overtemp >= CPU_MAX_OVERTEMP) + critical_state = 1; + if (state0->overtemp > 0) { + state0->rpm = state0->mpu.rmaxn_exhaust_fan; + state0->intake_rpm = intake = state0->mpu.rmaxn_intake_fan; + pump = CPU_PUMP_OUTPUT_MAX; + goto do_set_fans; + } + + /* Do the PID */ + do_cpu_pid(state0, temp_combi, power_combi); + + /* Calculate intake fan speed */ + intake = (state0->rpm * CPU_INTAKE_SCALE) >> 16; + if (intake < state0->mpu.rminn_intake_fan) + intake = state0->mpu.rminn_intake_fan; + if (intake > state0->mpu.rmaxn_intake_fan) + intake = state0->mpu.rmaxn_intake_fan; + state0->intake_rpm = intake; + + /* Calculate pump speed */ + pump = (state0->rpm * CPU_PUMP_OUTPUT_MAX) / + state0->mpu.rmaxn_exhaust_fan; + if (pump > CPU_PUMP_OUTPUT_MAX) + pump = CPU_PUMP_OUTPUT_MAX; + if (pump < CPU_PUMP_OUTPUT_MIN) + pump = CPU_PUMP_OUTPUT_MIN; + + do_set_fans: + /* We copy values from state 0 to state 1 for /sysfs */ + state1->rpm = state0->rpm; + state1->intake_rpm = state0->intake_rpm; + + DBG("** CPU %d RPM: %d Ex, %d, Pump: %d, In, overtemp: %d\n", + state1->index, (int)state1->rpm, intake, pump, state1->overtemp); + + /* We should check for errors, shouldn't we ? But then, what + * do we do once the error occurs ? For FCU notified fan + * failures (-EFAULT) we probably want to notify userland + * some way... 
+ */ + set_rpm_fan(CPUA_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, state0->rpm); + set_rpm_fan(CPUB_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, state0->rpm); + + if (fcu_fans[CPUA_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) + set_rpm_fan(CPUA_PUMP_RPM_INDEX, pump); + if (fcu_fans[CPUB_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) + set_rpm_fan(CPUB_PUMP_RPM_INDEX, pump); +} + +static void do_monitor_cpu_split(struct cpu_pid_state *state) +{ + s32 temp, power; + int rc, intake; + + /* Read current fan status */ + rc = do_read_one_cpu_values(state, &temp, &power); + if (rc < 0) { + /* XXX What do we do now ? */ + } + + /* Check tmax, increment overtemp if we are there. At tmax+8, we go + * full blown immediately and try to trigger a shutdown + */ + if (temp >= ((state->mpu.tmax + 8) << 16)) { + printk(KERN_WARNING "Warning ! CPU %d temperature way above maximum" + " (%d) !\n", + state->index, temp >> 16); + state->overtemp = CPU_MAX_OVERTEMP; + } else if (temp > (state->mpu.tmax << 16)) + state->overtemp++; + else + state->overtemp = 0; + if (state->overtemp >= CPU_MAX_OVERTEMP) + critical_state = 1; + if (state->overtemp > 0) { + state->rpm = state->mpu.rmaxn_exhaust_fan; + state->intake_rpm = intake = state->mpu.rmaxn_intake_fan; + goto do_set_fans; + } + + /* Do the PID */ + do_cpu_pid(state, temp, power); intake = (state->rpm * CPU_INTAKE_SCALE) >> 16; if (intake < state->mpu.rminn_intake_fan) @@ -677,11 +920,11 @@ * some way... 
*/ if (state->index == 0) { - set_rpm_fan(CPUA_INTAKE_FAN_RPM_ID, intake); - set_rpm_fan(CPUA_EXHAUST_FAN_RPM_ID, state->rpm); + set_rpm_fan(CPUA_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, state->rpm); } else { - set_rpm_fan(CPUB_INTAKE_FAN_RPM_ID, intake); - set_rpm_fan(CPUB_EXHAUST_FAN_RPM_ID, state->rpm); + set_rpm_fan(CPUB_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, state->rpm); } } @@ -696,6 +939,7 @@ state->overtemp = 0; state->adc_config = 0x00; + if (index == 0) state->monitor = attach_i2c_chip(SUPPLY_MONITOR_ID, "CPU0_monitor"); else if (index == 1) @@ -778,7 +1022,7 @@ DBG("backside:\n"); /* Check fan status */ - rc = get_pwm_fan(BACKSIDE_FAN_PWM_ID); + rc = get_pwm_fan(BACKSIDE_FAN_PWM_INDEX); if (rc < 0) { printk(KERN_WARNING "Error %d reading backside fan !\n", rc); /* XXX What do we do now ? */ @@ -790,12 +1034,12 @@ temp = i2c_smbus_read_byte_data(state->monitor, MAX6690_EXT_TEMP) << 16; state->last_temp = temp; DBG(" temp: %d.%03d, target: %d.%03d\n", FIX32TOPRINT(temp), - FIX32TOPRINT(BACKSIDE_PID_INPUT_TARGET)); + FIX32TOPRINT(backside_params.input_target)); /* Store temperature and error in history array */ state->cur_sample = (state->cur_sample + 1) % BACKSIDE_PID_HISTORY_SIZE; state->sample_history[state->cur_sample] = temp; - state->error_history[state->cur_sample] = temp - BACKSIDE_PID_INPUT_TARGET; + state->error_history[state->cur_sample] = temp - backside_params.input_target; /* If first loop, fill the history table */ if (state->first) { @@ -804,7 +1048,7 @@ BACKSIDE_PID_HISTORY_SIZE; state->sample_history[state->cur_sample] = temp; state->error_history[state->cur_sample] = - temp - BACKSIDE_PID_INPUT_TARGET; + temp - backside_params.input_target; } state->first = 0; } @@ -816,7 +1060,7 @@ integral += state->error_history[i]; integral *= BACKSIDE_PID_INTERVAL; DBG(" integral: %08x\n", integral); - integ_p = ((s64)BACKSIDE_PID_G_r) * (s64)integral; + integ_p = 
((s64)backside_params.G_r) * (s64)integral; DBG(" integ_p: %d\n", (int)(integ_p >> 36)); sum += integ_p; @@ -825,12 +1069,12 @@ state->error_history[(state->cur_sample + BACKSIDE_PID_HISTORY_SIZE - 1) % BACKSIDE_PID_HISTORY_SIZE]; derivative /= BACKSIDE_PID_INTERVAL; - deriv_p = ((s64)BACKSIDE_PID_G_d) * (s64)derivative; + deriv_p = ((s64)backside_params.G_d) * (s64)derivative; DBG(" deriv_p: %d\n", (int)(deriv_p >> 36)); sum += deriv_p; /* Calculate the proportional term */ - prop_p = ((s64)BACKSIDE_PID_G_p) * (s64)(state->error_history[state->cur_sample]); + prop_p = ((s64)backside_params.G_p) * (s64)(state->error_history[state->cur_sample]); DBG(" prop_p: %d\n", (int)(prop_p >> 36)); sum += prop_p; @@ -839,13 +1083,13 @@ DBG(" sum: %d\n", (int)sum); state->pwm += (s32)sum; - if (state->pwm < BACKSIDE_PID_OUTPUT_MIN) - state->pwm = BACKSIDE_PID_OUTPUT_MIN; - if (state->pwm > BACKSIDE_PID_OUTPUT_MAX) - state->pwm = BACKSIDE_PID_OUTPUT_MAX; + if (state->pwm < backside_params.output_min) + state->pwm = backside_params.output_min; + if (state->pwm > backside_params.output_max) + state->pwm = backside_params.output_max; DBG("** BACKSIDE PWM: %d\n", (int)state->pwm); - set_pwm_fan(BACKSIDE_FAN_PWM_ID, state->pwm); + set_pwm_fan(BACKSIDE_FAN_PWM_INDEX, state->pwm); } /* @@ -853,6 +1097,35 @@ */ static int init_backside_state(struct backside_pid_state *state) { + struct device_node *u3; + int u3h = 1; /* conservative by default */ + + /* + * There are different PID params for machines with U3 and machines + * with U3H, pick the right ones now + */ + u3 = of_find_node_by_path("/u3@0,f8000000"); + if (u3 != NULL) { + u32 *vers = (u32 *)get_property(u3, "device-rev", NULL); + if (vers) + if (((*vers) & 0x3f) < 0x34) + u3h = 0; + of_node_put(u3); + } + + backside_params.G_p = BACKSIDE_PID_G_p; + backside_params.G_r = BACKSIDE_PID_G_r; + backside_params.output_max = BACKSIDE_PID_OUTPUT_MAX; + if (u3h) { + backside_params.G_d = BACKSIDE_PID_U3H_G_d; + 
backside_params.input_target = BACKSIDE_PID_U3H_INPUT_TARGET; + backside_params.output_min = BACKSIDE_PID_U3H_OUTPUT_MIN; + } else { + backside_params.G_d = BACKSIDE_PID_U3_G_d; + backside_params.input_target = BACKSIDE_PID_U3_INPUT_TARGET; + backside_params.output_min = BACKSIDE_PID_U3_OUTPUT_MIN; + } + state->ticks = 1; state->first = 1; state->pwm = 50; @@ -898,7 +1171,7 @@ DBG("drives:\n"); /* Check fan status */ - rc = get_rpm_fan(DRIVES_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(DRIVES_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); if (rc < 0) { printk(KERN_WARNING "Error %d reading drives fan !\n", rc); /* XXX What do we do now ? */ @@ -965,7 +1238,7 @@ state->rpm = DRIVES_PID_OUTPUT_MAX; DBG("** DRIVES RPM: %d\n", (int)state->rpm); - set_rpm_fan(DRIVES_FAN_RPM_ID, state->rpm); + set_rpm_fan(DRIVES_FAN_RPM_INDEX, state->rpm); } /* @@ -1032,7 +1305,7 @@ } /* Set the PCI fan once for now */ - set_pwm_fan(SLOTS_FAN_PWM_ID, SLOTS_FAN_DEFAULT_PWM); + set_pwm_fan(SLOTS_FAN_PWM_INDEX, SLOTS_FAN_DEFAULT_PWM); /* Initialize ADCs */ initialize_adc(&cpu_state[0]); @@ -1047,9 +1320,13 @@ start = jiffies; down(&driver_lock); - do_monitor_cpu(&cpu_state[0]); - if (cpu_state[1].monitor != NULL) - do_monitor_cpu(&cpu_state[1]); + if (cpu_pid_type == CPU_PID_TYPE_COMBINED) + do_monitor_cpu_combined(); + else { + do_monitor_cpu_split(&cpu_state[0]); + if (cpu_state[1].monitor != NULL) + do_monitor_cpu_split(&cpu_state[1]); + } do_monitor_backside(&backside_state); do_monitor_drives(&drives_state); up(&driver_lock); @@ -1113,6 +1390,19 @@ DBG("counted %d CPUs in the device-tree\n", cpu_count); + /* Decide the type of PID algorithm to use based on the presence of + * the pumps, though that may not be the best way, that is good enough + * for now + */ + if (machine_is_compatible("PowerMac7,3") + && (cpu_count > 1) + && fcu_fans[CPUA_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID + && fcu_fans[CPUB_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) { + printk(KERN_INFO "Liquid cooling 
pumps detected, using new algorithm !\n"); + cpu_pid_type = CPU_PID_TYPE_COMBINED; + } else + cpu_pid_type = CPU_PID_TYPE_SPLIT; + /* Create control loops for everything. If any fail, everything * fails */ @@ -1257,12 +1547,91 @@ return 0; } +static void fcu_lookup_fans(struct device_node *fcu_node) +{ + struct device_node *np = NULL; + int i; + + /* The table is filled by default with values that are suitable + * for the old machines without device-tree informations. We scan + * the device-tree and override those values with whatever is + * there + */ + + DBG("Looking up FCU controls in device-tree...\n"); + + while ((np = of_get_next_child(fcu_node, np)) != NULL) { + int type = -1; + char *loc; + u32 *reg; + + DBG(" control: %s, type: %s\n", np->name, np->type); + + /* Detect control type */ + if (!strcmp(np->type, "fan-rpm-control") || + !strcmp(np->type, "fan-rpm")) + type = FCU_FAN_RPM; + if (!strcmp(np->type, "fan-pwm-control") || + !strcmp(np->type, "fan-pwm")) + type = FCU_FAN_PWM; + /* Only care about fans for now */ + if (type == -1) + continue; + + /* Lookup for a matching location */ + loc = (char *)get_property(np, "location", NULL); + reg = (u32 *)get_property(np, "reg", NULL); + if (loc == NULL || reg == NULL) + continue; + DBG(" matching location: %s, reg: 0x%08x\n", loc, *reg); + + for (i = 0; i < FCU_FAN_COUNT; i++) { + int fan_id; + + if (strcmp(loc, fcu_fans[i].loc)) + continue; + DBG(" location match, index: %d\n", i); + fcu_fans[i].id = FCU_FAN_ABSENT_ID; + if (type != fcu_fans[i].type) { + printk(KERN_WARNING "therm_pm72: Fan type mismatch " + "in device-tree for %s\n", np->full_name); + break; + } + if (type == FCU_FAN_RPM) + fan_id = ((*reg) - 0x10) / 2; + else + fan_id = ((*reg) - 0x30) / 2; + if (fan_id > 7) { + printk(KERN_WARNING "therm_pm72: Can't parse " + "fan ID in device-tree for %s\n", np->full_name); + break; + } + DBG(" fan id -> %d, type -> %d\n", fan_id, type); + fcu_fans[i].id = fan_id; + } + } + + /* Now dump the array */ + 
printk(KERN_INFO "Detected fan controls:\n"); + for (i = 0; i < FCU_FAN_COUNT; i++) { + if (fcu_fans[i].id == FCU_FAN_ABSENT_ID) + continue; + printk(KERN_INFO " %d: %s fan, id %d, location: %s\n", i, + fcu_fans[i].type == FCU_FAN_RPM ? "RPM" : "PWM", + fcu_fans[i].id, fcu_fans[i].loc); + } +} + static int fcu_of_probe(struct of_device* dev, const struct of_match *match) { int rc; state = state_detached; + /* Lookup the fans in the device tree */ + fcu_lookup_fans(dev->node); + + /* Add the driver */ rc = i2c_add_driver(&therm_pm72_driver); if (rc < 0) return rc; @@ -1301,15 +1670,20 @@ { struct device_node *np; - if (!machine_is_compatible("PowerMac7,2")) + if (!machine_is_compatible("PowerMac7,2") && + !machine_is_compatible("PowerMac7,3")) return -ENODEV; printk(KERN_INFO "PowerMac G5 Thermal control driver %s\n", VERSION); np = of_find_node_by_type(NULL, "fcu"); if (np == NULL) { - printk(KERN_ERR "Can't find FCU in device-tree !\n"); - return -ENODEV; + /* Some machines have strangely broken device-tree */ + np = of_find_node_by_path("/u3@0,f8000000/i2c@f8001000/fan@15e"); + if (np == NULL) { + printk(KERN_ERR "Can't find FCU in device-tree !\n"); + return -ENODEV; + } } of_dev = of_platform_device_create(np, "temperature"); if (of_dev == NULL) { diff -urN linux-2.5/drivers/macintosh/therm_pm72.h linux-pogo/drivers/macintosh/therm_pm72.h --- linux-2.5/drivers/macintosh/therm_pm72.h 2004-09-24 14:34:05.000000000 +1000 +++ linux-pogo/drivers/macintosh/therm_pm72.h 2004-10-15 18:58:22.000000000 +1000 @@ -119,18 +119,33 @@ #define ADC_CPU_CURRENT_SCALE 0x1f40 /* _AD4 */ /* - * PID factors for the U3/Backside fan control loop + * PID factors for the U3/Backside fan control loop. 
We have 2 sets + * of values here, one set for U3 and one set for U3H */ -#define BACKSIDE_FAN_PWM_ID 1 -#define BACKSIDE_PID_G_d 0x02800000 +#define BACKSIDE_FAN_PWM_DEFAULT_ID 1 +#define BACKSIDE_FAN_PWM_INDEX 0 +#define BACKSIDE_PID_U3_G_d 0x02800000 +#define BACKSIDE_PID_U3H_G_d 0x01400000 #define BACKSIDE_PID_G_p 0x00500000 #define BACKSIDE_PID_G_r 0x00000000 -#define BACKSIDE_PID_INPUT_TARGET 0x00410000 +#define BACKSIDE_PID_U3_INPUT_TARGET 0x00410000 +#define BACKSIDE_PID_U3H_INPUT_TARGET 0x004b0000 #define BACKSIDE_PID_INTERVAL 5 #define BACKSIDE_PID_OUTPUT_MAX 100 -#define BACKSIDE_PID_OUTPUT_MIN 20 +#define BACKSIDE_PID_U3_OUTPUT_MIN 20 +#define BACKSIDE_PID_U3H_OUTPUT_MIN 30 #define BACKSIDE_PID_HISTORY_SIZE 2 +struct basckside_pid_params +{ + u32 G_d; + u32 G_p; + u32 G_r; + u32 input_target; + u32 output_min; + u32 output_max; +}; + struct backside_pid_state { int ticks; @@ -146,7 +161,8 @@ /* * PID factors for the Drive Bay fan control loop */ -#define DRIVES_FAN_RPM_ID 2 +#define DRIVES_FAN_RPM_DEFAULT_ID 2 +#define DRIVES_FAN_RPM_INDEX 1 #define DRIVES_PID_G_d 0x01e00000 #define DRIVES_PID_G_p 0x00500000 #define DRIVES_PID_G_r 0x00000000 @@ -168,7 +184,8 @@ int first; }; -#define SLOTS_FAN_PWM_ID 2 +#define SLOTS_FAN_PWM_DEFAULT_ID 2 +#define SLOTS_FAN_PWM_INDEX 2 #define SLOTS_FAN_DEFAULT_PWM 50 /* Do better here ! 
*/ /* @@ -191,10 +208,15 @@ * CPU B FAKE POWER 49 (I_V_inputs: 18, 19) */ -#define CPUA_INTAKE_FAN_RPM_ID 3 -#define CPUA_EXHAUST_FAN_RPM_ID 4 -#define CPUB_INTAKE_FAN_RPM_ID 5 -#define CPUB_EXHAUST_FAN_RPM_ID 6 +#define CPUA_INTAKE_FAN_RPM_DEFAULT_ID 3 +#define CPUA_EXHAUST_FAN_RPM_DEFAULT_ID 4 +#define CPUB_INTAKE_FAN_RPM_DEFAULT_ID 5 +#define CPUB_EXHAUST_FAN_RPM_DEFAULT_ID 6 + +#define CPUA_INTAKE_FAN_RPM_INDEX 3 +#define CPUA_EXHAUST_FAN_RPM_INDEX 4 +#define CPUB_INTAKE_FAN_RPM_INDEX 5 +#define CPUB_EXHAUST_FAN_RPM_INDEX 6 #define CPU_INTAKE_SCALE 0x0000f852 #define CPU_TEMP_HISTORY_SIZE 2 @@ -202,6 +224,11 @@ #define CPU_PID_INTERVAL 1 #define CPU_MAX_OVERTEMP 30 +#define CPUA_PUMP_RPM_INDEX 7 +#define CPUB_PUMP_RPM_INDEX 8 +#define CPU_PUMP_OUTPUT_MAX 3700 +#define CPU_PUMP_OUTPUT_MIN 1000 + struct cpu_pid_state { int index; @@ -219,6 +246,7 @@ s32 voltage; s32 current_a; s32 last_temp; + s32 last_power; int first; u8 adc_config; }; From hpa at zytor.com Sat Oct 16 14:24:05 2004 From: hpa at zytor.com (H. Peter Anvin) Date: Fri, 15 Oct 2004 21:24:05 -0700 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <1097893432.6546.37.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> Message-ID: <4170A265.6030402@zytor.com> Benjamin Herrenschmidt wrote: > Ok, here's a new patch that fixes a few issues, it's been > tested on a non-liquid cooled system and appear to work ok. I'm testing it out right now. It is definitely suffering from some degree of oscillation when idling, and it seems to be one particular (set of) fan(s) that is having that problem. Note that one cause of oscillation at low speed is that there is a minimum speed below which the fans will simply stop. This may be what is happening here. Some time later I'll try to figure out which numbers to collect and try to generate a graph over time. 
It's definitely passing the stress test, though; make -j4 on the whole kernel (with the .config posted earlier) took 4:27.19. The other stress test -- which used to kill the old thermal driver dead in a matter of seconds -- is to start a bunch of "cat /dev/zero > /dev/null" is happily running, and nice and quiet. The oscillation is obnoxious, though. -hpa From nathanl at austin.ibm.com Sat Oct 16 14:37:04 2004 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Fri, 15 Oct 2004 23:37:04 -0500 Subject: cpu hotplug broken in 2.6.9-rc4 Message-ID: <1097901423.3226.42.camel@biclops> (Urgh, sent this to the wrong list initially, sorry. Second try...) Hi- Seems that cpu hotplug got broken when benh's monster cleanup patch went into bk (in 2.6.9-rc2-bk10). System boots fine, but if I take down a cpu and then try to bring it back up, I get: # echo 1 > /sys/devices/system/cpu/cpu1/online Bad kernel stack pointer 7d23080 at 6373c0 cpu 0x1: Vector: c000000002ff4d80 at [c0000000077c7d40] pc: 00000000006373c0 lr: 00000000006373c0 sp: 7d23080 msr: 1002 current = 0xc00000000779a8a0 paca = 0xc000000000493d00 pid = 0, comm = swapper enter ? 
for help 1:mon> t SP (7d23080) is in userspace 1:mon> r R00 = 0000000000000000 R16 = 0000000000000000 R01 = 0000000007d23080 R17 = 0000000000000000 R02 = 0000000007ad4b68 R18 = 0000000000000000 R03 = 0000000000000001 R19 = 0000000000000000 R04 = 00000000006373c0 R20 = c000000000493880 R05 = 0000000000000001 R21 = 00016bb01a585f1a R06 = 0000000000000020 R22 = c000000000493d00 R07 = fffffffd00000000 R23 = c000000002565008 R08 = 00000000000d6000 R24 = 0000000000000001 R09 = c00000000737ef80 R25 = 0000000000000000 R10 = c00000000737ed40 R26 = 0000000000000008 R11 = c00000000737eb40 R27 = 0000000000000010 R12 = 0000000000000001 R28 = c0000000077a4000 R13 = c000000000493d00 R29 = 0000000007889698 R14 = 0000000000000000 R30 = c0000000077a4010 R15 = 0000000007ab0420 R31 = 0000000007d23080 pc = 00000000006373c0 lr = 00000000006373c0 msr = 0000000000001002 cr = 22000024 ctr = 800000000010dd60 xer = 0000000000000001 trap = c000000002ff4d80 For what it's worth, the least significant half of pc (00000000006373c0) matches the address of pseries_secondary_smp_init in the System.map: c0000000006373c0 D pseries_secondary_smp_init If I revert the monster patch from the 2.6.9-rc2-bk10 snapshot things work fine. I haven't been able to figure out yet how the stack pointer gets a bad value. Nathan From benh at kernel.crashing.org Sat Oct 16 14:50:07 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 14:50:07 +1000 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <4170A265.6030402@zytor.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> Message-ID: <1097902206.8965.2.camel@gaston> On Sat, 2004-10-16 at 14:24, H. Peter Anvin wrote: > Benjamin Herrenschmidt wrote: > > Ok, here's a new patch that fixes a few issues, it's been > > tested on a non-liquid cooled system and appear to work ok. 
> > I'm testing it out right now. It is definitely suffering from some > degree of oscillation when idling, and it seems to be one particular > (set of) fan(s) that is having that problem. Which ones ? The CPU fans ? > Note that one cause of oscillation at low speed is that there is a > minimum speed below which the fans will simply stop. This may be what > is happening here. Do the fan actually stop ? Yes we "floor" the fan speeds and indeed, Apple algorithm is known to slowly oscillate, on my box it's between 300 and 1000 RPM for the CPU fans over a period of a minute or 2. Such an oscillation is expected. Something worse would mean we get something wrong. Did you compare against OS X ? > Some time later I'll try to figure out which numbers to collect and try > to generate a graph over time. > > It's definitely passing the stress test, though; make -j4 on the whole > kernel (with the .config posted earlier) took 4:27.19. The other stress > test -- which used to kill the old thermal driver dead in a matter of > seconds -- is to start a bunch of "cat /dev/zero > /dev/null" is happily > running, and nice and quiet. > > The oscillation is obnoxious, though. Hehe... Ben. From hpa at zytor.com Sat Oct 16 14:58:07 2004 From: hpa at zytor.com (H. Peter Anvin) Date: Fri, 15 Oct 2004 21:58:07 -0700 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <1097902206.8965.2.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> Message-ID: <4170AA5F.6060107@zytor.com> Benjamin Herrenschmidt wrote: > On Sat, 2004-10-16 at 14:24, H. Peter Anvin wrote: > >>Benjamin Herrenschmidt wrote: >> >>>Ok, here's a new patch that fixes a few issues, it's been >>>tested on a non-liquid cooled system and appear to work ok. >> >>I'm testing it out right now. 
It is definitely suffering from some >>degree of oscillation when idling, and it seems to be one particular >>(set of) fan(s) that is having that problem. > > Which ones ? The CPU fans ? I don't know how to tell; it's a significant sound. Let me see if I can figure it out. >>Note that one cause of oscillation at low speed is that there is a >>minimum speed below which the fans will simply stop. This may be what >>is happening here. > > Do the fan actually stop ? Yes we "floor" the fan speeds and indeed, > Apple algorithm is known to slowly oscillate, on my box it's between > 300 and 1000 RPM for the CPU fans over a period of a minute or 2. > > Such an oscillation is expected. Something worse would mean we get > something wrong. Did you compare against OS X ? OS X doesn't sound like this. The oscillation period for what it's worth is 10 seconds. -hpa From benh at kernel.crashing.org Sat Oct 16 14:55:45 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 14:55:45 +1000 Subject: [PATCH] ppc64: Fix a typo in the code that reserves memory at boot Message-ID: <1097902544.8963.5.camel@gaston> Hi ! The code that marks memory regions as "reserved" early during boot has a typo (doing incorrect rounding of the top address) which can cause some areas to not be properly reserved. That may explain some cases of initrd corruption reported recently. 
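The rounding problem described above can be illustrated outside the kernel. This is only a sketch: ALIGN_DOWN, ALIGN_UP and PAGE_SZ are stand-ins written for this example (assuming a 4 KiB page and power-of-two alignment), not the kernel's own _ALIGN_DOWN/_ALIGN_UP definitions, and the two helper functions are hypothetical names.

```c
/* Illustrative stand-ins; the size argument must be a power of two. */
#define ALIGN_DOWN(addr, size) ((addr) & ~((size) - 1UL))
#define ALIGN_UP(addr, size)   ALIGN_DOWN((addr) + (size) - 1UL, (size))

#define PAGE_SZ 4096UL /* assumed page size for this example */

/* Bytes reserved for [base, top) when the top is rounded DOWN (old code). */
unsigned long reserved_size_old(unsigned long base, unsigned long top)
{
	return ALIGN_DOWN(top, PAGE_SZ) - ALIGN_DOWN(base, PAGE_SZ);
}

/* Bytes reserved for [base, top) when the top is rounded UP (patched code). */
unsigned long reserved_size_fixed(unsigned long base, unsigned long top)
{
	return ALIGN_UP(top, PAGE_SZ) - ALIGN_DOWN(base, PAGE_SZ);
}
```

For a region from 0x1000800 to 0x1003400, the old rounding covers 0x3000 bytes and leaves the partial page starting at 0x1003000 unreserved, while rounding the top up covers 0x4000 bytes. When the top is already page aligned, both versions give the same size.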
Signed-off-by: Benjamin Herrenschmidt ===== arch/ppc64/kernel/prom_init.c 1.2 vs edited ===== --- 1.2/arch/ppc64/kernel/prom_init.c 2004-09-27 19:12:49 +10:00 +++ edited/arch/ppc64/kernel/prom_init.c 2004-10-16 14:53:28 +10:00 @@ -595,7 +595,7 @@ * dumb and just copy this entire array to the boot params */ base = _ALIGN_DOWN(base, PAGE_SIZE); - top = _ALIGN_DOWN(top, PAGE_SIZE); + top = _ALIGN_UP(top, PAGE_SIZE); size = top - base; if (cnt >= (MEM_RESERVE_MAP_SIZE - 1)) From benh at kernel.crashing.org Sat Oct 16 14:58:14 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 14:58:14 +1000 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <4170AA5F.6060107@zytor.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> Message-ID: <1097902694.8965.8.camel@gaston> On Sat, 2004-10-16 at 14:58, H. Peter Anvin wrote: > I don't know how to tell; it's a significant sound. Let me see if I can > figure it out. If the dual 2.5Ghz is like the old dual 2Ghz, you can run perfectly well with the case open, as long as you keep the plexiglass in place, which drives the air flow, and you'll be able to see the CPU and slots fans. You can also read the speed values from /sys/devices/temperature > OS X doesn't sound like this. The oscillation period for what it's > worth is 10 seconds. Ok, there must be something wrong then... Ben. From hpa at zytor.com Sat Oct 16 15:03:26 2004 From: hpa at zytor.com (H. 
Peter Anvin) Date: Fri, 15 Oct 2004 22:03:26 -0700 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <1097902694.8965.8.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> Message-ID: <4170AB9E.5010006@zytor.com> Benjamin Herrenschmidt wrote: > On Sat, 2004-10-16 at 14:58, H. Peter Anvin wrote: > > >>I don't know how to tell; it's a significant sound. Let me see if I can >>figure it out. > > > If the dual 2.5Ghz is like the old dual 2Ghz, you can run perfectly well > with the case open, as long as you keep the plexiglass in place, which > drives the air flow, and you'll be able to see the CPU and slots fans. > > You can also read the speed values from /sys/devices/temperature > That's what I'm about to do. Hang on. -hpa From benh at kernel.crashing.org Sat Oct 16 15:02:53 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 15:02:53 +1000 Subject: [PATCH] ppc64: Fix a typo in the code that reserves memory at boot In-Reply-To: <1097902544.8963.5.camel@gaston> References: <1097902544.8963.5.camel@gaston> Message-ID: <1097902973.9026.10.camel@gaston> On Sat, 2004-10-16 at 14:55, Benjamin Herrenschmidt wrote: > Hi ! > > The code that marks memory regions as "reserved" early during boot > has a typo (doing incorrect rounding of the top address) which can > cause some areas to not be properly reserved. That may explain some > cases of initrd corruption reported recently. > > Signed-off-by: Benjamin Herrenschmidt Ok, ignore it and take Anton's one instead. Ben. From hpa at zytor.com Sat Oct 16 15:32:07 2004 From: hpa at zytor.com (H. 
Peter Anvin) Date: Fri, 15 Oct 2004 22:32:07 -0700 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <1097902694.8965.8.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> Message-ID: <4170B257.1010602@zytor.com> Benjamin Herrenschmidt wrote: > On Sat, 2004-10-16 at 14:58, H. Peter Anvin wrote: > > >>I don't know how to tell; it's a significant sound. Let me see if I can >>figure it out. > > > If the dual 2.5Ghz is like the old dual 2Ghz, you can run perfectly well > with the case open, as long as you keep the plexiglass in place, which > drives the air flow, and you'll be able to see the CPU and slots fans. > > You can also read the speed values from /sys/devices/temperature > > >>OS X doesn't sound like this. The oscillation period for what it's >>worth is 10 seconds. > > > Ok, there must be something wrong then... > It's the backside fan that oscillates; backside_fan_pwm varies between 30 and 100 in what is pretty much a squarewave. See attached graph (and note how the other fans vary with workload.) I probably need to write a "power virus" program for the G5 to really test out the high end (a power virus is a program which keeps the chip running as hard as it can; generally keep all pipelines stuffed.) -hpa -------------- next part -------------- A non-text attachment was scrubbed... 
Name: temps.pdf Type: application/pdf Size: 11823 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041015/4f39c9d7/attachment.pdf From benh at kernel.crashing.org Sat Oct 16 15:33:04 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 15:33:04 +1000 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <4170B257.1010602@zytor.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> <4170B257.1010602@zytor.com> Message-ID: <1097904783.8961.23.camel@gaston> On Sat, 2004-10-16 at 15:32, H. Peter Anvin wrote: > It's the backside fan that oscillates; backside_fan_pwm varies between > 30 and 100 in what is pretty much a squarewave. See attached graph (and > note how the other fans vary with workload.) > > I probably need to write a "power virus" program for the G5 to really > test out the high end (a power virus is a program which keeps the chip > running as hard as it can; generally keep all pipelines stuffed.) think about also banging FPU and Altivec units then :) Since it's low oscillation point is 30, I suppose it properly detects U3H (can you verify that in the code, adding a printk for example in init_backside_state()). 
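A minimal sketch of the U3/U3H decision that such a printk would report, written as standalone C rather than driver code. It mirrors the device-rev test from init_backside_state() in the #3 patch (revisions whose low six bits are below 0x34 are treated as plain U3); the function names is_u3h() and backside_output_min() are made up for this example, and the constants 20 and 30 are the BACKSIDE_PID_U3_OUTPUT_MIN and BACKSIDE_PID_U3H_OUTPUT_MIN values from therm_pm72.h.

```c
#include <stdint.h>

/*
 * Returns 1 when the "device-rev" value indicates a U3H bridge, 0 for a
 * plain U3. Same comparison as the patch: (rev & 0x3f) < 0x34 means U3.
 */
int is_u3h(uint32_t device_rev)
{
	return (device_rev & 0x3f) >= 0x34;
}

/* Minimum backside PWM the driver would pick for a given device-rev. */
int backside_output_min(uint32_t device_rev)
{
	return is_u3h(device_rev) ? 30 : 20;
}
```

A low oscillation point of 30 is therefore consistent with the U3H parameter set having been selected, which is the inference made above. In the driver itself the check would be a printk of the u3h flag right after the device-tree lookup.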
I'll double check the values used for the PID in darwin Ben From benh at kernel.crashing.org Sat Oct 16 15:43:19 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 16 Oct 2004 15:43:19 +1000 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <4170B257.1010602@zytor.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> <4170B257.1010602@zytor.com> Message-ID: <1097905399.8963.26.camel@gaston> On Sat, 2004-10-16 at 15:32, H. Peter Anvin wrote: > It's the backside fan that oscillates; backside_fan_pwm varies between > 30 and 100 in what is pretty much a squarewave. See attached graph (and > note how the other fans vary with workload.) > > I probably need to write a "power virus" program for the G5 to really > test out the high end (a power virus is a program which keeps the chip > running as hard as it can; generally keep all pipelines stuffed.) Strange... The values used seem to be identical to OS X (a 75° target which is high actually, and a different G_d value than old U3). I would need to see the debug output and compare with the OS X driver built with debug output as well (don't ask me to fully understand the math of the PID algorithm) Ben. From hpa at zytor.com Sat Oct 16 16:01:08 2004 From: hpa at zytor.com (H. 
Peter Anvin) Date: Fri, 15 Oct 2004 23:01:08 -0700 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <1097904783.8961.23.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> <4170B257.1010602@zytor.com> <1097904783.8961.23.camel@gaston> Message-ID: <4170B924.3040104@zytor.com> Benjamin Herrenschmidt wrote: > On Sat, 2004-10-16 at 15:32, H. Peter Anvin wrote: > > >>It's the backside fan that oscillates; backside_fan_pwm varies between >>30 and 100 in what is pretty much a squarewave. See attached graph (and >>note how the other fans vary with workload.) >> >>I probably need to write a "power virus" program for the G5 to really >>test out the high end (a power virus is a program which keeps the chip >>running as hard as it can; generally keep all pipelines stuffed.) > > think about also banging FPU and Altivec units then :) > Those would be included in "all pipelines." I need to learn more about the specifics of the G5 -- and general PowerPC stuff -- before I can write such a program, though. > Since it's low oscillation point is 30, I suppose it properly detects > U3H (can you verify that in the code, adding a printk for example in > init_backside_state()). > > I'll double check the values used for the PID in darwin I'll do that and compile with debugging enabled, and send you a log from hell. -hpa From nathanl at austin.ibm.com Sat Oct 16 16:14:17 2004 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Sat, 16 Oct 2004 01:14:17 -0500 Subject: [PATCH] ppc64: fix smp_startup_cpu for cpu hotplug In-Reply-To: <1097901423.3226.42.camel@biclops> References: <1097901423.3226.42.camel@biclops> Message-ID: <1097907257.3226.47.camel@biclops> This change is needed in order to allow cpus to be onlined after boot. 
This used to work but the declaration of pseries_secondary_smp_init in this file was changed in Ben's big cleanup patch a while back, so the cpu would start at a bad address. Signed-off-by: Nathan Lynch smp.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletion(-) Index: 2.6.9-rc4/arch/ppc64/kernel/smp.c =================================================================== --- 2.6.9-rc4.orig/arch/ppc64/kernel/smp.c 2004-10-16 00:38:57.404529136 -0500 +++ 2.6.9-rc4/arch/ppc64/kernel/smp.c 2004-10-16 00:56:13.266054248 -0500 @@ -390,7 +390,8 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu) { int status; - unsigned long start_here = __pa(pseries_secondary_smp_init); + unsigned long start_here = __pa((u32)*((unsigned long *) + pseries_secondary_smp_init)); unsigned int pcpu; /* At boot time the cpus are already spinning in hold From schwab at suse.de Sun Oct 17 06:05:22 2004 From: schwab at suse.de (Andreas Schwab) Date: Sat, 16 Oct 2004 22:05:22 +0200 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <1097893432.6546.37.camel@gaston> (Benjamin Herrenschmidt's message of "Sat, 16 Oct 2004 12:23:53 +1000") References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> Message-ID: Benjamin Herrenschmidt writes: > Ok, here's a new patch that fixes a few issues, it's been > tested on a non-liquid cooled system and appear to work ok. That doesn't work very well for me. The fans are constantly spinning at a rather high rate independent of how loaded the system is. Andreas. -- Andreas Schwab, SuSE Labs, schwab at suse.de SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." 
From paulus at samba.org Sun Oct 17 10:53:21 2004 From: paulus at samba.org (Paul Mackerras) Date: Sun, 17 Oct 2004 10:53:21 +1000 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <4170B257.1010602@zytor.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> <4170B257.1010602@zytor.com> Message-ID: <16753.49793.159513.618588@cargo.ozlabs.ibm.com> H. Peter Anvin writes: > It's the backside fan that oscillates; backside_fan_pwm varies between > 30 and 100 in what is pretty much a squarewave. See attached graph (and > note how the other fans vary with workload.) The sharp rises look like the code thinks it gets into an over-temperature situation and turns the fans on full blast. It could be worth putting some printks in the overtemp code. > I probably need to write a "power virus" program for the G5 to really > test out the high end (a power virus is a program which keeps the chip > running as hard as it can; generally keep all pipelines stuffed.) Hmmm, I should see if I can dig such a thing out of somewhere in IBM. Paul. From hpa at zytor.com Sun Oct 17 10:58:22 2004 From: hpa at zytor.com (H. Peter Anvin) Date: Sat, 16 Oct 2004 17:58:22 -0700 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <16753.49793.159513.618588@cargo.ozlabs.ibm.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> <4170B257.1010602@zytor.com> <16753.49793.159513.618588@cargo.ozlabs.ibm.com> Message-ID: <4171C3AE.3010302@zytor.com> Paul Mackerras wrote: > H. 
Peter Anvin writes: > > >>It's the backside fan that oscillates; backside_fan_pwm varies between >>30 and 100 in what is pretty much a squarewave. See attached graph (and >>note how the other fans vary with workload.) > > The sharp rises look like the code thinks it gets into an > over-temperature situation and turns the fans on full blast. It could > be worth putting some printks in the overtemp code. > Changing the unsigned variables to signed per Ben's suggestion seems to have solved the problem. > >>I probably need to write a "power virus" program for the G5 to really >>test out the high end (a power virus is a program which keeps the chip >>running as hard as it can; generally keep all pipelines stuffed.) > > Hmmm, I should see if I can dig such a thing out of somewhere in IBM. > That would be good; otherwise they're not too hard to write given a microarchitectural description. -hpa From benh at kernel.crashing.org Sun Oct 17 10:58:04 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 17 Oct 2004 10:58:04 +1000 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> Message-ID: <1097974684.8965.59.camel@gaston> On Sun, 2004-10-17 at 06:05, Andreas Schwab wrote: > Benjamin Herrenschmidt writes: > > > Ok, here's a new patch that fixes a few issues, it's been > > tested on a non-liquid cooled system and appear to work ok. > > That doesn't work very well for me. The fans are constantly spinning at a > rather high rate independent of how loaded the system is. Is it all fans or just the backside fan getting crazy ? This latter bug is fixed by version #4 I'll post in a minute... Ben. 
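A minimal sketch of how a signed/unsigned compare can produce the 30-to-100 squarewave reported in this thread. It assumes, from a reading of the #3 patch, that the signed state->pwm was clamped against the u32 output_min/output_max fields of the params struct; the function names below are illustrative and do not appear in the driver, and the sketch assumes a platform where int is 32 bits wide.

```c
#include <stdint.h>

/* Clamp with unsigned limits, as with the u32 fields in the #3 params struct. */
int32_t clamp_unsigned_limits(int32_t pwm, uint32_t min, uint32_t max)
{
	/*
	 * pwm is converted to unsigned for both comparisons, so a negative
	 * value such as -5 is seen as 4294967291: it is NOT below min, and
	 * it IS above max.
	 */
	if (pwm < min)
		pwm = min;
	if (pwm > max)
		pwm = max;
	return pwm;
}

/* Same clamp with signed limits: a negative pwm is floored at min. */
int32_t clamp_signed_limits(int32_t pwm, int32_t min, int32_t max)
{
	if (pwm < min)
		pwm = min;
	if (pwm > max)
		pwm = max;
	return pwm;
}
```

With limits of 30 and 100, a PID step that pushes pwm to -5 comes out as 100 from the unsigned version and as 30 from the signed one. That would be consistent with the fan jumping to full speed whenever the loop tried to go below the floor, though only the debug output could confirm it for a given machine.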
From benh at kernel.crashing.org Sun Oct 17 11:01:03 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 17 Oct 2004 11:01:03 +1000 Subject: Fan control for PowerMac7_3 (#4) In-Reply-To: <1097893432.6546.37.camel@gaston> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> Message-ID: <1097974861.8965.62.camel@gaston> This version fixes a bug with the backside fan doing crazy things, it appears to work properly on the dual 2.5Ghz now. Unless I get a negative report, I intend to submit it to Linus in a couple of days. diff -urN linux-2.5/drivers/macintosh/therm_pm72.c linux-pogo/drivers/macintosh/therm_pm72.c --- linux-2.5/drivers/macintosh/therm_pm72.c 2004-09-24 14:34:05.000000000 +1000 +++ linux-pogo/drivers/macintosh/therm_pm72.c 2004-10-16 18:49:57.000000000 +1000 @@ -46,6 +46,9 @@ * overtemp conditions so userland can take some policy * decisions, like slewing down CPUs * - Deal with fan and i2c failures in a better way + * - Maybe do a generic PID based on params used for + * U3 and Drives ? + * - Add RackMac3,1 support (XServe g5) * * History: * @@ -73,6 +76,15 @@ * values in the configuration register * - Switch back to use of target fan speed for PID, thus lowering * pressure on i2c + * + * Oct. 
16, 2004 : 1.1b3 (beta) + * - Add device-tree lookup for fan IDs, should detect liquid cooling + * pumps when present + * - Enable driver for PowerMac7,3 machines + * - Split the U3/Backside cooling on U3 & U3H versions as Darwin does + * - Add new CPU cooling algorithm for machines with liquid cooling + * - Workaround for some PowerMac7,3 with empty "fan" node in the devtree + * - Fix a signed/unsigned compare issue in some PID loops */ #include @@ -101,7 +113,7 @@ #include "therm_pm72.h" -#define VERSION "0.9" +#define VERSION "1.1b3" #undef DEBUG @@ -121,16 +133,100 @@ static struct i2c_adapter * u3_1; static struct i2c_client * fcu; static struct cpu_pid_state cpu_state[2]; +static struct basckside_pid_params backside_params; static struct backside_pid_state backside_state; static struct drives_pid_state drives_state; static int state; static int cpu_count; +static int cpu_pid_type; static pid_t ctrl_task; static struct completion ctrl_complete; static int critical_state; static DECLARE_MUTEX(driver_lock); /* + * We have 2 types of CPU PID control. One is "split" old style control + * for intake & exhaust fans, the other is "combined" control for both + * CPUs that also deals with the pumps when present. To be "compatible" + * with OS X at this point, we only use "COMBINED" on the machines that + * are identified as having the pumps (though that identification is at + * least dodgy). Ultimately, we could probably switch completely to this + * algorithm provided we hack it to deal with the UP case + */ +#define CPU_PID_TYPE_SPLIT 0 +#define CPU_PID_TYPE_COMBINED 1 + +/* + * This table describes all fans in the FCU. The "id" and "type" values + * are defaults valid for all earlier machines. 
Newer machines will + * eventually override the table content based on the device-tree + */ +struct fcu_fan_table +{ + char* loc; /* location code */ + int type; /* 0 = rpm, 1 = pwm, 2 = pump */ + int id; /* id or -1 */ +}; + +#define FCU_FAN_RPM 0 +#define FCU_FAN_PWM 1 + +#define FCU_FAN_ABSENT_ID -1 + +#define FCU_FAN_COUNT ARRAY_SIZE(fcu_fans) + +struct fcu_fan_table fcu_fans[] = { + [BACKSIDE_FAN_PWM_INDEX] = { + .loc = "BACKSIDE", + .type = FCU_FAN_PWM, + .id = BACKSIDE_FAN_PWM_DEFAULT_ID, + }, + [DRIVES_FAN_RPM_INDEX] = { + .loc = "DRIVE BAY", + .type = FCU_FAN_RPM, + .id = DRIVES_FAN_RPM_DEFAULT_ID, + }, + [SLOTS_FAN_PWM_INDEX] = { + .loc = "SLOT", + .type = FCU_FAN_PWM, + .id = SLOTS_FAN_PWM_DEFAULT_ID, + }, + [CPUA_INTAKE_FAN_RPM_INDEX] = { + .loc = "CPU A INTAKE", + .type = FCU_FAN_RPM, + .id = CPUA_INTAKE_FAN_RPM_DEFAULT_ID, + }, + [CPUA_EXHAUST_FAN_RPM_INDEX] = { + .loc = "CPU A EXHAUST", + .type = FCU_FAN_RPM, + .id = CPUA_EXHAUST_FAN_RPM_DEFAULT_ID, + }, + [CPUB_INTAKE_FAN_RPM_INDEX] = { + .loc = "CPU B INTAKE", + .type = FCU_FAN_RPM, + .id = CPUB_INTAKE_FAN_RPM_DEFAULT_ID, + }, + [CPUB_EXHAUST_FAN_RPM_INDEX] = { + .loc = "CPU B EXHAUST", + .type = FCU_FAN_RPM, + .id = CPUB_EXHAUST_FAN_RPM_DEFAULT_ID, + }, + /* pumps aren't present by default, have to be looked up in the + * device-tree + */ + [CPUA_PUMP_RPM_INDEX] = { + .loc = "CPU A PUMP", + .type = FCU_FAN_RPM, + .id = FCU_FAN_ABSENT_ID, + }, + [CPUB_PUMP_RPM_INDEX] = { + .loc = "CPU B PUMP", + .type = FCU_FAN_RPM, + .id = FCU_FAN_ABSENT_ID, + }, +}; + +/* * i2c_driver structure to attach to the host i2c controller */ @@ -331,10 +427,16 @@ return 0; } -static int set_rpm_fan(int fan, int rpm) +static int set_rpm_fan(int fan_index, int rpm) { unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_RPM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; if (rpm < 300) rpm = 300; @@ -342,43 +444,55 @@ rpm = 8191; buf[0] 
= rpm >> 5; buf[1] = rpm << 3; - rc = fan_write_reg(0x10 + (fan * 2), buf, 2); + rc = fan_write_reg(0x10 + (id * 2), buf, 2); if (rc < 0) return -EIO; return 0; } -static int get_rpm_fan(int fan, int programmed) +static int get_rpm_fan(int fan_index, int programmed) { unsigned char failure; unsigned char active; unsigned char buf[2]; - int rc, reg_base; + int rc, id, reg_base; + + if (fcu_fans[fan_index].type != FCU_FAN_RPM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; rc = fan_read_reg(0xb, &failure, 1); if (rc != 1) return -EIO; - if ((failure & (1 << fan)) != 0) + if ((failure & (1 << id)) != 0) return -EFAULT; rc = fan_read_reg(0xd, &active, 1); if (rc != 1) return -EIO; - if ((active & (1 << fan)) == 0) + if ((active & (1 << id)) == 0) return -ENXIO; /* Programmed value or real current speed */ reg_base = programmed ? 0x10 : 0x11; - rc = fan_read_reg(reg_base + (fan * 2), buf, 2); + rc = fan_read_reg(reg_base + (id * 2), buf, 2); if (rc != 2) return -EIO; return (buf[0] << 5) | buf[1] >> 3; } -static int set_pwm_fan(int fan, int pwm) +static int set_pwm_fan(int fan_index, int pwm) { unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_PWM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; if (pwm < 10) pwm = 10; @@ -386,32 +500,38 @@ pwm = 100; pwm = (pwm * 2559) / 1000; buf[0] = pwm; - rc = fan_write_reg(0x30 + (fan * 2), buf, 1); + rc = fan_write_reg(0x30 + (id * 2), buf, 1); if (rc < 0) return rc; return 0; } -static int get_pwm_fan(int fan) +static int get_pwm_fan(int fan_index) { unsigned char failure; unsigned char active; unsigned char buf[2]; - int rc; + int rc, id; + + if (fcu_fans[fan_index].type != FCU_FAN_PWM) + return -EINVAL; + id = fcu_fans[fan_index].id; + if (id == FCU_FAN_ABSENT_ID) + return -EINVAL; rc = fan_read_reg(0x2b, &failure, 1); if (rc != 1) return -EIO; - if ((failure & (1 << fan)) != 0) + if 
((failure & (1 << id)) != 0) return -EFAULT; rc = fan_read_reg(0x2d, &active, 1); if (rc != 1) return -EIO; - if ((active & (1 << fan)) == 0) + if ((active & (1 << id)) == 0) return -ENXIO; /* Programmed value or real current speed */ - rc = fan_read_reg(0x30 + (fan * 2), buf, 1); + rc = fan_read_reg(0x30 + (id * 2), buf, 1); if (rc != 1) return -EIO; @@ -513,80 +633,84 @@ /* * CPUs fans control loop */ -static void do_monitor_cpu(struct cpu_pid_state *state) + +static int do_read_one_cpu_values(struct cpu_pid_state *state, s32 *temp, s32 *power) { - s32 temp, voltage, current_a, power, power_target; - s32 integral, derivative, proportional, adj_in_target, sval; - s64 integ_p, deriv_p, prop_p, sum; - int i, intake, rc; + s32 ltemp, volts, amps; + int rc = 0; - DBG("cpu %d:\n", state->index); + /* Default (in case of error) */ + *temp = state->cur_temp; + *power = state->cur_power; /* Read current fan status */ if (state->index == 0) - rc = get_rpm_fan(CPUA_EXHAUST_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); else - rc = get_rpm_fan(CPUB_EXHAUST_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); if (rc < 0) { - printk(KERN_WARNING "Error %d reading CPU %d exhaust fan !\n", - rc, state->index); - /* XXX What do we do now ? */ - } else + /* XXX What do we do now ? Nothing for now, keep old value, but + * return error upstream + */ + DBG(" cpu %d, fan reading error !\n", state->index); + } else { state->rpm = rc; - DBG(" current rpm: %d\n", state->rpm); + DBG(" cpu %d, exhaust RPM: %d\n", state->index, state->rpm); + } /* Get some sensor readings and scale it */ - temp = read_smon_adc(state, 1); - if (temp == -1) { + ltemp = read_smon_adc(state, 1); + if (ltemp == -1) { + /* XXX What do we do now ? 
*/ state->overtemp++; - return; + if (rc == 0) + rc = -EIO; + DBG(" cpu %d, temp reading error !\n", state->index); + } else { + /* Fixup temperature according to diode calibration + */ + DBG(" cpu %d, temp raw: %04x, m_diode: %04x, b_diode: %04x\n", + state->index, + ltemp, state->mpu.mdiode, state->mpu.bdiode); + *temp = ((s32)ltemp * (s32)state->mpu.mdiode + ((s32)state->mpu.bdiode << 12)) >> 2; + state->last_temp = *temp; + DBG(" temp: %d.%03d\n", FIX32TOPRINT((*temp))); } - voltage = read_smon_adc(state, 3); - current_a = read_smon_adc(state, 4); - /* Fixup temperature according to diode calibration + /* + * Read voltage & current and calculate power */ - DBG(" temp raw: %04x, m_diode: %04x, b_diode: %04x\n", - temp, state->mpu.mdiode, state->mpu.bdiode); - temp = ((s32)temp * (s32)state->mpu.mdiode + ((s32)state->mpu.bdiode << 12)) >> 2; - state->last_temp = temp; - DBG(" temp: %d.%03d\n", FIX32TOPRINT(temp)); + volts = read_smon_adc(state, 3); + amps = read_smon_adc(state, 4); - /* Check tmax, increment overtemp if we are there. At tmax+8, we go - * full blown immediately and try to trigger a shutdown - */ - if (temp >= ((state->mpu.tmax + 8) << 16)) { - printk(KERN_WARNING "Warning ! 
CPU %d temperature way above maximum" - " (%d) !\n", - state->index, temp >> 16); - state->overtemp = CPU_MAX_OVERTEMP; - } else if (temp > (state->mpu.tmax << 16)) - state->overtemp++; - else - state->overtemp = 0; - if (state->overtemp >= CPU_MAX_OVERTEMP) - critical_state = 1; - if (state->overtemp > 0) { - state->rpm = state->mpu.rmaxn_exhaust_fan; - state->intake_rpm = intake = state->mpu.rmaxn_intake_fan; - goto do_set_fans; - } - - /* Scale other sensor values according to fixed scales + /* Scale voltage and current raw sensor values according to fixed scales * obtained in Darwin and calculate power from I and V */ - state->voltage = voltage *= ADC_CPU_VOLTAGE_SCALE; - state->current_a = current_a *= ADC_CPU_CURRENT_SCALE; - power = (((u64)current_a) * ((u64)voltage)) >> 16; + volts *= ADC_CPU_VOLTAGE_SCALE; + amps *= ADC_CPU_CURRENT_SCALE; + *power = (((u64)volts) * ((u64)amps)) >> 16; + state->voltage = volts; + state->current_a = amps; + state->last_power = *power; + + DBG(" cpu %d, current: %d.%03d, voltage: %d.%03d, power: %d.%03d W\n", + state->index, FIX32TOPRINT(state->current_a), + FIX32TOPRINT(state->voltage), FIX32TOPRINT(*power)); + + return 0; +} + +static void do_cpu_pid(struct cpu_pid_state *state, s32 temp, s32 power) +{ + s32 power_target, integral, derivative, proportional, adj_in_target, sval; + s64 integ_p, deriv_p, prop_p, sum; + int i; /* Calculate power target value (could be done once for all) * and convert to a 16.16 fp number */ power_target = ((u32)(state->mpu.pmaxh - state->mpu.padjmax)) << 16; - - DBG(" current: %d.%03d, voltage: %d.%03d\n", - FIX32TOPRINT(current_a), FIX32TOPRINT(voltage)); - DBG(" power: %d.%03d W, target: %d.%03d, error: %d.%03d\n", FIX32TOPRINT(power), + DBG(" power target: %d.%03d, error: %d.%03d\n", FIX32TOPRINT(power_target), FIX32TOPRINT(power_target - power)); /* Store temperature and power in history array */ @@ -626,7 +750,7 @@ * input target is mpu.ttarget, input max is mpu.tmax */ integ_p = 
((s64)state->mpu.pid_gr) * (s64)integral; - DBG(" integ_p: %d\n", (int)(deriv_p >> 36)); + DBG(" integ_p: %d\n", (int)(integ_p >> 36)); sval = (state->mpu.tmax << 16) - ((integ_p >> 20) & 0xffffffff); adj_in_target = (state->mpu.ttarget << 16); if (adj_in_target > sval) @@ -655,15 +779,136 @@ DBG(" sum: %d\n", (int)sum); state->rpm += (s32)sum; - if (state->rpm < state->mpu.rminn_exhaust_fan) + if (state->rpm < (int)state->mpu.rminn_exhaust_fan) state->rpm = state->mpu.rminn_exhaust_fan; - if (state->rpm > state->mpu.rmaxn_exhaust_fan) + if (state->rpm > (int)state->mpu.rmaxn_exhaust_fan) state->rpm = state->mpu.rmaxn_exhaust_fan; +} + +static void do_monitor_cpu_combined(void) +{ + struct cpu_pid_state *state0 = &cpu_state[0]; + struct cpu_pid_state *state1 = &cpu_state[1]; + s32 temp0, power0, temp1, power1; + s32 temp_combi, power_combi; + int rc, intake, pump; + + rc = do_read_one_cpu_values(state0, &temp0, &power0); + if (rc < 0) { + /* XXX What do we do now ? */ + } + state1->overtemp = 0; + rc = do_read_one_cpu_values(state1, &temp1, &power1); + if (rc < 0) { + /* XXX What do we do now ? */ + } + if (state1->overtemp) + state0->overtemp++; + + temp_combi = max(temp0, temp1); + power_combi = max(power0, power1); + + /* Check tmax, increment overtemp if we are there. At tmax+8, we go + * full blown immediately and try to trigger a shutdown + */ + if (temp_combi >= ((state0->mpu.tmax + 8) << 16)) { + printk(KERN_WARNING "Warning ! 
Temperature way above maximum (%d) !\n", + temp_combi >> 16); + state0->overtemp = CPU_MAX_OVERTEMP; + } else if (temp_combi > (state0->mpu.tmax << 16)) + state0->overtemp++; + else + state0->overtemp = 0; + if (state0->overtemp >= CPU_MAX_OVERTEMP) + critical_state = 1; + if (state0->overtemp > 0) { + state0->rpm = state0->mpu.rmaxn_exhaust_fan; + state0->intake_rpm = intake = state0->mpu.rmaxn_intake_fan; + pump = CPU_PUMP_OUTPUT_MAX; + goto do_set_fans; + } + + /* Do the PID */ + do_cpu_pid(state0, temp_combi, power_combi); + + /* Calculate intake fan speed */ + intake = (state0->rpm * CPU_INTAKE_SCALE) >> 16; + if (intake < (int)state0->mpu.rminn_intake_fan) + intake = state0->mpu.rminn_intake_fan; + if (intake > (int)state0->mpu.rmaxn_intake_fan) + intake = state0->mpu.rmaxn_intake_fan; + state0->intake_rpm = intake; + + /* Calculate pump speed */ + pump = (state0->rpm * CPU_PUMP_OUTPUT_MAX) / + state0->mpu.rmaxn_exhaust_fan; + if (pump > CPU_PUMP_OUTPUT_MAX) + pump = CPU_PUMP_OUTPUT_MAX; + if (pump < CPU_PUMP_OUTPUT_MIN) + pump = CPU_PUMP_OUTPUT_MIN; + + do_set_fans: + /* We copy values from state 0 to state 1 for /sysfs */ + state1->rpm = state0->rpm; + state1->intake_rpm = state0->intake_rpm; + + DBG("** CPU %d RPM: %d Ex, %d, Pump: %d, In, overtemp: %d\n", + state1->index, (int)state1->rpm, intake, pump, state1->overtemp); + + /* We should check for errors, shouldn't we ? But then, what + * do we do once the error occurs ? For FCU notified fan + * failures (-EFAULT) we probably want to notify userland + * some way... 
+ */ + set_rpm_fan(CPUA_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, state0->rpm); + set_rpm_fan(CPUB_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, state0->rpm); + + if (fcu_fans[CPUA_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) + set_rpm_fan(CPUA_PUMP_RPM_INDEX, pump); + if (fcu_fans[CPUB_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) + set_rpm_fan(CPUB_PUMP_RPM_INDEX, pump); +} + +static void do_monitor_cpu_split(struct cpu_pid_state *state) +{ + s32 temp, power; + int rc, intake; + + /* Read current fan status */ + rc = do_read_one_cpu_values(state, &temp, &power); + if (rc < 0) { + /* XXX What do we do now ? */ + } + + /* Check tmax, increment overtemp if we are there. At tmax+8, we go + * full blown immediately and try to trigger a shutdown + */ + if (temp >= ((state->mpu.tmax + 8) << 16)) { + printk(KERN_WARNING "Warning ! CPU %d temperature way above maximum" + " (%d) !\n", + state->index, temp >> 16); + state->overtemp = CPU_MAX_OVERTEMP; + } else if (temp > (state->mpu.tmax << 16)) + state->overtemp++; + else + state->overtemp = 0; + if (state->overtemp >= CPU_MAX_OVERTEMP) + critical_state = 1; + if (state->overtemp > 0) { + state->rpm = state->mpu.rmaxn_exhaust_fan; + state->intake_rpm = intake = state->mpu.rmaxn_intake_fan; + goto do_set_fans; + } + + /* Do the PID */ + do_cpu_pid(state, temp, power); intake = (state->rpm * CPU_INTAKE_SCALE) >> 16; - if (intake < state->mpu.rminn_intake_fan) + if (intake < (int)state->mpu.rminn_intake_fan) intake = state->mpu.rminn_intake_fan; - if (intake > state->mpu.rmaxn_intake_fan) + if (intake > (int)state->mpu.rmaxn_intake_fan) intake = state->mpu.rmaxn_intake_fan; state->intake_rpm = intake; @@ -677,11 +922,11 @@ * some way... 
*/ if (state->index == 0) { - set_rpm_fan(CPUA_INTAKE_FAN_RPM_ID, intake); - set_rpm_fan(CPUA_EXHAUST_FAN_RPM_ID, state->rpm); + set_rpm_fan(CPUA_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUA_EXHAUST_FAN_RPM_INDEX, state->rpm); } else { - set_rpm_fan(CPUB_INTAKE_FAN_RPM_ID, intake); - set_rpm_fan(CPUB_EXHAUST_FAN_RPM_ID, state->rpm); + set_rpm_fan(CPUB_INTAKE_FAN_RPM_INDEX, intake); + set_rpm_fan(CPUB_EXHAUST_FAN_RPM_INDEX, state->rpm); } } @@ -696,6 +941,7 @@ state->overtemp = 0; state->adc_config = 0x00; + if (index == 0) state->monitor = attach_i2c_chip(SUPPLY_MONITOR_ID, "CPU0_monitor"); else if (index == 1) @@ -778,7 +1024,7 @@ DBG("backside:\n"); /* Check fan status */ - rc = get_pwm_fan(BACKSIDE_FAN_PWM_ID); + rc = get_pwm_fan(BACKSIDE_FAN_PWM_INDEX); if (rc < 0) { printk(KERN_WARNING "Error %d reading backside fan !\n", rc); /* XXX What do we do now ? */ @@ -790,12 +1036,12 @@ temp = i2c_smbus_read_byte_data(state->monitor, MAX6690_EXT_TEMP) << 16; state->last_temp = temp; DBG(" temp: %d.%03d, target: %d.%03d\n", FIX32TOPRINT(temp), - FIX32TOPRINT(BACKSIDE_PID_INPUT_TARGET)); + FIX32TOPRINT(backside_params.input_target)); /* Store temperature and error in history array */ state->cur_sample = (state->cur_sample + 1) % BACKSIDE_PID_HISTORY_SIZE; state->sample_history[state->cur_sample] = temp; - state->error_history[state->cur_sample] = temp - BACKSIDE_PID_INPUT_TARGET; + state->error_history[state->cur_sample] = temp - backside_params.input_target; /* If first loop, fill the history table */ if (state->first) { @@ -804,7 +1050,7 @@ BACKSIDE_PID_HISTORY_SIZE; state->sample_history[state->cur_sample] = temp; state->error_history[state->cur_sample] = - temp - BACKSIDE_PID_INPUT_TARGET; + temp - backside_params.input_target; } state->first = 0; } @@ -816,7 +1062,7 @@ integral += state->error_history[i]; integral *= BACKSIDE_PID_INTERVAL; DBG(" integral: %08x\n", integral); - integ_p = ((s64)BACKSIDE_PID_G_r) * (s64)integral; + integ_p = 
((s64)backside_params.G_r) * (s64)integral; DBG(" integ_p: %d\n", (int)(integ_p >> 36)); sum += integ_p; @@ -825,12 +1071,12 @@ state->error_history[(state->cur_sample + BACKSIDE_PID_HISTORY_SIZE - 1) % BACKSIDE_PID_HISTORY_SIZE]; derivative /= BACKSIDE_PID_INTERVAL; - deriv_p = ((s64)BACKSIDE_PID_G_d) * (s64)derivative; + deriv_p = ((s64)backside_params.G_d) * (s64)derivative; DBG(" deriv_p: %d\n", (int)(deriv_p >> 36)); sum += deriv_p; /* Calculate the proportional term */ - prop_p = ((s64)BACKSIDE_PID_G_p) * (s64)(state->error_history[state->cur_sample]); + prop_p = ((s64)backside_params.G_p) * (s64)(state->error_history[state->cur_sample]); DBG(" prop_p: %d\n", (int)(prop_p >> 36)); sum += prop_p; @@ -839,13 +1085,13 @@ DBG(" sum: %d\n", (int)sum); state->pwm += (s32)sum; - if (state->pwm < BACKSIDE_PID_OUTPUT_MIN) - state->pwm = BACKSIDE_PID_OUTPUT_MIN; - if (state->pwm > BACKSIDE_PID_OUTPUT_MAX) - state->pwm = BACKSIDE_PID_OUTPUT_MAX; + if (state->pwm < backside_params.output_min) + state->pwm = backside_params.output_min; + if (state->pwm > backside_params.output_max) + state->pwm = backside_params.output_max; DBG("** BACKSIDE PWM: %d\n", (int)state->pwm); - set_pwm_fan(BACKSIDE_FAN_PWM_ID, state->pwm); + set_pwm_fan(BACKSIDE_FAN_PWM_INDEX, state->pwm); } /* @@ -853,6 +1099,35 @@ */ static int init_backside_state(struct backside_pid_state *state) { + struct device_node *u3; + int u3h = 1; /* conservative by default */ + + /* + * There are different PID params for machines with U3 and machines + * with U3H, pick the right ones now + */ + u3 = of_find_node_by_path("/u3@0,f8000000"); + if (u3 != NULL) { + u32 *vers = (u32 *)get_property(u3, "device-rev", NULL); + if (vers) + if (((*vers) & 0x3f) < 0x34) + u3h = 0; + of_node_put(u3); + } + + backside_params.G_p = BACKSIDE_PID_G_p; + backside_params.G_r = BACKSIDE_PID_G_r; + backside_params.output_max = BACKSIDE_PID_OUTPUT_MAX; + if (u3h) { + backside_params.G_d = BACKSIDE_PID_U3H_G_d; +
backside_params.input_target = BACKSIDE_PID_U3H_INPUT_TARGET; + backside_params.output_min = BACKSIDE_PID_U3H_OUTPUT_MIN; + } else { + backside_params.G_d = BACKSIDE_PID_U3_G_d; + backside_params.input_target = BACKSIDE_PID_U3_INPUT_TARGET; + backside_params.output_min = BACKSIDE_PID_U3_OUTPUT_MIN; + } + state->ticks = 1; state->first = 1; state->pwm = 50; @@ -898,7 +1173,7 @@ DBG("drives:\n"); /* Check fan status */ - rc = get_rpm_fan(DRIVES_FAN_RPM_ID, !RPM_PID_USE_ACTUAL_SPEED); + rc = get_rpm_fan(DRIVES_FAN_RPM_INDEX, !RPM_PID_USE_ACTUAL_SPEED); if (rc < 0) { printk(KERN_WARNING "Error %d reading drives fan !\n", rc); /* XXX What do we do now ? */ @@ -965,7 +1240,7 @@ state->rpm = DRIVES_PID_OUTPUT_MAX; DBG("** DRIVES RPM: %d\n", (int)state->rpm); - set_rpm_fan(DRIVES_FAN_RPM_ID, state->rpm); + set_rpm_fan(DRIVES_FAN_RPM_INDEX, state->rpm); } /* @@ -1032,7 +1307,7 @@ } /* Set the PCI fan once for now */ - set_pwm_fan(SLOTS_FAN_PWM_ID, SLOTS_FAN_DEFAULT_PWM); + set_pwm_fan(SLOTS_FAN_PWM_INDEX, SLOTS_FAN_DEFAULT_PWM); /* Initialize ADCs */ initialize_adc(&cpu_state[0]); @@ -1047,9 +1322,13 @@ start = jiffies; down(&driver_lock); - do_monitor_cpu(&cpu_state[0]); - if (cpu_state[1].monitor != NULL) - do_monitor_cpu(&cpu_state[1]); + if (cpu_pid_type == CPU_PID_TYPE_COMBINED) + do_monitor_cpu_combined(); + else { + do_monitor_cpu_split(&cpu_state[0]); + if (cpu_state[1].monitor != NULL) + do_monitor_cpu_split(&cpu_state[1]); + } do_monitor_backside(&backside_state); do_monitor_drives(&drives_state); up(&driver_lock); @@ -1113,6 +1392,19 @@ DBG("counted %d CPUs in the device-tree\n", cpu_count); + /* Decide the type of PID algorithm to use based on the presence of + * the pumps, though that may not be the best way, that is good enough + * for now + */ + if (machine_is_compatible("PowerMac7,3") + && (cpu_count > 1) + && fcu_fans[CPUA_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID + && fcu_fans[CPUB_PUMP_RPM_INDEX].id != FCU_FAN_ABSENT_ID) { + printk(KERN_INFO "Liquid cooling 
pumps detected, using new algorithm !\n"); + cpu_pid_type = CPU_PID_TYPE_COMBINED; + } else + cpu_pid_type = CPU_PID_TYPE_SPLIT; + /* Create control loops for everything. If any fail, everything * fails */ @@ -1257,12 +1549,91 @@ return 0; } +static void fcu_lookup_fans(struct device_node *fcu_node) +{ + struct device_node *np = NULL; + int i; + + /* The table is filled by default with values that are suitable + * for the old machines without device-tree informations. We scan + * the device-tree and override those values with whatever is + * there + */ + + DBG("Looking up FCU controls in device-tree...\n"); + + while ((np = of_get_next_child(fcu_node, np)) != NULL) { + int type = -1; + char *loc; + u32 *reg; + + DBG(" control: %s, type: %s\n", np->name, np->type); + + /* Detect control type */ + if (!strcmp(np->type, "fan-rpm-control") || + !strcmp(np->type, "fan-rpm")) + type = FCU_FAN_RPM; + if (!strcmp(np->type, "fan-pwm-control") || + !strcmp(np->type, "fan-pwm")) + type = FCU_FAN_PWM; + /* Only care about fans for now */ + if (type == -1) + continue; + + /* Lookup for a matching location */ + loc = (char *)get_property(np, "location", NULL); + reg = (u32 *)get_property(np, "reg", NULL); + if (loc == NULL || reg == NULL) + continue; + DBG(" matching location: %s, reg: 0x%08x\n", loc, *reg); + + for (i = 0; i < FCU_FAN_COUNT; i++) { + int fan_id; + + if (strcmp(loc, fcu_fans[i].loc)) + continue; + DBG(" location match, index: %d\n", i); + fcu_fans[i].id = FCU_FAN_ABSENT_ID; + if (type != fcu_fans[i].type) { + printk(KERN_WARNING "therm_pm72: Fan type mismatch " + "in device-tree for %s\n", np->full_name); + break; + } + if (type == FCU_FAN_RPM) + fan_id = ((*reg) - 0x10) / 2; + else + fan_id = ((*reg) - 0x30) / 2; + if (fan_id > 7) { + printk(KERN_WARNING "therm_pm72: Can't parse " + "fan ID in device-tree for %s\n", np->full_name); + break; + } + DBG(" fan id -> %d, type -> %d\n", fan_id, type); + fcu_fans[i].id = fan_id; + } + } + + /* Now dump the array */ + 
printk(KERN_INFO "Detected fan controls:\n"); + for (i = 0; i < FCU_FAN_COUNT; i++) { + if (fcu_fans[i].id == FCU_FAN_ABSENT_ID) + continue; + printk(KERN_INFO " %d: %s fan, id %d, location: %s\n", i, + fcu_fans[i].type == FCU_FAN_RPM ? "RPM" : "PWM", + fcu_fans[i].id, fcu_fans[i].loc); + } +} + static int fcu_of_probe(struct of_device* dev, const struct of_match *match) { int rc; state = state_detached; + /* Lookup the fans in the device tree */ + fcu_lookup_fans(dev->node); + + /* Add the driver */ rc = i2c_add_driver(&therm_pm72_driver); if (rc < 0) return rc; @@ -1301,15 +1672,20 @@ { struct device_node *np; - if (!machine_is_compatible("PowerMac7,2")) + if (!machine_is_compatible("PowerMac7,2") && + !machine_is_compatible("PowerMac7,3")) return -ENODEV; printk(KERN_INFO "PowerMac G5 Thermal control driver %s\n", VERSION); np = of_find_node_by_type(NULL, "fcu"); if (np == NULL) { - printk(KERN_ERR "Can't find FCU in device-tree !\n"); - return -ENODEV; + /* Some machines have strangely broken device-tree */ + np = of_find_node_by_path("/u3@0,f8000000/i2c@f8001000/fan@15e"); + if (np == NULL) { + printk(KERN_ERR "Can't find FCU in device-tree !\n"); + return -ENODEV; + } } of_dev = of_platform_device_create(np, "temperature"); if (of_dev == NULL) { diff -urN linux-2.5/drivers/macintosh/therm_pm72.h linux-pogo/drivers/macintosh/therm_pm72.h --- linux-2.5/drivers/macintosh/therm_pm72.h 2004-09-24 14:34:05.000000000 +1000 +++ linux-pogo/drivers/macintosh/therm_pm72.h 2004-10-16 18:29:29.000000000 +1000 @@ -119,18 +119,33 @@ #define ADC_CPU_CURRENT_SCALE 0x1f40 /* _AD4 */ /* - * PID factors for the U3/Backside fan control loop + * PID factors for the U3/Backside fan control loop.
We have 2 sets + * of values here, one set for U3 and one set for U3H */ -#define BACKSIDE_FAN_PWM_ID 1 -#define BACKSIDE_PID_G_d 0x02800000 +#define BACKSIDE_FAN_PWM_DEFAULT_ID 1 +#define BACKSIDE_FAN_PWM_INDEX 0 +#define BACKSIDE_PID_U3_G_d 0x02800000 +#define BACKSIDE_PID_U3H_G_d 0x01400000 #define BACKSIDE_PID_G_p 0x00500000 #define BACKSIDE_PID_G_r 0x00000000 -#define BACKSIDE_PID_INPUT_TARGET 0x00410000 +#define BACKSIDE_PID_U3_INPUT_TARGET 0x00410000 +#define BACKSIDE_PID_U3H_INPUT_TARGET 0x004b0000 #define BACKSIDE_PID_INTERVAL 5 #define BACKSIDE_PID_OUTPUT_MAX 100 -#define BACKSIDE_PID_OUTPUT_MIN 20 +#define BACKSIDE_PID_U3_OUTPUT_MIN 20 +#define BACKSIDE_PID_U3H_OUTPUT_MIN 30 #define BACKSIDE_PID_HISTORY_SIZE 2 +struct basckside_pid_params +{ + s32 G_d; + s32 G_p; + s32 G_r; + s32 input_target; + s32 output_min; + s32 output_max; +}; + struct backside_pid_state { int ticks; @@ -146,7 +161,8 @@ /* * PID factors for the Drive Bay fan control loop */ -#define DRIVES_FAN_RPM_ID 2 +#define DRIVES_FAN_RPM_DEFAULT_ID 2 +#define DRIVES_FAN_RPM_INDEX 1 #define DRIVES_PID_G_d 0x01e00000 #define DRIVES_PID_G_p 0x00500000 #define DRIVES_PID_G_r 0x00000000 @@ -168,7 +184,8 @@ int first; }; -#define SLOTS_FAN_PWM_ID 2 +#define SLOTS_FAN_PWM_DEFAULT_ID 2 +#define SLOTS_FAN_PWM_INDEX 2 #define SLOTS_FAN_DEFAULT_PWM 50 /* Do better here ! 
*/ /* @@ -191,10 +208,15 @@ * CPU B FAKE POWER 49 (I_V_inputs: 18, 19) */ -#define CPUA_INTAKE_FAN_RPM_ID 3 -#define CPUA_EXHAUST_FAN_RPM_ID 4 -#define CPUB_INTAKE_FAN_RPM_ID 5 -#define CPUB_EXHAUST_FAN_RPM_ID 6 +#define CPUA_INTAKE_FAN_RPM_DEFAULT_ID 3 +#define CPUA_EXHAUST_FAN_RPM_DEFAULT_ID 4 +#define CPUB_INTAKE_FAN_RPM_DEFAULT_ID 5 +#define CPUB_EXHAUST_FAN_RPM_DEFAULT_ID 6 + +#define CPUA_INTAKE_FAN_RPM_INDEX 3 +#define CPUA_EXHAUST_FAN_RPM_INDEX 4 +#define CPUB_INTAKE_FAN_RPM_INDEX 5 +#define CPUB_EXHAUST_FAN_RPM_INDEX 6 #define CPU_INTAKE_SCALE 0x0000f852 #define CPU_TEMP_HISTORY_SIZE 2 @@ -202,6 +224,11 @@ #define CPU_PID_INTERVAL 1 #define CPU_MAX_OVERTEMP 30 +#define CPUA_PUMP_RPM_INDEX 7 +#define CPUB_PUMP_RPM_INDEX 8 +#define CPU_PUMP_OUTPUT_MAX 3700 +#define CPU_PUMP_OUTPUT_MIN 1000 + struct cpu_pid_state { int index; @@ -219,6 +246,7 @@ s32 voltage; s32 current_a; s32 last_temp; + s32 last_power; int first; u8 adc_config; }; From benh at kernel.crashing.org Sun Oct 17 11:12:44 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 17 Oct 2004 11:12:44 +1000 Subject: Fan control for PowerMac7_3 (#3) In-Reply-To: <16753.49793.159513.618588@cargo.ozlabs.ibm.com> References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> <4170B257.1010602@zytor.com> <16753.49793.159513.618588@cargo.ozlabs.ibm.com> Message-ID: <1097975564.14005.66.camel@gaston> On Sun, 2004-10-17 at 10:53, Paul Mackerras wrote: > H. Peter Anvin writes: > > > It's the backside fan that oscillates; backside_fan_pwm varies between > > 30 and 100 in what is pretty much a squarewave. See attached graph (and > > note how the other fans vary with workload.) 
> >
> The sharp rises look like the code thinks it gets into an
> over-temperature situation and turns the fans on full blast. It could
> be worth putting some printks in the overtemp code.

It was in practice a problem: when I turned the min/max values into
variables, I set them unsigned. That caused this code to crap out:

	state->pwm += (s32)sum;
	if (state->pwm < backside_params.output_min)
		state->pwm = backside_params.output_min;
	if (state->pwm > backside_params.output_max)
		state->pwm = backside_params.output_max;

when "sum" was negative enough to cause state->pwm to drop below 0. With
unsigned limits the comparisons are done unsigned, so a negative pwm looks
like a huge value and gets clamped to output_max instead of output_min.

Turning backside_params.* to signed fixed this issue (and possibly others,
as the other factors are also used as signed fixed-point values in the
previous calculations).

Ben.

From schwab at suse.de Mon Oct 18 00:50:47 2004
From: schwab at suse.de (Andreas Schwab)
Date: Sun, 17 Oct 2004 16:50:47 +0200
Subject: Fan control for PowerMac7_3 (#3)
In-Reply-To: <1097974684.8965.59.camel@gaston> (Benjamin Herrenschmidt's message of "Sun, 17 Oct 2004 10:58:04 +1000")
References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <1097974684.8965.59.camel@gaston>
Message-ID:

Benjamin Herrenschmidt writes:

> Is it all fans or just the backside fan getting crazy ? This latter bug
> is fixed by version #4 I'll post in a minute...

I think it's all fans, but I'll test your patch just in case.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

From hpa at zytor.com Sat Oct 16 15:57:09 2004
From: hpa at zytor.com (H.
Peter Anvin)
Date: Fri, 15 Oct 2004 22:57:09 -0700
Subject: Fan control for PowerMac7_3 (#3)
In-Reply-To: <1097905399.8963.26.camel@gaston>
References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <4170A265.6030402@zytor.com> <1097902206.8965.2.camel@gaston> <4170AA5F.6060107@zytor.com> <1097902694.8965.8.camel@gaston> <4170B257.1010602@zytor.com> <1097905399.8963.26.camel@gaston>
Message-ID: <4170B835.8050205@zytor.com>

Benjamin Herrenschmidt wrote:
> On Sat, 2004-10-16 at 15:32, H. Peter Anvin wrote:
>
>
>>It's the backside fan that oscillates; backside_fan_pwm varies between
>>30 and 100 in what is pretty much a squarewave. See attached graph (and
>>note how the other fans vary with workload.)
>>
>>I probably need to write a "power virus" program for the G5 to really
>>test out the high end (a power virus is a program which keeps the chip
>>running as hard as it can; generally keep all pipelines stuffed.)
>
>
> Strange... The values used seem to be identical to OS X (a 75° target
> which is high actually, and a different G_d value than old U3). I would
> need to see the debug output and compare with the OS X driver built with
> debug output as well (don't ask me to fully understand the math of the
> PID algorithm)
>

If you want the file I used to produce the graph, it has all the entries
in /sys/devices/temperature snapshotted at 100 ms intervals (attached) in
the following order:

[time] backside_fan_pwm backside_temperature cpu0_current
cpu0_exhaust_fan_rpm cpu0_intake_fan_rpm cpu0_temperature cpu0_voltage
cpu1_current cpu1_exhaust_fan_rpm cpu1_intake_fan_rpm cpu1_temperature
cpu1_voltage drives_fan_rpm drives_temperature

I've also attached /var/log/dmesg in case that's useful.

	-hpa
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg.bz2
Type: application/x-bzip2
Size: 5502 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041015/7bbba4f9/attachment.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: temps.dat.bz2
Type: application/x-bzip2
Size: 25736 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041015/7bbba4f9/attachment-0001.bin

From schwab at suse.de Mon Oct 18 01:37:22 2004
From: schwab at suse.de (Andreas Schwab)
Date: Sun, 17 Oct 2004 17:37:22 +0200
Subject: Fan control for PowerMac7_3 (#4)
In-Reply-To: <1097974861.8965.62.camel@gaston> (Benjamin Herrenschmidt's message of "Sun, 17 Oct 2004 11:01:03 +1000")
References: <1097831790.1131.111.camel@gaston> <1097831981.1131.113.camel@gaston> <1097832049.1149.115.camel@gaston> <1097893432.6546.37.camel@gaston> <1097974861.8965.62.camel@gaston>
Message-ID:

Benjamin Herrenschmidt writes:

> This version fixes a bug with the backside fan doing crazy things,
> it appears to work properly on the dual 2.5GHz now. Unless I get a
> negative report, I intend to submit it to Linus in a couple of days.

This works fine for me, too. Thanks!

Andreas.

--
Andreas Schwab, SuSE Labs, schwab at suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

From olh at suse.de Mon Oct 18 04:55:57 2004
From: olh at suse.de (Olaf Hering)
Date: Sun, 17 Oct 2004 20:55:57 +0200
Subject: [PATCH] allow kernel compile with native ppc64 compiler
Message-ID: <20041017185557.GA9619@suse.de>

The zImage is a 32bit binary, but a native powerpc64-linux gcc will
produce 64bit objects in arch/ppc64/boot.
This patch fixes it.
Signed-off-by: Olaf Hering diff -purN linux-2.6.9-final/arch/ppc64/boot/Makefile linux-2.6.9-final.native/arch/ppc64/boot/Makefile --- linux-2.6.9-final/arch/ppc64/boot/Makefile 2004-10-16 03:03:50.000000000 +0000 +++ linux-2.6.9-final.native/arch/ppc64/boot/Makefile 2004-10-17 18:44:33.229249956 +0000 @@ -23,14 +23,14 @@ CROSS32_COMPILE ?= #CROSS32_COMPILE = /usr/local/ppc/bin/powerpc-linux- -BOOTCC := $(CROSS32_COMPILE)gcc +BOOTCC := $(CROSS32_COMPILE)gcc -m32 HOSTCC := gcc BOOTCFLAGS := $(HOSTCFLAGS) $(LINUXINCLUDE) -fno-builtin -BOOTAS := $(CROSS32_COMPILE)as +BOOTAS := $(CROSS32_COMPILE)as -a32 BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -traditional -BOOTLD := $(CROSS32_COMPILE)ld +BOOTLD := $(CROSS32_COMPILE)ld -m elf32ppc BOOTLFLAGS := -Ttext 0x00400000 -e _start -T $(srctree)/$(src)/zImage.lds -BOOTOBJCOPY := $(CROSS32_COMPILE)objcopy +BOOTOBJCOPY := $(CROSS32_COMPILE)objcopy --target elf32-powerpc OBJCOPYFLAGS := contents,alloc,load,readonly,data src-boot := crt0.S string.S prom.c main.c zlib.c imagesize.c div64.S diff -purN linux-2.6.9-final/arch/ppc64/boot/zImage.lds linux-2.6.9-final.native/arch/ppc64/boot/zImage.lds --- linux-2.6.9-final/arch/ppc64/boot/zImage.lds 2004-10-16 03:01:55.000000000 +0000 +++ linux-2.6.9-final.native/arch/ppc64/boot/zImage.lds 2004-10-17 18:48:14.824288338 +0000 @@ -1,4 +1,4 @@ -OUTPUT_ARCH(powerpc) +OUTPUT_ARCH(powerpc:common) SEARCH_DIR(/lib); SEARCH_DIR(/usr/lib); SEARCH_DIR(/usr/local/lib); SEARCH_DIR(/usr/local/powerpc-any-elf/lib); /* Do we need any of these for elf? __DYNAMIC = 0; */ -- USB is for mice, FireWire is for men! 
sUse lINUX ag, nÜRNBERG

From paulus at samba.org Mon Oct 18 07:46:26 2004
From: paulus at samba.org (Paul Mackerras)
Date: Mon, 18 Oct 2004 07:46:26 +1000
Subject: [PATCH] allow kernel compile with native ppc64 compiler
In-Reply-To: <20041017185557.GA9619@suse.de>
References: <20041017185557.GA9619@suse.de>
Message-ID: <16754.59442.992185.715900@cargo.ozlabs.ibm.com>

Olaf Hering writes:

> The zImage is a 32bit binary, but a native powerpc64-linux gcc will
> produce 64bit objects in arch/ppc64/boot.
> This patch fixes it.

... and breaks the compile on older toolchains that don't understand
-m32. We need to make the -m32 conditional on HAS_BIARCH as defined
in arch/ppc64/Makefile.

Paul.

From olh at suse.de Mon Oct 18 14:56:03 2004
From: olh at suse.de (Olaf Hering)
Date: Mon, 18 Oct 2004 06:56:03 +0200
Subject: [PATCH] allow kernel compile with native ppc64 compiler
In-Reply-To: <16754.59442.992185.715900@cargo.ozlabs.ibm.com>
References: <20041017185557.GA9619@suse.de> <16754.59442.992185.715900@cargo.ozlabs.ibm.com>
Message-ID: <20041018045603.GA8500@suse.de>

On Mon, Oct 18, Paul Mackerras wrote:

> Olaf Hering writes:
>
> > The zImage is a 32bit binary, but a native powerpc64-linux gcc will
> > produce 64bit objects in arch/ppc64/boot.
> > This patch fixes it.
>
> ... and breaks the compile on older toolchains that don't understand
> -m32. We need to make the -m32 conditional on HAS_BIARCH as defined
> in arch/ppc64/Makefile.

how old?

--
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

From paulus at samba.org Mon Oct 18 15:55:52 2004
From: paulus at samba.org (Paul Mackerras)
Date: Mon, 18 Oct 2004 15:55:52 +1000
Subject: [PATCH] allow kernel compile with native ppc64 compiler
In-Reply-To: <20041018045603.GA8500@suse.de>
References: <20041017185557.GA9619@suse.de> <16754.59442.992185.715900@cargo.ozlabs.ibm.com> <20041018045603.GA8500@suse.de>
Message-ID: <16755.23272.754150.209624@cargo.ozlabs.ibm.com>

Olaf Hering writes:

> > ...
and breaks the compile on older toolchains that don't understand
> > -m32. We need to make the -m32 conditional on HAS_BIARCH as defined
> > in arch/ppc64/Makefile.
>
> how old?

The gcc that comes with debian sid doesn't understand -m32. That's a
32-bit gcc, which means that I set CROSS_COMPILE when doing a ppc64
kernel compile. With your patch I have to set CROSS32_COMPILE as
well, which seems silly when I'm compiling on a ppc32 box already.

Ben H suggested making the default BOOTCC be $(CC) -m32, which makes
sense to me.

Paul.

From olh at suse.de Mon Oct 18 17:54:33 2004
From: olh at suse.de (Olaf Hering)
Date: Mon, 18 Oct 2004 09:54:33 +0200
Subject: [PATCH] allow kernel compile with native ppc64 compiler
In-Reply-To: <16755.23272.754150.209624@cargo.ozlabs.ibm.com>
References: <20041017185557.GA9619@suse.de> <16754.59442.992185.715900@cargo.ozlabs.ibm.com> <20041018045603.GA8500@suse.de> <16755.23272.754150.209624@cargo.ozlabs.ibm.com>
Message-ID: <20041018075433.GA24927@suse.de>

On Mon, Oct 18, Paul Mackerras wrote:

> Olaf Hering writes:
>
> > > ... and breaks the compile on older toolchains that don't understand
> > > -m32. We need to make the -m32 conditional on HAS_BIARCH as defined
> > > in arch/ppc64/Makefile.
> >
> > how old?
>
> The gcc that comes with debian sid doesn't understand -m32. That's a
> 32-bit gcc, which means that I set CROSS_COMPILE when doing a ppc64
> kernel compile. With your patch I have to set CROSS32_COMPILE as
> well, which seems silly when I'm compiling on a ppc32 box already.

Makes sense, I confused a native powerpc64-linux gcc from last century
with a native/cross powerpc-linux gcc from last century.

> Ben H suggested making the default BOOTCC be $(CC) -m32, which makes
> sense to me.

That may break cross compile.
I will provide a new patch.

--
USB is for mice, FireWire is for men!
sUse lINUX ag, nÜRNBERG

From ananth at in.ibm.com Mon Oct 18 19:52:29 2004
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Mon, 18 Oct 2004 15:22:29 +0530
Subject: [PATCH] Kprobes for ppc64
Message-ID: <20041018095229.GA7394@in.ibm.com>

Hi,

Here is kprobes for ppc64. The patch applies on 2.6.9-rc4/2.6.9-final
and provides the kprobes + jprobes functionality. My earlier post
did not reach the mailing lists, hence this resend.

Kprobes (Kernel dynamic probes) is a lightweight mechanism for kernel
modules to insert probes into a running kernel, without the need to
modify the underlying source. The probe handlers can then be coded
to log relevant data at the probe point. More information on kprobes
can be found at:

http://www-124.ibm.com/developerworks/oss/linux/projects/kprobes/

Jprobes (or jumper probes) is a small infrastructure to access function
arguments. It can be used by defining a small stub with the same
template as the routine in kernel, within which the required parameters
can be logged.

The following pseudocode illustrates the usage of a jprobe, where the
skbuff at tcp_v4_rcv() needs to be decoded:

............
struct jprobe jp;

jtcp_v4_rcv(struct skbuff *skb)
{
	/* decode and log skb related details as required */
	jprobe_return();
	return 0;
}

init_module
{
	jp.kp.addr = (kprobe_opcode_t *);
	jp.entry = JPROBE_ENTRY(jtcp_v4_rcv);
	register_jprobe(&jp);
	return 0;
}

cleanup_module
{
	unregister_jprobe(&jp);
}
............

NOTE:
1. The current implementation uses xmon's emulate_step() and hence
   requires xmon to be compiled in.
2. arch_prepare_kprobe() now returns an int. I have made the necessary
   changes to the i386 and sparc64 kprobes files, but they are untested.
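The reason for note 2 can be modelled outside the kernel. What follows is only a user-space sketch, not the patch: check_probe_insn() is a made-up name standing in for the ppc64 arch_prepare_kprobe(), and it takes a raw instruction word instead of a struct kprobe. The two opcode tests are the IS_MTMSRD/IS_RFID definitions from the include/asm-ppc64/kprobes.h hunk of this patch.

```c
#include <stdint.h>

/* Opcode tests as defined in the include/asm-ppc64/kprobes.h hunk. */
#define IS_MTMSRD(instr)	(((instr) & 0xfc0007fe) == 0x7c000164)
#define IS_RFID(instr)		(((instr) & 0xfc0007fe) == 0x4c000024)

/*
 * Hypothetical user-space stand-in for the ppc64 arch_prepare_kprobe():
 * returns 1 if the instruction must not be replaced by a breakpoint
 * (mtmsrd or rfid), 0 otherwise.  A caller in the style of
 * register_kprobe() would turn the nonzero result into -EINVAL.
 */
static int check_probe_insn(uint32_t insn)
{
	if (IS_MTMSRD(insn) || IS_RFID(insn))
		return 1;
	return 0;
}
```

With a void return there would be no way to tell register_kprobe() that the probe was refused, which is presumably why the signature had to change on all three architectures.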
Thanks, Ananth diff -Naurp temp/linux-2.6.9-rc4/arch/i386/kernel/kprobes.c linux-2.6.9-rc4/arch/i386/kernel/kprobes.c --- temp/linux-2.6.9-rc4/arch/i386/kernel/kprobes.c 2004-10-11 08:27:50.000000000 +0530 +++ linux-2.6.9-rc4/arch/i386/kernel/kprobes.c 2004-10-11 15:30:41.000000000 +0530 @@ -58,9 +58,10 @@ static inline int is_IF_modifier(kprobe_ return 0; } -void arch_prepare_kprobe(struct kprobe *p) +int arch_prepare_kprobe(struct kprobe *p) { memcpy(p->insn, p->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t)); + return 0; } static inline void disarm_kprobe(struct kprobe *p, struct pt_regs *regs) diff -Naurp temp/linux-2.6.9-rc4/arch/ppc64/Kconfig.debug linux-2.6.9-rc4/arch/ppc64/Kconfig.debug --- temp/linux-2.6.9-rc4/arch/ppc64/Kconfig.debug 2004-10-11 08:28:49.000000000 +0530 +++ linux-2.6.9-rc4/arch/ppc64/Kconfig.debug 2004-10-11 15:30:41.000000000 +0530 @@ -6,6 +6,16 @@ config DEBUG_STACKOVERFLOW bool "Check for stack overflows" depends on DEBUG_KERNEL +config KPROBES + bool "Kprobes" + depends on DEBUG_KERNEL + help + Kprobes allows you to trap at almost any kernel address and + execute a callback function. register_kprobe() establishes + a probepoint and specifies the callback. Kprobes is useful + for kernel debugging, non-intrusive instrumentation and testing. + If in doubt, say "N". 
+ config DEBUG_STACK_USAGE bool "Stack utilization instrumentation" depends on DEBUG_KERNEL diff -Naurp temp/linux-2.6.9-rc4/arch/ppc64/kernel/kprobes.c linux-2.6.9-rc4/arch/ppc64/kernel/kprobes.c --- temp/linux-2.6.9-rc4/arch/ppc64/kernel/kprobes.c 1970-01-01 05:30:00.000000000 +0530 +++ linux-2.6.9-rc4/arch/ppc64/kernel/kprobes.c 2004-10-11 15:30:41.000000000 +0530 @@ -0,0 +1,260 @@ +/* + * Kernel Probes (KProbes) + * arch/ppc64/kernel/kprobes.c + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) IBM Corporation, 2002, 2004 + * + * 2002-Oct Created by Vamsi Krishna S Kernel + * Probes initial implementation ( includes contributions from + * Rusty Russell). + * 2004-July Suparna Bhattacharya added jumper probes + * interface to access function arguments. 
+ * 2004-Oct Ananth N Mavinakayanahalli kprobes port + * for PPC64 + */ + +#include +#include +#include +#include +#include +#include + +/* kprobe_status settings */ +#define KPROBE_HIT_ACTIVE 0x00000001 +#define KPROBE_HIT_SS 0x00000002 + +static struct kprobe *current_kprobe; +static unsigned long kprobe_status, kprobe_saved_msr; +static struct pt_regs jprobe_saved_regs; + +/* we re-use xmon's emulate_step here */ +extern int emulate_step(struct pt_regs *regs, unsigned int instr); + +int arch_prepare_kprobe(struct kprobe *p) +{ + memcpy(p->insn, p->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t)); + if (IS_MTMSRD(p->insn[0]) || IS_RFID(p->insn[0])) + /* cannot put bp on RFID/MTMSRD */ + return 1; + return 0; +} + +static inline void disarm_kprobe(struct kprobe *p, struct pt_regs *regs) +{ + *p->addr = p->opcode; + regs->nip = (unsigned long)p->addr; +} + +static inline void prepare_singlestep(struct kprobe *p, struct pt_regs *regs) +{ + regs->msr |= MSR_SE; + regs->nip = (unsigned long)&p->insn; +} + +/* + * Interrupts are disabled on entry as trap3 is an interrupt gate and they + * remain disabled thorough out this function. + */ +static inline int kprobe_handler(struct pt_regs *regs) +{ + struct kprobe *p; + int ret = 0; + unsigned int *addr = (unsigned int *)regs->nip; + + /* We're in an interrupt, but this is clear and BUG()-safe. */ + preempt_disable(); + + /* Check we're not actually recursing */ + if (kprobe_running()) { + /* We *are* holding lock here, so this is safe. + Disarm the probe we just hit, and ignore it. */ + p = get_kprobe(addr); + if (p) { + disarm_kprobe(p, regs); + ret = 1; + } else { + p = current_kprobe; + if (p->break_handler && p->break_handler(p, regs)) { + goto ss_probe; + } + } + /* If it's not ours, can't be delete race, (we hold lock). 
*/ + goto no_kprobe; + } + + lock_kprobes(); + p = get_kprobe(addr); + if (!p) { + unlock_kprobes(); + if (*addr != BREAKPOINT_INSTRUCTION) { + /* + * The breakpoint instruction was removed right + * after we hit it. Another cpu has removed + * either a probepoint or a debugger breakpoint + * at this address. In either case, no further + * handling of this interrupt is appropriate. + */ + ret = 1; + } + /* Not one of ours: let kernel handle it */ + goto no_kprobe; + } + + kprobe_status = KPROBE_HIT_ACTIVE; + current_kprobe = p; + kprobe_saved_msr = regs->msr; + if (p->pre_handler(p, regs)) { + /* handler has already set things up, so skip ss setup */ + return 1; + } + +ss_probe: + prepare_singlestep(p, regs); + kprobe_status = KPROBE_HIT_SS; + return 1; + +no_kprobe: + preempt_enable_no_resched(); + return ret; +} + +/* + * Called after single-stepping. p->addr is the address of the + * instruction whose first byte has been replaced by the "breakpoint" + * instruction. To avoid the SMP problems that can occur when we + * temporarily put back the original opcode to single-step, we + * single-stepped a copy of the instruction. The address of this + * copy is p->insn. + */ +static void resume_execution(struct kprobe *p, struct pt_regs *regs) +{ + int ret; + + regs->nip = (unsigned long)p->addr; + ret = emulate_step(regs, p->insn[0]); + if (ret == 0) + regs->nip = (unsigned long)p->addr + 4; + + regs->msr &= ~MSR_SE; +} + +static inline int post_kprobe_handler(struct pt_regs *regs) +{ + if (!kprobe_running()) + return 0; + + if (current_kprobe->post_handler) + current_kprobe->post_handler(current_kprobe, regs, 0); + + resume_execution(current_kprobe, regs); + regs->msr |= kprobe_saved_msr; + + unlock_kprobes(); + preempt_enable_no_resched(); + + /* + * if somebody else is singlestepping across a probe point, msr + * will have SE set, in which case, continue the remaining processing + * of do_debug, as if this is not a probe hit. 
+ */ + if (regs->msr & MSR_SE) + return 0; + + return 1; +} + +/* Interrupts disabled, kprobe_lock held. */ +static inline int kprobe_fault_handler(struct pt_regs *regs, int trapnr) +{ + if (current_kprobe->fault_handler + && current_kprobe->fault_handler(current_kprobe, regs, trapnr)) + return 1; + + if (kprobe_status & KPROBE_HIT_SS) { + resume_execution(current_kprobe, regs); + regs->msr |= kprobe_saved_msr; + + unlock_kprobes(); + preempt_enable_no_resched(); + } + return 0; +} + +/* + * Wrapper routine to for handling exceptions. + */ +int kprobe_exceptions_notify(struct notifier_block *self, unsigned long val, + void *data) +{ + struct die_args *args = (struct die_args *)data; + switch (val) { + case DIE_IABR_MATCH: + case DIE_DABR_MATCH: + case DIE_BPT: + if (kprobe_handler(args->regs)) + return NOTIFY_STOP; + break; + case DIE_SSTEP: + if (post_kprobe_handler(args->regs)) + return NOTIFY_STOP; + break; + case DIE_GPF: + case DIE_PAGE_FAULT: + if (kprobe_running() && + kprobe_fault_handler(args->regs, args->trapnr)) + return NOTIFY_STOP; + break; + default: + break; + } + return NOTIFY_DONE; +} + +int setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs) +{ + struct jprobe *jp = container_of(p, struct jprobe, kp); + + memcpy(&jprobe_saved_regs, regs, sizeof(struct pt_regs)); + + /* setup return addr to the jprobe handler routine */ + regs->nip = (unsigned long)(((func_descr_t *)jp->entry)->entry); + regs->gpr[2] = (unsigned long)(((func_descr_t *)jp->entry)->toc); + + return 1; +} + +void jprobe_return(void) +{ + preempt_enable_no_resched(); + asm volatile("trap" ::: "memory"); +} + +void jprobe_return_end(void) +{ +}; + +int longjmp_break_handler(struct kprobe *p, struct pt_regs *regs) +{ + /* + * FIXME - we should ideally be validating that we got here 'cos + * of the "trap" in jprobe_return() above, before restoring the + * saved regs... 
+ */ + memcpy(regs, &jprobe_saved_regs, sizeof(struct pt_regs)); + return 1; +} diff -Naurp temp/linux-2.6.9-rc4/arch/ppc64/kernel/Makefile linux-2.6.9-rc4/arch/ppc64/kernel/Makefile --- temp/linux-2.6.9-rc4/arch/ppc64/kernel/Makefile 2004-10-11 08:28:50.000000000 +0530 +++ linux-2.6.9-rc4/arch/ppc64/kernel/Makefile 2004-10-11 15:30:41.000000000 +0530 @@ -56,5 +56,6 @@ obj-$(CONFIG_PPC_PMAC) += pmac_smp.o sm endif obj-$(CONFIG_ALTIVEC) += vecemu.o vector.o +obj-$(CONFIG_KPROBES) += kprobes.o CFLAGS_ioctl32.o += -Ifs/ diff -Naurp temp/linux-2.6.9-rc4/arch/ppc64/kernel/traps.c linux-2.6.9-rc4/arch/ppc64/kernel/traps.c --- temp/linux-2.6.9-rc4/arch/ppc64/kernel/traps.c 2004-10-11 08:27:59.000000000 +0530 +++ linux-2.6.9-rc4/arch/ppc64/kernel/traps.c 2004-10-11 15:30:41.000000000 +0530 @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -61,6 +62,20 @@ EXPORT_SYMBOL(__debugger_dabr_match); EXPORT_SYMBOL(__debugger_fault_handler); #endif +struct notifier_block *ppc64_die_chain; +static spinlock_t die_notifier_lock = SPIN_LOCK_UNLOCKED; + +int register_die_notifier(struct notifier_block *nb) +{ + int err = 0; + unsigned long flags; + + spin_lock_irqsave(&die_notifier_lock, flags); + err = notifier_chain_register(&ppc64_die_chain, nb); + spin_unlock_irqrestore(&die_notifier_lock, flags); + return err; +} + /* * Trap & Exception support */ @@ -287,6 +302,9 @@ UnknownException(struct pt_regs *regs) void InstructionBreakpointException(struct pt_regs *regs) { + if (notify_die(DIE_BPT, "iabr_match", regs, 5, + 5, SIGTRAP) == NOTIFY_STOP) + return; if (debugger_iabr_match(regs)) return; _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip); @@ -297,6 +315,9 @@ SingleStepException(struct pt_regs *regs { regs->msr &= ~MSR_SE; /* Turn off 'trace' bit */ + if (notify_die(DIE_SSTEP, "single_step", regs, 5, + 5, SIGTRAP) == NOTIFY_STOP) + return; if (debugger_sstep(regs)) return; @@ -470,6 +491,9 @@ ProgramCheckException(struct pt_regs *re } else if (regs->msr & 
0x20000) { /* trap exception */ + if (notify_die(DIE_BPT, "breakpoint", regs, 5, + 5, SIGTRAP) == NOTIFY_STOP) + return; if (debugger_bpt(regs)) return; diff -Naurp temp/linux-2.6.9-rc4/arch/ppc64/mm/fault.c linux-2.6.9-rc4/arch/ppc64/mm/fault.c --- temp/linux-2.6.9-rc4/arch/ppc64/mm/fault.c 2004-10-11 08:28:24.000000000 +0530 +++ linux-2.6.9-rc4/arch/ppc64/mm/fault.c 2004-10-11 15:30:41.000000000 +0530 @@ -36,6 +36,7 @@ #include #include #include +#include /* * Check whether the instruction at regs->nip is a store using @@ -96,6 +97,9 @@ int do_page_fault(struct pt_regs *regs, BUG_ON((trap == 0x380) || (trap == 0x480)); if (trap == 0x300) { + if (notify_die(DIE_PAGE_FAULT, "page_fault", regs, error_code, + 11, SIGSEGV) == NOTIFY_STOP) + return 0; if (debugger_fault_handler(regs)) return 0; } @@ -105,6 +109,9 @@ int do_page_fault(struct pt_regs *regs, return SIGSEGV; if (error_code & 0x00400000) { + if (notify_die(DIE_BPT, "dabr_match", regs, error_code, + 11, SIGSEGV) == NOTIFY_STOP) + return 0; if (debugger_dabr_match(regs)) return 0; } diff -Naurp temp/linux-2.6.9-rc4/arch/ppc64/xmon/xmon.c linux-2.6.9-rc4/arch/ppc64/xmon/xmon.c --- temp/linux-2.6.9-rc4/arch/ppc64/xmon/xmon.c 2004-10-11 08:28:48.000000000 +0530 +++ linux-2.6.9-rc4/arch/ppc64/xmon/xmon.c 2004-10-11 15:30:41.000000000 +0530 @@ -132,7 +132,7 @@ static void csum(void); static void bootcmds(void); void dump_segments(void); static void symbol_lookup(void); -static int emulate_step(struct pt_regs *regs, unsigned int instr); +int emulate_step(struct pt_regs *regs, unsigned int instr); static void xmon_print_symbol(unsigned long address, const char *mid, const char *after); static const char *getvecname(unsigned long vec); @@ -781,7 +781,7 @@ static int branch_taken(unsigned int ins * or -1 if the instruction is one that should not be stepped, * such as an rfid, or a mtmsrd that would clear MSR_RI. 
*/ -static int emulate_step(struct pt_regs *regs, unsigned int instr) +int emulate_step(struct pt_regs *regs, unsigned int instr) { unsigned int opcode, rd; unsigned long int imm; diff -Naurp temp/linux-2.6.9-rc4/arch/sparc64/kernel/kprobes.c linux-2.6.9-rc4/arch/sparc64/kernel/kprobes.c --- temp/linux-2.6.9-rc4/arch/sparc64/kernel/kprobes.c 2004-10-11 08:28:49.000000000 +0530 +++ linux-2.6.9-rc4/arch/sparc64/kernel/kprobes.c 2004-10-11 15:30:41.000000000 +0530 @@ -38,10 +38,11 @@ * - Mark that we are no longer actively in a kprobe. */ -void arch_prepare_kprobe(struct kprobe *p) +int arch_prepare_kprobe(struct kprobe *p) { p->insn[0] = *p->addr; p->insn[1] = BREAKPOINT_INSTRUCTION_2; + return 0; } /* kprobe_status settings */ diff -Naurp temp/linux-2.6.9-rc4/include/asm-i386/kprobes.h linux-2.6.9-rc4/include/asm-i386/kprobes.h --- temp/linux-2.6.9-rc4/include/asm-i386/kprobes.h 2004-10-11 08:28:07.000000000 +0530 +++ linux-2.6.9-rc4/include/asm-i386/kprobes.h 2004-10-11 19:28:07.000000000 +0530 @@ -38,6 +38,8 @@ typedef u8 kprobe_opcode_t; ? (MAX_STACK_SIZE) \ : (((unsigned long)current_thread_info()) + THREAD_SIZE - (ADDR))) +#define JPROBE_ENTRY(pentry) (kprobe_opcode_t *)pentry + /* trap3/1 are intr gates for kprobes. So, restore the status of IF, * if necessary, before executing the original int3/1 (trap) handler. */ diff -Naurp temp/linux-2.6.9-rc4/include/asm-ppc64/kdebug.h linux-2.6.9-rc4/include/asm-ppc64/kdebug.h --- temp/linux-2.6.9-rc4/include/asm-ppc64/kdebug.h 1970-01-01 05:30:00.000000000 +0530 +++ linux-2.6.9-rc4/include/asm-ppc64/kdebug.h 2004-10-11 15:30:41.000000000 +0530 @@ -0,0 +1,43 @@ +#ifndef _PPC64_KDEBUG_H +#define _PPC64_KDEBUG_H 1 + +/* nearly identical to x86_64/i386 code */ + +#include + +struct pt_regs; + +struct die_args { + struct pt_regs *regs; + const char *str; + long err; + int trapnr; + int signr; +}; + +/* + Note - you should never unregister because that can race with NMIs. 
+ If you really want to do it first unregister - then synchronize_kernel - + then free. + */ +int register_die_notifier(struct notifier_block *nb); +extern struct notifier_block *ppc64_die_chain; + +/* Grossly misnamed. */ +enum die_val { + DIE_OOPS = 1, + DIE_IABR_MATCH, + DIE_DABR_MATCH, + DIE_BPT, + DIE_SSTEP, + DIE_GPF, + DIE_PAGE_FAULT, +}; + +static inline int notify_die(enum die_val val,char *str,struct pt_regs *regs,long err,int trap, int sig) +{ + struct die_args args = { .regs=regs, .str=str, .err=err, .trapnr=trap,.signr=sig }; + return notifier_call_chain(&ppc64_die_chain, val, &args); +} + +#endif diff -Naurp temp/linux-2.6.9-rc4/include/asm-ppc64/kprobes.h linux-2.6.9-rc4/include/asm-ppc64/kprobes.h --- temp/linux-2.6.9-rc4/include/asm-ppc64/kprobes.h 1970-01-01 05:30:00.000000000 +0530 +++ linux-2.6.9-rc4/include/asm-ppc64/kprobes.h 2004-10-12 22:57:04.000000000 +0530 @@ -0,0 +1,53 @@ +#ifndef _ASM_KPROBES_H +#define _ASM_KPROBES_H +/* + * Kernel Probes (KProbes) + * include/asm-ppc64/kprobes.h + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) IBM Corporation, 2002, 2004 + * + * 2002-Oct Created by Vamsi Krishna S Kernel + * Probes initial implementation ( includes suggestions from + * Rusty Russell). 
+ * 2004-Oct Modified for PPC64 by Ananth N Mavinakayanahalli + * + */ +#include +#include + +struct pt_regs; + +typedef unsigned int kprobe_opcode_t; +#define BREAKPOINT_INSTRUCTION 0x7fe00008 /* trap */ +#define MAX_INSN_SIZE 1 + +#define IS_MTMSRD(instr) (((instr) & 0xfc0007fe) == 0x7c000164) +#define IS_RFID(instr) (((instr) & 0xfc0007fe) == 0x4c000024) + +#define JPROBE_ENTRY(pentry) (kprobe_opcode_t *)((func_descr_t *)pentry) + +#ifdef CONFIG_KPROBES +extern int kprobe_exceptions_notify(struct notifier_block *self, + unsigned long val, void *data); +#else /* !CONFIG_KPROBES */ +static inline int kprobe_exceptions_notify(struct notifier_block *self, + unsigned long val, void *data) +{ + return 0; +} +#endif +#endif /* _ASM_KPROBES_H */ diff -Naurp temp/linux-2.6.9-rc4/include/linux/kprobes.h linux-2.6.9-rc4/include/linux/kprobes.h --- temp/linux-2.6.9-rc4/include/linux/kprobes.h 2004-10-11 08:27:16.000000000 +0530 +++ linux-2.6.9-rc4/include/linux/kprobes.h 2004-10-11 15:30:41.000000000 +0530 @@ -94,7 +94,7 @@ static inline int kprobe_running(void) return kprobe_cpu == smp_processor_id(); } -extern void arch_prepare_kprobe(struct kprobe *p); +extern int arch_prepare_kprobe(struct kprobe *p); extern void show_registers(struct pt_regs *regs); /* Get the kprobe at this addr (if any). Must have called lock_kprobes */ diff -Naurp temp/linux-2.6.9-rc4/kernel/kprobes.c linux-2.6.9-rc4/kernel/kprobes.c --- temp/linux-2.6.9-rc4/kernel/kprobes.c 2004-10-11 08:29:12.000000000 +0530 +++ linux-2.6.9-rc4/kernel/kprobes.c 2004-10-11 15:30:41.000000000 +0530 @@ -27,6 +27,8 @@ * interface to access function arguments. * 2004-Sep Prasanna S Panchamukhi Changed Kprobes * exceptions notifier to be first on the priority list. + * 2004-Oct Ananth N Mavinakayanahalli + * arch_prepare_kprobe now returns an int. 
*/ #include #include @@ -87,12 +89,17 @@ int register_kprobe(struct kprobe *p) hlist_add_head(&p->hlist, &kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]); - arch_prepare_kprobe(p); + ret = arch_prepare_kprobe(p); + if (ret) { + unregister_kprobe(p); + ret = -EINVAL; + goto out; + } p->opcode = *p->addr; *p->addr = BREAKPOINT_INSTRUCTION; flush_icache_range((unsigned long) p->addr, (unsigned long) p->addr + sizeof(kprobe_opcode_t)); - out: +out: spin_unlock_irqrestore(&kprobe_lock, flags); return ret; } From segher at kernel.crashing.org Mon Oct 18 19:55:26 2004 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Mon, 18 Oct 2004 11:55:26 +0200 Subject: [vHype-discussion] u64 in linux In-Reply-To: <200410152059.03647.arnd@arndb.de> References: <1097849471.25095.97.camel@brick.watson.ibm.com> <16751.62061.393716.650492@kitch0.watson.ibm.com> <200410152059.03647.arnd@arndb.de> Message-ID: > C99 also mandates that the macro PRIu64 contains the correct > format string for uint64_t (which afaik is always the same as u64). > It's currently not defined in linux, but could perhaps be added. Works fine for me: #include char x[] = PRIx64; char u[] = PRIu64; resulting in .globl u .section ".data" .align 3 .type u, @object .size u, 3 u: .string "lu" .globl x .align 3 .type x, @object .size x, 3 x: .string "lx" (this is on a PPC64 system, GCC 3.4.1). 
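For completeness, a minimal sketch of how these macros are normally used, assuming a hosted C99 user-space environment (as noted in the quoted text, the kernel itself does not define them). The helper names u64_dec() and u64_hex() are invented for this example.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical helpers: format a 64-bit value into a static buffer.
 * The PRIu64/PRIx64 string literals are pasted into the format string
 * by the compiler, so the same source is correct whether uint64_t is
 * "unsigned long" (as on PPC64) or "unsigned long long".
 * Not reentrant: each call overwrites the previous result.
 */
static const char *u64_dec(uint64_t v)
{
	static char buf[32];

	snprintf(buf, sizeof(buf), "%" PRIu64, v);
	return buf;
}

static const char *u64_hex(uint64_t v)
{
	static char buf[32];

	snprintf(buf, sizeof(buf), "%" PRIx64, v);
	return buf;
}
```

For example, u64_dec(UINT64_MAX) yields "18446744073709551615" without any cast or architecture-specific length modifier in the caller.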
Segher

From benh at kernel.crashing.org Mon Oct 18 20:58:27 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Mon, 18 Oct 2004 20:58:27 +1000
Subject: [PATCH] allow kernel compile with native ppc64 compiler
In-Reply-To: <20041018075433.GA24927@suse.de>
References: <20041017185557.GA9619@suse.de> <16754.59442.992185.715900@cargo.ozlabs.ibm.com> <20041018045603.GA8500@suse.de> <16755.23272.754150.209624@cargo.ozlabs.ibm.com> <20041018075433.GA24927@suse.de>
Message-ID: <1098097106.30570.6.camel@gaston>

On Mon, 2004-10-18 at 17:54, Olaf Hering wrote:
>
> > Ben H suggested making the default BOOTCC be $(CC) -m32, which makes
> > sense to me.

How so ? The idea is to add -m32 to whatever compiler you are using for
the rest of the kernel (assuming bi-arch) which is a lot more sane than
using whatever _local_ compiler you are using _and_ assuming bi-arch...

Of course, that would only be the "default", with the ability of
explicitly passing CROSS32_COMPILE to make...

Ben.

From nathanl at austin.ibm.com Tue Oct 19 00:46:44 2004
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Mon, 18 Oct 2004 09:46:44 -0500
Subject: [RFC] maxcpus boot option leads to dropped interrupts
Message-ID: <1098110804.3165.63.camel@biclops>

Hi-

Our test group has discovered that booting a 2.6 kernel on a SMP pSeries
LPAR with maxcpus=1 will either hang or take a very long time to boot,
with lots of dropped interrupt messages or scsi timeouts, e.g.

Probing IDE interface ide2...
hde: IBM DROM00205, ATAPI CD/DVD-ROM drive
Using cfq io scheduler
ide2 at 0xfe400-0xfe407,0xfdc02 on irq 166
Probing IDE interface ide3...
Probing IDE interface ide3...
hde: ATAPI 24X DVD-ROM drive, 256kB Cache
Uniform CD-ROM driver Revision: 3.20
ide-cd: cmd 0x25 timed out
hde: lost interrupt
hde: lost interrupt

The problem goes away if CONFIG_IRQ_ALL_CPUS is not set.

I am about 85% sure that this is due to the OF "start-cpu" method
placing the primary threads of secondary cpus in the global interrupt
queue (see the comment in arch/ppc64/kernel/smp.c::start_secondary).
With the maxcpus parameter, we never "boot" those cpus; they simply sit
in their spin loops waiting to be kicked. However, from the platform's
point of view they are fair game to service device interrupts.

The RTAS "start-cpu" method apparently does not behave the same way -- I
can boot a single CPU (with SMT) Power5 LPAR with maxcpus=1 and
interrupts are not lost, even though the secondary thread on the boot
cpu has been started by RTAS. So this problem is limited to systems
which have more than one cpu device node.

I've worked around the problem by modifying the xics code to use the
default interrupt server (the boot cpu) if cpu_online_map !=
cpu_present_map. However that's a nasty hack which will keep interrupts
from being distributed in the smt-enabled=off case. I'm not sure
whether this happens on non-xics machines.

I'm looking for ideas on how to handle this. Some options that occur to
me are:

o Not booting secondary cpus from the OF client code (but the PPC-OF
  binding document says we can't do this). I believe I've tried this
  before, and RTAS was unable to start the secondary cpus later. So
  this is probably not the way to go.

o In smp_cpus_done(), "shoot down" any cpus which have not been kicked
  out of their spin loops. I've got a very rough version of this
  working. However, this method assumes that the RTAS "stop-cpu"
  interface is available, which is a given on LPAR, but I'm not sure
  it's a safe bet on other systems.

o Directing interrupts to the boot cpu instead of using the GIQ when the
  maxcpus option is detected. This might be the easiest alternative;
  however this could have a performance impact.

Any other ideas? Keep in mind that I would like to get the code to a
state which will allow us to hotplug-online cpus which were not started
at boot.

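The xics workaround mentioned above boils down to one comparison. This is a user-space sketch only, not the actual xics change: pick_irq_server() and GLOBAL_QUEUE_SERVER are invented names, and plain 64-bit masks stand in for cpu_online_map and cpu_present_map.

```c
#include <stdint.h>

/* Invented placeholder meaning "let the global interrupt queue decide". */
#define GLOBAL_QUEUE_SERVER	0xffu

/*
 * Hypothetical model of the workaround: if any present cpu is not
 * online, route every interrupt to the boot cpu, because an offline
 * cpu sitting in its spin loop could otherwise be handed interrupts
 * it will never service.
 */
static unsigned int pick_irq_server(uint64_t online_map,
				    uint64_t present_map,
				    unsigned int boot_cpu)
{
	if (online_map != present_map)
		return boot_cpu;
	return GLOBAL_QUEUE_SERVER;
}
```

The model also shows the drawback: with smt-enabled=off the secondary threads are present but not online, so the maps differ and all interrupts land on the boot cpu even though the other primary threads could take them.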
Nathan From olof at austin.ibm.com Tue Oct 19 05:40:27 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Mon, 18 Oct 2004 14:40:27 -0500 Subject: [PATCH] [PPC64] Fix CPU numa init code thinkos Message-ID: <20041018194027.GA11753@4> There seems to have been a couple of thinkos in the NUMA init code, in particular in find_cpu_node(): * Property size returned is in bytes, not words * Off-by-one error in loop iteration Signed-off-by: Nathan Lynch Signed-off-by: Olof Johansson --- linux-2.5-olof/arch/ppc64/mm/numa.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletion(-) diff -puN arch/ppc64/mm/numa.c~find-cpu-node arch/ppc64/mm/numa.c --- linux-2.5/arch/ppc64/mm/numa.c~find-cpu-node 2004-10-18 14:21:55.603312384 -0500 +++ linux-2.5-olof/arch/ppc64/mm/numa.c 2004-10-18 14:22:19.271552232 -0500 @@ -75,9 +75,11 @@ static struct device_node * __init find_ interrupt_server = (unsigned int *)get_property(cpu_node, "ibm,ppc-interrupt-server#s", &len); + len = len / sizeof(u32); + if (interrupt_server && (len > 0)) { while (len--) { - if (interrupt_server[len-1] == hw_cpuid) + if (interrupt_server[len] == hw_cpuid) return cpu_node; } } else { _ From benh at kernel.crashing.org Tue Oct 19 09:28:42 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 19 Oct 2004 09:28:42 +1000 Subject: ppc64 breakage. In-Reply-To: <20041018222658.GA31577@redhat.com> References: <20041018222658.GA31577@redhat.com> Message-ID: <1098142122.18687.35.camel@gaston> On Tue, 2004-10-19 at 08:26, Dave Jones wrote: > hey guys, > > During a build for an iseries kernel, it blew up with .. > > arch/ppc64/kernel/built-in.o(.text+0x1cd5c): In function `ioport_map': > arch/ppc64/kernel/iomap.c:84: undefined reference to `._IO_IS_VALID' > make: *** [.tmp_vmlinux1] Error 1 > > Ideas ? > > The '.' looks odd. Toolchain bug ? No, it's my fault and I hate iSeries ! 
You can't do anything in this arch without breaking it :( Why do these systematically pop up just when linus released the new kernel ? Grrrrrr Anton/Paul, do iSeries has any kind of PIO on PCI at all ? Should I do #define _IO_IS_VALID(port) (0) or #define _IO_IS_VALID(port) (1) For iSeries ? Ben. From benh at kernel.crashing.org Tue Oct 19 09:33:56 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 19 Oct 2004 09:33:56 +1000 Subject: ppc64 build failure. In-Reply-To: <20041018230529.GB31577@redhat.com> References: <20041018230529.GB31577@redhat.com> Message-ID: <1098142435.18679.38.camel@gaston> On Tue, 2004-10-19 at 09:05, Dave Jones wrote: > Ignore previous mail, this should fix it. > > .../... What about this one instead ? io_page_mask is set to 0 by default, so iSeries would automatically get _IO_IS_VALID(*) == 0 if it doesn't initialize it... ===== include/asm-ppc64/eeh.h 1.20 vs edited ===== --- 1.20/include/asm-ppc64/eeh.h 2004-10-06 16:05:23 +10:00 +++ edited/include/asm-ppc64/eeh.h 2004-10-19 09:31:54 +10:00 @@ -256,10 +256,6 @@ #undef EEH_CHECK_ALIGN -#define MAX_ISA_PORT 0x10000 -extern unsigned long io_page_mask; -#define _IO_IS_VALID(port) ((port) >= MAX_ISA_PORT || (1 << (port>>PAGE_SHIFT)) & io_page_mask) - static inline u8 eeh_inb(unsigned long port) { u8 val; if (!_IO_IS_VALID(port)) ===== include/asm-ppc64/io.h 1.22 vs edited ===== --- 1.22/include/asm-ppc64/io.h 2004-09-21 19:14:10 +10:00 +++ edited/include/asm-ppc64/io.h 2004-10-19 09:32:20 +10:00 @@ -33,6 +33,12 @@ extern unsigned long isa_io_base; extern unsigned long pci_io_base; +extern unsigned long io_page_mask; + +#define MAX_ISA_PORT 0x10000 + +#define _IO_IS_VALID(port) ((port) >= MAX_ISA_PORT || (1 << (port>>PAGE_SHIFT)) \ + & io_page_mask) #ifdef CONFIG_PPC_ISERIES /* __raw_* accessors aren't supported on iSeries */ From benh at kernel.crashing.org Tue Oct 19 11:03:29 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 19 Oct 2004 
11:03:29 +1000 Subject: ppc64 build failure. In-Reply-To: <1098142435.18679.38.camel@gaston> References: <20041018230529.GB31577@redhat.com> <1098142435.18679.38.camel@gaston> Message-ID: <1098147809.11402.0.camel@gaston> On Tue, 2004-10-19 at 09:33, Benjamin Herrenschmidt wrote: > On Tue, 2004-10-19 at 09:05, Dave Jones wrote: > > Ignore previous mail, this should fix it. > > > > .../... > > What about this one instead ? io_page_mask is set to 0 by default, > so iSeries would automatically get _IO_IS_VALID(*) == 0 if it doesn't > initialize it... OK, since nobody seem to really know what IO cycles are on iSeries, let's allow them rather than mask them, thus falling back to the former behaviour... ===== include/asm-ppc64/eeh.h 1.20 vs edited ===== --- 1.20/include/asm-ppc64/eeh.h 2004-10-06 16:05:23 +10:00 +++ edited/include/asm-ppc64/eeh.h 2004-10-19 09:31:54 +10:00 @@ -256,10 +256,6 @@ #undef EEH_CHECK_ALIGN -#define MAX_ISA_PORT 0x10000 -extern unsigned long io_page_mask; -#define _IO_IS_VALID(port) ((port) >= MAX_ISA_PORT || (1 << (port>>PAGE_SHIFT)) & io_page_mask) - static inline u8 eeh_inb(unsigned long port) { u8 val; if (!_IO_IS_VALID(port)) ===== include/asm-ppc64/io.h 1.22 vs edited ===== --- 1.22/include/asm-ppc64/io.h 2004-09-21 19:14:10 +10:00 +++ edited/include/asm-ppc64/io.h 2004-10-19 09:32:20 +10:00 @@ -33,6 +33,12 @@ extern unsigned long isa_io_base; extern unsigned long pci_io_base; +extern unsigned long io_page_mask; + +#define MAX_ISA_PORT 0x10000 + +#define _IO_IS_VALID(port) ((port) >= MAX_ISA_PORT || (1 << (port>>PAGE_SHIFT)) \ + & io_page_mask) #ifdef CONFIG_PPC_ISERIES /* __raw_* accessors aren't supported on iSeries */ ===== arch/ppc64/kernel/iSeries_pci.c 1.24 vs edited ===== --- 1.24/arch/ppc64/kernel/iSeries_pci.c 2004-09-11 15:50:12 +10:00 +++ edited/arch/ppc64/kernel/iSeries_pci.c 2004-10-19 11:02:20 +10:00 @@ -55,6 +55,7 @@ extern unsigned long iSeries_Base_Io_Memory; extern struct iommu_table *tceTables[256]; +extern unsigned long 
io_page_mask; extern void iSeries_MmIoTest(void); @@ -196,6 +197,7 @@ PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_init Entry.\n"); iSeries_IoMmTable_Initialize(); find_and_init_phbs(); + io_page_mask = -1; /* pci_assign_all_busses = 0; SFRXXX*/ PPCDBG(PPCDBG_BUSWALK, "iSeries_pcibios_init Exit.\n"); } From benh at kernel.crashing.org Tue Oct 19 18:28:20 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Tue, 19 Oct 2004 18:28:20 +1000 Subject: [PATCH] generic irq subsystem: ppc64 port In-Reply-To: <200410190714.i9J7Elnx027734@hera.kernel.org> References: <200410190714.i9J7Elnx027734@hera.kernel.org> Message-ID: <1098174500.11449.65.camel@gaston> Hi ! That patch will unfortunately break a load of ppc64 boxes. If you look closely at the ppc64 code, you'll notice we don't use the irq_desc array directly but go through a get_irq_desc() accessor. This is because our interrupt numbers can be very large and scattered, and thus we have a remapping tree. I still like the idea of the patch, so it would be useful if you added the possibility for us to just change that behaviour, that is replace all occurrences of irq_descs + i with get_irq_desc() and provide a generic one that just does that, with a #ifndef so that the architecture can provide its own. If you agree with the principle, though, I suppose I can do it and send a proposed patch tomorrow. Ben. From hch at infradead.org Tue Oct 19 18:41:32 2004 From: hch at infradead.org (Christoph Hellwig) Date: Tue, 19 Oct 2004 09:41:32 +0100 Subject: [PATCH] generic irq subsystem: ppc64 port In-Reply-To: <1098174500.11449.65.camel@gaston> References: <200410190714.i9J7Elnx027734@hera.kernel.org> <1098174500.11449.65.camel@gaston> Message-ID: <20041019084131.GA7100@infradead.org> On Tue, Oct 19, 2004 at 06:28:20PM +1000, Benjamin Herrenschmidt wrote: > Hi ! > > That patch will unfortunately break a load of ppc64 boxes.
> > If you look closely at the ppc64 code, you'll notice we don't > use the irq_desc array directly but go through a get_irq_desc() > accessor. This is because our interrupt numbers can be very > large and scattered, and thus we have a remapping tree. > > I still like the idea of the patch, so it would be useful if > you added the possibility for us to just change that behaviour, > that is replace all occurrences of irq_descs + i with get_irq_desc() > and provide a generic one that just does that, with a #ifndef so > that the architecture can provide its own. > > If you agree with the principle, though, I suppose I can do it > and send a proposed patch tomorrow. The PPC64 changes were actually my fault. I think get_irq_desc() is okay. From mingo at elte.hu Tue Oct 19 19:15:57 2004 From: mingo at elte.hu (Ingo Molnar) Date: Tue, 19 Oct 2004 11:15:57 +0200 Subject: [PATCH] generic irq subsystem: ppc64 port In-Reply-To: <1098174500.11449.65.camel@gaston> References: <200410190714.i9J7Elnx027734@hera.kernel.org> <1098174500.11449.65.camel@gaston> Message-ID: <20041019091557.GA17473@elte.hu> * Benjamin Herrenschmidt wrote: > I still like the idea of the patch, so it would be useful if you added > the possibility for us to just change that behaviour, that is replace > all occurrences of irq_descs + i with get_irq_desc() and provide a > generic one that just does that, with a #ifndef so that the > architecture can provide its own. sure, we could do that. But since there are other architectures with large irq-vector spaces too, you might want to try to move it into the generic IRQ code and just provide a way to switch between 1:1 mapped and sparse-mapped variants. (of course this still means all of the direct indexing in kernel/irq/*.c would have to change.)
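For illustration only, the accessor-with-override idea discussed in this thread might look roughly like the sketch below. It is not the kernel's code: NR_IRQS is a made-up size, struct irq_desc is cut down to a single field, and count_marked_irqs() is a hypothetical caller standing in for the generic code that would otherwise index the array directly.

```c
/* Illustrative size only; not the kernel's value. */
#define NR_IRQS 16

/* Cut-down descriptor; the real structure has many more fields. */
struct irq_desc {
	unsigned int status;
};

static struct irq_desc irq_desc[NR_IRQS];

/*
 * Generic fallback: a 1:1 mapping into the flat array.
 * An architecture with sparse interrupt numbers would define
 * get_irq_desc before this point and supply its own lookup.
 */
#ifndef get_irq_desc
#define get_irq_desc(irq) (&irq_desc[(irq)])
#endif

/*
 * Hypothetical generic-code caller: it goes through the accessor
 * rather than computing irq_desc + i itself, so it does not care
 * which mapping is in use.
 */
static unsigned int count_marked_irqs(void)
{
	unsigned int i, n = 0;

	for (i = 0; i < NR_IRQS; i++)
		if (get_irq_desc(i)->status)
			n++;
	return n;
}
```

The point of the #ifndef is that the generic code only ever names get_irq_desc(); whether that resolves to a flat array index or to an architecture-specific lookup is decided at build time.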
Ingo From sfr at canb.auug.org.au Wed Oct 20 03:05:30 2004 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 20 Oct 2004 03:05:30 +1000 Subject: test - please ignore Message-ID: <20041020030530.582725f7.sfr@canb.auug.org.au> Just a test after updating the archives. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ From sfr at canb.auug.org.au Wed Oct 20 03:16:59 2004 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 20 Oct 2004 03:16:59 +1000 Subject: old mailing list archives Message-ID: <20041020031659.220bdfeb.sfr@canb.auug.org.au> Hi all, Thanks to Wolfgang Denk I have recovered (some of) the old list archives. Please see http://ozlabs.org/pipermail/linuxppc64-dev/ I don't know how complete the archive is ... -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ From jschopp at austin.ibm.com Wed Oct 20 02:52:20 2004 From: jschopp at austin.ibm.com (Joel Schopp) Date: Tue, 19 Oct 2004 11:52:20 -0500 Subject: status of ppc64 patches Message-ID: <41754644.1010003@austin.ibm.com> 2.6.9 is now in rc4. Linus claims that the final 2.6.9 is very close. Thus, I expect the floodgates into mainline to open soon.
I would hope that my patches would be sent on by the architecture maintainers at that time. I am concerned that we may be falling behind on reviewing patches in general and if we don't catch up several very deserving patches may miss this next window of opportunity. The backlog of "New" patches is over a month long now. http://ozlabs.org/ppc64-patches/ Either this page is out of date or we have a very serious bottleneck problem. I'm hoping it is the former, but guessing it is the latter. I think we should consider bringing another architecture maintainer on board to help spread out the load of reviewing and approving architecture patches. Somebody like Olof. Barring that I would like to volunteer some of my own cycles to review some of the current backlog, prioritize them, make sure they still compile/boot, and rebase them. From olof at austin.ibm.com Wed Oct 20 06:42:14 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Tue, 19 Oct 2004 15:42:14 -0500 Subject: status of ppc64 patches In-Reply-To: <41754644.1010003@austin.ibm.com> References: <41754644.1010003@austin.ibm.com> Message-ID: <41757C26.2030909@austin.ibm.com> Joel Schopp wrote: > 2.6.9 is now in rc4. Linus claims that the final 2.6.9 is very close. 2.6.9 was released yesterday. :) -Olof From sonny at burdell.org Wed Oct 20 09:00:54 2004 From: sonny at burdell.org (Sonny Rao) Date: Tue, 19 Oct 2004 19:00:54 -0400 Subject: 2.6.9-rc4 kernel -- "cannot find space for TCE table" In-Reply-To: <1097887510.6487.23.camel@gaston> References: <1097887510.6487.23.camel@gaston> Message-ID: <20041019230054.GA3807@kevlar.burdell.org> On Sat, Oct 16, 2004 at 10:45:10AM +1000, Benjamin Herrenschmidt wrote: > On Sat, 2004-10-16 at 07:00, Santhosh Rao wrote: > > Ok, it appears we aren't dropping into the open firmware debugger > > randomly, the kernel seems to give up early in the boot process > > Below is the output of an attempted boot of 2.6.9-rc4. > > > > Jose, ever seen anything like this? 
> > > > The machine is a p615 power-4 2-CPU box with 2GB of RAM. > > Can you enable PROM_DEBUG in arch/ppc64/kernel/prom_init.c and send me the > output log ? > > Ben. > Ben, I'm still seeing this issue with 2.6.9 final, do you need anything else? I'm sure you're very busy, but please let me know if I can help. Sonny Rao From benh at kernel.crashing.org Wed Oct 20 09:38:52 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 20 Oct 2004 09:38:52 +1000 Subject: 2.6.9-rc4 kernel -- "cannot find space for TCE table" In-Reply-To: <20041019230054.GA3807@kevlar.burdell.org> References: <1097887510.6487.23.camel@gaston> <20041019230054.GA3807@kevlar.burdell.org> Message-ID: <1098229131.5792.9.camel@gaston> On Wed, 2004-10-20 at 09:00, Sonny Rao wrote: > Ben, I'm still seeing this issue with 2.6.9 final, do you need > anything else? I'm sure you're very busy, but please let me know if I > can help. Well, I can't reproduce here, but it seem basically that one of the calls to alloc_down() is failing, you may want to trace a bit. I'll try to find by myself too & let you know. Ben. From nathanl at austin.ibm.com Wed Oct 20 10:22:29 2004 From: nathanl at austin.ibm.com (Nathan Lynch) Date: Tue, 19 Oct 2004 19:22:29 -0500 Subject: status of ppc64 patches In-Reply-To: <41754644.1010003@austin.ibm.com> References: <41754644.1010003@austin.ibm.com> Message-ID: <1098231748.7493.114.camel@pants.austin.ibm.com> On Tue, 2004-10-19 at 11:52, Joel Schopp wrote: > I am concerned that we may be falling behind on reviewing patches in > general and if we don't catch up several very deserving patches may miss > this next window of opportunity. The backlog of "New" patches is over a > month long now. http://ozlabs.org/ppc64-patches/ > Either this page is out of date or we have a very serious bottleneck > problem. I'm hoping it is the former, but guessing it is the latter. It looks to me like the backlog is a bit smaller than a first glance at the page would suggest. 
It is somewhat out of date in that several of the patches that are marked "new" have already been picked up by Linus or akpm. I think quite a few of the items in the list do not correspond to patches that are intended for submission upstream (e.g. there are several revisions of "Fan control for PowerMac7_3"). > I think we should consider bringing another architecture maintainer on > board to help spread out the load of reviewing and approving > architecture patches. Somebody like Olof. Barring that I would like to The fact that a web page is slightly out of date and some minor non-bugfix patches were not forwarded upstream during the late 2.6.9-rc series fails to convince me that such a change is needed. If you feel a patch has been overlooked, it's usually just a matter of gently nudging one of the maintainers via email or IRC; it Works For Me (tm) ;) Nathan From olof at austin.ibm.com Wed Oct 20 11:03:01 2004 From: olof at austin.ibm.com (Olof Johansson) Date: Tue, 19 Oct 2004 20:03:01 -0500 Subject: status of ppc64 patches In-Reply-To: <1098231748.7493.114.camel@pants.austin.ibm.com> References: <41754644.1010003@austin.ibm.com> <1098231748.7493.114.camel@pants.austin.ibm.com> Message-ID: <20041020010301.GA29579@4> On Tue, Oct 19, 2004 at 07:22:29PM -0500, Nathan Lynch wrote: > > I think we should consider bringing another architecture maintainer on > > board to help spread out the load of reviewing and approving > > architecture patches. Somebody like Olof. Barring that I would like to > > The fact that a web page is slightly out of date and some minor > non-bugfix patches were not forwarded upstream during the late 2.6.9-rc > series fails to convince me that such a change is needed. Agreed. The page is there for the maintainers to track their work, not for us to track them. :-) I hope that each person tracks their own work and follows up as needed. 
And even if, in the future, current maintainers need help looking at patches, there's no need to promote someone (myself or others) to a "full" maintainer just to pitch in and help out. Anyone has the opportunity to look at a patch and ask questions about it or say that they agree or disagree with it. This happens every day on LKML and other lists, there's no reason we should work differently on our architecture list. Also: Regarding re-basing patches: It has to be the duty of the developer of the patch to re-base it to current trees if it will no longer apply cleanly. I wouldn't expect Anton or Paul to forward-port my patches, just as little as I would expect Andrew Morton or Linus to do so. -Olof From benh at kernel.crashing.org Wed Oct 20 11:24:59 2004 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 20 Oct 2004 11:24:59 +1000 Subject: [PATCH] generic irq subsystem: ppc64 port In-Reply-To: <20041019091557.GA17473@elte.hu> References: <200410190714.i9J7Elnx027734@hera.kernel.org> <1098174500.11449.65.camel@gaston> <20041019091557.GA17473@elte.hu> Message-ID: <1098235499.22943.16.camel@gaston> On Tue, 2004-10-19 at 19:15, Ingo Molnar wrote: > * Benjamin Herrenschmidt wrote: > > > I still like the idea of the patch, so it would be useful if you added > > the possibility for us to just change that behaviour, that is replace > > all occurrences of irq_descs + i with get_irq_desc() and provide a > > generic one that just does that, with a #ifndef so that the > > architecture can provide its own. > > sure, we could do that. But since there are other architectures with > large irq-vector spaces too, you might want to try to move it into the > generic IRQ code and just provide a way to switch between 1:1 mapped and > sparse-mapped variants. False alert ! In fact, Paulus rewrote that stuff a while ago and I totally forgot about it. We no longer do that, our get_irq_desc() is nowadays just doing (&irq_desc[(irq)]).
We map the large physical interrupt numbers to "virtual" numbers that are the only thing the generic code sees, so it's fine. Ben. From sfr at canb.auug.org.au Wed Oct 20 15:47:30 2004 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 20 Oct 2004 15:47:30 +1000 Subject: [PATCH] PPC64 iSeries compile broken in 2.6.9-bk3 Message-ID: <20041020154730.39ea3509.sfr@canb.auug.org.au> Hi Andrew, One of the iSeries specific files used HZ without including linux/param.h and previously got away with it. Signed-off-by: Stephen Rothwell Please apply and send to Linus. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruN 2.6.9-bk3/arch/ppc64/kernel/iSeries_proc.c 2.6.9-bk3-sfr.1/arch/ppc64/kernel/iSeries_proc.c --- 2.6.9-bk3/arch/ppc64/kernel/iSeries_proc.c 2004-08-19 17:01:59.000000000 +1000 +++ 2.6.9-bk3-sfr.1/arch/ppc64/kernel/iSeries_proc.c 2004-10-20 15:21:23.000000000 +1000 @@ -20,6 +20,7 @@ #include #include #include +#include <linux/param.h> /* for HZ */ #include #include #include From greg.quinn at anu.edu.au Wed Oct 20 16:09:55 2004 From: greg.quinn at anu.edu.au (Greg Quinn) Date: Wed, 20 Oct 2004 16:09:55 +1000 Subject: 64 bit compilation and linking - help Message-ID: <41760133.6020801@anu.edu.au> Sorry to intrude on your mailing list. Here at bios.org (CAMBIA) we've just acquired two new p615 machines courtesy of a generous IBM donation, and we want to put them to work ASAP. We've installed a Suse 9 Enterprise Server distribution. I'm trying to compile a C application in 64-bit mode, but can't get the compilation to succeed. For example ... cc -o m m.c produces a 32 bit executable, i.e. pointers are 4 bytes. But ...
cc -o m -m64 m.c dies with a bunch of messages like > /usr/lib/gcc-lib/powerpc-suse-linux/3.3.3/../../../../powerpc-suse-linux/bin/ld: > skipping incompatible > /usr/lib/gcc-lib/powerpc-suse-linux/3.3.3/../../../libc.so when > searching for -lc We seem to have the 64 bit libraries installed (in /lib64 and /usr/lib64), I just need a clue on how to compile and link with them. It's probably something very simple, so I'd appreciate 10 seconds of somebody's time. -- Greg Quinn CAMBIA http://www.cambiaip.org (02) 62464523 From olh at suse.de Wed Oct 20 16:30:46 2004 From: olh at suse.de (Olaf Hering) Date: Wed, 20 Oct 2004 08:30:46 +0200 Subject: 64 bit compilation and linking - help In-Reply-To: <41760133.6020801@anu.edu.au> References: <41760133.6020801@anu.edu.au> Message-ID: <20041020063046.GA28504@suse.de> On Wed, Oct 20, Greg Quinn wrote: > We seem to have the 64 bit libraries installed (in /lib64 and > /usr/lib64), I just need a clue on how to compile and link with them. > It's probably something very simple, so I'd appreciate 10 seconds of > somebody's time. You do not have enough installed, look at 'rpm -qa | grep 64bit'. To install more rpms, use yast and search for package names which contain '64bit'. I think you just need the glibc-devel-64bit for a simple hello_world.c. -- USB is for mice, FireWire is for men! sUse lINUX ag, nÜRNBERG From paulus at samba.org Wed Oct 20 22:10:36 2004 From: paulus at samba.org (Paul Mackerras) Date: Wed, 20 Oct 2004 22:10:36 +1000 Subject: status of ppc64 patches In-Reply-To: <41754644.1010003@austin.ibm.com> References: <41754644.1010003@austin.ibm.com> Message-ID: <16758.21948.795730.268143@cargo.ozlabs.ibm.com> Joel Schopp writes: > 2.6.9 is now in rc4. Linus claims that the final 2.6.9 is very close. > Thus, I expect the floodgates into mainline to open soon. I would hope > that my patches would be sent on by the architecture maintainers at that > time.
And 2.6.9 is now out, and the floodgates are open, and patches are flowing again. As far as your patches are concerned, I am aware of two patches that change things so that we have __boot variants of __pa etc. However, your explanation didn't really get me excited about the change. You said something about "moving towards hotplug memory" but you didn't explain why these changes would help with that, or how I should choose which function to use when I'm making changes in future (that should actually go in a file somewhere under the Documentation directory), or why those changes need to go in now. > I think we should consider bringing another architecture maintainer on > board to help spread out the load of reviewing and approving > architecture patches. Somebody like Olof. Barring that I would like to > volunteer some of my own cycles to review some of the current backlog, > prioritize them, make sure they still compile/boot, and rebase them. Help with reviewing, compile/boot testing and rebasing patches is always welcome. :) Rebasing is really the responsibility of the original submitter though, since they generally know what has been changed and why better than anyone. Paul. From tiwari.amit at gmail.com Wed Oct 20 22:08:28 2004 From: tiwari.amit at gmail.com (Amit K Tiwari) Date: Wed, 20 Oct 2004 17:38:28 +0530 Subject: Max RAM Supported Message-ID: Hi, I have just installed YDL 4.0. The OS does not show all 6GB DRAM I have in my Power Mac G5. It shows only 1.97GB (I ran top to see how much physical memory I have). Looking at the net, http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-Bligh-OLS2003.pdf says that kernel 2.5 should support approx 32GB of memory. Do I need to re-build the kernel to enable the support for all of available memory? If yes, with what options? 'High Memory Support' is already enabled in the kernel config. 
Amit From paulus at samba.org Wed Oct 20 22:21:03 2004 From: paulus at samba.org (Paul Mackerras) Date: Wed, 20 Oct 2004 22:21:03 +1000 Subject: Max RAM Supported In-Reply-To: References: Message-ID: <16758.22575.18560.155884@cargo.ozlabs.ibm.com> Amit K Tiwari writes: > I have just installed YDL 4.0. The OS does not show all 6GB DRAM I > have in my Power Mac G5. It shows only 1.97GB (I ran top to see how > much physical memory I have). Looking at the net, Is that a 32-bit kernel or a 64-bit kernel? (If uname -m prints ppc, it's a 32-bit kernel; if it prints ppc64, it's a 64-bit kernel.) The 32-bit kernel only supports 2GB of RAM, because it can only use physical addresses below 4GB, and the space from 2GB - 4GB in the physical address space is used for I/O and ROM. The 64-bit kernel can address all of the physical address space. > Do I need to re-build the kernel to enable the support for all of > available memory? If yes, with what options? 'High Memory Support' is > already enabled in the kernel config. You need to build a 64-bit kernel (i.e. ARCH=ppc64) rather than a 32-bit kernel (ARCH=ppc). Paul. From dhowells at redhat.com Thu Oct 21 00:44:15 2004 From: dhowells at redhat.com (David Howells) Date: Wed, 20 Oct 2004 15:44:15 +0100 Subject: [PATCH] Add key management syscalls to non-i386 archs Message-ID: <3506.1098283455@redhat.com> Hi Linus, Andrew, The attached patch adds syscalls for almost all archs (everything barring m68knommu which is in a real mess, and i386 which already has it). It also adds 32->64 compatibility where appropriate. 
David Signed-Off-By: David Howells --- warthog>diffstat keys-269bk4.diff arch/alpha/kernel/systbls.S | 3 +++ arch/arm/kernel/calls.S | 3 +++ arch/cris/arch-v10/kernel/entry.S | 3 +++ arch/h8300/kernel/syscalls.S | 3 +++ arch/ia64/ia32/ia32_entry.S | 4 ++++ arch/ia64/ia32/sys_ia32.c | 20 ++++++++++++++++++++ arch/ia64/kernel/entry.S | 6 +++--- arch/ia64/kernel/fsys.S | 6 +++--- arch/m32r/kernel/entry.S | 3 +++ arch/m68k/kernel/entry.S | 3 +++ arch/mips/kernel/scall32-o32.S | 3 +++ arch/mips/kernel/scall64-64.S | 3 +++ arch/mips/kernel/scall64-n32.S | 3 +++ arch/mips/kernel/scall64-o32.S | 3 +++ arch/parisc/kernel/syscall_table.S | 4 +++- arch/ppc/kernel/misc.S | 3 +++ arch/ppc64/kernel/misc.S | 6 ++++++ arch/ppc64/kernel/sys_ppc32.c | 33 +++++++++++++++++++++++++++++++++ arch/s390/kernel/compat_wrapper.S | 26 ++++++++++++++++++++++++++ arch/s390/kernel/syscalls.S | 3 +++ arch/sh/kernel/entry.S | 4 ++++ arch/sh64/kernel/syscalls.S | 4 +++- arch/sparc/kernel/systbls.S | 2 +- arch/sparc64/kernel/sys32.S | 3 +++ arch/sparc64/kernel/systbls.S | 4 ++-- arch/um/kernel/sys_call_table.c | 3 +++ arch/v850/kernel/entry.S | 3 +++ arch/x86_64/ia32/ia32entry.S | 4 ++++ include/asm-alpha/unistd.h | 5 ++++- include/asm-arm/unistd.h | 3 +++ include/asm-arm26/unistd.h | 3 +++ include/asm-cris/unistd.h | 5 ++++- include/asm-h8300/unistd.h | 5 ++++- include/asm-ia64/unistd.h | 3 +++ include/asm-m32r/unistd.h | 5 ++++- include/asm-m68k/unistd.h | 5 ++++- include/asm-mips/unistd.h | 17 +++++++++++++---- include/asm-parisc/unistd.h | 5 ++++- include/asm-ppc/unistd.h | 5 ++++- include/asm-ppc64/unistd.h | 5 ++++- include/asm-s390/unistd.h | 5 ++++- include/asm-sh/unistd.h | 5 ++++- include/asm-sh64/unistd.h | 5 ++++- include/asm-sparc/unistd.h | 3 +++ include/asm-sparc64/unistd.h | 3 +++ include/asm-v850/unistd.h | 3 +++ include/asm-x86_64/unistd.h | 8 +++++++- 47 files changed, 239 insertions(+), 27 deletions(-) diff -uNrp linux-2.6.9-bk4/arch/alpha/kernel/systbls.S 
linux-2.6.9-bk4-keys/arch/alpha/kernel/systbls.S --- linux-2.6.9-bk4/arch/alpha/kernel/systbls.S 2004-10-19 10:41:41.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/alpha/kernel/systbls.S 2004-10-20 14:47:43.275151615 +0100 @@ -458,6 +458,9 @@ sys_call_table: .quad sys_mq_notify .quad sys_mq_getsetattr .quad sys_waitid + .quad sys_add_key + .quad sys_request_key + .quad sys_keyctl .size sys_call_table, . - sys_call_table .type sys_call_table, @object diff -uNrp linux-2.6.9-bk4/arch/arm/kernel/calls.S linux-2.6.9-bk4-keys/arch/arm/kernel/calls.S --- linux-2.6.9-bk4/arch/arm/kernel/calls.S 2004-10-19 10:41:42.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/arm/kernel/calls.S 2004-10-20 14:57:39.641915157 +0100 @@ -295,6 +295,9 @@ __syscall_start: .long sys_mq_notify .long sys_mq_getsetattr /* 280 */ .long sys_waitid + .long sys_add_key + .long sys_request_key + .long sys_keyctl __syscall_end: .rept NR_syscalls - (__syscall_end - __syscall_start) / 4 diff -uNrp linux-2.6.9-bk4/arch/cris/arch-v10/kernel/entry.S linux-2.6.9-bk4-keys/arch/cris/arch-v10/kernel/entry.S --- linux-2.6.9-bk4/arch/cris/arch-v10/kernel/entry.S 2004-06-18 13:43:42.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/cris/arch-v10/kernel/entry.S 2004-10-20 14:44:52.215209105 +0100 @@ -1079,6 +1079,9 @@ sys_call_table: .long sys_mq_timedreceive /* 280 */ .long sys_mq_notify .long sys_mq_getsetattr + .long sys_add_key + .long sys_request_key /* 285 */ + .long sys_keyctl /* * NOTE!! 
This doesn't have to be exact - we just have diff -uNrp linux-2.6.9-bk4/arch/h8300/kernel/syscalls.S linux-2.6.9-bk4-keys/arch/h8300/kernel/syscalls.S --- linux-2.6.9-bk4/arch/h8300/kernel/syscalls.S 2004-06-18 13:43:42.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/h8300/kernel/syscalls.S 2004-10-20 15:00:36.035535939 +0100 @@ -289,6 +289,9 @@ SYMBOL_NAME_LABEL(sys_call_table) .long SYMBOL_NAME(sys_utimes) .long SYMBOL_NAME(sys_fadvise64_64) .long SYMBOL_NAME(sys_ni_syscall) /* sys_vserver */ + .long SYMBOL_NAME(sys_add_key) + .long SYMBOL_NAME(sys_request_key) /* 275 */ + .long SYMBOL_NAME(sys_keyctl) .rept NR_syscalls-(.-SYMBOL_NAME(sys_call_table))/4 .long SYMBOL_NAME(sys_ni_syscall) diff -uNrp linux-2.6.9-bk4/arch/ia64/ia32/ia32_entry.S linux-2.6.9-bk4-keys/arch/ia64/ia32/ia32_entry.S --- linux-2.6.9-bk4/arch/ia64/ia32/ia32_entry.S 2004-10-19 10:41:43.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/ia64/ia32/ia32_entry.S 2004-10-20 15:25:01.365546264 +0100 @@ -495,6 +495,10 @@ ia32_syscall_table: data8 compat_sys_mq_getsetattr data8 sys_ni_syscall /* reserved for kexec */ data8 sys32_waitid + data8 sys_ni_syscall /* reserved for setaltroot */ + data8 sys32_add_key + data8 sys32_request_key + data8 sys_keyctl // guard against failures to increase IA32_NR_syscalls .org ia32_syscall_table + 8*IA32_NR_syscalls diff -uNrp linux-2.6.9-bk4/arch/ia64/ia32/sys_ia32.c linux-2.6.9-bk4-keys/arch/ia64/ia32/sys_ia32.c --- linux-2.6.9-bk4/arch/ia64/ia32/sys_ia32.c 2004-10-19 10:41:43.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/ia64/ia32/sys_ia32.c 2004-10-20 15:28:48.663376741 +0100 @@ -2687,6 +2687,26 @@ asmlinkage long sys32_waitid(int which, return copy_siginfo_to_user32(uinfo, &info); } + +asmlinkage long sys32_add_key(const char __user *_type, + const char __user *_description, + const void __user *_payload, + __u32 plen, + __u32 ringid) +{ + sys_add_key(_type, _description, _payload, (size_t) plen, + (key_serial_t) ringid); +} + +asmlinkage long sys32_request_key(const 
char __user *_type, + const char __user *_description, + const char __user *_callout_info, + __u32 destringid) +{ + return sys_request_key(_type, _description, _callout_info, + (key_serial_t) destringid); +} + #ifdef NOTYET /* UNTESTED FOR IA64 FROM HERE DOWN */ asmlinkage long sys32_setreuid(compat_uid_t ruid, compat_uid_t euid) diff -uNrp linux-2.6.9-bk4/arch/ia64/kernel/entry.S linux-2.6.9-bk4-keys/arch/ia64/kernel/entry.S --- linux-2.6.9-bk4/arch/ia64/kernel/entry.S 2004-10-20 14:02:54.138626787 +0100 +++ linux-2.6.9-bk4-keys/arch/ia64/kernel/entry.S 2004-10-20 14:45:48.309267588 +0100 @@ -1528,9 +1528,9 @@ sys_call_table: data8 sys_ni_syscall // reserved for kexec_load data8 sys_ni_syscall data8 sys_setaltroot // 1270 - data8 sys_ni_syscall - data8 sys_ni_syscall - data8 sys_ni_syscall + data8 sys_add_key + data8 sys_request_key + data8 sys_keyctl data8 sys_ni_syscall data8 sys_ni_syscall // 1275 data8 sys_ni_syscall diff -uNrp linux-2.6.9-bk4/arch/ia64/kernel/fsys.S linux-2.6.9-bk4-keys/arch/ia64/kernel/fsys.S --- linux-2.6.9-bk4/arch/ia64/kernel/fsys.S 2004-10-19 10:41:43.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/ia64/kernel/fsys.S 2004-10-20 14:46:27.814789684 +0100 @@ -868,9 +868,9 @@ fsyscall_table: data8 0 // kexec_load data8 0 data8 0 // 1270 - data8 0 - data8 0 - data8 0 + data8 0 // add_key + data8 0 // request_key + data8 0 // keyctl data8 0 data8 0 // 1275 data8 0 diff -uNrp linux-2.6.9-bk4/arch/m32r/kernel/entry.S linux-2.6.9-bk4-keys/arch/m32r/kernel/entry.S --- linux-2.6.9-bk4/arch/m32r/kernel/entry.S 2004-10-19 10:41:44.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/m32r/kernel/entry.S 2004-10-20 15:09:17.798751465 +0100 @@ -994,6 +994,9 @@ ENTRY(sys_call_table) .long sys_mq_getsetattr .long sys_ni_syscall /* reserved for kexec */ .long sys_waitid + .long sys_add_key /* 285 */ + .long sys_request_key + .long sys_keyctl syscall_table_size=(.-sys_call_table) diff -uNrp linux-2.6.9-bk4/arch/m68k/kernel/entry.S
linux-2.6.9-bk4-keys/arch/m68k/kernel/entry.S --- linux-2.6.9-bk4/arch/m68k/kernel/entry.S 2004-06-18 13:43:44.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/m68k/kernel/entry.S 2004-10-20 14:45:20.678701183 +0100 @@ -663,3 +663,6 @@ sys_call_table: .long sys_lremovexattr .long sys_fremovexattr .long sys_futex /* 235 */ + .long sys_add_key + .long sys_request_key + .long sys_keyctl diff -uNrp linux-2.6.9-bk4/arch/mips/kernel/scall32-o32.S linux-2.6.9-bk4-keys/arch/mips/kernel/scall32-o32.S --- linux-2.6.9-bk4/arch/mips/kernel/scall32-o32.S 2004-09-16 12:05:47.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/mips/kernel/scall32-o32.S 2004-10-20 14:30:46.698878816 +0100 @@ -628,6 +628,9 @@ out: jr ra sys sys_mq_notify 2 /* 4275 */ sys sys_mq_getsetattr 3 sys sys_ni_syscall 0 /* sys_vserver */ + sys sys_add_key 5 + sys sys_request_key 4 + sys sys_keyctl 5 .endm diff -uNrp linux-2.6.9-bk4/arch/mips/kernel/scall64-64.S linux-2.6.9-bk4-keys/arch/mips/kernel/scall64-64.S --- linux-2.6.9-bk4/arch/mips/kernel/scall64-64.S 2004-09-16 12:05:47.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/mips/kernel/scall64-64.S 2004-10-20 14:32:42.206470034 +0100 @@ -448,3 +448,6 @@ sys_call_table: PTR sys_mq_notify PTR sys_mq_getsetattr /* 5235 */ PTR sys_ni_syscall /* sys_vserver */ + PTR sys_add_key + PTR sys_request_key + PTR sys_keyctl diff -uNrp linux-2.6.9-bk4/arch/mips/kernel/scall64-n32.S linux-2.6.9-bk4-keys/arch/mips/kernel/scall64-n32.S --- linux-2.6.9-bk4/arch/mips/kernel/scall64-n32.S 2004-09-16 12:05:47.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/mips/kernel/scall64-n32.S 2004-10-20 15:12:10.687967430 +0100 @@ -358,3 +358,6 @@ EXPORT(sysn32_call_table) PTR compat_sys_mq_notify PTR compat_sys_mq_getsetattr /* 6239 */ PTR sys_ni_syscall /* sys_vserver */ + PTR sys_add_key + PTR sys_request_key + PTR sys_keyctl diff -uNrp linux-2.6.9-bk4/arch/mips/kernel/scall64-o32.S linux-2.6.9-bk4-keys/arch/mips/kernel/scall64-o32.S --- linux-2.6.9-bk4/arch/mips/kernel/scall64-o32.S 2004-09-16 
12:05:47.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/mips/kernel/scall64-o32.S 2004-10-20 15:11:26.761722025 +0100 @@ -536,6 +536,9 @@ out: jr ra sys compat_sys_mq_notify 2 /* 4275 */ sys compat_sys_mq_getsetattr 3 sys sys_ni_syscall 0 /* sys_vserver */ + sys sys_add_key 5 + sys sys_request_key 4 + sys sys_keyctl 5 .endm diff -uNrp linux-2.6.9-bk4/arch/parisc/kernel/syscall_table.S linux-2.6.9-bk4-keys/arch/parisc/kernel/syscall_table.S --- linux-2.6.9-bk4/arch/parisc/kernel/syscall_table.S 2004-06-18 13:43:47.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/parisc/kernel/syscall_table.S 2004-10-20 14:58:51.533643420 +0100 @@ -341,5 +341,7 @@ ENTRY_SAME(mq_timedreceive) ENTRY_SAME(mq_notify) ENTRY_SAME(mq_getsetattr) - /* Nothing yet */ /* 235 */ + ENTRY_SAME(add_key) /* 235 */ + ENTRY_SAME(request_key) + ENTRY_SAME(keyctl) diff -uNrp linux-2.6.9-bk4/arch/ppc/kernel/misc.S linux-2.6.9-bk4-keys/arch/ppc/kernel/misc.S --- linux-2.6.9-bk4/arch/ppc/kernel/misc.S 2004-10-19 10:41:46.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/ppc/kernel/misc.S 2004-10-20 14:43:37.665815385 +0100 @@ -1447,3 +1447,6 @@ _GLOBAL(sys_call_table) .long sys_mq_notify .long sys_mq_getsetattr .long sys_ni_syscall /* 268 reserved for sys_kexec_load */ + .long sys_add_key + .long sys_request_key /* 270 */ + .long sys_keyctl diff -uNrp linux-2.6.9-bk4/arch/ppc64/kernel/misc.S linux-2.6.9-bk4-keys/arch/ppc64/kernel/misc.S --- linux-2.6.9-bk4/arch/ppc64/kernel/misc.S 2004-10-20 14:02:55.974474037 +0100 +++ linux-2.6.9-bk4-keys/arch/ppc64/kernel/misc.S 2004-10-20 14:57:18.470763092 +0100 @@ -963,6 +963,9 @@ _GLOBAL(sys_call_table32) .llong .compat_sys_mq_notify .llong .compat_sys_mq_getsetattr .llong .sys_ni_syscall /* 268 reserved for sys_kexec_load */ + .llong .sys32_add_key + .llong .sys32_request_key + .llong .sys32_keyctl .balign 8 _GLOBAL(sys_call_table) @@ -1235,3 +1238,6 @@ _GLOBAL(sys_call_table) .llong .sys_mq_notify .llong .sys_mq_getsetattr .llong .sys_ni_syscall /* 268 reserved for 
sys_kexec_load */ + .llong .sys_add_key + .llong .sys_request_key /* 270 */ + .llong .sys_keyctl diff -uNrp linux-2.6.9-bk4/arch/ppc64/kernel/sys_ppc32.c linux-2.6.9-bk4-keys/arch/ppc64/kernel/sys_ppc32.c --- linux-2.6.9-bk4/arch/ppc64/kernel/sys_ppc32.c 2004-10-20 14:02:56.046468047 +0100 +++ linux-2.6.9-bk4-keys/arch/ppc64/kernel/sys_ppc32.c 2004-10-20 15:29:22.936487493 +0100 @@ -1328,3 +1328,36 @@ long ppc32_timer_create(clockid_t clock, return err; } + +asmlinkage long sys32_add_key(const char __user *_type, + const char __user *_description, + const void __user *_payload, + u32 plen, + u32 ringid) +{ + return sys_add_key(_type, _description, _payload, (size_t) plen, + (key_serial_t) ringid); +} + +asmlinkage long sys32_request_key(const char __user *_type, + const char __user *_description, + const char __user *_callout_info, + u32 destringid) +{ + return sys_request_key(_type, _description, _callout_info, + (key_serial_t) destringid); +} + +/* Note: it is necessary to treat option as an unsigned int, + * with the corresponding cast to a signed int to ensure that the + * proper conversion (sign extension) between the register representation of a signed int (msr in 32-bit mode) + * and the register representation of a signed int (msr in 64-bit mode) is performed.
+ */ +asmlinkage long sys32_keyctl(u32 option, u32 arg2, u32 arg3, u32 arg4, u32 arg5) +{ + return sys_keyctl((int)option, + (unsigned long) arg2, + (unsigned long) arg3, + (unsigned long) arg4, + (unsigned long) arg5); +} diff -uNrp linux-2.6.9-bk4/arch/s390/kernel/compat_wrapper.S linux-2.6.9-bk4-keys/arch/s390/kernel/compat_wrapper.S --- linux-2.6.9-bk4/arch/s390/kernel/compat_wrapper.S 2004-06-18 13:43:49.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/s390/kernel/compat_wrapper.S 2004-10-20 15:08:00.071403677 +0100 @@ -1406,3 +1406,29 @@ compat_sys_mq_getsetattr_wrapper: llgtr %r3,%r3 # struct compat_mq_attr * llgtr %r4,%r4 # struct compat_mq_attr * jg compat_sys_mq_getsetattr + + .globl sys32_add_key_wrapper +sys32_add_key_wrapper: + lgfr %r2,%r2 # const char * + llgfr %r3,%r3 # const char * + llgfr %r4,%r4 # const void * + llgfr %r5,%r5 # size_t + llgfr %r6,%r6 # key_serial_t + jg sys_add_key # branch to system call + + .globl sys32_request_key_wrapper +sys32_request_key_wrapper: + lgfr %r2,%r2 # const char * + llgfr %r3,%r3 # const char * + llgfr %r4,%r4 # const char * + llgfr %r5,%r5 # key_serial_t + jg sys_request_key # branch to system call + + .globl sys32_keyctl_wrapper +sys32_keyctl_wrapper: + lgfr %r2,%r2 # int + llgfr %r3,%r3 # unsigned long + llgfr %r4,%r4 # unsigned long + llgfr %r5,%r5 # unsigned long + llgfr %r6,%r6 # unsigned long + jg sys_keyctl # branch to system call diff -uNrp linux-2.6.9-bk4/arch/s390/kernel/syscalls.S linux-2.6.9-bk4-keys/arch/s390/kernel/syscalls.S --- linux-2.6.9-bk4/arch/s390/kernel/syscalls.S 2004-06-18 13:43:49.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/s390/kernel/syscalls.S 2004-10-20 15:05:49.863555437 +0100 @@ -285,3 +285,6 @@ SYSCALL(sys_mq_timedsend,sys_mq_timedsen SYSCALL(sys_mq_timedreceive,sys_mq_timedreceive,compat_sys_mq_timedreceive_wrapper) SYSCALL(sys_mq_notify,sys_mq_notify,compat_sys_mq_notify_wrapper) SYSCALL(sys_mq_getsetattr,sys_mq_getsetattr,compat_sys_mq_getsetattr_wrapper) 
+SYSCALL(sys_add_key,sys_add_key,sys32_add_key_wrapper) +SYSCALL(sys_request_key,sys_request_key,sys32_request_key_wrapper) +SYSCALL(sys_keyctl,sys_keyctl,sys32_keyctl_wrapper) diff -uNrp linux-2.6.9-bk4/arch/sh/kernel/entry.S linux-2.6.9-bk4-keys/arch/sh/kernel/entry.S --- linux-2.6.9-bk4/arch/sh/kernel/entry.S 2004-10-20 14:02:56.666416464 +0100 +++ linux-2.6.9-bk4-keys/arch/sh/kernel/entry.S 2004-10-20 14:26:32.677689027 +0100 @@ -1140,5 +1140,9 @@ ENTRY(sys_call_table) .long sys_mq_timedreceive /* 280 */ .long sys_mq_notify .long sys_mq_getsetattr + .long sys_add_key + .long sys_request_key + .long sys_keyctl /* 285 */ + /* End of entry.S */ diff -uNrp linux-2.6.9-bk4/arch/sh64/kernel/syscalls.S linux-2.6.9-bk4-keys/arch/sh64/kernel/syscalls.S --- linux-2.6.9-bk4/arch/sh64/kernel/syscalls.S 2004-09-16 12:05:50.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/sh64/kernel/syscalls.S 2004-10-20 15:08:45.682499668 +0100 @@ -337,4 +337,6 @@ sys_call_table: .long sys_mq_timedreceive .long sys_mq_notify .long sys_mq_getsetattr /* 310 */ - + .long sys_add_key + .long sys_request_key + .long sys_keyctl diff -uNrp linux-2.6.9-bk4/arch/sparc/kernel/systbls.S linux-2.6.9-bk4-keys/arch/sparc/kernel/systbls.S --- linux-2.6.9-bk4/arch/sparc/kernel/systbls.S 2004-10-19 10:41:48.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/sparc/kernel/systbls.S 2004-10-20 14:25:23.775664787 +0100 @@ -75,7 +75,7 @@ sys_call_table: /*265*/ .long sys_timer_delete, sys_timer_create, sys_nis_syscall, sys_io_setup, sys_io_destroy /*270*/ .long sys_io_submit, sys_io_cancel, sys_io_getevents, sys_mq_open, sys_mq_unlink /*275*/ .long sys_mq_timedsend, sys_mq_timedreceive, sys_mq_notify, sys_mq_getsetattr, sys_waitid -/*280*/ .long sys_ni_syscall, sys_ni_syscall, sys_ni_syscall +/*280*/ .long sys_add_key, sys_request_key, sys_keyctl #ifdef CONFIG_SUNOS_EMUL /* Now the SunOS syscall table. 
*/ diff -uNrp linux-2.6.9-bk4/arch/sparc64/kernel/sys32.S linux-2.6.9-bk4-keys/arch/sparc64/kernel/sys32.S --- linux-2.6.9-bk4/arch/sparc64/kernel/sys32.S 2004-10-19 10:41:48.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/sparc64/kernel/sys32.S 2004-10-20 15:22:48.095792589 +0100 @@ -135,6 +135,9 @@ SIGN2(sys32_shutdown, sys_shutdown, %o0, SIGN3(sys32_socketpair, sys_socketpair, %o0, %o1, %o2) SIGN1(sys32_getpeername, sys_getpeername, %o0) SIGN1(sys32_getsockname, sys_getsockname, %o0) +SIGN2(sys32_add_key, sys_add_key, %o3, %o4) +SIGN1(sys32_request_key, sys_request_key, %o3) +SIGN1(sys32_keyctl, sys_keyctl, %o0) .globl sys32_mmap2 sys32_mmap2: diff -uNrp linux-2.6.9-bk4/arch/sparc64/kernel/systbls.S linux-2.6.9-bk4-keys/arch/sparc64/kernel/systbls.S --- linux-2.6.9-bk4/arch/sparc64/kernel/systbls.S 2004-10-19 10:41:48.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/sparc64/kernel/systbls.S 2004-10-20 14:42:28.934934888 +0100 @@ -76,7 +76,7 @@ sys_call_table32: .word sys_timer_delete, sys32_timer_create, sys_ni_syscall, compat_sys_io_setup, sys_io_destroy /*270*/ .word sys32_io_submit, sys_io_cancel, compat_sys_io_getevents, sys32_mq_open, sys_mq_unlink .word sys_mq_timedsend, sys_mq_timedreceive, compat_sys_mq_notify, compat_sys_mq_getsetattr, compat_sys_waitid -/*280*/ .word sys_ni_syscall, sys_ni_syscall, sys_ni_syscall +/*280*/ .word sys32_add_key, sys32_request_key, sys32_keyctl #endif /* CONFIG_COMPAT */ @@ -142,7 +142,7 @@ sys_call_table: .word sys_timer_delete, sys_timer_create, sys_ni_syscall, sys_io_setup, sys_io_destroy /*270*/ .word sys_io_submit, sys_io_cancel, sys_io_getevents, sys_mq_open, sys_mq_unlink .word sys_mq_timedsend, sys_mq_timedreceive, sys_mq_notify, sys_mq_getsetattr, sys_waitid -/*280*/ .word sys_ni_syscall, sys_ni_syscall, sys_ni_syscall +/*280*/ .word sys_add_key, sys_request_key, sys_keyctl #if defined(CONFIG_SUNOS_EMUL) || defined(CONFIG_SOLARIS_EMUL) || \ defined(CONFIG_SOLARIS_EMUL_MODULE) diff -uNrp 
linux-2.6.9-bk4/arch/um/kernel/sys_call_table.c linux-2.6.9-bk4-keys/arch/um/kernel/sys_call_table.c --- linux-2.6.9-bk4/arch/um/kernel/sys_call_table.c 2004-10-19 10:41:49.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/um/kernel/sys_call_table.c 2004-10-20 14:44:10.557889241 +0100 @@ -306,6 +306,9 @@ syscall_handler_t *sys_call_table[] = { [ __NR_utimes ] (syscall_handler_t *) sys_utimes, [ __NR_fadvise64_64 ] (syscall_handler_t *) sys_fadvise64_64, [ __NR_vserver ] (syscall_handler_t *) sys_ni_syscall, + [ __NR_add_key ] (syscall_handler_t *) sys_add_key, + [ __NR_request_key ] (syscall_handler_t *) sys_request_key, + [ __NR_keyctl ] (syscall_handler_t *) sys_keyctl, ARCH_SYSCALLS [ LAST_SYSCALL + 1 ... NR_syscalls ] = diff -uNrp linux-2.6.9-bk4/arch/v850/kernel/entry.S linux-2.6.9-bk4-keys/arch/v850/kernel/entry.S --- linux-2.6.9-bk4/arch/v850/kernel/entry.S 2004-06-18 13:41:13.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/v850/kernel/entry.S 2004-10-20 15:02:06.154739578 +0100 @@ -1117,5 +1117,8 @@ C_DATA(sys_call_table): .long CSYM(sys_pivot_root) // 200 .long CSYM(sys_gettid) .long CSYM(sys_tkill) + .long CSYM(sys_add_key) + .long CSYM(sys_request_key) + .long CSYM(sys_keyctl) // 205 sys_call_table_end: C_END(sys_call_table) diff -uNrp linux-2.6.9-bk4/arch/x86_64/ia32/ia32entry.S linux-2.6.9-bk4-keys/arch/x86_64/ia32/ia32entry.S --- linux-2.6.9-bk4/arch/x86_64/ia32/ia32entry.S 2004-10-19 10:41:49.000000000 +0100 +++ linux-2.6.9-bk4-keys/arch/x86_64/ia32/ia32entry.S 2004-10-20 15:04:46.183013167 +0100 @@ -587,6 +587,10 @@ ia32_sys_call_table: .quad compat_sys_mq_getsetattr .quad quiet_ni_syscall /* reserved for kexec */ .quad sys32_waitid + .quad quiet_ni_syscall /* 285 reserved for setaltroot */ + .quad sys_add_key + .quad sys_request_key + .quad sys_keyctl /* don't forget to change IA32_NR_syscalls */ ia32_syscall_end: .rept IA32_NR_syscalls-(ia32_syscall_end-ia32_sys_call_table)/8 diff -uNrp linux-2.6.9-bk4/include/asm-alpha/unistd.h 
linux-2.6.9-bk4-keys/include/asm-alpha/unistd.h --- linux-2.6.9-bk4/include/asm-alpha/unistd.h 2004-10-19 10:42:11.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-alpha/unistd.h 2004-10-20 14:18:36.681064345 +0100 @@ -374,8 +374,11 @@ #define __NR_mq_notify 436 #define __NR_mq_getsetattr 437 #define __NR_waitid 438 +#define __NR_add_key 439 +#define __NR_request_key 440 +#define __NR_keyctl 441 -#define NR_SYSCALLS 439 +#define NR_SYSCALLS 442 #if defined(__GNUC__) diff -uNrp linux-2.6.9-bk4/include/asm-arm/unistd.h linux-2.6.9-bk4-keys/include/asm-arm/unistd.h --- linux-2.6.9-bk4/include/asm-arm/unistd.h 2004-10-19 10:42:12.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-arm/unistd.h 2004-10-20 14:17:35.183426405 +0100 @@ -306,6 +306,9 @@ #define __NR_mq_notify (__NR_SYSCALL_BASE+278) #define __NR_mq_getsetattr (__NR_SYSCALL_BASE+279) #define __NR_waitid (__NR_SYSCALL_BASE+280) +#define __NR_add_key (__NR_SYSCALL_BASE+281) +#define __NR_request_key (__NR_SYSCALL_BASE+282) +#define __NR_keyctl (__NR_SYSCALL_BASE+283) /* * The following SWIs are ARM private. diff -uNrp linux-2.6.9-bk4/include/asm-arm26/unistd.h linux-2.6.9-bk4-keys/include/asm-arm26/unistd.h --- linux-2.6.9-bk4/include/asm-arm26/unistd.h 2004-06-18 13:44:05.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-arm26/unistd.h 2004-10-20 14:16:45.004804472 +0100 @@ -260,6 +260,9 @@ #define __NR_lremovexattr (__NR_SYSCALL_BASE+236) #define __NR_fremovexattr (__NR_SYSCALL_BASE+237) #define __NR_tkill (__NR_SYSCALL_BASE+238) +#define __NR_add_key (__NR_SYSCALL_BASE+239) +#define __NR_request_key (__NR_SYSCALL_BASE+240) +#define __NR_keyctl (__NR_SYSCALL_BASE+241) /* * The following SWIs are ARM private. 
diff -uNrp linux-2.6.9-bk4/include/asm-cris/unistd.h linux-2.6.9-bk4-keys/include/asm-cris/unistd.h --- linux-2.6.9-bk4/include/asm-cris/unistd.h 2004-06-18 13:44:05.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-cris/unistd.h 2004-10-20 14:16:21.025897563 +0100 @@ -288,8 +288,11 @@ #define __NR_mq_timedreceive (__NR_mq_open+3) #define __NR_mq_notify (__NR_mq_open+4) #define __NR_mq_getsetattr (__NR_mq_open+5) +#define __NR_add_key 283 +#define __NR_request_key 284 +#define __NR_keyctl 285 -#define NR_syscalls 283 +#define NR_syscalls 286 #ifdef __KERNEL__ diff -uNrp linux-2.6.9-bk4/include/asm-h8300/unistd.h linux-2.6.9-bk4-keys/include/asm-h8300/unistd.h --- linux-2.6.9-bk4/include/asm-h8300/unistd.h 2004-06-18 13:44:05.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-h8300/unistd.h 2004-10-20 15:01:16.446016959 +0100 @@ -269,8 +269,11 @@ #define __NR_clock_gettime (__NR_timer_create+6) #define __NR_clock_getres (__NR_timer_create+7) #define __NR_clock_nanosleep (__NR_timer_create+8) +#define __NR_add_key 274 +#define __NR_request_key 275 +#define __NR_keyctl 276 -#define NR_syscalls 268 +#define NR_syscalls 277 /* user-visible error numbers are in the range -1 - -122: see diff -uNrp linux-2.6.9-bk4/include/asm-ia64/unistd.h linux-2.6.9-bk4-keys/include/asm-ia64/unistd.h --- linux-2.6.9-bk4/include/asm-ia64/unistd.h 2004-10-20 14:03:14.832904952 +0100 +++ linux-2.6.9-bk4-keys/include/asm-ia64/unistd.h 2004-10-20 14:14:59.746996878 +0100 @@ -260,6 +260,9 @@ #define __NR_kexec_load 1268 #define __NR_vserver 1269 #define __NR_setaltroot 1270 +#define __NR_add_key 1271 +#define __NR_request_key 1272 +#define __NR_keyctl 1273 #ifdef __KERNEL__ diff -uNrp linux-2.6.9-bk4/include/asm-m32r/unistd.h linux-2.6.9-bk4-keys/include/asm-m32r/unistd.h --- linux-2.6.9-bk4/include/asm-m32r/unistd.h 2004-10-19 10:42:13.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-m32r/unistd.h 2004-10-20 14:14:34.284222397 +0100 @@ -294,8 +294,11 @@ #define __NR_mq_getsetattr 
(__NR_mq_open+5) #define __NR_sys_kexec_load 283 #define __NR_waitid 284 +#define __NR_add_key 285 +#define __NR_request_key 286 +#define __NR_keyctl 287 -#define NR_syscalls 285 +#define NR_syscalls 288 /* user-visible error numbers are in the range -1 - -124: see * diff -uNrp linux-2.6.9-bk4/include/asm-m68k/unistd.h linux-2.6.9-bk4-keys/include/asm-m68k/unistd.h --- linux-2.6.9-bk4/include/asm-m68k/unistd.h 2004-06-18 13:44:05.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-m68k/unistd.h 2004-10-20 14:14:06.358663984 +0100 @@ -238,8 +238,11 @@ #define __NR_lremovexattr 233 #define __NR_fremovexattr 234 #define __NR_futex 235 +#define __NR_add_key 236 +#define __NR_request_key 237 +#define __NR_keyctl 238 -#define NR_syscalls 236 +#define NR_syscalls 239 /* user-visible error numbers are in the range -1 - -124: see */ diff -uNrp linux-2.6.9-bk4/include/asm-mips/unistd.h linux-2.6.9-bk4-keys/include/asm-mips/unistd.h --- linux-2.6.9-bk4/include/asm-mips/unistd.h 2004-09-16 12:06:18.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-mips/unistd.h 2004-10-20 14:12:31.321979696 +0100 @@ -298,16 +298,19 @@ #define __NR_mq_notify (__NR_Linux + 275) #define __NR_mq_getsetattr (__NR_Linux + 276) #define __NR_vserver (__NR_Linux + 277) +#define __NR_add_key (__NR_Linux + 278) +#define __NR_request_key (__NR_Linux + 279) +#define __NR_keyctl (__NR_Linux + 280) /* * Offset of the last Linux o32 flavoured syscall */ -#define __NR_Linux_syscalls 277 +#define __NR_Linux_syscalls 280 #endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */ #define __NR_O32_Linux 4000 -#define __NR_O32_Linux_syscalls 277 +#define __NR_O32_Linux_syscalls 280 #if _MIPS_SIM == _MIPS_SIM_ABI64 @@ -552,11 +555,14 @@ #define __NR_mq_notify (__NR_Linux + 234) #define __NR_mq_getsetattr (__NR_Linux + 235) #define __NR_vserver (__NR_Linux + 236) +#define __NR_add_key (__NR_Linux + 237) +#define __NR_request_key (__NR_Linux + 238) +#define __NR_keyctl (__NR_Linux + 239) /* * Offset of the last Linux flavoured 
syscall */ -#define __NR_Linux_syscalls 236 +#define __NR_Linux_syscalls 239 #endif /* _MIPS_SIM == _MIPS_SIM_ABI64 */ @@ -810,11 +816,14 @@ #define __NR_mq_notify (__NR_Linux + 238) #define __NR_mq_getsetattr (__NR_Linux + 239) #define __NR_vserver (__NR_Linux + 240) +#define __NR_add_key (__NR_Linux + 241) +#define __NR_request_key (__NR_Linux + 242) +#define __NR_keyctl (__NR_Linux + 243) /* * Offset of the last N32 flavoured syscall */ -#define __NR_Linux_syscalls 240 +#define __NR_Linux_syscalls 243 #endif /* _MIPS_SIM == _MIPS_SIM_NABI32 */ diff -uNrp linux-2.6.9-bk4/include/asm-parisc/unistd.h linux-2.6.9-bk4-keys/include/asm-parisc/unistd.h --- linux-2.6.9-bk4/include/asm-parisc/unistd.h 2004-09-16 12:06:18.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-parisc/unistd.h 2004-10-20 14:11:00.896901332 +0100 @@ -727,8 +727,11 @@ #define __NR_mq_timedreceive (__NR_Linux + 232) #define __NR_mq_notify (__NR_Linux + 233) #define __NR_mq_getsetattr (__NR_Linux + 234) +#define __NR_add_key (__NR_Linux + 235) +#define __NR_request_key (__NR_Linux + 236) +#define __NR_keyctl (__NR_Linux + 237) -#define __NR_Linux_syscalls 235 +#define __NR_Linux_syscalls 238 #define HPUX_GATEWAY_ADDR 0xC0000004 #define LINUX_GATEWAY_ADDR 0x100 diff -uNrp linux-2.6.9-bk4/include/asm-ppc/unistd.h linux-2.6.9-bk4-keys/include/asm-ppc/unistd.h --- linux-2.6.9-bk4/include/asm-ppc/unistd.h 2004-06-18 13:44:05.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-ppc/unistd.h 2004-10-20 14:10:32.629379614 +0100 @@ -273,8 +273,11 @@ #define __NR_mq_notify 266 #define __NR_mq_getsetattr 267 #define __NR_kexec_load 268 +#define __NR_add_key 269 +#define __NR_request_key 270 +#define __NR_keyctl 271 -#define __NR_syscalls 269 +#define __NR_syscalls 272 #define __NR(n) #n diff -uNrp linux-2.6.9-bk4/include/asm-ppc64/unistd.h linux-2.6.9-bk4-keys/include/asm-ppc64/unistd.h --- linux-2.6.9-bk4/include/asm-ppc64/unistd.h 2004-10-19 10:42:14.000000000 +0100 +++ 
linux-2.6.9-bk4-keys/include/asm-ppc64/unistd.h 2004-10-20 14:10:19.868498694 +0100 @@ -279,8 +279,11 @@ #define __NR_mq_notify 266 #define __NR_mq_getsetattr 267 #define __NR_kexec_load 268 +#define __NR_add_key 269 +#define __NR_request_key 270 +#define __NR_keyctl 271 -#define __NR_syscalls 269 +#define __NR_syscalls 272 #ifdef __KERNEL__ #define NR_syscalls __NR_syscalls #endif diff -uNrp linux-2.6.9-bk4/include/asm-s390/unistd.h linux-2.6.9-bk4-keys/include/asm-s390/unistd.h --- linux-2.6.9-bk4/include/asm-s390/unistd.h 2004-06-18 13:44:05.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-s390/unistd.h 2004-10-20 14:09:39.572899460 +0100 @@ -269,8 +269,11 @@ #define __NR_mq_timedreceive 274 #define __NR_mq_notify 275 #define __NR_mq_getsetattr 276 +#define __NR_add_key 277 +#define __NR_request_key 278 +#define __NR_keyctl 279 -#define NR_syscalls 277 +#define NR_syscalls 280 /* * There are some system calls that are not present on 64 bit, some diff -uNrp linux-2.6.9-bk4/include/asm-sh/unistd.h linux-2.6.9-bk4-keys/include/asm-sh/unistd.h --- linux-2.6.9-bk4/include/asm-sh/unistd.h 2004-10-20 14:03:16.058802954 +0100 +++ linux-2.6.9-bk4-keys/include/asm-sh/unistd.h 2004-10-20 14:09:16.465821351 +0100 @@ -290,8 +290,11 @@ #define __NR_mq_timedreceive (__NR_mq_open+3) #define __NR_mq_notify (__NR_mq_open+4) #define __NR_mq_getsetattr (__NR_mq_open+5) +#define __NR_add_key 283 +#define __NR_request_key 284 +#define __NR_keyctl 285 -#define NR_syscalls 283 +#define NR_syscalls 286 /* user-visible error numbers are in the range -1 - -124: see */ diff -uNrp linux-2.6.9-bk4/include/asm-sh64/unistd.h linux-2.6.9-bk4-keys/include/asm-sh64/unistd.h --- linux-2.6.9-bk4/include/asm-sh64/unistd.h 2004-09-16 12:06:19.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-sh64/unistd.h 2004-10-20 14:08:45.352409218 +0100 @@ -333,8 +333,11 @@ #define __NR_mq_timedreceive (__NR_mq_open+3) #define __NR_mq_notify (__NR_mq_open+4) #define __NR_mq_getsetattr (__NR_mq_open+5) 
+#define __NR_add_key 311 +#define __NR_request_key 312 +#define __NR_keyctl 313 -#define NR_syscalls 311 +#define NR_syscalls 314 /* user-visible error numbers are in the range -1 - -125: see */ diff -uNrp linux-2.6.9-bk4/include/asm-sparc/unistd.h linux-2.6.9-bk4-keys/include/asm-sparc/unistd.h --- linux-2.6.9-bk4/include/asm-sparc/unistd.h 2004-10-19 10:42:14.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-sparc/unistd.h 2004-10-20 14:08:05.303740383 +0100 @@ -296,6 +296,9 @@ #define __NR_mq_notify 277 #define __NR_mq_getsetattr 278 #define __NR_waitid 279 +#define __NR_add_key 280 +#define __NR_request_key 281 +#define __NR_keyctl 282 /* WARNING: You MAY NOT add syscall numbers larger than 282, since * all of the syscall tables in the Sparc kernel are diff -uNrp linux-2.6.9-bk4/include/asm-sparc64/unistd.h linux-2.6.9-bk4-keys/include/asm-sparc64/unistd.h --- linux-2.6.9-bk4/include/asm-sparc64/unistd.h 2004-10-19 10:42:15.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-sparc64/unistd.h 2004-10-20 14:07:45.586380476 +0100 @@ -298,6 +298,9 @@ #define __NR_mq_notify 277 #define __NR_mq_getsetattr 278 #define __NR_waitid 279 +#define __NR_add_key 280 +#define __NR_request_key 281 +#define __NR_keyctl 282 /* WARNING: You MAY NOT add syscall numbers larger than 282, since * all of the syscall tables in the Sparc kernel are diff -uNrp linux-2.6.9-bk4/include/asm-v850/unistd.h linux-2.6.9-bk4-keys/include/asm-v850/unistd.h --- linux-2.6.9-bk4/include/asm-v850/unistd.h 2004-09-16 12:06:20.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-v850/unistd.h 2004-10-20 14:06:45.477380562 +0100 @@ -205,6 +205,9 @@ #define __NR_pivot_root 200 #define __NR_gettid 201 #define __NR_tkill 202 +#define __NR_add_key 203 +#define __NR_request_key 204 +#define __NR_keyctl 205 /* Syscall protocol: diff -uNrp linux-2.6.9-bk4/include/asm-x86_64/unistd.h linux-2.6.9-bk4-keys/include/asm-x86_64/unistd.h --- linux-2.6.9-bk4/include/asm-x86_64/unistd.h 2004-10-19 
10:42:16.000000000 +0100 +++ linux-2.6.9-bk4-keys/include/asm-x86_64/unistd.h 2004-10-20 14:06:01.645026869 +0100 @@ -556,8 +556,14 @@ __SYSCALL(__NR_mq_getsetattr, sys_mq_get __SYSCALL(__NR_kexec_load, sys_ni_syscall) #define __NR_waitid 247 __SYSCALL(__NR_waitid, sys_waitid) +#define __NR_add_key 248 +__SYSCALL(__NR_add_key, sys_add_key) +#define __NR_request_key 249 +__SYSCALL(__NR_request_key, sys_request_key) +#define __NR_keyctl 250 +__SYSCALL(__NR_keyctl, sys_keyctl) -#define __NR_syscall_max __NR_waitid +#define __NR_syscall_max __NR_keyctl #ifndef __NO_STUBS /* user-visible error numbers are in the range -1 - -4095 */ From hch at infradead.org Thu Oct 21 01:29:57 2004 From: hch at infradead.org (Christoph Hellwig) Date: Wed, 20 Oct 2004 16:29:57 +0100 Subject: [PATCH] Add key management syscalls to non-i386 archs In-Reply-To: <3506.1098283455@redhat.com> References: <3506.1098283455@redhat.com> Message-ID: <20041020152957.GA21774@infradead.org> > Hi Linus, Andrew, > > The attached patch adds syscalls for almost all archs (everything barring > m68knommu which is in a real mess, and i386 which already has it). > > It also adds 32->64 compatibility where appropriate. Umm, that patch added the damn multiplexer that had been vetoed multiple times. Why did this happen? From matthew at wil.cx Thu Oct 21 01:49:22 2004 From: matthew at wil.cx (Matthew Wilcox) Date: Wed, 20 Oct 2004 16:49:22 +0100 Subject: [parisc-linux] [PATCH] Add key management syscalls to non-i386 archs In-Reply-To: <3506.1098283455@redhat.com> References: <3506.1098283455@redhat.com> Message-ID: <20041020154922.GV16153@parcelfarce.linux.theplanet.co.uk> On Wed, Oct 20, 2004 at 03:44:15PM +0100, David Howells wrote: > The attached patch adds syscalls for almost all archs (everything barring > m68knommu which is in a real mess, and i386 which already has it). > > It also adds 32->64 compatibility where appropriate. 
> --- linux-2.6.9-bk4/arch/parisc/kernel/syscall_table.S 2004-06-18 13:43:47.000000000 +0100 > +++ linux-2.6.9-bk4-keys/arch/parisc/kernel/syscall_table.S 2004-10-20 14:58:51.533643420 +0100 > @@ -341,5 +341,7 @@ > ENTRY_SAME(mq_timedreceive) > ENTRY_SAME(mq_notify) > ENTRY_SAME(mq_getsetattr) > - /* Nothing yet */ /* 235 */ > + ENTRY_SAME(add_key) /* 235 */ > + ENTRY_SAME(request_key) > + ENTRY_SAME(keyctl) Um, no. Should be ENTRY_COMP() if there's compat syscalls. And those particular syscall numbers have already been assigned (blame Linus for dropping the PA-RISC patch on the floor instead of including it in 2.6.9). -- "Next the statesmen will invent cheap lies, putting the blame upon the nation that is attacked, and every man will be glad of those conscience-soothing falsities, and will diligently study them, and refuse to examine any refutations of them; and thus he will by and by convince himself that the war is just, and will thank God for the better sleep he enjoys after this process of grotesque self-deception." -- Mark Twain From dhowells at redhat.com Thu Oct 21 02:16:17 2004 From: dhowells at redhat.com (David Howells) Date: Wed, 20 Oct 2004 17:16:17 +0100 Subject: [parisc-linux] [PATCH] Add key management syscalls to non-i386 archs In-Reply-To: <20041020154922.GV16153@parcelfarce.linux.theplanet.co.uk> References: <20041020154922.GV16153@parcelfarce.linux.theplanet.co.uk> <3506.1098283455@redhat.com> Message-ID: <7779.1098288977@redhat.com> > Um, no. Should be ENTRY_COMP() if there's compat syscalls. Not all archs (of which PA-Risc is an example) seem to require the same fixups on the same syscalls. In some instances, the upper half of the register is implicitly zero on 32-bit syscall entry to a 64-bit kernel. In such cases, none of my syscalls require fixing up, assuming the pointers are automatically correct. 
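As a rough illustration of the fixup question discussed here, the sketch below shows what changes when a 32-bit argument is widened to a 64-bit register by zero extension versus sign extension. The helper names `zero_extend32` and `sign_extend32` are hypothetical and come from no kernel tree; the sketch also assumes the key serial is a signed 32-bit quantity, which is what the `(key_serial_t)` casts in the patch suggest.

```c
#include <stdint.h>

/* Hypothetical helper: widen a 32-bit value with the upper half cleared.
 * This is what an arch gets "for free" when the upper half of the
 * register is implicitly zero on 32-bit syscall entry. */
static uint64_t zero_extend32(uint32_t v)
{
    return (uint64_t)v;
}

/* Hypothetical helper: widen a 32-bit value while preserving its sign.
 * This is what a signed argument (an int option, or a negative key
 * serial) needs. The uint32_t -> int32_t conversion is
 * implementation-defined in C but wraps on the usual two's-complement
 * targets. */
static int64_t sign_extend32(uint32_t v)
{
    return (int64_t)(int32_t)v;
}
```

With these helpers, a length such as 0x80000000 stays a large positive number under zero extension, while the 32-bit pattern 0xfffffffd becomes -3 only under sign extension; zero-extending it yields 4294967293 instead. That difference is why unsigned lengths and (on such archs) pointers may need no wrapper, whereas signed arguments can.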
> And those particular syscall numbers have already been assigned (blame Linus > for dropping the PA-RISC patch on the floor instead of including it in > 2.6.9). There's not a lot I can do about that, except wave a patch under Linus's nose and see who complains. Can you allocate three syscall numbers for me for parisc? David From johnrose at austin.ibm.com Thu Oct 21 02:35:32 2004 From: johnrose at austin.ibm.com (John Rose) Date: Wed, 20 Oct 2004 11:35:32 -0500 Subject: [PATCH] __ioremap_explicit() criterion change Message-ID: <1098290132.15425.7.camel@sinatra.austin.ibm.com> The function __ioremap_explicit() misses a possible (obscure) case when reserving the imalloc area for the new region. This can result in the unexpected DLPAR-add failure for an I/O slot. The failure will be characterized by a kernel message resembling "could not obtain imalloc area for ea 0x..." Here's an explanation: At boot time, imalloc regions are created for the ranges of all PHBs. Upon removal of a child slot for one of these PHBs, the imalloc region is split so that the region for the child slot can be removed. A GFW testcase revealed the following scenario. A PHB is remapped at boot for virtual address range A through C. At boot, the partition owns a slot that spans from A to B. This slot is DLPAR-removed, leaving an imalloc region from B to C. At this point, the user DLPAR adds an EADS slot that was not present at boot, but is a child of the PHB. The new slot happens to have a range that directly matches the leftover PHB range, from B to C. The existing code does not expect this, so the operation fails. 
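The scenario above can be modelled in a few lines of C. This is only a sketch under stated assumptions: the `region` struct, the `REGION_*` flags, `classify()` and `acceptable()` are hypothetical stand-ins, not the real imalloc data structures or the real `im_get_area()` logic. It merely shows why a request that exactly matches a leftover region is rejected when the accepted criteria are limited to "unused" and "subset".

```c
#include <stddef.h>

/* Hypothetical status flags, loosely modelled on the IM_REGION_* names. */
enum region_status {
    REGION_UNUSED  = 1 << 0, /* no existing region touches the request */
    REGION_SUBSET  = 1 << 1, /* request lies inside a larger existing region */
    REGION_EXISTS  = 1 << 2, /* request matches an existing region exactly */
    REGION_OVERLAP = 1 << 3  /* any other partial overlap */
};

/* Hypothetical record of one mapped range: [start, start + size). */
struct region {
    unsigned long start;
    unsigned long size;
};

/* Classify the request [ea, ea + size) against the recorded regions. */
static enum region_status classify(const struct region *regs, size_t n,
                                   unsigned long ea, unsigned long size)
{
    unsigned long end = ea + size;
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned long rs = regs[i].start;
        unsigned long re = rs + regs[i].size;

        if (end <= rs || ea >= re)
            continue;                 /* disjoint from this region */
        if (ea == rs && end == re)
            return REGION_EXISTS;     /* exact match */
        if (ea >= rs && end <= re)
            return REGION_SUBSET;     /* strictly smaller, fully inside */
        return REGION_OVERLAP;
    }
    return REGION_UNUSED;
}

/* A request is acceptable if its status is among the allowed criteria. */
static int acceptable(enum region_status st, unsigned int criteria)
{
    return (st & criteria) != 0;
}
```

In this model, if the PHB covers A..C and the slot A..B has been removed, the only remaining record is B..C. A request for exactly B..C classifies as `REGION_EXISTS`, so it fails under `REGION_UNUSED | REGION_SUBSET` and succeeds once `REGION_EXISTS` is added to the criteria.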
Signed-off-by: John Rose

diff -Nru a/arch/ppc64/mm/init.c b/arch/ppc64/mm/init.c
--- a/arch/ppc64/mm/init.c	Wed Oct 20 11:17:47 2004
+++ b/arch/ppc64/mm/init.c	Wed Oct 20 11:17:47 2004
@@ -263,7 +263,8 @@
 		 */
 		;
 	} else {
-		area = im_get_area(ea, size, IM_REGION_UNUSED|IM_REGION_SUBSET);
+		area = im_get_area(ea, size,
+			IM_REGION_UNUSED|IM_REGION_SUBSET|IM_REGION_EXISTS);
 		if (area == NULL) {
 			printk(KERN_ERR "could not obtain imalloc area for ea 0x%lx\n", ea);
 			return 1;

From cchaney at us.ibm.com Thu Oct 21 03:04:20 2004
From: cchaney at us.ibm.com (Craig Chaney)
Date: Wed, 20 Oct 2004 13:04:20 -0400
Subject: 2.6.9-rc4 kernel -- "cannot find space for TCE table"
In-Reply-To: <1098229131.5792.9.camel@gaston>
References: <1097887510.6487.23.camel@gaston> <20041019230054.GA3807@kevlar.burdell.org> <1098229131.5792.9.camel@gaston>
Message-ID: <20041020170420.GA8345@sage.raleigh.ibm.com>

On Wed, Oct 20, 2004 at 09:38:52AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2004-10-20 at 09:00, Sonny Rao wrote:
>
> > Ben, I'm still seeing this issue with 2.6.9 final, do you need
> > anything else? I'm sure you're very busy, but please let me know if I
> > can help.
>
> Well, I can't reproduce here, but it seem basically that one of the
> calls to alloc_down() is failing, you may want to trace a bit. I'll
> try to find by myself too & let you know.
>
> Ben.

I can reproduce this on a p615 as well. I did a little bit of superficial
tracking. The call to alloc_down fails because
(RELOC(alloc_top) == RELOC(rmo_top)) is false. On LPAR platforms, alloc_top
is set to rmo_top in prom_init_mem. However, for the p615,
prom_find_machine_type() returns PLATFORM_PSERIES, which causes the logic
in prom_init_mem to set alloc_top to 0x40000000.

I can work around this by modifying prom_init_mem to set alloc_top to
rmo_top if of_platform is either PLATFORM_PSERIES_LPAR or PLATFORM_PSERIES.
This allows me to boot a 2.6.9-rc4 kernel on a p615.

Hope this helps.
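
For readers following along, the behaviour described above can be sketched
as a small userspace model. Everything here is an assumption made for
illustration: the sk_ names, the struct and the example addresses are
hypothetical, and the real logic lives in prom_init_mem with RELOC()
wrappers that are not reproduced. The 0x40000000 cap and the
"alloc_top == rmo_top" condition are the only parts taken from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the platform types named in this thread. */
enum sk_platform { SK_PSERIES, SK_PSERIES_LPAR, SK_OTHER };

/* Hypothetical container for the three limits discussed above. */
struct sk_mem {
	uint64_t ram_top;	/* top of RAM */
	uint64_t rmo_top;	/* top of the real-mode region */
	uint64_t alloc_top;	/* top of the low allocation window */
};

static uint64_t sk_min(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Behaviour as reported: only LPAR gets alloc_top = rmo_top. */
static void sk_init_mem_reported(struct sk_mem *m, enum sk_platform p)
{
	assert(m != NULL);
	if (p == SK_PSERIES_LPAR)
		m->alloc_top = m->rmo_top;
	else
		m->alloc_top = sk_min(0x40000000ULL, m->ram_top);
}

/* The workaround described above: treat non-LPAR pSeries like LPAR. */
static void sk_init_mem_workaround(struct sk_mem *m, enum sk_platform p)
{
	assert(m != NULL);
	if (p == SK_PSERIES_LPAR || p == SK_PSERIES)
		m->alloc_top = m->rmo_top;
	else
		m->alloc_top = sk_min(0x40000000ULL, m->ram_top);
}

/* Simplified form of the condition the failing allocation depends on. */
static int sk_rmo_alloc_ok(const struct sk_mem *m)
{
	return m->alloc_top == m->rmo_top;
}
```

With a made-up rmo_top of 0x10000000 and ram_top of 0x80000000, the
reported logic leaves alloc_top at 0x40000000 on SK_PSERIES, so the check
fails; with the workaround both values are equal and the check passes.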
-Craig

From arnd at arndb.de Thu Oct 21 03:08:17 2004
From: arnd at arndb.de (Arnd Bergmann)
Date: Wed, 20 Oct 2004 19:08:17 +0200
Subject: [PATCH] Add key management syscalls to non-i386 archs
In-Reply-To: <3506.1098283455@redhat.com>
References: <3506.1098283455@redhat.com>
Message-ID: <200410201908.18273.arnd@arndb.de>

On Wednesday 20 October 2004 16:44, David Howells wrote:

> diff -uNrp linux-2.6.9-bk4/arch/s390/kernel/compat_wrapper.S linux-2.6.9-bk4-keys/arch/s390/kernel/compat_wrapper.S
> --- linux-2.6.9-bk4/arch/s390/kernel/compat_wrapper.S	2004-06-18 13:43:49.000000000 +0100
> +++ linux-2.6.9-bk4-keys/arch/s390/kernel/compat_wrapper.S	2004-10-20 15:08:00.071403677 +0100
> @@ -1406,3 +1406,29 @@ compat_sys_mq_getsetattr_wrapper:
>  	llgtr	%r3,%r3			# struct compat_mq_attr *
>  	llgtr	%r4,%r4			# struct compat_mq_attr *
>  	jg	compat_sys_mq_getsetattr
> +
> +	.globl	sys32_add_key_wrapper
> +sys32_add_key_wrapper:
> +	lgfr	%r2,%r2			# const char *
> +	llgfr	%r3,%r3			# const char *
> +	llgfr	%r4,%r4			# const void *
> +	llgfr	%r5,%r5			# size_t
> +	llgfr	%r6,%r6			# key_serial_t
> +	jg	sys_add_key		# branch to system call
> +
> +	.globl	sys32_request_key_wrapper
> +sys32_request_key_wrapper:
> +	lgfr	%r2,%r2			# const char *
> +	llgfr	%r3,%r3			# const char *
> +	llgfr	%r4,%r4			# const char *
> +	llgfr	%r5,%r5			# key_serial_t
> +	jg	sys_request_key		# branch to system call
> +
> +	.globl	sys32_keyctl_wrapper
> +sys32_keyctl_wrapper:
> +	lgfr	%r2,%r2			# int
> +	llgfr	%r3,%r3			# unsigned long
> +	llgfr	%r4,%r4			# unsigned long
> +	llgfr	%r5,%r5			# unsigned long
> +	llgfr	%r6,%r6			# unsigned long
> +	jg	sys_keyctl		# branch to system call

The comments don't match with the code. Please use the correct
lgfr/llgfr/llgtr opcodes for signed/unsigned/pointer extension.
Note that for keyctl_wrapper, the actual conversion is not static
but depends on the value of %r2. You probably want to code that
conversion in C.

	Arnd <><
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: signature
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041020/2a01c183/attachment.pgp

From akpm at osdl.org Thu Oct 21 03:50:27 2004
From: akpm at osdl.org (Andrew Morton)
Date: Wed, 20 Oct 2004 10:50:27 -0700
Subject: [PATCH] Add key management syscalls to non-i386 archs
In-Reply-To: <20041020152957.GA21774@infradead.org>
References: <3506.1098283455@redhat.com> <20041020152957.GA21774@infradead.org>
Message-ID: <20041020105027.54bf9e89.akpm@osdl.org>

Christoph Hellwig wrote:
>
> > Hi Linus, Andrew,
> >
> > The attached patch adds syscalls for almost all archs (everything barring
> > m68knommu which is in a real mess, and i386 which already has it).
> >
> > It also adds 32->64 compatibility where appropriate.
>
> Umm, that patch added the damn multiplexer that had been vetoed multiple
> times. Why did this happen?

Fifteen new syscalls was judged excessive and the keyfs interface was judged
slow and bloaty.

From hch at infradead.org Thu Oct 21 04:18:50 2004
From: hch at infradead.org (Christoph Hellwig)
Date: Wed, 20 Oct 2004 19:18:50 +0100
Subject: [PATCH] Add key management syscalls to non-i386 archs
In-Reply-To: <20041020105027.54bf9e89.akpm@osdl.org>
References: <3506.1098283455@redhat.com> <20041020152957.GA21774@infradead.org> <20041020105027.54bf9e89.akpm@osdl.org>
Message-ID: <20041020181850.GA23979@infradead.org>

On Wed, Oct 20, 2004 at 10:50:27AM -0700, Andrew Morton wrote:
> Christoph Hellwig wrote:
> >
> > > Hi Linus, Andrew,
> > >
> > > The attached patch adds syscalls for almost all archs (everything barring
> > > m68knommu which is in a real mess, and i386 which already has it).
> > >
> > > It also adds 32->64 compatibility where appropriate.
> >
> > Umm, that patch added the damn multiplexer that had been vetoed multiple
> > times. Why did this happen?
>
> Fifteen new syscalls was judged excessive and the keyfs interface was
> judged slow and bloaty.

Maybe 15 syscalls just means the API is goddamn awful and we certainly
shouldn't merge it as-is.

From linas at austin.ibm.com Thu Oct 21 04:45:01 2004
From: linas at austin.ibm.com (Linas Vepstas)
Date: Wed, 20 Oct 2004 13:45:01 -0500
Subject: status of ppc64 patches
In-Reply-To: <20041020010301.GA29579@4>
References: <41754644.1010003@austin.ibm.com> <1098231748.7493.114.camel@pants.austin.ibm.com> <20041020010301.GA29579@4>
Message-ID: <20041020184501.GF10026@austin.ibm.com>

On Tue, Oct 19, 2004 at 08:03:01PM -0500, Olof Johansson was heard to remark:
>
> Also: Regarding re-basing patches: It has to be the duty of the developer
> of the patch to re-base it to current trees if it will no longer apply
> cleanly.

I think this misses the point. I've re-based some of my patches
more than half-a-dozen times, and this has gotten so tedious that
I've just sort of stopped bothering sending in patches. Excessive
delays in moving patches upstream just kill the development process.
Patches need to be handled in a timely manner, while they are still
'fresh', so that they don't need to be rebased.

Put it another way: it is, at this time, impossible for me to rebase,
because I know that my patches will conflict with others in the
un-applied patch queue. So all I can do is wait for the patch queue to
shrink, wait till the others get into the Torvalds tree, then bk pull,
then hurry, hurry, rebase, test, submit, and hope I get in before
someone else does and wrecks it again. The turn-around time for
"getting lucky" like this is over a month, and if one doesn't get
lucky the first month, one has to wait a whole 'nother month for
one's next shot.

--linas

From jschopp at austin.ibm.com Thu Oct 21 05:28:35 2004
From: jschopp at austin.ibm.com (Joel Schopp)
Date: Wed, 20 Oct 2004 14:28:35 -0500
Subject: status of ppc64 patches
In-Reply-To: <16758.21948.795730.268143@cargo.ozlabs.ibm.com>
References: <41754644.1010003@austin.ibm.com> <16758.21948.795730.268143@cargo.ozlabs.ibm.com>
Message-ID: <4176BC63.8000700@austin.ibm.com>

> As far as your patches are concerned, I am aware of two patches that
> change things so that we have __boot variants of __pa etc. However,
> your explanation didn't really get me excited about the change. You
> said something about "moving towards hotplug memory" but you didn't
> explain why these changes would help with that, or how I should choose
> which function to use when I'm making changes in future (that should
> actually go in a file somewhere under the Documentation directory), or
> why those changes need to go in now.

The direct answer is that this is a big part of the size of the
CONFIG_NONLINEAR patch, without the controversial part that actually
does CONFIG_NONLINEAR. CONFIG_NONLINEAR allows us to have big holes in
physical memory and to grow physical memory after boot. These changes
will be necessary for whatever ends up filling the role CONFIG_NONLINEAR
currently does in our hotplug memory tree. So even if you hate
CONFIG_NONLINEAR these patches will be necessary for memory hotplug
because we will have to differentiate early boot memory from normal
memory.

We have a tree that does memory add, and is part of the way to doing
remove. http://sprucegoose.sr71.net/patches It has 76 patches
currently. It is a real job to continue to forward port it. We are
trying to get it all upstream. But of course it would be insane to
merge 76 very complex patches at once, especially when a few of them are
still buggy.

These changes need to go in now because they don't hurt anything and
they help us a great deal on a project most everybody agrees is a good
idea (memory hotplug).
If we didn't have a continuous development model they
could be ignored until 2.7, but to get large features into a kernel that
is always stable it is necessary to merge things a bit at a time. Even
if those bits are only worthwhile in the context of the yet unmerged bits.

And I apologize for not making this all clear in my initial message.

From paulus at samba.org Thu Oct 21 07:30:56 2004
From: paulus at samba.org (Paul Mackerras)
Date: Thu, 21 Oct 2004 07:30:56 +1000
Subject: [PATCH 1/1] rtas_flash_4gig
In-Reply-To: <200410041942.i94Jg4WA154540@westrelay04.boulder.ibm.com>
References: <200410041942.i94Jg4WA154540@westrelay04.boulder.ibm.com>
Message-ID: <16758.55568.809557.670513@cargo.ozlabs.ibm.com>

Jake,

> We should probably check to make sure that all of the flash
> list headers are above 4gig. Not just the first one.

Why is the limit 4GB rather than the RMO size?

Paul.

From moilanen at austin.ibm.com Thu Oct 21 08:08:17 2004
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Wed, 20 Oct 2004 17:08:17 -0500
Subject: [PATCH 1/1] rtas_flash_4gig
In-Reply-To: <16758.55568.809557.670513@cargo.ozlabs.ibm.com>
References: <200410041942.i94Jg4WA154540@westrelay04.boulder.ibm.com> <16758.55568.809557.670513@cargo.ozlabs.ibm.com>
Message-ID: <20041020170817.0ee49b64@localhost>

> > We should probably check to make sure that all of the flash
> > list headers are above 4gig. Not just the first one.
>
> Why is the limit 4GB rather than the RMO size?

According to the RPA (item E7-41 to be exact), the block-list can be
anywhere under 4 gigs. RTAS will make hypervisor calls to access this
memory.

I would infer the reason why they want to allow the block-list outside
the RMO is that otherwise it may have been difficult for the OS to get an
entire flash image under the RMO boundary.

Thanks,
Jake

From davem at davemloft.net Thu Oct 21 08:01:49 2004
From: davem at davemloft.net (David S. Miller)
Date: Wed, 20 Oct 2004 15:01:49 -0700
Subject: [PATCH] Add key management syscalls to non-i386 archs
In-Reply-To: <3506.1098283455@redhat.com>
References: <3506.1098283455@redhat.com>
Message-ID: <20041020150149.7be06d6d.davem@davemloft.net>

David, I applaud your effort to take care of this.
However, this patch will conflict with what I've
sent into Linus already for Sparc. I also had to
add the sys_altroot syscall entry as well.

I've mentioned several times that perhaps the best
way to deal with this problem is to purposefully
break the build of platforms when new system calls
are added.

Simply adding a:

#error new syscall entries for X and Y needed

to include/asm-*/unistd.h would handle this just
fine I think.

That way it won't be missed, and if the platform
maintainer wants to just ignore the new syscall
they can choose to do that as well.

From olof at austin.ibm.com Thu Oct 21 08:26:41 2004
From: olof at austin.ibm.com (Olof Johansson)
Date: Wed, 20 Oct 2004 17:26:41 -0500
Subject: [PATCH] create iommu_free_table()
In-Reply-To: <1097171661.7087.1.camel@sinatra.austin.ibm.com>
References: <1097171661.7087.1.camel@sinatra.austin.ibm.com>
Message-ID: <4176E621.3040607@austin.ibm.com>

John Rose wrote:
> The patch below creates iommu_free_table(). Iommu tables are not currently
> freed in PPC64. This could cause a memory leak for DLPAR of an EADS slot. The
> function verifies that there are no outstanding TCE entries for the range of
> the table before freeing it. I added a call to iommu_free_table() to the code
> that dynamically removes a device node. This should be fairly symmetrical with
> the table allocation, which happens during dynamic addition of a device node.
>
> Comments welcome.

Looks good, just a couple of minor nitpicks below.

-Olof

> Signed-off-by: John Rose
>
> diff -Nru a/arch/ppc64/kernel/pSeries_iommu.c b/arch/ppc64/kernel/pSeries_iommu.c
> --- a/arch/ppc64/kernel/pSeries_iommu.c	Thu Oct 7 11:08:19 2004
> +++ b/arch/ppc64/kernel/pSeries_iommu.c	Thu Oct 7 11:08:19 2004
> @@ -412,6 +412,38 @@
>  	dn->iommu_table = iommu_init_table(tbl);
>  }
>
> +void iommu_free_table(struct device_node *dn)
> +{
> +	struct iommu_table *tbl = dn->iommu_table;
> +	unsigned long bitmap_sz, i;
> +	unsigned int order;
> +
> +	if (!tbl || !tbl->it_map) {

whitespace above looks wrong (or below?)

> +		printk(KERN_ERR "%s: expected TCE map for %s\n", __FUNCTION__,
> +				dn->full_name);
> +		return;
> +	}
> +
> +	/* verify that table contains no entries */
> +	/* it_mapsize is in entries, and we're examining 64 at a time */
> +	for (i = 0; i < (tbl->it_mapsize/64); i++) {
> +		if (tbl->it_map[i] != 0) {
> +			printk(KERN_WARNING "%s: Unexpected TCEs for %s\n",
> +				__FUNCTION__, dn->full_name);
> +			break;
> +		}

Could this get spammy? It could be nice to see a WARN_ON(1) too, so the
call stack is dumped. If that's added, a printk_ratelimit() would
definitely be warranted around both the printk and the WARN_ON().
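
To make the rate-limiting suggestion concrete, here is a minimal userspace
sketch, not kernel code. The sk_ names, the fixed-window limiter and its
parameters are all assumptions invented for illustration; in the kernel the
check would be printk_ratelimit(), which keeps its own state and is not
modelled faithfully here. The second helper mirrors the word-at-a-time scan
in the quoted loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limiter state; the kernel keeps its own state internally. */
struct sk_ratelimit {
	unsigned long interval;		/* window length, in caller-defined ticks */
	unsigned int burst;		/* messages allowed per window */
	unsigned long window_start;	/* tick at which the current window began */
	unsigned int printed;		/* messages already allowed in this window */
};

/* Return 1 if a message may be emitted at time `now`, 0 if it is dropped. */
static int sk_ratelimit_ok(struct sk_ratelimit *rl, unsigned long now)
{
	assert(rl != NULL && rl->burst > 0);

	/* Start a fresh window once the previous one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return 0;
	rl->printed++;
	return 1;
}

/* Return the index of the first non-zero 64-bit map word, or -1 if none. */
static long sk_first_busy_word(const uint64_t *map, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++)
		if (map[i] != 0)
			return (long)i;
	return -1;
}
```

A caller would report leftover entries (and dump the stack) only when
sk_first_busy_word() finds something and sk_ratelimit_ok() returns 1, so
repeated removals of a busy table cannot flood the log.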

> +	}
> +
> +	/* calculate bitmap size in bytes */
> +	bitmap_sz = (tbl->it_mapsize + 7) / 8;
> +
> +	/* free bitmap */
> +	order = get_order(bitmap_sz);
> +	free_pages((unsigned long) tbl->it_map, order);
> +
> +	/* free table */
> +	kfree(tbl);

whitespace

> +}
>
>  void iommu_setup_pSeries(void)
>  {
> diff -Nru a/arch/ppc64/kernel/prom.c b/arch/ppc64/kernel/prom.c
> --- a/arch/ppc64/kernel/prom.c	Thu Oct 7 11:08:19 2004
> +++ b/arch/ppc64/kernel/prom.c	Thu Oct 7 11:08:19 2004
> @@ -1818,6 +1818,9 @@
>  		return -EBUSY;
>  	}
>
> +	if (np->iommu_table)
> +		iommu_free_table(np);
> +
>  	write_lock(&devtree_lock);
>  	OF_MARK_STALE(np);
>  	remove_node_proc_entries(np);
> diff -Nru a/include/asm-ppc64/iommu.h b/include/asm-ppc64/iommu.h
> --- a/include/asm-ppc64/iommu.h	Thu Oct 7 11:08:19 2004
> +++ b/include/asm-ppc64/iommu.h	Thu Oct 7 11:08:19 2004
> @@ -113,6 +113,9 @@
>  /* Creates table for an individual device node */
>  extern void iommu_devnode_init(struct device_node *dn);
>
> +/* Frees table for an individual device node */
> +extern void iommu_free_table(struct device_node *dn);
> +
>  #endif /* CONFIG_PPC_MULTIPLATFORM */
>
>  #ifdef CONFIG_PPC_ISERIES
>
>
> _______________________________________________
> Linuxppc64-dev mailing list
> Linuxppc64-dev at ozlabs.org
> https://ozlabs.org/cgi-bin/mailman/listinfo/linuxppc64-dev
>

From davem at davemloft.net Thu Oct 21 09:04:50 2004
From: davem at davemloft.net (David S. Miller)
Date: Wed, 20 Oct 2004 16:04:50 -0700
Subject: [discuss] Re: [PATCH] Add key management syscalls to non-i386 archs
In-Reply-To: <20041020225625.GD995@wotan.suse.de>
References: <3506.1098283455@redhat.com> <20041020150149.7be06d6d.davem@davemloft.net> <20041020225625.GD995@wotan.suse.de>
Message-ID: <20041020160450.0914270b.davem@davemloft.net>

On Thu, 21 Oct 2004 00:56:25 +0200
Andi Kleen wrote:

> I don't think that's a good idea. Normally new system calls
> are relatively obscure and the system works fine without them,
> so urgent action is not needed.
>
> And I think we can trust architecture maintainers to regularly
> sync the system calls with i386.

I disagree quite strongly. One major frustration for users of
non-x86 platforms is that functionality is often missing for some
time that we can make trivial to keep in sync.

I religiously watch what goes into Linus's tree for this purpose,
but that is kind of a ridiculous burden to expect every platform
maintainer to do. It's not just system calls, we have signal handling
bug fixes, trap handling infrastructure, and now the nice generic
IRQ handling subsystem as other examples.

Simply put, if you're not watching the tree in painstaking detail
every day, you miss all of these enhancements.

The knowledge should come from the person putting the changes into
the tree, therefore it gets done once and this makes it so that
the other platform maintainers will find out about it automatically
next time they update their tree.

From ak at suse.de Thu Oct 21 09:25:09 2004
From: ak at suse.de (Andi Kleen)
Date: Thu, 21 Oct 2004 01:25:09 +0200
Subject: [discuss] Re: [PATCH] Add key management syscalls to non-i386 archs
In-Reply-To: <20041020160450.0914270b.davem@davemloft.net>
References: <3506.1098283455@redhat.com> <20041020150149.7be06d6d.davem@davemloft.net> <20041020225625.GD995@wotan.suse.de> <20041020160450.0914270b.davem@davemloft.net>
Message-ID: <20041020232509.GF995@wotan.suse.de>

On Wed, Oct 20, 2004 at 04:04:50PM -0700, David S. Miller wrote:
> On Thu, 21 Oct 2004 00:56:25 +0200
> Andi Kleen wrote:
>
> > I don't think that's a good idea. Normally new system calls
> > are relatively obscure and the system works fine without them,
> > so urgent action is not needed.
> >
> > And I think we can trust architecture maintainers to regularly
> > sync the system calls with i386.
>
> I disagree quite strongly. One major frustration for users of
> non-x86 platforms is that functionality is often missing for some
> time that we can make trivial to keep in sync.

I'm not sure really if the users of some embedded platform are
all cheering for key management system calls... I guess they
will prefer just something that compiles.

>
> I religiously watch what goes into Linus's tree for this purpose,
> but that is kind of a ridiculous burden to expect every platform
> maintainer to do. It's not just system calls, we have signal handling
> bug fixes, trap handling infrastructure, and now the nice generic
> IRQ handling subsystem as other examples.

Most of that is optional. When the arch maintainer chooses not
to use it you have just unnecessarily broken the build.

IMHO breaking the build unnecessarily is extremely bad because
it will prevent all testing. And would you really want to hold
up the whole linux testing machinery just for some obscure
system call? IMHO not a good tradeoff.

>
> Simply put, if you're not watching the tree in painstaking detail
> every day, you miss all of these enhancements.

I would assume the other maintainers go at least from time to time
through the i386 diffs and check if they miss anything
(that is what I do). For system calls they do definitely,
although it may take some time.

>
> The knowledge should come from the person putting the changes into
> the tree, therefore it gets done once and this makes it so that
> the other platform maintainers will find out about it automatically
> next time they update their tree.

And causing merging headaches and all kind of other problems.

-Andi

From ak at suse.de Thu Oct 21 08:56:25 2004
From: ak at suse.de (Andi Kleen)
Date: Thu, 21 Oct 2004 00:56:25 +0200
Subject: [discuss] Re: [PATCH] Add key management syscalls to non-i386 archs
In-Reply-To: <20041020150149.7be06d6d.davem@davemloft.net>
References: <3506.1098283455@redhat.com> <20041020150149.7be06d6d.davem@davemloft.net>
Message-ID: <20041020225625.GD995@wotan.suse.de>

On Wed, Oct 20, 2004 at 03:01:49PM -0700, David S. Miller wrote:
>
> David, I applaud your effort to take care of this.
> However, this patch will conflict with what I've
> sent into Linus already for Sparc. I also had to
> add the sys_altroot syscall entry as well.
>
> I've mentioned several times that perhaps the best
> way to deal with this problem is to purposefully
> break the build of platforms when new system calls
> are added.
>
> Simply adding a:
>
> #error new syscall entries for X and Y needed
>
> to include/asm-*/unistd.h would handle this just
> fine I think.

I don't think that's a good idea. Normally new system calls
are relatively obscure and the system works fine without them,
so urgent action is not needed.

And I think we can trust architecture maintainers to regularly
sync the system calls with i386.

-Andi

From davem at davemloft.net Thu Oct 21 09:41:44 2004
From: davem at davemloft.net (David S. Miller)
Date: Wed, 20 Oct 2004 16:41:44 -0700
Subject: [discuss] Re: [PATCH] Add key management syscalls to non-i386 archs
In-Reply-To: <20041020232509.GF995@wotan.suse.de>
References: <3506.1098283455@redhat.com> <20041020150149.7be06d6d.davem@davemloft.net> <20041020225625.GD995@wotan.suse.de> <20041020160450.0914270b.davem@davemloft.net> <20041020232509.GF995@wotan.suse.de>
Message-ID: <20041020164144.3457eafe.davem@davemloft.net>

On Thu, 21 Oct 2004 01:25:09 +0200
Andi Kleen wrote:

> IMHO breaking the build unnecessarily is extremely bad because
> it will prevent all testing. And would you really want to hold
> up the whole linux testing machinery just for some obscure
> system call? IMHO not a good tradeoff.

Then change the unistd.h cookie from "#error" to a "#warning". It
accomplishes both of our goals.

From ak at suse.de Thu Oct 21 10:10:42 2004
From: ak at suse.de (Andi Kleen)
Date: Thu, 21 Oct 2004 02:10:42 +0200
Subject: [discuss] Re: [PATCH] Add key management syscalls to non-i386 archs
In-Reply-To: <20041020164144.3457eafe.davem@davemloft.net>
References: <3506.1098283455@redhat.com> <20041020150149.7be06d6d.davem@davemloft.net> <20041020225625.GD995@wotan.suse.de> <20041020160450.0914270b.davem@davemloft.net> <20041020232509.GF995@wotan.suse.de> <20041020164144.3457eafe.davem@davemloft.net>
Message-ID: <20041021001041.GI995@wotan.suse.de>

On Wed, Oct 20, 2004 at 04:41:44PM -0700, David S. Miller wrote:
> On Thu, 21 Oct 2004 01:25:09 +0200
> Andi Kleen wrote:
>
> > IMHO breaking the build unnecessarily is extremely bad because
> > it will prevent all testing. And would you really want to hold
> > up the whole linux testing machinery just for some obscure
> > system call? IMHO not a good tradeoff.
>
> Then change the unistd.h cookie from "#error" to a "#warning". It
> accomplishes both of our goals.

#warnings would be fine for me.

-Andi

From benh at kernel.crashing.org Thu Oct 21 11:30:59 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Thu, 21 Oct 2004 11:30:59 +1000
Subject: 2.6.9-rc4 kernel -- "cannot find space for TCE table"
In-Reply-To: <20041020170420.GA8345@sage.raleigh.ibm.com>
References: <1097887510.6487.23.camel@gaston> <20041019230054.GA3807@kevlar.burdell.org> <1098229131.5792.9.camel@gaston> <20041020170420.GA8345@sage.raleigh.ibm.com>
Message-ID: <1098322258.4183.15.camel@gaston>

On Thu, 2004-10-21 at 03:04, Craig Chaney wrote:

> which causes the logic in prom_init_mem to set alloc_top to 0x40000000.
>
> I can work around this by modifying prom_init_mem to set alloc_top to rmo_top
> if of_platform is either PLATFORM_PSERIES_LPAR or PLATFORM_PSERIES. This
> allows me to boot a 2.6.9-rc4 kernel on a p615.

Yes, alloc_top and rmo_top should be both "clamped". Can you try that patch
and let me know ?

Index: linux-work/arch/ppc64/kernel/prom_init.c
===================================================================
--- linux-work.orig/arch/ppc64/kernel/prom_init.c	2004-10-20 18:38:08.911500096 +1000
+++ linux-work/arch/ppc64/kernel/prom_init.c	2004-10-21 11:30:23.570248584 +1000
@@ -675,7 +675,7 @@
 	if ( RELOC(of_platform) == PLATFORM_PSERIES_LPAR )
 		RELOC(alloc_top) = RELOC(rmo_top);
 	else
-		RELOC(alloc_top) = min(0x40000000ul, RELOC(ram_top));
+		RELOC(alloc_top) = RELOC(rmo_top) = min(0x40000000ul, RELOC(ram_top));
 	RELOC(alloc_bottom) = PAGE_ALIGN(RELOC(klimit) - offset + 0x4000);
 	RELOC(alloc_top_high) = RELOC(ram_top);

From david at gibson.dropbear.id.au Thu Oct 21 11:32:07 2004
From: david at gibson.dropbear.id.au (David Gibson)
Date: Thu, 21 Oct 2004 11:32:07 +1000
Subject: [PPC64] Don't build virtual IO drivers for PowerMac
Message-ID: <20041021013207.GH17760@zax>

Andrew, please apply:

Only compile vio.c on iSeries and pSeries, since other PPC64 platforms
(PowerMac) don't use virtual IO. The resulting #ifdefs in dma.c are
kind of ugly, but at least contained, and I can't see a nicer way of
doing it for the time being.

Signed-off-by: David Gibson

Index: working-2.6/arch/ppc64/kernel/Makefile
===================================================================
--- working-2.6.orig/arch/ppc64/kernel/Makefile	2004-09-28 10:22:13.000000000 +1000
+++ working-2.6/arch/ppc64/kernel/Makefile	2004-10-05 15:47:16.541962864 +1000
@@ -11,7 +11,7 @@
 			udbg.o binfmt_elf32.o sys_ppc32.o ioctl32.o \
 			ptrace32.o signal32.o rtc.o init_task.o \
 			lmb.o cputable.o cpu_setup_power4.o idle_power4.o \
-			iommu.o sysfs.o vio.o
+			iommu.o sysfs.o

 obj-$(CONFIG_PPC_OF)		+= of_device.o

@@ -45,6 +45,7 @@
 obj-$(CONFIG_HVC_CONSOLE)	+= hvconsole.o
 obj-$(CONFIG_BOOTX_TEXT)	+= btext.o
 obj-$(CONFIG_HVCS)		+= hvcserver.o
+obj-$(CONFIG_IBMVIO)		+= vio.o

 obj-$(CONFIG_PPC_PMAC)		+= pmac_setup.o pmac_feature.o pmac_pci.o \
 				   pmac_time.o pmac_nvram.o pmac_low_i2c.o \
Index: working-2.6/arch/ppc64/Kconfig
===================================================================
--- working-2.6.orig/arch/ppc64/Kconfig	2004-09-28 10:22:13.000000000 +1000
+++ working-2.6/arch/ppc64/Kconfig	2004-10-05 15:47:16.541962864 +1000
@@ -110,6 +110,11 @@
 	  processors, that is, which share physical processors between
 	  two or more partitions.
+config IBMVIO
+	depends on PPC_PSERIES || PPC_ISERIES
+	bool
+	default y
+
 config U3_DART
 	bool
 	depends on PPC_MULTIPLATFORM
Index: working-2.6/arch/ppc64/kernel/dma.c
===================================================================
--- working-2.6.orig/arch/ppc64/kernel/dma.c	2004-08-09 09:51:38.000000000 +1000
+++ working-2.6/arch/ppc64/kernel/dma.c	2004-10-05 16:02:01.372034952 +1000
@@ -17,8 +17,10 @@
 {
 	if (dev->bus == &pci_bus_type)
 		return pci_dma_supported(to_pci_dev(dev), mask);
+#ifdef CONFIG_IBMVIO
 	if (dev->bus == &vio_bus_type)
 		return vio_dma_supported(to_vio_dev(dev), mask);
+#endif /* CONFIG_IBMVIO */
 	BUG();
 	return 0;
 }
@@ -28,8 +30,10 @@
 {
 	if (dev->bus == &pci_bus_type)
 		return pci_set_dma_mask(to_pci_dev(dev), dma_mask);
+#ifdef CONFIG_IBMVIO
 	if (dev->bus == &vio_bus_type)
 		return vio_set_dma_mask(to_vio_dev(dev), dma_mask);
+#endif /* CONFIG_IBMVIO */
 	BUG();
 	return 0;
 }
@@ -40,8 +44,10 @@
 {
 	if (dev->bus == &pci_bus_type)
 		return pci_alloc_consistent(to_pci_dev(dev), size, dma_handle);
+#ifdef CONFIG_IBMVIO
 	if (dev->bus == &vio_bus_type)
 		return vio_alloc_consistent(to_vio_dev(dev), size, dma_handle);
+#endif /* CONFIG_IBMVIO */
 	BUG();
 	return NULL;
 }
@@ -52,8 +58,10 @@
 {
 	if (dev->bus == &pci_bus_type)
 		pci_free_consistent(to_pci_dev(dev), size, cpu_addr, dma_handle);
+#ifdef CONFIG_IBMVIO
 	else if (dev->bus == &vio_bus_type)
 		vio_free_consistent(to_vio_dev(dev), size, cpu_addr, dma_handle);
+#endif /* CONFIG_IBMVIO */
 	else
 		BUG();
 }
@@ -64,8 +72,10 @@
 {
 	if (dev->bus == &pci_bus_type)
 		return pci_map_single(to_pci_dev(dev), cpu_addr, size, (int)direction);
+#ifdef CONFIG_IBMVIO
 	if (dev->bus == &vio_bus_type)
 		return vio_map_single(to_vio_dev(dev), cpu_addr, size, direction);
+#endif /* CONFIG_IBMVIO */
 	BUG();
 	return (dma_addr_t)0;
 }
@@ -76,8 +86,10 @@
 {
 	if (dev->bus == &pci_bus_type)
 		pci_unmap_single(to_pci_dev(dev), dma_addr, size, (int)direction);
+#ifdef CONFIG_IBMVIO
 	else if (dev->bus == &vio_bus_type)
 		vio_unmap_single(to_vio_dev(dev), dma_addr, size, direction);
+#endif /* CONFIG_IBMVIO */
 	else
 		BUG();
 }
@@ -89,8 +101,10 @@
 {
 	if (dev->bus == &pci_bus_type)
 		return pci_map_page(to_pci_dev(dev), page, offset, size, (int)direction);
+#ifdef CONFIG_IBMVIO
 	if (dev->bus == &vio_bus_type)
 		return vio_map_page(to_vio_dev(dev), page, offset, size, direction);
+#endif /* CONFIG_IBMVIO */
 	BUG();
 	return (dma_addr_t)0;
 }
@@ -101,8 +115,10 @@
 {
 	if (dev->bus == &pci_bus_type)
 		pci_unmap_page(to_pci_dev(dev), dma_address, size, (int)direction);
+#ifdef CONFIG_IBMVIO
 	else if (dev->bus == &vio_bus_type)
 		vio_unmap_page(to_vio_dev(dev), dma_address, size, direction);
+#endif /* CONFIG_IBMVIO */
 	else
 		BUG();
 }
@@ -113,8 +129,10 @@
 {
 	if (dev->bus == &pci_bus_type)
 		return pci_map_sg(to_pci_dev(dev), sg, nents, (int)direction);
+#ifdef CONFIG_IBMVIO
 	if (dev->bus == &vio_bus_type)
 		return vio_map_sg(to_vio_dev(dev), sg, nents, direction);
+#endif /* CONFIG_IBMVIO */
 	BUG();
 	return 0;
 }
@@ -125,8 +143,10 @@
 {
 	if (dev->bus == &pci_bus_type)
 		pci_unmap_sg(to_pci_dev(dev), sg, nhwentries, (int)direction);
+#ifdef CONFIG_IBMVIO
 	else if (dev->bus == &vio_bus_type)
 		vio_unmap_sg(to_vio_dev(dev), sg, nhwentries, direction);
+#endif /* CONFIG_IBMVIO */
 	else
 		BUG();
 }

-- 
David Gibson			| For every complex problem there is a
david AT gibson.dropbear.id.au	| solution which is simple, neat and
				| wrong.
http://www.ozlabs.org/people/dgibson

From david at gibson.dropbear.id.au Thu Oct 21 11:35:49 2004
From: david at gibson.dropbear.id.au (David Gibson)
Date: Thu, 21 Oct 2004 11:35:49 +1000
Subject: [PPC64] Trivial sparse cleanups
Message-ID: <20041021013549.GI17760@zax>

Andrew, please apply:

This patch squashes a handful of assorted sparse warnings in the ppc64
code. Should be pretty much trivial and self explanatory.

Signed-off-by: David Gibson

Index: working-2.6/arch/ppc64/kernel/nvram.c
===================================================================
--- working-2.6.orig/arch/ppc64/kernel/nvram.c	2004-09-24 10:14:09.000000000 +1000
+++ working-2.6/arch/ppc64/kernel/nvram.c	2004-10-21 11:34:39.057902952 +1000
@@ -77,7 +77,7 @@
 }


-static ssize_t dev_nvram_read(struct file *file, char *buf,
+static ssize_t dev_nvram_read(struct file *file, char __user *buf,
 			  size_t count, loff_t *ppos)
 {
 	ssize_t len;
@@ -117,7 +117,7 @@
 }


-static ssize_t dev_nvram_write(struct file *file, const char *buf,
+static ssize_t dev_nvram_write(struct file *file, const char __user *buf,
 			   size_t count, loff_t *ppos)
 {
 	ssize_t len;
Index: working-2.6/arch/ppc64/kernel/setup.c
===================================================================
--- working-2.6.orig/arch/ppc64/kernel/setup.c	2004-10-05 10:08:10.000000000 +1000
+++ working-2.6/arch/ppc64/kernel/setup.c	2004-10-21 11:34:39.059902648 +1000
@@ -1111,7 +1111,7 @@
 {
 	/* ensure xmon is enabled */
 	xmon_init();
-	debugger(0);
+	debugger(NULL);

 	return 0;
 }
Index: working-2.6/arch/ppc64/mm/hugetlbpage.c
===================================================================
--- working-2.6.orig/arch/ppc64/mm/hugetlbpage.c	2004-10-20 10:52:39.000000000 +1000
+++ working-2.6/arch/ppc64/mm/hugetlbpage.c	2004-10-21 11:34:39.060902496 +1000
@@ -249,7 +249,7 @@
 {
 	if (within_hugepage_high_range(addr, len))
 		return 0;
-	else if ((addr < 0x100000000) && ((addr+len) < 0x100000000)) {
+	else if ((addr < 0x100000000UL) && ((addr+len) < 0x100000000UL)) {
 		int err;
 		/* Yes, we need both tests, in case addr+len overflows
 		 * 64-bit arithmetic */
Index: working-2.6/arch/ppc64/mm/hash_utils.c
===================================================================
--- working-2.6.orig/arch/ppc64/mm/hash_utils.c	2004-09-28 10:22:13.000000000 +1000
+++ working-2.6/arch/ppc64/mm/hash_utils.c	2004-10-21 11:34:39.060902496 +1000
@@ -401,7 +401,7 @@
 		info.si_signo = SIGBUS;
 		info.si_errno = 0;
 		info.si_code = BUS_ADRERR;
-		info.si_addr = (void *)address;
+		info.si_addr = (void __user *)address;
 		force_sig_info(SIGBUS, &info, current);
 		return;
 	}

-- 
David Gibson			| For every complex problem there is a
david AT gibson.dropbear.id.au	| solution which is simple, neat and
				| wrong.
http://www.ozlabs.org/people/dgibson

From benh at kernel.crashing.org Thu Oct 21 11:51:10 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Thu, 21 Oct 2004 11:51:10 +1000
Subject: status of ppc64 patches
In-Reply-To: <20041020184501.GF10026@austin.ibm.com>
References: <41754644.1010003@austin.ibm.com> <1098231748.7493.114.camel@pants.austin.ibm.com> <20041020010301.GA29579@4> <20041020184501.GF10026@austin.ibm.com>
Message-ID: <1098323469.20954.27.camel@gaston>

On Thu, 2004-10-21 at 04:45, Linas Vepstas wrote:

> Put it another way: it is, at this time, impossible for me to rebase,
> because I know that my patches will conflict with others in the
> un-applied patch queue. So all I can do is wait for the patch queue to
> shrink, wait till the others get into the Torvalds tree, then bk pull,
> then hurry, hurry, rebase, test, submit, and hope I get in before
> someone else does and wrecks it again. The turn-around time for
> "getting lucky" like this is over a month, and if one doesn't get
> lucky the first month, one has to wait a whole 'nother month for
> one's next shot.

For some reason, it seems other people have a lot more luck than you
do ...

Ben.
From benh at kernel.crashing.org Thu Oct 21 11:55:32 2004
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Thu, 21 Oct 2004 11:55:32 +1000
Subject: [discuss] Re: [PATCH] Add key management syscalls to non-i386 archs
In-Reply-To: <20041020160450.0914270b.davem@davemloft.net>
References: <3506.1098283455@redhat.com> <20041020150149.7be06d6d.davem@davemloft.net> <20041020225625.GD995@wotan.suse.de> <20041020160450.0914270b.davem@davemloft.net>
Message-ID: <1098323732.20955.31.camel@gaston>

On Thu, 2004-10-21 at 09:04, David S. Miller wrote:
> On Thu, 21 Oct 2004 00:56:25 +0200
> Andi Kleen wrote:
>
> > I don't think that's a good idea. Normally new system calls
> > are relatively obscure and the system works fine without them,
> > so urgent action is not needed.
> >
> > And I think we can trust architecture maintainers to regularly
> > sync the system calls with i386.
>
> I disagree quite strongly. One major frustration for users of
> non-x86 platforms is that functionality is often missing for some
> time that we can make trivial to keep in sync.

I agree with David here. It's also easy for arch/platform maintainers
to "miss" a new syscall ... for various reasons, we can't all read
_everything_ that gets posted to lkml, and we all occasionally miss
some csets going upstream, which means we can very well totally
"forget" about adding the new syscall to the arch ... until somebody
complains, which can be 1 or 2 releases later !

> I religiously watch what goes into Linus's tree for this purpose,
> but that is kind of a ridiculous burden to expect every platform
> maintainer to do. It's not just system calls, we have signal handling
> bug fixes, trap handling infrastructure, and now the nice generic
> IRQ handling subsystem as other examples.

Right.

> Simply put, if you're not watching the tree in painstaking detail
> every day, you miss all of these enhancements.
>
> The knowledge should come from the person putting the changes into
> the tree, therefore it gets done once and this makes it so that
> the other platform maintainers will find out about it automatically
> next time they update their tree.

Agreed,
Ben.

From david at gibson.dropbear.id.au Thu Oct 21 13:36:17 2004
From: david at gibson.dropbear.id.au (David Gibson)
Date: Thu, 21 Oct 2004 13:36:17 +1000
Subject: [PPC64] xmon sparse cleanups
Message-ID: <20041021033617.GK17760@zax>

Andrew, please apply:

This patch removes many sparse warnings from the xmon code. Mostly
K&R function declarations and 0-instead-of-NULLs. There are still a
whole bunch of warnings in xmon/ppc-opc.c, which is a copy of a file
from binutils.

Signed-off-by: David Gibson

Index: working-2.6/arch/ppc64/xmon/xmon.c
===================================================================
--- working-2.6.orig/arch/ppc64/xmon/xmon.c	2004-09-24 10:14:09.000000000 +1000
+++ working-2.6/arch/ppc64/xmon/xmon.c	2004-10-05 16:31:01.822963256 +1000
@@ -645,7 +645,7 @@
 	for (i = 0; i < NBPTS; ++i, ++bp)
 		if (bp->enabled && pc == bp->address)
 			return bp;
-	return 0;
+	return NULL;
 }

 static struct bpt *in_breakpoint_table(unsigned long nip, unsigned long *offp)
@@ -1582,7 +1582,7 @@
 extern char dec_exc;

 void
-super_regs()
+super_regs(void)
 {
 	int cmd;
 	unsigned long val;
@@ -1816,7 +1816,7 @@
     "";

 void
-memex()
+memex(void)
 {
 	int cmd, inc, i, nslash;
 	unsigned long n;
@@ -1967,7 +1967,7 @@
 }

 int
-bsesc()
+bsesc(void)
 {
 	int c;

@@ -1985,7 +1985,7 @@
 			 || ('a' <= (c) && (c) <= 'f') \
 			 || ('A' <= (c) && (c) <= 'F'))
 void
-dump()
+dump(void)
 {
 	int c;

@@ -2150,7 +2150,7 @@
 static unsigned mask;

 void
-memlocate()
+memlocate(void)
 {
 	unsigned a, n;
 	unsigned char val[4];
@@ -2183,7 +2183,7 @@
 static unsigned long mlim = 0xffffffff;

 void
-memzcan()
+memzcan(void)
 {
 	unsigned char v;
 	unsigned a;
@@ -2212,7 +2212,7 @@

 /* Input scanning routines */
 int
-skipbl()
+skipbl(void)
 {
 	int c;

@@ -2237,8 +2237,7 @@
 };

 int
-scanhex(vp)
-unsigned long *vp;
+scanhex(unsigned long *vp)
 {
 	int c, d;
 	unsigned long v;
@@ -2322,7 +2321,7 @@
 }

 void
-scannl()
+scannl(void)
 {
 	int c;

@@ -2365,13 +2364,13 @@
 static char *lineptr;

 void
-flush_input()
+flush_input(void)
 {
 	lineptr = NULL;
 }

 int
-inchar()
+inchar(void)
 {
 	if (lineptr == NULL || *lineptr == 0) {
 		if (fgets(line, sizeof(line), stdin) == NULL) {
@@ -2384,8 +2383,7 @@
 }

 void
-take_input(str)
-char *str;
+take_input(char *str)
 {
 	lineptr = str;
 }
Index: working-2.6/arch/ppc64/xmon/start.c
===================================================================
--- working-2.6.orig/arch/ppc64/xmon/start.c	2004-08-09 09:51:38.000000000 +1000
+++ working-2.6/arch/ppc64/xmon/start.c	2004-10-05 16:33:50.355028808 +1000
@@ -173,7 +173,7 @@
 		c = xmon_getchar();
 		if (c == -1) {
 			if (p == str)
-				return 0;
+				return NULL;
 			break;
 		}
 		*p++ = c;

-- 
David Gibson			| For every complex problem there is a
david AT gibson.dropbear.id.au	| solution which is simple, neat and
				| wrong.
http://www.ozlabs.org/people/dgibson

From wjfast at yahoo.com Thu Oct 21 16:33:30 2004
From: wjfast at yahoo.com (Wjeeha Tahir)
Date: Wed, 20 Oct 2004 23:33:30 -0700 (PDT)
Subject: Booting Linux from HardDisk on iSeries
Message-ID: <20041021063330.34212.qmail@web14926.mail.yahoo.com>

Hi,

This is my first email to this group, and I am really hoping to find a
solution to my problem here. I was installing Linux on an iSeries in my
office but ran into problems.

I have installed RedHat Linux 9 on an iSeries machine in an LPAR. The
kernel version reported by uname -a is 2.4.21-4.EL.

However, now that installation is complete, I want to boot from disk
rather than from the CD drive. I think some boot image needs to be
copied onto the disk. I looked at the Technical FAQ for Linux on
iSeries:

http://www-1.ibm.com/servers/eserver/iseries/linux/tech_faq.html#kernel

and performed the following steps.

I executed the command fdisk -l and the output was as follows:

Disk /dev/iseries/vda: 4194 MB, 4194892800 bytes
255 heads, 63 sectors/track, 510 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

           Device Boot    Start       End    Blocks   Id  System
/dev/iseries/vda1   *         1         2     16033+  41  PPC PReP Boot
/dev/iseries/vda2             3       384   3068415   83  Linux
/dev/iseries/vda3           385       510   1012095   82  Linux swap

Hence my PReP partition is /dev/iseries/vda1.

Next, the implementation document tells me to execute the command

dd if=/boot/vmlinux/good of=/dev/iseries/vda1 bs=4k

However, the problem is that there is no file by the name of
vmlinux.good in the boot directory. Here is the listing of the boot
directory:

[root@TestLinux /]# cd /boot
[root@TestLinux boot]# ls
cmdline-2.4.21-4.EL     kernel.h    System.map-2.4.21-4.EL
config-2.4.21-4.EL      message     vmlinitrd-2.4.21-4.EL
grub                    message.ja  vmlinux-2.4.21-4.EL
initrd-2.4.21-4.EL.img  System.map

Now I am at a loss as to what the input file for the dd command should
be. I tried the command:

dd if=/boot/vmlinitrd-2.4.21-4.EL of=/dev/iseries/vda1 bs=4k

but when I booted from "IPL Source" = *NWSSTG ,"Stream file" = *NONE,
"IPL parameters" = 'root=/dev/iseries/vda1," , Linux doesn't boot and
I get the following error:

Partition check:
 iseries/vda: iseries/vda1 iseries/vda2 iseries/vda3
iSeries virtual I/O: viod: Disk 00 size 4000M, sectors 63, heads 255, cylinders 510, sectsize 512
iSeries virtual I/O: viod: Disk 00 partition 01 start sector 63, # sector 32067
iSeries virtual I/O: viod: Disk 00 partition 02 start sector 32130, # sector 6136830
iSeries virtual I/O: viod: Disk 00 partition 03 start sector 6168960, # sector 2024190
Loading jbd.o module
Journalled Block Device driver loaded
Loading ext3.o module
Mounting /proc filesystem
Creating block devices
Creating root device
Mounting root filesystem
VFS: Can't find ext3 filesystem on dev viod(112,1).
mount: error 22 mounting ext3
pivotroot: pivot_root(/sysroot,/sysroot/initrd) failed: 2
umount /initrd/proc failed: 2
Freeing unused kernel memory: 156k init
Kernel panic: No init found. Try passing init= option to kernel.
Rebooting in 180 seconds..

Can anyone tell me the exact command, specifying what to copy from
where and to where? I would be very thankful for any help with this.

Kind Regards,
Wjeeha Tahir

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041020/b38e4fb9/attachment.htm

From sfr at canb.auug.org.au Thu Oct 21 18:05:46 2004
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Thu, 21 Oct 2004 18:05:46 +1000
Subject: Booting Linux from HardDisk on iSeries
In-Reply-To: <20041021063330.34212.qmail@web14926.mail.yahoo.com>
References: <20041021063330.34212.qmail@web14926.mail.yahoo.com>
Message-ID: <20041021180546.780f3090.sfr@canb.auug.org.au>

On Wed, 20 Oct 2004 23:33:30 -0700 (PDT) Wjeeha Tahir wrote:
>
> but when I booted from "IPL Source" = *NWSSTG ,"Stream file" = *NONE,
> "IPL parameters" = 'root=/dev/iseries/vda1," , the Linux doesnt boot and
                                        ^^^^
This should be vda2 ...

Linux did boot, it just could not find its root file system ...

-- 
Cheers,
Stephen Rothwell sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041021/d0a03011/attachment.pgp

From wjfast at yahoo.com Thu Oct 21 18:30:41 2004
From: wjfast at yahoo.com (Wjeeha Tahir)
Date: Thu, 21 Oct 2004 01:30:41 -0700 (PDT)
Subject: Booting Linux from HardDisk on iSeries
In-Reply-To: <20041021180546.780f3090.sfr@canb.auug.org.au>
Message-ID: <20041021083041.82774.qmail@web14921.mail.yahoo.com>

I changed it to vda2, but now Linux isn't booting at all. When the
console connects to the iSeries, the screen is blank. The errors that
were shown initially no longer appear.

I am totally stuck. Please help in this regard.

Stephen Rothwell wrote:
On Wed, 20 Oct 2004 23:33:30 -0700 (PDT) Wjeeha Tahir wrote:
>
> but when I booted from "IPL Source" = *NWSSTG ,"Stream file" = *NONE,
> "IPL parameters" = 'root=/dev/iseries/vda1," , the Linux doesnt boot and
                                        ^^^^
This should be vda2 ...

Linux did boot, it just could not find its root file system ...

-- 
Cheers,
Stephen Rothwell sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20041021/e28b492e/attachment.htm

From jbglaw at lug-owl.de Thu Oct 21 18:47:29 2004
From: jbglaw at lug-owl.de (Jan-Benedict Glaw)
Date: Thu, 21 Oct 2004 10:47:29 +0200
Subject: [discuss] Re: [PATCH] Add key management syscalls to non-i386 archs
In-Reply-To: <20041020160450.0914270b.davem@davemloft.net>
References: <3506.1098283455@redhat.com> <20041020150149.7be06d6d.davem@davemloft.net> <20041020225625.GD995@wotan.suse.de> <20041020160450.0914270b.davem@davemloft.net>
Message-ID: <20041021084728.GA5033@lug-owl.de>

On Wed, 2004-10-20 16:04:50 -0700, David S. Miller
wrote in message <20041020160450.0914270b.davem at davemloft.net>:
> On Thu, 21 Oct 2004 00:56:25 +0200
> Andi Kleen wrote:

*VAX hacker's hat on*

> I disagree quite strongly. One major frustration