From SVakkalankarao at covansys.com Thu Jun 5 21:38:16 2003
From: SVakkalankarao at covansys.com (VAKKALANKA RAO Sridhar)
Date: Thu, 5 Jun 2003 17:08:16 +0530
Subject: Kernel startup problems on p630
Message-ID: <207D6ADFC044A84686D44CA11B297EEA02018A37@chn-ex02.cvns.corp.covansys.com>

Hello,

I am trying to boot a p630 with a 2.4.21-pre4 kernel. At the console, the output indicates that the "start()" function in "/arch/ppc64/boot/zImage.c" has successfully completed. Immediately after that, I get the output:

...ok
rtas at 0x000000003fc88000... done
opened /pci@400000000110
open success
opened /pci@400000000112
open success
(translate ok)
returning from prom_init

The screen freezes for several minutes with this display and then the machine simply restarts. When it restarts, it boots the originally installed AIX 5.1 operating system.

(1) A Google search revealed that IBM is aware of this problem for the p670 & p690, but their suggestions were not helpful.

(2) The very same binary, "zImage.initrd", works fine for someone who has the very same machine (p630). That person's firmware is at a lower level than mine. Yet when I asked our IBM support to restore the older firmware, they refused, stating that it wouldn't help at all and that there was no way to access old firmware levels.

Can anyone give me some advice?

Thanks
Sri

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From engebret at vnet.ibm.com Thu Jun 5 23:50:01 2003
From: engebret at vnet.ibm.com (Dave Engebretsen)
Date: Thu, 05 Jun 2003 08:50:01 -0500
Subject: Kernel startup problems on p630
References: <207D6ADFC044A84686D44CA11B297EEA02018A37@chn-ex02.cvns.corp.covansys.com>
Message-ID: <3EDF4A89.4E61C740@vnet.ibm.com>

Hi -

VAKKALANKA RAO Sridhar wrote:
>
> Hello,
>
> I am trying to boot a p630 with a 2.4.21-pre4 kernel. At the console, the output indicates that the "start()" function in "/arch/ppc64/boot/zImage.c" has successfully completed.
> Immediately after that, I get the output:
> ...
> (translate ok)
> returning from prom_init
>
> The screen freezes for several minutes with this display and then the machine simply restarts. When it restarts, it boots the originally installed AIX 5.1 operating system.
>
> (1) A Google search revealed that IBM is aware of this problem for the p670 & p690, but their suggestions were not helpful.

The point in the boot process where you see this message is a difficult spot from which to draw any conclusions about what the actual error is. There is quite a bit of code that runs after this message and before the next one, so it is not clear to me what might be occurring. The only problem I am aware of on a p630 involving firmware levels requires you to be running in LPAR mode, and I do not believe this would be the symptom.

> (2) The very same binary, "zImage.initrd", works fine for someone who has the very same machine (p630). That person's firmware is at a lower level than mine. Yet when I asked our IBM support to restore the older firmware, they refused, stating that it wouldn't help at all and that there was no way to access old firmware levels.
>
> Can anyone give me some advice?

We are presently resyncing up to the most recent 2.4.21 level; hopefully that will be done in a day or two. We can make sure to test this port on a p630 to see that the platform nominally works. If there is a firmware level issue here, this may not help, but at least we will have a more recent code base to debug from.

Dave.

** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/

From mlyons at austin.ibm.com Fri Jun 6 09:21:27 2003
From: mlyons at austin.ibm.com (Mike Lyons)
Date: Thu, 05 Jun 2003 18:21:27 -0500
Subject: Kernel startup problems on p630
In-Reply-To: Dave Engebretsen's message of Thu, 05 Jun 2003 08:50:01 CDT.<3EDF4A89.4E61C740@vnet.ibm.com>
Message-ID: <200306052321.SAA37910@jigsaw.austin.ibm.com>

Dave Engebretsen wrote:
>
>The point in the boot process where you see this message is a difficult
>spot from which to draw any conclusions about what the actual error is.
>There is quite a bit of code that runs after this message and before
>the next one, so it is not clear to me what might be occurring. The
>only problem I am aware of on a p630 involving firmware levels requires
>you to be running in LPAR mode, and I do not believe this would be the
>symptom.
>
>> Can anyone give me some advice?

I agree with Dave: the known problem involving the interaction between firmware level and the p630 would not show this symptom (that problem occurs further on, during SCSI DMA).

What level of firmware are you running? Is there anything in the service processor error logs?

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From SVakkalankarao at covansys.com Fri Jun 6 14:47:16 2003
From: SVakkalankarao at covansys.com (VAKKALANKA RAO Sridhar)
Date: Fri, 6 Jun 2003 10:17:16 +0530
Subject: Kernel startup problems on p630
Message-ID: <207D6ADFC044A84686D44CA11B297EEA02018A39@chn-ex02.cvns.corp.covansys.com>

> > The point in the boot process where you see this message is a difficult
> > spot from which to draw any conclusions about what the actual error is.
> > There is quite a bit of code that runs after this message and before
> > the next one, so it is not clear to me what might be occurring. The
> > only problem I am aware of on a p630 involving firmware levels requires
> > you to be running in LPAR mode, and I do not believe this would be the
> > symptom.

I am sorry, Dave. Please elaborate on what you said.
By default, all POWER4 machines (and the p630 is one) have LPAR implemented on them. So I assumed that my machine was indeed running in LPAR mode; I am not aware that a p630 can be run without LPAR mode. Are you saying that the firmware needs to be configured to enable LPAR mode? Please reply.

> What level of firmware are you running? Is there anything in the
> service processor error logs?

The firmware level I am running is RR030324. The firmware level on which the binary ran successfully is RR021114.

Thanks again for your help.

Sri

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/

From david at gibson.dropbear.id.au Fri Jun 6 16:30:51 2003
From: david at gibson.dropbear.id.au (David Gibson)
Date: Fri, 6 Jun 2003 16:30:51 +1000
Subject: hugetlbfs for ppc64 (POWER4)
Message-ID: <20030606063051.GA10910@zax>

The attached patch (also up on the LTC patch repository) adds hugetlbfs support to ppc64, using the 16MB pages available on Power4 hardware. This seems to more-or-less work, although it has some limitations and is only lightly tested so far. The patch is against 2.5.70.

Known problems/limitations:
- as yet untested for 64-bit user programs
- hugepages are marked not-executable; if we change that we'll need to ensure that we have dcache/icache flushes where necessary
- a fixed virtual range is used for mapping hugepages: 2-3G in 32-bit processes and 1-1.5T in 64-bit processes. (Removing this restriction entirely would be extremely difficult because of the PPC's segmented memory model, but we may be able to add a little more flexibility here.)
- attempting to allocate more hugepages than are available will cause a kernel crash. I'm working on this now, and I suspect it will be easily fixed.
- testing, testing, testing

-- 
David Gibson                    | For every complex problem there is a
david at gibson.dropbear.id.au  | solution which is simple, neat and
                                | wrong.
http://www.ozlabs.org/people/dgibson -------------- next part -------------- diff -urN /scratch/anton/export/arch/ppc64/Kconfig linux-gogogo/arch/ppc64/Kconfig --- /scratch/anton/export/arch/ppc64/Kconfig 2003-06-03 17:49:13.000000000 +1000 +++ linux-gogogo/arch/ppc64/Kconfig 2003-06-04 12:41:52.000000000 +1000 @@ -69,6 +69,17 @@ bool default y +config HUGETLB_PAGE + bool "Huge TLB Page Support" + help + This enables support for huge pages. User space applications + can make use of this support with the sys_alloc_hugepages and + sys_free_hugepages system calls. If your applications are + huge page aware and your processor supports this (only POWER4, + then say Y here. + + Otherwise, say N. + config SMP bool "Symmetric multi-processing support" ---help--- diff -urN /scratch/anton/export/arch/ppc64/kernel/htab.c linux-gogogo/arch/ppc64/kernel/htab.c --- /scratch/anton/export/arch/ppc64/kernel/htab.c 2003-04-24 18:54:33.000000000 +1000 +++ linux-gogogo/arch/ppc64/kernel/htab.c 2003-06-03 15:44:09.000000000 +1000 @@ -195,7 +195,7 @@ if (!pgd_none(*pg)) { pm = pmd_offset(pg, ea); - if (!pmd_none(*pm)) { + if (pmd_present(*pm)) { pt = pte_offset_kernel(pm, ea); pte = *pt; if (!pte_present(pte)) @@ -431,8 +431,12 @@ if (user_region && (mm->cpu_vm_mask == (1 << smp_processor_id()))) local = 1; - ptep = find_linux_pte(pgdir, ea); - ret = __hash_page(ea, access, vsid, ptep, trap, local); + ret = hash_huge_page(mm, access, ea, vsid, local); + if (ret < 0) { + ptep = find_linux_pte(pgdir, ea); + ret = __hash_page(ea, access, vsid, ptep, trap, local); + } + spin_unlock(&mm->page_table_lock); return ret; diff -urN /scratch/anton/export/arch/ppc64/kernel/stab.c linux-gogogo/arch/ppc64/kernel/stab.c --- /scratch/anton/export/arch/ppc64/kernel/stab.c 2003-05-06 07:49:37.000000000 +1000 +++ linux-gogogo/arch/ppc64/kernel/stab.c 2003-06-06 14:58:26.000000000 +1000 @@ -204,6 +204,12 @@ vsid_data.data.kp = 1; if (large) vsid_data.data.l = 1; + /* FIXME: hack alert! 
we make user hugepages noexec to + * sidestep icache/dcache coherence issues for now. We should + * fix this properly. */ + if (large && + (REGION_ID(esid << SID_SHIFT) == USER_REGION_ID)) + vsid_data.data.n = 1; if (kernel_segment) vsid_data.data.c = 1; @@ -220,7 +226,7 @@ } static inline void __ste_allocate(unsigned long esid, unsigned long vsid, - int kernel_segment) + int kernel_segment, int hugepage) { if (cpu_has_slb()) { #ifndef CONFIG_PPC_ISERIES @@ -228,7 +234,7 @@ make_slbe(esid, vsid, 1, kernel_segment); else #endif - make_slbe(esid, vsid, 0, kernel_segment); + make_slbe(esid, vsid, hugepage, kernel_segment); } else { unsigned char top_entry, stab_entry, *segments; @@ -254,6 +260,7 @@ { unsigned long vsid, esid; int kernel_segment = 0; + int hugepage = 0; PMC_SW_PROCESSOR(stab_faults); @@ -271,10 +278,12 @@ vsid = get_vsid(mm->context, ea); else return 1; + + hugepage = in_hugepage_area(mm->context, ea); } esid = GET_ESID(ea); - __ste_allocate(esid, vsid, kernel_segment); + __ste_allocate(esid, vsid, kernel_segment, hugepage); if (!cpu_has_slb()) { /* Order update */ asm volatile("sync":::"memory"); @@ -301,7 +310,8 @@ for (esid = 0; esid < 16; esid++) { unsigned long ea = esid << SID_SHIFT; vsid = get_vsid(mm->context, ea); - __ste_allocate(esid, vsid, 0); + __ste_allocate(esid, vsid, + in_hugepage_area(mm->context, ea), 0); } } else { unsigned long pc = KSTK_EIP(tsk); @@ -310,12 +320,17 @@ unsigned long stack_segment = stack & ~SID_MASK; unsigned long vsid; + BUG_ON(in_hugepage_area(mm->context, pc)); + BUG_ON(in_hugepage_area(mm->context, stack)); + /* FIXME: Should we try to deal with the case where pc + * or stack is hugepage? 
*/ + if (pc) { if (!IS_VALID_EA(pc) || (REGION_ID(pc) >= KERNEL_REGION_ID)) return; vsid = get_vsid(mm->context, pc); - __ste_allocate(GET_ESID(pc), vsid, 0); + __ste_allocate(GET_ESID(pc), vsid, 0, 0); } if (stack && (pc_segment != stack_segment)) { @@ -323,7 +338,7 @@ (REGION_ID(stack) >= KERNEL_REGION_ID)) return; vsid = get_vsid(mm->context, stack); - __ste_allocate(GET_ESID(stack), vsid, 0); + __ste_allocate(GET_ESID(stack), vsid, 0, 0); } } diff -urN /scratch/anton/export/arch/ppc64/mm/Makefile linux-gogogo/arch/ppc64/mm/Makefile --- /scratch/anton/export/arch/ppc64/mm/Makefile 2003-02-13 00:02:23.000000000 +1100 +++ linux-gogogo/arch/ppc64/mm/Makefile 2003-06-03 15:44:09.000000000 +1000 @@ -6,3 +6,4 @@ obj-y := fault.o init.o extable.o imalloc.o obj-$(CONFIG_DISCONTIGMEM) += numa.o +obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o diff -urN /scratch/anton/export/arch/ppc64/mm/hugetlbpage.c linux-gogogo/arch/ppc64/mm/hugetlbpage.c --- /scratch/anton/export/arch/ppc64/mm/hugetlbpage.c Thu Jan 01 10:00:00 1970 +++ linux-gogogo/arch/ppc64/mm/hugetlbpage.c Fri Jun 06 16:01:47 2003 @@ -0,0 +1,744 @@ +/* + * PPC64 (POWER4) Huge TLB Page Support for Kernel. + * + * Copyright (C) 2003 David Gibson, IBM Corporation. + * + * Based on the IA-32 version: + * Copyright (C) 2002, Rohit Seth + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +static long htlbpagemem; +int htlbpage_max; +static long htlbzone_pages; + +static LIST_HEAD(htlbpage_freelist); +static spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED; + +/* HugePTE layout: + * + * 31 30 ... 15 14 13 12 10 9 8 7 6 5 4 3 2 1 0 + * PFN>>12..... - - - - - - HASH_IX.... 
2ND HASH RW - HG=1 + */ + +#define HUGEPTE_SHIFT 15 +#define _HUGEPAGE_PFN 0xffff8000 +#define _HUGEPAGE_BAD 0x00007f00 +#define _HUGEPAGE_HASHPTE 0x00000008 +#define _HUGEPAGE_SECONDARY 0x00000010 +#define _HUGEPAGE_GROUP_IX 0x000000e0 +#define _HUGEPAGE_HPTEFLAGS (_HUGEPAGE_HASHPTE | _HUGEPAGE_SECONDARY | \ + _HUGEPAGE_GROUP_IX) +#define _HUGEPAGE_RW 0x00000004 + +typedef struct {unsigned int val;} hugepte_t; +#define hugepte_val(hugepte) ((hugepte).val) +#define __hugepte(x) ((hugepte_t) { (x) } ) +#define hugepte_pfn(x) \ + ((unsigned long)(hugepte_val(x)>>HUGEPTE_SHIFT) << HUGETLB_PAGE_ORDER) +#define mk_hugepte(page,wr) __hugepte( \ + ((page_to_pfn(page)>>HUGETLB_PAGE_ORDER) << HUGEPTE_SHIFT ) \ + | (!!(wr) * _HUGEPAGE_RW) | _PMD_HUGEPAGE ) + +#define hugepte_bad(x) ( !(hugepte_val(x) & _PMD_HUGEPAGE) || \ + (hugepte_val(x) & _HUGEPAGE_BAD) ) +#define hugepte_page(x) pfn_to_page(hugepte_pfn(x)) +#define hugepte_none(x) (!(hugepte_val(x) & _HUGEPAGE_PFN)) + + +static void free_huge_page(struct page *page); +static void flush_hash_hugepage(mm_context_t context, unsigned long ea, + hugepte_t pte, int local); + +static inline unsigned int hugepte_update(hugepte_t *p, unsigned int clr, + unsigned int set) +{ + unsigned int old, tmp; + + __asm__ __volatile__( + "1: lwarx %0,0,%3 # pte_update\n\ + andc %1,%0,%4 \n\ + or %1,%1,%5 \n\ + stwcx. 
%1,0,%3 \n\ + bne- 1b" + : "=&r" (old), "=&r" (tmp), "=m" (*p) + : "r" (p), "r" (clr), "r" (set), "m" (*p) + : "cc" ); + return old; +} + +static inline void set_hugepte(hugepte_t *ptep, hugepte_t pte) +{ + hugepte_update(ptep, ~_HUGEPAGE_HPTEFLAGS, + hugepte_val(pte) & ~_HUGEPAGE_HPTEFLAGS); +} + +static struct page *alloc_hugetlb_page(void) +{ + int i; + struct page *page; + + spin_lock(&htlbpage_lock); + if (list_empty(&htlbpage_freelist)) { + spin_unlock(&htlbpage_lock); + return NULL; + } + + page = list_entry(htlbpage_freelist.next, struct page, list); + list_del(&page->list); + htlbpagemem--; + spin_unlock(&htlbpage_lock); + set_page_count(page, 1); + page->lru.prev = (void *)free_huge_page; + for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i) + clear_highpage(&page[i]); + return page; +} + +static hugepte_t *hugepte_alloc(struct mm_struct *mm, unsigned long addr) +{ + pgd_t *pgd; + pmd_t *pmd = NULL; + + BUG_ON(!in_hugepage_area(mm->context, addr)); + + pgd = pgd_offset(mm, addr); + pmd = pmd_alloc(mm, pgd, addr); + + /* We shouldn't find a (normal) PTE page pointer here */ + BUG_ON(!pmd_none(*pmd) && !pmd_hugepage(*pmd)); + + return (hugepte_t *) pmd; +} + +static hugepte_t *hugepte_offset(struct mm_struct *mm, unsigned long addr) +{ + pgd_t *pgd; + hugepte_t *hugepte = NULL; + + BUG_ON(!in_hugepage_area(mm->context, addr)); + + pgd = pgd_offset(mm, addr); + hugepte = (hugepte_t *)pmd_offset(pgd, addr); + + BUG_ON(hugepte_bad(*hugepte)); + + return hugepte; +} + +static void setup_huge_pte(struct mm_struct *mm, struct vm_area_struct *vma, + struct page *page, hugepte_t *ptep, + int write_access) +{ + hugepte_t entry; + int i; + + mm->rss += (HPAGE_SIZE / PAGE_SIZE); + entry = mk_hugepte(page, write_access); + for (i = 0; i < HUGEPTE_BATCH_SIZE; i++) + set_hugepte(ptep+i, entry); +} + +static void teardown_huge_pte(hugepte_t *ptep) +{ + int i; + + for (i = 0; i < HUGEPTE_BATCH_SIZE; i++) + pmd_clear((pmd_t *)(ptep+i)); +} + +/* + * This function checks for proper 
alignment of input addr and len parameters. + */ +int is_aligned_hugepage_range(unsigned long addr, unsigned long len) +{ + if (len & ~HPAGE_MASK) + return -EINVAL; + if (addr & ~HPAGE_MASK) + return -EINVAL; + if (! is_hugepage_only_range(addr, len)) + return -EINVAL; + return 0; +} + +int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, + struct vm_area_struct *vma) +{ + hugepte_t *src_pte, *dst_pte, entry; + struct page *ptepage; + unsigned long addr = vma->vm_start; + unsigned long end = vma->vm_end; + + while (addr < end) { + BUG_ON(! in_hugepage_area(src->context, addr)); + BUG_ON(! in_hugepage_area(dst->context, addr)); + + dst_pte = hugepte_alloc(dst, addr); + if (!dst_pte) + return -ENOMEM; + + src_pte = hugepte_offset(src, addr); + entry = *src_pte; + + if ((addr % HPAGE_SIZE) == 0) { + /* First hugepte in the batch referring to + * this page */ + ptepage = hugepte_page(entry); + get_page(ptepage); + dst->rss += (HPAGE_SIZE / PAGE_SIZE); + } + set_hugepte(dst_pte, entry); + + + addr += PMD_SIZE; + } + return 0; +} + +int +follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, + struct page **pages, struct vm_area_struct **vmas, + unsigned long *position, int *length, int i) +{ + unsigned long vpfn, vaddr = *position; + int remainder = *length; + + WARN_ON(!is_vm_hugetlb_page(vma)); + + vpfn = vaddr/PAGE_SIZE; + while (vaddr < vma->vm_end && remainder) { + BUG_ON(!in_hugepage_area(mm->context, vaddr)); + + if (pages) { + hugepte_t *pte; + struct page *page; + + pte = hugepte_offset(mm, vaddr); + + /* hugetlb should be locked, and hence, prefaulted */ + WARN_ON(!pte || hugepte_none(*pte)); + + page = &hugepte_page(*pte)[vpfn % (HPAGE_SIZE/PAGE_SIZE)]; + + WARN_ON(!PageCompound(page)); + + get_page(page); + pages[i] = page; + } + + if (vmas) + vmas[i] = vma; + + vaddr += PAGE_SIZE; + ++vpfn; + --remainder; + ++i; + } + + *length = remainder; + *position = vaddr; + + return i; +} + +struct page * +follow_huge_addr(struct 
mm_struct *mm, + struct vm_area_struct *vma, unsigned long address, int write) +{ + return NULL; +} + +struct vm_area_struct *hugepage_vma(struct mm_struct *mm, unsigned long addr) +{ + return NULL; +} + +int pmd_huge(pmd_t pmd) +{ + return pmd_hugepage(pmd); +} + +struct page * +follow_huge_pmd(struct mm_struct *mm, unsigned long address, + pmd_t *pmd, int write) +{ + struct page *page; + + BUG_ON(! pmd_hugepage(*pmd)); + + page = hugepte_page(*(hugepte_t *)pmd); + if (page) { + page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT); + get_page(page); + } + return page; +} + +static void free_huge_page(struct page *page) +{ + BUG_ON(page_count(page)); + BUG_ON(page->mapping); + + INIT_LIST_HEAD(&page->list); + + spin_lock(&htlbpage_lock); + list_add(&page->list, &htlbpage_freelist); + htlbpagemem++; + spin_unlock(&htlbpage_lock); +} + +void huge_page_release(struct page *page) +{ + if (!put_page_testzero(page)) + return; + + free_huge_page(page); +} + +void unmap_hugepage_range(struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ + struct mm_struct *mm = vma->vm_mm; + unsigned long addr; + hugepte_t *ptep; + struct page *page; + int local = 0; + + BUG_ON((start % HPAGE_SIZE) != 0); + BUG_ON((end % HPAGE_SIZE) != 0); + BUG_ON(!in_hugepage_area(mm->context, start)); + BUG_ON(!in_hugepage_area(mm->context, end)); + + /* XXX are there races with checking cpu_vm_mask? 
- Anton */ + if (vma->vm_mm->cpu_vm_mask == (1 << smp_processor_id())) + local = 1; + + for (addr = start; addr < end; addr += HPAGE_SIZE) { + hugepte_t pte; + + ptep = hugepte_offset(mm, addr); + if (!ptep || hugepte_none(*ptep)) + continue; + + pte = *ptep; + page = hugepte_page(pte); + teardown_huge_pte(ptep); + + if (hugepte_val(pte) & _HUGEPAGE_HASHPTE) + flush_hash_hugepage(mm->context, addr, + pte, local); + + huge_page_release(page); + } + + mm->rss -= (end - start) >> PAGE_SHIFT; +} + +void +zap_hugepage_range(struct vm_area_struct *vma, + unsigned long start, unsigned long length) +{ + struct mm_struct *mm = vma->vm_mm; + spin_lock(&mm->page_table_lock); + unmap_hugepage_range(vma, start, start + length); + spin_unlock(&mm->page_table_lock); +} + +int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma) +{ + struct mm_struct *mm = current->mm; + unsigned long addr; + int ret = 0; + + BUG_ON(vma->vm_start & ~HPAGE_MASK); + BUG_ON(vma->vm_end & ~HPAGE_MASK); + + spin_lock(&mm->page_table_lock); + for (addr = vma->vm_start; addr < vma->vm_end; addr += HPAGE_SIZE) { + unsigned long idx; + hugepte_t *pte = hugepte_alloc(mm, addr); + struct page *page; + + if (!pte) { + ret = -ENOMEM; + goto out; + } + if (!hugepte_none(*pte)) + continue; + + idx = ((addr - vma->vm_start) >> HPAGE_SHIFT) + + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); + page = find_get_page(mapping, idx); + if (!page) { + page = alloc_hugetlb_page(); + if (!page) { + ret = -ENOMEM; + goto out; + } + ret = add_to_page_cache(page, mapping, idx, GFP_ATOMIC); + unlock_page(page); + if (ret) { + free_huge_page(page); + goto out; + } + } + setup_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE); + } +out: + spin_unlock(&mm->page_table_lock); + return ret; +} + +unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr, + unsigned long len, unsigned long pgoff, + unsigned long flags) +{ + struct vm_area_struct *vma; + + if (len & ~HPAGE_MASK) + return 
-EINVAL; + + if (!in_hugepage_area(current->mm->context, addr) + || (addr & (HPAGE_SIZE - 1))) + addr = TASK_HPAGE_BASE; + + for (vma = find_vma(current->mm, addr); ; vma = vma->vm_next) { + /* At this point: (!vma || addr < vma->vm_end). */ + if (addr + len > TASK_HPAGE_END) + return -ENOMEM; + if (!vma || (addr + len) <= vma->vm_start) + return addr; + addr = ALIGN(vma->vm_end, HPAGE_SIZE); + + /* Because we're in an exclusively hugepage region, + * this alignment shouldn't have skipped over any + * other vmas */ + } +} + +static inline unsigned long computeHugeHptePP(unsigned int hugepte) +{ + unsigned long flags = 0x2; + + if (! (hugepte & _HUGEPAGE_RW)) + flags |= 0x1; + return flags; +} + +int hash_huge_page(struct mm_struct *mm, unsigned long access, + unsigned long ea, unsigned long vsid, int local) +{ + hugepte_t *ptep; + unsigned long va, vpn; + int is_write; + hugepte_t old_pte, new_pte; + unsigned long hpteflags, prpn; + long slot; + + /* Is this for us? */ + if (!in_hugepage_area(mm->context, ea)) + return -1; + + /* We have to find the first hugepte in the batch, since + * that's the one that will store the HPTE flags */ + ptep = hugepte_offset(mm, ea & ~(HPAGE_SIZE-1)); + + /* Search the Linux page table for a match with va */ + va = (vsid << 28) | (ea & 0x0fffffff); + vpn = va >> HPAGE_SHIFT; + + BUG_ON(hugepte_bad(*ptep)); + + /* + * If no pte found or not present, send the problem up to + * do_page_fault + */ + if (unlikely(!ptep || hugepte_none(*ptep))) + return 1; + + /* + * Check the user's access rights to the page. If access should be + * prevented then send the problem up to do_page_fault. + */ + is_write = access & _PAGE_RW; + if (unlikely(is_write && !(hugepte_val(*ptep) & _HUGEPAGE_RW))) + return 1; + + /* + * At this point, we have a pte (old_pte) which can be used to build + * or update an HPTE. There are 2 cases: + * + * 1. There is a valid (present) pte with no associated HPTE (this is + * the most common case) + * 2. 
There is a valid (present) pte with an associated HPTE. The + * current values of the pp bits in the HPTE prevent access + * because we are doing software DIRTY bit management and the + * page is currently not DIRTY. + */ + + old_pte = *ptep; + new_pte = old_pte; + + hpteflags = computeHugeHptePP(hugepte_val(new_pte)); + + /* Check if pte already has an hpte (case 2) */ + if (unlikely(hugepte_val(old_pte) & _HUGEPAGE_HASHPTE)) { + /* There MIGHT be an HPTE for this pte */ + unsigned long hash, slot; + + hash = hpt_hash(vpn, 1); + if (hugepte_val(old_pte) & _HUGEPAGE_SECONDARY) + hash = ~hash; + slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP; + slot += (hugepte_val(old_pte) & _HUGEPAGE_GROUP_IX) >> 5; + + if (ppc_md.hpte_updatepp(slot, hpteflags, va, 1, local) == -1) + hugepte_val(old_pte) &= ~_HUGEPAGE_HPTEFLAGS; + } + + if (likely(!(hugepte_val(old_pte) & _HUGEPAGE_HASHPTE))) { + unsigned long hash = hpt_hash(vpn, 1); + unsigned long hpte_group; + + prpn = hugepte_pfn(old_pte); + +repeat: + hpte_group = ((hash & htab_data.htab_hash_mask) * + HPTES_PER_GROUP) & ~0x7UL; + + /* Update the linux pte with the HPTE slot */ + hugepte_val(new_pte) &= ~_HUGEPAGE_HPTEFLAGS; + hugepte_val(new_pte) |= _HUGEPAGE_HASHPTE; + + slot = ppc_md.hpte_insert(hpte_group, va, prpn, 0, + hpteflags, 0, 1); + + /* Primary is full, try the secondary */ + if (unlikely(slot == -1)) { + hugepte_val(new_pte) |= _HUGEPAGE_SECONDARY; + hpte_group = ((~hash & htab_data.htab_hash_mask) * + HPTES_PER_GROUP) & ~0x7UL; + slot = ppc_md.hpte_insert(hpte_group, va, prpn, + 1, hpteflags, 0, 1); + if (slot == -1) { + if (mftb() & 0x1) + hpte_group = ((hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL; + + ppc_md.hpte_remove(hpte_group); + goto repeat; + } + } + + if (unlikely(slot == -2)) + panic("hash_page: pte_insert failed\n"); + + hugepte_val(new_pte) |= (slot<<5) & _HUGEPAGE_GROUP_IX; + + /* + * No need to use ldarx/stdcx here because all who + * might be updating the pte will 
hold the + * page_table_lock or the hash_table_lock + * (we hold both) + */ + *ptep = new_pte; + } + + return 0; +} + +static void flush_hash_hugepage(mm_context_t context, unsigned long ea, + hugepte_t pte, int local) +{ + unsigned long vsid, vpn, va, hash, secondary, slot; + + BUG_ON(hugepte_bad(pte)); + BUG_ON(!in_hugepage_area(context, ea)); + + vsid = get_vsid(context, ea); + + va = (vsid << 28) | (ea & 0x0fffffff); + vpn = va >> LARGE_PAGE_SHIFT; + hash = hpt_hash(vpn, 1); + secondary = !!(hugepte_val(pte) & _HUGEPAGE_SECONDARY); + if (secondary) + hash = ~hash; + slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP; + slot += (hugepte_val(pte) & _HUGEPAGE_GROUP_IX) >> 5; + + ppc_md.hpte_invalidate(slot, va, 1, local); +} + +static void update_and_free_page(struct page *page) +{ + int j; + struct page *map; + + map = page; + htlbzone_pages--; + for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) { + map->flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced | + 1 << PG_dirty | 1 << PG_active | 1 << PG_reserved | + 1 << PG_private | 1<< PG_writeback); + set_page_count(map, 0); + map++; + } + set_page_count(page, 1); + __free_pages(page, HUGETLB_PAGE_ORDER); +} + +static int try_to_free_low(int count) +{ + struct list_head *p; + struct page *page, *map; + + map = NULL; + spin_lock(&htlbpage_lock); + list_for_each(p, &htlbpage_freelist) { + if (map) { + list_del(&map->list); + update_and_free_page(map); + htlbpagemem--; + map = NULL; + if (++count == 0) + break; + } + page = list_entry(p, struct page, list); + if (!PageHighMem(page)) + map = page; + } + if (map) { + list_del(&map->list); + update_and_free_page(map); + htlbpagemem--; + count++; + } + spin_unlock(&htlbpage_lock); + return count; +} + +int set_hugetlb_mem_size(int count) +{ + int lcount; + struct page *page; + extern long htlbzone_pages; + extern struct list_head htlbpage_freelist; + + if (count < 0) + lcount = count; + else + lcount = count - htlbzone_pages; + + if (lcount == 0) + return 
(int)htlbzone_pages; + if (lcount > 0) { /* Increase the mem size. */ + while (lcount--) { + page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER); + if (page == NULL) + break; + spin_lock(&htlbpage_lock); + list_add(&page->list, &htlbpage_freelist); + htlbpagemem++; + htlbzone_pages++; + spin_unlock(&htlbpage_lock); + } + return (int) htlbzone_pages; + } + /* Shrink the memory size. */ + lcount = try_to_free_low(lcount); + while (lcount++) { + page = alloc_hugetlb_page(); + if (page == NULL) + break; + spin_lock(&htlbpage_lock); + update_and_free_page(page); + spin_unlock(&htlbpage_lock); + } + return (int) htlbzone_pages; +} + +int hugetlb_sysctl_handler(ctl_table *table, int write, + struct file *file, void *buffer, size_t *length) +{ + proc_dointvec(table, write, file, buffer, length); + htlbpage_max = set_hugetlb_mem_size(htlbpage_max); + return 0; +} + +static int __init hugetlb_setup(char *s) +{ + if (sscanf(s, "%d", &htlbpage_max) <= 0) + htlbpage_max = 0; + return 1; +} +__setup("hugepages=", hugetlb_setup); + +static int __init hugetlb_init(void) +{ + int i; + struct page *page; + + for (i = 0; i < htlbpage_max; ++i) { + page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER); + if (!page) + break; + spin_lock(&htlbpage_lock); + list_add(&page->list, &htlbpage_freelist); + spin_unlock(&htlbpage_lock); + } + htlbpage_max = htlbpagemem = htlbzone_pages = i; + printk("Total HugeTLB memory allocated, %ld\n", htlbpagemem); + return 0; +} +module_init(hugetlb_init); + +int hugetlb_report_meminfo(char *buf) +{ + return sprintf(buf, + "HugePages_Total: %5lu\n" + "HugePages_Free: %5lu\n" + "Hugepagesize: %5lu kB\n", + htlbzone_pages, + htlbpagemem, + HPAGE_SIZE/1024); +} + +int is_hugepage_mem_enough(size_t size) +{ + return (size + ~HPAGE_MASK)/HPAGE_SIZE <= htlbpagemem; +} + +/* + * We cannot handle pagefaults against hugetlb pages at all. They cause + * handle_mm_fault() to try to instantiate regular-sized pages in the + * hugegpage VMA. 
do_page_fault() is supposed to trap this, so BUG is we get + * this far. + */ +static struct page *hugetlb_nopage(struct vm_area_struct *vma, + unsigned long address, int unused) +{ + BUG(); + return NULL; +} + +struct vm_operations_struct hugetlb_vm_ops = { + .nopage = hugetlb_nopage, +}; diff -urN /scratch/anton/export/arch/ppc64/mm/init.c linux-gogogo/arch/ppc64/mm/init.c --- /scratch/anton/export/arch/ppc64/mm/init.c 2003-06-04 11:16:24.000000000 +1000 +++ linux-gogogo/arch/ppc64/mm/init.c 2003-06-06 12:51:54.000000000 +1000 @@ -293,7 +293,7 @@ if (!pgd_none(*pgd)) { pmd = pmd_offset(pgd, vmaddr); - if (!pmd_none(*pmd)) { + if (pmd_present(*pmd)) { ptep = pte_offset_kernel(pmd, vmaddr); /* Check if HPTE might exist and flush it if so */ pte = __pte(pte_update(ptep, _PAGE_HPTEFLAGS, 0)); @@ -301,6 +301,7 @@ flush_hash_page(context, vmaddr, pte, local); } } + WARN_ON(pmd_hugepage(*pmd)); } } @@ -349,7 +350,7 @@ pmd_end = (start + PMD_SIZE) & PMD_MASK; if (pmd_end > end) pmd_end = end; - if (!pmd_none(*pmd)) { + if (pmd_present(*pmd)) { ptep = pte_offset_kernel(pmd, start); do { if (pte_val(*ptep) & _PAGE_HASHPTE) { @@ -368,6 +369,7 @@ ++ptep; } while (start < pmd_end); } else { + WARN_ON(pmd_hugepage(*pmd)); start = pmd_end; } ++pmd; diff -urN /scratch/anton/export/include/asm-ppc64/mmu_context.h linux-gogogo/include/asm-ppc64/mmu_context.h --- /scratch/anton/export/include/asm-ppc64/mmu_context.h 2003-02-13 00:02:43.000000000 +1100 +++ linux-gogogo/include/asm-ppc64/mmu_context.h 2003-06-03 15:44:09.000000000 +1000 @@ -36,6 +36,12 @@ #define LAST_USER_CONTEXT 0x8000 /* Same as PID_MAX for now... */ #define NUM_USER_CONTEXT (LAST_USER_CONTEXT-FIRST_USER_CONTEXT) +#ifdef CONFIG_HUGETLB_PAGE +#define CONTEXT_32BIT (1UL<<63) +#else +#define CONTEXT_32BIT 0 +#endif + /* Choose whether we want to implement our context * number allocator as a LIFO or FIFO queue. 
*/ @@ -90,6 +96,8 @@ head = mmu_context_queue.head; mm->context = mmu_context_queue.elements[head]; + if (tsk->thread_info->flags & _TIF_32BIT) + mm->context |= CONTEXT_32BIT; head = (head < LAST_USER_CONTEXT-1) ? head+1 : 0; mmu_context_queue.head = head; @@ -189,6 +197,8 @@ { unsigned long ordinal, vsid; + context &= ~CONTEXT_32BIT; + ordinal = (((ea >> 28) & 0x1fffff) * LAST_USER_CONTEXT) | context; vsid = (ordinal * VSID_RANDOMIZER) & VSID_MASK; diff -urN /scratch/anton/export/include/asm-ppc64/page.h linux-gogogo/include/asm-ppc64/page.h --- /scratch/anton/export/include/asm-ppc64/page.h 2003-04-24 18:54:37.000000000 +1000 +++ linux-gogogo/include/asm-ppc64/page.h 2003-06-03 15:44:09.000000000 +1000 @@ -22,6 +22,40 @@ #define PAGE_MASK (~(PAGE_SIZE-1)) #define PAGE_OFFSET_MASK (PAGE_SIZE-1) +#ifdef CONFIG_HUGETLB_PAGE + +#define HPAGE_SHIFT 24 +#define HPAGE_SIZE ((1UL) << HPAGE_SHIFT) +#define HPAGE_MASK (~(HPAGE_SIZE - 1)) +#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) + +/* For 64-bit processes the hugepage range is 1T-1.5T */ +#define TASK_HPAGE_BASE_64 (0x0000010000000000UL) +#define TASK_HPAGE_END_64 (0x0000018000000000UL) +/* For 32-bit processes the hugepage range is 2-3G */ +#define TASK_HPAGE_BASE_32 (0x80000000UL) +#define TASK_HPAGE_END_32 (0xc0000000UL) + +#define TASK_HPAGE_BASE (test_thread_flag(TIF_32BIT) ? \ + TASK_HPAGE_BASE_32 : TASK_HPAGE_BASE_64) +#define TASK_HPAGE_END (test_thread_flag(TIF_32BIT) ? \ + TASK_HPAGE_END_32 : TASK_HPAGE_END_64) + +#define ARCH_HAS_HUGEPAGE_ONLY_RANGE +#define is_hugepage_only_range(addr, len) \ + ((addr > (TASK_HPAGE_BASE-len)) && (addr < TASK_HPAGE_END)) +#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA + +#define in_hugepage_area(context, addr) (((context) & CONTEXT_32BIT) ? 
\ + (((addr) >= TASK_HPAGE_BASE_32) && ((addr) < TASK_HPAGE_END_32)) : \ + (((addr) >= TASK_HPAGE_BASE_64) && ((addr) < TASK_HPAGE_END_64))) + +#else /* !CONFIG_HUGETLB_PAGE */ + +#define in_hugepage_area(mm, addr) 0 + +#endif /* !CONFIG_HUGETLB_PAGE */ + #define SID_SHIFT 28 #define SID_MASK 0xfffffffff #define GET_ESID(x) (((x) >> SID_SHIFT) & SID_MASK) diff -urN /scratch/anton/export/include/asm-ppc64/pgtable.h linux-gogogo/include/asm-ppc64/pgtable.h --- /scratch/anton/export/include/asm-ppc64/pgtable.h 2003-05-30 01:22:36.000000000 +1000 +++ linux-gogogo/include/asm-ppc64/pgtable.h 2003-06-06 12:52:23.000000000 +1000 @@ -149,6 +149,22 @@ /* shift to put page number into pte */ #define PTE_SHIFT (16) +/* We allow 2^41 bytes of real memory, so we need 29 bits in the PMD + * to give the PTE page number. The bottom two bits are for flags. */ +#define PMD_TO_PTEPAGE_SHIFT (2) +#ifdef CONFIG_HUGETLB_PAGE +#define _PMD_HUGEPAGE 0x00000001U +#define HUGEPTE_BATCH_SIZE (1<<(HPAGE_SHIFT-PMD_SHIFT)) + +int hash_huge_page(struct mm_struct *mm, unsigned long access, + unsigned long ea, unsigned long vsid, int local); +#else + +#define hash_huge_page(mm,a,ea,vsid,local) -1 +#define _PMD_HUGEPAGE 0 + +#endif + #ifndef __ASSEMBLY__ /* @@ -178,12 +194,16 @@ #define pte_pfn(x) ((unsigned long)((pte_val(x) >> PTE_SHIFT))) #define pte_page(x) pfn_to_page(pte_pfn(x)) -#define pmd_set(pmdp, ptep) (pmd_val(*(pmdp)) = (__ba_to_bpn(ptep))) +#define pmd_set(pmdp, ptep) \ + (pmd_val(*(pmdp)) = (__ba_to_bpn(ptep) << PMD_TO_PTEPAGE_SHIFT)) #define pmd_none(pmd) (!pmd_val(pmd)) -#define pmd_bad(pmd) ((pmd_val(pmd)) == 0) -#define pmd_present(pmd) ((pmd_val(pmd)) != 0) +#define pmd_hugepage(pmd) (!!(pmd_val(pmd) & _PMD_HUGEPAGE)) +#define pmd_bad(pmd) (((pmd_val(pmd)) == 0) || pmd_hugepage(pmd)) +#define pmd_present(pmd) ((!pmd_hugepage(pmd)) \ + && (pmd_val(pmd) & ~_PMD_HUGEPAGE) != 0) #define pmd_clear(pmdp) (pmd_val(*(pmdp)) = 0) -#define pmd_page_kernel(pmd) (__bpn_to_ba(pmd_val(pmd))) 
+#define pmd_page_kernel(pmd) \ + (__bpn_to_ba(pmd_val(pmd) >> PMD_TO_PTEPAGE_SHIFT)) #define pmd_page(pmd) virt_to_page(pmd_page_kernel(pmd)) #define pgd_set(pgdp, pmdp) (pgd_val(*(pgdp)) = (__ba_to_bpn(pmdp))) #define pgd_none(pgd) (!pgd_val(pgd)) From linas at austin.ibm.com Sat Jun 7 00:45:30 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Fri, 6 Jun 2003 09:45:30 -0500 Subject: Kernel startup problems on p630 In-Reply-To: <207D6ADFC044A84686D44CA11B297EEA02018A39@chn-ex02.cvns.corp.covansys.com>; from SVakkalankarao@covansys.com on Fri, Jun 06, 2003 at 10:17:16AM +0530 References: <207D6ADFC044A84686D44CA11B297EEA02018A39@chn-ex02.cvns.corp.covansys.com> Message-ID: <20030606094530.A33958@forte.austin.ibm.com> On Fri, Jun 06, 2003 at 10:17:16AM +0530, VAKKALANKA RAO Sridhar wrote: > > I am sorry, Dave. Please elaborate what you said. By default, all POWER4 machines (which the p630 is) have LPAR implemented on them. So I assumed that my machine was indeed running in LPAR mode - I am not aware that a p630 can be run without LPAR mode. Are you saying that the firmware needs to be configured to enable LPAR mode? Please reply. > There's a non-LPAR mode, commonly referred to as 'SMP mode', where the LPAR features aren't used. In this case, you can run only one operating system on the thing, and *all* of the hardware belongs to that OS. You get into LPAR mode only if you use the 'HMC', the Hardware Management Console (the GUI/graphical thing), to create several LPARs and assign only a fraction of the total hardware to each partition. Some people don't want/need multiple OSes, and so don't run in 'LPAR mode'. Sometimes, firmware and/or hardware bugs behave differently depending on which mode you're in. --linas ** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/ From boutcher at us.ibm.com Sat Jun 7 00:59:19 2003 From: boutcher at us.ibm.com (David Boutcher) Date: Fri, 6 Jun 2003 09:59:19 -0500 Subject: Kernel startup problems on p630 In-Reply-To: <20030606094530.A33958@forte.austin.ibm.com> Message-ID: On 06/06/2003 09:45 AM linux wrote: > Sometimes, firmware and/or hardware bugs behave differently > depending on which mode you're in. Please be politically correct. "Sometimes, firmware and/or hardware FEATURES behave differently depending on what mode you are in." :-) Dave Boutcher Senior Technical Staff Member IBM PowerPC Linux Development ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Sat Jun 7 01:36:42 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Fri, 6 Jun 2003 10:36:42 -0500 Subject: Kernel startup problems on p630 In-Reply-To: ; from boutcher@us.ibm.com on Fri, Jun 06, 2003 at 09:59:19AM -0500 References: <20030606094530.A33958@forte.austin.ibm.com> Message-ID: <20030606103642.B33958@forte.austin.ibm.com> On Fri, Jun 06, 2003 at 09:59:19AM -0500, David Boutcher wrote: > > On 06/06/2003 09:45 AM linux wrote: > > Sometimes, firmware and/or hardware bugs behave differently > > depending on which mode you're in. > > Please be politically correct. "Sometimes, firmware and/or hardware > FEATURES behave differently depending on what mode you are in." :-) Right, right, right, 'You may think that you understood what you thought that I wrote, but what you don't realize is that what you understood when you read what I wrote is not what I meant.' --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From vince at io.com Sat Jun 7 04:20:59 2003 From: vince at io.com (vince) Date: Fri, 6 Jun 2003 13:20:59 -0500 (CDT) Subject: ?
about building 64-bit glibc with UL 1.0 Message-ID: I'm attempting to build a 64-bit gcc compiler on an IBM ppc box (kernel 2.4.19-ul1-ppc64-SMP) that's running United Linux 1.0. I've got binutils and gcc 3.2 source(all source from media) installed and patched. Once I started patching glibc, I became lost(and man, this patching scheme is awful - why can't linux use file versioning?). Anyway, I'm using the cross-ppc64-glibc.spec file as a template for doing the correct patches when I run into: # Install blowfish crypt add-on rm crypt_blowfish-*/crypt.h cp -a crypt_blowfish-*/*.[chS] crypt %patch8 find . -name configure | xargs touch for p in %{S:9} ; do msgattrib -o po/${p##*/} --translated --no-fuzzy $p || echo $p: syntax error done cd ../kernel-headers I don't know what's up with this blowfish stuff so I skipped it. The kernel-headers part looks important so I found those source packages: kernel-headers-2.4.19.tar.bz2 kernel-headers.ppc64.tar.bz2 Where am I supposed to restore these files? Under the glibc-2.2 source directory? Even with it(which would seem correct since the glibc spec file has the "cd ../kernel-headers" in it)? Any help or hints appreciated. I've been using the info on www.linuxppc64.org to get me going but is the 64-bit compiler building process described anywhere else in a clearer fashion? I'm barely linux literate though I did manage to BS my way through building a 32-bit version of gcc that seems to actually work. I'm a member of a build team that has to isolate these gcc compilers so that they run from an isolated tools repository rather than natively from the build machine. Thanks much! -- vince /* Visit the home of the Rancid Tofu Experience */ /* http://www.mp3.com/rancidtofuexperience */ ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From sjmunroe at us.ibm.com Sat Jun 7 04:52:21 2003 From: sjmunroe at us.ibm.com (Steve Munroe) Date: Fri, 6 Jun 2003 13:52:21 -0500 Subject: ? 
about building 64-bit glibc with UL 1.0 Message-ID: > Where am I supposed to restore these files? Under the glibc-2.2 source > directory? Even with it(which would seem correct since the glibc spec > file has the "cd ../kernel-headers" in it)? If you are building for that machine, you can use the kernel headers in /usr/include/[asm|linux]. This is the default. If you also plan to build new kernels then you will need separate kernel headers directories. You can put these headers almost anywhere, but you need to tell glibc configure where they are using the --with-headers= option. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olh at suse.de Sat Jun 7 05:41:57 2003 From: olh at suse.de (Olaf Hering) Date: Fri, 6 Jun 2003 21:41:57 +0200 Subject: ? about building 64-bit glibc with UL 1.0 In-Reply-To: References: Message-ID: <20030606194157.GC28910@suse.de> On Fri, Jun 06, vince wrote: > Where am I supposed to restore these files? Under the glibc-2.2 source > directory? Even with it(which would seem correct since the glibc spec > file has the "cd ../kernel-headers" in it)? install the glibc.src.rpm rpm -bp /usr/src/packages/SPECS/glibc.rpm Then follow the commands in the %build section. Gruss Olaf -- USB is for mice, FireWire is for men! ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From vince at io.com Sat Jun 7 06:24:14 2003 From: vince at io.com (vince) Date: Fri, 6 Jun 2003 15:24:14 -0500 (CDT) Subject: ? about building 64-bit glibc with UL 1.0 In-Reply-To: Message-ID: On Fri, 6 Jun 2003, Steve Munroe wrote: > > > Where am I supposed to restore these files? Under the glibc-2.2 source > > directory? Even with it(which would seem correct since the glibc spec > > file has the "cd ../kernel-headers" in it)? > > If you are building for that machine, you can use the kernel headers in > /usr/include/[asm|linux]. This is the default. 
> > If you also plan to build new kernels then you will need separate kernel > headers directories. You can put these headers almost anywhere, but you > need to tell glibc configure where they are using the --with-headers= > option. Thanks for the reply, Steve. I think I was coming to a similar conclusion thinking about this at lunch. Since I'm building on the same machine I'm targeting, I should be able to use existing 64-bit glibc headers/libs already installed on the machine. This is what I did with the 32-bit compiler. All I had to do was change the target to powerpc-unknown-linux-gnu (rather than powerpc64-unknown-linux-gnu) and build binutils then gcc. The only problem I had was figuring out how to tell the gcc config step where to get these files from (then it copied them out to my target dir). Using --with-headers/libs took care of that. If I didn't do this, the gcc compile failed because it couldn't find crti.o. Let's see how far I get by leaving glibc out of the equation. Thanks again! -- vince /* Visit the home of the Rancid Tofu Experience */ /* http://www.mp3.com/rancidtofuexperience */ ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Sat Jun 7 07:30:45 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Fri, 6 Jun 2003 16:30:45 -0500 Subject: ? about building 64-bit glibc with UL 1.0 In-Reply-To: ; from vince@io.com on Fri, Jun 06, 2003 at 03:24:14PM -0500 References: Message-ID: <20030606163045.B40380@forte.austin.ibm.com> On Fri, Jun 06, 2003 at 03:24:14PM -0500, vince wrote: > > need to tell glibc configure where they are using the --with-headers= > > option. > > Thanks for the reply, Steve. I think I was coming to a similar conclusion > thinking about this at lunch. Since I'm building on the same machine > I'm targeting, I should be able to use existing 64-bit glibc headers/libs > already installed on the machine. This is what I did with the 32-bit N.B.
that this can simplify things if you are going for the quick-n-dirty build, but can result in a whole lot of confusion if, for any reason, there are differences between the system default files and the files you needed for the actual target architecture. Chances are 99% that you won't ever trip over one of these subtle differences, but if (when?) you do, you'll have a classic hard-to-recreate, unreproducible bug, i.e. one that only affects you but no one else. Typically these are followed by tense, edgy emails and veiled remarks to the effect of 'xxx sucks' where xxx is linux, gcc, or whatever you have it in for on that given day ... see already, for example, comments on patches vs. file versioning ... --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Sat Jun 7 08:02:21 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Fri, 6 Jun 2003 17:02:21 -0500 Subject: RPC/XDR experience? Message-ID: <20030606170221.A36622@forte.austin.ibm.com> Anyone out there with experience with RPC on 64 bit machines? I've got a 'hello-world' RPC program that falls apart when compiled & linked to 64-bit glibc on PPC64. It doesn't like long ints. I'm getting the feeling that I may be opening a Pandora's box; I'm hoping that someone will reply and state that no, 64-bit RPCs work great on alphas and sparcs and mips, and not to worry ... --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From sjmunroe at us.ibm.com Sat Jun 7 08:15:09 2003 From: sjmunroe at us.ibm.com (Steve Munroe) Date: Fri, 6 Jun 2003 17:15:09 -0500 Subject: RPC/XDR experience? Message-ID: > Anyone out there with experience with RPC on 64 bit machines? "sunrpc" is part of the PPC64 glibc build but "make check" does not seem to generate any tests for sunrpc. The pSeries test folks did write some RPC tests and get them to work for PPC64 on Suse SLES 8. They may not have tried long int ...
** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olof at austin.ibm.com Sat Jun 7 08:17:28 2003 From: olof at austin.ibm.com (Olof Johansson) Date: Fri, 06 Jun 2003 17:17:28 -0500 Subject: RPC/XDR experience? In-Reply-To: <20030606170221.A36622@forte.austin.ibm.com> References: <20030606170221.A36622@forte.austin.ibm.com> Message-ID: <3EE112F8.7070707@austin.ibm.com> linas at austin.ibm.com wrote: > Anyone out there with experience with RPC on 64 bit machines? > > I've got a 'hello-world' RPC program that falls apart when > compiled & linked to 64-bit glibc on PPC64. It doesn't like > long ints. I'm getting the feeling that I may be opening a > pandora's box; I'm hoping that someone will reply and state > that no, 64-bit RPC's work great on alphas and sparcs and mips, > and not to worry ... Linas, As far as I can tell it works just fine. Notice however, that the implementation of xdr_long WILL fail for cases where it's found that the 64-bit long contains larger (or smaller, for negative) values than can be stored in the 32-bit over-the-wire long. This might give surprising results when you're using xdr_long to encode an unsigned integer. See code snippet: bool_t xdr_long (XDR *xdrs, long *lp) { if (xdrs->x_op == XDR_ENCODE && (sizeof (int32_t) == sizeof (long) || (int32_t) *lp == *lp)) // <-- HERE is the check return XDR_PUTLONG (xdrs, lp); ... return FALSE; ... } -Olof -- Olof Johansson Office: 4E002/905 pSeries Linux Development IBM Systems Group Email: olof at austin.ibm.com Phone: 512-838-9858 All opinions are my own and not those of IBM ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olof at austin.ibm.com Sat Jun 7 08:24:55 2003 From: olof at austin.ibm.com (Olof Johansson) Date: Fri, 06 Jun 2003 17:24:55 -0500 Subject: RPC/XDR experience? 
In-Reply-To: <3EE112F8.7070707@austin.ibm.com> References: <20030606170221.A36622@forte.austin.ibm.com> <3EE112F8.7070707@austin.ibm.com> Message-ID: <3EE114B7.4010105@austin.ibm.com> Olof Johansson wrote: > This might give surprising results when you're using xdr_long to encode an > unsigned integer. This should (of course) read "unsigned long", not "unsigned integer". -Olof -- Olof Johansson Office: 4E002/905 pSeries Linux Development IBM Systems Group Email: olof at austin.ibm.com Phone: 512-838-9858 All opinions are my own and not those of IBM ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Sat Jun 7 08:39:55 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Fri, 6 Jun 2003 17:39:55 -0500 Subject: RPC/XDR experience? In-Reply-To: <3EE112F8.7070707@austin.ibm.com>; from olof@austin.ibm.com on Fri, Jun 06, 2003 at 05:17:28PM -0500 References: <20030606170221.A36622@forte.austin.ibm.com> <3EE112F8.7070707@austin.ibm.com> Message-ID: <20030606173954.B36624@forte.austin.ibm.com> On Fri, Jun 06, 2003 at 05:17:28PM -0500, Olof Johansson wrote: > linas at austin.ibm.com wrote: > > Anyone out there with experience with RPC on 64 bit machines? > > > > I've got a 'hello-world' RPC program that falls apart when > > compiled & linked to 64-bit glibc on PPC64. It doesn't like > > long ints. I'm getting the feeling that I may be opening a > > pandora's box; I'm hoping that someone will reply and state > > that no, 64-bit RPC's work great on alphas and sparcs and mips, > > and not to worry ... > > Notice however, that the implementation of xdr_long WILL fail for cases where it's found that the > 64-bit long contains larger (or smaller, for negative) values than can be stored in the 32-bit > over-the-wire long. This might give surprising results when you're using xdr_long to encode an > unsigned integer. This is exactly what is happening. 
> if (xdrs->x_op == XDR_ENCODE > && (sizeof (int32_t) == sizeof (long) > || (int32_t) *lp == *lp)) // <-- HERE is the check OK, get ready: I presume you pulled this from glibc sunrpc subdir? Or the kernel rpc subdir? (The linux kernel rpc's can/should be different than the glibc ones). Is this a 'well known limitation'? (i.e. that xdr_long will only send 32 bits), or is this a 'buggy implementation'? If it's 'well known', where is it documented? 'man xdr' gives no hint, and implies the opposite is true ... Does RPC define a 64-bit encoding (that can be received by 32-bit servers)? If so, shouldn't the xdr_long be modified to use the 64-bit marshalling? I'm thinking to myself that there are valid reasons for applications on 32-bit CPUs to use 64-bit ints. I know gcc handles 64-bit 'long-long ints' on intel machines just fine, and has done so for at least 5 years (if not 10). By extension, one must conclude that there are valid reasons for RPCs to support 64-bit ints, even on 32-bit arches. If you buy the argument in the last 3 sentences, then I would think that xdr_long should be generating a 64-bit wire protocol on ppc64. There. Pandora's box is now open ... --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olof at austin.ibm.com Sat Jun 7 08:56:35 2003 From: olof at austin.ibm.com (Olof Johansson) Date: Fri, 06 Jun 2003 17:56:35 -0500 Subject: RPC/XDR experience? In-Reply-To: <20030606173954.B36624@forte.austin.ibm.com> References: <20030606170221.A36622@forte.austin.ibm.com> <3EE112F8.7070707@austin.ibm.com> <20030606173954.B36624@forte.austin.ibm.com> Message-ID: <3EE11C23.6030705@austin.ibm.com> linas at austin.ibm.com wrote: > This is exactly what is happening. Good. >> if (xdrs->x_op == XDR_ENCODE >> && (sizeof (int32_t) == sizeof (long) >> || (int32_t) *lp == *lp)) // <-- HERE is the check > > OK, get ready: > > I presume you pulled this from glibc sunrpc subdir? Or the kernel > rpc subdir?
(The linux kernel rpc's can/should be different than > the glibc ones). glibc (2.2). > Is this a 'well known limitation'? (i.e. that xdr_long will only > send 32 bits), or is this a 'buggy implementation'? This is a (well) known limitation. XDR calls 64-bit integers "hyper integers". See RFC1014 for reference. > If it's 'well known', where is it documented? 'man xdr' gives no hint, > and implies the opposite is true ... I can't find any documentation on Linux/glibc for this. This is the behaviour that AIX, Solaris and other ONC+-based implementations have; changing it would be a bad idea for compatibility reasons. http://docs.sun.com/db/doc/806-6543/6jffrdmf8?a=view for reference. Search for "xdr_long": "The XDR routine xdr_long(3NSL) might seem to be a problem; however, it is still handled as a 32-bit quantity over the wire to be compatible with existing protocols. If the 64-bit version of the routine is asked to encode a long value that does not fit into a 32-bit quantity, the encode operation fails." > Does RPC define a 64-bit encoding (that can be received by 32-bit > servers)? xdr_hyper() > If so, shouldn't the xdr_long be modified to use the 64-bit marshalling? It would probably not be wise, since there's a lot of 32-bit code out there that will change protocol behavior if it's recompiled in 64-bit mode, etc. > I'm thinking to myself that there are valid reasons for applications > on 32-bit CPUs to use 64-bit ints. I know gcc handles 64-bit 'long-long > ints' on intel machines just fine, and has done so for at least 5 years > (if not 10). By extension, one must conclude that there are valid > reasons for RPCs to support 64-bit ints, even on 32-bit arches. > If you buy the argument in the last 3 sentences, then I would think > that xdr_long should be generating a 64-bit wire protocol on ppc64. > > There. Pandora's box is now open ... Nah, no need to make the problem bigger than it already is and fix something that isn't broken.
-Olof -- Olof Johansson Office: 4E002/905 pSeries Linux Development IBM Systems Group Email: olof at austin.ibm.com Phone: 512-838-9858 All opinions are my own and not those of IBM ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From vince at io.com Sat Jun 7 09:29:55 2003 From: vince at io.com (vince) Date: Fri, 6 Jun 2003 18:29:55 -0500 (CDT) Subject: Another ? about building 64-bit compiler In-Reply-To: <3EE11C23.6030705@austin.ibm.com> Message-ID: I thought I had a plan to get this 64-bit ppc compiler built w/o messing with glibc but it didn't work. Here's what I did: #Build binutils export PREFIX=/project/tools/ppc_linux_2/suse/8.1/compilers/gnu/gcc/3.2_ppc64 mkdir binutils_build even to source(this is your obj tree) and cd into it ../binutils-2.12.90.0.15/configure --prefix=$PREFIX/binutils --enable-shared --target=powerpc64-unknown-linux-gnu Run "make 2>&1 | tee build.out" Run "make install 2>&1 | tee install.out" This built and installed. #Build gcc cd $PREFIX mkdir -p powerpc64-unknown-linux-gnu/lib Populate with /opt/cross/powerpc64-linux/lib # I had to do this as configure didn't like --with-libs as this wasn't # a cross-compiler I was building. Back in my build dir: export PATH=$PREFIX/binutils/powerpc64-unknown-linux-gnu/bin:$PATH ../gcc-3.2/configure --prefix=$PREFIX --with-local-prefix=$PREFIX/default \ --with-as=$PREFIX/binutils/powerpc-unknown-linux-gnu/bin/as \ --with-ld=$PREFIX/binutils/powerpc-unknown-linux-gnu/bin/ld \ --enable-languages=c,c++ --enable-threads=posix --enable-shared \ --target=powerpc64-unknown-linux-gnu I started the make, which failed complaining about /usr/lib/libc.so.6 being an incompatible library while linking something.
I figured I messed up by not editing the top-level makefile and setting CC="/opt/cross/bin/powerpc64-linux-gcc"(it was set to "gcc") so I tried that and, while I got past the failing linker command, I failed with: /gcc/gcc_build_ppc64/gcc/xgcc -B/gcc/gcc_build_ppc64/gcc/ -B/project/tools/ppc_linux_2/suse/8.1/compilers/gnu/gcc/3.2_ppc64/powerpc64-unknown-linux-gnu/bin/ -B/project/tools/ppc_linux_2/suse/8.1/compilers/gnu/gcc/3.2_ppc64/powerpc64-unknown-linux-gnu/lib/ -isystem /project/tools/ppc_linux_2/suse/8.1/compilers/gnu/gcc/3.2_ppc64/powerpc64-unknown-linux-gnu/include -O2 -DIN_GCC -mminimal-toc -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -isystem ./include -mno-minimal-toc -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -I. -I. -I../../gcc-3.2/gcc -I../../gcc-3.2/gcc/. -I../../gcc-3.2/gcc/config -I../../gcc-3.2/gcc/../include -fexceptions -c ../../gcc-3.2/gcc/unwind-dw2.c -o libgcc/./unwind-dw2.o xgcc: Internal error: Killed (program cc1) So, I guess my question is...Am I on the right track? If I am, I need to rebuild binutils with the cross-compiler rather than /usr/bin/gcc. If you read all this, thanks! I'm a total newbie so if someone can suggest a good book/URL that might help me, that would also be great. Thanks. -- vince /* Visit the home of the Rancid Tofu Experience */ /* http://www.mp3.com/rancidtofuexperience */ ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Sat Jun 7 12:42:36 2003 From: anton at samba.org (Anton Blanchard) Date: Sat, 7 Jun 2003 12:42:36 +1000 Subject: ppc64 irq cleanup Message-ID: <20030607024236.GA28914@krispykreme> Hi, Ive been looking at the IRQ consolidation patches and what needs to be done for ppc64 to work. One problem is how we use request_irq before the memory subsystem is up. There were two places in which we did this, the 8259 interrupt and its cascade and the RAS interrupts. 
I just changed the RAS interrupt initialisation to be called out of an initcall, so that leaves the 8259. This is what I am playing with at the moment, basically we do 8259 initialisation (and the associated request_irqs) in an arch_initcall. When we go to dynamically allocating irq_descs this should make things a bit easier. Anton ===== arch/ppc64/kernel/irq.c 1.29 vs edited ===== --- 1.29/arch/ppc64/kernel/irq.c Sat Jun 7 11:59:39 2003 +++ edited/arch/ppc64/kernel/irq.c Sat Jun 7 12:28:21 2003 @@ -72,40 +72,6 @@ int ppc_spurious_interrupts = 0; unsigned long lpEvent_count = 0; -/* nasty hack for shared irq's since we need to do kmalloc calls but - * can't very early in the boot when we need to do a request irq. - * this needs to be removed. - * -- Cort - */ -#define IRQ_KMALLOC_ENTRIES 16 -static int cache_bitmask = 0; -static struct irqaction malloc_cache[IRQ_KMALLOC_ENTRIES]; -extern int mem_init_done; - -void *irq_kmalloc(size_t size, int pri) -{ - unsigned int i; - if ( mem_init_done ) - return kmalloc(size,pri); - for ( i = 0; i < IRQ_KMALLOC_ENTRIES ; i++ ) - if ( ! 
( cache_bitmask & (1<interrupt_controller == IC_OPEN_PIC) { + /* Initialize the cascade */ + if (request_irq(NUM_8259_INTERRUPTS, no_action, SA_INTERRUPT, + "82c59 cascade", NULL)) + printk(KERN_ERR "Unable to get OpenPIC IRQ 0 for cascade\n"); + i8259_init(); + } +} +arch_initcall(openpic_setup_i8259); void openpic_setup_ISU(int isu_num, unsigned long addr) { ===== arch/ppc64/kernel/smp.c 1.38 vs edited ===== --- 1.38/arch/ppc64/kernel/smp.c Sat Jun 7 11:19:27 2003 +++ edited/arch/ppc64/kernel/smp.c Sat Jun 7 12:08:06 2003 @@ -208,7 +208,7 @@ } } -static int __init smp_chrp_probe(void) +static int __init smp_openpic_probe(void) { int i; int nr_cpus = 0; @@ -301,6 +301,10 @@ if (cpu_possible(i)) nr_cpus++; } +#ifdef CONFIG_SMP + extern void xics_request_IPIs(void); + xics_request_IPIs(); +#endif return nr_cpus; } @@ -337,7 +341,7 @@ if (naca->interrupt_controller == IC_OPEN_PIC) { smp_ops->message_pass = smp_openpic_message_pass; - smp_ops->probe = smp_chrp_probe; + smp_ops->probe = smp_openpic_probe; } else { smp_ops->message_pass = smp_xics_message_pass; smp_ops->probe = smp_xics_probe; ===== arch/ppc64/kernel/xics.c 1.24 vs edited ===== --- 1.24/arch/ppc64/kernel/xics.c Wed May 28 08:36:24 2003 +++ edited/arch/ppc64/kernel/xics.c Sat Jun 7 12:27:09 2003 @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -490,23 +491,38 @@ ops->cppr_info(boot_cpuid, 0xff); iosync(); - if (xics_irq_8259_cascade != -1) { + + ppc64_boot_msg(0x21, "XICS Done"); +} + +/* + * We cant do this in init_IRQ because we need the memory subsystem up for + * request_irq() + */ +static int __init xics_setup_i8259(void) +{ + if (naca->interrupt_controller == IC_PPC_XIC && + xics_irq_8259_cascade != -1) { if (request_irq(xics_irq_8259_cascade + XICS_IRQ_OFFSET, no_action, 0, "8259 cascade", 0)) printk(KERN_ERR "xics_init_IRQ: couldn't get 8259 cascade\n"); i8259_init(); } + return 0; +} +arch_initcall(xics_setup_i8259); #ifdef CONFIG_SMP +void 
xics_request_IPIs(void) +{ real_irq_to_virt_map[XICS_IPI] = virt_irq_to_real_map[XICS_IPI] = XICS_IPI; /* IPIs are marked SA_INTERRUPT as they must run with irqs disabled */ request_irq(XICS_IPI + XICS_IRQ_OFFSET, xics_ipi_action, SA_INTERRUPT, "IPI", 0); irq_desc[XICS_IPI+XICS_IRQ_OFFSET].status |= IRQ_PER_CPU; -#endif - ppc64_boot_msg(0x21, "XICS Done"); } +#endif void xics_set_affinity(unsigned int virq, unsigned long cpumask) { ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From vince at io.com Mon Jun 9 10:31:03 2003 From: vince at io.com (vince) Date: Sun, 8 Jun 2003 19:31:03 -0500 (CDT) Subject: What's the diff in UL 1.0 compilers? In-Reply-To: Message-ID: Just what is the difference between the regular gcc compiler in /usr/bin and the one under /opt/cross? After spending 6 hours today trying to build the "cross-compiler" (see previous notes of confusion), I'm wondering why I'm doing this. My customer says he needs the cross-compiler to build 64-bit programs for PPC linux systems. Couldn't this be done using "/usr/bin/gcc -mpowerpc64"? If the answer is yes, why would you ever run the /opt/cross compiler? BTW, this is a United Linux PPC 1.0 system running a 64-bit kernel (which I believe is the only kernel available). Thanks, Vince ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From SVakkalankarao at covansys.com Mon Jun 9 14:57:06 2003 From: SVakkalankarao at covansys.com (VAKKALANKA RAO Sridhar) Date: Mon, 9 Jun 2003 10:27:06 +0530 Subject: Kernel startup problems on p630 Message-ID: <207D6ADFC044A84686D44CA11B297EEA02018A3B@chn-ex02.cvns.corp.covansys.com> Dave (Boutcher), Linas, Your sense of humor displaying the depth of your knowledge was indeed humbling (trust me, I'm being honest). Now, here's the joke at my expense. A web search revealed that if I wanted LPAR mode on my machine, I should have ordered it with feature# 6576, as it was known then. Today, it is called feature# 9575.
We have started plans to get it installed. Dave (Engebretsen), The machine on which the binary ran successfully was indeed in LPAR mode, even though its firmware level was older! As of today, is it really impossible for Linux to run in SMP mode on the p630 due to firmware bugs (or any other reason)? You also mentioned that you are resynching to the most recent level of 2.4.21 in the next day or two. Do you intend to make it work on a p630 in SMP mode? If yes, please indicate how the kernel configuration would differ between LPAR and SMP modes. Thanks a lot Sri ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From meissner at suse.de Mon Jun 9 17:46:27 2003 From: meissner at suse.de (Marcus Meissner) Date: Mon, 9 Jun 2003 09:46:27 +0200 Subject: What's the diff in UL 1.0 compilers? In-Reply-To: References: Message-ID: <20030609074627.GA23632@suse.de> On Sun, Jun 08, 2003 at 07:31:03PM -0500, vince wrote: > > Just what is the difference between the regular gcc compiler > in /usr/bin and the one under /opt/cross? > > After spending 6 hours today trying to build the "cross-compiler" > (see previous notes of confusion), I'm wondering why I'm > doing this. My customer says he needs the cross-compiler to > build 64-bit programs for PPC linux systems. Couldn't this be done using > "/usr/bin/gcc -mpowerpc64"? If the answer is yes, why would > you ever run the /opt/cross compiler? This will be done using -m64 later. Gcc was not biarch capable for ppc64 at the time we did UL1/SLES8. For now use: /opt/cross/bin/powerpc64-linux-gcc as a 32bit -> 64bit cross compiler. Ciao, Marcus ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From vince at io.com Tue Jun 10 01:40:03 2003 From: vince at io.com (vince) Date: Mon, 9 Jun 2003 10:40:03 -0500 (CDT) Subject: What's the diff in UL 1.0 compilers?
In-Reply-To: <20030609074627.GA23632@suse.de> Message-ID: On Mon, 9 Jun 2003, Marcus Meissner wrote: > On Sun, Jun 08, 2003 at 07:31:03PM -0500, vince wrote: > > > > Just what is the difference between the regular gcc compiler > > in /usr/bin and the one under /opt/cross? > > > > After spending 6 hours today trying to build the "cross-compiler" > > (see previous notes of confusion), I'm wondering why I'm > > doing this. My customer says he needs the cross-compiler to > > build 64-bit programs for PPC linux systems. Couldn't this be done using > > "/usr/bin/gcc -mpowerpc64"? If the answer is yes, why would > > you ever run the /opt/cross compiler? > > This will be done using -m64 later. Gcc was not biarch capable for ppc64 > at the time we did UL1/SLES8. > > For now use: /opt/cross/bin/powerpc64-linux-gcc > as a 32bit -> 64bit cross compiler. Thanks for the info, Marcus. I notice that the installed cross-compiler is a 32-bit app. The only way I've gotten the thing to build is as a 64-bit app. Is this correct? Whenever I try to build the 32-bit cross-compiler, it fails because ld complains about /usr/lib/libc.so.6 being incompatible. Thus, the only way I've gotten this to build is to explicitly set CC, LD, NM, etc, to the versions in /opt/cross before running make. I'm guessing that I'm still screwing up. Perhaps it's because I'm not rebuilding glibc. I'm trying just to rebuild binutils and gcc as glibc stumped me. So, I'm probably screwing up my binutils build as I would guess it's the binutils ld I'm using that has somehow hard-coded the wrong libc.so.6 into itself. Heck, if there's anyone in Austin that knows how to build the cross- compiler, I'll pay you $100 to show me how to do it. I live 2 miles from IBM and can tunnel into my build machine from home. Thanks, -- vince /* Visit the home of the Rancid Tofu Experience */ /* http://www.mp3.com/rancidtofuexperience */ ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From linas at austin.ibm.com Tue Jun 10 05:46:04 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Mon, 9 Jun 2003 14:46:04 -0500 Subject: Kernel startup problems on p630 In-Reply-To: <207D6ADFC044A84686D44CA11B297EEA02018A3B@chn-ex02.cvns.corp.covansys.com>; from SVakkalankarao@covansys.com on Mon, Jun 09, 2003 at 10:27:06AM +0530 References: <207D6ADFC044A84686D44CA11B297EEA02018A3B@chn-ex02.cvns.corp.covansys.com> Message-ID: <20030609144604.A40270@forte.austin.ibm.com> On Mon, Jun 09, 2003 at 10:27:06AM +0530, VAKKALANKA RAO Sridhar wrote: > > Dave (Boutcher), Linas, > > Your sense of humor Thank you, you can't have fun if you aren't having fun. > You also mentioned that you are resynching to the most recent level of > 2.4.21 in the next day or two. Do you intend to make it work on a p630 > in SMP mode? Is there a reason why you aren't choosing to work with a vendor-supplied kernel (i.e., one provided by SuSE, or possibly by RedHat)? The reason I ask is that software bugs do come up, they do get fixed, and they eventually wind their way from various developer trees and patches into the mainstream distributions. As long as you work with source code of semi-official provenance, there is a chance that you are working with known-buggy code, or code that might not yet have been tested, etc. By comparison, the SuSE kernels have been thoroughly tested, and should 'just work', without all the messing around. --linas ** Sent via the linuxppc64-dev mail list.
See http://lists.linuxppc.org/ From anton at samba.org Tue Jun 10 09:18:28 2003 From: anton at samba.org (Anton Blanchard) Date: Tue, 10 Jun 2003 09:18:28 +1000 Subject: Kernel startup problems on p630 In-Reply-To: <207D6ADFC044A84686D44CA11B297EEA02018A3B@chn-ex02.cvns.corp.covansys.com> References: <207D6ADFC044A84686D44CA11B297EEA02018A3B@chn-ex02.cvns.corp.covansys.com> Message-ID: <20030609231828.GB31360@krispykreme> > The machine on which the binary ran successfully was indeed in LPAR > mode, even though it's firmware level was older! As of today, is it > really impossible for Linux to run in SMP mode on the p630 due to > firmware bugs (or any other reason)? No, it should work fine. We regularly test 2.5 kernels on SMP mode 630s for example. Anton ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From david at gibson.dropbear.id.au Tue Jun 10 17:07:35 2003 From: david at gibson.dropbear.id.au (David Gibson) Date: Tue, 10 Jun 2003 17:07:35 +1000 Subject: Update hugetlbfs patch Message-ID: <20030610070735.GC1507@zax> Below is an updated version of my hugetlbfs patch. It fixes a number of bugs (including the crash when attempting to allocate more pages than available) and includes a number of (fairly minor) cleanups. diff -urN /scratch/anton/export/arch/ppc64/Kconfig linux-gogogo/arch/ppc64/Kconfig --- /scratch/anton/export/arch/ppc64/Kconfig 2003-06-03 17:49:13.000000000 +1000 +++ linux-gogogo/arch/ppc64/Kconfig 2003-06-04 12:41:52.000000000 +1000 @@ -69,6 +69,17 @@ bool default y +config HUGETLB_PAGE + bool "Huge TLB Page Support" + help + This enables support for huge pages. User space applications + can make use of this support with the sys_alloc_hugepages and + sys_free_hugepages system calls. If your applications are + huge page aware and your processor supports this (only POWER4, + then say Y here. + + Otherwise, say N. 
+ config SMP bool "Symmetric multi-processing support" ---help--- diff -urN /scratch/anton/export/arch/ppc64/kernel/htab.c linux-gogogo/arch/ppc64/kernel/htab.c --- /scratch/anton/export/arch/ppc64/kernel/htab.c 2003-04-24 18:54:33.000000000 +1000 +++ linux-gogogo/arch/ppc64/kernel/htab.c 2003-06-03 15:44:09.000000000 +1000 @@ -195,7 +195,7 @@ if (!pgd_none(*pg)) { pm = pmd_offset(pg, ea); - if (!pmd_none(*pm)) { + if (pmd_present(*pm)) { pt = pte_offset_kernel(pm, ea); pte = *pt; if (!pte_present(pte)) @@ -431,8 +431,12 @@ if (user_region && (mm->cpu_vm_mask == (1 << smp_processor_id()))) local = 1; - ptep = find_linux_pte(pgdir, ea); - ret = __hash_page(ea, access, vsid, ptep, trap, local); + ret = hash_huge_page(mm, access, ea, vsid, local); + if (ret < 0) { + ptep = find_linux_pte(pgdir, ea); + ret = __hash_page(ea, access, vsid, ptep, trap, local); + } + spin_unlock(&mm->page_table_lock); return ret; diff -urN /scratch/anton/export/arch/ppc64/kernel/stab.c linux-gogogo/arch/ppc64/kernel/stab.c --- /scratch/anton/export/arch/ppc64/kernel/stab.c 2003-05-06 07:49:37.000000000 +1000 +++ linux-gogogo/arch/ppc64/kernel/stab.c 2003-06-10 11:19:01.000000000 +1000 @@ -204,6 +204,12 @@ vsid_data.data.kp = 1; if (large) vsid_data.data.l = 1; + /* FIXME: hack alert! we make user hugepages noexec to + * sidestep icache/dcache coherence issues for now. We should + * fix this properly. 
*/ + if (large && + (REGION_ID(esid << SID_SHIFT) == USER_REGION_ID)) + vsid_data.data.n = 1; if (kernel_segment) vsid_data.data.c = 1; @@ -220,15 +226,18 @@ } static inline void __ste_allocate(unsigned long esid, unsigned long vsid, - int kernel_segment) + int kernel_segment, mm_context_t context) { if (cpu_has_slb()) { + int large = 0; + #ifndef CONFIG_PPC_ISERIES if (REGION_ID(esid << SID_SHIFT) == KERNEL_REGION_ID) - make_slbe(esid, vsid, 1, kernel_segment); - else + large = 1; + else if (REGION_ID(esid << SID_SHIFT) == USER_REGION_ID) + large = in_hugepage_area(context, esid << SID_SHIFT); #endif - make_slbe(esid, vsid, 0, kernel_segment); + make_slbe(esid, vsid, large, kernel_segment); } else { unsigned char top_entry, stab_entry, *segments; @@ -254,6 +263,7 @@ { unsigned long vsid, esid; int kernel_segment = 0; + mm_context_t context; PMC_SW_PROCESSOR(stab_faults); @@ -265,16 +275,18 @@ if (REGION_ID(ea) >= KERNEL_REGION_ID) { kernel_segment = 1; vsid = get_kernel_vsid(ea); + context = REGION_ID(ea); } else { - struct mm_struct *mm = current->mm; - if (mm) - vsid = get_vsid(mm->context, ea); - else + if (! 
current->mm) return 1; + + context = current->mm->context; + + vsid = get_vsid(context, ea); } esid = GET_ESID(ea); - __ste_allocate(esid, vsid, kernel_segment); + __ste_allocate(esid, vsid, kernel_segment, context); if (!cpu_has_slb()) { /* Order update */ asm volatile("sync":::"memory"); @@ -301,7 +313,7 @@ for (esid = 0; esid < 16; esid++) { unsigned long ea = esid << SID_SHIFT; vsid = get_vsid(mm->context, ea); - __ste_allocate(esid, vsid, 0); + __ste_allocate(esid, vsid, 0, mm->context); } } else { unsigned long pc = KSTK_EIP(tsk); @@ -315,7 +327,7 @@ (REGION_ID(pc) >= KERNEL_REGION_ID)) return; vsid = get_vsid(mm->context, pc); - __ste_allocate(GET_ESID(pc), vsid, 0); + __ste_allocate(GET_ESID(pc), vsid, 0, mm->context); } if (stack && (pc_segment != stack_segment)) { @@ -323,7 +335,7 @@ (REGION_ID(stack) >= KERNEL_REGION_ID)) return; vsid = get_vsid(mm->context, stack); - __ste_allocate(GET_ESID(stack), vsid, 0); + __ste_allocate(GET_ESID(stack), vsid, 0, mm->context); } } diff -urN /scratch/anton/export/arch/ppc64/mm/Makefile linux-gogogo/arch/ppc64/mm/Makefile --- /scratch/anton/export/arch/ppc64/mm/Makefile 2003-02-13 00:02:23.000000000 +1100 +++ linux-gogogo/arch/ppc64/mm/Makefile 2003-06-03 15:44:09.000000000 +1000 @@ -6,3 +6,4 @@ obj-y := fault.o init.o extable.o imalloc.o obj-$(CONFIG_DISCONTIGMEM) += numa.o +obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o diff -urN /scratch/anton/export/arch/ppc64/mm/hugetlbpage.c linux-gogogo/arch/ppc64/mm/hugetlbpage.c --- /scratch/anton/export/arch/ppc64/mm/hugetlbpage.c Thu Jan 01 10:00:00 1970 +++ linux-gogogo/arch/ppc64/mm/hugetlbpage.c Tue Jun 10 16:22:24 2003 @@ -0,0 +1,730 @@ +/* + * PPC64 (POWER4) Huge TLB Page Support for Kernel. + * + * Copyright (C) 2003 David Gibson, IBM Corporation. 
+ * + * Based on the IA-32 version: + * Copyright (C) 2002, Rohit Seth + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +int htlbpage_max; + +/* This lock protects the two counters and list below */ +static spinlock_t htlbpage_lock = SPIN_LOCK_UNLOCKED; + +static int htlbpage_free; /* = 0 */ +static int htlbpage_total; /* = 0 */ +static LIST_HEAD(htlbpage_freelist); + +/* HugePTE layout: + * + * 31 30 ... 15 14 13 12 10 9 8 7 6 5 4 3 2 1 0 + * PFN>>12..... - - - - - - HASH_IX.... 2ND HASH RW - HG=1 + */ + +#define HUGEPTE_SHIFT 15 +#define _HUGEPAGE_PFN 0xffff8000 +#define _HUGEPAGE_BAD 0x00007f00 +#define _HUGEPAGE_HASHPTE 0x00000008 +#define _HUGEPAGE_SECONDARY 0x00000010 +#define _HUGEPAGE_GROUP_IX 0x000000e0 +#define _HUGEPAGE_HPTEFLAGS (_HUGEPAGE_HASHPTE | _HUGEPAGE_SECONDARY | \ + _HUGEPAGE_GROUP_IX) +#define _HUGEPAGE_RW 0x00000004 + +typedef struct {unsigned int val;} hugepte_t; +#define hugepte_val(hugepte) ((hugepte).val) +#define __hugepte(x) ((hugepte_t) { (x) } ) +#define hugepte_pfn(x) \ + ((unsigned long)(hugepte_val(x)>>HUGEPTE_SHIFT) << HUGETLB_PAGE_ORDER) +#define mk_hugepte(page,wr) __hugepte( \ + ((page_to_pfn(page)>>HUGETLB_PAGE_ORDER) << HUGEPTE_SHIFT ) \ + | (!!(wr) * _HUGEPAGE_RW) | _PMD_HUGEPAGE ) + +#define hugepte_bad(x) ( !(hugepte_val(x) & _PMD_HUGEPAGE) || \ + (hugepte_val(x) & _HUGEPAGE_BAD) ) +#define hugepte_page(x) pfn_to_page(hugepte_pfn(x)) +#define hugepte_none(x) (!(hugepte_val(x) & _HUGEPAGE_PFN)) + + +static void free_huge_page(struct page *page); +static void flush_hash_hugepage(mm_context_t context, unsigned long ea, + hugepte_t pte, int local); + +static inline unsigned int hugepte_update(hugepte_t *p, unsigned int clr, + unsigned int set) +{ + unsigned int old, tmp; + + __asm__ __volatile__( + "1: lwarx %0,0,%3 # pte_update\n\ + andc %1,%0,%4 \n\ + or %1,%1,%5 \n\ + stwcx. 
%1,0,%3 \n\ + bne- 1b" + : "=&r" (old), "=&r" (tmp), "=m" (*p) + : "r" (p), "r" (clr), "r" (set), "m" (*p) + : "cc" ); + return old; +} + +static inline void set_hugepte(hugepte_t *ptep, hugepte_t pte) +{ + hugepte_update(ptep, ~_HUGEPAGE_HPTEFLAGS, + hugepte_val(pte) & ~_HUGEPAGE_HPTEFLAGS); +} + +static struct page *alloc_hugetlb_page(void) +{ + int i; + struct page *page; + + spin_lock(&htlbpage_lock); + if (list_empty(&htlbpage_freelist)) { + spin_unlock(&htlbpage_lock); + return NULL; + } + + page = list_entry(htlbpage_freelist.next, struct page, list); + list_del(&page->list); + htlbpage_free--; + spin_unlock(&htlbpage_lock); + set_page_count(page, 1); + page->lru.prev = (void *)free_huge_page; + for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i) + clear_highpage(&page[i]); + return page; +} + +static hugepte_t *hugepte_alloc(struct mm_struct *mm, unsigned long addr) +{ + pgd_t *pgd; + pmd_t *pmd = NULL; + + BUG_ON(!in_hugepage_area(mm->context, addr)); + + pgd = pgd_offset(mm, addr); + pmd = pmd_alloc(mm, pgd, addr); + + /* We shouldn't find a (normal) PTE page pointer here */ + BUG_ON(!pmd_none(*pmd) && !pmd_hugepage(*pmd)); + + return (hugepte_t *)pmd; +} + +static hugepte_t *hugepte_offset(struct mm_struct *mm, unsigned long addr) +{ + pgd_t *pgd; + pmd_t *pmd = NULL; + + BUG_ON(!in_hugepage_area(mm->context, addr)); + + pgd = pgd_offset(mm, addr); + pmd = pmd_offset(pgd, addr); + + /* We shouldn't find a (normal) PTE page pointer here */ + BUG_ON(!pmd_none(*pmd) && !pmd_hugepage(*pmd)); + + return (hugepte_t *)pmd; +} + +static void setup_huge_pte(struct mm_struct *mm, struct page *page, + hugepte_t *ptep, int write_access) +{ + hugepte_t entry; + int i; + + mm->rss += (HPAGE_SIZE / PAGE_SIZE); + entry = mk_hugepte(page, write_access); + for (i = 0; i < HUGEPTE_BATCH_SIZE; i++) + set_hugepte(ptep+i, entry); +} + +static void teardown_huge_pte(hugepte_t *ptep) +{ + int i; + + for (i = 0; i < HUGEPTE_BATCH_SIZE; i++) + pmd_clear((pmd_t *)(ptep+i)); +} + +/* + * 
This function checks for proper alignment of input addr and len parameters. + */ +int is_aligned_hugepage_range(unsigned long addr, unsigned long len) +{ + if (len & ~HPAGE_MASK) + return -EINVAL; + if (addr & ~HPAGE_MASK) + return -EINVAL; + if (! is_hugepage_only_range(addr, len)) + return -EINVAL; + return 0; +} + +int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, + struct vm_area_struct *vma) +{ + hugepte_t *src_pte, *dst_pte, entry; + struct page *ptepage; + unsigned long addr = vma->vm_start; + unsigned long end = vma->vm_end; + + while (addr < end) { + BUG_ON(! in_hugepage_area(src->context, addr)); + BUG_ON(! in_hugepage_area(dst->context, addr)); + + dst_pte = hugepte_alloc(dst, addr); + if (!dst_pte) + return -ENOMEM; + + src_pte = hugepte_offset(src, addr); + entry = *src_pte; + + if ((addr % HPAGE_SIZE) == 0) { + /* This is the first hugepte in a batch */ + ptepage = hugepte_page(entry); + get_page(ptepage); + dst->rss += (HPAGE_SIZE / PAGE_SIZE); + } + set_hugepte(dst_pte, entry); + + + addr += PMD_SIZE; + } + return 0; +} + +int +follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, + struct page **pages, struct vm_area_struct **vmas, + unsigned long *position, int *length, int i) +{ + unsigned long vpfn, vaddr = *position; + int remainder = *length; + + WARN_ON(!is_vm_hugetlb_page(vma)); + + vpfn = vaddr/PAGE_SIZE; + while (vaddr < vma->vm_end && remainder) { + BUG_ON(!in_hugepage_area(mm->context, vaddr)); + + if (pages) { + hugepte_t *pte; + struct page *page; + + pte = hugepte_offset(mm, vaddr); + + /* hugetlb should be locked, and hence, prefaulted */ + WARN_ON(!pte || hugepte_none(*pte)); + + page = &hugepte_page(*pte)[vpfn % (HPAGE_SIZE/PAGE_SIZE)]; + + WARN_ON(!PageCompound(page)); + + get_page(page); + pages[i] = page; + } + + if (vmas) + vmas[i] = vma; + + vaddr += PAGE_SIZE; + ++vpfn; + --remainder; + ++i; + } + + *length = remainder; + *position = vaddr; + + return i; +} + +struct page * 
+follow_huge_addr(struct mm_struct *mm, + struct vm_area_struct *vma, unsigned long address, int write) +{ + return NULL; +} + +struct vm_area_struct *hugepage_vma(struct mm_struct *mm, unsigned long addr) +{ + return NULL; +} + +int pmd_huge(pmd_t pmd) +{ + return pmd_hugepage(pmd); +} + +struct page * +follow_huge_pmd(struct mm_struct *mm, unsigned long address, + pmd_t *pmd, int write) +{ + struct page *page; + + BUG_ON(! pmd_hugepage(*pmd)); + + page = hugepte_page(*(hugepte_t *)pmd); + if (page) { + page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT); + get_page(page); + } + return page; +} + +static void free_huge_page(struct page *page) +{ + BUG_ON(page_count(page)); + BUG_ON(page->mapping); + + INIT_LIST_HEAD(&page->list); + + spin_lock(&htlbpage_lock); + list_add(&page->list, &htlbpage_freelist); + htlbpage_free++; + spin_unlock(&htlbpage_lock); +} + +void huge_page_release(struct page *page) +{ + if (!put_page_testzero(page)) + return; + + free_huge_page(page); +} + +void unmap_hugepage_range(struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ + struct mm_struct *mm = vma->vm_mm; + unsigned long addr; + hugepte_t *ptep; + struct page *page; + int local = 0; + + WARN_ON(!is_vm_hugetlb_page(vma)); + BUG_ON((start % HPAGE_SIZE) != 0); + BUG_ON((end % HPAGE_SIZE) != 0); + + /* XXX are there races with checking cpu_vm_mask? 
- Anton */ + if (vma->vm_mm->cpu_vm_mask == (1 << smp_processor_id())) + local = 1; + + for (addr = start; addr < end; addr += HPAGE_SIZE) { + hugepte_t pte; + + BUG_ON(!in_hugepage_area(mm->context, addr)); + + ptep = hugepte_offset(mm, addr); + if (!ptep || hugepte_none(*ptep)) + continue; + + pte = *ptep; + page = hugepte_page(pte); + teardown_huge_pte(ptep); + + if (hugepte_val(pte) & _HUGEPAGE_HASHPTE) + flush_hash_hugepage(mm->context, addr, + pte, local); + + huge_page_release(page); + } + + mm->rss -= (end - start) >> PAGE_SHIFT; +} + +void zap_hugepage_range(struct vm_area_struct *vma, + unsigned long start, unsigned long length) +{ + struct mm_struct *mm = vma->vm_mm; + + spin_lock(&mm->page_table_lock); + unmap_hugepage_range(vma, start, start + length); + spin_unlock(&mm->page_table_lock); +} + +int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma) +{ + struct mm_struct *mm = current->mm; + unsigned long addr; + int ret = 0; + + WARN_ON(!is_vm_hugetlb_page(vma)); + BUG_ON((vma->vm_start % HPAGE_SIZE) != 0); + BUG_ON((vma->vm_end % HPAGE_SIZE) != 0); + + spin_lock(&mm->page_table_lock); + for (addr = vma->vm_start; addr < vma->vm_end; addr += HPAGE_SIZE) { + unsigned long idx; + hugepte_t *pte = hugepte_alloc(mm, addr); + struct page *page; + + BUG_ON(!in_hugepage_area(mm->context, addr)); + + if (!pte) { + ret = -ENOMEM; + goto out; + } + if (!hugepte_none(*pte)) + continue; + + idx = ((addr - vma->vm_start) >> HPAGE_SHIFT) + + (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT)); + page = find_get_page(mapping, idx); + if (!page) { + page = alloc_hugetlb_page(); + if (!page) { + ret = -ENOMEM; + goto out; + } + ret = add_to_page_cache(page, mapping, idx, GFP_ATOMIC); + unlock_page(page); + if (ret) { + free_huge_page(page); + goto out; + } + } + setup_huge_pte(mm, page, pte, vma->vm_flags & VM_WRITE); + } +out: + spin_unlock(&mm->page_table_lock); + return ret; +} + +unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned 
long addr, + unsigned long len, unsigned long pgoff, + unsigned long flags) +{ + struct vm_area_struct *vma; + + if (len & ~HPAGE_MASK) + return -EINVAL; + + if (! cpu_has_largepage()) + return -EINVAL; + + if (!in_hugepage_area(current->mm->context, addr) + || (addr & (HPAGE_SIZE - 1))) + addr = TASK_HPAGE_BASE; + + for (vma = find_vma(current->mm, addr); ; vma = vma->vm_next) { + /* At this point: (!vma || addr < vma->vm_end). */ + if (addr + len > TASK_HPAGE_END) + return -ENOMEM; + if (!vma || (addr + len) <= vma->vm_start) + return addr; + addr = ALIGN(vma->vm_end, HPAGE_SIZE); + + /* Because we're in an exclusively hugepage region, + * this alignment shouldn't have skipped over any + * other vmas */ + } +} + +static inline unsigned long computeHugeHptePP(unsigned int hugepte) +{ + unsigned long flags = 0x2; + + if (! (hugepte & _HUGEPAGE_RW)) + flags |= 0x1; + return flags; +} + +int hash_huge_page(struct mm_struct *mm, unsigned long access, + unsigned long ea, unsigned long vsid, int local) +{ + hugepte_t *ptep; + unsigned long va, vpn; + int is_write; + hugepte_t old_pte, new_pte; + unsigned long hpteflags, prpn; + long slot; + + /* Is this for us? */ + if (!in_hugepage_area(mm->context, ea)) + return -1; + + /* We have to find the first hugepte in the batch, since + * that's the one that will store the HPTE flags */ + ptep = hugepte_offset(mm, ea & ~(HPAGE_SIZE-1)); + + /* Search the Linux page table for a match with va */ + va = (vsid << 28) | (ea & 0x0fffffff); + vpn = va >> HPAGE_SHIFT; + + BUG_ON(hugepte_bad(*ptep)); + + /* + * If no pte found or not present, send the problem up to + * do_page_fault + */ + if (unlikely(!ptep || hugepte_none(*ptep))) + return 1; + + /* + * Check the user's access rights to the page. If access should be + * prevented then send the problem up to do_page_fault. 
+ */ + is_write = access & _PAGE_RW; + if (unlikely(is_write && !(hugepte_val(*ptep) & _HUGEPAGE_RW))) + return 1; + + /* + * At this point, we have a pte (old_pte) which can be used to build + * or update an HPTE. There are 2 cases: + * + * 1. There is a valid (present) pte with no associated HPTE (this is + * the most common case) + * 2. There is a valid (present) pte with an associated HPTE. The + * current values of the pp bits in the HPTE prevent access + * because we are doing software DIRTY bit management and the + * page is currently not DIRTY. + */ + + old_pte = *ptep; + new_pte = old_pte; + + hpteflags = computeHugeHptePP(hugepte_val(new_pte)); + + /* Check if pte already has an hpte (case 2) */ + if (unlikely(hugepte_val(old_pte) & _HUGEPAGE_HASHPTE)) { + /* There MIGHT be an HPTE for this pte */ + unsigned long hash, slot; + + hash = hpt_hash(vpn, 1); + if (hugepte_val(old_pte) & _HUGEPAGE_SECONDARY) + hash = ~hash; + slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP; + slot += (hugepte_val(old_pte) & _HUGEPAGE_GROUP_IX) >> 5; + + if (ppc_md.hpte_updatepp(slot, hpteflags, va, 1, local) == -1) + hugepte_val(old_pte) &= ~_HUGEPAGE_HPTEFLAGS; + } + + if (likely(!(hugepte_val(old_pte) & _HUGEPAGE_HASHPTE))) { + unsigned long hash = hpt_hash(vpn, 1); + unsigned long hpte_group; + + prpn = hugepte_pfn(old_pte); + +repeat: + hpte_group = ((hash & htab_data.htab_hash_mask) * + HPTES_PER_GROUP) & ~0x7UL; + + /* Update the linux pte with the HPTE slot */ + hugepte_val(new_pte) &= ~_HUGEPAGE_HPTEFLAGS; + hugepte_val(new_pte) |= _HUGEPAGE_HASHPTE; + + slot = ppc_md.hpte_insert(hpte_group, va, prpn, 0, + hpteflags, 0, 1); + + /* Primary is full, try the secondary */ + if (unlikely(slot == -1)) { + hugepte_val(new_pte) |= _HUGEPAGE_SECONDARY; + hpte_group = ((~hash & htab_data.htab_hash_mask) * + HPTES_PER_GROUP) & ~0x7UL; + slot = ppc_md.hpte_insert(hpte_group, va, prpn, + 1, hpteflags, 0, 1); + if (slot == -1) { + if (mftb() & 0x1) + hpte_group = ((hash & 
htab_data.htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL; + + ppc_md.hpte_remove(hpte_group); + goto repeat; + } + } + + if (unlikely(slot == -2)) + panic("hash_page: pte_insert failed\n"); + + hugepte_val(new_pte) |= (slot<<5) & _HUGEPAGE_GROUP_IX; + + /* + * No need to use ldarx/stdcx here because all who + * might be updating the pte will hold the + * page_table_lock or the hash_table_lock + * (we hold both) + */ + *ptep = new_pte; + } + + return 0; +} + +static void flush_hash_hugepage(mm_context_t context, unsigned long ea, + hugepte_t pte, int local) +{ + unsigned long vsid, vpn, va, hash, secondary, slot; + + BUG_ON(hugepte_bad(pte)); + BUG_ON(!in_hugepage_area(context, ea)); + + vsid = get_vsid(context, ea); + + va = (vsid << 28) | (ea & 0x0fffffff); + vpn = va >> LARGE_PAGE_SHIFT; + hash = hpt_hash(vpn, 1); + secondary = !!(hugepte_val(pte) & _HUGEPAGE_SECONDARY); + if (secondary) + hash = ~hash; + slot = (hash & htab_data.htab_hash_mask) * HPTES_PER_GROUP; + slot += (hugepte_val(pte) & _HUGEPAGE_GROUP_IX) >> 5; + + ppc_md.hpte_invalidate(slot, va, 1, local); +} + +static void split_and_free_hugepage(struct page *page) +{ + int j; + struct page *map; + + map = page; + htlbpage_total--; + for (j = 0; j < (HPAGE_SIZE / PAGE_SIZE); j++) { + map->flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced | + 1 << PG_dirty | 1 << PG_active | 1 << PG_reserved | + 1 << PG_private | 1<< PG_writeback); + set_page_count(map, 0); + map++; + } + set_page_count(page, 1); + __free_pages(page, HUGETLB_PAGE_ORDER); +} + +int set_hugetlb_mem_size(int count) +{ + int lcount; + struct page *page; + + if (! cpu_has_largepage()) + return 0; + + if (count < 0) + lcount = count; + else + lcount = count - htlbpage_total; + + if (lcount == 0) + return htlbpage_total; + if (lcount > 0) { /* Increase the mem size. 
*/ + while (lcount--) { + page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER); + if (page == NULL) + break; + spin_lock(&htlbpage_lock); + list_add(&page->list, &htlbpage_freelist); + htlbpage_free++; + htlbpage_total++; + spin_unlock(&htlbpage_lock); + } + return htlbpage_total; + } + /* Shrink the memory size. */ + while (lcount++) { + page = alloc_hugetlb_page(); + if (page == NULL) + break; + spin_lock(&htlbpage_lock); + split_and_free_hugepage(page); + spin_unlock(&htlbpage_lock); + } + return htlbpage_total; +} + +int hugetlb_sysctl_handler(ctl_table *table, int write, + struct file *file, void *buffer, size_t *length) +{ + proc_dointvec(table, write, file, buffer, length); + htlbpage_max = set_hugetlb_mem_size(htlbpage_max); + return 0; +} + +static int __init hugetlb_setup(char *s) +{ + if (sscanf(s, "%d", &htlbpage_max) <= 0) + htlbpage_max = 0; + return 1; +} +__setup("hugepages=", hugetlb_setup); + +static int __init hugetlb_init(void) +{ + int i; + struct page *page; + + if (cpu_has_largepage()) { + for (i = 0; i < htlbpage_max; ++i) { + page = alloc_pages(__GFP_HIGHMEM, HUGETLB_PAGE_ORDER); + if (!page) + break; + spin_lock(&htlbpage_lock); + list_add(&page->list, &htlbpage_freelist); + spin_unlock(&htlbpage_lock); + } + htlbpage_max = htlbpage_free = htlbpage_total = i; + printk("Total HugeTLB memory allocated, %d\n", htlbpage_free); + } else { + htlbpage_max = 0; + printk("CPU does not support HugeTLB\n"); + } + + return 0; +} +module_init(hugetlb_init); + +int hugetlb_report_meminfo(char *buf) +{ + return sprintf(buf, + "HugePages_Total: %5d\n" + "HugePages_Free: %5d\n" + "Hugepagesize: %5lu kB\n", + htlbpage_total, + htlbpage_free, + HPAGE_SIZE/1024); +} + +/* This is advisory only, so we can get away with accesing + * htlbpage_free without taking the lock. */ +int is_hugepage_mem_enough(size_t size) +{ + return (size + ~HPAGE_MASK)/HPAGE_SIZE <= htlbpage_free; +} + +/* + * We cannot handle pagefaults against hugetlb pages at all. 
They cause + * handle_mm_fault() to try to instantiate regular-sized pages in the + * hugegpage VMA. do_page_fault() is supposed to trap this, so BUG is we get + * this far. + */ +static struct page *hugetlb_nopage(struct vm_area_struct *vma, + unsigned long address, int unused) +{ + BUG(); + return NULL; +} + +struct vm_operations_struct hugetlb_vm_ops = { + .nopage = hugetlb_nopage, +}; diff -urN /scratch/anton/export/arch/ppc64/mm/init.c linux-gogogo/arch/ppc64/mm/init.c --- /scratch/anton/export/arch/ppc64/mm/init.c 2003-06-04 11:16:24.000000000 +1000 +++ linux-gogogo/arch/ppc64/mm/init.c 2003-06-06 12:51:54.000000000 +1000 @@ -293,7 +293,7 @@ if (!pgd_none(*pgd)) { pmd = pmd_offset(pgd, vmaddr); - if (!pmd_none(*pmd)) { + if (pmd_present(*pmd)) { ptep = pte_offset_kernel(pmd, vmaddr); /* Check if HPTE might exist and flush it if so */ pte = __pte(pte_update(ptep, _PAGE_HPTEFLAGS, 0)); @@ -301,6 +301,7 @@ flush_hash_page(context, vmaddr, pte, local); } } + WARN_ON(pmd_hugepage(*pmd)); } } @@ -349,7 +350,7 @@ pmd_end = (start + PMD_SIZE) & PMD_MASK; if (pmd_end > end) pmd_end = end; - if (!pmd_none(*pmd)) { + if (pmd_present(*pmd)) { ptep = pte_offset_kernel(pmd, start); do { if (pte_val(*ptep) & _PAGE_HASHPTE) { @@ -368,6 +369,7 @@ ++ptep; } while (start < pmd_end); } else { + WARN_ON(pmd_hugepage(*pmd)); start = pmd_end; } ++pmd; diff -urN /scratch/anton/export/include/asm-ppc64/mmu_context.h linux-gogogo/include/asm-ppc64/mmu_context.h --- /scratch/anton/export/include/asm-ppc64/mmu_context.h 2003-02-13 00:02:43.000000000 +1100 +++ linux-gogogo/include/asm-ppc64/mmu_context.h 2003-06-03 15:44:09.000000000 +1000 @@ -36,6 +36,12 @@ #define LAST_USER_CONTEXT 0x8000 /* Same as PID_MAX for now... */ #define NUM_USER_CONTEXT (LAST_USER_CONTEXT-FIRST_USER_CONTEXT) +#ifdef CONFIG_HUGETLB_PAGE +#define CONTEXT_32BIT (1UL<<63) +#else +#define CONTEXT_32BIT 0 +#endif + /* Choose whether we want to implement our context * number allocator as a LIFO or FIFO queue. 
*/ @@ -90,6 +96,8 @@ head = mmu_context_queue.head; mm->context = mmu_context_queue.elements[head]; + if (tsk->thread_info->flags & _TIF_32BIT) + mm->context |= CONTEXT_32BIT; head = (head < LAST_USER_CONTEXT-1) ? head+1 : 0; mmu_context_queue.head = head; @@ -189,6 +197,8 @@ { unsigned long ordinal, vsid; + context &= ~CONTEXT_32BIT; + ordinal = (((ea >> 28) & 0x1fffff) * LAST_USER_CONTEXT) | context; vsid = (ordinal * VSID_RANDOMIZER) & VSID_MASK; diff -urN /scratch/anton/export/include/asm-ppc64/page.h linux-gogogo/include/asm-ppc64/page.h --- /scratch/anton/export/include/asm-ppc64/page.h 2003-04-24 18:54:37.000000000 +1000 +++ linux-gogogo/include/asm-ppc64/page.h 2003-06-10 14:40:52.000000000 +1000 @@ -22,6 +22,41 @@ #define PAGE_MASK (~(PAGE_SIZE-1)) #define PAGE_OFFSET_MASK (PAGE_SIZE-1) +#ifdef CONFIG_HUGETLB_PAGE + +#define HPAGE_SHIFT 24 +#define HPAGE_SIZE ((1UL) << HPAGE_SHIFT) +#define HPAGE_MASK (~(HPAGE_SIZE - 1)) +#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) + +/* For 64-bit processes the hugepage range is 1T-1.5T */ +#define TASK_HPAGE_BASE_64 (0x0000010000000000UL) +#define TASK_HPAGE_END_64 (0x0000018000000000UL) +/* For 32-bit processes the hugepage range is 2-3G */ +#define TASK_HPAGE_BASE_32 (0x80000000UL) +#define TASK_HPAGE_END_32 (0xc0000000UL) + +#define TASK_HPAGE_BASE (test_thread_flag(TIF_32BIT) ? \ + TASK_HPAGE_BASE_32 : TASK_HPAGE_BASE_64) +#define TASK_HPAGE_END (test_thread_flag(TIF_32BIT) ? \ + TASK_HPAGE_END_32 : TASK_HPAGE_END_64) + +#define ARCH_HAS_HUGEPAGE_ONLY_RANGE +#define is_hugepage_only_range(addr, len) \ + ((addr > (TASK_HPAGE_BASE-len)) && (addr < TASK_HPAGE_END)) +#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA + +#define in_hugepage_area(context, addr) (cpu_has_largepage() && \ + (((context) & CONTEXT_32BIT) ? 
\ + (((addr) >= TASK_HPAGE_BASE_32) && ((addr) < TASK_HPAGE_END_32)) : \ + (((addr) >= TASK_HPAGE_BASE_64) && ((addr) < TASK_HPAGE_END_64)))) + +#else /* !CONFIG_HUGETLB_PAGE */ + +#define in_hugepage_area(mm, addr) 0 + +#endif /* !CONFIG_HUGETLB_PAGE */ + #define SID_SHIFT 28 #define SID_MASK 0xfffffffff #define GET_ESID(x) (((x) >> SID_SHIFT) & SID_MASK) diff -urN /scratch/anton/export/include/asm-ppc64/pgtable.h linux-gogogo/include/asm-ppc64/pgtable.h --- /scratch/anton/export/include/asm-ppc64/pgtable.h 2003-05-30 01:22:36.000000000 +1000 +++ linux-gogogo/include/asm-ppc64/pgtable.h 2003-06-06 12:52:23.000000000 +1000 @@ -149,6 +149,22 @@ /* shift to put page number into pte */ #define PTE_SHIFT (16) +/* We allow 2^41 bytes of real memory, so we need 29 bits in the PMD + * to give the PTE page number. The bottom two bits are for flags. */ +#define PMD_TO_PTEPAGE_SHIFT (2) +#ifdef CONFIG_HUGETLB_PAGE +#define _PMD_HUGEPAGE 0x00000001U +#define HUGEPTE_BATCH_SIZE (1<<(HPAGE_SHIFT-PMD_SHIFT)) + +int hash_huge_page(struct mm_struct *mm, unsigned long access, + unsigned long ea, unsigned long vsid, int local); +#else + +#define hash_huge_page(mm,a,ea,vsid,local) -1 +#define _PMD_HUGEPAGE 0 + +#endif + #ifndef __ASSEMBLY__ /* @@ -178,12 +194,16 @@ #define pte_pfn(x) ((unsigned long)((pte_val(x) >> PTE_SHIFT))) #define pte_page(x) pfn_to_page(pte_pfn(x)) -#define pmd_set(pmdp, ptep) (pmd_val(*(pmdp)) = (__ba_to_bpn(ptep))) +#define pmd_set(pmdp, ptep) \ + (pmd_val(*(pmdp)) = (__ba_to_bpn(ptep) << PMD_TO_PTEPAGE_SHIFT)) #define pmd_none(pmd) (!pmd_val(pmd)) -#define pmd_bad(pmd) ((pmd_val(pmd)) == 0) -#define pmd_present(pmd) ((pmd_val(pmd)) != 0) +#define pmd_hugepage(pmd) (!!(pmd_val(pmd) & _PMD_HUGEPAGE)) +#define pmd_bad(pmd) (((pmd_val(pmd)) == 0) || pmd_hugepage(pmd)) +#define pmd_present(pmd) ((!pmd_hugepage(pmd)) \ + && (pmd_val(pmd) & ~_PMD_HUGEPAGE) != 0) #define pmd_clear(pmdp) (pmd_val(*(pmdp)) = 0) -#define pmd_page_kernel(pmd) (__bpn_to_ba(pmd_val(pmd))) 
+#define pmd_page_kernel(pmd) \ + (__bpn_to_ba(pmd_val(pmd) >> PMD_TO_PTEPAGE_SHIFT)) #define pmd_page(pmd) virt_to_page(pmd_page_kernel(pmd)) #define pgd_set(pgdp, pmdp) (pgd_val(*(pgdp)) = (__ba_to_bpn(pmdp))) #define pgd_none(pgd) (!pgd_val(pgd)) -- David Gibson | For every complex problem there is a david at gibson.dropbear.id.au | solution which is simple, neat and | wrong. http://www.ozlabs.org/people/dgibson ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Fri Jun 13 03:41:19 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Thu, 12 Jun 2003 12:41:19 -0500 Subject: PPC64 Compiler bug !! Message-ID: <20030612124119.A6938@forte.austin.ibm.com> My apologies in advance for the hallucinatory message to follow. I have several PPC64 compilers that seem to have a bug in them. Source code:

    void spin_lock(int *lock);

    long linas_lock(int dev)
    {
        long flags = 0;
        spin_lock(&dev);
        return flags;
    }

Compiler flags:

    /opt/cross/bin/powerpc64-linux-gcc -c -Wa,-al linas.i

Assembly Listing:

GAS LISTING /tmp/ccOqxB1c.s                     page 1

   1                            .file   "linas.i"
   2                            .section ".text"
   3                            .align 2
   4                            .globl linas_lock
   5                            .section ".opd","aw"
   6                            .align 3
   7                    linas_lock:
   8 0000 00000000              .quad .linas_lock,.TOC.@tocbase,0
   8      00000000
   8      00000000
   8      00000000
   8      00000000
   9                            .previous
  10                            .size linas_lock,24
  11                            .type .linas_lock,@function
  12                            .globl .linas_lock
  13                    .linas_lock:
  14 0000 7C0802A6              mflr 0
  15 0004 FBE1FFF8              std 31,-8(1)
  16 0008 F8010010              std 0,16(1)
  17 000c F821FF71              stdu 1,-144(1)
  18 0010 7C3F0B78              mr 31,1
  19 0014 7C601B78              mr 0,3
  20 0018 901F00C0              stw 0,192(31)
  21 001c 38000000              li 0,0
  22 0020 F81F0070              std 0,112(31)
  23 0024 387F00C0              addi 3,31,192
  24 0028 48000001              bl .spin_lock
  25 002c 60000000              nop
  26 0030 C81F0070              lfd 0,112(31)
  27 0034 D81F0078              stfd 0,120(31)
  28 0038 E81F0078              ld 0,120(31)
  29 003c 7C030378              mr 3,0
  30 0040 E8210000              ld 1,0(1)
  31 0044 E8010010              ld 0,16(1)
  32 0048 7C0803A6              mtlr 0
  33 004c EBE1FFF8              ld 31,-8(1)
  34 0050 4E800020              blr
  35                    .LTlinas_lock:
  36 0054 00000000              .long 0
  37 0058 00000001              .byte 0,0,0,1,128,1,0,1
  37      80010001
  38                            .size .linas_lock,.-.linas_lock
  39                            .ident "GCC: (GNU) 3.2"

See line 26, 27 above: lfd, stfd !!!?? The first 6-bits of 0xc8, 0xd8 are primary opcodes 50 and 54, load and store floating point double. Surely when I wake up, it will be clear that this was a dream ?? --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From sjmunroe at us.ibm.com Fri Jun 13 04:39:59 2003 From: sjmunroe at us.ibm.com (Steve Munroe) Date: Thu, 12 Jun 2003 13:39:59 -0500 Subject: PPC64 Compiler bug !! Message-ID: linas writes: > See line 26, 27 above: lfd, stfd !!!?? The first 6-bits of 0xc8, 0xd8 are > primary opcodes 50 and 54, load and store floating point double. > > Surely when I wake up, it will be clear that this was a dream ?? No, this is not a dream but a "feature" of gcc. Unless you explicitly tell it not to use hardware floating point (-msoft-float), gcc may use FPRs as extra volatile registers for 8-byte moves/copies. This seems to be left over from ppc32, where it may have been a good idea (lfd/stfd replaces lwz/lwz/stw/stw). But on PPC64 this is dumb unless you are actually doing floating point.
Send your cards and letters to our friend Alan Modra on this topic.

From olh at suse.de Fri Jun 13 04:44:29 2003
From: olh at suse.de (Olaf Hering)
Date: Thu, 12 Jun 2003 20:44:29 +0200
Subject: PPC64 Compiler bug !!
In-Reply-To:
References:
Message-ID: <20030612184429.GA6913@suse.de>

 On Thu, Jun 12, Steve Munroe wrote:

> Send your cards and letters to our friend Alan Modra on this topic.

You can do that, yes. But better use -O2 for your projects.
3.2.2 has it fixed btw.

Gruss Olaf

-- 
USB is for mice, FireWire is for men!

From linas at austin.ibm.com Fri Jun 13 04:55:28 2003
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Thu, 12 Jun 2003 13:55:28 -0500
Subject: PPC64 Compiler bug !!
In-Reply-To: ; from sjmunroe@us.ibm.com on Thu, Jun 12, 2003 at 01:39:59PM -0500
References:
Message-ID: <20030612135528.E21464@forte.austin.ibm.com>

Hi Steve,

On Thu, Jun 12, 2003 at 01:39:59PM -0500, Steve Munroe wrote:
> linas writes:
>
> > See line 26, 27 above: lfd, stfd !!!?? The first 6-bits of 0xc8, 0xd8 are
> > primary opcodes 50 and 54, load and store floating point double.
> >
> > Surely when I wake up, it will be clear that this was a dream ??
>
> No, this is not a dream but a "feature" of gcc. Unless you explicitly tell
> it not to use hardware floating point (-msoft-float), gcc may use FPRs as
> extra volatile registers for 8-byte moves/copies. This seems to be left
> over from ppc32, where it may have been a good idea (lfd/stfd replaces
> lwz/lwz/stw/stw). But on PPC64 this is dumb unless you are actually doing
> floating point.
>
> Send your cards and letters to our friend Alan Modra on this topic.

OK, that's a sane answer. I first saw this when compiling a custom
kernel module; I see that the standard linux kernel uses -msoft-float
in its standard Makefile, and I will be tracking down this custom code
shortly.
--linas

From linas at austin.ibm.com Fri Jun 13 05:10:59 2003
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Thu, 12 Jun 2003 14:10:59 -0500
Subject: PPC64 Compiler bug !!
In-Reply-To: <20030612184429.GA6913@suse.de>; from olh@suse.de on Thu, Jun 12, 2003 at 08:44:29PM +0200
References: <20030612184429.GA6913@suse.de>
Message-ID: <20030612141058.A7028@forte.austin.ibm.com>

On Thu, Jun 12, 2003 at 08:44:29PM +0200, Olaf Hering wrote:
> On Thu, Jun 12, Steve Munroe wrote:
>
> > Send your cards and letters to our friend Alan Modra on this topic.
>
> You can do that, yes. But better use -O2 for your projects.
> 3.2.2 has it fixed btw.

My more complex test case, when compiled with gcc-3.2 and -O2, still
generates a fair number of lfd/stfd instructions, whereas -msoft-float
really makes them all go away. In fact, almost every file in the kernel,
when compiled with -O2 but without the -msoft-float flag, is going to have
stfd/lfd in it.

--linas

From peter at bergner.org Fri Jun 13 06:11:39 2003
From: peter at bergner.org (Peter Bergner)
Date: Thu, 12 Jun 2003 15:11:39 -0500
Subject: PPC64 Compiler bug !!
In-Reply-To: <20030612135528.E21464@forte.austin.ibm.com>
References: <20030612135528.E21464@forte.austin.ibm.com>
Message-ID: <3EE8DE7B.1030400@bergner.org>

linas at austin.ibm.com wrote:
> OK, that's a sane answer. I first saw this when compiling a custom
> kernel module; I see that the standard linux kernel uses -msoft-float
> in its standard Makefile, and I will be tracking down this custom code
> shortly.

Using -msoft-float when compiling the kernel and/or kernel modules is a
_requirement_! We've hit too many user data integrity errors due to
third party modules not being compiled with -msoft-float, which is why
I changed the kernel src to force a panic if we take an FP Unavailable
exception within the kernel.
Peter

From amodra at bigpond.net.au Fri Jun 13 07:47:30 2003
From: amodra at bigpond.net.au (Alan Modra)
Date: Fri, 13 Jun 2003 07:17:30 +0930
Subject: PPC64 Compiler bug !!
In-Reply-To:
References:
Message-ID: <20030612214730.GH23826@bubble.sa.bigpond.net.au>

Various people have tried to implement patches that prevent floating
point regs being used for moves. For examples, see

http://gcc.gnu.org/ml/gcc/2003-06/msg00902.html
http://gcc.gnu.org/ml/gcc/2002-10/msg00707.html

Yes, I could also try to fix gcc to not use float regs for integer
moves, but my attempt would also likely be blocked by the current
rs6000 maintainers. The trouble is that to prevent this problem
re-occurring in kernel code, you'd need to have the options on
*by default*, and this will have some impact on user code that could use
extra registers.

-- 
Alan Modra
IBM OzLabs - Linux Technology Centre

From linas at austin.ibm.com Fri Jun 13 08:03:07 2003
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Thu, 12 Jun 2003 17:03:07 -0500
Subject: PPC64 Compiler bug !!
In-Reply-To: <3EE8DE7B.1030400@bergner.org>; from peter@bergner.org on Thu, Jun 12, 2003 at 03:11:39PM -0500
References: <20030612135528.E21464@forte.austin.ibm.com> <3EE8DE7B.1030400@bergner.org>
Message-ID: <20030612170307.A41510@forte.austin.ibm.com>

On Thu, Jun 12, 2003 at 03:11:39PM -0500, Peter Bergner wrote:
> linas at austin.ibm.com wrote:
> > OK, that's a sane answer. I first saw this when compiling a custom
> > kernel module; I see that the standard linux kernel uses -msoft-float
> > in its standard Makefile, and I will be tracking down this custom code
> > shortly.
>
> Using -msoft-float when compiling the kernel and/or kernel modules is a
> _requirement_!
> We've hit too many user data integrity errors due to
> third party modules not being compiled with -msoft-float, which is why
> I changed the kernel src to force a panic if we take an FP Unavailable
> exception within the kernel.

That's fine; that is in fact how we tripped over this thing. I have no
complaints, it just came as an unexpected surprise. I've mucked with gcc
internals and .md files in the past, and even with that background, it
never occurred to me that this was a 'feature' and not a 'bug'. It seems
that not many people know about this. Most people probably never trip
over this, since they pull in the standard kernel Makefile; I was dealing
with a non-standard setup.

--linas

From linas at austin.ibm.com Fri Jun 13 08:04:07 2003
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Thu, 12 Jun 2003 17:04:07 -0500
Subject: PPC64 Compiler bug !!
In-Reply-To: <200306122031.QAA30664@makai.watson.ibm.com>; from dje@watson.ibm.com on Thu, Jun 12, 2003 at 04:31:54PM -0400
References: <20030612124119.A6938@forte.austin.ibm.com> <200306122031.QAA30664@makai.watson.ibm.com>
Message-ID: <20030612170407.B41510@forte.austin.ibm.com>

On Thu, Jun 12, 2003 at 04:31:54PM -0400, David Edelsohn wrote:
> 	This is not a bug and not producing incorrect code, as others have
> explained. If you want me to care about email messages from you, do not
> abuse the term "bug" and do not send email with multiple exclamation
> points in the subject line.
>
> david

From linas at austin.ibm.com Fri Jun 13 08:23:21 2003
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Thu, 12 Jun 2003 17:23:21 -0500
Subject: PPC64 Compiler bug !!
In-Reply-To: <200306122031.QAA30664@makai.watson.ibm.com>; from dje@watson.ibm.com on Thu, Jun 12, 2003 at 04:31:54PM -0400
References: <20030612124119.A6938@forte.austin.ibm.com> <200306122031.QAA30664@makai.watson.ibm.com>
Message-ID: <20030612172321.C41510@forte.austin.ibm.com>

Hi David,

On Thu, Jun 12, 2003 at 04:31:54PM -0400, David Edelsohn wrote:
> 	This is not a bug and not producing incorrect code, as others have
> explained.

The most reasonable statement I heard was that on a 64-bit machine,
there wasn't much point in using the FP regs to copy 64-bit data.
I don't know enough about the ppc64 implementation internals to know
whether doing this would cause or avoid any kind of pipeline or
instruction-issue stalls.

As to its being a bug, I've messed around with gcc internals before,
as well as with other compilers, and it sure came as a surprise to me.
I was helping out some Linux kernel developers here who were equally
clueless. Not everybody on the planet can know everything. Stop putting
me in a defensive posture.

> If you want me to care about email messages from you, do not
> abuse the term "bug" and do not send email with multiple exclamation
> points in the subject line.

??!! That's fine!! I'll punctuate however I want to!! ;->
Feel free to add me to your 'kill file' !!

-- linas

From anton at samba.org Fri Jun 13 08:25:56 2003
From: anton at samba.org (Anton Blanchard)
Date: Fri, 13 Jun 2003 08:25:56 +1000
Subject: PPC64 Compiler bug !!
In-Reply-To: <20030612214730.GH23826@bubble.sa.bigpond.net.au>
References: <20030612214730.GH23826@bubble.sa.bigpond.net.au>
Message-ID: <20030612222556.GL1195@krispykreme>

> Yes, I could also try to fix gcc to not use float regs for integer
> moves, but my attempt would also likely be blocked by the current
> rs6000 maintainers.
> The trouble is that to prevent this problem
> re-occurring in kernel code, you'd need to have the options on
> *by default*, and this will have some impact on user code that could use
> extra registers.

With lazy FP save/restore in Linux we have the extra cost of taking an
exception the first time (after a context switch) we use an FP temporary.
I'm guessing AIX doesn't do this.

Anton

From amodra at bigpond.net.au Fri Jun 13 08:53:40 2003
From: amodra at bigpond.net.au (Alan Modra)
Date: Fri, 13 Jun 2003 08:23:40 +0930
Subject: PPC64 Compiler bug !!
In-Reply-To: <20030612222556.GL1195@krispykreme>
References: <20030612214730.GH23826@bubble.sa.bigpond.net.au> <20030612222556.GL1195@krispykreme>
Message-ID: <20030612225340.GI23826@bubble.sa.bigpond.net.au>

On Fri, Jun 13, 2003 at 08:25:56AM +1000, Anton Blanchard wrote:
> With lazy FP save/restore in Linux we have the extra cost of taking an
> exception the first time (after a context switch) we use an FP temporary.
> I'm guessing AIX doesn't do this.

We could improve this by changing the register allocation order, so
that code doesn't tend to use fp regs for moves. Something like the
following (untested!) patch.
Index: gcc/config/rs6000/rs6000.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000.h,v
retrieving revision 1.278
diff -u -p -r1.278 rs6000.h
--- gcc/config/rs6000/rs6000.h	4 Jun 2003 17:50:43 -0000	1.278
+++ gcc/config/rs6000/rs6000.h	12 Jun 2003 22:47:16 -0000
@@ -881,17 +881,17 @@ extern int rs6000_alignment_flags;
 #endif
 
 #define REG_ALLOC_ORDER						\
-  {32,								\
-   45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34,		\
-   33,								\
-   63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51,		\
-   50, 49, 48, 47, 46,						\
-   75, 74, 69, 68, 72, 71, 70,					\
+  {75, 74, 69, 68, 72, 71, 70,					\
    0, MAYBE_R2_AVAILABLE					\
    9, 11, 10, 8, 7, 6, 5, 4,					\
    3,								\
    31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19,		\
    18, 17, 16, 15, 14, 13, 12,					\
+   32,								\
+   45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34,		\
+   33,								\
+   63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51,		\
+   50, 49, 48, 47, 46,						\
    64, 66, 65,							\
    73, 1, MAYBE_R2_FIXED 67, 76,				\
    /* AltiVec registers.  */					\

-- 
Alan Modra
IBM OzLabs - Linux Technology Centre

From linas at austin.ibm.com Fri Jun 13 09:30:31 2003
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Thu, 12 Jun 2003 18:30:31 -0500
Subject: PPC64 Compiler bug !!
In-Reply-To: <20030612214730.GH23826@bubble.sa.bigpond.net.au>; from amodra@bigpond.net.au on Fri, Jun 13, 2003 at 07:17:30AM +0930
References: <20030612214730.GH23826@bubble.sa.bigpond.net.au>
Message-ID: <20030612183030.D41510@forte.austin.ibm.com>

Hi Alan,

On Fri, Jun 13, 2003 at 07:17:30AM +0930, Alan Modra wrote:
> Various people have tried to implement patches that prevent floating
> point regs being used for moves.
> For examples, see
>
> http://gcc.gnu.org/ml/gcc/2003-06/msg00902.html
> http://gcc.gnu.org/ml/gcc/2002-10/msg00707.html
>
> Yes, I could also try to fix gcc to not use float regs for integer
> moves, but my attempt would also likely be blocked by the current
> rs6000 maintainers. The trouble is that to prevent this problem
> re-occurring in kernel code, you'd need to have the options on
> *by default*, and this will have some impact on user code that could use
> extra registers.
>
> -- 
> Alan Modra
> IBM OzLabs - Linux Technology Centre

Yikes! I had no idea this was a hot issue. Let me think aloud here
for a little bit, and see who agrees with what.

1) The use of FPRs for DImode moves is an interesting optimization
   if either:
   a) one has run out of regular DImode GPRs. But this was not the case
      for the simple test case I sent in: I was compiling for a 64-bit
      target, and there were plenty of unused GPRs.
   b) there is some neato CPU-implementation detail that makes instruction
      issue or the instruction pipeline go faster by using FP regs for
      copies. I have no idea; I'd be surprised.

2) Last I looked at gcc guts, there was a way of indicating a preference
   for which machine description pattern would get used. I don't know if
   that mechanism is flexible enough. Naively, I would think that the
   right thing to do would be to implement a DImode move that uses GPRs
   until the GPRs are exhausted, and only then starts using FPRs. I'll bet
   that this would make 99% of all FPR moves disappear.

3) For Linux kernel hackers, there seems to be an acceptable workaround
   with the -msoft-float flag. I'm willing to accept this flag. Note,
   however, that this optimization seems to have a real, measurable
   economic effect: plenty of time and salary $$ has been spent on this
   issue. In my area alone, the original report of the kernel crash has
   floated up and down the management chain, and has impacted the
   critical timeline for new hardware development.
   Given the history of the mailing list postings, it seems I'm not
   the only one who has been caught off guard by this.

4) At other times/other eras/other compilers, this sure smells like an
   -O3 optimization, not an -O2 optimization. I've got nothing against
   optimizations, but the FPR move is generated even when optimization
   is turned off. It's a surprise. Surprises cause chaos and economic loss.
   If this insn were generated only when I said -O3, I would have said to
   myself, 'wow, what a wild and freaky and interesting optimization, the
   guy/gal who came up with this was a genius!' Instead, it's hard to keep
   myself from thinking 'wow, wild and freaky! Whoever did this was a moron!
   I wasted a day cleaning up after this turd!'

5) There are some user-land issues having to do with Linux context
   switching when an app uses FPRs. With the gcc-3.2.0 ppc64 compiler,
   I get the impression that *every* ppc64 app is an FP app. I don't know
   if that's the case for gcc-3.2.2, or if this happens when compiling for
   the 32-bit target. Because of 1a) and 2), it seems to me that there is
   a performance loss: the binary didn't actually run faster (because it
   wasn't tight on GPRs), but it did run slower because of the kernel
   context switch. So at least with 3.2.0, a 64-bit app would see a net
   performance loss (albeit slight).

Based on the above logic, I want to conclude that the right thing to do
is to make ppc64 use GPRs, not FPRs, for the default DImode move, and make
the DImode move use FPRs only when -O3 is specified.

Assuming that someone provided a patch to do this, would the rs6000
maintainers turn it down?

--linas

From dje at watson.ibm.com Fri Jun 13 14:50:17 2003
From: dje at watson.ibm.com (David Edelsohn)
Date: Fri, 13 Jun 2003 00:50:17 -0400
Subject: PPC64 Compiler bug !!
In-Reply-To: Message from linas@austin.ibm.com of "Thu, 12 Jun 2003 18:30:31 CDT."
             <20030612183030.D41510@forte.austin.ibm.com>
Message-ID: <200306130450.AAA30474@makai.watson.ibm.com>

>>>>> linas writes:

linas> Assuming that someone provided a patch to do this, would the rs6000
linas> maintainers turn it down?

	Yes, any such patch will be rejected.

	Customers, especially in the embedded space, do not want
"discourage use of FPRs for DImode", they want "promise to not use any
FPRs in an integer-only function". GCC cannot accomplish that safely in
the target-specific portion of the compiler.

David

From amodra at bigpond.net.au Fri Jun 13 15:09:28 2003
From: amodra at bigpond.net.au (Alan Modra)
Date: Fri, 13 Jun 2003 14:39:28 +0930
Subject: PPC64 Compiler bug !!
In-Reply-To: <200306130450.AAA30474@makai.watson.ibm.com>
References: <20030612183030.D41510@forte.austin.ibm.com> <200306130450.AAA30474@makai.watson.ibm.com>
Message-ID: <20030613050928.GM23826@bubble.sa.bigpond.net.au>

On Fri, Jun 13, 2003 at 12:50:17AM -0400, David Edelsohn wrote:
> >>>>> linas writes:
>
> linas> Assuming that someone provided a patch to do this, would the rs6000
> linas> maintainers turn it down?
>
> 	Yes, any such patch will be rejected.
>
> 	Customers, especially in the embedded space, do not want
> "discourage use of FPRs for DImode", they want "promise to not use any
> FPRs in an integer-only function". GCC cannot accomplish that safely in
> the target-specific portion of the compiler.

But we do want "discourage use of FPRs for DImode". User code will
incur an exception on first use of an FPR. That means GPRs are cheaper
to use than FPRs for moving blocks of memory under Linux.

-- 
Alan Modra
IBM OzLabs - Linux Technology Centre

From anton at samba.org Fri Jun 13 23:53:56 2003
From: anton at samba.org (Anton Blanchard)
Date: Fri, 13 Jun 2003 23:53:56 +1000
Subject: PPC64 Compiler bug !!
In-Reply-To: <20030613050928.GM23826@bubble.sa.bigpond.net.au>
References: <20030612183030.D41510@forte.austin.ibm.com> <200306130450.AAA30474@makai.watson.ibm.com> <20030613050928.GM23826@bubble.sa.bigpond.net.au>
Message-ID: <20030613135356.GS1195@krispykreme>

> But we do want "discourage use of FPRs for DImode". User code will
> incur an exception on first use of an FPR. That means GPRs are cheaper
> to use than FPRs for moving blocks of memory under Linux.

For the non-believers in the audience, try the following program. It
forces a context switch (to unlazy the FPU), then times how long an lfd
takes.

E.g. on a POWER4 box (timebase ticks at 1/8 processor frequency), I got

139, 420, 133, 202, 186

So the best case was over 1000 processor cycles. Ouch.

Anton

#include <stdio.h>
#include <unistd.h>

static inline unsigned long read_tsc(void)
{
	unsigned long tmp;

	/* mftb reads the timebase register */
	asm volatile("mftb %0" : "=r" (tmp));

	return tmp;
}

int main()
{
	unsigned long before, after;
	double foo;
	unsigned long bar;
	unsigned long long blah = 0;

	/* force a context switch */
	sleep(1);

	before = read_tsc();
#if 1
	/* first FP use after the switch: takes an FP Unavailable exception */
	asm volatile("lfd %0, %1" :"=f"(foo) : "m"(blah));
#else
	/* integer load, for comparison */
	asm volatile("lwz %0, %1": "=r"(bar) : "m"(blah));
#endif
	after = read_tsc();

	printf("%lu timebase ticks\n", after - before);

	return 0;
}

From fleming at austin.ibm.com Sat Jun 14 01:16:47 2003
From: fleming at austin.ibm.com (fleming at austin.ibm.com)
Date: Fri, 13 Jun 2003 10:16:47 -0500 (CDT)
Subject: PPC64 Compiler bug !!
In-Reply-To: <20030612222556.GL1195@krispykreme> from "Anton Blanchard" at Jun 13, 2003 08:25:56 AM
Message-ID: <200306131516.h5DFGlM39824@fleming.austin.ibm.com>

> With lazy FP save/restore in Linux we have the extra cost of taking an
> exception the first time (after a context switch) we use an FP temporary.
> I'm guessing AIX doesn't do this.

AIX does lazy floating point save/restore, though perhaps not in the
same way as Linux.
So the behavior of a program (in AIX) that uses
FPRs for temp registers depends on the FP usage of other programs
running on that CPU.

Cheers,
Matt
--
fleming at austin.ibm.com

From linas at austin.ibm.com Sat Jun 14 01:39:48 2003
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Fri, 13 Jun 2003 10:39:48 -0500
Subject: PPC64 Compiler bug !!
In-Reply-To: <200306130450.AAA30474@makai.watson.ibm.com>; from dje@watson.ibm.com on Fri, Jun 13, 2003 at 12:50:17AM -0400
References: <20030612183030.D41510@forte.austin.ibm.com> <200306130450.AAA30474@makai.watson.ibm.com>
Message-ID: <20030613103947.A24300@forte.austin.ibm.com>

On Fri, Jun 13, 2003 at 12:50:17AM -0400, David Edelsohn wrote:
> >>>>> linas writes:
>
> linas> Assuming that someone provided a patch to do this, would the rs6000
> linas> maintainers turn it down?
>
> 	Yes, any such patch will be rejected.
>
> 	Customers, especially in the embedded space, do not want
> "discourage use of FPRs for DImode",

Maybe I've forgotten how gcc works, but isn't DImode an "integer only
function"? I thought DF was for double floats. I haven't looked at gcc
source for years; excuse my confusion.

> they want "promise to not use any
> FPRs in an integer-only function".

I have a very simple integer-only function, which I compile without any
optimization at all, and it uses FPRs. Let's look at the code again:

void spin_lock(int *lock);

long linas_lock(int dev)
{
	long flags =0;
	spin_lock(&dev);
	return flags;
}

When compiled for a 64-bit target, there should be gobs of free 64-bit
regs; I have a hard time understanding why the use of an FPR is desired
or advantageous in any way in this function.

> GCC cannot accomplish that safely in
> the target-specific portion of the compiler.

What do you mean by 'safely'? Surely -msoft-float is "safe" or is that
not true?
During one of the intermediate stages of processing, one must surely
be able to scan the function and realize that it was integer-only.
At this point, if -O3 is not set, one could internally turn on
-msoft-float, and continue onwards. Surely, this would be a safe and
simple way to guarantee the desired result.

It's possible such a patch would also hit target-independent code.

--linas

From dje at watson.ibm.com Sat Jun 14 02:40:06 2003
From: dje at watson.ibm.com (David Edelsohn)
Date: Fri, 13 Jun 2003 12:40:06 -0400
Subject: PPC64 Compiler bug !!
In-Reply-To: Message from linas@austin.ibm.com of "Fri, 13 Jun 2003 10:39:48 CDT." <20030613103947.A24300@forte.austin.ibm.com>
Message-ID: <200306131640.MAA29696@makai.watson.ibm.com>

>>>>> linas writes:

linas> What do you mean by 'safely'? Surely -msoft-float is "safe" or is that
linas> not true?

	-msoft-float converts all floating point computations to
emulation, as opposed to only using FPRs for floating point statements.
Please review the original problem and the documentation for
"-msoft-float" for more information.

linas> During one of the intermediate stages of processing, one must surely
linas> be able to scan the function and realize that it was integer-only.
linas> At this point, if -O3 is not set, one could internally turn on -msoft-float,
linas> and continue onwards. Surely, this would be a safe and simple way to
linas> guarantee the desired result.

	Such a patch would not be limited to the target-specific portion
of the compiler, which is what all of the previous patches have limited
themselves to.

David

From linas at austin.ibm.com Sat Jun 14 03:23:44 2003
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Fri, 13 Jun 2003 12:23:44 -0500
Subject: PPC64 Compiler bug !!
In-Reply-To: <200306131640.MAA29696@makai.watson.ibm.com>; from dje@watson.ibm.com on Fri, Jun 13, 2003 at 12:40:06PM -0400
References: <20030613103947.A24300@forte.austin.ibm.com> <200306131640.MAA29696@makai.watson.ibm.com>
Message-ID: <20030613122344.A24578@forte.austin.ibm.com>

On Fri, Jun 13, 2003 at 12:40:06PM -0400, David Edelsohn wrote:
> >>>>> linas writes:
>
> linas> What do you mean by 'safely'? Surely -msoft-float is "safe" or is that
> linas> not true?
>
> 	-msoft-float converts all floating point computations to
> emulation, as opposed to only using FPRs for floating point statements.
> Please review the original problem and the documentation for
> "-msoft-float" for more information.

Based on the email replies, and on a few tests run over here,
-msoft-float also seems to have the undocumented but valuable side
effect of preventing the use of FPRs in DImode moves. The Linux kernel
appears to depend vitally on this side effect.

My documentation seems to only mention -msoft-float for the DEC Alpha,
and not as a PPC flag.

The original problem was that a third party wrote a Linux device driver
that failed to use the -msoft-float flag, and then handed that code off
to some kernel developers here, who didn't know about this flag. They
complained of intermittent crashes, which became very common with the
latest compiler. It got escalated, and it landed on my desk. I was
surprised to see floating point instructions being generated in integer
code, as this will cause obvious race conditions and corruption in the
kernel.

> linas> During one of the intermediate stages of processing, one must surely
> linas> be able to scan the function and realize that it was integer-only.
> linas> At this point, if -O3 is not set, one could internally turn on -msoft-float,
> linas> and continue onwards. Surely, this would be a safe and simple way to
> linas> guarantee the desired result.
>
> 	Such a patch would not be limited to the target-specific portion
> of the compiler, which is what all of the previous patches have limited
> themselves to.

I can't comment on the quality of the previous patches; it's reasonable
to think a target-specific patch would be preferable. On the other hand,
if using FPRs for DImode moves is a good optimization for PPC, it's likely
that it would be good for other modern CPUs as well.

There also seem to be other problems mixed into the controversy, and
this seems to be clouding the issue.

--linas

From anton at samba.org Sat Jun 14 08:02:54 2003
From: anton at samba.org (Anton Blanchard)
Date: Sat, 14 Jun 2003 08:02:54 +1000
Subject: PPC64 Compiler bug !!
In-Reply-To: <200306131516.h5DFGlM39824@fleming.austin.ibm.com>
References: <20030612222556.GL1195@krispykreme> <200306131516.h5DFGlM39824@fleming.austin.ibm.com>
Message-ID: <20030613220254.GA32097@krispykreme>

> AIX does lazy floating point save/restore, though perhaps not in the
> same way as Linux. So the behavior of a program (in AIX) that uses
> FPRs for temp registers depends on the FP usage of other programs
> running on that CPU.

We do that on UP. On SMP the potential cost and complexity of sending
an IPI to unlazy the FPU when your process changes CPUs is considered
too great to bother.

Anyway, if AIX is using some form of lazy FP save/restore then it casts
the gcc optimisation into further doubt.

Anton

From s.holzhueter at shoki.de Mon Jun 16 18:13:57 2003
From: s.holzhueter at shoki.de (Sven Holzhüter)
Date: Mon, 16 Jun 2003 10:13:57 +0200
Subject: AS/400 Model 500
Message-ID: <200306161013.57815.s.holzhueter@shoki.de>

We have an older AS/400.
It already has a RISC processor.
Is there any chance of getting PPC Linux running on that machine?
I would like to run it natively, without any OS/400.
** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From rswanber at us.ibm.com Tue Jun 17 05:41:39 2003 From: rswanber at us.ibm.com (Randy Swanberg) Date: Mon, 16 Jun 2003 14:41:39 -0500 Subject: PPC64 Compiler bug !! Message-ID: >> AIX does lazy floating point save/restore, though perhaps not in the >> same way as Linux. So the behavior of a program (in AIX) that uses >> fprs for temp registers depends on the fp usage of other programs >> running on that CPU. > >We do that on UP. On SMP the potential cost and complexity of sending >an IPI to unlazy the FPU when your process changes cpus is considered >too great to bother. > >Anyway if AIX is using some form of lazy fp save/restore then it casts >the gcc optimisation into further doubt. AIX is only "somewhat" lazy on SMP. FP use by kernel or kernel extensions is strictly forbidden (will crash). So, a thread can enter/exit the kernel multiple times (syscalls, interrupts) without any FP save/restore. However, on a thread context switch (in SMP) the FPRs are saved (provided the thread had used them) but not explicitly restored. The next time that thread is dispatched, its first reference to the FP unit will cause its FP state to be restored. Randy ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From bergner at vnet.ibm.com Tue Jun 17 12:18:57 2003 From: bergner at vnet.ibm.com (Peter Bergner) Date: Mon, 16 Jun 2003 21:18:57 -0500 Subject: PPC64 Compiler bug !! In-Reply-To: References: Message-ID: <3EEE7A91.1060803@vnet.ibm.com> Randy Swanberg wrote: > However, on a thread context switch (in SMP) > the FPRs are saved (provided the thread had used them) but not > explicitly restored. The next time that thread is dispatched, > its first reference to the FP unit will cause its FP state to be > restored. This is the same for us on SMP, we don't do lazy save, but we do implement lazy restore. Peter ** Sent via the linuxppc64-dev mail list. 
See http://lists.linuxppc.org/ From mrs at apple.com Tue Jun 17 15:05:22 2003 From: mrs at apple.com (Mike Stump) Date: Mon, 16 Jun 2003 22:05:22 -0700 Subject: PPC64 Compiler bug !! In-Reply-To: <20030613122344.A24578@forte.austin.ibm.com> Message-ID: <4C63A56E-A081-11D7-8138-003065A77310@apple.com> On Friday, June 13, 2003, at 10:23 AM, linas at austin.ibm.com wrote: > Based on the email replies, and on a few tests run over here, > -msoft-float also seems to have the undocumented but valuable side > effect of preventing the use of FPR's in DImode moves. On processors that have specialized registers for the floating point unit, -msoft-float disappears those registers. This should be documented and certainly can be relied upon. If you want to fix the documentation to be clearer, that'd be good. > My documentation seems to only mention -msoft-float for the DEC Alpha, > and not as a PPC flag. Odd. My documentation seems to have it: http://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/RS-6000-and-PowerPC- Options.html#RS%2f6000%20and%20PowerPC%20Options ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Tue Jun 17 15:42:21 2003 From: anton at samba.org (Anton Blanchard) Date: Tue, 17 Jun 2003 15:42:21 +1000 Subject: pci <-> device node mapping Message-ID: <20030617054221.GE1172@krispykreme> Hi, I put together a PCI domain patch for 2.5 which removed some of the complexity in our pci code. Im now wondering if there is any reason for us to map from pci dev -> OF node -> phb: #define PCI_GET_PHB_PTR(dev) (((struct device_node *)(dev)->sysdata)->phb) >From memory the main reason for this was for EEH, but we no longer need it for this. Anton ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Wed Jun 18 02:28:49 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Tue, 17 Jun 2003 11:28:49 -0500 Subject: PPC64 Compiler bug !! 
In-Reply-To: <4C63A56E-A081-11D7-8138-003065A77310@apple.com>; from mrs@apple.com on Mon, Jun 16, 2003 at 10:05:22PM -0700
References: <20030613122344.A24578@forte.austin.ibm.com> <4C63A56E-A081-11D7-8138-003065A77310@apple.com>
Message-ID: <20030617112849.B41234@forte.austin.ibm.com>

On Mon, Jun 16, 2003 at 10:05:22PM -0700, Mike Stump wrote:
> On Friday, June 13, 2003, at 10:23 AM, linas at austin.ibm.com wrote:
> > Based on the email replies, and on a few tests run over here,
> > -msoft-float also seems to have the undocumented but valuable side
> > effect of preventing the use of FPRs in DImode moves.
>
> On processors that have specialized registers for the floating point
> unit, -msoft-float disappears those registers. This should be
> documented and certainly can be relied upon.

Well, that belies the point. I think I can safely state that most
programmers, when faced with an all-integer program, would assume that
the compiler generated pure-integer code, and thus it would never occur
to them to study or take interest in any floating-point flags. That's
what most of this squawk is about.

> If you want to fix the documentation to be clearer, that'd be good.

I don't have commit access to the documentation source. Maybe something
along the following lines would be acceptable?

-msoft-float
-mhard-float
    Generate code that does not use (uses) the floating-point
    register set. Software floating point emulation is provided
    if you use the -msoft-float option, and pass the option to
    GCC when linking.

    In some cases, gcc will use floating point registers to
    temporarily hold 64-bit integer values. Specifying
    -msoft-float will prevent this optimization from being made.

> > My documentation seems to only mention -msoft-float for the DEC Alpha,
> > and not as a PPC flag.
>
> Odd. My documentation seems to have it:

Right. Very odd, it seems that man-db 2.4.1 is broken, cause I can
clearly see the docos in man 2.3.19.
The gcc man page is OK, but the latest man command can't find the string in the man page. Ahh, bugs ... --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From jsm28 at cam.ac.uk Wed Jun 18 02:33:27 2003 From: jsm28 at cam.ac.uk (Joseph S. Myers) Date: Tue, 17 Jun 2003 17:33:27 +0100 (BST) Subject: PPC64 Compiler bug !! In-Reply-To: <20030617112849.B41234@forte.austin.ibm.com> References: <20030613122344.A24578@forte.austin.ibm.com> <4C63A56E-A081-11D7-8138-003065A77310@apple.com> <20030617112849.B41234@forte.austin.ibm.com> Message-ID: On Tue, 17 Jun 2003 linas at austin.ibm.com wrote: > Right. Very odd; it seems that man-db 2.4.1 is broken, because I can > clearly see the docs in man 2.3.19. The gcc man page is OK, but > the latest man command can't find the string in the man page. > Ahh, bugs ... Might this be something to do with bug 11146 (NUL characters in manpage, maybe arising through a pod2man bug)? That seems like something that could confuse man implementations. -- Joseph S. Myers jsm28 at cam.ac.uk ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Wed Jun 18 02:39:51 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Tue, 17 Jun 2003 11:39:51 -0500 Subject: pci <-> device node mapping In-Reply-To: <20030617054221.GE1172@krispykreme>; from anton@samba.org on Tue, Jun 17, 2003 at 03:42:21PM +1000 References: <20030617054221.GE1172@krispykreme> Message-ID: <20030617113951.D41234@forte.austin.ibm.com> Hi Anton, On Tue, Jun 17, 2003 at 03:42:21PM +1000, Anton Blanchard wrote: > > Hi, > > I put together a PCI domain patch for 2.5 which removed some of the > complexity in our pci code. I'm now wondering if there is any reason for > us to map from pci dev -> OF node -> phb: > > #define PCI_GET_PHB_PTR(dev) (((struct device_node *)(dev)->sysdata)->phb) > > From memory the main reason for this was for EEH, but we no longer > need it for this.
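The macro Anton quotes costs two dependent loads per lookup: `dev->sysdata` is reinterpreted as a `struct device_node *`, and that node's `phb` field is read. A minimal sketch of the pointer chain follows, assuming heavily simplified stand-in structs; the real kernel types carry many more fields, and `global_number` and `dev_to_phb` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct pci_controller {
    int global_number;          /* placeholder field for illustration */
};

struct device_node {
    struct pci_controller *phb; /* host bridge this OF node sits under */
};

struct pci_dev {
    void *sysdata;              /* per this thread: the device's OF node */
};

/* Same shape as the macro quoted in the thread: pci dev -> OF node -> phb. */
#define PCI_GET_PHB_PTR(dev) (((struct device_node *)(dev)->sysdata)->phb)

/* Two dependent loads: dev->sysdata, then node->phb. */
static struct pci_controller *dev_to_phb(struct pci_dev *dev)
{
    assert(dev != NULL && dev->sysdata != NULL);
    return PCI_GET_PHB_PTR(dev);
}
```

Dropping the middle hop means `sysdata` (or something else) must point straight at whatever per-bridge data the hot paths need; later replies note that the TCE table pointer lives in the same node.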
This is an unrelated issue, but what is the story for numbering/renumbering PCI Id's? In kernel-2.4.19, PHB's seem to chew up 256 PCI id's (for 4 slots) and there are some machine configurations which have nearly a hundred PHB's. In one unhappy situation, we had a machine with a graphics card which ended up with a PCI Id of 4097, and the default X11 couldn't find it, and so X wouldn't run. For some reason, X feels a need to scan the PCI bus. Changing X to scan up to 19,000 bus ID's (the theoretical max for this box) caused X to take 20 seconds to come up. Any chance this might change? I'm not saying it has to, just pointing out one of the unintended consequences of the current design. --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From scheel at vnet.ibm.com Wed Jun 18 05:07:35 2003 From: scheel at vnet.ibm.com (Jeffrey J. Scheel) Date: Tue, 17 Jun 2003 14:07:35 -0500 Subject: AS/400 Model 500 References: <200306161013.57815.s.holzhueter@shoki.de> Message-ID: <3EEF66F7.BD2C02@vnet.ibm.com> Sven Holzhüter wrote: > we have got an older AS/400. > It has got already a RISC-Processor. > Any Chance to get a PPC-Linux running on that machine? Linux support on the AS/400 requires LPAR. The model 500 is not a supported model for LPAR. See http://www-1.ibm.com/servers/eserver/iseries/lpar/chart.htm for more details. -- Jeffrey J. Scheel (scheel at vnet.ibm.com) ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From mrs at apple.com Wed Jun 18 07:07:16 2003 From: mrs at apple.com (Mike Stump) Date: Tue, 17 Jun 2003 14:07:16 -0700 Subject: PPC64 Compiler bug !!
In-Reply-To: <20030617112849.B41234@forte.austin.ibm.com> Message-ID: On Tuesday, June 17, 2003, at 09:28 AM, linas at austin.ibm.com wrote: > On Mon, Jun 16, 2003 at 10:05:22PM -0700, Mike Stump wrote: >> On Friday, June 13, 2003, at 10:23 AM, linas at austin.ibm.com wrote: >> On processors that have specialized registers for the floating point >> unit, -msoft-float disappears those registers. This should be >> documented and certainly can be relied upon. > > Well, that belies the point. I think I can safely state that most > programmers, when faced with an all-integer program, would assume > that the compiler generated pure-integer code, and thus it would > never occur to them to study or take interest in any floating-point > flags. Most programmers don't have a clue about what the compiler can and can't do (then). But, so what? This doesn't stop them from being productive. :-) 99% of them don't care and don't need to care. People generating code for a kernel need to follow the kernel guide, and in that kernel guide, if they MUST use -msoft-float to compile their code, it will clearly be stated. They don't have to understand it, they just need to follow simple directions. If the kernel requires it and the guide doesn't mention it, that is a bug in the guide. I understand and sympathize with your pain. :-( It sounds as if -mno-implicit-fp matches the programmers' expectations from your point of view. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Wed Jun 18 07:47:21 2003 From: anton at samba.org (Anton Blanchard) Date: Wed, 18 Jun 2003 07:47:21 +1000 Subject: pci <-> device node mapping In-Reply-To: <20030617113951.D41234@forte.austin.ibm.com> References: <20030617054221.GE1172@krispykreme> <20030617113951.D41234@forte.austin.ibm.com> Message-ID: <20030617214721.GA20186@krispykreme> > This is an unrelated issue, but what is the story for numbering/renumbering > PCI Id's?
In kernel-2.4.19, PHB's seem to chew up 256 PCI id's (for 4 slots) > and there are some machine configurations which have nearly a hundred PHB's. > > In one unhappy situation, we had a machine with a graphics card which > ended up with a PCI Id of 4097, and the default X11 couldn't find it, > and so X wouldn't run. For some reason, X feels a need to scan the > PCI bus. Changing X to scan up to 19,000 bus ID's (the theoretical > max for this box) caused X to take 20 seconds to come up. Yuck. Check out the thread on linux-kernel called "pci_domain_nr vs. /sys/devices". We are hashing out the final bits of pci domains which should solve that problem. http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&threadm=20030611150020%243d80%40gated-at.bofh.it Anton ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Wed Jun 18 09:00:10 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Tue, 17 Jun 2003 18:00:10 -0500 Subject: PPC64 Compiler bug !! In-Reply-To: ; from mrs@apple.com on Tue, Jun 17, 2003 at 02:07:16PM -0700 References: <20030617112849.B41234@forte.austin.ibm.com> Message-ID: <20030617180010.C28728@forte.austin.ibm.com> Hi, Well, there still remains the performance issue, that this "optimization" can slow things down, due to the FPU lazy save/restore, and I guess the soon-to-come vector-unit lazy save/restore. The performance gain from this feature is questionable, and may be a performance loss, depending on the number of context switches per second, vs. the number of times an app uses implicit floats. If it's used infrequently, it's a net performance loss. (Ditto for the vector-unit usage.) Let's do the math: Assume 1000 ctx's/second on a loaded system. Assume 1000 cycles burnt on each lazy-restore. We have to gain a lot to make up for this loss. Each process gets a 1 millisec time slice. On a 1GHz machine, that's about 1 M cycles.
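The back-of-the-envelope estimate in this post can be written out as a small C model. The inputs are the post's own illustrative figures (1000 context switches per second, 1000 cycles per lazy restore, a 1 GHz clock), not measurements; the function names are hypothetical, and the model assumes the worst case where every context switch triggers exactly one restore.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles in one time slice: clock rate / context switches per second. */
static uint64_t slice_cycles(uint64_t cpu_hz, uint64_t ctx_per_sec)
{
    assert(ctx_per_sec > 0);
    return cpu_hz / ctx_per_sec;
}

/* Share of CPU time spent on lazy FP restores, in parts per million,
 * assuming (worst case) one restore per context switch. */
static uint64_t restore_overhead_ppm(uint64_t cpu_hz, uint64_t ctx_per_sec,
                                     uint64_t restore_cycles)
{
    assert(cpu_hz > 0);
    return ctx_per_sec * restore_cycles * UINT64_C(1000000) / cpu_hz;
}

/* Break-even interval: the implicit-FP trick must save at least one cycle
 * this often (in cycles, roughly instructions at ~1 IPC) to pay for one
 * restore per slice. */
static uint64_t break_even_interval(uint64_t cpu_hz, uint64_t ctx_per_sec,
                                    uint64_t restore_cycles)
{
    assert(restore_cycles > 0);
    return slice_cycles(cpu_hz, ctx_per_sec) / restore_cycles;
}
```

With the post's numbers the slice is 1,000,000 cycles, restores eat 1000 ppm (0.1%) of the CPU, and the trick has to save a cycle about once per 1000 instructions; a 5x costlier restore raises that to once per 200. A real lazy scheme only restores when the incoming task actually touches the FPU, so this is an upper bound on the cost.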
To pay for the lazy-restore, this optimization needs to be "worth it" more often than once per thousand insn's. It's not clear to me that that is the case. If it's not, it's a net performance loss. We can argue whether my numbers are off by a factor of 5 or 10, but the point is, the less frequently this 'optimization' is actually used or is useful, the more of a performance hit it becomes. Turning it off completely and thoroughly would actually improve performance in this case; I find it curious how it gets inverted like this. Ditto for Geert Bosch's response: >> Well, that belies the point. I think I can safely state that most >> programmers, when faced with an all-integer program, would assume >> that the compiler generated pure-integer code, > > On modern machines with modern compilers, this is no longer > a reasonable assumption. In the future, GCC might even use > vector registers while the programmer writes pure scalar code. If the Linux kernel is going to handle vector units with a lazy save/restore scheme, the performance benefit of using the vector regs might be (easily) overcome by the performance hit to save the context. --linas p.s. The rest of this reply is off-topic and political ... On Tue, Jun 17, 2003 at 02:07:16PM -0700, Mike Stump wrote: > > On Tuesday, June 17, 2003, at 09:28 AM, linas at austin.ibm.com wrote: > > > > Well, that belies the point. I think I can safely state that most > > programmers, when faced with an all-integer program, would assume > > that the compiler generated pure-integer code, and thus it would > > never occur to them to study or take interest in any floating-point > > flags. > > Most programmers don't have a clue about what the compiler can and One of these fabled programmers was sent to me for help with their kernel crash... > If the kernel requires it and the guide doesn't mention it, that is a > bug in the guide. What kernel guide?
I've been poking at the Linux kernel for years, and have yet to see something called 'the kernel guide'. There are megabytes of documentation out there; I haven't read it all, and I doubt anyone has. Besides, grep -r soft-float Documentation/* comes up empty. (There's 6MB of stuff there, and it's not even the 'good' kernel docs.) > I understand and sympathize with your pain. :-( It sounds as if > -mno-implicit-fp matches the programmers' expectations from your point > of view. Well, it wasn't my pain, it was someone else's. And it wasn't just my point of view; more'n half-a-dozen other kernel programmers heard about this problem and not a single one of them knew about this 'feature.' I'm just reporting on how others seem to see the world. --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Wed Jun 18 09:46:05 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Tue, 17 Jun 2003 18:46:05 -0500 Subject: pci <-> device node mapping In-Reply-To: <20030617214721.GA20186@krispykreme>; from anton@samba.org on Wed, Jun 18, 2003 at 07:47:21AM +1000 References: <20030617054221.GE1172@krispykreme> <20030617113951.D41234@forte.austin.ibm.com> <20030617214721.GA20186@krispykreme> Message-ID: <20030617184605.D28728@forte.austin.ibm.com> On Wed, Jun 18, 2003 at 07:47:21AM +1000, Anton Blanchard wrote: > > > This is an unrelated issue, but what is the story for numbering/renumbering > > PCI Id's? In kernel-2.4.19, PHB's seem to chew up 256 PCI id's (for 4 slots) > > and there are some machine configurations which have nearly a hundred PHB's. > > > > In one unhappy situation, we had a machine with a graphics card which > > ended up with a PCI Id of 4097, and the default X11 couldn't find it, > > and so X wouldn't run. For some reason, X feels a need to scan the > > PCI bus. Changing X to scan up to 19,000 bus ID's (the theoretical > > max for this box) caused X to take 20 seconds to come up. > > Yuck.
Check out the thread on linux-kernel called "pci_domain_nr vs. > /sys/devices". We are hashing out the final bits of pci domains which > should solve that problem. > > http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&threadm=20030611150020%243d80%40gated-at.bofh.it I read it, but I don't know enough about PCI to follow the discussion. Does a PHB map to a PCI domain? As mentioned above, some current machines will get something like 50 or 100 PHB's (48 or 64 or 96, I would have to root around for the details), I'm presuming the various participants are aware of this? The other factoid, I guess you're probably aware of this, but the slots with EADS seem to have the bus numbers spaced out so that bus ID's were 32 apart. I don't know why it needs to be so sparse, I thought I heard somebody utter the words 'hot plug' in the same breath, but I'm no longer sure. As long as the pci-hackers on LKML know about these order-of-magnitude numbers & their sparseness, I will assume I'm in good hands and that the new code really will account for this, right? --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Thu Jun 19 01:18:38 2003 From: anton at samba.org (Anton Blanchard) Date: Thu, 19 Jun 2003 01:18:38 +1000 Subject: pci <-> device node mapping In-Reply-To: <20030617184605.D28728@forte.austin.ibm.com> References: <20030617054221.GE1172@krispykreme> <20030617113951.D41234@forte.austin.ibm.com> <20030617214721.GA20186@krispykreme> <20030617184605.D28728@forte.austin.ibm.com> Message-ID: <20030618151838.GB23472@krispykreme> > I read it, but I don't know enough about PCI to follow the discussion. > Does a PHB map to a PCI domain? As mentioned above, some current > machines will get something like 50 or 100 PHB's (48 or 64 or 96, > I would have to root around for the details), I'm presuming the > various participants are aware of this? On POWER4 and above each host bridge will be a PCI domain.
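To make the domain idea concrete: if each host bridge becomes its own PCI domain with its own 0..255 bus range, a flat, system-wide bus number like the 4097 mentioned earlier splits into a (domain, bus) pair. The sketch below is purely illustrative and assumes each PHB owns one aligned block of 256 bus numbers, which the thread does not actually state; the struct and helpers are hypothetical.

```c
#include <assert.h>

/* Hypothetical (domain, bus) pair; one domain per host bridge (PHB). */
struct pci_location {
    unsigned domain;  /* which PHB */
    unsigned bus;     /* 0..255 within that PHB's domain */
};

/* ASSUMPTION: each PHB owns exactly one aligned block of 256 flat numbers. */
#define BUSES_PER_PHB 256u

/* Flat system-wide bus number -> (domain, bus). */
static struct pci_location split_flat_bus(unsigned flat_bus)
{
    struct pci_location loc;
    loc.domain = flat_bus / BUSES_PER_PHB;
    loc.bus = flat_bus % BUSES_PER_PHB;
    return loc;
}

/* (domain, bus) -> flat number, the inverse of split_flat_bus(). */
static unsigned join_location(struct pci_location loc)
{
    assert(loc.bus < BUSES_PER_PHB);
    return loc.domain * BUSES_PER_PHB + loc.bus;
}
```

Under that assumption 4097 becomes domain 16, bus 1, so a domain-aware scanner only needs to walk buses 0..255 within each domain instead of probing thousands of flat numbers.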
We shouldn't have a problem with large machines, although in the lab we have found some arbitrary limits in things like the sym2 driver (we hit a limit at 32 adapters). Once pci domains and the irq rework have made it in, I want to build up a massive machine to see where we fall apart. > The other factoid, I guess you're probably aware of this, but the > slots with EADS seem to have the bus numbers spaced out so that > bus ID's were 32 apart. I don't know why it needs to be so sparse, > I thought I heard somebody utter the words 'hot plug' in the same > breath, but I'm no longer sure. In theory you could hotplug in cards with pci-pci bridges, like the 4 port pcnet32. So to make life easy they space them out a lot. Anton ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Thu Jun 19 02:17:47 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Wed, 18 Jun 2003 11:17:47 -0500 Subject: pci <-> device node mapping In-Reply-To: <20030618151838.GB23472@krispykreme>; from anton@samba.org on Thu, Jun 19, 2003 at 01:18:38AM +1000 References: <20030617054221.GE1172@krispykreme> <20030617113951.D41234@forte.austin.ibm.com> <20030617214721.GA20186@krispykreme> <20030617184605.D28728@forte.austin.ibm.com> <20030618151838.GB23472@krispykreme> Message-ID: <20030618111747.A23664@forte.austin.ibm.com> On Thu, Jun 19, 2003 at 01:18:38AM +1000, Anton Blanchard wrote: > > In theory you could hotplug in cards with pci-pci bridges, like the 4 > port pcnet32. So to make life easy they space them out a lot. Ahh, OK, I had this crazy idea that ... never mind. Is there a technical reason for assigning busid's in contiguous order for bridges on cards? Why not just grab the 'next unused id' instead of reserving 32 per slot? I have one RFE, and that is to make sure there is enough info there to be able to correlate the firmware device location string w/ the pci info.
The other day, I had a hard time matching the id that the LPAR HMC uses (e.g. P2-I2/E1) with the actual hard drive on a scsi controller (e.g. /dev/hdg). Much of the trouble came from trying to match the entries in /proc/scsi to /proc/pci; I had to compare PCI busids, irq's and do some clever guesswork to match one to the other. If there's a userland tool that does this, we over here aren't aware of it ... --linas ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From olof at austin.ibm.com Thu Jun 19 02:22:20 2003 From: olof at austin.ibm.com (Olof Johansson) Date: Wed, 18 Jun 2003 11:22:20 -0500 Subject: pci <-> device node mapping In-Reply-To: <20030618111747.A23664@forte.austin.ibm.com> References: <20030617054221.GE1172@krispykreme> <20030617113951.D41234@forte.austin.ibm.com> <20030617214721.GA20186@krispykreme> <20030617184605.D28728@forte.austin.ibm.com> <20030618151838.GB23472@krispykreme> <20030618111747.A23664@forte.austin.ibm.com> Message-ID: <3EF091BC.2080806@austin.ibm.com> linas at austin.ibm.com wrote: > I had to compare PCI busids, irq's > and do some clever guesswork to match one to the other. If there's a > userland tool that does this, we over here aren't aware of it ... lscfg. -Olof -- Olof Johansson Office: 4E002/905 pSeries Linux Development IBM Systems Group Email: olof at austin.ibm.com Phone: 512-838-9858 All opinions are my own and not those of IBM ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From mrs at apple.com Thu Jun 19 03:51:06 2003 From: mrs at apple.com (Mike Stump) Date: Wed, 18 Jun 2003 10:51:06 -0700 Subject: PPC64 Compiler bug !! In-Reply-To: <20030617180010.C28728@forte.austin.ibm.com> Message-ID: <6F6196D2-A1B5-11D7-9C7C-003065A77310@apple.com> On Tuesday, June 17, 2003, at 04:00 PM, linas at austin.ibm.com wrote: > One of these fabled programmers was sent to me for help with thier > kernel > crash... 
> >> If the kernel requires it and the guide doesn't mention it, that is a >> bug in the guide. > > What kernel guide? You know, the piece of documentation or Makefile that prevents the kernel crash that you saw. If it doesn't exist, well, you guys will either have to fight kernel crashes all the time, document it, or produce a kernel build environment that sets -msoft-float. That's up to y'all. >> I understand and sympathize with your pain. :-( It sounds as if >> -mno-implicit-fp matches the programmers' expectations from your point >> of view. > > Well, it wasn't my pain, it was someone else's. And it wasn't just > my point of view; more'n half-a-dozen other kernel programmers heard > about this problem and not a single one of them knew about this > 'feature.' I'm not a kernel programmer and I've known about it for years. :-) Maybe a line like: OS kernels (Linux on PPC, VxWorks on PPC) that don't otherwise provide an FP context in the kernel require that all such code be compiled with -msoft-float on the PPC as otherwise the compiler will use the FP registers for integer code. Simple, direct, to the point. Anyway, since this email includes the linux ppc list, all ppc linux developers now know, don't they? :-) [ I don't mean to discourage you from pressing the gcc folks to provide a solution, if there is one we can provide; I think that would be good ] ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From tinglett at vnet.ibm.com Thu Jun 19 05:36:53 2003 From: tinglett at vnet.ibm.com (Todd Inglett) Date: 18 Jun 2003 14:36:53 -0500 Subject: pci <-> device node mapping In-Reply-To: <20030617054221.GE1172@krispykreme> References: <20030617054221.GE1172@krispykreme> Message-ID: <1055965013.29626.11.camel@q.rchland.ibm.com> On Tue, 2003-06-17 at 00:42, Anton Blanchard wrote: > Hi, > > I put together a PCI domain patch for 2.5 which removed some of the > complexity in our pci code.
I'm now wondering if there is any reason for > us to map from pci dev -> OF node -> phb: > > #define PCI_GET_PHB_PTR(dev) (((struct device_node *)(dev)->sysdata)->phb) > > From memory the main reason for this was for EEH, but we no longer > need it for this. The device_node contains a ptr to the tce table. I suppose sysdata could point directly to it, or you could invent some other external means to find it. -- Todd Inglett ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From tinglett at vnet.ibm.com Thu Jun 19 05:40:19 2003 From: tinglett at vnet.ibm.com (Todd Inglett) Date: 18 Jun 2003 14:40:19 -0500 Subject: pci <-> device node mapping In-Reply-To: <20030618111747.A23664@forte.austin.ibm.com> References: <20030617054221.GE1172@krispykreme> <20030617113951.D41234@forte.austin.ibm.com> <20030617214721.GA20186@krispykreme> <20030617184605.D28728@forte.austin.ibm.com> <20030618151838.GB23472@krispykreme> <20030618111747.A23664@forte.austin.ibm.com> Message-ID: <1055965219.29626.16.camel@q.rchland.ibm.com> On Wed, 2003-06-18 at 11:17, linas at austin.ibm.com wrote: > Ahh, OK, I had this crazy idea that ... never mind. Is there a technical > reason for assigning busid's in contiguous order for bridges on cards? > Why not just grab the 'next unused id' instead of reserving 32 per slot? This is just how PCI works. The secondary and subordinate bus numbers of a bridge form a contiguous range of buses that are bridged. So you can't arbitrarily assign numbers without renumbering everything or leaving holes. -- Todd Inglett ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From rswanber at us.ibm.com Thu Jun 19 06:09:37 2003 From: rswanber at us.ibm.com (Randy Swanberg) Date: Wed, 18 Jun 2003 15:09:37 -0500 Subject: pci <-> device node mapping Message-ID: Upstream bridges must know what range of bus ID's to claim and forward downstream.
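Todd's point about secondary and subordinate bus numbers, together with the "32 apart" spacing, can be sketched as a toy model. This is a minimal illustration only: the struct, the base of 32 and the stride of 32 are assumptions chosen to mirror the observation in this thread, not the firmware's actual numbering algorithm.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a PCI-to-PCI bridge's bus-number window. */
struct bridge_window {
    unsigned secondary;   /* bus number directly behind the bridge */
    unsigned subordinate; /* highest bus number reachable behind it */
};

/* A bridge claims (and forwards downstream) config cycles for any bus
 * number inside [secondary, subordinate]; everything else is ignored. */
static bool bridge_claims(const struct bridge_window *w, unsigned bus)
{
    return bus >= w->secondary && bus <= w->subordinate;
}

/* Hypothetical allocator: slot `slot` gets a fixed block of `stride` bus
 * numbers starting at `base`, so bridges on a card hot-plugged later can be
 * numbered inside the block without renumbering neighbouring slots. */
static struct bridge_window slot_window(unsigned base, unsigned slot,
                                        unsigned stride)
{
    struct bridge_window w;
    assert(stride > 0);
    w.secondary = base + slot * stride;
    w.subordinate = w.secondary + stride - 1;
    assert(w.subordinate <= 255);  /* bus numbers are 8 bits per domain */
    return w;
}
```

Handing out the "next unused id" instead would break the contiguity rule: a bridge added later in slot 0 would need a number outside slot 0's window, forcing the upstream bridge's range to be widened and its neighbours renumbered.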
So reserving ranges at the upstream bridge allows subordinate bridges to be dynamically added without forcing reconfiguration of the upstream bridges. Randy Swanberg [ linas at austin.ibm.com wrote: ] On Thu, Jun 19, 2003 at 01:18:38AM +1000, Anton Blanchard wrote: > > In theory you could hotplug in cards with pci-pci bridges, like the 4 > port pcnet32. So to make life easy they space them out a lot. Ahh, OK, I had this crazy idea that ... never mind. Is there a technical reason for assigned busid's in contiguous order for bridges on cards? Why not just grab the 'next unused id' instead of reserving 32 per slot? I have one RFE, and that is to make sure there is enough info there to be able to correlate the firmware device location string w/ the pci info. The other day, I had a hard time matching the id that the LPAR HMC uses (e.g. P2-I2/E1) with the actual hard drive on a scsi controller (e.g. /dev/hdg). Much of the trouble came from trying to match the entries in /proc/scsi to /proc/pci; I had to compare PCI busids, irq's and do some clever guesswork to match one to the other. If there's a userland tool that does this, we over here aren't aware of it ... ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From benh at kernel.crashing.org Thu Jun 19 07:04:00 2003 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: 18 Jun 2003 23:04:00 +0200 Subject: pci <-> device node mapping In-Reply-To: <1055965013.29626.11.camel@q.rchland.ibm.com> References: <20030617054221.GE1172@krispykreme> <1055965013.29626.11.camel@q.rchland.ibm.com> Message-ID: <1055970238.13215.40.camel@gaston> On Wed, 2003-06-18 at 21:36, Todd Inglett wrote: > The device_node contains a ptr to the tce table. I suppose sysdata > could point directly to it, or you could invent some other external > means to find it. Do you rely on this direct pointer in performance critical locations ? 
One thing I want to do sooner or later on ppc32 and possibly ppc64 as well is to get rid of all fields in the device_node except the actual link pointers and the property list. Additional infos would then be added to device nodes by adding properties. For example, I plan to replace the n_interrupts & interrupts array with a "linux,irq" property. Ben ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Thu Jun 19 21:23:05 2003 From: anton at samba.org (Anton Blanchard) Date: Thu, 19 Jun 2003 21:23:05 +1000 Subject: pci <-> device node mapping In-Reply-To: <20030618111747.A23664@forte.austin.ibm.com> References: <20030617054221.GE1172@krispykreme> <20030617113951.D41234@forte.austin.ibm.com> <20030617214721.GA20186@krispykreme> <20030617184605.D28728@forte.austin.ibm.com> <20030618151838.GB23472@krispykreme> <20030618111747.A23664@forte.austin.ibm.com> Message-ID: <20030619112305.GB13202@krispykreme> > I have one RFE, and that is to make sure there is enough info there to > be able to correlate the firmware device location string w/ the pci info. > > The other day, I had a hard time matching the id that the LPAR HMC > uses (e.g. P2-I2/E1) with the actual hard drive on a scsi controller > (e.g. /dev/hdg). Much of the trouble came from trying to match the > entries in /proc/scsi to /proc/pci; I had to compare PCI busids, irq's > and do some clever guesswork to match one to the other. If there's a > userland tool that does this, we over here aren't aware of it ... I'm acutely aware of it; I spent ages trying to bring a large machine up with some broken SCSI adapters, disks and network cards. I'm half considering stashing the OF name into pci_dev->slot_name (as well as domain/bus/devfn) and forcing drivers to print it when they fail. Martin Schwenke has also been doing some good work in this area; he has some AIX-style tools for doing inventory management.
We need both; if you don't make it to userspace then you had better hope the driver will print out its full location before dying. Anton ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From engebret at vnet.ibm.com Thu Jun 19 23:08:10 2003 From: engebret at vnet.ibm.com (Dave Engebretsen) Date: Thu, 19 Jun 2003 08:08:10 -0500 Subject: pci <-> device node mapping References: <20030617054221.GE1172@krispykreme> <1055965013.29626.11.camel@q.rchland.ibm.com> <1055970238.13215.40.camel@gaston> Message-ID: <3EF1B5BA.B18C15AA@vnet.ibm.com> Benjamin Herrenschmidt wrote: > > On Wed, 2003-06-18 at 21:36, Todd Inglett wrote: > > > The device_node contains a ptr to the tce table. I suppose sysdata > > could point directly to it, or you could invent some other external > > means to find it. > > Do you rely on this direct pointer in performance critical locations ? It is used for each call where TCE mappings are created, so it is on a pretty high-use path. Dave. ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From tinglett at vnet.ibm.com Fri Jun 20 01:15:21 2003 From: tinglett at vnet.ibm.com (Todd Inglett) Date: 19 Jun 2003 10:15:21 -0500 Subject: pci <-> device node mapping In-Reply-To: <20030619112305.GB13202@krispykreme> References: <20030617054221.GE1172@krispykreme> <20030617113951.D41234@forte.austin.ibm.com> <20030617214721.GA20186@krispykreme> <20030617184605.D28728@forte.austin.ibm.com> <20030618151838.GB23472@krispykreme> <20030618111747.A23664@forte.austin.ibm.com> <20030619112305.GB13202@krispykreme> Message-ID: <1056035721.29626.33.camel@q.rchland.ibm.com> On Thu, 2003-06-19 at 06:23, Anton Blanchard wrote: > I'm acutely aware of it; I spent ages trying to bring a large machine up > with some broken SCSI adapters, disks and network cards. I'm half > considering stashing the OF name into pci_dev->slot_name (as well > as domain/bus/devfn) and forcing drivers to print it when they fail.
I did this in 2.4 with pci_dev->name which actually worked quite well. It would be nice to have an arch defined func to format the name for messages to take care of domains as well as location codes. Doesn't sparc64 have this already (with icky ifdef's everywhere)? -- Todd Inglett ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From tinglett at vnet.ibm.com Fri Jun 20 01:35:37 2003 From: tinglett at vnet.ibm.com (Todd Inglett) Date: 19 Jun 2003 10:35:37 -0500 Subject: pci <-> device node mapping In-Reply-To: <1055970238.13215.40.camel@gaston> References: <20030617054221.GE1172@krispykreme> <1055965013.29626.11.camel@q.rchland.ibm.com> <1055970238.13215.40.camel@gaston> Message-ID: <1056036937.29981.54.camel@q.rchland.ibm.com> On Wed, 2003-06-18 at 16:04, Benjamin Herrenschmidt wrote: > On Wed, 2003-06-18 at 21:36, Todd Inglett wrote: > > > The device_node contains a ptr to the tce table. I suppose sysdata > > could point directly to it, or you could invent some other external > > means to find it. > > Do you rely on this direct pointer in performance critical locations ? > > One thing I want to do sooner or later on ppc32 and possibly ppc64 as > well is to get rid of all fields in the device_node except the actual > link pointers and the property list. Additional infos would then be > added to device nodes by adding properties. For example, I plan to > replace the n_interrupts & interrupts array with a "linux,irq" > property. Yeah, I like this idea. At the time I coded this it was a choice between adding an intermediate node off sysdata pointing to the tce_table and device_node, or just putting the pci stuff into the device node. The latter wasn't as clean but it was easy :). Yeah, poor excuse and it should be cleaned up. The reasons for the direct link to the device node itself may well be all gone -- at least on performance paths. BTW, is there a reason that interrupts array is computed and stored in early boot? 
Seems to me that the interrupt can be computed on the fly as the pci probe occurs. Not as efficient, but simple and the pci probe itself certainly isn't performance critical. I must be overlooking something. My intuition says the pci_map* paths are performance critical, but I have not personally measured them nor have I seen anyone comment on them and I make it a habit never to count on my intuition for stuff like this :). One other comment is that it was pure hell to map sysdata with a unique value per device. The pci probe code (at least in 2.4) really wanted sysdata to be inherited from the bus being probed. The busno, devfn, etc, that I added to the device node can go away if this is fixed. -- Todd Inglett ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From anton at samba.org Sat Jun 21 07:27:52 2003 From: anton at samba.org (Anton Blanchard) Date: Sat, 21 Jun 2003 07:27:52 +1000 Subject: PPC64 Compiler bug !! In-Reply-To: References: <20030617112849.B41234@forte.austin.ibm.com> Message-ID: <20030620212752.GB1589@krispykreme> > Most programmers don't have a clue about what the compiler can and > can't do (then). But, so what? This doesn't stop them from being > productive. :-) 99% of them don't care and don't need to care. > People generating code for a kernel need to follow the kernel guide, > and in that kernel guide, if they MUST use -msoft-float to compile > their code, it will clearly be stated. They don't have to understand > it, they just need to follow simple directions. > > If the kernel requires it and the guide doesn't mention it, that is a > bug in the guide. I thought there was going to be an easy way to grab compiler flags in 2.5 for out-of-tree modules (eg something in /lib/modules/...) although there doesn't seem to be anything (well there is a symlink to the build directory).
In 2.5 we might remove -mminimal-toc, and if we get adventurous we might look at loading the kernel at -2GB, so correct flags for out-of-tree modules will be important. I'm not sure why we even add -mminimal-toc to kernel modules. A module that overflows the TOC doesn't deserve to live. Anton ** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/ From linas at austin.ibm.com Sat Jun 21 08:14:54 2003 From: linas at austin.ibm.com (linas at austin.ibm.com) Date: Fri, 20 Jun 2003 17:14:54 -0500 Subject: pci <-> device node mapping In-Reply-To: <20030619112305.GB13202@krispykreme>; from anton@samba.org on Thu, Jun 19, 2003 at 09:23:05PM +1000 References: <20030617054221.GE1172@krispykreme> <20030617113951.D41234@forte.austin.ibm.com> <20030617214721.GA20186@krispykreme> <20030617184605.D28728@forte.austin.ibm.com> <20030618151838.GB23472@krispykreme> <20030618111747.A23664@forte.austin.ibm.com> <20030619112305.GB13202@krispykreme> Message-ID: <20030620171454.B30616@forte.austin.ibm.com> Hi Anton, On Thu, Jun 19, 2003 at 09:23:05PM +1000, Anton Blanchard wrote: > > > I have one RFE, and that is to make sure there is enough info there to > > be able to correlate the firmware device location string w/ the pci info. > > > > The other day, I had a hard time matching the id that the LPAR HMC > > uses (e.g. P2-I2/E1) with the actual hard drive on a scsi controller > > (e.g. /dev/hdg). Much of the trouble came from trying to match the > > entries in /proc/scsi to /proc/pci; I had to compare PCI busids, irq's > > and do some clever guesswork to match one to the other. If there's a > > userland tool that does this, we over here aren't aware of it ... > > I'm acutely aware of it; I spent ages trying to bring a large machine up > with some broken SCSI adapters, disks and network cards. I'm half > considering stashing the OF name into pci_dev->slot_name (as well That would be good. You don't, perchance, have a patch that already supplies this?
Linda Xie over here is doing some device driver work, where she gets an
RTAS event with a firmware location code in it. Based on the location code,
she wants to be able to find the corresponding pci_dev structure. What's the
best way to do this?

I didn't quite understand the overall design she's working with. She wants
to do the mapping in the kernel, although it's conceivable the best solution
might be to have some userspace daemon catch the RTAS event, convert it to a
domain/bus and feed it back to her driver. I dunno, this is unfamiliar
ground to me.

Without your patch, the only solution I know of is to do a string compare
against pci_dev->dev.name and hope that the first N characters match. That
strikes me as a not very pretty way of doing things, especially if someone
ever mucks with dev.name.

--linas

From tinglett at vnet.ibm.com  Mon Jun 23 23:41:47 2003
From: tinglett at vnet.ibm.com (Todd Inglett)
Date: 23 Jun 2003 08:41:47 -0500
Subject: pci <-> device node mapping
In-Reply-To: <20030620171454.B30616@forte.austin.ibm.com>
References: <20030617054221.GE1172@krispykreme> <20030617113951.D41234@forte.austin.ibm.com> <20030617214721.GA20186@krispykreme> <20030617184605.D28728@forte.austin.ibm.com> <20030618151838.GB23472@krispykreme> <20030618111747.A23664@forte.austin.ibm.com> <20030619112305.GB13202@krispykreme> <20030620171454.B30616@forte.austin.ibm.com>
Message-ID: <1056375707.14053.8.camel@q.rchland.ibm.com>

On Fri, 2003-06-20 at 17:14, linas at austin.ibm.com wrote:
[...]
> Linda Xie over here is doing some device driver work, where she gets
> an RTAS event with a firmware location code in it. Based on the location
> code, she wants to be able to find the corresponding pci_dev structure.
> What's the best way to do this?

This is nearly trivial in the current implementation by observing the
subject of this thread :).
Loop through the pci_dev's and do this for each one until you get a hit:

	struct device_node *dn = pci_device_to_OF_node(dev);

	/* Skip devices that have no Open Firmware node. */
	if (dn) {
		char *loc = (char *)get_property(dn, "ibm,loc-code", 0);

		/* Exact match on the firmware location code. */
		if (loc && strcmp(loc, myloc_code) == 0)
			return dev;
	}

I think Greg KH recently eliminated pci_for_each_dev in 2.5, so you'll have
to iterate with another mechanism. IIRC, he patched everything to iterate
with pci_find_device(), so you might do the same.

-- 
Todd Inglett

From lxie at us.ibm.com  Tue Jun 24 00:29:35 2003
From: lxie at us.ibm.com (Linda Xie)
Date: Mon, 23 Jun 2003 09:29:35 -0500
Subject: pci <-> device node mapping
Message-ID: 

Here is what I have in my driver code for getting pci_dev:

	const struct list_head *tmp;
	struct pci_dev *dev;

	/* Walk the global list of PCI devices. */
	list_for_each (tmp, &pci_devices) {
		dev = (struct pci_dev *) pci_dev_g(tmp);
		/* Substring match of the slot name in the device name. */
		if (dev && strstr(dev->dev.name, slot->name))
			return dev;
	}
	return NULL;

I think this is more efficient than using pci_device_to_OF_node().

Todd Inglett wrote:
> On Fri, 2003-06-20 at 17:14, linas at austin.ibm.com wrote:
> [...]
> > Linda Xie over here is doing some device driver work, where she
> > gets an RTAS event with a firmware location code in it. Based on
> > the location code, she wants to be able to find the corresponding
> > pci_dev structure. What's the best way to do this?
>
> This is nearly trivial in the current implementation by observing the
> subject of this thread :). Loop through the pci_dev's and do this for
> each one until you get a hit:
>
> 	struct device_node *dn = pci_device_to_OF_node(dev);
>
> 	if (dn) {
> 		char *loc = (char *)get_property(dn,
> 						 "ibm,loc-code", 0);
> 		if (loc && strcmp(loc, myloc_code) == 0)
> 			return dev;
> 	}
>
> I think Greg KH recently eliminated pci_for_each_dev in 2.5, so you'll
> have to iterate with another mechanism. IIRC, he patched everything to
> iterate with pci_find_device(), so you might do the same.
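The two lookups in this exchange compare strings differently: Todd's loop uses strcmp() on the "ibm,loc-code" property (exact match), while Linda's uses strstr() on dev->dev.name (any substring). The following is a minimal userspace sketch of that difference, not kernel code. struct fake_dev, the sample location codes (modelled on the "P2" and "P2-I2/E1" examples mentioned in this thread), find_exact() and count_substring_hits() are all hypothetical stand-ins for walking the real pci_dev list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a pci_dev and its "ibm,loc-code" string. */
struct fake_dev {
    const char *loc_code;   /* may be NULL, like a missing property */
};

/* Hypothetical sample devices; codes modelled on "P2" and "P2-I2/E1". */
static const struct fake_dev devs[] = {
    { "P2" },          /* e.g. the ISA bridge */
    { "P2-I1" },
    { "P2-I2/E1" },
    { NULL },          /* a device with no location code */
};
#define NDEVS (sizeof(devs) / sizeof(devs[0]))

/* Exact match (strcmp), in the style of the "ibm,loc-code" loop:
 * returns the first device whose code equals want, or NULL. */
static const struct fake_dev *find_exact(const char *want)
{
    assert(want != NULL);
    for (size_t i = 0; i < NDEVS; i++)
        if (devs[i].loc_code && strcmp(devs[i].loc_code, want) == 0)
            return &devs[i];
    return NULL;
}

/* Substring match (strstr), in the style of the dev->dev.name loop:
 * counts how many devices would be reported as a hit. */
static size_t count_substring_hits(const char *want)
{
    assert(want != NULL);
    size_t hits = 0;
    for (size_t i = 0; i < NDEVS; i++)
        if (devs[i].loc_code && strstr(devs[i].loc_code, want) != NULL)
            hits++;
    return hits;
}
```

With these sample codes, a substring search for "P2" matches three devices while the exact match returns only the first. Since a strstr() loop returns on its first hit, an ambiguous key makes the result depend on list order; whether real dev.name strings collide this way depends on how the platform fills them in.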
From linas at austin.ibm.com  Tue Jun 24 05:25:29 2003
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Mon, 23 Jun 2003 14:25:29 -0500
Subject: pci <-> device node mapping
In-Reply-To: ; from lxie@us.ibm.com on Mon, Jun 23, 2003 at 09:29:35AM -0500
References: 
Message-ID: <20030623142529.A31502@forte.austin.ibm.com>

On Mon, Jun 23, 2003 at 09:29:35AM -0500, Linda Xie wrote:
>
> Here is what I have in my driver code for getting pci_dev:
>
> if(dev && strstr(dev->dev.name, slot->name))

Yeah, but I don't think this will give you unique results. If you have the
bad luck to be looking for something like "P2" (which is the ISA bridge on
my system), you will have lots of hits, since P2 occurs in other strings as
well. Or am I confused?

--linas

From mattias at virtutech.se  Wed Jun 25 01:51:58 2003
From: mattias at virtutech.se (Mattias Engdegård)
Date: Tue, 24 Jun 2003 17:51:58 +0200
Subject: [PATCH] check comport existence for udbg
Message-ID: <200306241551.h5OFpwM14501@virtutech.se>

If no serial port is detected in the OF device tree, udbg_putc() will
still attempt to talk to an imaginary UART at physical address 0. This
will likely lead to a hang in udbg_putc(). This trivial patch prevents
this by not creating the ioremapping in that case.

--- chrp_setup.c~	Tue Jun 17 17:43:50 2003
+++ chrp_setup.c	Tue Jun 24 17:37:05 2003
@@ -207,7 +207,9 @@
 #endif
 
 	/* Map the uart for udbg. */
-	comport = (void *)__ioremap(naca->serialPortAddr, 16, _PAGE_NO_CACHE);
+	if (naca->serialPortAddr)
+		comport = (void *)__ioremap(naca->serialPortAddr, 16,
+					    _PAGE_NO_CACHE);
 	udbg_init_uart(comport);
 	ppc_md.udbg_putc = udbg_putc;
 

From mattias at virtutech.se  Wed Jun 25 02:06:16 2003
From: mattias at virtutech.se (Mattias Engdegård)
Date: Tue, 24 Jun 2003 18:06:16 +0200
Subject: [PATCH] check comport existence for udbg
In-Reply-To: <200306241551.h5OFpwM14501@virtutech.se> (mattias@virtutech.se)
References: <200306241551.h5OFpwM14501@virtutech.se>
Message-ID: <200306241606.h5OG6G115829@virtutech.se>

[ description of a bug, and a broken patch ]

I forgot a case here. Sorry. Here is a fixed patch.

--- chrp_setup.c~	Tue Jun 17 17:43:50 2003
+++ chrp_setup.c	Tue Jun 24 18:03:49 2003
@@ -207,7 +207,11 @@
 #endif
 
 	/* Map the uart for udbg. */
-	comport = (void *)__ioremap(naca->serialPortAddr, 16, _PAGE_NO_CACHE);
+	if (naca->serialPortAddr)
+		comport = (void *)__ioremap(naca->serialPortAddr, 16,
+					    _PAGE_NO_CACHE);
+	else
+		comport = NULL;
 	udbg_init_uart(comport);
 	ppc_md.udbg_putc = udbg_putc;
 

From SVakkalankarao at covansys.com  Thu Jun 26 02:59:38 2003
From: SVakkalankarao at covansys.com (VAKKALANKA RAO Sridhar)
Date: Wed, 25 Jun 2003 22:29:38 +0530
Subject: CD drive and vers. 2.5.70
Message-ID: <207D6ADFC044A84686D44CA11B297EEA01D7AD81@chn-ex02.cvns.corp.covansys.com>

Hello,

A few weeks ago, I successfully got kernel vers. 2.5.70 (patch mm8) up and
running on a p630-6E4. The ramdisk I plugged in to create "zImage.initrd"
was taken from Debian; this ramdisk was built for CHRP PowerPC machines.

Now, I am finding that the CD-ROM drive does not work. When I mount the
CD-ROM using the mount command, I get the following error:

    # mount -t iso9660 -o ro /dev/scd0 /dev/cdrom
    mount: /dev/scd0 is not a valid block device

So, I went back to recompile the kernel. In "make menuconfig", I enabled
"IDE, ATA & ATAPI Block devices support", within which I enabled "Include
IDE/ATAPI CDROM support". This only resulted in a syntax error when
compiling "zImage.initrd", which I could not resolve.
My CD-ROM drive is very clearly IDE, but I have no reason to believe that it
has SCSI support (ATAPI).

Has anyone encountered this problem and solved it? I would appreciate an
answer.

Thanks in advance
Sri

From linas at austin.ibm.com  Fri Jun 27 07:49:25 2003
From: linas at austin.ibm.com (linas at austin.ibm.com)
Date: Thu, 26 Jun 2003 16:49:25 -0500
Subject: panic and timer interrupts?
Message-ID: <20030626164925.B6890@forte.austin.ibm.com>

I've got a machine here that just did one of the stranger kernel things
I've ever seen. Due to some bug, it panicked. But then, during the panic,
it took a timer interrupt, then handled some network interrupts and some
network data, and may even have scheduled some user-land processes before
getting hopelessly tangled up.

So, my naive kernel questions are as follows:

I would have thought that interrupts would be disabled during a panic, but
I can't find any code that does this. Why is this? Is this a bug? Is this
intentional?

It got me thinking about a hang mode I've seen not infrequently on PCs: the
machine is hung and unresponsive to keyboard, telnet, etc., but does reply
to pings. I've never bothered to debug those, but now I'm wondering if
that's a related manifestation.

--linas

From olof at austin.ibm.com  Sat Jun 28 04:30:57 2003
From: olof at austin.ibm.com (olof at austin.ibm.com)
Date: Fri, 27 Jun 2003 13:30:57 -0500 (CDT)
Subject: [PATCH] Bad format of driver name for HVC console
Message-ID: 

The patch below fixes a formatting problem with the name of the HVC
console. If devfs is not configured, /proc/devices will contain "hvc/%d"
instead of "hvc".

Patch is against 2.4.21.
Thanks,

-Olof

Olof Johansson                                  Office: 4E002/905
pSeries Linux Development                       IBM Systems Group
Email: olof at austin.ibm.com                   Phone: 512-838-9858
All opinions are my own and not those of IBM


Index: drivers/char/hvc_console.c
===================================================================
RCS file: /cvs/linuxppc64/linuxppc64_2_4/drivers/char/hvc_console.c,v
retrieving revision 1.12
diff -p -u -r1.12 hvc_console.c
--- drivers/char/hvc_console.c	19 Aug 2002 14:15:52 -0000	1.12
+++ drivers/char/hvc_console.c	26 Jun 2003 19:41:56 -0000
@@ -252,7 +252,11 @@ int __init hvc_init(void)
 
 	hvc_driver.magic = TTY_DRIVER_MAGIC;
 	hvc_driver.driver_name = "hvc";
+#ifdef CONFIG_DEVFS_FS
 	hvc_driver.name = "hvc/%d";
+#else
+	hvc_driver.name = "hvc";
+#endif
 	hvc_driver.major = HVC_MAJOR;
 	hvc_driver.minor_start = HVC_MINOR;
 	hvc_driver.num = hvc_count(&hvc_offset);
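Olof's description implies the driver name string is used in two ways: as a printf-style template when per-device names are generated under devfs (so "hvc/%d" becomes "hvc/0"), and verbatim wherever the plain driver name is printed, such as /proc/devices. Below is a minimal userspace sketch of that behaviour, assuming the template contains only a single "%d". make_node_name() and proc_devices_line() are hypothetical helpers, not the tty layer's actual code, and the "%3d %s" layout is only an approximation of a /proc/devices entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a per-device node name from a tty driver name.
 * A devfs-style name containing "%d" (e.g. "hvc/%d") is used as a printf
 * template; a plain name (e.g. "hvc") simply has the index appended.
 * Assumes the template contains no conversion other than a single "%d". */
static void make_node_name(char *buf, size_t len, const char *drv_name, int idx)
{
    assert(buf != NULL && len > 0 && drv_name != NULL);
    if (strstr(drv_name, "%d") != NULL)
        snprintf(buf, len, drv_name, idx);          /* "hvc/%d", 0 -> "hvc/0" */
    else
        snprintf(buf, len, "%s%d", drv_name, idx);  /* "hvc", 0 -> "hvc0" */
}

/* Hypothetical helper: a /proc/devices-style line, which prints the driver
 * name verbatim, so a devfs template leaks through as the literal "hvc/%d". */
static void proc_devices_line(char *buf, size_t len, int major, const char *drv_name)
{
    assert(buf != NULL && len > 0 && drv_name != NULL);
    snprintf(buf, len, "%3d %s", major, drv_name);
}
```

For example, make_node_name(buf, sizeof buf, "hvc/%d", 0) yields "hvc/0", while proc_devices_line() with the same name reproduces the literal "hvc/%d" that Olof saw. The #ifdef in the patch sidesteps this by giving non-devfs builds the plain name "hvc".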