From benh at kernel.crashing.org Tue Mar 1 15:29:25 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 01 Mar 2005 15:29:25 +1100
Subject: [PATCH] ppc64: Fix zImage wrapper incorrect size to flush_cache()
Message-ID: <1109651365.7669.21.camel@gaston>

Hi !

This patch fixes a bug in the ppc64 zImage wrapper causing it to pass an
incorrect size to flush_cache() when flushing the data and instruction
caches prior to jumping to the kernel entry. This causes crashes on
firmware environments that do strict MMU mapping only of actually
allocated areas.

Signed-off-by: Benjamin Herrenschmidt

--- dingo/2.6.10-bk5/arch/ppc64/boot/main.c	2004-12-25 08:35:50.000000000 +1100
+++ 2.6.10-bk5/arch/ppc64/boot/main.c	2005-02-16 17:10:49.194263268 +1100
@@ -200,7 +200,7 @@
 	vmlinux.addr += (unsigned long)elf64ph->p_offset;
 	vmlinux.size -= (unsigned long)elf64ph->p_offset;
-	flush_cache((void *)vmlinux.addr, vmlinux.memsize);
+	flush_cache((void *)vmlinux.addr, vmlinux.size);
 	if (a1)
 		printf("initrd head: 0x%lx\n\r", *((u32 *)initrd.addr));

From kravetz at us.ibm.com Wed Mar 2 09:27:13 2005
From: kravetz at us.ibm.com (Mike Kravetz)
Date: Tue, 1 Mar 2005 14:27:13 -0800
Subject: [PATCH] NUMA memory fixup
Message-ID: <20050301222713.GB5780@w-mikek2.ibm.com>

When I booted my new 720 on a kernel configured for NUMA, I received the
following during bootup:

WARNING: Unexpected node layout: region start 44000000 length 2000000
NUMA is disabled

This is due to memory 'holes' within nodes. If such holes are encountered,
then NUMA is disabled. The following patch adds support for such
configurations. My 720 now boots with the following message:

[boot]0012 Setup Arch
Node 0 Memory: 0x0-0x8000000 0x44000000-0x12a000000
Node 1 Memory: 0x8000000-0x44000000 0x12a000000-0x1ea000000

I'd appreciate any comments on the approach taken. I'm also working on
adding NUMA support on top of the SPARSEMEM implementation being pushed
as part of memory hot add.
However, it seems important to get the current implementation based on
DISCONTIGMEM working first.

This patch is against 2.6.11-rc3, but I can provide a later version if
needed.
--
Signed-off-by: Mike Kravetz

diff -Naupr linux-2.6.11-rc3/arch/ppc64/mm/numa.c linux-2.6.11-rc3.work/arch/ppc64/mm/numa.c
--- linux-2.6.11-rc3/arch/ppc64/mm/numa.c	2005-02-03 01:57:16.000000000 +0000
+++ linux-2.6.11-rc3.work/arch/ppc64/mm/numa.c	2005-03-01 19:39:21.000000000 +0000
@@ -40,7 +40,6 @@ int nr_cpus_in_node[MAX_NUMNODES] = { [0
 struct pglist_data *node_data[MAX_NUMNODES];
 bootmem_data_t __initdata plat_node_bdata[MAX_NUMNODES];
-static unsigned long node0_io_hole_size;
 static int min_common_depth;
 /*
@@ -49,7 +48,8 @@ static int min_common_depth;
  */
 static struct {
 	unsigned long node_start_pfn;
-	unsigned long node_spanned_pages;
+	unsigned long node_end_pfn;
+	unsigned long node_present_pages;
 } init_node_data[MAX_NUMNODES] __initdata;
 EXPORT_SYMBOL(node_data);
@@ -348,33 +348,28 @@ new_range:
 		if (max_domain < numa_domain)
 			max_domain = numa_domain;
-		/*
-		 * For backwards compatibility, OF splits the first node
-		 * into two regions (the first being 0-4GB). Check for
-		 * this simple case and complain if there is a gap in
-		 * memory
+		/*
+		 * Initialize new node struct, or add to an existing one.
 		 */
-		if (init_node_data[numa_domain].node_spanned_pages) {
-			unsigned long shouldstart =
-				init_node_data[numa_domain].node_start_pfn +
-				init_node_data[numa_domain].node_spanned_pages;
-			if (shouldstart != (start / PAGE_SIZE)) {
-				/* Revert to non-numa for now */
-				printk(KERN_ERR
-				       "WARNING: Unexpected node layout: "
-				       "region start %lx length %lx\n",
-				       start, size);
-				printk(KERN_ERR "NUMA is disabled\n");
-				goto err;
-			}
-			init_node_data[numa_domain].node_spanned_pages +=
+		if (init_node_data[numa_domain].node_end_pfn) {
+			if ((start / PAGE_SIZE) <
+			    init_node_data[numa_domain].node_start_pfn)
+				init_node_data[numa_domain].node_start_pfn =
+					start / PAGE_SIZE;
+			else
+				init_node_data[numa_domain].node_end_pfn =
+					(start / PAGE_SIZE) +
+					(size / PAGE_SIZE);
+
+			init_node_data[numa_domain].node_present_pages +=
 				size / PAGE_SIZE;
 		} else {
 			node_set_online(numa_domain);
 			init_node_data[numa_domain].node_start_pfn =
 				start / PAGE_SIZE;
-			init_node_data[numa_domain].node_spanned_pages =
+			init_node_data[numa_domain].node_end_pfn =
+				init_node_data[numa_domain].node_start_pfn +
 				size / PAGE_SIZE;
 		}
@@ -391,14 +386,6 @@ new_range:
 			node_set_online(i);
 	return 0;
-err:
-	/* Something has gone wrong; revert any setup we've done */
-	for_each_node(i) {
-		node_set_offline(i);
-		init_node_data[i].node_start_pfn = 0;
-		init_node_data[i].node_spanned_pages = 0;
-	}
-	return -1;
 }
 static void __init setup_nonnuma(void)
@@ -426,12 +413,11 @@ static void __init setup_nonnuma(void)
 	node_set_online(0);
 	init_node_data[0].node_start_pfn = 0;
-	init_node_data[0].node_spanned_pages = lmb_end_of_DRAM() / PAGE_SIZE;
+	init_node_data[0].node_end_pfn = lmb_end_of_DRAM() / PAGE_SIZE;
+	init_node_data[0].node_present_pages = total_ram / PAGE_SIZE;
 	for (i = 0 ; i < top_of_ram; i += MEMORY_INCREMENT)
 		numa_memory_lookup_table[i >> MEMORY_INCREMENT_SHIFT] = 0;
-
-	node0_io_hole_size = top_of_ram - total_ram;
 }
 static void __init dump_numa_topology(void)
@@ -512,6 +498,7 @@ static unsigned long careful_allocation(
 void __init do_init_bootmem(void)
 {
 	int nid;
+	struct device_node *memory = NULL;
 	static struct notifier_block ppc64_numa_nb = {
 		.notifier_call = cpu_numa_callback,
 		.priority = 1 /* Must run before sched domains notifier. */
@@ -535,7 +522,7 @@ void __init do_init_bootmem(void)
 		unsigned long bootmap_pages;
 		start_paddr = init_node_data[nid].node_start_pfn * PAGE_SIZE;
-		end_paddr = start_paddr + (init_node_data[nid].node_spanned_pages * PAGE_SIZE);
+		end_paddr = init_node_data[nid].node_end_pfn * PAGE_SIZE;
 		/* Allocate the node structure node local if possible */
 		NODE_DATA(nid) = (struct pglist_data *)careful_allocation(nid,
@@ -551,9 +538,9 @@ void __init do_init_bootmem(void)
 		NODE_DATA(nid)->node_start_pfn =
 			init_node_data[nid].node_start_pfn;
 		NODE_DATA(nid)->node_spanned_pages =
-			init_node_data[nid].node_spanned_pages;
+			end_paddr - start_paddr;
-		if (init_node_data[nid].node_spanned_pages == 0)
+		if (NODE_DATA(nid)->node_spanned_pages == 0)
 			continue;
 		dbg("start_paddr = %lx\n", start_paddr);
@@ -572,33 +559,48 @@ void __init do_init_bootmem(void)
 				  start_paddr >> PAGE_SHIFT,
 				  end_paddr >> PAGE_SHIFT);
-		for (i = 0; i < lmb.memory.cnt; i++) {
-			unsigned long physbase, size;
-
-			physbase = lmb.memory.region[i].physbase;
-			size = lmb.memory.region[i].size;
-
-			if (physbase < end_paddr &&
-			    (physbase+size) > start_paddr) {
-				/* overlaps */
-				if (physbase < start_paddr) {
-					size -= start_paddr - physbase;
-					physbase = start_paddr;
-				}
-
-				if (size > end_paddr - physbase)
-					size = end_paddr - physbase;
-
-				dbg("free_bootmem %lx %lx\n", physbase, size);
-				free_bootmem_node(NODE_DATA(nid), physbase,
-						  size);
+		/*
+		 * We need to do another scan of all memory sections to
+		 * associate memory with the correct node.
+		 */
+		memory = NULL;
+		while ((memory = of_find_node_by_type(memory, "memory")) != NULL) {
+			unsigned long mem_start, mem_size;
+			int numa_domain;
+			unsigned int *memcell_buf;
+			unsigned int len;
+
+			memcell_buf = (unsigned int *)get_property(memory, "reg", &len);
+			if (!memcell_buf || len <= 0)
+				continue;
+
+			mem_start = read_cell_ul(memory, &memcell_buf);
+			mem_size = read_cell_ul(memory, &memcell_buf);
+			numa_domain = of_node_numa_domain(memory);
+
+			if (numa_domain != nid)
+				continue;
+
+			if (mem_start < end_paddr &&
+			    (mem_start+mem_size) > start_paddr) {
+				/* should be no overlaps ! */
+				dbg("free_bootmem %lx %lx\n", mem_start, mem_size);
+				free_bootmem_node(NODE_DATA(nid), mem_start,
+						  mem_size);
 			}
 		}
+		/*
+		 * Mark reserved regions on this node
+		 */
 		for (i = 0; i < lmb.reserved.cnt; i++) {
 			unsigned long physbase = lmb.reserved.region[i].physbase;
 			unsigned long size = lmb.reserved.region[i].size;
+			if (pa_to_nid(physbase) != nid &&
+			    pa_to_nid(physbase+size-1) != nid)
+				continue;
+
 			if (physbase < end_paddr &&
 			    (physbase+size) > start_paddr) {
 				/* overlaps */
@@ -632,13 +634,12 @@ void __init paging_init(void)
 		unsigned long start_pfn;
 		unsigned long end_pfn;
-		start_pfn = plat_node_bdata[nid].node_boot_start >> PAGE_SHIFT;
-		end_pfn = plat_node_bdata[nid].node_low_pfn;
+		start_pfn = init_node_data[nid].node_start_pfn;
+		end_pfn = init_node_data[nid].node_end_pfn;
 		zones_size[ZONE_DMA] = end_pfn - start_pfn;
-		zholes_size[ZONE_DMA] = 0;
-		if (nid == 0)
-			zholes_size[ZONE_DMA] = node0_io_hole_size >> PAGE_SHIFT;
+		zholes_size[ZONE_DMA] = zones_size[ZONE_DMA] -
+			init_node_data[nid].node_present_pages;
 		dbg("free_area_init node %d %lx %lx (hole: %lx)\n", nid,
 		    zones_size[ZONE_DMA], start_pfn, zholes_size[ZONE_DMA]);

From ntl at pobox.com Wed Mar 2 12:47:01 2005
From: ntl at pobox.com (Nathan Lynch)
Date: Tue, 1 Mar 2005 19:47:01 -0600
Subject: [PATCH] explicitly bind idle tasks
In-Reply-To: <20050227144928.6c71adaf.akpm@osdl.org>
References: <20050227031655.67233bb5.akpm@osdl.org>
 <1109542971.14993.217.camel@gaston>
 <20050227144928.6c71adaf.akpm@osdl.org>
Message-ID: <20050302014701.GA5897@otto>

On Sun, Feb 27, 2005 at 02:49:28PM -0800, Andrew Morton wrote:
> Benjamin Herrenschmidt wrote:
> >
> > > -	if (cpu_is_offline(smp_processor_id()) &&
> > > +	if (cpu_is_offline(_smp_processor_id()) &&
> > >  	    system_state == SYSTEM_RUNNING)
> > >  		cpu_die();
> > >  }
> > > _
> >
> > This is the idle loop. Is that ever supposed to be preempted ?
>
> Nope, it's a false positive. We had to do the same in x86's idle loop and
> probably others will hit it.

Perhaps I'm missing something, but is there any reason we can't do
the following? I've tested it on ppc64, doesn't seem to break anything.

With hotplug cpu and preempt, we tend to see smp_processor_id warnings
from idle loop code because it's always checking whether its cpu has
gone offline. Replacing every use of smp_processor_id with
_smp_processor_id in all idle loop code is one solution; another way
is explicitly binding idle threads to their cpus (the smp_processor_id
warning does not fire if the caller is bound only to the calling cpu).
This has the (admittedly slight) advantage of letting us know if an
idle thread ever runs on the wrong cpu.

Signed-off-by: Nathan Lynch

Index: linux-2.6.11-rc5-mm1/init/main.c
===================================================================
--- linux-2.6.11-rc5-mm1.orig/init/main.c	2005-03-02 00:12:07.000000000 +0000
+++ linux-2.6.11-rc5-mm1/init/main.c	2005-03-02 00:53:04.000000000 +0000
@@ -638,6 +638,10 @@
 {
 	lock_kernel();
 	/*
+	 * init can run on any cpu.
+	 */
+	set_cpus_allowed(current, CPU_MASK_ALL);
+	/*
 	 * Tell the world that we're going to be the grim
 	 * reaper of innocent orphaned children.
 	 *
Index: linux-2.6.11-rc5-mm1/kernel/sched.c
===================================================================
--- linux-2.6.11-rc5-mm1.orig/kernel/sched.c	2005-03-02 00:12:07.000000000 +0000
+++ linux-2.6.11-rc5-mm1/kernel/sched.c	2005-03-02 00:47:14.000000000 +0000
@@ -4092,6 +4092,7 @@
 	idle->array = NULL;
 	idle->prio = MAX_PRIO;
 	idle->state = TASK_RUNNING;
+	idle->cpus_allowed = cpumask_of_cpu(cpu);
 	set_task_cpu(idle, cpu);
 	spin_lock_irqsave(&rq->lock, flags);

From zwane at arm.linux.org.uk Wed Mar 2 14:13:26 2005
From: zwane at arm.linux.org.uk (Zwane Mwaikambo)
Date: Tue, 1 Mar 2005 20:13:26 -0700 (MST)
Subject: [PATCH] explicitly bind idle tasks
In-Reply-To: <20050302014701.GA5897@otto>
References: <20050227031655.67233bb5.akpm@osdl.org>
 <1109542971.14993.217.camel@gaston>
 <20050227144928.6c71adaf.akpm@osdl.org>
 <20050302014701.GA5897@otto>
Message-ID:

On Tue, 1 Mar 2005, Nathan Lynch wrote:

> On Sun, Feb 27, 2005 at 02:49:28PM -0800, Andrew Morton wrote:
> > Benjamin Herrenschmidt wrote:
> > >
> > > > -	if (cpu_is_offline(smp_processor_id()) &&
> > > > +	if (cpu_is_offline(_smp_processor_id()) &&
> > > >  	    system_state == SYSTEM_RUNNING)
> > > >  		cpu_die();
> > > >  }
> > > > _
> > >
> > > This is the idle loop. Is that ever supposed to be preempted ?
> >
> > Nope, it's a false positive. We had to do the same in x86's idle loop and
> > probably others will hit it.
>
> Perhaps I'm missing something, but is there any reason we can't do
> the following? I've tested it on ppc64, doesn't seem to break anything.
>
> With hotplug cpu and preempt, we tend to see smp_processor_id warnings
> from idle loop code because it's always checking whether its cpu has
> gone offline. Replacing every use of smp_processor_id with
> _smp_processor_id in all idle loop code is one solution; another way
> is explicitly binding idle threads to their cpus (the smp_processor_id
> warning does not fire if the caller is bound only to the calling cpu).
> This has the (admittedly slight) advantage of letting us know if an
> idle thread ever runs on the wrong cpu.

Makes sense to me, for some reason i thought the smp_processor_id()
function did a cpu_rq->idle check of some sort.

Thanks,
	Zwane

From Derek.Fults at gd-ais.com Tue Mar 1 05:16:12 2005
From: Derek.Fults at gd-ais.com (Derek.Fults at gd-ais.com)
Date: Mon, 28 Feb 2005 12:16:12 -0600
Subject: CPU Freq Scaling
Message-ID: <1109614572.20610.41.camel@kato.gd-ais.com>

Hi All,

I'm looking for information on CPU frequency scaling of the 970. I've got
a request to clock it down to 500 SPECINTs. Is this currently being
worked on or down the pipe a ways?

Thanks for any info.
--
Derek L. Fults

From nacc at us.ibm.com Thu Mar 3 05:12:06 2005
From: nacc at us.ibm.com (Nishanth Aravamudan)
Date: Wed, 2 Mar 2005 10:12:06 -0800
Subject: eeh.h compile warnings / adbhid.c build failure
Message-ID: <20050302181206.GA2741@us.ibm.com>

Hi,

While building 2.6.11 for a G5, I noticed the following errors being spit
out (gcc 3.3.5):

include/asm/eeh.h: In function `eeh_memcpy_fromio':
include/asm/eeh.h:265: warning: statement with no effect
include/asm/eeh.h: In function `eeh_insb':
include/asm/eeh.h:353: warning: statement with no effect
include/asm/eeh.h: In function `eeh_insw_ns':
include/asm/eeh.h:360: warning: statement with no effect
include/asm/eeh.h: In function `eeh_insl_ns':
include/asm/eeh.h:367: warning: statement with no effect
include/asm/eeh.h: In function `eeh_memcpy_fromio':
include/asm/eeh.h:265: warning: statement with no effect
include/asm/eeh.h: In function `eeh_insb':
include/asm/eeh.h:353: warning: statement with no effect
include/asm/eeh.h: In function `eeh_insw_ns':
include/asm/eeh.h:360: warning: statement with no effect
include/asm/eeh.h: In function `eeh_insl_ns':
include/asm/eeh.h:367: warning: statement with no effect

These warnings are emitted for pretty much every driver. It looks like it
is because with CONFIG_EEH undefined (it's a pSeries thing?
-- my interpretation from looking at the ppc64 Kconfig),
eeh_check_failure() becomes #define'd to simply its second parameter,
which in the case of assignment statements is a statement with no effect.
It's not a big deal, the kernels still compile (with a patch to adbhid.c
which I'll mention in a second) but it's a lot of noise to be generated
because I don't have a pSeries machine...

Now, to the build-blocking code:

In drivers/macintosh/adbhid.c::1159:

static int __init adbhid_init(void)
{
#ifndef CONFIG_MAC
	if ( (_machine != _MACH_chrp) && (_machine != _MACH_Pmac) )
		return 0;
#endif
	...

I don't see CONFIG_MAC in my .config (attached below [1]), and _MACH_chrp
is not defined for ppc64 (it's in asm-ppc/processor.h, not in
ppc64/processor.h). I just removed the _MACH_chrp conditional in my local
code to get the kernel to build. I'm not sure what the actual solution
is, but I thought you all should know about it.

Thanks,
Nish

# # Automatically generated make config: don't edit # Linux kernel version: 2.6.11 # Wed Mar 2 09:27:02 2005 # CONFIG_64BIT=y CONFIG_MMU=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_GENERIC_ISA_DMA=y CONFIG_HAVE_DEC_LOCK=y CONFIG_EARLY_PRINTK=y CONFIG_COMPAT=y CONFIG_FRAME_POINTER=y CONFIG_FORCE_MAX_ZONEORDER=13 # # Code maturity level options # CONFIG_EXPERIMENTAL=y # CONFIG_CLEAN_COMPILE is not set CONFIG_BROKEN=y CONFIG_BROKEN_ON_SMP=y CONFIG_LOCK_KERNEL=y # # General setup # CONFIG_LOCALVERSION="" CONFIG_SWAP=y CONFIG_SYSVIPC=y # CONFIG_POSIX_MQUEUE is not set # CONFIG_BSD_PROCESS_ACCT is not set CONFIG_SYSCTL=y # CONFIG_AUDIT is not set CONFIG_LOG_BUF_SHIFT=17 CONFIG_HOTPLUG=y CONFIG_KOBJECT_UEVENT=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y # CONFIG_EMBEDDED is not set CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_ALL is not set # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_FUTEX=y CONFIG_EPOLL=y # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_SHMEM=y CONFIG_CC_ALIGN_FUNCTIONS=0 CONFIG_CC_ALIGN_LABELS=0
CONFIG_CC_ALIGN_LOOPS=0 CONFIG_CC_ALIGN_JUMPS=0 # CONFIG_TINY_SHMEM is not set # # Loadable module support # CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y # CONFIG_MODULE_FORCE_UNLOAD is not set CONFIG_OBSOLETE_MODPARM=y # CONFIG_MODVERSIONS is not set # CONFIG_MODULE_SRCVERSION_ALL is not set # CONFIG_KMOD is not set CONFIG_STOP_MACHINE=y CONFIG_SYSVIPC_COMPAT=y # # Platform support # # CONFIG_PPC_ISERIES is not set CONFIG_PPC_MULTIPLATFORM=y # CONFIG_PPC_PSERIES is not set CONFIG_PPC_PMAC=y # CONFIG_PPC_MAPLE is not set CONFIG_PPC=y CONFIG_PPC64=y CONFIG_PPC_OF=y CONFIG_ALTIVEC=y CONFIG_U3_DART=y CONFIG_PPC_PMAC64=y CONFIG_BOOTX_TEXT=y # CONFIG_POWER4_ONLY is not set # CONFIG_IOMMU_VMERGE is not set CONFIG_SMP=y CONFIG_NR_CPUS=2 CONFIG_SCHED_SMT=y CONFIG_PREEMPT=y CONFIG_PREEMPT_BKL=y CONFIG_GENERIC_HARDIRQS=y # # General setup # CONFIG_PCI=y CONFIG_PCI_DOMAINS=y CONFIG_BINFMT_ELF=y CONFIG_BINFMT_MISC=y CONFIG_PCI_LEGACY_PROC=y CONFIG_PCI_NAMES=y # # PCCARD (PCMCIA/CardBus) support # # CONFIG_PCCARD is not set # # PC-card bridges # # # PCI Hotplug Support # # CONFIG_HOTPLUG_PCI is not set CONFIG_PROC_DEVICETREE=y CONFIG_CMDLINE_BOOL=y CONFIG_CMDLINE="console=ttyS0,9600 console=tty0 root=/dev/sda2" # # Device Drivers # # # Generic Driver Options # # CONFIG_STANDALONE is not set # CONFIG_PREVENT_FIRMWARE_BUILD is not set CONFIG_FW_LOADER=y # CONFIG_DEBUG_DRIVER is not set # # Memory Technology Devices (MTD) # # CONFIG_MTD is not set # # Parallel port support # # CONFIG_PARPORT is not set # # Plug and Play support # # # Block devices # # CONFIG_BLK_DEV_FD is not set # CONFIG_BLK_CPQ_DA is not set # CONFIG_BLK_CPQ_CISS_DA is not set # CONFIG_BLK_DEV_DAC960 is not set # CONFIG_BLK_DEV_UMEM is not set # CONFIG_BLK_DEV_COW_COMMON is not set CONFIG_BLK_DEV_LOOP=y # CONFIG_BLK_DEV_CRYPTOLOOP is not set # CONFIG_BLK_DEV_NBD is not set # CONFIG_BLK_DEV_SX8 is not set # CONFIG_BLK_DEV_UB is not set CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_COUNT=16 CONFIG_BLK_DEV_RAM_SIZE=4096 
CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="" # CONFIG_CDROM_PKTCDVD is not set # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y # CONFIG_ATA_OVER_ETH is not set # # ATA/ATAPI/MFM/RLL support # CONFIG_IDE=y CONFIG_BLK_DEV_IDE=y # # Please see Documentation/ide.txt for help/info on IDE drives # # CONFIG_BLK_DEV_IDE_SATA is not set CONFIG_BLK_DEV_IDEDISK=y # CONFIG_IDEDISK_MULTI_MODE is not set CONFIG_BLK_DEV_IDECD=y # CONFIG_BLK_DEV_IDETAPE is not set # CONFIG_BLK_DEV_IDEFLOPPY is not set # CONFIG_BLK_DEV_IDESCSI is not set # CONFIG_IDE_TASK_IOCTL is not set # # IDE chipset support/bugfixes # CONFIG_IDE_GENERIC=y CONFIG_BLK_DEV_IDEPCI=y CONFIG_IDEPCI_SHARE_IRQ=y # CONFIG_BLK_DEV_OFFBOARD is not set CONFIG_BLK_DEV_GENERIC=y # CONFIG_BLK_DEV_OPTI621 is not set # CONFIG_BLK_DEV_SL82C105 is not set CONFIG_BLK_DEV_IDEDMA_PCI=y # CONFIG_BLK_DEV_IDEDMA_FORCED is not set CONFIG_IDEDMA_PCI_AUTO=y # CONFIG_IDEDMA_ONLYDISK is not set # CONFIG_BLK_DEV_AEC62XX is not set # CONFIG_BLK_DEV_ALI15X3 is not set # CONFIG_BLK_DEV_AMD74XX is not set # CONFIG_BLK_DEV_CMD64X is not set # CONFIG_BLK_DEV_TRIFLEX is not set # CONFIG_BLK_DEV_CY82C693 is not set # CONFIG_BLK_DEV_CS5520 is not set # CONFIG_BLK_DEV_CS5530 is not set # CONFIG_BLK_DEV_HPT34X is not set # CONFIG_BLK_DEV_HPT366 is not set # CONFIG_BLK_DEV_SC1200 is not set # CONFIG_BLK_DEV_PIIX is not set # CONFIG_BLK_DEV_NS87415 is not set # CONFIG_BLK_DEV_PDC202XX_OLD is not set # CONFIG_BLK_DEV_PDC202XX_NEW is not set # CONFIG_BLK_DEV_SVWKS is not set # CONFIG_BLK_DEV_SIIMAGE is not set # CONFIG_BLK_DEV_SLC90E66 is not set # CONFIG_BLK_DEV_TRM290 is not set # CONFIG_BLK_DEV_VIA82CXXX is not set CONFIG_BLK_DEV_IDE_PMAC=y CONFIG_BLK_DEV_IDE_PMAC_ATA100FIRST=y CONFIG_BLK_DEV_IDEDMA_PMAC=y # CONFIG_BLK_DEV_IDE_PMAC_BLINK is not set # CONFIG_IDE_ARM is not set CONFIG_BLK_DEV_IDEDMA=y # CONFIG_IDEDMA_IVB is not set CONFIG_IDEDMA_AUTO=y # CONFIG_BLK_DEV_HD is not set # # 
SCSI device support # CONFIG_SCSI=y CONFIG_SCSI_PROC_FS=y # # SCSI support type (disk, tape, CD-ROM) # CONFIG_BLK_DEV_SD=y # CONFIG_CHR_DEV_ST is not set # CONFIG_CHR_DEV_OSST is not set CONFIG_BLK_DEV_SR=y CONFIG_BLK_DEV_SR_VENDOR=y CONFIG_CHR_DEV_SG=y # # Some SCSI devices (e.g. CD jukebox) support multiple LUNs # CONFIG_SCSI_MULTI_LUN=y CONFIG_SCSI_CONSTANTS=y # CONFIG_SCSI_LOGGING is not set # # SCSI Transport Attributes # CONFIG_SCSI_SPI_ATTRS=y CONFIG_SCSI_FC_ATTRS=y # CONFIG_SCSI_ISCSI_ATTRS is not set # # SCSI low-level drivers # # CONFIG_BLK_DEV_3W_XXXX_RAID is not set # CONFIG_SCSI_3W_9XXX is not set # CONFIG_SCSI_ACARD is not set # CONFIG_SCSI_AACRAID is not set # CONFIG_SCSI_AIC7XXX is not set # CONFIG_SCSI_AIC7XXX_OLD is not set # CONFIG_SCSI_AIC79XX is not set # CONFIG_SCSI_ADVANSYS is not set # CONFIG_MEGARAID_NEWGEN is not set # CONFIG_MEGARAID_LEGACY is not set CONFIG_SCSI_SATA=y # CONFIG_SCSI_SATA_AHCI is not set CONFIG_SCSI_SATA_SVW=y # CONFIG_SCSI_ATA_PIIX is not set # CONFIG_SCSI_SATA_NV is not set # CONFIG_SCSI_SATA_PROMISE is not set # CONFIG_SCSI_SATA_QSTOR is not set # CONFIG_SCSI_SATA_SX4 is not set # CONFIG_SCSI_SATA_SIL is not set # CONFIG_SCSI_SATA_SIS is not set # CONFIG_SCSI_SATA_ULI is not set # CONFIG_SCSI_SATA_VIA is not set # CONFIG_SCSI_SATA_VITESSE is not set # CONFIG_SCSI_BUSLOGIC is not set # CONFIG_SCSI_CPQFCTS is not set # CONFIG_SCSI_DMX3191D is not set # CONFIG_SCSI_EATA is not set # CONFIG_SCSI_EATA_PIO is not set # CONFIG_SCSI_FUTURE_DOMAIN is not set # CONFIG_SCSI_GDTH is not set # CONFIG_SCSI_IPS is not set # CONFIG_SCSI_INITIO is not set # CONFIG_SCSI_INIA100 is not set # CONFIG_SCSI_SYM53C8XX_2 is not set # CONFIG_SCSI_IPR is not set # CONFIG_SCSI_PCI2000 is not set # CONFIG_SCSI_PCI2220I is not set # CONFIG_SCSI_QLOGIC_ISP is not set # CONFIG_SCSI_QLOGIC_FC is not set # CONFIG_SCSI_QLOGIC_1280 is not set CONFIG_SCSI_QLA2XXX=y # CONFIG_SCSI_QLA21XX is not set # CONFIG_SCSI_QLA22XX is not set # CONFIG_SCSI_QLA2300 is 
not set # CONFIG_SCSI_QLA2322 is not set # CONFIG_SCSI_QLA6312 is not set # CONFIG_SCSI_DC395x is not set # CONFIG_SCSI_DC390T is not set # CONFIG_SCSI_DEBUG is not set # # Multi-device support (RAID and LVM) # # CONFIG_MD is not set # # Fusion MPT device support # # CONFIG_FUSION is not set # # IEEE 1394 (FireWire) support # CONFIG_IEEE1394=y # # Subsystem Options # # CONFIG_IEEE1394_VERBOSEDEBUG is not set # CONFIG_IEEE1394_OUI_DB is not set CONFIG_IEEE1394_EXTRA_CONFIG_ROMS=y CONFIG_IEEE1394_CONFIG_ROM_IP1394=y # # Device Drivers # # CONFIG_IEEE1394_PCILYNX is not set CONFIG_IEEE1394_OHCI1394=y # # Protocol Drivers # CONFIG_IEEE1394_VIDEO1394=y CONFIG_IEEE1394_SBP2=y # CONFIG_IEEE1394_SBP2_PHYS_DMA is not set CONFIG_IEEE1394_ETH1394=y CONFIG_IEEE1394_DV1394=y CONFIG_IEEE1394_RAWIO=y CONFIG_IEEE1394_CMP=y # CONFIG_IEEE1394_AMDTP is not set # # I2O device support # CONFIG_I2O=y CONFIG_I2O_CONFIG=y CONFIG_I2O_BLOCK=y CONFIG_I2O_SCSI=y CONFIG_I2O_PROC=y # # Macintosh device drivers # CONFIG_ADB=y CONFIG_ADB_PMU=y # CONFIG_PMAC_PBOOK is not set # CONFIG_PMAC_BACKLIGHT is not set # CONFIG_MAC_SERIAL is not set CONFIG_INPUT_ADBHID=y CONFIG_MAC_EMUMOUSEBTN=y CONFIG_THERM_PM72=y # # Networking support # CONFIG_NET=y # # Networking options # CONFIG_PACKET=y # CONFIG_PACKET_MMAP is not set # CONFIG_NETLINK_DEV is not set CONFIG_UNIX=y CONFIG_NET_KEY=y CONFIG_INET=y CONFIG_IP_MULTICAST=y # CONFIG_IP_ADVANCED_ROUTER is not set # CONFIG_IP_PNP is not set CONFIG_NET_IPIP=y # CONFIG_NET_IPGRE is not set # CONFIG_IP_MROUTE is not set # CONFIG_ARPD is not set CONFIG_SYN_COOKIES=y CONFIG_INET_AH=y CONFIG_INET_ESP=y CONFIG_INET_IPCOMP=y CONFIG_INET_TUNNEL=y CONFIG_IP_TCPDIAG=y # CONFIG_IP_TCPDIAG_IPV6 is not set # # IP: Virtual Server Configuration # # CONFIG_IP_VS is not set # CONFIG_IPV6 is not set CONFIG_NETFILTER=y # CONFIG_NETFILTER_DEBUG is not set # # IP: Netfilter Configuration # CONFIG_IP_NF_CONNTRACK=y # CONFIG_IP_NF_CT_ACCT is not set # CONFIG_IP_NF_CONNTRACK_MARK is not 
set # CONFIG_IP_NF_CT_PROTO_SCTP is not set CONFIG_IP_NF_FTP=y CONFIG_IP_NF_IRC=y CONFIG_IP_NF_TFTP=y CONFIG_IP_NF_AMANDA=y CONFIG_IP_NF_QUEUE=y CONFIG_IP_NF_IPTABLES=y CONFIG_IP_NF_MATCH_LIMIT=y CONFIG_IP_NF_MATCH_IPRANGE=y CONFIG_IP_NF_MATCH_MAC=y CONFIG_IP_NF_MATCH_PKTTYPE=y CONFIG_IP_NF_MATCH_MARK=y CONFIG_IP_NF_MATCH_MULTIPORT=y CONFIG_IP_NF_MATCH_TOS=y CONFIG_IP_NF_MATCH_RECENT=y CONFIG_IP_NF_MATCH_ECN=y CONFIG_IP_NF_MATCH_DSCP=y CONFIG_IP_NF_MATCH_AH_ESP=y CONFIG_IP_NF_MATCH_LENGTH=y CONFIG_IP_NF_MATCH_TTL=y CONFIG_IP_NF_MATCH_TCPMSS=y CONFIG_IP_NF_MATCH_HELPER=y CONFIG_IP_NF_MATCH_STATE=y CONFIG_IP_NF_MATCH_CONNTRACK=y CONFIG_IP_NF_MATCH_OWNER=y # CONFIG_IP_NF_MATCH_ADDRTYPE is not set # CONFIG_IP_NF_MATCH_REALM is not set # CONFIG_IP_NF_MATCH_SCTP is not set # CONFIG_IP_NF_MATCH_COMMENT is not set # CONFIG_IP_NF_MATCH_HASHLIMIT is not set CONFIG_IP_NF_FILTER=y CONFIG_IP_NF_TARGET_REJECT=y CONFIG_IP_NF_TARGET_LOG=y CONFIG_IP_NF_TARGET_ULOG=y CONFIG_IP_NF_TARGET_TCPMSS=y CONFIG_IP_NF_NAT=y CONFIG_IP_NF_NAT_NEEDED=y CONFIG_IP_NF_TARGET_MASQUERADE=y CONFIG_IP_NF_TARGET_REDIRECT=y CONFIG_IP_NF_TARGET_NETMAP=y CONFIG_IP_NF_TARGET_SAME=y CONFIG_IP_NF_NAT_SNMP_BASIC=y CONFIG_IP_NF_NAT_IRC=y CONFIG_IP_NF_NAT_FTP=y CONFIG_IP_NF_NAT_TFTP=y CONFIG_IP_NF_NAT_AMANDA=y CONFIG_IP_NF_MANGLE=y CONFIG_IP_NF_TARGET_TOS=y CONFIG_IP_NF_TARGET_ECN=y CONFIG_IP_NF_TARGET_DSCP=y CONFIG_IP_NF_TARGET_MARK=y CONFIG_IP_NF_TARGET_CLASSIFY=y # CONFIG_IP_NF_RAW is not set CONFIG_IP_NF_ARPTABLES=y CONFIG_IP_NF_ARPFILTER=y CONFIG_IP_NF_ARP_MANGLE=y CONFIG_XFRM=y CONFIG_XFRM_USER=y # # SCTP Configuration (EXPERIMENTAL) # # CONFIG_IP_SCTP is not set # CONFIG_ATM is not set # CONFIG_BRIDGE is not set # CONFIG_VLAN_8021Q is not set # CONFIG_DECNET is not set # CONFIG_LLC2 is not set # CONFIG_IPX is not set # CONFIG_ATALK is not set # CONFIG_X25 is not set # CONFIG_LAPB is not set # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set # # QoS and/or fair 
queueing # # CONFIG_NET_SCHED is not set # CONFIG_NET_CLS_ROUTE is not set # # Network testing # # CONFIG_NET_PKTGEN is not set CONFIG_NETPOLL=y # CONFIG_NETPOLL_RX is not set # CONFIG_NETPOLL_TRAP is not set CONFIG_NET_POLL_CONTROLLER=y # CONFIG_HAMRADIO is not set # CONFIG_IRDA is not set # CONFIG_BT is not set CONFIG_NETDEVICES=y CONFIG_DUMMY=y # CONFIG_BONDING is not set # CONFIG_EQUALIZER is not set # CONFIG_TUN is not set # # ARCnet devices # # CONFIG_ARCNET is not set # # Ethernet (10 or 100Mbit) # CONFIG_NET_ETHERNET=y CONFIG_MII=y # CONFIG_OAKNET is not set # CONFIG_HAPPYMEAL is not set CONFIG_SUNGEM=y # CONFIG_NET_VENDOR_3COM is not set # # Tulip family network device support # # CONFIG_NET_TULIP is not set # CONFIG_HP100 is not set # CONFIG_NET_PCI is not set # # Ethernet (1000 Mbit) # # CONFIG_ACENIC is not set # CONFIG_DL2K is not set # CONFIG_E1000 is not set # CONFIG_NS83820 is not set # CONFIG_HAMACHI is not set # CONFIG_YELLOWFIN is not set # CONFIG_R8169 is not set # CONFIG_SK98LIN is not set # CONFIG_TIGON3 is not set # # Ethernet (10000 Mbit) # # CONFIG_IXGB is not set # CONFIG_S2IO is not set # # Token Ring devices # # CONFIG_TR is not set # # Wireless LAN (non-hamradio) # # CONFIG_NET_RADIO is not set # # Wan interfaces # # CONFIG_WAN is not set # CONFIG_FDDI is not set # CONFIG_HIPPI is not set # CONFIG_PPP is not set # CONFIG_SLIP is not set # CONFIG_NET_FC is not set # CONFIG_SHAPER is not set CONFIG_NETCONSOLE=y # # ISDN subsystem # # CONFIG_ISDN is not set # # Telephony Support # # CONFIG_PHONE is not set # # Input device support # CONFIG_INPUT=y # # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y CONFIG_INPUT_MOUSEDEV_PSAUX=y CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 # CONFIG_INPUT_JOYDEV is not set # CONFIG_INPUT_TSDEV is not set # CONFIG_INPUT_EVDEV is not set # CONFIG_INPUT_EVBUG is not set # # Input I/O drivers # # CONFIG_GAMEPORT is not set CONFIG_SOUND_GAMEPORT=y CONFIG_SERIO=y CONFIG_SERIO_I8042=y 
CONFIG_SERIO_SERPORT=y # CONFIG_SERIO_CT82C710 is not set # CONFIG_SERIO_PCIPS2 is not set CONFIG_SERIO_LIBPS2=y # CONFIG_SERIO_RAW is not set # # Input Device Drivers # CONFIG_INPUT_KEYBOARD=y CONFIG_KEYBOARD_ATKBD=y # CONFIG_KEYBOARD_SUNKBD is not set # CONFIG_KEYBOARD_LKKBD is not set # CONFIG_KEYBOARD_XTKBD is not set # CONFIG_KEYBOARD_NEWTON is not set CONFIG_INPUT_MOUSE=y CONFIG_MOUSE_PS2=y # CONFIG_MOUSE_SERIAL is not set # CONFIG_MOUSE_VSXXXAA is not set # CONFIG_INPUT_JOYSTICK is not set # CONFIG_INPUT_TOUCHSCREEN is not set # CONFIG_INPUT_MISC is not set # # Character devices # CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_HW_CONSOLE=y # CONFIG_SERIAL_NONSTANDARD is not set # # Serial drivers # CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_8250_NR_UARTS=4 # CONFIG_SERIAL_8250_EXTENDED is not set # # Non-8250 serial port support # CONFIG_SERIAL_CORE=y CONFIG_SERIAL_CORE_CONSOLE=y CONFIG_SERIAL_PMACZILOG=y CONFIG_SERIAL_PMACZILOG_CONSOLE=y CONFIG_UNIX98_PTYS=y CONFIG_LEGACY_PTYS=y CONFIG_LEGACY_PTY_COUNT=256 # # IPMI # # CONFIG_IPMI_HANDLER is not set # # Watchdog Cards # # CONFIG_WATCHDOG is not set # CONFIG_RTC is not set # CONFIG_GEN_RTC is not set # CONFIG_DTLK is not set # CONFIG_R3964 is not set # CONFIG_APPLICOM is not set # # Ftape, the floppy tape device driver # CONFIG_DRM=y # CONFIG_DRM_TDFX is not set # CONFIG_DRM_GAMMA is not set # CONFIG_DRM_R128 is not set CONFIG_DRM_RADEON=y # CONFIG_RAW_DRIVER is not set # # I2C support # CONFIG_I2C=y CONFIG_I2C_CHARDEV=y # # I2C Algorithms # CONFIG_I2C_ALGOBIT=y CONFIG_I2C_ALGOPCF=y CONFIG_I2C_ALGOPCA=y # # I2C Hardware Bus support # # CONFIG_I2C_ALI1535 is not set # CONFIG_I2C_ALI1563 is not set # CONFIG_I2C_ALI15X3 is not set # CONFIG_I2C_AMD756 is not set # CONFIG_I2C_AMD8111 is not set # CONFIG_I2C_I801 is not set # CONFIG_I2C_I810 is not set # CONFIG_I2C_ISA is not set CONFIG_I2C_KEYWEST=y # CONFIG_I2C_MPC is not set # CONFIG_I2C_NFORCE2 is not set # CONFIG_I2C_PARPORT_LIGHT is not set # 
CONFIG_I2C_PROSAVAGE is not set # CONFIG_I2C_SAVAGE4 is not set # CONFIG_SCx200_ACB is not set # CONFIG_I2C_SIS5595 is not set # CONFIG_I2C_SIS630 is not set # CONFIG_I2C_SIS96X is not set # CONFIG_I2C_STUB is not set # CONFIG_I2C_VIA is not set # CONFIG_I2C_VIAPRO is not set # CONFIG_I2C_VOODOO3 is not set # CONFIG_I2C_PCA_ISA is not set # # Hardware Sensors Chip support # # CONFIG_I2C_SENSOR is not set # CONFIG_SENSORS_ADM1021 is not set # CONFIG_SENSORS_ADM1025 is not set # CONFIG_SENSORS_ADM1026 is not set # CONFIG_SENSORS_ADM1031 is not set # CONFIG_SENSORS_ASB100 is not set # CONFIG_SENSORS_DS1621 is not set # CONFIG_SENSORS_FSCHER is not set # CONFIG_SENSORS_GL518SM is not set # CONFIG_SENSORS_IT87 is not set # CONFIG_SENSORS_LM63 is not set # CONFIG_SENSORS_LM75 is not set # CONFIG_SENSORS_LM77 is not set # CONFIG_SENSORS_LM78 is not set # CONFIG_SENSORS_LM80 is not set # CONFIG_SENSORS_LM83 is not set # CONFIG_SENSORS_LM85 is not set # CONFIG_SENSORS_LM87 is not set # CONFIG_SENSORS_LM90 is not set # CONFIG_SENSORS_MAX1619 is not set # CONFIG_SENSORS_PC87360 is not set # CONFIG_SENSORS_SMSC47B397 is not set # CONFIG_SENSORS_SMSC47M1 is not set # CONFIG_SENSORS_VIA686A is not set # CONFIG_SENSORS_W83781D is not set # CONFIG_SENSORS_W83L785TS is not set # CONFIG_SENSORS_W83627HF is not set # # Other I2C Chip support # # CONFIG_SENSORS_EEPROM is not set # CONFIG_SENSORS_PCF8574 is not set # CONFIG_SENSORS_PCF8591 is not set # CONFIG_SENSORS_RTC8564 is not set # CONFIG_I2C_DEBUG_CORE is not set # CONFIG_I2C_DEBUG_ALGO is not set # CONFIG_I2C_DEBUG_BUS is not set # CONFIG_I2C_DEBUG_CHIP is not set # # Dallas's 1-wire bus # # CONFIG_W1 is not set # # Misc devices # # # Multimedia devices # # CONFIG_VIDEO_DEV is not set # # Digital Video Broadcasting Devices # # CONFIG_DVB is not set # # Graphics support # CONFIG_FB=y CONFIG_FB_MODE_HELPERS=y # CONFIG_FB_TILEBLITTING is not set # CONFIG_FB_CIRRUS is not set # CONFIG_FB_PM2 is not set # CONFIG_FB_CYBER2000 is not 
set CONFIG_FB_OF=y # CONFIG_FB_CONTROL is not set # CONFIG_FB_PLATINUM is not set # CONFIG_FB_VALKYRIE is not set # CONFIG_FB_CT65550 is not set # CONFIG_FB_ASILIANT is not set # CONFIG_FB_IMSTT is not set # CONFIG_FB_S3TRIO is not set # CONFIG_FB_VGA16 is not set # CONFIG_FB_RIVA is not set # CONFIG_FB_MATROX is not set # CONFIG_FB_RADEON_OLD is not set CONFIG_FB_RADEON=y CONFIG_FB_RADEON_I2C=y # CONFIG_FB_RADEON_DEBUG is not set # CONFIG_FB_ATY128 is not set # CONFIG_FB_ATY is not set # CONFIG_FB_SAVAGE is not set # CONFIG_FB_SIS is not set # CONFIG_FB_NEOMAGIC is not set # CONFIG_FB_KYRO is not set # CONFIG_FB_3DFX is not set # CONFIG_FB_VOODOO1 is not set # CONFIG_FB_TRIDENT is not set # CONFIG_FB_PM3 is not set # CONFIG_FB_VIRTUAL is not set # # Console display driver support # CONFIG_VGA_CONSOLE=y CONFIG_DUMMY_CONSOLE=y CONFIG_FRAMEBUFFER_CONSOLE=y # CONFIG_FONTS is not set CONFIG_FONT_8x8=y CONFIG_FONT_8x16=y # # Logo configuration # CONFIG_LOGO=y CONFIG_LOGO_LINUX_MONO=y CONFIG_LOGO_LINUX_VGA16=y CONFIG_LOGO_LINUX_CLUT224=y # CONFIG_BACKLIGHT_LCD_SUPPORT is not set # # Sound # # CONFIG_SOUND is not set # # USB support # CONFIG_USB=y # CONFIG_USB_DEBUG is not set # # Miscellaneous USB options # CONFIG_USB_DEVICEFS=y # CONFIG_USB_BANDWIDTH is not set # CONFIG_USB_DYNAMIC_MINORS is not set # CONFIG_USB_OTG is not set CONFIG_USB_ARCH_HAS_HCD=y CONFIG_USB_ARCH_HAS_OHCI=y # # USB Host Controller Drivers # CONFIG_USB_EHCI_HCD=y CONFIG_USB_EHCI_SPLIT_ISO=y CONFIG_USB_EHCI_ROOT_HUB_TT=y CONFIG_USB_OHCI_HCD=y # CONFIG_USB_UHCI_HCD is not set # CONFIG_USB_SL811_HCD is not set # # USB Device Class drivers # # CONFIG_USB_BLUETOOTH_TTY is not set # CONFIG_USB_ACM is not set # CONFIG_USB_PRINTER is not set # # NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support' may also be needed; see USB_STORAGE Help for more information # # CONFIG_USB_STORAGE is not set # # USB Input Devices # CONFIG_USB_HID=y CONFIG_USB_HIDINPUT=y # CONFIG_HID_FF is not set CONFIG_USB_HIDDEV=y # 
CONFIG_USB_AIPTEK is not set # CONFIG_USB_WACOM is not set # CONFIG_USB_KBTAB is not set # CONFIG_USB_POWERMATE is not set # CONFIG_USB_MTOUCH is not set # CONFIG_USB_EGALAX is not set # CONFIG_USB_XPAD is not set # CONFIG_USB_ATI_REMOTE is not set # # USB Imaging devices # # CONFIG_USB_MDC800 is not set # CONFIG_USB_MICROTEK is not set # CONFIG_USB_HPUSBSCSI is not set # # USB Multimedia devices # # CONFIG_USB_DABUSB is not set # # Video4Linux support is needed for USB Multimedia device support # # # USB Network Adapters # # CONFIG_USB_CATC is not set # CONFIG_USB_KAWETH is not set # CONFIG_USB_PEGASUS is not set # CONFIG_USB_RTL8150 is not set # CONFIG_USB_USBNET is not set # # USB port drivers # # # USB Serial Converter support # # CONFIG_USB_SERIAL is not set # # USB Miscellaneous drivers # # CONFIG_USB_EMI62 is not set # CONFIG_USB_EMI26 is not set # CONFIG_USB_AUERSWALD is not set # CONFIG_USB_RIO500 is not set # CONFIG_USB_LEGOTOWER is not set # CONFIG_USB_LCD is not set # CONFIG_USB_LED is not set # CONFIG_USB_CYTHERM is not set # CONFIG_USB_PHIDGETKIT is not set # CONFIG_USB_PHIDGETSERVO is not set # CONFIG_USB_IDMOUSE is not set # CONFIG_USB_TEST is not set # # USB ATM/DSL drivers # # # USB Gadget Support # # CONFIG_USB_GADGET is not set # # MMC/SD Card support # # CONFIG_MMC is not set # # InfiniBand support # # CONFIG_INFINIBAND is not set # # File systems # # CONFIG_EXT2_FS is not set CONFIG_EXT3_FS=y CONFIG_EXT3_FS_XATTR=y CONFIG_EXT3_FS_POSIX_ACL=y # CONFIG_EXT3_FS_SECURITY is not set CONFIG_JBD=y # CONFIG_JBD_DEBUG is not set CONFIG_FS_MBCACHE=y # CONFIG_REISERFS_FS is not set # CONFIG_JFS_FS is not set CONFIG_FS_POSIX_ACL=y # # XFS support # # CONFIG_XFS_FS is not set # CONFIG_MINIX_FS is not set # CONFIG_ROMFS_FS is not set # CONFIG_QUOTA is not set CONFIG_DNOTIFY=y # CONFIG_AUTOFS_FS is not set CONFIG_AUTOFS4_FS=y # # CD-ROM/DVD Filesystems # CONFIG_ISO9660_FS=y # CONFIG_JOLIET is not set # CONFIG_ZISOFS is not set CONFIG_UDF_FS=y 
CONFIG_UDF_NLS=y # # DOS/FAT/NT Filesystems # # CONFIG_MSDOS_FS is not set # CONFIG_VFAT_FS is not set # CONFIG_NTFS_FS is not set # # Pseudo filesystems # CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_SYSFS=y # CONFIG_DEVFS_FS is not set CONFIG_DEVPTS_FS_XATTR=y # CONFIG_DEVPTS_FS_SECURITY is not set CONFIG_TMPFS=y # CONFIG_TMPFS_XATTR is not set CONFIG_HUGETLBFS=y CONFIG_HUGETLB_PAGE=y CONFIG_RAMFS=y # # Miscellaneous filesystems # # CONFIG_ADFS_FS is not set # CONFIG_AFFS_FS is not set CONFIG_HFS_FS=y CONFIG_HFSPLUS_FS=y # CONFIG_BEFS_FS is not set # CONFIG_BFS_FS is not set # CONFIG_EFS_FS is not set CONFIG_CRAMFS=y # CONFIG_VXFS_FS is not set # CONFIG_HPFS_FS is not set # CONFIG_QNX4FS_FS is not set # CONFIG_SYSV_FS is not set # CONFIG_UFS_FS is not set # # Network File Systems # CONFIG_NFS_FS=y CONFIG_NFS_V3=y CONFIG_NFS_V4=y CONFIG_NFS_DIRECTIO=y # CONFIG_NFSD is not set CONFIG_LOCKD=y CONFIG_LOCKD_V4=y CONFIG_SUNRPC=y CONFIG_SUNRPC_GSS=y CONFIG_RPCSEC_GSS_KRB5=y # CONFIG_RPCSEC_GSS_SPKM3 is not set # CONFIG_SMB_FS is not set # CONFIG_CIFS is not set # CONFIG_NCP_FS is not set # CONFIG_CODA_FS is not set # CONFIG_AFS_FS is not set # # Partition Types # CONFIG_PARTITION_ADVANCED=y # CONFIG_ACORN_PARTITION is not set # CONFIG_OSF_PARTITION is not set # CONFIG_AMIGA_PARTITION is not set # CONFIG_ATARI_PARTITION is not set CONFIG_MAC_PARTITION=y CONFIG_MSDOS_PARTITION=y # CONFIG_BSD_DISKLABEL is not set # CONFIG_MINIX_SUBPARTITION is not set # CONFIG_SOLARIS_X86_PARTITION is not set # CONFIG_UNIXWARE_DISKLABEL is not set # CONFIG_LDM_PARTITION is not set # CONFIG_SGI_PARTITION is not set # CONFIG_ULTRIX_PARTITION is not set # CONFIG_SUN_PARTITION is not set # CONFIG_EFI_PARTITION is not set # # Native Language Support # CONFIG_NLS=y CONFIG_NLS_DEFAULT="iso8859-1" CONFIG_NLS_CODEPAGE_437=y # CONFIG_NLS_CODEPAGE_737 is not set # CONFIG_NLS_CODEPAGE_775 is not set # CONFIG_NLS_CODEPAGE_850 is not set # CONFIG_NLS_CODEPAGE_852 is not set # CONFIG_NLS_CODEPAGE_855 is 
not set # CONFIG_NLS_CODEPAGE_857 is not set # CONFIG_NLS_CODEPAGE_860 is not set # CONFIG_NLS_CODEPAGE_861 is not set # CONFIG_NLS_CODEPAGE_862 is not set # CONFIG_NLS_CODEPAGE_863 is not set # CONFIG_NLS_CODEPAGE_864 is not set # CONFIG_NLS_CODEPAGE_865 is not set # CONFIG_NLS_CODEPAGE_866 is not set # CONFIG_NLS_CODEPAGE_869 is not set # CONFIG_NLS_CODEPAGE_936 is not set # CONFIG_NLS_CODEPAGE_950 is not set # CONFIG_NLS_CODEPAGE_932 is not set # CONFIG_NLS_CODEPAGE_949 is not set # CONFIG_NLS_CODEPAGE_874 is not set # CONFIG_NLS_ISO8859_8 is not set # CONFIG_NLS_CODEPAGE_1250 is not set # CONFIG_NLS_CODEPAGE_1251 is not set CONFIG_NLS_ASCII=y CONFIG_NLS_ISO8859_1=y # CONFIG_NLS_ISO8859_2 is not set # CONFIG_NLS_ISO8859_3 is not set # CONFIG_NLS_ISO8859_4 is not set # CONFIG_NLS_ISO8859_5 is not set # CONFIG_NLS_ISO8859_6 is not set # CONFIG_NLS_ISO8859_7 is not set # CONFIG_NLS_ISO8859_9 is not set # CONFIG_NLS_ISO8859_13 is not set # CONFIG_NLS_ISO8859_14 is not set CONFIG_NLS_ISO8859_15=y # CONFIG_NLS_KOI8_R is not set # CONFIG_NLS_KOI8_U is not set CONFIG_NLS_UTF8=y # # Profiling support # # CONFIG_PROFILING is not set # # Kernel hacking # CONFIG_DEBUG_KERNEL=y CONFIG_MAGIC_SYSRQ=y # CONFIG_SCHEDSTATS is not set # CONFIG_DEBUG_SLAB is not set CONFIG_DEBUG_PREEMPT=y CONFIG_DEBUG_SPINLOCK_SLEEP=y # CONFIG_DEBUG_KOBJECT is not set CONFIG_DEBUG_INFO=y # CONFIG_DEBUG_FS is not set # CONFIG_DEBUG_STACKOVERFLOW is not set # CONFIG_KPROBES is not set # CONFIG_DEBUG_STACK_USAGE is not set CONFIG_DEBUGGER=y # CONFIG_XMON is not set CONFIG_PPCDBG=y # CONFIG_IRQSTACKS is not set # # Security options # # CONFIG_KEYS is not set # CONFIG_SECURITY is not set # # Cryptographic options # CONFIG_CRYPTO=y CONFIG_CRYPTO_HMAC=y CONFIG_CRYPTO_NULL=y CONFIG_CRYPTO_MD4=y CONFIG_CRYPTO_MD5=y CONFIG_CRYPTO_SHA1=y CONFIG_CRYPTO_SHA256=y CONFIG_CRYPTO_SHA512=y # CONFIG_CRYPTO_WP512 is not set CONFIG_CRYPTO_DES=y CONFIG_CRYPTO_BLOWFISH=y CONFIG_CRYPTO_TWOFISH=y CONFIG_CRYPTO_SERPENT=y 
CONFIG_CRYPTO_AES=y CONFIG_CRYPTO_CAST5=y CONFIG_CRYPTO_CAST6=y # CONFIG_CRYPTO_TEA is not set CONFIG_CRYPTO_ARC4=y # CONFIG_CRYPTO_KHAZAD is not set # CONFIG_CRYPTO_ANUBIS is not set CONFIG_CRYPTO_DEFLATE=y # CONFIG_CRYPTO_MICHAEL_MIC is not set # CONFIG_CRYPTO_CRC32C is not set CONFIG_CRYPTO_TEST=y # # Hardware crypto devices # # # Library routines # CONFIG_CRC_CCITT=y CONFIG_CRC32=y CONFIG_LIBCRC32C=y CONFIG_ZLIB_INFLATE=y CONFIG_ZLIB_DEFLATE=y From nacc at us.ibm.com Thu Mar 3 06:28:11 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Wed, 2 Mar 2005 11:28:11 -0800 Subject: eeh.h compile warnings / adbhid.c build failure In-Reply-To: <20050302192005.GA21615@pants.nu> References: <20050302181206.GA2741@us.ibm.com> <20050302192005.GA21615@pants.nu> Message-ID: <20050302192811.GD2741@us.ibm.com> On Wed, Mar 02, 2005 at 11:20:06AM -0800, Brad Boyer wrote: > On Wed, Mar 02, 2005 at 10:12:06AM -0800, Nishanth Aravamudan wrote: > > Now, to the build-blocking code: > > > > In drivers/macintosh/adbhid.c::1159: > > > > static int __init adbhid_init(void) > > { > > #ifndef CONFIG_MAC > > if ( (_machine != _MACH_chrp) && (_machine != _MACH_Pmac) ) > > return 0; > > #endif > > ... > > > > I don't see CONFIG_MAC in my .config (attached below [1]), and _MACH_chrp is > > not defined for ppc64 (it's in asm-ppc/processor.h not in > > ppc64/processor.h. I just removed the _MACH_chrp conditional in my local > > code to get the kernel to build. I'm not sure what the actual solution > > is, but I thought you all should know about it. > > The CONFIG_MAC symbol is defined for mac68k support. The m68k arch has a > different way of telling the various machines apart, although I notice > that there isn't an equivalent block here for that. I guess no other 68k > people cared enough to make sure the ADB layer doesn't load on their boxes. > > In my opinion, the machine selectors should be reconciled between ppc and > ppc64 due to the amount of code that expects them to act the same. 
So > even though you wouldn't have a CHRP ppc64 box, the define will be there. I definitely think this is the ideal solution. I did notice, though, that the _MACH_Pmac #define varies between ppc and ppc64, so I'm not sure how that would work for _MACH_chrp. I am more than happy to test code, generate patches (if you tell me what you'd like me to change), etc. Thanks, Nish From flar at allandria.com Thu Mar 3 06:20:06 2005 From: flar at allandria.com (Brad Boyer) Date: Wed, 2 Mar 2005 11:20:06 -0800 Subject: eeh.h compile warnings / adbhid.c build failure In-Reply-To: <20050302181206.GA2741@us.ibm.com> References: <20050302181206.GA2741@us.ibm.com> Message-ID: <20050302192005.GA21615@pants.nu> On Wed, Mar 02, 2005 at 10:12:06AM -0800, Nishanth Aravamudan wrote: > Now, to the build-blocking code: > > In drivers/macintosh/adbhid.c::1159: > > static int __init adbhid_init(void) > { > #ifndef CONFIG_MAC > if ( (_machine != _MACH_chrp) && (_machine != _MACH_Pmac) ) > return 0; > #endif > ... > > I don't see CONFIG_MAC in my .config (attached below [1]), and _MACH_chrp is > not defined for ppc64 (it's in asm-ppc/processor.h not in > ppc64/processor.h. I just removed the _MACH_chrp conditional in my local > code to get the kernel to build. I'm not sure what the actual solution > is, but I thought you all should know about it. The CONFIG_MAC symbol is defined for mac68k support. The m68k arch has a different way of telling the various machines apart, although I notice that there isn't an equivalent block here for that. I guess no other 68k people cared enough to make sure the ADB layer doesn't load on their boxes. In my opinion, the machine selectors should be reconciled between ppc and ppc64 due to the amount of code that expects them to act the same. So even though you wouldn't have a CHRP ppc64 box, the define will be there.
Brad Boyer flar at allandria.com From johnrose at austin.ibm.com Thu Mar 3 08:10:37 2005 From: johnrose at austin.ibm.com (John Rose) Date: Wed, 02 Mar 2005 15:10:37 -0600 Subject: [PATCH] error code cleanups for rtas wrappers Message-ID: <1109797837.9434.2.camel@sinatra.austin.ibm.com> This patch changes the rtas wrapper functions in rtas.c to map RTAS failures to conventional error values. The goal is to make failure conditions obvious in the wrapper functions and in the caller code. Flame away :) John Signed-off-by: John Rose diff -puN arch/ppc64/kernel/pSeries_smp.c~01_rtas_rcs arch/ppc64/kernel/pSeries_smp.c --- 2_6_linus_3/arch/ppc64/kernel/pSeries_smp.c~01_rtas_rcs 2005-03-02 14:50:33.000000000 -0600 +++ 2_6_linus_3-johnrose/arch/ppc64/kernel/pSeries_smp.c 2005-03-02 14:50:33.000000000 -0600 @@ -151,7 +151,7 @@ static unsigned int find_physical_cpu_to if (index) { int state; int rc = rtas_get_sensor(9003, *index, &state); - if (rc != 0 || state != 1) + if (rc < 0 || state != 1) continue; } diff -puN arch/ppc64/kernel/rtas.c~01_rtas_rcs arch/ppc64/kernel/rtas.c --- 2_6_linus_3/arch/ppc64/kernel/rtas.c~01_rtas_rcs 2005-03-02 14:50:33.000000000 -0600 +++ 2_6_linus_3-johnrose/arch/ppc64/kernel/rtas.c 2005-03-02 14:50:33.000000000 -0600 @@ -255,29 +255,59 @@ rtas_extended_busy_delay_time(int status return ms; } -int -rtas_get_power_level(int powerdomain, int *level) +int rtas_error_rc(int rtas_rc) +{ + int rc; + + switch (rtas_rc) { + case -1: /* Hardware Error */ + rc = -EIO; + break; + case -3: /* Bad indicator/domain/etc */ + rc = -EINVAL; + break; + case -9000: /* Isolation error */ + rc = -EFAULT; + break; + case -9001: /* Outstanding TCE/PTE */ + rc = -EEXIST; + break; + case -9002: /* No usable slot */ + rc = -ENODEV; + break; + default: + printk(KERN_ERR "%s: unexpected RTAS error %d\n", + __FUNCTION__, rtas_rc); + rc = -ERANGE; + break; + } + return rc; +} + +int rtas_get_power_level(int powerdomain, int *level) { int token = 
rtas_token("get-power-level"); int rc; if (token == RTAS_UNKNOWN_SERVICE) - return RTAS_UNKNOWN_OP; + return -ENOENT; while ((rc = rtas_call(token, 1, 2, level, powerdomain)) == RTAS_BUSY) udelay(1); + + if (rc < 0) + return rtas_error_rc(rc); return rc; } -int -rtas_set_power_level(int powerdomain, int level, int *setlevel) +int rtas_set_power_level(int powerdomain, int level, int *setlevel) { int token = rtas_token("set-power-level"); unsigned int wait_time; int rc; if (token == RTAS_UNKNOWN_SERVICE) - return RTAS_UNKNOWN_OP; + return -ENOENT; while (1) { rc = rtas_call(token, 2, 2, setlevel, powerdomain, level); @@ -289,18 +319,20 @@ rtas_set_power_level(int powerdomain, in } else break; } + + if (rc < 0) + return rtas_error_rc(rc); return rc; } -int -rtas_get_sensor(int sensor, int index, int *state) +int rtas_get_sensor(int sensor, int index, int *state) { int token = rtas_token("get-sensor-state"); unsigned int wait_time; int rc; if (token == RTAS_UNKNOWN_SERVICE) - return RTAS_UNKNOWN_OP; + return -ENOENT; while (1) { rc = rtas_call(token, 2, 2, state, sensor, index); @@ -312,18 +344,20 @@ rtas_get_sensor(int sensor, int index, i } else break; } + + if (rc < 0) + return rtas_error_rc(rc); return rc; } -int -rtas_set_indicator(int indicator, int index, int new_value) +int rtas_set_indicator(int indicator, int index, int new_value) { int token = rtas_token("set-indicator"); unsigned int wait_time; int rc; if (token == RTAS_UNKNOWN_SERVICE) - return RTAS_UNKNOWN_OP; + return -ENOENT; while (1) { rc = rtas_call(token, 3, 1, NULL, indicator, index, new_value); @@ -337,6 +371,8 @@ rtas_set_indicator(int indicator, int in break; } + if (rc < 0) + return rtas_error_rc(rc); return rc; } diff -puN arch/ppc64/kernel/rtasd.c~01_rtas_rcs arch/ppc64/kernel/rtasd.c --- 2_6_linus_3/arch/ppc64/kernel/rtasd.c~01_rtas_rcs 2005-03-02 14:50:33.000000000 -0600 +++ 2_6_linus_3-johnrose/arch/ppc64/kernel/rtasd.c 2005-03-02 14:50:33.000000000 -0600 @@ -347,7 +347,7 @@ static int 
enable_surveillance(int timeo if (error == 0) return 0; - if (error == RTAS_NO_SUCH_INDICATOR) { + if (error == -EINVAL) { printk(KERN_INFO "rtasd: surveillance not supported\n"); return 0; } diff -puN arch/ppc64/kernel/xics.c~01_rtas_rcs arch/ppc64/kernel/xics.c --- 2_6_linus_3/arch/ppc64/kernel/xics.c~01_rtas_rcs 2005-03-02 14:50:33.000000000 -0600 +++ 2_6_linus_3-johnrose/arch/ppc64/kernel/xics.c 2005-03-02 14:50:33.000000000 -0600 @@ -654,7 +654,7 @@ void xics_migrate_irqs_away(void) /* remove ourselves from the global interrupt queue */ status = rtas_set_indicator(GLOBAL_INTERRUPT_QUEUE, (1UL << interrupt_server_size) - 1 - default_distrib_server, 0); - WARN_ON(status != 0); + WARN_ON(status < 0); /* Allow IPIs again... */ ops->cppr_info(cpu, DEFAULT_PRIORITY); diff -puN include/asm-ppc64/rtas.h~01_rtas_rcs include/asm-ppc64/rtas.h --- 2_6_linus_3/include/asm-ppc64/rtas.h~01_rtas_rcs 2005-03-02 14:50:33.000000000 -0600 +++ 2_6_linus_3-johnrose/include/asm-ppc64/rtas.h 2005-03-02 14:50:33.000000000 -0600 @@ -24,12 +24,9 @@ /* RTAS return status codes */ #define RTAS_BUSY -2 /* RTAS Busy */ -#define RTAS_NO_SUCH_INDICATOR -3 /* No such indicator implemented */ #define RTAS_EXTENDED_DELAY_MIN 9900 #define RTAS_EXTENDED_DELAY_MAX 9905 -#define RTAS_UNKNOWN_OP -1099 /* Unknown RTAS Token */ - /* * In general to call RTAS use rtas_token("string") to lookup * an RTAS token for the given string (e.g. "event-scan"). _ From moilanen at austin.ibm.com Thu Mar 3 09:34:12 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Wed, 2 Mar 2005 16:34:12 -0600 Subject: [PATCH][RFC] unlikely spinlocks Message-ID: <20050302163412.0fa52c4b.moilanen@austin.ibm.com> On our raw spinlocks, we currently have an attempt at the lock, and if we do not get it we enter a spin loop. This spin loop will likely continue for a while, and we predict likely. Shouldn't we predict that we will get out of the loop so our next instructions are already prefetched?
Even when we miss because the lock is still held, it won't matter since we are waiting anyways. I did a couple quick benchmarks, but the results are inconclusive. 16-way 690 running specjbb with original code # ./specjbb 3000 16 1 1 19 30 120 ... Valid run, Score is 59282 16-way 690 running specjbb with unlikely code # ./specjbb 3000 16 1 1 19 30 120 ... Valid run, Score is 59541 I saw a smaller increase on a JS20 (~1.6%) JS20 specjbb w/ original code # ./specjbb 400 2 1 1 19 30 120 ... Valid run, Score is 20460 JS20 specjbb w/ unlikely code # ./specjbb 400 2 1 1 19 30 120 ... Valid run, Score is 20803 Jake Signed-off-by: Jake Moilanen --- diff -puN include/asm-ppc64/spinlock.h~unlikely-spinlocks include/asm-ppc64/spinlock.h --- linux-2.6-bk/include/asm-ppc64/spinlock.h~unlikely-spinlocks Wed Mar 2 13:55:39 2005 +++ linux-2.6-bk-moilanen/include/asm-ppc64/spinlock.h Wed Mar 2 13:55:40 2005 @@ -110,7 +110,7 @@ static void __inline__ _raw_spin_lock(sp HMT_low(); if (SHARED_PROCESSOR) __spin_yield(lock); - } while (likely(lock->lock != 0)); + } while (unlikely(lock->lock != 0)); HMT_medium(); } } @@ -128,7 +128,7 @@ static void __inline__ _raw_spin_lock_fl HMT_low(); if (SHARED_PROCESSOR) __spin_yield(lock); - } while (likely(lock->lock != 0)); + } while (unlikely(lock->lock != 0)); HMT_medium(); local_irq_restore(flags_dis); } @@ -194,7 +194,7 @@ static void __inline__ _raw_read_lock(rw HMT_low(); if (SHARED_PROCESSOR) __rw_yield(rw); - } while (likely(rw->lock < 0)); + } while (unlikely(rw->lock < 0)); HMT_medium(); } } @@ -251,7 +251,7 @@ static void __inline__ _raw_write_lock(r HMT_low(); if (SHARED_PROCESSOR) __rw_yield(rw); - } while (likely(rw->lock != 0)); + } while (unlikely(rw->lock != 0)); HMT_medium(); } } _ From benh at kernel.crashing.org Thu Mar 3 10:39:16 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 03 Mar 2005 10:39:16 +1100 Subject: eeh.h compile warnings / adbhid.c build failure In-Reply-To: 
<20050302181206.GA2741@us.ibm.com> References: <20050302181206.GA2741@us.ibm.com> Message-ID: <1109806756.5680.127.camel@gaston> On Wed, 2005-03-02 at 10:12 -0800, Nishanth Aravamudan wrote: > > These warnings are emitted for pretty much every driver. It looks like > it is becuase with CONFIG_EEH undefined (it's a pSeries thing? -- my > interpretation from looking at the ppc64 Kconfig), eeh_check_failure() > becomes #define'd to simply it's second parameter, which in the case of > assignment statements ia statement with no effect. It's not a big deal, > the kernels still compile (with a patch to adbhid.c which I'll mention > in a second) but it's a lot of noise to be generated because I don't > have a pSeries machine... Stupid gcc :) > Now, to the build-blocking code: > > In drivers/macintosh/adbhid.c::1159: > > static int __init adbhid_init(void) > { > #ifndef CONFIG_MAC > if ( (_machine != _MACH_chrp) && (_machine != _MACH_Pmac) ) > return 0; > #endif > ... > > I don't see CONFIG_MAC in my .config (attached below [1]), and _MACH_chrp is > not defined for ppc64 (it's in asm-ppc/processor.h not in > ppc64/processor.h. I just removed the _MACH_chrp conditional in my local > code to get the kernel to build. I'm not sure what the actual solution > is, but I thought you all should know about it. There is no ADB bus on a G5, so the driver isn't useful anyway. Currently, ppc64 allows you to enable pmac drivers that won't build, but they also are useless on G5s. I'll fix that over time though. Ben. 
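The "statement with no effect" warning discussed above can be reproduced outside the kernel. The following is only a minimal sketch: check_failure_macro, check_failure_inline and demo_read are hypothetical stand-ins, not the real definitions in include/asm-ppc64/eeh.h, and the exact call sites in eeh.h are assumed rather than quoted. It shows why a stub macro that expands to its second parameter leaves a bare expression behind when used as a statement, while a static inline stub remains an ordinary function call.

```c
/* Hypothetical stand-in for a stub that is #define'd to its second argument. */
#define check_failure_macro(token, val) (val)

/* Hypothetical stand-in for the same stub written as a static inline. */
static inline unsigned long check_failure_inline(const volatile void *token,
                                                 unsigned long val)
{
    (void)token;    /* the stub ignores the token */
    return val;
}

unsigned long demo_read(const volatile unsigned long *addr)
{
    unsigned long val = *addr;

    /*
     * Used as a statement, the macro form
     *     check_failure_macro(addr, val);
     * expands to "(val);", which gcc -Wall reports as
     * "statement with no effect".
     */

    /* The inline form is a real call whose result is simply ignored. */
    check_failure_inline(addr, val);

    return check_failure_inline(addr, val);
}
```

Building this with gcc -Wall and then uncommenting the macro call should show the difference. The inline version also has the side benefit of type-checking its arguments even when the feature is configured out.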
From nacc at us.ibm.com Thu Mar 3 11:23:08 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Wed, 2 Mar 2005 16:23:08 -0800 Subject: eeh.h compile warnings / adbhid.c build failure In-Reply-To: <1109806756.5680.127.camel@gaston> References: <20050302181206.GA2741@us.ibm.com> <1109806756.5680.127.camel@gaston> Message-ID: <20050303002308.GO2741@us.ibm.com> On Thu, Mar 03, 2005 at 10:39:16AM +1100, Benjamin Herrenschmidt wrote: > On Wed, 2005-03-02 at 10:12 -0800, Nishanth Aravamudan wrote: > > > > These warnings are emitted for pretty much every driver. It looks like > > it is becuase with CONFIG_EEH undefined (it's a pSeries thing? -- my > > interpretation from looking at the ppc64 Kconfig), eeh_check_failure() > > becomes #define'd to simply it's second parameter, which in the case of > > assignment statements ia statement with no effect. It's not a big deal, > > the kernels still compile (with a patch to adbhid.c which I'll mention > > in a second) but it's a lot of noise to be generated because I don't > > have a pSeries machine... > > Stupid gcc :) > > > Now, to the build-blocking code: > > > > In drivers/macintosh/adbhid.c::1159: > > > > static int __init adbhid_init(void) > > { > > #ifndef CONFIG_MAC > > if ( (_machine != _MACH_chrp) && (_machine != _MACH_Pmac) ) > > return 0; > > #endif > > ... > > > > I don't see CONFIG_MAC in my .config (attached below [1]), and _MACH_chrp is > > not defined for ppc64 (it's in asm-ppc/processor.h not in > > ppc64/processor.h. I just removed the _MACH_chrp conditional in my local > > code to get the kernel to build. I'm not sure what the actual solution > > is, but I thought you all should know about it. > > There is no ADB bus on a G5, so the driver isn't useful anyway. > Currently, ppc64 allows you to enable pmac drivers that won't build, but > they also are useless on G5s. I'll fix that over time though. Ok, I'll take it out of the config. Thanks! 
-Nish From jschopp at austin.ibm.com Thu Mar 3 11:55:47 2005 From: jschopp at austin.ibm.com (Joel Schopp) Date: Wed, 02 Mar 2005 18:55:47 -0600 Subject: [PATCH][RFC] unlikely spinlocks In-Reply-To: <20050302163412.0fa52c4b.moilanen@austin.ibm.com> References: <20050302163412.0fa52c4b.moilanen@austin.ibm.com> Message-ID: <42266093.5080101@austin.ibm.com> Jake Moilanen wrote: > On our raw spinlocks, we currently have an attempt at the lock, and if > we do not get it we enter a spin loop. This spinloop will likely > continue for awhile, and we pridict likely. > > Shouldn't we predict that we will get out of the loop so our next > instructions are already prefetched. Even when we miss because the lock > is still held, it won't matter since we are waiting anyways. I agree with you in principle. It would be nice to have some better supporting measurements as well, though. > > I did a couple quick benchmarks, but the results are inconclusive. > > 16-way 690 running specjbb with original code > # ./specjbb 3000 16 1 1 19 30 120 > ... > Valid run, Score is 59282 > > 16-way 690 running specjbb with unlikely code > # ./specjbb 3000 16 1 1 19 30 120 > ... > Valid run, Score is 59541 > > I saw a smaller increase on a JS20 (~1.6%) Percentage-wise, the 690 increase was the smaller of the two (about 0.4%). > > JS20 specjbb w/ original code > # ./specjbb 400 2 1 1 19 30 120 > ... > Valid run, Score is 20460 > > > JS20 specjbb w/ unlikely code > # ./specjbb 400 2 1 1 19 30 120 > ... > Valid run, Score is 20803 > > Jake My guess is there is some variance in specjbb runs. The variance might be greater than the amount of improvement. It is still possible to use statistics to show the amount of the increase. If you could get me the results of, say, 12 runs on each kernel, I could do the analysis for you. On a side note, do you have the assembly generated by _raw_spin_lock() and brethren? They always get inlined, so I doubt a simple objdump would do it. I'm curious how good the compiler is at optimizing away things.
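One way the analysis of repeated runs suggested above could be done is Welch's t statistic, which compares two sample means without assuming equal variances. This is a sketch under assumptions, not the method anyone in the thread committed to: the function names (mean, sample_variance, newton_sqrt, welch_t) are made up for illustration, and no benchmark data from the thread is used. The square root is computed with Newton's method only so the example builds without linking libm.

```c
#include <stddef.h>

/* Arithmetic mean of n samples (n >= 1). */
static double mean(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Unbiased sample variance (divides by n - 1); requires n >= 2. */
static double sample_variance(const double *x, size_t n)
{
    double m = mean(x, n);
    double ss = 0.0;
    for (size_t i = 0; i < n; i++)
        ss += (x[i] - m) * (x[i] - m);
    return ss / (double)(n - 1);
}

/* Newton's method square root; adequate for benchmark-sized values. */
static double newton_sqrt(double v)
{
    if (v <= 0.0)
        return 0.0;
    double r = v > 1.0 ? v : 1.0;
    for (int i = 0; i < 200; i++) {
        double next = 0.5 * (r + v / r);
        if (next == r)      /* converged to machine precision */
            break;
        r = next;
    }
    return r;
}

/*
 * Welch's t statistic for "b minus a":
 *     t = (mean_b - mean_a) / sqrt(var_a / na + var_b / nb)
 * The result is undefined if both samples have zero variance.
 */
double welch_t(const double *a, size_t na, const double *b, size_t nb)
{
    double se2 = sample_variance(a, na) / (double)na
               + sample_variance(b, nb) / (double)nb;
    return (mean(b, nb) - mean(a, na)) / newton_sqrt(se2);
}
```

With the scores from each kernel in two arrays, welch_t(orig, n_orig, patched, n_patched) gives the statistic; judging significance still requires the Welch-Satterthwaite degrees of freedom and a t table, which are left out here.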
From xma at us.ibm.com Thu Mar 3 12:58:58 2005 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 3 Mar 2005 01:58:58 +0000 Subject: PCI: Unable to reserve mem region Message-ID: When I loaded Mellanox device driver on P615, I hit below problem: Mar 2 14:31:39 elm3b5 kernel: ib_mthca: Initializing (0000:62:00.0) Mar 2 14:31:39 elm3b5 kernel: PCI: Enabling device: (0000:62:00.0), cmd 142 Mar 2 14:31:39 elm3b5 kernel: PCI: Unable to reserve mem region #5:8000000 at 3fcc 0000000 for device 0000:62:00.0 Mar 2 14:31:39 elm3b5 kernel: ib_mthca 0000:62:00.0: Cannot obtain PCI resource s, aborting. Mar 2 14:31:39 elm3b5 kernel: ib_mchca: probe of 0000:62:00.0 failed with error -16 Anybody has any idea to fix this problem? Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050303/96bbec81/attachment.htm From ntl at pobox.com Thu Mar 3 14:37:57 2005 From: ntl at pobox.com (Nathan Lynch) Date: Wed, 2 Mar 2005 21:37:57 -0600 Subject: [PATCH] fix eeh.h compile warnings In-Reply-To: <20050302181206.GA2741@us.ibm.com> References: <20050302181206.GA2741@us.ibm.com> Message-ID: <20050303033757.GD5897@otto> On Wed, Mar 02, 2005 at 07:41:13PM -0600, Nishanth Aravamudan wrote: > > While building 2.6.11 for a G5, I noticed the following errors being > spit out (gcc 3.3.5): > > include/asm/eeh.h: In function `eeh_memcpy_fromio': > include/asm/eeh.h:265: warning: statement with no effect > include/asm/eeh.h: In function `eeh_insb': > include/asm/eeh.h:353: warning: statement with no effect > include/asm/eeh.h: In function `eeh_insw_ns': > include/asm/eeh.h:360: warning: statement with no effect > include/asm/eeh.h: In function `eeh_insl_ns': > include/asm/eeh.h:367: warning: statement with no effect > include/asm/eeh.h: In function `eeh_memcpy_fromio': > include/asm/eeh.h:265: warning: 
statement with no effect > include/asm/eeh.h: In function `eeh_insb': > include/asm/eeh.h:353: warning: statement with no effect > include/asm/eeh.h: In function `eeh_insw_ns': > include/asm/eeh.h:360: warning: statement with no effect > include/asm/eeh.h: In function `eeh_insl_ns': > include/asm/eeh.h:367: warning: statement with no effect > > These warnings are emitted for pretty much every driver. It looks like > it is becuase with CONFIG_EEH undefined (it's a pSeries thing? -- my > interpretation from looking at the ppc64 Kconfig), eeh_check_failure() > becomes #define'd to simply it's second parameter, which in the case of > assignment statements ia statement with no effect I don't have a toolchain readily available which gives these warnings, but does this fix them? Use static inlines instead of #defines for stub functions when CONFIG_EEH=n. Signed-off-by: Nathan Lynch Index: linux-2.6.11/include/asm-ppc64/eeh.h =================================================================== --- linux-2.6.11.orig/include/asm-ppc64/eeh.h 2005-03-02 07:38:38.000000000 +0000 +++ linux-2.6.11/include/asm-ppc64/eeh.h 2005-03-03 01:39:25.000000000 +0000 @@ -104,17 +104,30 @@ int eeh_unregister_notifier(struct notif */ #define EEH_IO_ERROR_VALUE(size) (~0U >> ((4 - (size)) * 8)) -#else -#define eeh_init() -#define eeh_check_failure(token, val) (val) -#define eeh_dn_check_failure(dn, dev) (0) -#define pci_addr_cache_build() -#define eeh_add_device_early(dn) -#define eeh_add_device_late(dev) -#define eeh_remove_device(dev) +#else /* !CONFIG_EEH */ +static inline void eeh_init(void) { } + +static inline unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val) +{ + return val; +} + +static inline int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev) +{ + return 0; +} + +static inline void pci_addr_cache_build(void) { } + +static inline void eeh_add_device_early(struct device_node *dn) { } + +static inline void eeh_add_device_late(struct 
pci_dev *dev) { } + +static inline void eeh_remove_device(struct pci_dev *dev) { } + #define EEH_POSSIBLE_ERROR(val, type) (0) #define EEH_IO_ERROR_VALUE(size) (-1UL) -#endif +#endif /* CONFIG_EEH */ /* * MMIO read/write operations with EEH support. From apw at us.ibm.com Thu Mar 3 11:59:48 2005 From: apw at us.ibm.com (Amos Waterland) Date: Wed, 2 Mar 2005 19:59:48 -0500 Subject: [patch] init_boot_display link error Message-ID: <20050303005948.GA691@kvasir.austin.ibm.com> In pmac_setup.c, the function init_boot_display as currently written only makes sense with CONFIG_BOOTX_TEXT enabled, and causes a link error if it is not enabled. Signed-off-by: Amos Waterland ===== arch/ppc64/kernel/pmac_setup.c 1.15 vs edited ===== --- 1.15/arch/ppc64/kernel/pmac_setup.c 2005-01-08 00:43:52 -05:00 +++ edited/arch/ppc64/kernel/pmac_setup.c 2005-03-02 19:37:31 -05:00 @@ -244,7 +244,6 @@ { btext_drawchar(c); } -#endif /* CONFIG_BOOTX_TEXT */ static void __init init_boot_display(void) { @@ -280,6 +279,7 @@ return; } } +#endif /* CONFIG_BOOTX_TEXT */ /* * Early initialization. From benh at kernel.crashing.org Thu Mar 3 16:08:14 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 03 Mar 2005 16:08:14 +1100 Subject: PCI: Unable to reserve mem region In-Reply-To: References: Message-ID: <1109826494.5679.174.camel@gaston> On Thu, 2005-03-03 at 01:58 +0000, Shirley Ma wrote: > > When I loaded Mellanox device driver on P615, I hit below problem: > > Mar 2 14:31:39 elm3b5 kernel: ib_mthca: Initializing (0000:62:00.0) > Mar 2 14:31:39 elm3b5 kernel: PCI: Enabling device: (0000:62:00.0), > cmd 142 > Mar 2 14:31:39 elm3b5 kernel: PCI: Unable to reserve mem region > #5:8000000 at 3fcc > 0000000 for device 0000:62:00.0 > Mar 2 14:31:39 elm3b5 kernel: ib_mthca 0000:62:00.0: Cannot obtain > PCI resource > s, aborting. > Mar 2 14:31:39 elm3b5 kernel: ib_mchca: probe of 0000:62:00.0 failed > with error > -16 Is it possible to have remote access to the machine & HMC ? Ben. 
From xma at us.ibm.com Thu Mar 3 17:24:54 2005 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 2 Mar 2005 22:24:54 -0800 Subject: PCI: Unable to reserve mem region In-Reply-To: <1109826494.5679.174.camel@gaston> Message-ID: > Is it possible to have remote access to the machine & HMC ? Sorry. It can't be reachable. If you have any suggestion to debug this problem, I can try it out. I installed 2.6.10 kernel with openib.org Gen2 mthca driver. This driver works OK on both ia64 and ia32. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050302/006f38f1/attachment.htm From benh at kernel.crashing.org Thu Mar 3 17:24:33 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 03 Mar 2005 17:24:33 +1100 Subject: PCI: Unable to reserve mem region In-Reply-To: References: Message-ID: <1109831073.5610.185.camel@gaston> On Wed, 2005-03-02 at 22:24 -0800, Shirley Ma wrote: > > Is it possible to have remote access to the machine & HMC ? > Sorry. It can't be reachable. > If you have any suggestion to debug this problem, I can try it out. I > installed 2.6.10 kernel with openib.org Gen2 mthca driver. This driver > works OK on both ia64 and ia32. The problem is a core kernel PCI allocation issue it seems, it will require quite a bit of debugging to figure out what's wrong I'm afraid... Ben. From anton at samba.org Thu Mar 3 17:20:05 2005 From: anton at samba.org (Anton Blanchard) Date: Thu, 3 Mar 2005 17:20:05 +1100 Subject: PCI: Unable to reserve mem region In-Reply-To: References: <1109826494.5679.174.camel@gaston> Message-ID: <20050303062005.GA16915@krispykreme.ozlabs.ibm.com> > Sorry. It can't be reachable. > If you have any suggestion to debug this problem, I can try it out. I > installed 2.6.10 kernel with openib.org Gen2 mthca driver. 
This driver > works OK on both ia64 and ia32. How big is the PCI MMIO window on this card? lspci -v should give us this information. Anton From sonny at burdell.org Thu Mar 3 18:02:22 2005 From: sonny at burdell.org (Sonny Rao) Date: Thu, 3 Mar 2005 02:02:22 -0500 Subject: [PATCH][RFC] unlikely spinlocks In-Reply-To: <20050302163412.0fa52c4b.moilanen@austin.ibm.com> References: <20050302163412.0fa52c4b.moilanen@austin.ibm.com> Message-ID: <20050303070222.GA26059@kevlar.burdell.org> On Wed, Mar 02, 2005 at 04:34:12PM -0600, Jake Moilanen wrote: > On our raw spinlocks, we currently have an attempt at the lock, and if > we do not get it we enter a spin loop. This spinloop will likely > continue for awhile, and we pridict likely. > > Shouldn't we predict that we will get out of the loop so our next > instructions are already prefetched. Even when we miss because the lock > is still held, it won't matter since we are waiting anyways. > > I did a couple quick benchmarks, but the results are inconclusive. > > 16-way 690 running specjbb with original code > # ./specjbb 3000 16 1 1 19 30 120 > ... > Valid run, Score is 59282 > > 16-way 690 running specjbb with unlikely code > # ./specjbb 3000 16 1 1 19 30 120 > ... > Valid run, Score is 59541 > > I saw a smaller increase on a JS20 (~1.6%) > > JS20 specjbb w/ original code > # ./specjbb 400 2 1 1 19 30 120 > ... > Valid run, Score is 20460 > > > JS20 specjbb w/ unlikely code > # ./specjbb 400 2 1 1 19 30 120 > ... > Valid run, Score is 20803 Hmm, I doubt you want to use specjbb to show spin-lock contention. Unless I'm missing something, jbb scales really well in terms of the kernel, most of the benchmark runs in userspace and the JVM's own locking strategies probably have a bigger impact on performance than the kernel's _raw_spin_lock() implementation. I should probably have Java Perf. guys get oprofile data for jbb to confirm this conclusively. 
If you use FFSB with enough threads doing lots of file-descriptor activity, you'll see tons of lock contention on the fget_light function. This is a pretty well known scalability problem, and I've been able to drive my 16-way LPAR to > 40% spin_lock time doing things like this with FFSB and tons of threads. Sonny From moilanen at austin.ibm.com Fri Mar 4 06:40:34 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Thu, 3 Mar 2005 13:40:34 -0600 Subject: [PATCH] PCI address getting truncated to 32-bits Message-ID: <20050303134034.779c79e0.moilanen@austin.ibm.com> While looking at another problem, I ran across this. It looks like we are truncating our PCI addresses coming out of "assigned-addresses" to 32 bits. Signed-off-by: Jake Moilanen -- diff -puN arch/ppc64/kernel/prom.c~offb_dsi arch/ppc64/kernel/prom.c --- linux-2.6.11/arch/ppc64/kernel/prom.c~offb_dsi Thu Mar 3 10:23:22 2005 +++ linux-2.6.11-moilanen/arch/ppc64/kernel/prom.c Thu Mar 3 13:25:54 2005 @@ -333,7 +333,7 @@ static unsigned long __init interpret_pc while ((l -= sizeof(struct pci_reg_property)) >= 0) { if (!measure_only) { adr[i].space = pci_addrs[i].addr.a_hi; - adr[i].address = pci_addrs[i].addr.a_lo; + adr[i].address = ((unsigned long)pci_addrs[i].addr.a_mid << 32) | pci_addrs[i].addr.a_lo; adr[i].size = pci_addrs[i].size_lo; } ++i; From markus at unixforces.net Fri Mar 4 06:54:32 2005 From: markus at unixforces.net (Markus Rothe) Date: Thu, 3 Mar 2005 19:54:32 +0000 Subject: Display problems with kernel > 2.6.9 Message-ID: <20050303195432.GA9010@unixforces.net> Hi, I have a problem with my Apple Cinema Display when I use kernel versions above 2.6.9. The display is connected through the Apple Display Connector (ADC) to my G5 and its ATI Radeon 9600 graphics card. The problem is that there are many "blue lightnings" all over the display. By "blue lightning" I mean a small set of pixels that turns light blue for about half a second. This happens both in console mode and if I run Xorg.
I've taken three screenshots, available at [1], [2] and [3]. The screenshots were taken while running kernel-2.6.11, but this problem occurred with all kernels after 2.6.9. Markus [1] http://www.unixforces.net/downloads/blue_lightning_1.png (~1.9 MB) [2] http://www.unixforces.net/downloads/blue_lightning_2.png (~2.0 MB) [3] http://www.unixforces.net/downloads/blue_lightning_3.png (~1.9 MB) From moilanen at austin.ibm.com Fri Mar 4 09:02:08 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Thu, 3 Mar 2005 16:02:08 -0600 Subject: [PATCH] PCI address getting truncated to 32-bits In-Reply-To: <20050303134034.779c79e0.moilanen@austin.ibm.com> References: <20050303134034.779c79e0.moilanen@austin.ibm.com> Message-ID: <20050303160208.242e29a9.moilanen@austin.ibm.com> On Thu, 3 Mar 2005 13:40:34 -0600 Jake Moilanen wrote: > While looking at another problem, I ran across this. It looks like we > are truncating our PCI addresses coming out of "assigned-addresses" to > 32 bits. We probably need the same fix in of_finish_dynamic_node() too.
Signed-off-by: Jake Moilanen --- diff -puN arch/ppc64/kernel/prom.c~offb_dsi arch/ppc64/kernel/prom.c --- linux-2.6.11/arch/ppc64/kernel/prom.c~offb_dsi Thu Mar 3 10:23:22 2005 +++ linux-2.6.11-moilanen/arch/ppc64/kernel/prom.c Thu Mar 3 16:09:02 2005 @@ -333,7 +333,8 @@ static unsigned long __init interpret_pc while ((l -= sizeof(struct pci_reg_property)) >= 0) { if (!measure_only) { adr[i].space = pci_addrs[i].addr.a_hi; - adr[i].address = pci_addrs[i].addr.a_lo; + adr[i].address = ((unsigned long)pci_addrs[i].addr.a_mid << 32) + | pci_addrs[i].addr.a_lo; adr[i].size = pci_addrs[i].size_lo; } ++i; @@ -1712,7 +1713,8 @@ static int of_finish_dynamic_node(struct } while ((l -= sizeof(struct pci_reg_property)) >= 0) { adr[i].space = pci_addrs[i].addr.a_hi; - adr[i].address = pci_addrs[i].addr.a_lo; + adr[i].address = ((unsigned long)pci_addrs[i].addr.a_mid << 32) + | pci_addrs[i].addr.a_lo; adr[i].size = pci_addrs[i].size_lo; ++i; } From paulus at samba.org Fri Mar 4 20:18:39 2005 From: paulus at samba.org (Paul Mackerras) Date: Fri, 4 Mar 2005 20:18:39 +1100 Subject: RFC/Patch more xmon additions In-Reply-To: <421E3BE3.90301@vnet.ibm.com> References: <421E3BE3.90301@vnet.ibm.com> Message-ID: <16936.10223.704710.234312@cargo.ozlabs.ibm.com> will schmidt writes: > Am looking for comments on this additional function i've added to xmon > on the side.. > > the bulk of my intent was to make it easier for me to poke at memory > within a particular user process. The main problem I have with it is that we seem to be accessing a lot of kernel data structures without checking any pointers or using mread() to read the memory safely. One of the goals of xmon is that it should be as reliable as possible even if kernel data structures are corrupted, and I think your patch would reduce that reliability. Also, I'm not sure that there is any point doing a spin_trylock(), since all cpus are supposed to be in xmon by the time you get to a command prompt. 
By all means bail out if spin_is_locked() returns true, but I don't see the need to actually take the lock. Regards, Paul. From linas at austin.ibm.com Sat Mar 5 03:42:36 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Fri, 4 Mar 2005 10:42:36 -0600 Subject: [PATCH] fix eeh.h compile warnings In-Reply-To: <20050303033757.GD5897@otto> References: <20050302181206.GA2741@us.ibm.com> <20050303033757.GD5897@otto> Message-ID: <20050304164236.GT1220@austin.ibm.com> On Wed, Mar 02, 2005 at 09:37:57PM -0600, Nathan Lynch was heard to remark: > > I don't have a toolchain readily available which gives these warnings, > but does this fix them? I think it should > Use static inlines instead of #defines for stub functions when > CONFIG_EEH=n. Its more elegant your way anyway ... > Signed-off-by: Nathan Lynch From olof at austin.ibm.com Sat Mar 5 03:57:17 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Fri, 4 Mar 2005 10:57:17 -0600 Subject: [PATCH] fix eeh.h compile warnings In-Reply-To: <20050303033757.GD5897@otto> References: <20050302181206.GA2741@us.ibm.com> <20050303033757.GD5897@otto> Message-ID: <20050304165717.GA5789@austin.ibm.com> On Wed, Mar 02, 2005 at 09:37:57PM -0600, Nathan Lynch wrote: > I don't have a toolchain readily available which gives these warnings, > but does this fix them? Yep, it does here. > Use static inlines instead of #defines for stub functions when > CONFIG_EEH=n. 
> > Signed-off-by: Nathan Lynch Acked-by: Olof Johansson From nacc at us.ibm.com Sat Mar 5 05:13:32 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Fri, 4 Mar 2005 10:13:32 -0800 Subject: [PATCH] fix eeh.h compile warnings In-Reply-To: <20050304165717.GA5789@austin.ibm.com> References: <20050302181206.GA2741@us.ibm.com> <20050303033757.GD5897@otto> <20050304165717.GA5789@austin.ibm.com> Message-ID: <20050304181332.GA2689@us.ibm.com> On Fri, Mar 04, 2005 at 10:57:17AM -0600, Olof Johansson wrote: > On Wed, Mar 02, 2005 at 09:37:57PM -0600, Nathan Lynch wrote: > > > I don't have a toolchain readily available which gives these warnings, > > but does this fix them? > > Yep, it does here. Here as well, sorry for the delayed response. > > Use static inlines instead of #defines for stub functions when > > CONFIG_EEH=n. > > > > Signed-off-by: Nathan Lynch > > Acked-by: Olof Johansson Acked-by: Nishanth Aravamudan From paulus at samba.org Sat Mar 5 23:13:02 2005 From: paulus at samba.org (Paul Mackerras) Date: Sat, 5 Mar 2005 23:13:02 +1100 Subject: [PATCH] Updated U3 AGP patch Message-ID: <16937.41550.251010.982065@cargo.ozlabs.ibm.com> This patch is based on Jerome Glisse's work with some extra bits that I found in Darwin. It adds support for the U3 AGP bridge for both ppc32 and ppc64 kernels. It also includes the suspend/resume support that I need for my 1.5GHz powerbook to be able to sleep when AGP is active. This doesn't solve the potential cache problems, but so far I haven't seen them in practice... 
Signed-off-by: Paul Mackerras diff -urN linux-2.5/drivers/char/agp/Kconfig g5-ppc64/drivers/char/agp/Kconfig --- linux-2.5/drivers/char/agp/Kconfig 2005-01-21 08:40:04.000000000 +1100 +++ g5-ppc64/drivers/char/agp/Kconfig 2005-02-21 18:29:10.000000000 +1100 @@ -1,6 +1,6 @@ config AGP tristate "/dev/agpgart (AGP Support)" if !GART_IOMMU - depends on ALPHA || IA64 || PPC32 || X86 + depends on ALPHA || IA64 || PPC32 || PPC64 || X86 default y if GART_IOMMU ---help--- AGP (Accelerated Graphics Port) is a bus system mainly used to @@ -156,11 +156,11 @@ default AGP config AGP_UNINORTH - tristate "Apple UniNorth AGP support" + tristate "Apple UniNorth & U3 AGP support" depends on AGP && PPC_PMAC help This option gives you AGP support for Apple machines with a - UniNorth bridge. + UniNorth or U3 (Apple G5) bridge. config AGP_EFFICEON tristate "Transmeta Efficeon support" diff -urN linux-2.5/drivers/char/agp/uninorth-agp.c g5-ppc64/drivers/char/agp/uninorth-agp.c --- linux-2.5/drivers/char/agp/uninorth-agp.c 2004-12-28 10:24:26.000000000 +1100 +++ g5-ppc64/drivers/char/agp/uninorth-agp.c 2005-03-05 23:09:40.000000000 +1100 @@ -6,10 +6,26 @@ #include #include #include +#include #include #include +#include #include "agp.h" +/* + * NOTES for uninorth3 (G5 AGP) supports : + * + * There maybe also possibility to have bigger cache line size for + * agp (see pmac_pci.c and look for cache line). Need to be investigated + * by someone. + * + * PAGE size are hardcoded but this may change, see asm/page.h. 
+ * + * Jerome Glisse + */ +static int uninorth_rev; +static int is_u3; + static int uninorth_fetch_size(void) { int i; @@ -39,26 +55,39 @@ static void uninorth_tlbflush(struct agp_memory *mem) { + u32 ctrl = UNI_N_CFG_GART_ENABLE; + + if (is_u3) + ctrl |= U3_N_CFG_GART_PERFRD; pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, - UNI_N_CFG_GART_ENABLE | UNI_N_CFG_GART_INVAL); - pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, - UNI_N_CFG_GART_ENABLE); - pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, - UNI_N_CFG_GART_ENABLE | UNI_N_CFG_GART_2xRESET); - pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, - UNI_N_CFG_GART_ENABLE); + ctrl | UNI_N_CFG_GART_INVAL); + pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, ctrl); + + if (uninorth_rev <= 0x30) { + pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, + ctrl | UNI_N_CFG_GART_2xRESET); + pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, + ctrl); + } } static void uninorth_cleanup(void) { - pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, - UNI_N_CFG_GART_ENABLE | UNI_N_CFG_GART_INVAL); - pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, - 0); - pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, - UNI_N_CFG_GART_2xRESET); - pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, - 0); + u32 tmp; + + pci_read_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, &tmp); + if (!(tmp & UNI_N_CFG_GART_ENABLE)) + return; + tmp |= UNI_N_CFG_GART_INVAL; + pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, tmp); + pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, 0); + + if (uninorth_rev <= 0x30) { + pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, + UNI_N_CFG_GART_2xRESET); + pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, + 0); + } } static int uninorth_configure(void) @@ -81,8 +110,21 @@ * the AGP aperture isn't mapped at bus physical address 0 */ 
agp_bridge->gart_bus_addr = 0; +#ifdef CONFIG_PPC64 + /* Assume U3 or later on PPC64 systems */ + /* high 4 bits of GART physical address go in UNI_N_CFG_AGP_BASE */ + pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_AGP_BASE, + (agp_bridge->gatt_bus_addr >> 32) & 0xf); +#else pci_write_config_dword(agp_bridge->dev, - UNI_N_CFG_AGP_BASE, agp_bridge->gart_bus_addr); + UNI_N_CFG_AGP_BASE, agp_bridge->gart_bus_addr); +#endif + + if (is_u3) { + pci_write_config_dword(agp_bridge->dev, + UNI_N_CFG_GART_DUMMY_PAGE, + agp_bridge->scratch_page_real >> 12); + } return 0; } @@ -111,14 +153,54 @@ } for (i = 0, j = pg_start; i < mem->page_count; i++, j++) { - agp_bridge->gatt_table[j] = cpu_to_le32((mem->memory[i] & 0xfffff000) | 0x00000001UL); + agp_bridge->gatt_table[j] = + cpu_to_le32((mem->memory[i] & 0xFFFFF000UL) | 0x1UL); + flush_dcache_range((unsigned long)__va(mem->memory[i]), + (unsigned long)__va(mem->memory[i])+0x1000); + } + (void)in_le32((volatile u32*)&agp_bridge->gatt_table[pg_start]); + mb(); + flush_dcache_range((unsigned long)&agp_bridge->gatt_table[pg_start], + (unsigned long)&agp_bridge->gatt_table[pg_start + + mem->page_count]); + + uninorth_tlbflush(mem); + return 0; +} + +static int u3_insert_memory(struct agp_memory *mem, off_t pg_start, int type) +{ + int i, j, num_entries; + void *temp; + + temp = agp_bridge->current_size; + num_entries = A_SIZE_32(temp)->num_entries; + + if (type != 0 || mem->type != 0) + /* We know nothing of memory types */ + return -EINVAL; + if ((pg_start + mem->page_count) > num_entries) + return -EINVAL; + + j = pg_start; + + while (j < (pg_start + mem->page_count)) { + if (!PGE_EMPTY(agp_bridge, agp_bridge->gatt_table[j])) + return -EBUSY; + j++; + } + + for (i = 0, j = pg_start; i < mem->page_count; i++, j++) { + agp_bridge->gatt_table[j] = ((mem->memory[i] >> PAGE_SHIFT) | + 0x80000000UL); flush_dcache_range((unsigned long)__va(mem->memory[i]), (unsigned long)__va(mem->memory[i])+0x1000); } (void)in_le32((volatile 
u32*)&agp_bridge->gatt_table[pg_start]); mb(); flush_dcache_range((unsigned long)&agp_bridge->gatt_table[pg_start], - (unsigned long)&agp_bridge->gatt_table[pg_start + mem->page_count]); + (unsigned long)&agp_bridge->gatt_table[pg_start + + mem->page_count]); uninorth_tlbflush(mem); return 0; @@ -126,15 +208,31 @@ static void uninorth_agp_enable(u32 mode) { - u32 command, scratch; + u32 command, scratch, status; int timeout; pci_read_config_dword(agp_bridge->dev, agp_bridge->capndx + PCI_AGP_STATUS, - &command); + &status); + + command = agp_collect_device_status(mode, status); + command |= PCI_AGP_COMMAND_AGP; + + if (uninorth_rev == 0x21) { + /* + * Darwin disable AGP 4x on this revision, thus we + * may assume it's broken. This is an AGP2 controller. + */ + command &= ~AGPSTAT2_4X; + } - command = agp_collect_device_status(mode, command); - command |= 0x100; + if ((uninorth_rev >= 0x30) && (uninorth_rev <= 0x33)) { + /* + * We need to to set REQ_DEPTH to 7 for U3 versions 1.0, 2.1, + * 2.2 and 2.3, Darwin do so. 
+ */ + command |= (7 << AGPSTAT_RQ_DEPTH_SHIFT); + } uninorth_tlbflush(NULL); @@ -146,15 +244,74 @@ pci_read_config_dword(agp_bridge->dev, agp_bridge->capndx + PCI_AGP_COMMAND, &scratch); - } while ((scratch & 0x100) == 0 && ++timeout < 1000); - if ((scratch & 0x100) == 0) + } while ((scratch & PCI_AGP_COMMAND_AGP) == 0 && ++timeout < 1000); + if ((scratch & PCI_AGP_COMMAND_AGP) == 0) printk(KERN_ERR PFX "failed to write UniNorth AGP command reg\n"); - agp_device_command(command, 0); + if (uninorth_rev >= 0x30) { + /* This is an AGP V3 */ + agp_device_command(command, (status & 0x8)); + } else { + /* AGP V2 */ + agp_device_command(command, 0); + } uninorth_tlbflush(NULL); } +#ifdef CONFIG_PM +static int agp_uninorth_suspend(struct pci_dev *pdev, pm_message_t state) +{ + u32 cmd; + u8 agp; + struct pci_dev *device = NULL; + + if (state != PMSG_SUSPEND) + return 0; + + /* turn off AGP on the video chip, if it was enabled */ + for_each_pci_dev(device) { + /* Don't touch the bridge yet, device first */ + if (device == pdev) + continue; + /* Only deal with devices on the same bus here, no Mac has a P2P + * bridge on the AGP port, and mucking around the entire PCI tree + * is source of problems on some machines because of a bug in + * some versions of pci_find_capability() when hitting a dead device + */ + if (device->bus != pdev->bus) + continue; + agp = pci_find_capability(device, PCI_CAP_ID_AGP); + if (!agp) + continue; + pci_read_config_dword(device, agp + PCI_AGP_COMMAND, &cmd); + if (!(cmd & PCI_AGP_COMMAND_AGP)) + continue; + printk("uninorth-agp: disabling AGP on device %s\n", pci_name(device)); + cmd &= ~PCI_AGP_COMMAND_AGP; + pci_write_config_dword(device, agp + PCI_AGP_COMMAND, cmd); + } + + /* turn off AGP on the bridge */ + agp = pci_find_capability(pdev, PCI_CAP_ID_AGP); + pci_read_config_dword(pdev, agp + PCI_AGP_COMMAND, &cmd); + if (cmd & PCI_AGP_COMMAND_AGP) { + printk("uninorth-agp: disabling AGP on bridge %s\n", pci_name(pdev)); + cmd &= 
~PCI_AGP_COMMAND_AGP; + pci_write_config_dword(pdev, agp + PCI_AGP_COMMAND, cmd); + } + /* turn off the GART */ + uninorth_cleanup(); + + return 0; +} + +static int agp_uninorth_resume(struct pci_dev *pdev) +{ + return 0; +} +#endif + static int uninorth_create_gatt_table(void) { char *table; @@ -202,10 +359,8 @@ agp_bridge->gatt_table = (u32 *)table; agp_bridge->gatt_bus_addr = virt_to_phys(table); - for (i = 0; i < num_entries; i++) { - agp_bridge->gatt_table[i] = - (unsigned long) agp_bridge->scratch_page; - } + for (i = 0; i < num_entries; i++) + agp_bridge->gatt_table[i] = 0; flush_dcache_range((unsigned long)table, (unsigned long)table_end); @@ -258,6 +413,22 @@ {4, 1024, 0, 1} }; +/* + * Not sure that u3 supports that high aperture sizes but it + * would strange if it did not :) + */ +static struct aper_size_info_32 u3_sizes[8] = +{ + {512, 131072, 7, 128}, + {256, 65536, 6, 64}, + {128, 32768, 5, 32}, + {64, 16384, 4, 16}, + {32, 8192, 3, 8}, + {16, 4096, 2, 4}, + {8, 2048, 1, 2}, + {4, 1024, 0, 1} +}; + struct agp_bridge_driver uninorth_agp_driver = { .owner = THIS_MODULE, .aperture_sizes = (void *)uninorth_sizes, @@ -282,6 +453,31 @@ .cant_use_aperture = 1, }; +struct agp_bridge_driver u3_agp_driver = { + .owner = THIS_MODULE, + .aperture_sizes = (void *)u3_sizes, + .size_type = U32_APER_SIZE, + .num_aperture_sizes = 8, + .configure = uninorth_configure, + .fetch_size = uninorth_fetch_size, + .cleanup = uninorth_cleanup, + .tlb_flush = uninorth_tlbflush, + .mask_memory = agp_generic_mask_memory, + .masks = NULL, + .cache_flush = null_cache_flush, + .agp_enable = uninorth_agp_enable, + .create_gatt_table = uninorth_create_gatt_table, + .free_gatt_table = uninorth_free_gatt_table, + .insert_memory = u3_insert_memory, + .remove_memory = agp_generic_remove_memory, + .alloc_by_type = agp_generic_alloc_by_type, + .free_by_type = agp_generic_free_by_type, + .agp_alloc_page = agp_generic_alloc_page, + .agp_destroy_page = agp_generic_destroy_page, + 
.cant_use_aperture = 1, + .needs_scratch_page = 1, +}; + static struct agp_device_ids uninorth_agp_device_ids[] __devinitdata = { { .device_id = PCI_DEVICE_ID_APPLE_UNI_N_AGP, @@ -299,6 +495,18 @@ .device_id = PCI_DEVICE_ID_APPLE_UNI_N_AGP2, .chipset_name = "UniNorth 2", }, + { + .device_id = PCI_DEVICE_ID_APPLE_U3_AGP, + .chipset_name = "U3", + }, + { + .device_id = PCI_DEVICE_ID_APPLE_U3L_AGP, + .chipset_name = "U3L", + }, + { + .device_id = PCI_DEVICE_ID_APPLE_U3H_AGP, + .chipset_name = "U3H", + }, }; static int __devinit agp_uninorth_probe(struct pci_dev *pdev, @@ -306,6 +514,7 @@ { struct agp_device_ids *devs = uninorth_agp_device_ids; struct agp_bridge_data *bridge; + struct device_node *uninorth_node; u8 cap_ptr; int j; @@ -327,11 +536,33 @@ return -ENODEV; found: + /* Set revision to 0 if we could not read it. */ + uninorth_rev = 0; + is_u3 = 0; + /* Locate core99 Uni-N */ + uninorth_node = of_find_node_by_name(NULL, "uni-n"); + /* Locate G5 u3 */ + if (uninorth_node == NULL) { + is_u3 = 1; + uninorth_node = of_find_node_by_name(NULL, "u3"); + } + if (uninorth_node) { + int *revprop = (int *) + get_property(uninorth_node, "device-rev", NULL); + if (revprop != NULL) + uninorth_rev = *revprop & 0x3f; + of_node_put(uninorth_node); + } + bridge = agp_alloc_bridge(); if (!bridge) return -ENOMEM; - bridge->driver = &uninorth_agp_driver; + if (is_u3) + bridge->driver = &u3_agp_driver; + else + bridge->driver = &uninorth_agp_driver; + bridge->dev = pdev; bridge->capndx = cap_ptr; @@ -369,6 +600,10 @@ .id_table = agp_uninorth_pci_table, .probe = agp_uninorth_probe, .remove = agp_uninorth_remove, +#ifdef CONFIG_PM + .suspend = agp_uninorth_suspend, + .resume = agp_uninorth_resume, +#endif }; static int __init agp_uninorth_init(void) diff -urN linux-2.5/include/asm-ppc/uninorth.h g5-ppc64/include/asm-ppc/uninorth.h --- linux-2.5/include/asm-ppc/uninorth.h 2005-02-03 18:00:28.000000000 +1100 +++ g5-ppc64/include/asm-ppc/uninorth.h 2005-03-05 17:28:33.000000000 +1100 @@ 
-27,13 +27,18 @@ #define UNI_N_CFG_AGP_BASE 0x90 #define UNI_N_CFG_GART_CTRL 0x94 #define UNI_N_CFG_INTERNAL_STATUS 0x98 +#define UNI_N_CFG_GART_DUMMY_PAGE 0xa4 /* UNI_N_CFG_GART_CTRL bits definitions */ -/* Not U3 */ #define UNI_N_CFG_GART_INVAL 0x00000001 #define UNI_N_CFG_GART_ENABLE 0x00000100 #define UNI_N_CFG_GART_2xRESET 0x00010000 #define UNI_N_CFG_GART_DISSBADET 0x00020000 +/* The following seems to only be used only on U3 */ +#define U3_N_CFG_GART_SYNCMODE 0x00040000 +#define U3_N_CFG_GART_PERFRD 0x00080000 +#define U3_N_CFG_GART_B2BGNT 0x00200000 +#define U3_N_CFG_GART_FASTDDR 0x00400000 /* My understanding of UniNorth AGP as of UniNorth rev 1.0x, * revision 1.5 (x4 AGP) may need further changes. diff -urN linux-2.5/include/asm-ppc64/agp.h g5-ppc64/include/asm-ppc64/agp.h --- /dev/null 2005-02-22 20:41:05.000000000 +1100 +++ g5-ppc64/include/asm-ppc64/agp.h 2005-02-21 18:30:02.000000000 +1100 @@ -0,0 +1,13 @@ +#ifndef AGP_H +#define AGP_H 1 + +#include + +/* nothing much needed here */ + +#define map_page_into_agp(page) +#define unmap_page_from_agp(page) +#define flush_agp_mappings() +#define flush_agp_cache() mb() + +#endif From stefan at nocrew.org Sun Mar 6 08:01:27 2005 From: stefan at nocrew.org (Stefan Berndtsson) Date: Sat, 05 Mar 2005 22:01:27 +0100 Subject: BTTV in linux/ppc64. Message-ID: <87mzth24ew.fsf@hades.nocrew.org> I'm having trouble getting bttv working in linux/ppc64. The same kernel works fine with linux/ppc on the same hardware. Kernel used is 2.6.11 (from kernel.org) Machine is a 1.8GHz G5 (single cpu). BTTV-card is a PCTV Rave with a bt878 chipset. It compiles nicely, but when the module is loaded, the modprobe process hangs and never returns. The rest of the system keeps working as it should. As far as I've been able to figure out, it calls driver_register(), but never returns from this. Another issue, where I get an oops, is the loading of the sound alsa module for the Vortex card in the machine. It's a Vortex au8820. 
Like the bttv issue, the card and driver works fine with a 32bit kernel. The oops looks like this: PCI: Enabling device: (0001:06:03.0), cmd 7 Vortex: init.... Oops: Kernel access of bad area, sig: 11 [#1] POWERMAC Modules linked in: snd_au8820 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_timer snd_mpu401_uart snd_rawmidi snd soundco ic5 kernel: NIP: D0000000000FF8EC XER: 00000000 LR: D0000000000FF8D8 CTR: C00000000018F188 REGS: c000000001c6f450 TRAP: 0300 Not tainted (2.6.11-ppc64) MSR: 9000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 CR: 24008422 DAR: e000000083f30018 DSISR: 0000000042000000 TASK: c00000000fbd77a0[1228] 'modprobe' THREAD: c000000001c6c000 GPR00: E000000083F30018 C000000001C6F6D0 D00000000010CE20 0000000000000014 GPR04: C0000000004916A0 C00000001F788990 0000000000000008 C000000000481B88 GPR08: 00000000FFFBF3FE E000000083F2B000 C000000000481578 FFFFFFFFFFFFFFFF GPR12: 0000000044008428 C000000000399C00 0000000000000000 000000000000000A GPR16: D0000000000F2112 D0000000000F2210 D000000000104600 0000000000000124 GPR20: 0000000000000000 D000000000104650 C00000000044F448 D0000000000E9000 GPR24: 0000000000000000 C000000001813C00 C00000001F79E000 C000000001E2C000 GPR28: C00000001F79E000 0000000000000000 D00000000010C880 C000000001C6F6D0 NIP [d0000000000ff8ec] .snd_vortex_probe+0x194/0x1028 [snd_au8820] LR [d0000000000ff8d8] .snd_vortex_probe+0x180/0x1028 [snd_au8820] Call Trace: [c000000001c6f6d0] [d0000000000ff8d8] .snd_vortex_probe+0x180/0x1028 [snd_au8820] (unreliable) [c000000001c6f7e0] [c00000000016c990] .pci_device_probe+0xec/0x20c [c000000001c6f880] [c0000000001c1dcc] .driver_probe_device+0x80/0x11c [c000000001c6f910] [c0000000001c2010] .driver_attach+0x84/0xfc [c000000001c6f9b0] [c0000000001c2568] .bus_add_driver+0xc4/0x1ec [c000000001c6fa60] [c0000000001c2e1c] .driver_register+0x38/0x50 [c000000001c6faf0] [c00000000016c49c] .pci_register_driver+0x80/0xf0 [c000000001c6fb80] [d0000000001007f8] 
.alsa_card_vortex_init+0x24/0x40 [snd_au8820] [c000000001c6fc00] [c000000000062c98] .sys_init_module+0x3d4/0x1918 [c000000001c6fe30] [c00000000000d400] syscall_exit+0x0/0x18 Instruction dump: 4800150d e8410028 2fa30000 f87b1500 419e0b78 e87e8200 48000fb5 e8410028 e93b1500 3960ffff 3d290002 38095018 <7d60052c> 7c0004ac 38600005 48001051 Any ideas? /Stefan Berndtsson From paulus at samba.org Sun Mar 6 10:42:17 2005 From: paulus at samba.org (Paul Mackerras) Date: Sun, 6 Mar 2005 10:42:17 +1100 Subject: [PATCH] Updated U3 AGP patch In-Reply-To: References: <16937.41550.251010.982065@cargo.ozlabs.ibm.com> Message-ID: <16938.17369.373338.88220@cargo.ozlabs.ibm.com> Andreas Schwab writes: > I can't find these ids being defined anywhere in 2.6.11. Oops, my mistake, you need this bit too. Paul. diff -urN linux-2.5/include/linux/pci_ids.h g5-ppc64/include/linux/pci_ids.h --- linux-2.5/include/linux/pci_ids.h 2005-03-03 08:14:30.000000000 +1100 +++ g5-ppc64/include/linux/pci_ids.h 2005-03-03 09:36:03.000000000 +1100 @@ -861,7 +861,10 @@ #define PCI_DEVICE_ID_APPLE_IPID_ATA100 0x003b #define PCI_DEVICE_ID_APPLE_KEYLARGO_I 0x003e #define PCI_DEVICE_ID_APPLE_K2_ATA100 0x0043 +#define PCI_DEVICE_ID_APPLE_U3_AGP 0x004b #define PCI_DEVICE_ID_APPLE_K2_GMAC 0x004c +#define PCI_DEVICE_ID_APPLE_U3L_AGP 0x0058 +#define PCI_DEVICE_ID_APPLE_U3H_AGP 0x0059 #define PCI_DEVICE_ID_APPLE_TIGON3 0x1645 #define PCI_VENDOR_ID_YAMAHA 0x1073
From domen at coderock.org Mon Mar 7 09:23:30 2005 From: domen at coderock.org (domen at coderock.org) Date: Sun, 06 Mar 2005 23:23:30 +0100 Subject: [patch 2/2] delete unused file include_asm_ppc64_iSeries_iSeries_fixup.h Message-ID: <20050306222330.D06231ED3D@trashy.coderock.org> Remove nowhere referenced file. (egrep "filename\." didn't find anything) Signed-off-by: Domen Puncer --- kj/include/asm-ppc64/iSeries/iSeries_fixup.h | 25 ------------------------- 1 files changed, 25 deletions(-) diff -L include/asm-ppc64/iSeries/iSeries_fixup.h -puN include/asm-ppc64/iSeries/iSeries_fixup.h~remove_file-include_asm_ppc64_iSeries_iSeries_fixup.h /dev/null --- kj/include/asm-ppc64/iSeries/iSeries_fixup.h +++ /dev/null 2005-03-02 11:34:59.000000000 +0100 @@ -1,25 +0,0 @@ - -#ifndef __ISERIES_FIXUP_H__ -#define __ISERIES_FIXUP_H__ -#include - -#ifdef __cplusplus -extern "C" { -#endif - -void iSeries_fixup (void); -void iSeries_fixup_bus (struct pci_bus*); -unsigned int iSeries_scan_slot (struct pci_dev*, u16, u8, u8); - - -/* Need to store information related to the PHB bucc and make it accessible to the hose */ -struct iSeries_hose_arch_data { - u32 hvBusNumber; -}; - - -#ifdef __cplusplus -} -#endif - -#endif /* __ISERIES_FIXUP_H__ */ _ From domen at coderock.org Mon Mar 7 09:23:27 2005 From: domen at coderock.org (domen at coderock.org) Date: Sun, 06 Mar 2005 23:23:27 +0100 Subject: [patch 1/2] delete unused file arch_ppc64_boot_no_initrd.c Message-ID: <20050306222327.CD55F1EC90@trashy.coderock.org> Remove nowhere referenced file. (egrep "filename\." 
didn't find anything) Signed-off-by: Domen Puncer --- kj/arch/ppc64/boot/no_initrd.c | 2 -- 1 files changed, 2 deletions(-) diff -L arch/ppc64/boot/no_initrd.c -puN arch/ppc64/boot/no_initrd.c~remove_file-arch_ppc64_boot_no_initrd.c /dev/null --- kj/arch/ppc64/boot/no_initrd.c +++ /dev/null 2005-03-02 11:34:59.000000000 +0100 @@ -1,2 +0,0 @@ -char initrd_data[1]; -int initrd_len = 0; _ From sfr at canb.auug.org.au Mon Mar 7 11:36:00 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Mon, 7 Mar 2005 11:36:00 +1100 Subject: [patch 2/2] delete unused file include_asm_ppc64_iSeries_iSeries_fixup.h In-Reply-To: <20050306222330.D06231ED3D@trashy.coderock.org> References: <20050306222330.D06231ED3D@trashy.coderock.org> Message-ID: <20050307113600.2c1d52b5.sfr@canb.auug.org.au> On Sun, 06 Mar 2005 23:23:30 +0100 domen at coderock.org wrote: > > > Remove nowhere referenced file. (egrep "filename\." didn't find anything) And, in fact, none of the things declared here exist any more. And iSeries builds happily without it. > Signed-off-by: Domen Puncer Acked-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ From paulus at samba.org Mon Mar 7 19:57:50 2005 From: paulus at samba.org (Paul Mackerras) Date: Mon, 7 Mar 2005 19:57:50 +1100 Subject: [PATCH] PPC64 Addresses from OF getting truncated to 32-bits Message-ID: <16940.6030.778567.924954@cargo.ozlabs.ibm.com> This patch is from Jake Moilanen, reformatted by me. Signed-off-by: Jake Moilanen Signed-off-by: Paul Mackerras The `assigned-addresses' property in the Open Firmware device tree nodes for PCI devices has 64 bits of PCI bus address, split across the a_mid (upper 32 bits) and a_lo (lower 32 bits) words, but we were only using the low 32. This patch fixes it so we use all 64.
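For illustration only, here is a minimal user-space sketch of the difference between the old and new behaviour. It assumes the address is carried in three 32-bit words named a_hi, a_mid and a_lo, as the field names in the patch suggest; the struct and the two helper functions are made up for this example and are not the kernel's definitions.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the address words read from the property.
 * Only the field names a_hi, a_mid and a_lo are taken from the patch.
 */
struct pci_addr_cells {
	uint32_t a_hi;   /* address space / flags word */
	uint32_t a_mid;  /* upper 32 bits of the bus address */
	uint32_t a_lo;   /* lower 32 bits of the bus address */
};

/* Old behaviour: only the low word survives, the upper 32 bits are lost. */
static uint64_t bus_addr_truncated(const struct pci_addr_cells *a)
{
	return a->a_lo;
}

/* Fixed behaviour: widen a_mid to 64 bits first, shift, then OR in a_lo. */
static uint64_t bus_addr_full(const struct pci_addr_cells *a)
{
	return ((uint64_t)a->a_mid << 32) | a->a_lo;
}
```

Note that the cast has to happen before the shift: shifting a 32-bit value left by 32 is undefined in C, so the upper word must be widened to 64 bits first, which is what both the (unsigned long) cast on ppc64 and the (u64) cast do.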
diff -urN linux-2.5/arch/ppc64/kernel/prom.c test/arch/ppc64/kernel/prom.c --- linux-2.5/arch/ppc64/kernel/prom.c 2005-03-07 08:21:53.000000000 +1100 +++ test/arch/ppc64/kernel/prom.c 2005-03-07 19:49:13.000000000 +1100 @@ -335,7 +335,8 @@ while ((l -= sizeof(struct pci_reg_property)) >= 0) { if (!measure_only) { adr[i].space = pci_addrs[i].addr.a_hi; - adr[i].address = pci_addrs[i].addr.a_lo; + adr[i].address = pci_addrs[i].addr.a_lo | + ((u64)pci_addrs[i].addr.a_mid << 32); adr[i].size = pci_addrs[i].size_lo; } ++i; @@ -1721,7 +1722,8 @@ } while ((l -= sizeof(struct pci_reg_property)) >= 0) { adr[i].space = pci_addrs[i].addr.a_hi; - adr[i].address = pci_addrs[i].addr.a_lo; + adr[i].address = pci_addrs[i].addr.a_lo | + ((u64)pci_addrs[i].addr.a_mid << 32); adr[i].size = pci_addrs[i].size_lo; ++i; } From paulus at samba.org Mon Mar 7 21:33:20 2005 From: paulus at samba.org (Paul Mackerras) Date: Mon, 7 Mar 2005 21:33:20 +1100 Subject: [PATCH] error code cleanups for rtas wrappers In-Reply-To: <1109797837.9434.2.camel@sinatra.austin.ibm.com> References: <1109797837.9434.2.camel@sinatra.austin.ibm.com> Message-ID: <16940.11760.291422.528712@cargo.ozlabs.ibm.com> John Rose writes: > This patch changes the rtas wrapper functions in rtas.c to map RTAS failures > to conventional error values. The goal is to make failure conditions > obvious in the wrapper functions and in the caller code. Looks good, got a patch to change all the callers? Paul. From johnrose at austin.ibm.com Tue Mar 8 03:11:52 2005 From: johnrose at austin.ibm.com (John Rose) Date: Mon, 07 Mar 2005 10:11:52 -0600 Subject: [PATCH] error code cleanups for rtas wrappers In-Reply-To: <16940.11760.291422.528712@cargo.ozlabs.ibm.com> References: <1109797837.9434.2.camel@sinatra.austin.ibm.com> <16940.11760.291422.528712@cargo.ozlabs.ibm.com> Message-ID: <1110211912.2538.15.camel@sinatra.austin.ibm.com> > Looks good, got a patch to change all the callers? I do, but the patch only affects RPA PCI Hotplug/DLPAR. 
I figured I'd wait for acceptance on this end before submitting those
changes. Callers within PPC64 "base" were either changed by the patch
above, or already had sufficient checks (rc == 0, etc).

Thanks-
John

From paulus at samba.org Tue Mar 8 10:02:18 2005
From: paulus at samba.org (Paul Mackerras)
Date: Tue, 8 Mar 2005 10:02:18 +1100
Subject: [PATCH] error code cleanups for rtas wrappers
In-Reply-To: <1110211912.2538.15.camel@sinatra.austin.ibm.com>
References: <1109797837.9434.2.camel@sinatra.austin.ibm.com>
	<16940.11760.291422.528712@cargo.ozlabs.ibm.com>
	<1110211912.2538.15.camel@sinatra.austin.ibm.com>
Message-ID: <16940.56698.707687.831617@cargo.ozlabs.ibm.com>

John Rose writes:

> I do, but the patch only affects RPA PCI Hotplug/DLPAR. I figured I'd
> wait for acceptance on this end before submitting those changes.
> Callers within PPC64 "base" were either changed by the patch above, or
> already had sufficient checks (rc == 0, etc).

Yes, it was the callers in drivers/pci/hotplug/rpa* that I was
concerned about. Some of them were testing for specific return
values. If you have a patch to fix them too I'll forward both patches
to Andrew.

Paul.

From jschopp at austin.ibm.com Tue Mar 8 10:01:28 2005
From: jschopp at austin.ibm.com (Joel Schopp)
Date: Mon, 07 Mar 2005 17:01:28 -0600
Subject: [PATCH] explicitly bind idle tasks
In-Reply-To: <20050302014701.GA5897@otto>
References: <20050227031655.67233bb5.akpm@osdl.org>
	<1109542971.14993.217.camel@gaston>
	<20050227144928.6c71adaf.akpm@osdl.org>
	<20050302014701.GA5897@otto>
Message-ID: <422CDD48.10006@austin.ibm.com>

Nathan Lynch wrote:
> With hotplug cpu and preempt, we tend to see smp_processor_id warnings
> from idle loop code because it's always checking whether its cpu has
> gone offline. Replacing every use of smp_processor_id with
> _smp_processor_id in all idle loop code is one solution; another way
> is explicitly binding idle threads to their cpus (the smp_processor_id
> warning does not fire if the caller is bound only to the calling cpu).
> This has the (admittedly slight) advantage of letting us know if an
> idle thread ever runs on the wrong cpu.

I also prefer explicitly binding idle threads to their cpus instead of
replacing use of smp_processor_id with _smp_processor_id.

>
>
> Signed-off-by: Nathan Lynch

Acked-by: Joel Schopp

>
> Index: linux-2.6.11-rc5-mm1/init/main.c
> ===================================================================
> --- linux-2.6.11-rc5-mm1.orig/init/main.c	2005-03-02 00:12:07.000000000 +0000
> +++ linux-2.6.11-rc5-mm1/init/main.c	2005-03-02 00:53:04.000000000 +0000
> @@ -638,6 +638,10 @@
>  {
>  	lock_kernel();
>  	/*
> +	 * init can run on any cpu.
> +	 */
> +	set_cpus_allowed(current, CPU_MASK_ALL);
> +	/*
>  	 * Tell the world that we're going to be the grim
>  	 * reaper of innocent orphaned children.
>  	 *
> Index: linux-2.6.11-rc5-mm1/kernel/sched.c
> ===================================================================
> --- linux-2.6.11-rc5-mm1.orig/kernel/sched.c	2005-03-02 00:12:07.000000000 +0000
> +++ linux-2.6.11-rc5-mm1/kernel/sched.c	2005-03-02 00:47:14.000000000 +0000
> @@ -4092,6 +4092,7 @@
>  	idle->array = NULL;
>  	idle->prio = MAX_PRIO;
>  	idle->state = TASK_RUNNING;
> +	idle->cpus_allowed = cpumask_of_cpu(cpu);
>  	set_task_cpu(idle, cpu);
>
>  	spin_lock_irqsave(&rq->lock, flags);
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

From johnrose at austin.ibm.com Tue Mar 8 10:54:54 2005
From: johnrose at austin.ibm.com (John Rose)
Date: Mon, 07 Mar 2005 17:54:54 -0600
Subject: [PATCH] error code cleanups rpa[php,dlpar]
In-Reply-To: <16940.56698.707687.831617@cargo.ozlabs.ibm.com>
References: <1109797837.9434.2.camel@sinatra.austin.ibm.com>
	<16940.11760.291422.528712@cargo.ozlabs.ibm.com>
	<1110211912.2538.15.camel@sinatra.austin.ibm.com>
	<16940.56698.707687.831617@cargo.ozlabs.ibm.com>
Message-ID: <1110239693.2538.26.camel@sinatra.austin.ibm.com>

> Yes, it was the callers in drivers/pci/hotplug/rpa* that I was
> concerned about. Some of them were testing for specific return
> values. If you have a patch to fix them too I'll forward both patches
> to Andrew.

This patch changes the RPA PCI Hotplug and DLPAR modules to use more
conventional error values for return codes. The goal is to make failure
conditions obvious in the wrapper functions and in the caller code.

Thanks Paul.
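
The convention described above can be illustrated with a minimal user-space sketch. It is not taken from the patch: sim_get_drc_props() and sim_register_slot() are hypothetical stand-ins, and the only assumption carried over is that failures become negative errno values and callers test for rc < 0.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a lookup such as rpaphp_get_drc_props():
 * returns 0 on success and a negative errno value on failure. */
static int sim_get_drc_props(const int *my_index)
{
	if (my_index == NULL)
		return -EINVAL;	/* the old style returned a bare 1 here */
	return 0;
}

/* Hypothetical caller: checks for rc < 0 and passes the code upward
 * unchanged, so the failure reason survives to the top-level caller. */
static int sim_register_slot(const int *my_index)
{
	int rc = sim_get_drc_props(my_index);

	if (rc < 0)
		return rc;
	return 0;
}
```

With this shape a caller no longer needs to know which positive values a given helper uses to signal failure; any negative return is an error, and its value says which one.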
Signed-off-by: John Rose

diff -puN drivers/pci/hotplug/rpaphp.h~02_rpaphp_rcs drivers/pci/hotplug/rpaphp.h
--- 2_6_linus_3/drivers/pci/hotplug/rpaphp.h~02_rpaphp_rcs	2005-03-07 17:52:20.000000000 -0600
+++ 2_6_linus_3-johnrose/drivers/pci/hotplug/rpaphp.h	2005-03-07 17:52:20.000000000 -0600
@@ -45,11 +45,6 @@
 #define LED_ID		2	/* slow blinking */
 #define LED_ACTION	3	/* fast blinking */
 
-/* Error status from rtas_get-sensor */
-#define NEED_POWER	-9000	/* slot must be power up and unisolated to get state */
-#define PWR_ONLY	-9001	/* slot must be powerd up to get state, leave isolated */
-#define ERR_SENSE_USE	-9002	/* No DR operation will succeed, slot is unusable */
-
 /* Sensor values from rtas_get-sensor */
 #define EMPTY		0	/* No card in slot */
 #define PRESENT		1	/* Card in slot */
diff -puN drivers/pci/hotplug/rpaphp_core.c~02_rpaphp_rcs drivers/pci/hotplug/rpaphp_core.c
--- 2_6_linus_3/drivers/pci/hotplug/rpaphp_core.c~02_rpaphp_rcs	2005-03-07 17:52:20.000000000 -0600
+++ 2_6_linus_3-johnrose/drivers/pci/hotplug/rpaphp_core.c	2005-03-07 17:52:20.000000000 -0600
@@ -256,12 +256,12 @@ int rpaphp_get_drc_props(struct device_n
 	my_index = (int *) get_property(dn, "ibm,my-drc-index", NULL);
 	if (!my_index) {
 		/* Node isn't DLPAR/hotplug capable */
-		return 1;
+		return -EINVAL;
 	}
 
 	rc = get_children_props(dn->parent, &indexes, &names, &types, &domains);
 	if (rc < 0) {
-		return 1;
+		return -EINVAL;
 	}
 
 	name_tmp = (char *) &names[1];
@@ -284,7 +284,7 @@ int rpaphp_get_drc_props(struct device_n
 		type_tmp += (strlen(type_tmp) + 1);
 	}
 
-	return 1;
+	return -EINVAL;
 }
 
 static int is_php_type(char *drc_type)
diff -puN drivers/pci/hotplug/rpaphp_pci.c~02_rpaphp_rcs drivers/pci/hotplug/rpaphp_pci.c
--- 2_6_linus_3/drivers/pci/hotplug/rpaphp_pci.c~02_rpaphp_rcs	2005-03-07 17:52:20.000000000 -0600
+++ 2_6_linus_3-johnrose/drivers/pci/hotplug/rpaphp_pci.c	2005-03-07 17:52:20.000000000 -0600
@@ -81,8 +81,8 @@ static int rpaphp_get_sensor_state(struc
 	rc = rtas_get_sensor(DR_ENTITY_SENSE,
 			slot->index, state);
 
-	if (rc) {
-		if (rc == NEED_POWER || rc == PWR_ONLY) {
+	if (rc < 0) {
+		if (rc == -EFAULT || rc == -EEXIST) {
 			dbg("%s: slot must be power up to get sensor-state\n",
 				__FUNCTION__);
 
@@ -91,14 +91,14 @@ static int rpaphp_get_sensor_state(struc
 			 */
 			rc = rtas_set_power_level(slot->power_domain, POWER_ON,
 					&setlevel);
-			if (rc) {
+			if (rc < 0) {
 				dbg("%s: power on slot[%s] failed rc=%d.\n",
 					__FUNCTION__, slot->name, rc);
 			} else {
 				rc = rtas_get_sensor(DR_ENTITY_SENSE,
 						slot->index, state);
 			}
-		} else if (rc == ERR_SENSE_USE)
+		} else if (rc == -ENODEV)
 			info("%s: slot is unusable\n", __FUNCTION__);
 		else
 			err("%s failed to get sensor state\n", __FUNCTION__);
@@ -413,7 +413,7 @@ static int setup_pci_hotplug_slot_info(s
 	if (slot->hotplug_slot->info->adapter_status == NOT_VALID) {
 		err("%s: NOT_VALID: skip dn->full_name=%s\n",
 			__FUNCTION__, slot->dn->full_name);
-		return -1;
+		return -EINVAL;
 	}
 	return 0;
 }
@@ -426,15 +426,15 @@ static int set_phb_slot_name(struct slot
 
 	dn = slot->dn;
 	if (!dn) {
-		return 1;
+		return -EINVAL;
 	}
 	phb = dn->phb;
 	if (!phb) {
-		return 1;
+		return -EINVAL;
 	}
 	bus = phb->bus;
 	if (!bus) {
-		return 1;
+		return -EINVAL;
 	}
 
 	sprintf(slot->name, "%04x:%02x:%02x.%x", pci_domain_nr(bus),
@@ -448,7 +448,7 @@ static int setup_pci_slot(struct slot *s
 
 	if (slot->type == PHB) {
 		rc = set_phb_slot_name(slot);
-		if (rc) {
+		if (rc < 0) {
 			err("%s: failed to set phb slot name\n", __FUNCTION__);
 			goto exit_rc;
 		}
@@ -509,12 +509,12 @@ static int setup_pci_slot(struct slot *s
 	return 0;
 exit_rc:
 	dealloc_slot_struct(slot);
-	return 1;
+	return -EINVAL;
 }
 
 int register_pci_slot(struct slot *slot)
 {
-	int rc = 1;
+	int rc = -EINVAL;
 
 	slot->dev_type = PCI_DEV;
 	if ((slot->type == EMBEDDED) || (slot->type == PHB))
diff -puN drivers/pci/hotplug/rpaphp_slot.c~02_rpaphp_rcs drivers/pci/hotplug/rpaphp_slot.c
--- 2_6_linus_3/drivers/pci/hotplug/rpaphp_slot.c~02_rpaphp_rcs	2005-03-07 17:52:20.000000000 -0600
+++ 2_6_linus_3-johnrose/drivers/pci/hotplug/rpaphp_slot.c	2005-03-07 17:52:20.000000000 -0600
@@ -211,7 +211,7 @@ int register_slot(struct slot *slot)
 	if (is_registered(slot)) { /* should't be here */
 		err("register_slot: slot[%s] is already registered\n", slot->name);
 		rpaphp_release_slot(slot->hotplug_slot);
-		return 1;
+		return -EAGAIN;
 	}
 	retval = pci_hp_register(slot->hotplug_slot);
 	if (retval) {
@@ -270,7 +270,7 @@ int rpaphp_set_attention_status(struct s
 
 	/* status: LED_OFF or LED_ON */
 	rc = rtas_set_indicator(DR_INDICATOR, slot->index, status);
-	if (rc)
+	if (rc < 0)
 		err("slot(name=%s location=%s index=0x%x) set attention-status(%d) failed! rc=0x%x\n",
 			slot->name, slot->location, slot->index, status, rc);
 
diff -puN drivers/pci/hotplug/rpaphp_vio.c~02_rpaphp_rcs drivers/pci/hotplug/rpaphp_vio.c
--- 2_6_linus_3/drivers/pci/hotplug/rpaphp_vio.c~02_rpaphp_rcs	2005-03-07 17:52:20.000000000 -0600
+++ 2_6_linus_3-johnrose/drivers/pci/hotplug/rpaphp_vio.c	2005-03-07 17:52:20.000000000 -0600
@@ -71,11 +71,11 @@ int register_vio_slot(struct device_node
 {
 	u32 *index;
 	char *name;
-	int rc = 1;
+	int rc = -EINVAL;
 	struct slot *slot = NULL;
 
 	rc = rpaphp_get_drc_props(dn, NULL, &name, NULL, NULL);
-	if (rc)
+	if (rc < 0)
 		goto exit_rc;
 	index = (u32 *) get_property(dn, "ibm,my-drc-index", NULL);
 	if (!index)
diff -puN drivers/pci/hotplug/rpadlpar_core.c~02_rpaphp_rcs drivers/pci/hotplug/rpadlpar_core.c
--- 2_6_linus_3/drivers/pci/hotplug/rpadlpar_core.c~02_rpaphp_rcs	2005-03-07 17:52:51.000000000 -0600
+++ 2_6_linus_3-johnrose/drivers/pci/hotplug/rpadlpar_core.c	2005-03-07 17:53:02.000000000 -0600
@@ -142,7 +142,7 @@ static int pci_add_secondary_bus(struct
 	child = pci_add_new_bus(bridge_dev->bus, bridge_dev, sec_busno);
 	if (!child) {
 		printk(KERN_ERR "%s: could not add secondary bus\n", __FUNCTION__);
-		return 1;
+		return -ENOMEM;
 	}
 
 	sprintf(child->name, "PCI Bus #%02x", child->number);
@@ -204,7 +204,7 @@ static int dlpar_pci_remove_bus(struct p
 	if (!bridge_dev) {
 		printk(KERN_ERR "%s: unexpected null device\n",
 			__FUNCTION__);
-		return 1;
+		return -EINVAL;
 	}
 
 	secondary_bus = bridge_dev->subordinate;
@@ -212,7 +212,7 @@ static int dlpar_pci_remove_bus(struct p
 	if (unmap_bus_range(secondary_bus)) {
 		printk(KERN_ERR "%s: failed to unmap bus range\n",
 			__FUNCTION__);
-		return 1;
+		return -ERANGE;
 	}
 
 	pci_remove_bus_device(bridge_dev);
@@ -282,7 +282,7 @@ static int dlpar_remove_phb(struct slot
 	}
 
 	rc = dlpar_remove_root_bus(phb);
-	if (rc)
+	if (rc < 0)
 		return rc;
 
 	return 0;
@@ -294,7 +294,7 @@ static int dlpar_add_phb(struct device_n
 
 	phb = init_phb_dynamic(dn);
 	if (!phb)
-		return 1;
+		return -EINVAL;
 
 	return 0;
 }
_

From ntl at pobox.com Tue Mar 8 12:56:38 2005
From: ntl at pobox.com (Nathan Lynch)
Date: Mon, 7 Mar 2005 19:56:38 -0600
Subject: [PATCH] call idle_task_exit with irqs disabled
Message-ID: <20050308015638.GA21853@otto>

Seeing this very occasionally during cpu hotplug testing:

Badness in slb_flush_and_rebolt at arch/ppc64/mm/slb.c:52
Call Trace:
[c0000000ef0efbe0] [c0000000000127a0] .__switch_to+0xa4/0xf0 (unreliable)
[c0000000ef0efc80] [c000000000050178] .idle_task_exit+0xbc/0x15c
[c0000000ef0efd10] [c00000000000d108] .cpu_die+0x18/0x68
[c0000000ef0efd90] [c00000000001023c] .dedicated_idle+0x1fc/0x254
[c0000000ef0efe80] [c00000000000fc80] .cpu_idle+0x3c/0x54
[c0000000ef0eff00] [c00000000003aa90] .start_secondary+0x108/0x148
[c0000000ef0eff90] [c00000000000bd28] .enable_64b_mode+0x0/0x28

idle_task_exit can result in a call to slb_flush_and_rebolt, which
must not be called with interrupts enabled. Make the call with
interrupts disabled.
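
The ordering problem described above can be modelled in a small user-space sketch. Everything here is a simulation under stated assumptions: the sim_* functions are hypothetical stand-ins for the kernel routines of similar names, and a counter takes the place of the "Badness" warning.

```c
#include <stdbool.h>

static bool sim_irqs_enabled = true;	/* simulated interrupt state */
static int sim_badness;			/* precondition violations seen */

static void sim_local_irq_disable(void)
{
	sim_irqs_enabled = false;
}

/* Stand-in for slb_flush_and_rebolt(): must run with interrupts off. */
static void sim_slb_flush_and_rebolt(void)
{
	if (sim_irqs_enabled)
		sim_badness++;	/* models the "Badness in ..." warning */
}

/* Stand-in for idle_task_exit(), which may end up flushing the SLB. */
static void sim_idle_task_exit(void)
{
	sim_slb_flush_and_rebolt();
}

/* Ordering before the patch: exit the idle task, then disable irqs. */
static int sim_cpu_die_old(void)
{
	sim_irqs_enabled = true;
	sim_badness = 0;
	sim_idle_task_exit();
	sim_local_irq_disable();
	return sim_badness;
}

/* Ordering after the patch: disable irqs first. */
static int sim_cpu_die_new(void)
{
	sim_irqs_enabled = true;
	sim_badness = 0;
	sim_local_irq_disable();
	sim_idle_task_exit();
	return sim_badness;
}
```

In this model the old ordering trips the check once and the new ordering never does, which is the whole effect of swapping the two calls.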
Signed-off-by: Nathan Lynch

 pSeries_setup.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.11-bk2/arch/ppc64/kernel/pSeries_setup.c
===================================================================
--- linux-2.6.11-bk2.orig/arch/ppc64/kernel/pSeries_setup.c	2005-03-07 04:09:29.000000000 +0000
+++ linux-2.6.11-bk2/arch/ppc64/kernel/pSeries_setup.c	2005-03-07 04:15:22.000000000 +0000
@@ -322,8 +322,8 @@ static void __init pSeries_discover_pic
 
 static void pSeries_mach_cpu_die(void)
 {
-	idle_task_exit();
 	local_irq_disable();
+	idle_task_exit();
 	/* Some hardware requires clearing the CPPR, while other hardware does not
 	 * it is safe either way
 	 */

From ntl at pobox.com Tue Mar 8 13:00:17 2005
From: ntl at pobox.com (Nathan Lynch)
Date: Mon, 7 Mar 2005 20:00:17 -0600
Subject: [PATCH] update irq affinity mask when migrating irqs
Message-ID: <20050308020017.GB21853@otto>

When offlining a cpu, any device interrupts which are bound to the cpu
have their affinity forcibly reset to all cpus (the default). However,
the value in /proc/irq/XXX/smp_affinity remains unchanged. Since we're
doing this while all the other cpus are stopped, it should be safe to
just call desc->handler->set_affinity and manually update the
irq_affinity array.
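
To make the bookkeeping issue concrete, here is a rough user-space model. It is only a sketch: the sim_* names, the four-cpu mask and the two arrays are assumptions made for illustration, with one array standing for the hardware routing and the other for the mask that /proc would report.

```c
#include <stdint.h>

#define SIM_NR_IRQS	4
#define SIM_CPU_MASK_ALL 0xFu	/* four simulated cpus */

static uint32_t sim_hw_route[SIM_NR_IRQS];	/* where the "hardware" delivers */
static uint32_t sim_irq_affinity[SIM_NR_IRQS];	/* what /proc would report */

/* Stand-in for desc->handler->set_affinity(): reprograms routing only. */
static void sim_set_affinity(unsigned int irq, uint32_t mask)
{
	sim_hw_route[irq] = mask;
}

/* Bind an irq to a single cpu, keeping both views consistent. */
static void sim_bind(unsigned int irq, unsigned int cpu)
{
	sim_set_affinity(irq, 1u << cpu);
	sim_irq_affinity[irq] = 1u << cpu;
}

/* Move every irq bound only to `cpu` back to all cpus, and update the
 * reported mask so it matches the new routing. */
static void sim_migrate_irqs_away(unsigned int cpu)
{
	unsigned int irq;

	for (irq = 0; irq < SIM_NR_IRQS; irq++) {
		if (sim_hw_route[irq] != (1u << cpu))
			continue;
		sim_set_affinity(irq, SIM_CPU_MASK_ALL);
		sim_irq_affinity[irq] = SIM_CPU_MASK_ALL;
	}
}
```

Dropping the second assignment in sim_migrate_irqs_away() reproduces the stale smp_affinity value the message describes: routing changes while the reported mask does not.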
Signed-off-by: Nathan Lynch

 xics.c |   11 ++---------
 1 files changed, 2 insertions(+), 9 deletions(-)

Index: linux-2.6.11-bk2/arch/ppc64/kernel/xics.c
===================================================================
--- linux-2.6.11-bk2.orig/arch/ppc64/kernel/xics.c	2005-03-02 07:38:10.000000000 +0000
+++ linux-2.6.11-bk2/arch/ppc64/kernel/xics.c	2005-03-07 03:52:08.000000000 +0000
@@ -704,15 +704,8 @@ void xics_migrate_irqs_away(void)
 			virq, cpu);
 
 		/* Reset affinity to all cpus */
-		xics_status[0] = default_distrib_server;
-
-		status = rtas_call(ibm_set_xive, 3, 1, NULL, irq,
-				xics_status[0], xics_status[1]);
-		if (status)
-			printk(KERN_ERR "migrate_irqs_away: irq=%d "
-					"ibm,set-xive returns %d\n",
-					virq, status);
-
+		desc->handler->set_affinity(virq, CPU_MASK_ALL);
+		irq_affinity[virq] = CPU_MASK_ALL;
 unlock:
 		spin_unlock_irqrestore(&desc->lock, flags);
 	}

From sfr at canb.auug.org.au Tue Mar 8 17:54:22 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Tue, 8 Mar 2005 17:54:22 +1100
Subject: [PATCH] PPC64: "invert" dma mapping routines
Message-ID: <20050308175422.47ff6e85.sfr@canb.auug.org.au>

Hi Andrew, Linus,

This patch "inverts" the PPC64 dma mapping routines so that the pci_ and
vio_ ... routines are implemented in terms of the dma_ ... routines (the
vio_ routines disappear anyway as no one uses them directly any more).

The most noticeable change after this patch is applied will be that the
flags passed to dma_alloc_coherent will now be honoured (whereas they were
previously silently ignored since we used to just call
pci_alloc_consistent).
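
The "inversion" can be sketched in user space as follows. This is an illustration under stated assumptions, not the patch's code: all sim_* names are hypothetical, calloc() stands in for the page allocator, and SIM_GFP_ATOMIC is a placeholder for the fixed flag the legacy entry point used to impose.

```c
#include <stddef.h>
#include <stdlib.h>

enum sim_bus { SIM_BUS_PCI, SIM_BUS_VIO };

struct sim_device {
	enum sim_bus bus;
	int last_flag;		/* records the flag the allocator received */
};

/* Per-bus operations table, analogous in spirit to dma_mapping_ops. */
struct sim_dma_ops {
	void *(*alloc_coherent)(struct sim_device *dev, size_t size, int flag);
};

static void *sim_alloc(struct sim_device *dev, size_t size, int flag)
{
	dev->last_flag = flag;	/* shows that the caller's flag got through */
	return calloc(1, size);
}

static struct sim_dma_ops sim_pci_ops = { .alloc_coherent = sim_alloc };
static struct sim_dma_ops sim_vio_ops = { .alloc_coherent = sim_alloc };

/* Pick the ops table from the device's bus type. */
static struct sim_dma_ops *sim_get_dma_ops(struct sim_device *dev)
{
	switch (dev->bus) {
	case SIM_BUS_PCI:
		return &sim_pci_ops;
	case SIM_BUS_VIO:
		return &sim_vio_ops;
	}
	return NULL;
}

/* Generic entry point: dispatches through the table and forwards flag. */
static void *sim_dma_alloc_coherent(struct sim_device *dev, size_t size,
				    int flag)
{
	struct sim_dma_ops *ops = sim_get_dma_ops(dev);

	return ops ? ops->alloc_coherent(dev, size, flag) : NULL;
}

#define SIM_GFP_ATOMIC 1	/* placeholder value for this sketch */

/* Legacy-style wrapper, now layered on top of the generic routine. */
static void *sim_pci_alloc_consistent(struct sim_device *dev, size_t size)
{
	return sim_dma_alloc_coherent(dev, size, SIM_GFP_ATOMIC);
}
```

The design point is the direction of the dependency: the bus-specific wrapper calls the generic routine with a fixed flag, while direct callers of the generic routine get whatever flag they pass.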
Signed-off-by: Stephen Rothwell

diffstat looks like this:

 arch/ppc64/kernel/dma.c              |  100 +++++++++++++--------------
 arch/ppc64/kernel/iommu.c            |    8 +-
 arch/ppc64/kernel/pci.c              |    2
 arch/ppc64/kernel/pci_direct_iommu.c |   34 +++++----
 arch/ppc64/kernel/pci_iommu.c        |   55 ++++++++-------
 arch/ppc64/kernel/vio.c              |   55 +++++++++------
 include/asm-ppc64/dma-mapping.h      |   20 +++++
 include/asm-ppc64/iommu.h            |    6 -
 include/asm-ppc64/pci.h              |  126 +----------------------------------
 include/asm-ppc64/vio.h              |   27 -------
 10 files changed, 166 insertions(+), 267 deletions(-)

This has been compiled for iSeries, pSeries and g5 (default configs) and
booted on iSeries.
-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruNp linus/arch/ppc64/kernel/dma.c linus-dma.4/arch/ppc64/kernel/dma.c
--- linus/arch/ppc64/kernel/dma.c	2004-10-26 16:06:41.000000000 +1000
+++ linus-dma.4/arch/ppc64/kernel/dma.c	2005-02-07 17:47:41.000000000 +1100
@@ -13,14 +13,23 @@
 #include
 #include
 
-int dma_supported(struct device *dev, u64 mask)
+static struct dma_mapping_ops *get_dma_ops(struct device *dev)
 {
 	if (dev->bus == &pci_bus_type)
-		return pci_dma_supported(to_pci_dev(dev), mask);
+		return &pci_dma_ops;
 #ifdef CONFIG_IBMVIO
 	if (dev->bus == &vio_bus_type)
-		return vio_dma_supported(to_vio_dev(dev), mask);
-#endif /* CONFIG_IBMVIO */
+		return &vio_dma_ops;
+#endif
+	return NULL;
+}
+
+int dma_supported(struct device *dev, u64 mask)
+{
+	struct dma_mapping_ops *dma_ops = get_dma_ops(dev);
+
+	if (dma_ops)
+		return dma_ops->dma_supported(dev, mask);
 	BUG();
 	return 0;
 }
@@ -32,7 +41,7 @@ int dma_set_mask(struct device *dev, u64
 		return pci_set_dma_mask(to_pci_dev(dev), dma_mask);
 #ifdef CONFIG_IBMVIO
 	if (dev->bus == &vio_bus_type)
-		return vio_set_dma_mask(to_vio_dev(dev), dma_mask);
+		return -EIO;
 #endif /* CONFIG_IBMVIO */
 	BUG();
 	return 0;
@@ -42,12 +51,10 @@ EXPORT_SYMBOL(dma_set_mask);
 void *dma_alloc_coherent(struct device *dev, size_t size,
 		dma_addr_t *dma_handle, int flag)
 {
-	if (dev->bus == &pci_bus_type)
-		return pci_alloc_consistent(to_pci_dev(dev), size, dma_handle);
-#ifdef CONFIG_IBMVIO
-	if (dev->bus == &vio_bus_type)
-		return vio_alloc_consistent(to_vio_dev(dev), size, dma_handle);
-#endif /* CONFIG_IBMVIO */
+	struct dma_mapping_ops *dma_ops = get_dma_ops(dev);
+
+	if (dma_ops)
+		return dma_ops->alloc_coherent(dev, size, dma_handle, flag);
 	BUG();
 	return NULL;
 }
@@ -56,12 +63,10 @@ EXPORT_SYMBOL(dma_alloc_coherent);
 void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr,
 		dma_addr_t dma_handle)
 {
-	if (dev->bus == &pci_bus_type)
-		pci_free_consistent(to_pci_dev(dev), size, cpu_addr, dma_handle);
-#ifdef CONFIG_IBMVIO
-	else if (dev->bus == &vio_bus_type)
-		vio_free_consistent(to_vio_dev(dev), size, cpu_addr, dma_handle);
-#endif /* CONFIG_IBMVIO */
+	struct dma_mapping_ops *dma_ops = get_dma_ops(dev);
+
+	if (dma_ops)
+		dma_ops->free_coherent(dev, size, cpu_addr, dma_handle);
 	else
 		BUG();
 }
@@ -70,12 +75,10 @@ EXPORT_SYMBOL(dma_free_coherent);
 dma_addr_t dma_map_single(struct device *dev, void *cpu_addr, size_t size,
 		enum dma_data_direction direction)
 {
-	if (dev->bus == &pci_bus_type)
-		return pci_map_single(to_pci_dev(dev), cpu_addr, size, (int)direction);
-#ifdef CONFIG_IBMVIO
-	if (dev->bus == &vio_bus_type)
-		return vio_map_single(to_vio_dev(dev), cpu_addr, size, direction);
-#endif /* CONFIG_IBMVIO */
+	struct dma_mapping_ops *dma_ops = get_dma_ops(dev);
+
+	if (dma_ops)
+		return dma_ops->map_single(dev, cpu_addr, size, direction);
 	BUG();
 	return (dma_addr_t)0;
 }
@@ -84,12 +87,10 @@ EXPORT_SYMBOL(dma_map_single);
 void dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
 		enum dma_data_direction direction)
 {
-	if (dev->bus == &pci_bus_type)
-		pci_unmap_single(to_pci_dev(dev), dma_addr, size, (int)direction);
-#ifdef CONFIG_IBMVIO
-	else if (dev->bus == &vio_bus_type)
-		vio_unmap_single(to_vio_dev(dev), dma_addr, size, direction);
-#endif /* CONFIG_IBMVIO */
+	struct dma_mapping_ops *dma_ops = get_dma_ops(dev);
+
+	if (dma_ops)
+		dma_ops->unmap_single(dev, dma_addr, size, direction);
 	else
 		BUG();
 }
@@ -99,12 +100,11 @@ dma_addr_t dma_map_page(struct device *d
 		unsigned long offset, size_t size,
 		enum dma_data_direction direction)
 {
-	if (dev->bus == &pci_bus_type)
-		return pci_map_page(to_pci_dev(dev), page, offset, size, (int)direction);
-#ifdef CONFIG_IBMVIO
-	if (dev->bus == &vio_bus_type)
-		return vio_map_page(to_vio_dev(dev), page, offset, size, direction);
-#endif /* CONFIG_IBMVIO */
+	struct dma_mapping_ops *dma_ops = get_dma_ops(dev);
+
+	if (dma_ops)
+		return dma_ops->map_single(dev,
+				(page_address(page) + offset), size, direction);
 	BUG();
 	return (dma_addr_t)0;
 }
@@ -113,12 +113,10 @@ EXPORT_SYMBOL(dma_map_page);
 void dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size,
 		enum dma_data_direction direction)
 {
-	if (dev->bus == &pci_bus_type)
-		pci_unmap_page(to_pci_dev(dev), dma_address, size, (int)direction);
-#ifdef CONFIG_IBMVIO
-	else if (dev->bus == &vio_bus_type)
-		vio_unmap_page(to_vio_dev(dev), dma_address, size, direction);
-#endif /* CONFIG_IBMVIO */
+	struct dma_mapping_ops *dma_ops = get_dma_ops(dev);
+
+	if (dma_ops)
+		dma_ops->unmap_single(dev, dma_address, size, direction);
 	else
 		BUG();
 }
@@ -127,12 +125,10 @@ EXPORT_SYMBOL(dma_unmap_page);
 int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 		enum dma_data_direction direction)
 {
-	if (dev->bus == &pci_bus_type)
-		return pci_map_sg(to_pci_dev(dev), sg, nents, (int)direction);
-#ifdef CONFIG_IBMVIO
-	if (dev->bus == &vio_bus_type)
-		return vio_map_sg(to_vio_dev(dev), sg, nents, direction);
-#endif /* CONFIG_IBMVIO */
+	struct dma_mapping_ops *dma_ops = get_dma_ops(dev);
+
+	if (dma_ops)
+		return dma_ops->map_sg(dev, sg, nents, direction);
 	BUG();
 	return 0;
 }
@@ -141,12 +137,10 @@ EXPORT_SYMBOL(dma_map_sg);
 void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nhwentries,
 		enum dma_data_direction direction)
 {
-	if (dev->bus == &pci_bus_type)
-		pci_unmap_sg(to_pci_dev(dev), sg, nhwentries, (int)direction);
-#ifdef CONFIG_IBMVIO
-	else if (dev->bus == &vio_bus_type)
-		vio_unmap_sg(to_vio_dev(dev), sg, nhwentries, direction);
-#endif /* CONFIG_IBMVIO */
+	struct dma_mapping_ops *dma_ops = get_dma_ops(dev);
+
+	if (dma_ops)
+		dma_ops->unmap_sg(dev, sg, nhwentries, direction);
 	else
 		BUG();
 }
diff -ruNp linus/arch/ppc64/kernel/iommu.c linus-dma.4/arch/ppc64/kernel/iommu.c
--- linus/arch/ppc64/kernel/iommu.c	2005-01-09 10:05:39.000000000 +1100
+++ linus-dma.4/arch/ppc64/kernel/iommu.c	2005-02-07 15:00:06.000000000 +1100
@@ -513,8 +513,8 @@ void iommu_unmap_single(struct iommu_tab
  * Returns the virtual address of the buffer and sets dma_handle
  * to the dma address (mapping) of the first page.
  */
-void *iommu_alloc_consistent(struct iommu_table *tbl, size_t size,
-		dma_addr_t *dma_handle)
+void *iommu_alloc_coherent(struct iommu_table *tbl, size_t size,
+		dma_addr_t *dma_handle, int flag)
 {
 	void *ret = NULL;
 	dma_addr_t mapping;
@@ -538,7 +538,7 @@ void *iommu_alloc_consistent(struct iomm
 		return NULL;
 
 	/* Alloc enough pages (and possibly more) */
-	ret = (void *)__get_free_pages(GFP_ATOMIC, order);
+	ret = (void *)__get_free_pages(flag, order);
 	if (!ret)
 		return NULL;
 	memset(ret, 0, size);
@@ -553,7 +553,7 @@ void *iommu_alloc_consistent(struct iomm
 	return ret;
 }
 
-void iommu_free_consistent(struct iommu_table *tbl, size_t size,
+void iommu_free_coherent(struct iommu_table *tbl, size_t size,
 			 void *vaddr, dma_addr_t dma_handle)
 {
 	unsigned int npages;
diff -ruNp linus/arch/ppc64/kernel/pci.c linus-dma.4/arch/ppc64/kernel/pci.c
--- linus/arch/ppc64/kernel/pci.c	2005-03-06 07:08:24.000000000 +1100
+++ linus-dma.4/arch/ppc64/kernel/pci.c	2005-03-07 10:23:14.000000000 +1100
@@ -71,7 +71,7 @@ void iSeries_pcibios_init(void);
 
 LIST_HEAD(hose_list);
 
-struct pci_dma_ops pci_dma_ops;
+struct dma_mapping_ops pci_dma_ops;
 EXPORT_SYMBOL(pci_dma_ops);
 
 int global_phb_number;		/* Global phb counter */
diff -ruNp linus/arch/ppc64/kernel/pci_direct_iommu.c linus-dma.4/arch/ppc64/kernel/pci_direct_iommu.c
--- linus/arch/ppc64/kernel/pci_direct_iommu.c	2005-01-09 10:05:39.000000000 +1100
+++ linus-dma.4/arch/ppc64/kernel/pci_direct_iommu.c	2005-02-07 16:00:47.000000000 +1100
@@ -30,12 +30,12 @@
 
 #include "pci.h"
 
-static void *pci_direct_alloc_consistent(struct pci_dev *hwdev, size_t size,
-				   dma_addr_t *dma_handle)
+static void *pci_direct_alloc_coherent(struct device *hwdev, size_t size,
+				   dma_addr_t *dma_handle, int flag)
 {
 	void *ret;
 
-	ret = (void *)__get_free_pages(GFP_ATOMIC, get_order(size));
+	ret = (void *)__get_free_pages(flag, get_order(size));
 	if (ret != NULL) {
 		memset(ret, 0, size);
 		*dma_handle = virt_to_abs(ret);
@@ -43,24 +43,24 @@ static void *pci_direct_alloc_consistent
 	return ret;
 }
 
-static void pci_direct_free_consistent(struct pci_dev *hwdev, size_t size,
+static void pci_direct_free_coherent(struct device *hwdev, size_t size,
 				 void *vaddr, dma_addr_t dma_handle)
 {
 	free_pages((unsigned long)vaddr, get_order(size));
 }
 
-static dma_addr_t pci_direct_map_single(struct pci_dev *hwdev, void *ptr,
+static dma_addr_t pci_direct_map_single(struct device *hwdev, void *ptr,
 		size_t size, enum dma_data_direction direction)
 {
 	return virt_to_abs(ptr);
 }
 
-static void pci_direct_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr,
+static void pci_direct_unmap_single(struct device *hwdev, dma_addr_t dma_addr,
 		size_t size, enum dma_data_direction direction)
 {
 }
 
-static int pci_direct_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
+static int pci_direct_map_sg(struct device *hwdev, struct scatterlist *sg,
 		int nents, enum dma_data_direction direction)
 {
 	int i;
@@ -73,17 +73,23 @@ static int pci_direct_map_sg(struct pci_
 	return nents;
 }
 
-static void pci_direct_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg,
+static void pci_direct_unmap_sg(struct device *hwdev, struct scatterlist *sg,
 		int nents, enum dma_data_direction direction)
 {
 }
 
+static int pci_direct_dma_supported(struct device *dev, u64 mask)
+{
+	return mask < 0x100000000ull;
+}
+
 void __init pci_direct_iommu_init(void)
 {
-	pci_dma_ops.pci_alloc_consistent = pci_direct_alloc_consistent;
-	pci_dma_ops.pci_free_consistent = pci_direct_free_consistent;
-	pci_dma_ops.pci_map_single = pci_direct_map_single;
-	pci_dma_ops.pci_unmap_single = pci_direct_unmap_single;
-	pci_dma_ops.pci_map_sg = pci_direct_map_sg;
-	pci_dma_ops.pci_unmap_sg = pci_direct_unmap_sg;
+	pci_dma_ops.alloc_coherent = pci_direct_alloc_coherent;
+	pci_dma_ops.free_coherent = pci_direct_free_coherent;
+	pci_dma_ops.map_single = pci_direct_map_single;
+	pci_dma_ops.unmap_single = pci_direct_unmap_single;
+	pci_dma_ops.map_sg = pci_direct_map_sg;
+	pci_dma_ops.unmap_sg = pci_direct_unmap_sg;
+	pci_dma_ops.dma_supported = pci_direct_dma_supported;
 }
diff -ruNp linus/arch/ppc64/kernel/pci_iommu.c linus-dma.4/arch/ppc64/kernel/pci_iommu.c
--- linus/arch/ppc64/kernel/pci_iommu.c	2004-11-16 16:05:10.000000000 +1100
+++ linus-dma.4/arch/ppc64/kernel/pci_iommu.c	2005-02-07 15:10:05.000000000 +1100
@@ -50,19 +50,23 @@
  */
 #define PCI_GET_DN(dev) ((struct device_node *)((dev)->sysdata))
 
-static inline struct iommu_table *devnode_table(struct pci_dev *dev)
+static inline struct iommu_table *devnode_table(struct device *dev)
 {
-	if (!dev)
-		dev = ppc64_isabridge_dev;
-	if (!dev)
-		return NULL;
+	struct pci_dev *pdev;
+
+	if (!dev) {
+		pdev = ppc64_isabridge_dev;
+		if (!pdev)
+			return NULL;
+	} else
+		pdev = to_pci_dev(dev);
 
 #ifdef CONFIG_PPC_ISERIES
-	return ISERIES_DEVNODE(dev)->iommu_table;
+	return ISERIES_DEVNODE(pdev)->iommu_table;
 #endif /* CONFIG_PPC_ISERIES */
 
 #ifdef CONFIG_PPC_MULTIPLATFORM
-	return PCI_GET_DN(dev)->iommu_table;
+	return PCI_GET_DN(pdev)->iommu_table;
 #endif /* CONFIG_PPC_MULTIPLATFORM */
 }
 
@@ -71,16 +75,17 @@ static inline struct iommu_table *devnod
  * Returns the virtual address of the buffer and sets dma_handle
  * to the dma address (mapping) of the first page.
  */
-static void *pci_iommu_alloc_consistent(struct pci_dev *hwdev, size_t size,
-			   dma_addr_t *dma_handle)
+static void *pci_iommu_alloc_coherent(struct device *hwdev, size_t size,
+			   dma_addr_t *dma_handle, int flag)
 {
-	return iommu_alloc_consistent(devnode_table(hwdev), size, dma_handle);
+	return iommu_alloc_coherent(devnode_table(hwdev), size, dma_handle,
+			flag);
 }
 
-static void pci_iommu_free_consistent(struct pci_dev *hwdev, size_t size,
+static void pci_iommu_free_coherent(struct device *hwdev, size_t size,
 			 void *vaddr, dma_addr_t dma_handle)
 {
-	iommu_free_consistent(devnode_table(hwdev), size, vaddr, dma_handle);
+	iommu_free_coherent(devnode_table(hwdev), size, vaddr, dma_handle);
 }
 
 /* Creates TCEs for a user provided buffer.  The user buffer must be
@@ -89,46 +94,46 @@ static void pci_iommu_free_consistent(st
  * need not be page aligned, the dma_addr_t returned will point to the same
  * byte within the page as vaddr.
  */
-static dma_addr_t pci_iommu_map_single(struct pci_dev *hwdev, void *vaddr,
+static dma_addr_t pci_iommu_map_single(struct device *hwdev, void *vaddr,
 		size_t size, enum dma_data_direction direction)
 {
 	return iommu_map_single(devnode_table(hwdev), vaddr, size, direction);
 }
 
-static void pci_iommu_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_handle,
+static void pci_iommu_unmap_single(struct device *hwdev, dma_addr_t dma_handle,
 		size_t size, enum dma_data_direction direction)
 {
 	iommu_unmap_single(devnode_table(hwdev), dma_handle, size, direction);
 }
 
-static int pci_iommu_map_sg(struct pci_dev *pdev, struct scatterlist *sglist,
+static int pci_iommu_map_sg(struct device *pdev, struct scatterlist *sglist,
 		int nelems, enum dma_data_direction direction)
 {
-	return iommu_map_sg(&pdev->dev, devnode_table(pdev), sglist,
+	return iommu_map_sg(pdev, devnode_table(pdev), sglist,
 			nelems, direction);
 }
 
-static void pci_iommu_unmap_sg(struct pci_dev *pdev, struct scatterlist *sglist,
+static void pci_iommu_unmap_sg(struct device *pdev, struct scatterlist *sglist,
 		int nelems, enum dma_data_direction direction)
 {
 	iommu_unmap_sg(devnode_table(pdev), sglist, nelems, direction);
 }
 
 
 /* We support DMA to/from any memory page via the iommu */
-static int pci_iommu_dma_supported(struct pci_dev *pdev, u64 mask)
+static int pci_iommu_dma_supported(struct device *dev, u64 mask)
 {
 	return 1;
 }
 
 
 void pci_iommu_init(void)
 {
-	pci_dma_ops.pci_alloc_consistent = pci_iommu_alloc_consistent;
-	pci_dma_ops.pci_free_consistent = pci_iommu_free_consistent;
-	pci_dma_ops.pci_map_single = pci_iommu_map_single;
-	pci_dma_ops.pci_unmap_single = pci_iommu_unmap_single;
-	pci_dma_ops.pci_map_sg = pci_iommu_map_sg;
-	pci_dma_ops.pci_unmap_sg = pci_iommu_unmap_sg;
-	pci_dma_ops.pci_dma_supported = pci_iommu_dma_supported;
+	pci_dma_ops.alloc_coherent = pci_iommu_alloc_coherent;
+	pci_dma_ops.free_coherent = pci_iommu_free_coherent;
+	pci_dma_ops.map_single = pci_iommu_map_single;
+	pci_dma_ops.unmap_single = pci_iommu_unmap_single;
+	pci_dma_ops.map_sg = pci_iommu_map_sg;
+	pci_dma_ops.unmap_sg = pci_iommu_unmap_sg;
+	pci_dma_ops.dma_supported = pci_iommu_dma_supported;
 }
diff -ruNp linus/arch/ppc64/kernel/vio.c linus-dma.4/arch/ppc64/kernel/vio.c
--- linus/arch/ppc64/kernel/vio.c	2005-01-09 10:05:39.000000000 +1100
+++ linus-dma.4/arch/ppc64/kernel/vio.c	2005-02-07 15:45:00.000000000 +1100
@@ -557,48 +557,61 @@ int vio_disable_interrupts(struct vio_de
 EXPORT_SYMBOL(vio_disable_interrupts);
 #endif
 
-dma_addr_t vio_map_single(struct vio_dev *dev, void *vaddr,
+static dma_addr_t vio_map_single(struct device *dev, void *vaddr,
 			  size_t size, enum dma_data_direction direction)
 {
-	return iommu_map_single(dev->iommu_table, vaddr, size, direction);
+	return iommu_map_single(to_vio_dev(dev)->iommu_table, vaddr, size,
+			direction);
 }
-EXPORT_SYMBOL(vio_map_single);
 
-void vio_unmap_single(struct vio_dev *dev, dma_addr_t dma_handle,
+static void vio_unmap_single(struct device *dev, dma_addr_t dma_handle,
 		      size_t size, enum dma_data_direction direction)
 {
-	iommu_unmap_single(dev->iommu_table, dma_handle, size, direction);
+	iommu_unmap_single(to_vio_dev(dev)->iommu_table, dma_handle, size,
+			direction);
 }
-EXPORT_SYMBOL(vio_unmap_single);
 
-int vio_map_sg(struct vio_dev *vdev, struct scatterlist *sglist, int nelems,
-	       enum dma_data_direction direction)
+static int vio_map_sg(struct device *dev, struct scatterlist *sglist,
+		int nelems, enum dma_data_direction direction)
 {
-	return iommu_map_sg(&vdev->dev, vdev->iommu_table, sglist,
+	return iommu_map_sg(dev, to_vio_dev(dev)->iommu_table, sglist,
 			nelems, direction);
 }
-EXPORT_SYMBOL(vio_map_sg);
 
-void vio_unmap_sg(struct vio_dev *vdev, struct scatterlist *sglist, int nelems,
-		  enum dma_data_direction direction)
+static void vio_unmap_sg(struct device *dev, struct scatterlist *sglist,
+		int nelems, enum dma_data_direction direction)
 {
-	iommu_unmap_sg(vdev->iommu_table, sglist, nelems, direction);
+	iommu_unmap_sg(to_vio_dev(dev)->iommu_table, sglist, nelems, direction);
 }
-EXPORT_SYMBOL(vio_unmap_sg);
 
-void *vio_alloc_consistent(struct vio_dev *dev, size_t size,
-			   dma_addr_t *dma_handle)
+static void *vio_alloc_coherent(struct device *dev, size_t size,
+			   dma_addr_t *dma_handle, int flag)
 {
-	return iommu_alloc_consistent(dev->iommu_table, size, dma_handle);
+	return iommu_alloc_coherent(to_vio_dev(dev)->iommu_table, size,
+			dma_handle, flag);
 }
-EXPORT_SYMBOL(vio_alloc_consistent);
 
-void vio_free_consistent(struct vio_dev *dev, size_t size,
+static void vio_free_coherent(struct device *dev, size_t size,
 			 void *vaddr, dma_addr_t dma_handle)
 {
-	iommu_free_consistent(dev->iommu_table, size, vaddr, dma_handle);
+	iommu_free_coherent(to_vio_dev(dev)->iommu_table, size, vaddr,
+			dma_handle);
 }
-EXPORT_SYMBOL(vio_free_consistent);
+
+static int vio_dma_supported(struct device *dev, u64 mask)
+{
+	return 1;
+}
+
+struct dma_mapping_ops vio_dma_ops = {
+	.alloc_coherent = vio_alloc_coherent,
+	.free_coherent = vio_free_coherent,
+	.map_single = vio_map_single,
+	.unmap_single = vio_unmap_single,
+	.map_sg = vio_map_sg,
+	.unmap_sg = vio_unmap_sg,
+	.dma_supported = vio_dma_supported,
+};
 
 static int vio_bus_match(struct device *dev, struct device_driver *drv)
 {
diff -ruNp linus/include/asm-ppc64/dma-mapping.h linus-dma.4/include/asm-ppc64/dma-mapping.h
--- linus/include/asm-ppc64/dma-mapping.h	2004-09-14 21:06:08.000000000 +1000
+++ linus-dma.4/include/asm-ppc64/dma-mapping.h	2005-02-07 14:38:01.000000000 +1100
@@ -113,4 +113,24 @@ dma_cache_sync(void *vaddr, size_t size,
 	/* nothing to do */
 }
 
+/*
+ * DMA operations are abstracted for G5 vs. i/pSeries, PCI vs. VIO
+ */
+struct dma_mapping_ops {
+	void *		(*alloc_coherent)(struct device *dev, size_t size,
+				dma_addr_t *dma_handle, int flag);
+	void		(*free_coherent)(struct device *dev, size_t size,
+				void *vaddr, dma_addr_t dma_handle);
+	dma_addr_t	(*map_single)(struct device *dev, void *ptr,
+				size_t size, enum dma_data_direction direction);
+	void		(*unmap_single)(struct device *dev, dma_addr_t dma_addr,
+				size_t size, enum dma_data_direction direction);
+	int		(*map_sg)(struct device *dev, struct scatterlist *sg,
+				int nents, enum dma_data_direction direction);
+	void		(*unmap_sg)(struct device *dev, struct scatterlist *sg,
+				int nents, enum dma_data_direction direction);
+	int		(*dma_supported)(struct device *dev, u64 mask);
+	int		(*dac_dma_supported)(struct device *dev, u64 mask);
+};
+
 #endif	/* _ASM_DMA_MAPPING_H */
diff -ruNp linus/include/asm-ppc64/iommu.h linus-dma.4/include/asm-ppc64/iommu.h
--- linus/include/asm-ppc64/iommu.h	2005-01-09 10:05:41.000000000 +1100
+++ linus-dma.4/include/asm-ppc64/iommu.h	2005-02-07 15:02:01.000000000 +1100
@@ -145,9 +145,9 @@ extern int iommu_map_sg(struct device *d
 extern void iommu_unmap_sg(struct iommu_table *tbl, struct scatterlist *sglist,
 		int nelems, enum dma_data_direction direction);
 
-extern void *iommu_alloc_consistent(struct iommu_table *tbl, size_t size,
-		dma_addr_t *dma_handle);
-extern void iommu_free_consistent(struct iommu_table *tbl, size_t size,
+extern void *iommu_alloc_coherent(struct iommu_table *tbl, size_t size,
+		dma_addr_t *dma_handle, int flag);
+extern void iommu_free_coherent(struct iommu_table *tbl, size_t size,
 		void *vaddr, dma_addr_t dma_handle);
 extern dma_addr_t iommu_map_single(struct iommu_table *tbl, void *vaddr,
 		size_t size, enum dma_data_direction direction);
diff -ruNp linus/include/asm-ppc64/pci.h linus-dma.4/include/asm-ppc64/pci.h
--- linus/include/asm-ppc64/pci.h	2005-03-05 12:06:15.000000000 +1100
+++ linus-dma.4/include/asm-ppc64/pci.h	2005-03-07 10:24:32.000000000 +1100
@@ -13,11 +13,14 @@
 #include
 #include
 #include
+
 #include
 #include
 #include
 #include
 
+#include
+
 #define PCIBIOS_MIN_IO		0x1000
 #define PCIBIOS_MIN_MEM		0x10000000
 
@@ -63,131 +66,18 @@ static inline int pcibios_prep_mwi(struc
 
 extern unsigned int pcibios_assign_all_busses(void);
 
-/*
- * PCI DMA operations are abstracted for G5 vs. i/pSeries
- */
-struct pci_dma_ops {
-	void *		(*pci_alloc_consistent)(struct pci_dev *hwdev, size_t size,
-				dma_addr_t *dma_handle);
-	void		(*pci_free_consistent)(struct pci_dev *hwdev, size_t size,
-				void *vaddr, dma_addr_t dma_handle);
-
-	dma_addr_t	(*pci_map_single)(struct pci_dev *hwdev, void *ptr,
-				size_t size, enum dma_data_direction direction);
-	void		(*pci_unmap_single)(struct pci_dev *hwdev, dma_addr_t dma_addr,
-				size_t size, enum dma_data_direction direction);
-	int		(*pci_map_sg)(struct pci_dev *hwdev, struct scatterlist *sg,
-				int nents, enum dma_data_direction direction);
-	void		(*pci_unmap_sg)(struct pci_dev *hwdev, struct scatterlist *sg,
-				int nents, enum dma_data_direction direction);
-	int		(*pci_dma_supported)(struct pci_dev *hwdev, u64 mask);
-	int		(*pci_dac_dma_supported)(struct pci_dev *hwdev, u64 mask);
-};
-
-extern struct pci_dma_ops pci_dma_ops;
-
-static inline void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
-					 dma_addr_t *dma_handle)
-{
-	return pci_dma_ops.pci_alloc_consistent(hwdev, size, dma_handle);
-}
-
-static
inline void pci_free_consistent(struct pci_dev *hwdev, size_t size, - void *vaddr, dma_addr_t dma_handle) -{ - pci_dma_ops.pci_free_consistent(hwdev, size, vaddr, dma_handle); -} - -static inline dma_addr_t pci_map_single(struct pci_dev *hwdev, void *ptr, - size_t size, int direction) -{ - return pci_dma_ops.pci_map_single(hwdev, ptr, size, - (enum dma_data_direction)direction); -} - -static inline void pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr, - size_t size, int direction) -{ - pci_dma_ops.pci_unmap_single(hwdev, dma_addr, size, - (enum dma_data_direction)direction); -} - -static inline int pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg, - int nents, int direction) -{ - return pci_dma_ops.pci_map_sg(hwdev, sg, nents, - (enum dma_data_direction)direction); -} - -static inline void pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg, - int nents, int direction) -{ - pci_dma_ops.pci_unmap_sg(hwdev, sg, nents, - (enum dma_data_direction)direction); -} - -static inline void pci_dma_sync_single_for_cpu(struct pci_dev *hwdev, - dma_addr_t dma_handle, - size_t size, int direction) -{ - BUG_ON(direction == PCI_DMA_NONE); - /* nothing to do */ -} - -static inline void pci_dma_sync_single_for_device(struct pci_dev *hwdev, - dma_addr_t dma_handle, - size_t size, int direction) -{ - BUG_ON(direction == PCI_DMA_NONE); - /* nothing to do */ -} - -static inline void pci_dma_sync_sg_for_cpu(struct pci_dev *hwdev, - struct scatterlist *sg, - int nelems, int direction) -{ - BUG_ON(direction == PCI_DMA_NONE); - /* nothing to do */ -} - -static inline void pci_dma_sync_sg_for_device(struct pci_dev *hwdev, - struct scatterlist *sg, - int nelems, int direction) -{ - BUG_ON(direction == PCI_DMA_NONE); - /* nothing to do */ -} - -/* Return whether the given PCI device DMA address mask can - * be supported properly. 
For example, if your device can - * only drive the low 24-bits during PCI bus mastering, then - * you would pass 0x00ffffff as the mask to this function. - * We default to supporting only 32 bits DMA unless we have - * an explicit override of this function in pci_dma_ops for - * the platform - */ -static inline int pci_dma_supported(struct pci_dev *hwdev, u64 mask) -{ - if (pci_dma_ops.pci_dma_supported) - return pci_dma_ops.pci_dma_supported(hwdev, mask); - return (mask < 0x100000000ull); -} +extern struct dma_mapping_ops pci_dma_ops; /* For DAC DMA, we currently don't support it by default, but * we let the platform override this */ static inline int pci_dac_dma_supported(struct pci_dev *hwdev,u64 mask) { - if (pci_dma_ops.pci_dac_dma_supported) - return pci_dma_ops.pci_dac_dma_supported(hwdev, mask); + if (pci_dma_ops.dac_dma_supported) + return pci_dma_ops.dac_dma_supported(&hwdev->dev, mask); return 0; } -static inline int pci_dma_mapping_error(dma_addr_t dma_addr) -{ - return dma_mapping_error(dma_addr); -} - extern int pci_domain_nr(struct pci_bus *bus); /* Decide whether to display the domain number in /proc */ @@ -201,10 +91,6 @@ int pci_mmap_page_range(struct pci_dev * /* Tell drivers/pci/proc.c that we have pci_mmap_page_range() */ #define HAVE_PCI_MMAP 1 -#define pci_map_page(dev, page, off, size, dir) \ - pci_map_single(dev, (page_address(page) + (off)), size, dir) -#define pci_unmap_page(dev,addr,sz,dir) pci_unmap_single(dev,addr,sz,dir) - /* pci_unmap_{single,page} is not a nop, thus... 
*/ #define DECLARE_PCI_UNMAP_ADDR(ADDR_NAME) \ dma_addr_t ADDR_NAME; diff -ruNp linus/include/asm-ppc64/vio.h linus-dma.4/include/asm-ppc64/vio.h --- linus/include/asm-ppc64/vio.h 2004-06-30 15:40:04.000000000 +1000 +++ linus-dma.4/include/asm-ppc64/vio.h 2005-02-07 15:42:37.000000000 +1100 @@ -57,32 +57,7 @@ int vio_get_irq(struct vio_dev *dev); int vio_enable_interrupts(struct vio_dev *dev); int vio_disable_interrupts(struct vio_dev *dev); -dma_addr_t vio_map_single(struct vio_dev *dev, void *vaddr, - size_t size, enum dma_data_direction direction); -void vio_unmap_single(struct vio_dev *dev, dma_addr_t dma_handle, - size_t size, enum dma_data_direction direction); -int vio_map_sg(struct vio_dev *vdev, struct scatterlist *sglist, - int nelems, enum dma_data_direction direction); -void vio_unmap_sg(struct vio_dev *vdev, struct scatterlist *sglist, - int nelems, enum dma_data_direction direction); -void *vio_alloc_consistent(struct vio_dev *dev, size_t size, - dma_addr_t *dma_handle); -void vio_free_consistent(struct vio_dev *dev, size_t size, void *vaddr, - dma_addr_t dma_handle); - -static inline int vio_dma_supported(struct vio_dev *hwdev, u64 mask) -{ - return 1; -} - -#define vio_map_page(dev, page, off, size, dir) \ - vio_map_single(dev, (page_address(page) + (off)), size, dir) -#define vio_unmap_page(dev,addr,sz,dir) vio_unmap_single(dev,addr,sz,dir) - -static inline int vio_set_dma_mask(struct vio_dev *dev, u64 mask) -{ - return -EIO; -} +extern struct dma_mapping_ops vio_dma_ops; extern struct bus_type vio_bus_type; From amodra at bigpond.net.au Tue Mar 8 20:49:33 2005 From: amodra at bigpond.net.au (Alan Modra) Date: Tue, 8 Mar 2005 20:19:33 +1030 Subject: gcc4 miscompiles glibc math test In-Reply-To: <20050306200139.GA15512@suse.de> References: <20050306164645.GA16851@suse.de> <20050306200139.GA15512@suse.de> Message-ID: <20050308094933.GB15642@bubble.modra.org> On Sun, Mar 06, 2005 at 09:01:39PM +0100, Olaf Hering wrote: > On Sun, Mar 06, Olaf Hering 
wrote:
>
> > I'm building gcc40 with -O1 now and see if that makes any difference.
>
> No, building gcc and glibc with -O1 doesnt fix it, still:
>
> abuild at tangelo:~/objglibc-40-O1> cat /home/abuild/objglibc-40-O1/math/test-float.out
> testing float (without inline functions)
> Failure: Test: Real part of: cpow (2 + 3 i, 4 + 0 i) == -119.0 - 120.0 i
> Result:
>  is:         -1.18999961853027343750e+02  -0x1.dbfff600000000000000p+6
>  should be:  -1.19000000000000000000e+02  -0x1.dc000000000000000000p+6
>  difference:  3.81469726562500000000e-05   0x1.40000000000000000000p-15
>  ulp       :  5.0000
>  max.ulp   :  4.0000
> Maximal error of real part of: cpow
>  is      :  5 ulp
>  accepted:  4 ulp
> Maximal error of imaginary part of: cpow
>  is      :  2 ulp
>  accepted:  2 ulp
>
> Test suite completed:
>   2599 test cases plus 2384 tests for exception flags executed.
>   2 errors occurred.
>
>
> Are you seeing the same, or should I go and extract a selfcontained testcase?

I get exactly the same result.  I wrote this little testcase to try to
narrow down the problem

#include <stdio.h>
#include <complex.h>

int main (void)
{
  _Complex float x = 2.0 + 3.0i;
  _Complex float y = 4.0 + 0.0i;
  _Complex float z;
  double dr, di;

  printf ("%.20e + %.20e i\n", (double) (__real__ x), (double) (__imag__ x));
  printf ("%.20e + %.20e i\n", (double) (__real__ y), (double) (__imag__ y));
  z = cpowf (x, y);
  printf ("%.20e + %.20e i\n", (double) (__real__ z), (double) (__imag__ z));
  dr = __real__ z;
  dr -= -119.0f;
  di = __imag__ z;
  di -= -120.0f;
  printf ("%.20e + %.20e i\n", dr, di);
  printf ("%f, %f ulp\n", (double) dr / -119.0 * (1 << 24),
	  (double) di / -120.0 * (1 << 24));
  return 0;
}

Then I played games mixing various parts of a math library compiled with
gcc-4.0 with a math library compiled with gcc-3.4, until I got it down
to just one function from the gcc-4.0 compiled glibc.
$ gcc/xgcc -Bgcc/ -static -Wl,-u,__kernel_sinf ../glibc64-gcc4.0/math/libm.a cpow.o ../glibc64-gcc3.4/math/libm.a

So it's something to do with sysdeps/ieee754/flt-32/k_sinf.c, I thought.
Of course, the function is compiled quite differently by gcc-4.0 as
compared to gcc-3.4, with different registers and insn scheduling.  So
it takes a little analysis to find out what is really different.  The
first thing that stands out is that gcc-4.0 stores different constants,
preferring to subtract positive values rather than add negative ones.
This doesn't affect the result at all, of course.  The only real
difference I found is right at the end of the function, with gcc-4.0
generating

  94:	ec 02 03 38 	fmsubs  f0,f2,f12,f0	# y*0.5 - v*r
  98:	ec 09 10 38 	fmsubs  f0,f9,f0,f2	# z*f0 - y
  9c:	ed a8 03 7a 	fmadds  f13,f8,f13,f0	# v*-S1 + f0
  a0:	ec 21 68 28 	fsubs   f1,f1,f13	# x - f13

 x-(v*-S1+(z*(y*0.5-v*r)-y))

vs. gcc-3.4 generating

  98:	ed a1 03 72 	fmuls   f13,f1,f13	# v*S1
  9c:	ec 02 03 38 	fmsubs  f0,f2,f12,f0	# y*0.5-v*r
  a0:	ec 00 12 78 	fmsubs  f0,f0,f9,f2	# f0*z - y
  a4:	ec 00 68 28 	fsubs   f0,f0,f13	# f0-v*S1
  a8:	ec 28 00 28 	fsubs   f1,f8,f0	# x-f0

 x-((y*0.5-v*r)*z-y-v*S1)

That's exactly the same, even to the order of operations, except that
gcc-3.4 has one extra rounding step, with v*S1 being rounded to float
before being added to the sum.  The fmadds used by gcc-4.0 _doesn't_
round the product before adding.

So, not a bug, but just extra precision affecting the result.  If the
algorithms in glibc are tuned for best results with ieee754 rounding at
each fp operation, then we probably ought to compile glibc with
-mno-fused-madd.  This will make libm a little slower.
-- Alan Modra IBM OzLabs - Linux Technology Centre -------------- next part -------------- k_sinf.o: file format elf64-powerpc Disassembly of section .text: 0000000000000000 <.__kernel_sinf>: 0: d0 21 ff f0 stfs f1,-16(r1) 4: 3d 20 31 ff lis r9,12799 8: 2f 25 00 00 cmpdi cr6,r5,0 c: 61 29 ff ff ori r9,r9,65535 10: 80 01 ff f0 lwz r0,-16(r1) 14: 54 00 00 7e clrlwi r0,r0,1 18: 7f 80 48 00 cmpw cr7,r0,r9 1c: 41 9d 00 20 bgt- cr7,3c <.__kernel_sinf+0x3c> 20: fc 00 08 90 fmr f0,f1 24: fd a0 00 1e fctiwz f13,f0 28: d9 a1 ff e0 stfd f13,-32(r1) 2c: 60 00 00 00 nop 30: 80 01 ff e4 lwz r0,-28(r1) 34: 2f 80 00 00 cmpwi cr7,r0,0 38: 4d 9e 00 20 beqlr cr7 3c: ed 21 00 72 fmuls f9,f1,f1 # z=x*x 40: c1 a2 00 08 lfs f13,8(r2) # -S5 44: c0 02 00 00 lfs f0,0(r2) # S6 48: c1 82 00 10 lfs f12,16(r2) # S4 4c: c1 62 00 18 lfs f11,24(r2) # -S3 50: c1 42 00 20 lfs f10,32(r2) # S2 54: ec 09 68 38 fmsubs f0,f9,f0,f13 # z*S6 - -S5 58: ed 01 02 72 fmuls f8,f1,f9 # v=z*x 5c: ec 09 60 3a fmadds f0,f9,f0,f12 # z*f0 + S4 60: ec 09 58 38 fmsubs f0,f9,f0,f11 # z*f0 - -S3 64: ed a9 50 3a fmadds f13,f9,f0,f10 # z*f0 + S2 68: 40 9a 00 18 bne- cr6,80 <.__kernel_sinf+0x80> 6c: c0 02 00 28 lfs f0,40(r2) # -S1 70: ec 09 03 78 fmsubs f0,f9,f13,f0 # z*r - -S1 74: ec 28 08 3a fmadds f1,f8,f0,f1 # v*f0 + x 78: 4e 80 00 20 blr 7c: 60 00 00 00 nop 80: 3c 00 3f 00 lis r0,16128 84: ec 08 03 72 fmuls f0,f8,f13 # v*r 88: c1 a2 00 28 lfs f13,40(r2) # -S1 8c: 90 01 ff f0 stw r0,-16(r1) 90: c1 81 ff f0 lfs f12,-16(r1) # 0.5 94: ec 02 03 38 fmsubs f0,f2,f12,f0 # y*0.5 - v*r 98: ec 09 10 38 fmsubs f0,f9,f0,f2 # z*f0 - y 9c: ed a8 03 7a fmadds f13,f8,f13,f0 # v*-S1 + f0 a0: ec 21 68 28 fsubs f1,f1,f13 # x - f13 a4: 4e 80 00 20 blr ... b4: 60 00 00 00 nop b8: 60 00 00 00 nop bc: 60 00 00 00 nop Contents of section .toc: 0000 2f2ec9d3 00000000 32d72f34 00000000 /.......2./4.... 0010 3638ef1b 00000000 39500d01 00000000 68......9P...... 0020 3c088889 00000000 3e2aaaab 00000000 <.......>*...... 
../../glibc64/math/k_sinf.o: file format elf64-powerpc Disassembly of section .text: 0000000000000000 <.__kernel_sinf>: 0: d0 21 ff f0 stfs f1,-16(r1) 4: 3d 20 31 ff lis r9,12799 8: fd 00 08 90 fmr f8,f1 c: 2f 25 00 00 cmpdi cr6,r5,0 10: 80 01 ff f0 lwz r0,-16(r1) 14: 61 29 ff ff ori r9,r9,65535 18: 78 00 00 60 clrldi r0,r0,33 1c: 7f 80 48 00 cmpw cr7,r0,r9 20: 41 9d 00 24 bgt- cr7,44 <.__kernel_sinf+0x44> 24: fc 00 40 90 fmr f0,f8 28: fc 00 00 1e fctiwz f0,f0 2c: d8 01 ff f8 stfd f0,-8(r1) 30: e8 01 ff f8 ld r0,-8(r1) 34: f8 01 ff e0 std r0,-32(r1) 38: 81 21 ff e4 lwz r9,-28(r1) 3c: 2f 89 00 00 cmpwi cr7,r9,0 40: 4d 9e 00 20 beqlr cr7 44: ed 28 02 32 fmuls f9,f8,f8 # z=x*x 48: c1 a2 00 08 lfs f13,8(r2) # S5 4c: c0 02 00 00 lfs f0,0(r2) # S6 50: c1 82 00 10 lfs f12,16(r2) # S4 54: c1 62 00 18 lfs f11,24(r2) # S3 58: ec 29 02 32 fmuls f1,f9,f8 # v=z*x 5c: ec 09 68 3a fmadds f0,f9,f0,f13 # z*S6+S5 60: c1 42 00 20 lfs f10,32(r2) # S2 64: ec 00 62 7a fmadds f0,f0,f9,f12 # f0*z+S4 68: ec 00 5a 7a fmadds f0,f0,f9,f11 # f0*z+S3 6c: ed a0 52 7a fmadds f13,f0,f9,f10 # f0*z+S2 70: 40 9a 00 14 bne- cr6,84 <.__kernel_sinf+0x84> 74: c0 02 00 28 lfs f0,40(r2) # S1 78: ec 09 03 7a fmadds f0,f9,f13,f0 # z*r+S1 7c: ec 20 40 7a fmadds f1,f0,f1,f8 # f0*v+x 80: 4e 80 00 20 blr 84: 3c 00 3f 00 lis r0,16128 88: ec 01 03 72 fmuls f0,f1,f13 # v*r 8c: c1 a2 00 28 lfs f13,40(r2) # S1 90: 90 01 ff f0 stw r0,-16(r1) 94: c1 81 ff f0 lfs f12,-16(r1) # 0.5 98: ed a1 03 72 fmuls f13,f1,f13 # v*S1 9c: ec 02 03 38 fmsubs f0,f2,f12,f0 # y*0.5-v*r a0: ec 00 12 78 fmsubs f0,f0,f9,f2 # f0*z - y a4: ec 00 68 28 fsubs f0,f0,f13 # f0-v*S1 a8: ec 28 00 28 fsubs f1,f8,f0 # x-f0 ac: 4e 80 00 20 blr ... Contents of section .toc: 0000 2f2ec9d3 00000000 b2d72f34 00000000 /........./4.... 0010 3638ef1b 00000000 b9500d01 00000000 68.......P...... 0020 3c088889 00000000 be2aaaab 00000000 <........*...... 
From olh at suse.de Tue Mar 8 23:56:35 2005 From: olh at suse.de (Olaf Hering) Date: Tue, 8 Mar 2005 13:56:35 +0100 Subject: eeh.h compile warnings / adbhid.c build failure In-Reply-To: <1109806756.5680.127.camel@gaston> References: <20050302181206.GA2741@us.ibm.com> <1109806756.5680.127.camel@gaston> Message-ID: <20050308125635.GA19169@suse.de> On Thu, Mar 03, Benjamin Herrenschmidt wrote: > There is no ADB bus on a G5, so the driver isn't useful anyway. > Currently, ppc64 allows you to enable pmac drivers that won't build, but > they also are useless on G5s. I'll fix that over time though. They are of course not useless. Send this patch to Linus to allow the mouse button emulation until either someone split it off the ADB driver, or until someone fixes the stupid userinterfaces in Linux. Signed-off-by: Olaf Hering diff -p -purN R/linux-2.6.3/drivers/macintosh/adb.c linux-2.6.3/drivers/macintosh/adb.c --- R/linux-2.6.3/drivers/macintosh/adb.c 2004-02-18 04:59:56.000000000 +0100 +++ linux-2.6.3/drivers/macintosh/adb.c 2004-02-22 15:16:43.000000000 +0100 @@ -294,6 +294,10 @@ int __init adb_init(void) if ( (_machine != _MACH_chrp) && (_machine != _MACH_Pmac) ) return 0; #endif +#ifdef CONFIG_PPC64 + if (_machine != _MACH_Pmac) + return 0; +#endif #ifdef CONFIG_MAC if (!MACH_IS_MAC) return 0; diff -p -purN R/linux-2.6.3/drivers/macintosh/adbhid.c linux-2.6.3/drivers/macintosh/adbhid.c --- R/linux-2.6.3/drivers/macintosh/adbhid.c 2004-02-18 04:59:57.000000000 +0100 +++ linux-2.6.3/drivers/macintosh/adbhid.c 2004-02-22 15:41:34.000000000 +0100 @@ -1021,10 +1021,14 @@ init_ms_a3(int id) static int __init adbhid_init(void) { -#ifndef CONFIG_MAC +#ifdef CONFIG_PPC32 if ( (_machine != _MACH_chrp) && (_machine != _MACH_Pmac) ) return 0; #endif +#ifdef CONFIG_PPC64 + if (_machine != _MACH_Pmac) + return 0; +#endif led_request.complete = 1; From jschopp at austin.ibm.com Wed Mar 9 04:20:33 2005 From: jschopp at austin.ibm.com (Joel Schopp) Date: Tue, 08 Mar 2005 11:20:33 -0600 
Subject: [PATCH] update irq affinity mask when migrating irqs In-Reply-To: <20050308020017.GB21853@otto> References: <20050308020017.GB21853@otto> Message-ID: <422DDEE1.5040706@austin.ibm.com> Comments below. > > > Signed-off-by: Nathan Lynch > > xics.c | 11 ++--------- > 1 files changed, 2 insertions(+), 9 deletions(-) > > Index: linux-2.6.11-bk2/arch/ppc64/kernel/xics.c > =================================================================== > --- linux-2.6.11-bk2.orig/arch/ppc64/kernel/xics.c 2005-03-02 07:38:10.000000000 +0000 > +++ linux-2.6.11-bk2/arch/ppc64/kernel/xics.c 2005-03-07 03:52:08.000000000 +0000 > @@ -704,15 +704,8 @@ void xics_migrate_irqs_away(void) > virq, cpu); > > /* Reset affinity to all cpus */ > - xics_status[0] = default_distrib_server; > - > - status = rtas_call(ibm_set_xive, 3, 1, NULL, irq, > - xics_status[0], xics_status[1]); > - if (status) > - printk(KERN_ERR "migrate_irqs_away: irq=%d " > - "ibm,set-xive returns %d\n", > - virq, status); > - > + desc->handler->set_affinity(virq, CPU_MASK_ALL); The downside of calling this is it increases the path length and causes ibm_get_xive to be called again. Usually slightly slower is a fine tradeoff for more readable code, but in this case I would have left it how it was. With all the cpus stopped it is best to be as fast as possible. Maybe this is still fast enough, but you'd have to test under heavy load on a variety of systems to be sure. > + irq_affinity[virq] = CPU_MASK_ALL; This was a good catch. 
> unlock: > spin_unlock_irqrestore(&desc->lock, flags); > } > _______________________________________________ > Linuxppc64-dev mailing list > Linuxppc64-dev at ozlabs.org > https://ozlabs.org/cgi-bin/mailman/listinfo/linuxppc64-dev > From benh at kernel.crashing.org Wed Mar 9 09:26:55 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 09 Mar 2005 09:26:55 +1100 Subject: eeh.h compile warnings / adbhid.c build failure In-Reply-To: <20050308125635.GA19169@suse.de> References: <20050302181206.GA2741@us.ibm.com> <1109806756.5680.127.camel@gaston> <20050308125635.GA19169@suse.de> Message-ID: <1110320815.13593.279.camel@gaston> On Tue, 2005-03-08 at 13:56 +0100, Olaf Hering wrote: > On Thu, Mar 03, Benjamin Herrenschmidt wrote: > > > There is no ADB bus on a G5, so the driver isn't useful anyway. > > Currently, ppc64 allows you to enable pmac drivers that won't build, but > > they also are useless on G5s. I'll fix that over time though. > > They are of course not useless. Send this patch to Linus to allow the > mouse button emulation until either someone split it off the ADB driver, > or until someone fixes the stupid userinterfaces in Linux. Oh well, don't people buy real mice to plug on G5s ? :) Anyway, mouse button emulation should be split off adb stuff. Ben. From moilanen at austin.ibm.com Wed Mar 9 09:59:04 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Tue, 8 Mar 2005 16:59:04 -0600 Subject: [PATCH 0/2] No-exec support for ppc64 Message-ID: <20050308165904.0ce07112.moilanen@austin.ibm.com> These patches add no execute support to PPC64. They prohibit executing code on the stack, or most any non-text segment for both user space, and kernel. No execute is supported on Power4 processors and up. These processors support pages that have a no-execute permission bit. The patches include a base fixup from Anton Blanchard. This includes a fix for the wrong bit being used for no-exec and for read/write on the hardware PTEs. 
For distros that compile w/ pt_gnu_stacks, they depend on Ben Herrenschmidt's vDSO patches for signal trampoline. Without it, the application will hang on the first signal due to the return code being put on the signal context stack to return to the kernel on the completion of the signal handler. The changes should be in the latest BK tree. The patch is broken into two parts: 1/2: PPC64 no-exec support for user space: This will prohibit user space apps from executing in segments not marked as executable. The base support is in here as well. 2/2: PPC64 no-exec support for kernel space: This prohibits the kernel from executing non-text code. Thanks, Jake From flar at allandria.com Wed Mar 9 10:10:47 2005 From: flar at allandria.com (Brad Boyer) Date: Tue, 8 Mar 2005 15:10:47 -0800 Subject: eeh.h compile warnings / adbhid.c build failure In-Reply-To: <1110320815.13593.279.camel@gaston> References: <20050302181206.GA2741@us.ibm.com> <1109806756.5680.127.camel@gaston> <20050308125635.GA19169@suse.de> <1110320815.13593.279.camel@gaston> Message-ID: <20050308231046.GA19175@pants.nu> On Wed, Mar 09, 2005 at 09:26:55AM +1100, Benjamin Herrenschmidt wrote: > On Tue, 2005-03-08 at 13:56 +0100, Olaf Hering wrote: > > On Thu, Mar 03, Benjamin Herrenschmidt wrote: > > > > > There is no ADB bus on a G5, so the driver isn't useful anyway. > > > Currently, ppc64 allows you to enable pmac drivers that won't build, but > > > they also are useless on G5s. I'll fix that over time though. > > > > They are of course not useless. Send this patch to Linus to allow the > > mouse button emulation until either someone split it off the ADB driver, > > or until someone fixes the stupid userinterfaces in Linux. > > Oh well, don't people buy real mice to plug on G5s ? :) I bought a very fancy USB pointing device for my G5, but it would be nice to support the older stuff. Eventually I'm planning to add support for the Griffin iMate, which would give us ADB on anything that supports USB. 
It's just not at the top of my list.

> Anyway, mouse button emulation should be split off adb stuff.

I'm pretty sure it already is. The last time I looked at it, the
only tie it had left was presenting itself as an ADB device to the
input layer (bustype of BUS_ADB). I have to admit I haven't tried it,
but it ought to work without any of the actual ADB code even
compiled in. I should caveat that by saying that you can't compile
the in-kernel mouse emulation in at all unless it's on MAC or
PPC_PMAC due to the fact that it's in the drivers/macintosh directory.

	Brad Boyer
	flar at allandria.com

From moilanen at austin.ibm.com Wed Mar 9 10:08:26 2005
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Tue, 8 Mar 2005 17:08:26 -0600
Subject: [PATCH 1/2] No-exec support for ppc64
In-Reply-To: <20050308165904.0ce07112.moilanen@austin.ibm.com>
References: <20050308165904.0ce07112.moilanen@austin.ibm.com>
Message-ID: <20050308170826.13a2299e.moilanen@austin.ibm.com>

No-exec base and user space support for PPC64.

This will prohibit user space apps that are compiled with PT_GNU_STACK
from executing in segments that are non-executable. Non-PT_GNU_STACK
compiled apps will work as well, but will not be able to take advantage
of the no-exec feature.
Signed-off-by: Jake Moilanen --- linux-2.6-bk-moilanen/arch/ppc64/kernel/head.S | 5 + linux-2.6-bk-moilanen/arch/ppc64/kernel/iSeries_htab.c | 4 + linux-2.6-bk-moilanen/arch/ppc64/kernel/pSeries_lpar.c | 2 linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c | 14 +++-- linux-2.6-bk-moilanen/arch/ppc64/mm/hash_low.S | 12 ++-- linux-2.6-bk-moilanen/arch/ppc64/mm/hugetlbpage.c | 13 ++++ linux-2.6-bk-moilanen/fs/binfmt_elf.c | 2 linux-2.6-bk-moilanen/include/asm-ppc64/elf.h | 7 ++ linux-2.6-bk-moilanen/include/asm-ppc64/page.h | 19 ++++++- linux-2.6-bk-moilanen/include/asm-ppc64/pgtable.h | 45 +++++++++-------- 10 files changed, 87 insertions(+), 36 deletions(-) diff -puN arch/ppc64/kernel/head.S~nx-user-ppc64 arch/ppc64/kernel/head.S --- linux-2.6-bk/arch/ppc64/kernel/head.S~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/head.S 2005-03-08 16:08:54 -06:00 @@ -36,6 +36,7 @@ #include #include #include +#include #include #ifdef CONFIG_PPC_ISERIES @@ -950,11 +951,11 @@ END_FTR_SECTION_IFCLR(CPU_FTR_SLB) * accessing a userspace segment (even from the kernel). We assume * kernel addresses always have the high bit set. 
*/ - rlwinm r4,r4,32-23,29,29 /* DSISR_STORE -> _PAGE_RW */ + rlwinm r4,r4,32-25+9,31-9,31-9 /* DSISR_STORE -> _PAGE_RW */ rotldi r0,r3,15 /* Move high bit into MSR_PR posn */ orc r0,r12,r0 /* MSR_PR | ~high_bit */ rlwimi r4,r0,32-13,30,30 /* becomes _PAGE_USER access bit */ - ori r4,r4,1 /* add _PAGE_PRESENT */ + rlwimi r4,r5,22+2,31-2,31-2 /* Set _PAGE_EXEC if trap is 0x400 */ /* * On iSeries, we soft-disable interrupts here, then diff -puN arch/ppc64/kernel/iSeries_htab.c~nx-user-ppc64 arch/ppc64/kernel/iSeries_htab.c --- linux-2.6-bk/arch/ppc64/kernel/iSeries_htab.c~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/iSeries_htab.c 2005-03-08 16:08:54 -06:00 @@ -144,6 +144,10 @@ static long iSeries_hpte_updatepp(unsign HvCallHpt_get(&hpte, slot); if ((hpte.dw0.dw0.avpn == avpn) && (hpte.dw0.dw0.v)) { + /* + * Hypervisor expects bit's as NPPP, which is + * different from how they are mapped in our PP. + */ HvCallHpt_setPp(slot, (newpp & 0x3) | ((newpp & 0x4) << 1)); iSeries_hunlock(slot); return 0; diff -puN arch/ppc64/kernel/pSeries_lpar.c~nx-user-ppc64 arch/ppc64/kernel/pSeries_lpar.c --- linux-2.6-bk/arch/ppc64/kernel/pSeries_lpar.c~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/pSeries_lpar.c 2005-03-08 16:08:54 -06:00 @@ -470,7 +470,7 @@ static void pSeries_lpar_hpte_updatebolt slot = pSeries_lpar_hpte_find(vpn); BUG_ON(slot == -1); - flags = newpp & 3; + flags = newpp & 7; lpar_rc = plpar_pte_protect(flags, slot, 0); BUG_ON(lpar_rc != H_Success); diff -puN arch/ppc64/mm/fault.c~nx-user-ppc64 arch/ppc64/mm/fault.c --- linux-2.6-bk/arch/ppc64/mm/fault.c~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c 2005-03-08 16:08:54 -06:00 @@ -93,6 +93,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long code = SEGV_MAPERR; unsigned long is_write = error_code & 0x02000000; unsigned long trap = TRAP(regs); + unsigned long is_exec = trap == 0x400; 
BUG_ON((trap == 0x380) || (trap == 0x480)); @@ -199,16 +200,19 @@ int do_page_fault(struct pt_regs *regs, good_area: code = SEGV_ACCERR; + if (is_exec) { + /* protection fault */ + if (error_code & 0x08000000) + goto bad_area; + if (!(vma->vm_flags & VM_EXEC)) + goto bad_area; /* a write */ - if (is_write) { + } else if (is_write) { if (!(vma->vm_flags & VM_WRITE)) goto bad_area; /* a read */ } else { - /* protection fault */ - if (error_code & 0x08000000) - goto bad_area; - if (!(vma->vm_flags & (VM_READ | VM_EXEC))) + if (!(vma->vm_flags & VM_READ)) goto bad_area; } diff -puN arch/ppc64/mm/hash_low.S~nx-user-ppc64 arch/ppc64/mm/hash_low.S --- linux-2.6-bk/arch/ppc64/mm/hash_low.S~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/hash_low.S 2005-03-08 16:08:54 -06:00 @@ -89,7 +89,7 @@ _GLOBAL(__hash_page) /* Prepare new PTE value (turn access RW into DIRTY, then * add BUSY,HASHPTE and ACCESSED) */ - rlwinm r30,r4,5,24,24 /* _PAGE_RW -> _PAGE_DIRTY */ + rlwinm r30,r4,32-9+7,31-7,31-7 /* _PAGE_RW -> _PAGE_DIRTY */ or r30,r30,r31 ori r30,r30,_PAGE_BUSY | _PAGE_ACCESSED | _PAGE_HASHPTE /* Write the linux PTE atomically (setting busy) */ @@ -112,11 +112,11 @@ _GLOBAL(__hash_page) rldicl r5,r5,0,25 /* vsid & 0x0000007fffffffff */ rldicl r0,r3,64-12,48 /* (ea >> 12) & 0xffff */ xor r28,r5,r0 - - /* Convert linux PTE bits into HW equivalents - */ - andi. r3,r30,0x1fa /* Get basic set of flags */ - rlwinm r0,r30,32-2+1,30,30 /* _PAGE_RW -> _PAGE_USER (r0) */ + + /* Convert linux PTE bits into HW equivalents */ + andi. 
r3,r30,0x1fe /* Get basic set of flags */ + xori r3,r3,HW_NO_EXEC /* _PAGE_EXEC -> NOEXEC */ + rlwinm r0,r30,32-9+1,30,30 /* _PAGE_RW -> _PAGE_USER (r0) */ rlwinm r4,r30,32-7+1,30,30 /* _PAGE_DIRTY -> _PAGE_USER (r4) */ and r0,r0,r4 /* _PAGE_RW & _PAGE_DIRTY -> r0 bit 30 */ andc r0,r30,r0 /* r0 = pte & ~r0 */ diff -puN arch/ppc64/mm/hugetlbpage.c~nx-user-ppc64 arch/ppc64/mm/hugetlbpage.c --- linux-2.6-bk/arch/ppc64/mm/hugetlbpage.c~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/hugetlbpage.c 2005-03-08 16:08:54 -06:00 @@ -786,6 +786,7 @@ int hash_huge_page(struct mm_struct *mm, pte_t old_pte, new_pte; unsigned long hpteflags, prpn; long slot; + int is_exec; int err = 1; spin_lock(&mm->page_table_lock); @@ -796,6 +797,10 @@ int hash_huge_page(struct mm_struct *mm, va = (vsid << 28) | (ea & 0x0fffffff); vpn = va >> HPAGE_SHIFT; + is_exec = access & _PAGE_EXEC; + if (unlikely(is_exec && !(pte_val(*ptep) & _PAGE_EXEC))) + goto out; + /* * If no pte found or not present, send the problem up to * do_page_fault @@ -828,7 +833,12 @@ int hash_huge_page(struct mm_struct *mm, old_pte = *ptep; new_pte = old_pte; - hpteflags = 0x2 | (! (pte_val(new_pte) & _PAGE_RW)); + hpteflags = (pte_val(new_pte) & _PAGE_RW) | + (!(pte_val(new_pte) & _PAGE_RW)) | + _PAGE_USER; + + /* _PAGE_EXEC -> HW_NO_EXEC since it's inverted */ + hpteflags |= ((pte_val(new_pte) & _PAGE_EXEC) ? 
0 : HW_NO_EXEC); /* Check if pte already has an hpte (case 2) */ if (unlikely(pte_val(old_pte) & _PAGE_HASHPTE)) { @@ -898,6 +908,7 @@ repeat: err = 0; out: + spin_unlock(&mm->page_table_lock); return err; diff -puN fs/binfmt_elf.c~nx-user-ppc64 fs/binfmt_elf.c --- linux-2.6-bk/fs/binfmt_elf.c~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/fs/binfmt_elf.c 2005-03-08 16:08:54 -06:00 @@ -99,6 +99,8 @@ static int set_brk(unsigned long start, up_write(¤t->mm->mmap_sem); if (BAD_ADDR(addr)) return addr; + + sys_mprotect(start, end-start, PROT_READ|PROT_WRITE|PROT_EXEC); } current->mm->start_brk = current->mm->brk = end; return 0; diff -puN include/asm-ppc64/elf.h~nx-user-ppc64 include/asm-ppc64/elf.h --- linux-2.6-bk/include/asm-ppc64/elf.h~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/include/asm-ppc64/elf.h 2005-03-08 16:08:54 -06:00 @@ -226,6 +226,13 @@ do { \ else if (current->personality != PER_LINUX32) \ set_personality(PER_LINUX); \ } while (0) + +/* + * An executable for which elf_read_implies_exec() returns TRUE will + * have the READ_IMPLIES_EXEC personality flag set automatically. 
+ */ +#define elf_read_implies_exec(ex, have_pt_gnu_stack) (!(have_pt_gnu_stack)) + #endif /* diff -puN include/asm-ppc64/page.h~nx-user-ppc64 include/asm-ppc64/page.h --- linux-2.6-bk/include/asm-ppc64/page.h~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/include/asm-ppc64/page.h 2005-03-08 16:08:54 -06:00 @@ -235,8 +235,25 @@ extern u64 ppc64_pft_size; /* Log 2 of #define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) -#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_EXEC | \ +#define VM_DATA_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | \ VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_DATA_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | VM_EXEC | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_DATA_DEFAULT_FLAGS \ + (test_thread_flag(TIF_32BIT) ? \ + VM_DATA_DEFAULT_FLAGS32 : VM_DATA_DEFAULT_FLAGS64) + +#define VM_STACK_DEFAULT_FLAGS \ + (test_thread_flag(TIF_32BIT) ? 
\ + VM_STACK_DEFAULT_FLAGS32 : VM_STACK_DEFAULT_FLAGS64) #endif /* __KERNEL__ */ #endif /* _PPC64_PAGE_H */ diff -puN include/asm-ppc64/pgtable.h~nx-user-ppc64 include/asm-ppc64/pgtable.h --- linux-2.6-bk/include/asm-ppc64/pgtable.h~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/include/asm-ppc64/pgtable.h 2005-03-08 16:08:54 -06:00 @@ -82,14 +82,14 @@ #define _PAGE_PRESENT 0x0001 /* software: pte contains a translation */ #define _PAGE_USER 0x0002 /* matches one of the PP bits */ #define _PAGE_FILE 0x0002 /* (!present only) software: pte holds file offset */ -#define _PAGE_RW 0x0004 /* software: user write access allowed */ +#define _PAGE_EXEC 0x0004 /* No execute on POWER4 and newer (we invert) */ #define _PAGE_GUARDED 0x0008 #define _PAGE_COHERENT 0x0010 /* M: enforce memory coherence (SMP systems) */ #define _PAGE_NO_CACHE 0x0020 /* I: cache inhibit */ #define _PAGE_WRITETHRU 0x0040 /* W: cache write-through */ #define _PAGE_DIRTY 0x0080 /* C: page changed */ #define _PAGE_ACCESSED 0x0100 /* R: page referenced */ -#define _PAGE_EXEC 0x0200 /* software: i-cache coherence required */ +#define _PAGE_RW 0x0200 /* software: user write access allowed */ #define _PAGE_HASHPTE 0x0400 /* software: pte has an associated HPTE */ #define _PAGE_BUSY 0x0800 /* software: PTE & hash are busy */ #define _PAGE_SECONDARY 0x8000 /* software: HPTE is in secondary group */ @@ -100,7 +100,7 @@ /* PAGE_MASK gives the right answer below, but only by accident */ /* It should be preserving the high 48 bits and then specifically */ /* preserving _PAGE_SECONDARY | _PAGE_GROUP_IX */ -#define _PAGE_CHG_MASK (PAGE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_HPTEFLAGS) +#define _PAGE_CHG_MASK (_PAGE_GUARDED | _PAGE_COHERENT | _PAGE_NO_CACHE | _PAGE_WRITETHRU | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_HPTEFLAGS | PAGE_MASK) #define _PAGE_BASE (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_COHERENT) @@ -116,31 +116,38 @@ #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER) #define 
PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) #define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_WRENABLE) -#define PAGE_KERNEL_CI __pgprot(_PAGE_PRESENT | _PAGE_ACCESSED | \ - _PAGE_WRENABLE | _PAGE_NO_CACHE | _PAGE_GUARDED) + +#define HW_NO_EXEC _PAGE_EXEC /* This is used when the bit is + * inverted, even though it's the + * same value, hopefully it will be + * clearer in the code what is + * going on. */ /* - * The PowerPC can only do execute protection on a segment (256MB) basis, - * not on a page basis. So we consider execute permission the same as read. + * POWER4 and newer have per page execute protection, older chips can only + * do this on a segment (256MB) basis. + * * Also, write permissions imply read permissions. * This is the closest we can get.. + * + * Note due to the way vm flags are laid out, the bits are XWR */ #define __P000 PAGE_NONE -#define __P001 PAGE_READONLY_X +#define __P001 PAGE_READONLY #define __P010 PAGE_COPY -#define __P011 PAGE_COPY_X -#define __P100 PAGE_READONLY +#define __P011 PAGE_COPY +#define __P100 PAGE_READONLY_X #define __P101 PAGE_READONLY_X -#define __P110 PAGE_COPY +#define __P110 PAGE_COPY_X #define __P111 PAGE_COPY_X #define __S000 PAGE_NONE -#define __S001 PAGE_READONLY_X +#define __S001 PAGE_READONLY #define __S010 PAGE_SHARED -#define __S011 PAGE_SHARED_X -#define __S100 PAGE_READONLY +#define __S011 PAGE_SHARED +#define __S100 PAGE_READONLY_X #define __S101 PAGE_READONLY_X -#define __S110 PAGE_SHARED +#define __S110 PAGE_SHARED_X #define __S111 PAGE_SHARED_X #ifndef __ASSEMBLY__ @@ -197,7 +204,8 @@ void hugetlb_mm_free_pgd(struct mm_struc }) #define pte_modify(_pte, newprot) \ - (__pte((pte_val(_pte) & _PAGE_CHG_MASK) | pgprot_val(newprot))) + (__pte((pte_val(_pte) & _PAGE_CHG_MASK) | \ + (pgprot_val(newprot) & ~_PAGE_CHG_MASK))) #define pte_none(pte) ((pte_val(pte) & ~_PAGE_HPTEFLAGS) == 0) #define pte_present(pte) (pte_val(pte) & _PAGE_PRESENT) @@ -266,9 +274,6 @@ static inline int pte_young(pte_t 
pte) { static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE;} static inline int pte_huge(pte_t pte) { return pte_val(pte) & _PAGE_HUGE;} -static inline void pte_uncache(pte_t pte) { pte_val(pte) |= _PAGE_NO_CACHE; } -static inline void pte_cache(pte_t pte) { pte_val(pte) &= ~_PAGE_NO_CACHE; } - static inline pte_t pte_rdprotect(pte_t pte) { pte_val(pte) &= ~_PAGE_USER; return pte; } static inline pte_t pte_exprotect(pte_t pte) { @@ -438,7 +443,7 @@ static inline void set_pte_at(struct mm_ static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry, int dirty) { unsigned long bits = pte_val(entry) & - (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW); + (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC); unsigned long old, tmp; __asm__ __volatile__( _ From moilanen at austin.ibm.com Wed Mar 9 10:13:26 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Tue, 8 Mar 2005 17:13:26 -0600 Subject: [PATCH 2/2] No-exec support for ppc64 In-Reply-To: <20050308165904.0ce07112.moilanen@austin.ibm.com> References: <20050308165904.0ce07112.moilanen@austin.ibm.com> Message-ID: <20050308171326.3d72363a.moilanen@austin.ibm.com> No-exec support for the kernel on PPC64. This will mark all non-text kernel pages as no-execute. 
Signed-off-by: Jake Moilanen --- linux-2.6-bk-moilanen/arch/ppc64/kernel/iSeries_setup.c | 7 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/module.c | 3 + linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c | 25 ++++++++++++ linux-2.6-bk-moilanen/arch/ppc64/mm/hash_utils.c | 31 ++++++++++++---- linux-2.6-bk-moilanen/include/asm-ppc64/pgtable.h | 1 5 files changed, 59 insertions(+), 8 deletions(-) diff -puN arch/ppc64/kernel/iSeries_setup.c~nx-kernel-ppc64 arch/ppc64/kernel/iSeries_setup.c --- linux-2.6-bk/arch/ppc64/kernel/iSeries_setup.c~nx-kernel-ppc64 2005-03-08 16:08:57 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/iSeries_setup.c 2005-03-08 16:08:57 -06:00 @@ -624,6 +624,7 @@ static void __init iSeries_bolt_kernel(u { unsigned long pa; unsigned long mode_rw = _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX; + unsigned long tmp_mode; HPTE hpte; for (pa = saddr; pa < eaddr ;pa += PAGE_SIZE) { @@ -632,6 +633,12 @@ static void __init iSeries_bolt_kernel(u unsigned long va = (vsid << 28) | (pa & 0xfffffff); unsigned long vpn = va >> PAGE_SHIFT; unsigned long slot = HvCallHpt_findValid(&hpte, vpn); + + tmp_mode = mode_rw; + + /* Make non-kernel text non-executable */ + if (!is_kernel_text(ea)) + tmp_mode = mode_rw | HW_NO_EXEC; if (hpte.dw0.dw0.v) { /* HPTE exists, so just bolt it */ diff -puN arch/ppc64/kernel/module.c~nx-kernel-ppc64 arch/ppc64/kernel/module.c --- linux-2.6-bk/arch/ppc64/kernel/module.c~nx-kernel-ppc64 2005-03-08 16:08:57 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/module.c 2005-03-08 16:08:57 -06:00 @@ -102,7 +102,8 @@ void *module_alloc(unsigned long size) { if (size == 0) return NULL; - return vmalloc(size); + + return vmalloc_exec(size); } /* Free memory returned from module_alloc */ diff -puN arch/ppc64/mm/fault.c~nx-kernel-ppc64 arch/ppc64/mm/fault.c --- linux-2.6-bk/arch/ppc64/mm/fault.c~nx-kernel-ppc64 2005-03-08 16:08:57 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c 2005-03-08 16:08:57 -06:00 @@ -76,6 +76,21 @@ static int 
store_updates_sp(struct pt_re return 0; } +pte_t *lookup_address(unsigned long address) +{ + pgd_t *pgd = pgd_offset_k(address); + pmd_t *pmd; + + if (pgd_none(*pgd)) + return NULL; + + pmd = pmd_offset(pgd, address); + if (pmd_none(*pmd)) + return NULL; + + return pte_offset_kernel(pmd, address); +} + /* * The error_code parameter is * - DSISR for a non-SLB data access fault, @@ -94,6 +109,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long is_write = error_code & 0x02000000; unsigned long trap = TRAP(regs); unsigned long is_exec = trap == 0x400; + pte_t *ptep; BUG_ON((trap == 0x380) || (trap == 0x480)); @@ -253,6 +269,15 @@ bad_area_nosemaphore: info.si_addr = (void __user *) address; force_sig_info(SIGSEGV, &info, current); return 0; + } + + ptep = lookup_address(address); + + if (ptep && pte_present(*ptep) && !pte_exec(*ptep)) { + if (printk_ratelimit()) + printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n", current->uid); + show_stack(current, (unsigned long *)__get_SP()); + do_exit(SIGKILL); } return SIGSEGV; diff -puN arch/ppc64/mm/hash_utils.c~nx-kernel-ppc64 arch/ppc64/mm/hash_utils.c --- linux-2.6-bk/arch/ppc64/mm/hash_utils.c~nx-kernel-ppc64 2005-03-08 16:08:57 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/hash_utils.c 2005-03-08 16:08:57 -06:00 @@ -51,6 +51,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) 
udbg_printf(fmt) @@ -89,12 +90,23 @@ static inline void loop_forever(void) ; } +int is_kernel_text(unsigned long addr) +{ + if (addr >= (unsigned long)_stext && addr < (unsigned long)__init_end) + return 1; + + return 0; +} + + + #ifdef CONFIG_PPC_MULTIPLATFORM static inline void create_pte_mapping(unsigned long start, unsigned long end, unsigned long mode, int large) { unsigned long addr; unsigned int step; + unsigned long tmp_mode; if (large) step = 16*MB; @@ -112,6 +124,13 @@ static inline void create_pte_mapping(un else vpn = va >> PAGE_SHIFT; + + tmp_mode = mode; + + /* Make non-kernel text non-executable */ + if (!is_kernel_text(addr)) + tmp_mode = mode | HW_NO_EXEC; + hash = hpt_hash(vpn, large); hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP); @@ -120,12 +139,12 @@ static inline void create_pte_mapping(un if (systemcfg->platform & PLATFORM_LPAR) ret = pSeries_lpar_hpte_insert(hpteg, va, virt_to_abs(addr) >> PAGE_SHIFT, - 0, mode, 1, large); + 0, tmp_mode, 1, large); else #endif /* CONFIG_PPC_PSERIES */ ret = native_hpte_insert(hpteg, va, virt_to_abs(addr) >> PAGE_SHIFT, - 0, mode, 1, large); + 0, tmp_mode, 1, large); if (ret == -1) { ppc64_terminate_msg(0x20, "create_pte_mapping"); @@ -238,8 +257,6 @@ unsigned int hash_page_do_lazy_icache(un { struct page *page; -#define PPC64_HWNOEXEC (1 << 2) - if (!pfn_valid(pte_pfn(pte))) return pp; @@ -250,8 +267,8 @@ unsigned int hash_page_do_lazy_icache(un if (trap == 0x400) { __flush_dcache_icache(page_address(page)); set_bit(PG_arch_1, &page->flags); - } else - pp |= PPC64_HWNOEXEC; + } else + pp |= HW_NO_EXEC; } return pp; } @@ -271,7 +288,7 @@ int hash_page(unsigned long ea, unsigned int user_region = 0; int local = 0; cpumask_t tmp; - + switch (REGION_ID(ea)) { case USER_REGION_ID: user_region = 1; diff -puN include/asm-ppc64/pgtable.h~nx-kernel-ppc64 include/asm-ppc64/pgtable.h --- linux-2.6-bk/include/asm-ppc64/pgtable.h~nx-kernel-ppc64 2005-03-08 16:08:57 -06:00 +++ 
linux-2.6-bk-moilanen/include/asm-ppc64/pgtable.h 2005-03-08 16:08:57 -06:00
@@ -116,6 +116,7 @@
 #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER)
 #define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
 #define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_WRENABLE)
+#define PAGE_KERNEL_EXEC __pgprot(_PAGE_BASE | _PAGE_WRENABLE | _PAGE_EXEC)

 #define HW_NO_EXEC _PAGE_EXEC /* This is used when the bit is
                                * inverted, even though it's the
_

From benh at kernel.crashing.org Wed Mar 9 10:30:08 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 09 Mar 2005 10:30:08 +1100
Subject: eeh.h compile warnings / adbhid.c build failure
In-Reply-To: <20050308231046.GA19175@pants.nu>
References: <20050302181206.GA2741@us.ibm.com> <1109806756.5680.127.camel@gaston> <20050308125635.GA19169@suse.de> <1110320815.13593.279.camel@gaston> <20050308231046.GA19175@pants.nu>
Message-ID: <1110324608.32556.1.camel@gaston>

On Tue, 2005-03-08 at 15:10 -0800, Brad Boyer wrote:
> On Wed, Mar 09, 2005 at 09:26:55AM +1100, Benjamin Herrenschmidt wrote:
> > On Tue, 2005-03-08 at 13:56 +0100, Olaf Hering wrote:
> > > On Thu, Mar 03, Benjamin Herrenschmidt wrote:
> > >
> > > > There is no ADB bus on a G5, so the driver isn't useful anyway.
> > > > Currently, ppc64 allows you to enable pmac drivers that won't build, but
> > > > they also are useless on G5s. I'll fix that over time though.
> > >
> > > They are of course not useless. Send this patch to Linus to allow the
> > > mouse button emulation until either someone split it off the ADB driver,
> > > or until someone fixes the stupid userinterfaces in Linux.
> >
> > Oh well, don't people buy real mice to plug on G5s ? :)
>
> I bought a very fancy USB pointing device for my G5, but it would be
> nice to support the older stuff. Eventually I'm planning to add
> support for the Griffin iMate, which would give us ADB on anything
> that supports USB. It's just not at the top of my list.

If we go that way, then we should finally bite the bullet and get ADB in
the device model (define an adb bus_type, with proper drivers etc...).
That would also allow a clean mechanism (sysfs properties) for things
like trackpad settings, etc...

> > Anyway, mouse button emulation should be split off adb stuff.
>
> I'm pretty sure it already is. The last time I looked at it, the
> only tie it had left was presenting itself as an ADB device to
> the input layer (bustype of BUS_ADB). I have to admit I haven't
> tried it, but it ought to work without any of the actual ADB code
> even compiled in.
>
> I should caveat that by saying that you can't compile the in-kernel
> mouse emulation in at all unless it's on MAC or PPC_PMAC due to the
> fact that it's in the drivers/macintosh directory.
>
> Brad Boyer
> flar at allandria.com
-- 
Benjamin Herrenschmidt

From olof at austin.ibm.com Wed Mar 9 10:43:54 2005
From: olof at austin.ibm.com (Olof Johansson)
Date: Tue, 8 Mar 2005 17:43:54 -0600
Subject: [PATCH] update irq affinity mask when migrating irqs
In-Reply-To: <422DDEE1.5040706@austin.ibm.com>
References: <20050308020017.GB21853@otto> <422DDEE1.5040706@austin.ibm.com>
Message-ID: <20050308234354.GB18077@austin.ibm.com>

On Tue, Mar 08, 2005 at 11:20:33AM -0600, Joel Schopp wrote:
> The downside of calling this is it increases the path length and causes
> ibm_get_xive to be called again. Usually slightly slower is a fine
> tradeoff for more readable code, but in this case I would have left it
> how it was. With all the cpus stopped it is best to be as fast as

Is CPU removal really that performance critical a path? How long does
an ibm_get_xive call take?

> possible. Maybe this is still fast enough, but you'd have to test under
> heavy load on a variety of systems to be sure.

Please define "fast enough".

-Olof

From jschopp at austin.ibm.com Wed Mar 9 11:31:46 2005
From: jschopp at austin.ibm.com (Joel Schopp)
Date: Tue, 08 Mar 2005 18:31:46 -0600
Subject: [PATCH] update irq affinity mask when migrating irqs
In-Reply-To: <20050308234354.GB18077@austin.ibm.com>
References: <20050308020017.GB21853@otto> <422DDEE1.5040706@austin.ibm.com> <20050308234354.GB18077@austin.ibm.com>
Message-ID: <422E43F2.80601@austin.ibm.com>

Olof Johansson wrote:
> On Tue, Mar 08, 2005 at 11:20:33AM -0600, Joel Schopp wrote:
>
>
>>The downside of calling this is it increases the path length and causes
>> ibm_get_xive to be called again. Usually slightly slower is a fine
>>tradeoff for more readable code, but in this case I would have left it
>>how it was. With all the cpus stopped it is best to be as fast as
>
>
> Is CPU removal really that performance critical a path? How long does
> an ibm_get_xive call take?

The part of it where we have all the cpus with interrupts disabled
running our high priority tasks is VERY performance critical. Look at
the __stop_machine_run() and stop_machine code. The rest of it is not
performance critical at all.

I couldn't tell you how long an ibm_get_xive call takes, and I suppose
it would vary from system to system. I doubt that adding an ibm_get_xive
call and a few other instructions would do much damage. Still, I'd hate
it to be the straw that broke the camel's back. The way we'd notice such
problems would make it painful to determine the root cause in the field.

>
>
>>possible. Maybe this is still fast enough, but you'd have to test under
>>heavy load on a variety of systems to be sure.
>
>
> Please define "fast enough".

__stop_machine_run doesn't interfere with system operation. No buffers
get overfilled, no interrupts get lost, no packets get dropped, etc.

From paulus at samba.org Wed Mar 9 11:59:13 2005
From: paulus at samba.org (Paul Mackerras)
Date: Wed, 9 Mar 2005 11:59:13 +1100
Subject: [PATCH] update irq affinity mask when migrating irqs
In-Reply-To: <422E43F2.80601@austin.ibm.com>
References: <20050308020017.GB21853@otto> <422DDEE1.5040706@austin.ibm.com> <20050308234354.GB18077@austin.ibm.com> <422E43F2.80601@austin.ibm.com>
Message-ID: <16942.19041.842622.791326@cargo.ozlabs.ibm.com>

Joel Schopp writes:

> The part of it where we have all the cpus with interrupts disabled
> running our high priority tasks is VERY performance critical. Look at
> the __stop_machine_run() and stop_machine code. The rest of it is not
> performance critical at all.

Well... we have to be careful not to take too long, but I really can't
imagine that an extra procedure call and return is going to cause any
problems.

Paul.

From sfr at canb.auug.org.au Wed Mar 9 12:03:43 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Wed, 9 Mar 2005 12:03:43 +1100
Subject: [RFC][PATCH] combining header files
Message-ID: <20050309120343.0c22eb0f.sfr@canb.auug.org.au>

Hi all,

I would just like to start a discussion about consolidating (some of)
the ppc and ppc64 header files. As a starting point (and I am not saying
that this is the right way to go) the following patch replaces
(semantically) equivalent ppc64 header files with ones that just include
the corresponding asm-ppc file. We *could* use this method to make the
journey incremental until there are no nontrivial files left in
asm-ppc64 ....

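To make "(semantically) equivalent" a little more concrete, here is a
minimal sketch, assuming the ioctl number layout from the
asm-ppc64/ioctl.h that the patch removes (8 number bits, 8 type bits,
13 size bits, 3 direction bits). The sketch_* and SKETCH_* names are
hypothetical stand-ins for the kernel's _IOC macros, not kernel API.
Nothing in the encoding itself depends on the word size, which is
presumably why a single copy of such a header can serve both trees.

```c
#include <assert.h>

/* Hypothetical stand-ins for the _IOC_* constants in asm-ppc64/ioctl.h. */
#define SKETCH_IOC_NRBITS    8
#define SKETCH_IOC_TYPEBITS  8
#define SKETCH_IOC_SIZEBITS  13
#define SKETCH_IOC_DIRBITS   3

#define SKETCH_IOC_NRSHIFT   0
#define SKETCH_IOC_TYPESHIFT (SKETCH_IOC_NRSHIFT + SKETCH_IOC_NRBITS)
#define SKETCH_IOC_SIZESHIFT (SKETCH_IOC_TYPESHIFT + SKETCH_IOC_TYPEBITS)
#define SKETCH_IOC_DIRSHIFT  (SKETCH_IOC_SIZESHIFT + SKETCH_IOC_SIZEBITS)

/* Direction values as in the header: "none" is 1 here, not 0. */
#define SKETCH_IOC_NONE  1U
#define SKETCH_IOC_READ  2U
#define SKETCH_IOC_WRITE 4U

/* Pack direction, type, number and argument size into one command word. */
static inline unsigned int sketch_ioc(unsigned int dir, unsigned int type,
                                      unsigned int nr, unsigned int size)
{
	return (dir << SKETCH_IOC_DIRSHIFT) |
	       (type << SKETCH_IOC_TYPESHIFT) |
	       (nr << SKETCH_IOC_NRSHIFT) |
	       (size << SKETCH_IOC_SIZESHIFT);
}

/* Extract the direction field from a command word. */
static inline unsigned int sketch_ioc_dir(unsigned int cmd)
{
	return (cmd >> SKETCH_IOC_DIRSHIFT) & ((1U << SKETCH_IOC_DIRBITS) - 1);
}

/* Extract the argument size field from a command word. */
static inline unsigned int sketch_ioc_size(unsigned int cmd)
{
	return (cmd >> SKETCH_IOC_SIZESHIFT) & ((1U << SKETCH_IOC_SIZEBITS) - 1);
}
```

For example, sketch_ioc(SKETCH_IOC_READ, 'f', 127, 4) gives 0x4004667f,
which is what _IOR('f', 127, int) works out to for a 4-byte int under
this layout.
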
Diffstat looks like:

 asm-ppc/ipc.h         |    2
 asm-ppc64/ioctl.h     |   75 ---------------
 asm-ppc64/ioctls.h    |  115 ------------------------
 asm-ppc64/ipc.h       |   35 -------
 asm-ppc64/mman.h      |   53 -----------
 asm-ppc64/param.h     |   30 ------
 asm-ppc64/parport.h   |   19 ----
 asm-ppc64/poll.h      |   33 ------
 asm-ppc64/string.h    |   36 -------
 asm-ppc64/termbits.h  |  194 -----------------------------------------
 asm-ppc64/termios.h   |  236 --------------------------------------------------
 asm-ppc64/unaligned.h |   22 ----
 12 files changed, 12 insertions(+), 838 deletions(-)

-- 
Cheers,
Stephen Rothwell sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

diff -ruNp linus/include/asm-ppc/ipc.h linus-headers/include/asm-ppc/ipc.h
--- linus/include/asm-ppc/ipc.h 2003-09-24 10:56:02.000000000 +1000
+++ linus-headers/include/asm-ppc/ipc.h 2005-03-09 11:54:36.000000000 +1100
@@ -4,7 +4,7 @@
 /*
  * These are used to wrap system calls on PowerPC.
  *
- * See arch/ppc/kernel/syscalls.c for ugly details..
+ * See arch/ppc{,64}/kernel/syscalls.c for ugly details..
  */
 struct ipc_kludge {
 	struct msgbuf __user *msgp;
diff -ruNp linus/include/asm-ppc64/ioctl.h linus-headers/include/asm-ppc64/ioctl.h
--- linus/include/asm-ppc64/ioctl.h 2003-12-31 09:39:13.000000000 +1100
+++ linus-headers/include/asm-ppc64/ioctl.h 2005-03-09 01:10:54.000000000 +1100
@@ -1,74 +1 @@
-#ifndef _PPC64_IOCTL_H
-#define _PPC64_IOCTL_H
-
-
-/*
- * This was copied from the alpha as it's a bit cleaner there.
- * -- Cort
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */ - -#define _IOC_NRBITS 8 -#define _IOC_TYPEBITS 8 -#define _IOC_SIZEBITS 13 -#define _IOC_DIRBITS 3 - -#define _IOC_NRMASK ((1 << _IOC_NRBITS)-1) -#define _IOC_TYPEMASK ((1 << _IOC_TYPEBITS)-1) -#define _IOC_SIZEMASK ((1 << _IOC_SIZEBITS)-1) -#define _IOC_DIRMASK ((1 << _IOC_DIRBITS)-1) - -#define _IOC_NRSHIFT 0 -#define _IOC_TYPESHIFT (_IOC_NRSHIFT+_IOC_NRBITS) -#define _IOC_SIZESHIFT (_IOC_TYPESHIFT+_IOC_TYPEBITS) -#define _IOC_DIRSHIFT (_IOC_SIZESHIFT+_IOC_SIZEBITS) - -/* - * Direction bits _IOC_NONE could be 0, but OSF/1 gives it a bit. - * And this turns out useful to catch old ioctl numbers in header - * files for us. - */ -#define _IOC_NONE 1U -#define _IOC_READ 2U -#define _IOC_WRITE 4U - -#define _IOC(dir,type,nr,size) \ - (((dir) << _IOC_DIRSHIFT) | \ - ((type) << _IOC_TYPESHIFT) | \ - ((nr) << _IOC_NRSHIFT) | \ - ((size) << _IOC_SIZESHIFT)) - -/* provoke compile error for invalid uses of size argument */ -extern unsigned int __invalid_size_argument_for_IOC; -#define _IOC_TYPECHECK(t) \ - ((sizeof(t) == sizeof(t[1]) && \ - sizeof(t) < (1 << _IOC_SIZEBITS)) ? \ - sizeof(t) : __invalid_size_argument_for_IOC) - -/* used to create numbers */ -#define _IO(type,nr) _IOC(_IOC_NONE,(type),(nr),0) -#define _IOR(type,nr,size) _IOC(_IOC_READ,(type),(nr),(_IOC_TYPECHECK(size))) -#define _IOW(type,nr,size) _IOC(_IOC_WRITE,(type),(nr),(_IOC_TYPECHECK(size))) -#define _IOWR(type,nr,size) _IOC(_IOC_READ|_IOC_WRITE,(type),(nr),(_IOC_TYPECHECK(size))) -#define _IOR_BAD(type,nr,size) _IOC(_IOC_READ,(type),(nr),sizeof(size)) -#define _IOW_BAD(type,nr,size) _IOC(_IOC_WRITE,(type),(nr),sizeof(size)) -#define _IOWR_BAD(type,nr,size) _IOC(_IOC_READ|_IOC_WRITE,(type),(nr),sizeof(size)) - -/* used to decode them.. 
*/ -#define _IOC_DIR(nr) (((nr) >> _IOC_DIRSHIFT) & _IOC_DIRMASK) -#define _IOC_TYPE(nr) (((nr) >> _IOC_TYPESHIFT) & _IOC_TYPEMASK) -#define _IOC_NR(nr) (((nr) >> _IOC_NRSHIFT) & _IOC_NRMASK) -#define _IOC_SIZE(nr) (((nr) >> _IOC_SIZESHIFT) & _IOC_SIZEMASK) - -/* various drivers, such as the pcmcia stuff, need these... */ -#define IOC_IN (_IOC_WRITE << _IOC_DIRSHIFT) -#define IOC_OUT (_IOC_READ << _IOC_DIRSHIFT) -#define IOC_INOUT ((_IOC_WRITE|_IOC_READ) << _IOC_DIRSHIFT) -#define IOCSIZE_MASK (_IOC_SIZEMASK << _IOC_SIZESHIFT) -#define IOCSIZE_SHIFT (_IOC_SIZESHIFT) - -#endif /* _PPC64_IOCTL_H */ +#include diff -ruNp linus/include/asm-ppc64/ioctls.h linus-headers/include/asm-ppc64/ioctls.h --- linus/include/asm-ppc64/ioctls.h 2003-04-03 08:55:29.000000000 +1000 +++ linus-headers/include/asm-ppc64/ioctls.h 2005-03-09 01:13:05.000000000 +1100 @@ -1,114 +1 @@ -#ifndef _ASM_PPC64_IOCTLS_H -#define _ASM_PPC64_IOCTLS_H - -/* - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. 
- */ - -#include - -#define FIOCLEX _IO('f', 1) -#define FIONCLEX _IO('f', 2) -#define FIOASYNC _IOW('f', 125, int) -#define FIONBIO _IOW('f', 126, int) -#define FIONREAD _IOR('f', 127, int) -#define TIOCINQ FIONREAD -#define FIOQSIZE _IOR('f', 128, loff_t) - -#define TIOCGETP _IOR('t', 8, struct sgttyb) -#define TIOCSETP _IOW('t', 9, struct sgttyb) -#define TIOCSETN _IOW('t', 10, struct sgttyb) /* TIOCSETP wo flush */ - -#define TIOCSETC _IOW('t', 17, struct tchars) -#define TIOCGETC _IOR('t', 18, struct tchars) -#define TCGETS _IOR('t', 19, struct termios) -#define TCSETS _IOW('t', 20, struct termios) -#define TCSETSW _IOW('t', 21, struct termios) -#define TCSETSF _IOW('t', 22, struct termios) - -#define TCGETA _IOR('t', 23, struct termio) -#define TCSETA _IOW('t', 24, struct termio) -#define TCSETAW _IOW('t', 25, struct termio) -#define TCSETAF _IOW('t', 28, struct termio) - -#define TCSBRK _IO('t', 29) -#define TCXONC _IO('t', 30) -#define TCFLSH _IO('t', 31) - -#define TIOCSWINSZ _IOW('t', 103, struct winsize) -#define TIOCGWINSZ _IOR('t', 104, struct winsize) -#define TIOCSTART _IO('t', 110) /* start output, like ^Q */ -#define TIOCSTOP _IO('t', 111) /* stop output, like ^S */ -#define TIOCOUTQ _IOR('t', 115, int) /* output queue size */ - -#define TIOCGLTC _IOR('t', 116, struct ltchars) -#define TIOCSLTC _IOW('t', 117, struct ltchars) -#define TIOCSPGRP _IOW('t', 118, int) -#define TIOCGPGRP _IOR('t', 119, int) - -#define TIOCEXCL 0x540C -#define TIOCNXCL 0x540D -#define TIOCSCTTY 0x540E - -#define TIOCSTI 0x5412 -#define TIOCMGET 0x5415 -#define TIOCMBIS 0x5416 -#define TIOCMBIC 0x5417 -#define TIOCMSET 0x5418 -# define TIOCM_LE 0x001 -# define TIOCM_DTR 0x002 -# define TIOCM_RTS 0x004 -# define TIOCM_ST 0x008 -# define TIOCM_SR 0x010 -# define TIOCM_CTS 0x020 -# define TIOCM_CAR 0x040 -# define TIOCM_RNG 0x080 -# define TIOCM_DSR 0x100 -# define TIOCM_CD TIOCM_CAR -# define TIOCM_RI TIOCM_RNG - -#define TIOCGSOFTCAR 0x5419 -#define TIOCSSOFTCAR 0x541A 
-#define TIOCLINUX 0x541C -#define TIOCCONS 0x541D -#define TIOCGSERIAL 0x541E -#define TIOCSSERIAL 0x541F -#define TIOCPKT 0x5420 -# define TIOCPKT_DATA 0 -# define TIOCPKT_FLUSHREAD 1 -# define TIOCPKT_FLUSHWRITE 2 -# define TIOCPKT_STOP 4 -# define TIOCPKT_START 8 -# define TIOCPKT_NOSTOP 16 -# define TIOCPKT_DOSTOP 32 - - -#define TIOCNOTTY 0x5422 -#define TIOCSETD 0x5423 -#define TIOCGETD 0x5424 -#define TCSBRKP 0x5425 /* Needed for POSIX tcsendbreak() */ -#define TIOCSBRK 0x5427 /* BSD compatibility */ -#define TIOCCBRK 0x5428 /* BSD compatibility */ -#define TIOCGSID 0x5429 /* Return the session ID of FD */ -#define TIOCGPTN _IOR('T',0x30, unsigned int) /* Get Pty Number (of pty-mux device) */ -#define TIOCSPTLCK _IOW('T',0x31, int) /* Lock/unlock Pty */ - -#define TIOCSERCONFIG 0x5453 -#define TIOCSERGWILD 0x5454 -#define TIOCSERSWILD 0x5455 -#define TIOCGLCKTRMIOS 0x5456 -#define TIOCSLCKTRMIOS 0x5457 -#define TIOCSERGSTRUCT 0x5458 /* For debugging only */ -#define TIOCSERGETLSR 0x5459 /* Get line status register */ - /* ioctl (fd, TIOCSERGETLSR, &result) where result may be as below */ -# define TIOCSER_TEMT 0x01 /* Transmitter physically empty */ -#define TIOCSERGETMULTI 0x545A /* Get multiport config */ -#define TIOCSERSETMULTI 0x545B /* Set multiport config */ - -#define TIOCMIWAIT 0x545C /* wait for a change on serial input line(s) */ -#define TIOCGICOUNT 0x545D /* read serial port inline interrupt counts */ - -#endif /* _ASM_PPC64_IOCTLS_H */ +#include diff -ruNp linus/include/asm-ppc64/ipc.h linus-headers/include/asm-ppc64/ipc.h --- linus/include/asm-ppc64/ipc.h 2004-05-30 11:50:26.000000000 +1000 +++ linus-headers/include/asm-ppc64/ipc.h 2005-03-09 01:15:40.000000000 +1100 @@ -1,34 +1 @@ -#ifndef __PPC64_IPC_H__ -#define __PPC64_IPC_H__ - -/* - * These are used to wrap system calls on PowerPC. - * - * See arch/ppc64/kernel/syscalls.c for ugly details.. 
- * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ -struct ipc_kludge { - struct msgbuf __user *msgp; - long msgtyp; -}; - -#define SEMOP 1 -#define SEMGET 2 -#define SEMCTL 3 -#define SEMTIMEDOP 4 -#define MSGSND 11 -#define MSGRCV 12 -#define MSGGET 13 -#define MSGCTL 14 -#define SHMAT 21 -#define SHMDT 22 -#define SHMGET 23 -#define SHMCTL 24 - -#define IPCCALL(version,op) ((version)<<16 | (op)) - -#endif /* __PPC64_IPC_H__ */ +#include diff -ruNp linus/include/asm-ppc64/mman.h linus-headers/include/asm-ppc64/mman.h --- linus/include/asm-ppc64/mman.h 2003-09-26 07:54:24.000000000 +1000 +++ linus-headers/include/asm-ppc64/mman.h 2005-03-09 01:25:14.000000000 +1100 @@ -1,52 +1 @@ -#ifndef __PPC64_MMAN_H__ -#define __PPC64_MMAN_H__ - -/* - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. 
- */ - -#define PROT_READ 0x1 /* page can be read */ -#define PROT_WRITE 0x2 /* page can be written */ -#define PROT_EXEC 0x4 /* page can be executed */ -#define PROT_SEM 0x8 /* page may be used for atomic ops */ -#define PROT_NONE 0x0 /* page can not be accessed */ -#define PROT_GROWSDOWN 0x01000000 /* mprotect flag: extend change to start of growsdown vma */ -#define PROT_GROWSUP 0x02000000 /* mprotect flag: extend change to end of growsup vma */ - -#define MAP_SHARED 0x01 /* Share changes */ -#define MAP_PRIVATE 0x02 /* Changes are private */ -#define MAP_TYPE 0x0f /* Mask for type of mapping */ -#define MAP_FIXED 0x10 /* Interpret addr exactly */ -#define MAP_ANONYMOUS 0x20 /* don't use a file */ -#define MAP_RENAME MAP_ANONYMOUS /* In SunOS terminology */ -#define MAP_NORESERVE 0x40 /* don't reserve swap pages */ -#define MAP_LOCKED 0x80 - -#define MAP_GROWSDOWN 0x0100 /* stack-like segment */ -#define MAP_DENYWRITE 0x0800 /* ETXTBSY */ -#define MAP_EXECUTABLE 0x1000 /* mark it as an executable */ - -#define MS_ASYNC 1 /* sync memory asynchronously */ -#define MS_INVALIDATE 2 /* invalidate the caches */ -#define MS_SYNC 4 /* synchronous memory sync */ - -#define MCL_CURRENT 0x2000 /* lock all currently mapped pages */ -#define MCL_FUTURE 0x4000 /* lock all additions to address space */ - -#define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ -#define MAP_NONBLOCK 0x10000 /* do not block on IO */ - -#define MADV_NORMAL 0x0 /* default page-in behavior */ -#define MADV_RANDOM 0x1 /* page-in minimum required */ -#define MADV_SEQUENTIAL 0x2 /* read-ahead aggressively */ -#define MADV_WILLNEED 0x3 /* pre-fault pages */ -#define MADV_DONTNEED 0x4 /* discard these pages */ - -/* compatibility flags */ -#define MAP_ANON MAP_ANONYMOUS -#define MAP_FILE 0 - -#endif /* __PPC64_MMAN_H__ */ +#include diff -ruNp linus/include/asm-ppc64/param.h linus-headers/include/asm-ppc64/param.h --- linus/include/asm-ppc64/param.h 2004-02-23 12:05:19.000000000 +1100 +++ 
linus-headers/include/asm-ppc64/param.h 2005-03-09 01:38:01.000000000 +1100 @@ -1,29 +1 @@ -#ifndef _ASM_PPC64_PARAM_H -#define _ASM_PPC64_PARAM_H - -/* - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#ifdef __KERNEL__ -# define HZ 1000 /* Internal kernel timer frequency */ -# define USER_HZ 100 /* .. some user interfaces are in "ticks" */ -# define CLOCKS_PER_SEC (USER_HZ) /* like times() */ -#endif - -#ifndef HZ -#define HZ 100 -#endif - -#define EXEC_PAGESIZE 4096 - -#ifndef NOGROUP -#define NOGROUP (-1) -#endif - -#define MAXHOSTNAMELEN 64 /* max length of hostname */ - -#endif /* _ASM_PPC64_PARAM_H */ +#include diff -ruNp linus/include/asm-ppc64/parport.h linus-headers/include/asm-ppc64/parport.h --- linus/include/asm-ppc64/parport.h 2002-02-14 23:14:36.000000000 +1100 +++ linus-headers/include/asm-ppc64/parport.h 2005-03-09 01:40:11.000000000 +1100 @@ -1,18 +1 @@ -/* - * parport.h: platform-specific PC-style parport initialisation - * - * Copyright (C) 1999, 2000 Tim Waugh - * - * This file should only be included by drivers/parport/parport_pc.c. 
- */ - -#ifndef _ASM_PPC64_PARPORT_H -#define _ASM_PPC64_PARPORT_H - -static int __devinit parport_pc_find_isa_ports (int autoirq, int autodma); -static int __devinit parport_pc_find_nonpci_ports (int autoirq, int autodma) -{ - return parport_pc_find_isa_ports (autoirq, autodma); -} - -#endif /* !(_ASM_PPC_PARPORT_H) */ +#include diff -ruNp linus/include/asm-ppc64/poll.h linus-headers/include/asm-ppc64/poll.h --- linus/include/asm-ppc64/poll.h 2002-11-01 05:18:30.000000000 +1100 +++ linus-headers/include/asm-ppc64/poll.h 2005-03-09 01:45:19.000000000 +1100 @@ -1,32 +1 @@ -#ifndef __PPC64_POLL_H -#define __PPC64_POLL_H - -/* - * Copyright (C) 2001 PPC64 Team, IBM Corp - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#define POLLIN 0x0001 -#define POLLPRI 0x0002 -#define POLLOUT 0x0004 -#define POLLERR 0x0008 -#define POLLHUP 0x0010 -#define POLLNVAL 0x0020 -#define POLLRDNORM 0x0040 -#define POLLRDBAND 0x0080 -#define POLLWRNORM 0x0100 -#define POLLWRBAND 0x0200 -#define POLLMSG 0x0400 -#define POLLREMOVE 0x1000 - -struct pollfd { - int fd; - short events; - short revents; -}; - -#endif /* __PPC64_POLL_H */ +#include diff -ruNp linus/include/asm-ppc64/string.h linus-headers/include/asm-ppc64/string.h --- linus/include/asm-ppc64/string.h 2005-01-29 06:05:47.000000000 +1100 +++ linus-headers/include/asm-ppc64/string.h 2005-03-09 02:01:45.000000000 +1100 @@ -1,35 +1 @@ -#ifndef _PPC64_STRING_H_ -#define _PPC64_STRING_H_ - -/* - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. 
- */ - -#define __HAVE_ARCH_STRCPY -#define __HAVE_ARCH_STRNCPY -#define __HAVE_ARCH_STRLEN -#define __HAVE_ARCH_STRCMP -#define __HAVE_ARCH_STRCAT -#define __HAVE_ARCH_MEMSET -#define __HAVE_ARCH_MEMCPY -#define __HAVE_ARCH_MEMMOVE -#define __HAVE_ARCH_MEMCMP -#define __HAVE_ARCH_MEMCHR - -extern int strcasecmp(const char *, const char *); -extern int strncasecmp(const char *, const char *, int); -extern char * strcpy(char *,const char *); -extern char * strncpy(char *,const char *, __kernel_size_t); -extern __kernel_size_t strlen(const char *); -extern int strcmp(const char *,const char *); -extern char * strcat(char *, const char *); -extern void * memset(void *,int,__kernel_size_t); -extern void * memcpy(void *,const void *,__kernel_size_t); -extern void * memmove(void *,const void *,__kernel_size_t); -extern int memcmp(const void *,const void *,__kernel_size_t); -extern void * memchr(const void *,int,__kernel_size_t); - -#endif /* _PPC64_STRING_H_ */ +#include diff -ruNp linus/include/asm-ppc64/termbits.h linus-headers/include/asm-ppc64/termbits.h --- linus/include/asm-ppc64/termbits.h 2004-05-11 07:53:05.000000000 +1000 +++ linus-headers/include/asm-ppc64/termbits.h 2005-03-09 02:04:35.000000000 +1100 @@ -1,193 +1 @@ -#ifndef _PPC64_TERMBITS_H -#define _PPC64_TERMBITS_H - -/* - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#include - -typedef unsigned char cc_t; -typedef unsigned int speed_t; -typedef unsigned int tcflag_t; - -/* - * termios type and macro definitions. Be careful about adding stuff - * to this file since it's used in GNU libc and there are strict rules - * concerning namespace pollution. 
- */ - -#define NCCS 19 -struct termios { - tcflag_t c_iflag; /* input mode flags */ - tcflag_t c_oflag; /* output mode flags */ - tcflag_t c_cflag; /* control mode flags */ - tcflag_t c_lflag; /* local mode flags */ - cc_t c_cc[NCCS]; /* control characters */ - cc_t c_line; /* line discipline (== c_cc[19]) */ - speed_t c_ispeed; /* input speed */ - speed_t c_ospeed; /* output speed */ -}; - -/* c_cc characters */ -#define VINTR 0 -#define VQUIT 1 -#define VERASE 2 -#define VKILL 3 -#define VEOF 4 -#define VMIN 5 -#define VEOL 6 -#define VTIME 7 -#define VEOL2 8 -#define VSWTC 9 -#define VWERASE 10 -#define VREPRINT 11 -#define VSUSP 12 -#define VSTART 13 -#define VSTOP 14 -#define VLNEXT 15 -#define VDISCARD 16 - -/* c_iflag bits */ -#define IGNBRK 0000001 -#define BRKINT 0000002 -#define IGNPAR 0000004 -#define PARMRK 0000010 -#define INPCK 0000020 -#define ISTRIP 0000040 -#define INLCR 0000100 -#define IGNCR 0000200 -#define ICRNL 0000400 -#define IXON 0001000 -#define IXOFF 0002000 -#define IXANY 0004000 -#define IUCLC 0010000 -#define IMAXBEL 0020000 -#define IUTF8 0040000 - -/* c_oflag bits */ -#define OPOST 0000001 -#define ONLCR 0000002 -#define OLCUC 0000004 - -#define OCRNL 0000010 -#define ONOCR 0000020 -#define ONLRET 0000040 - -#define OFILL 00000100 -#define OFDEL 00000200 -#define NLDLY 00001400 -#define NL0 00000000 -#define NL1 00000400 -#define NL2 00001000 -#define NL3 00001400 -#define TABDLY 00006000 -#define TAB0 00000000 -#define TAB1 00002000 -#define TAB2 00004000 -#define TAB3 00006000 -#define XTABS 00006000 /* required by POSIX to == TAB3 */ -#define CRDLY 00030000 -#define CR0 00000000 -#define CR1 00010000 -#define CR2 00020000 -#define CR3 00030000 -#define FFDLY 00040000 -#define FF0 00000000 -#define FF1 00040000 -#define BSDLY 00100000 -#define BS0 00000000 -#define BS1 00100000 -#define VTDLY 00200000 -#define VT0 00000000 -#define VT1 00200000 - -/* c_cflag bit meaning */ -#define CBAUD 0000377 -#define B0 0000000 /* hang up */ 
-#define B50 0000001 -#define B75 0000002 -#define B110 0000003 -#define B134 0000004 -#define B150 0000005 -#define B200 0000006 -#define B300 0000007 -#define B600 0000010 -#define B1200 0000011 -#define B1800 0000012 -#define B2400 0000013 -#define B4800 0000014 -#define B9600 0000015 -#define B19200 0000016 -#define B38400 0000017 -#define EXTA B19200 -#define EXTB B38400 -#define CBAUDEX 0000000 -#define B57600 00020 -#define B115200 00021 -#define B230400 00022 -#define B460800 00023 -#define B500000 00024 -#define B576000 00025 -#define B921600 00026 -#define B1000000 00027 -#define B1152000 00030 -#define B1500000 00031 -#define B2000000 00032 -#define B2500000 00033 -#define B3000000 00034 -#define B3500000 00035 -#define B4000000 00036 - -#define CSIZE 00001400 -#define CS5 00000000 -#define CS6 00000400 -#define CS7 00001000 -#define CS8 00001400 - -#define CSTOPB 00002000 -#define CREAD 00004000 -#define PARENB 00010000 -#define PARODD 00020000 -#define HUPCL 00040000 - -#define CLOCAL 00100000 -#define CRTSCTS 020000000000 /* flow control */ - -/* c_lflag bits */ -#define ISIG 0x00000080 -#define ICANON 0x00000100 -#define XCASE 0x00004000 -#define ECHO 0x00000008 -#define ECHOE 0x00000002 -#define ECHOK 0x00000004 -#define ECHONL 0x00000010 -#define NOFLSH 0x80000000 -#define TOSTOP 0x00400000 -#define ECHOCTL 0x00000040 -#define ECHOPRT 0x00000020 -#define ECHOKE 0x00000001 -#define FLUSHO 0x00800000 -#define PENDIN 0x20000000 -#define IEXTEN 0x00000400 - -/* Values for the ACTION argument to `tcflow'. */ -#define TCOOFF 0 -#define TCOON 1 -#define TCIOFF 2 -#define TCION 3 - -/* Values for the QUEUE_SELECTOR argument to `tcflush'. */ -#define TCIFLUSH 0 -#define TCOFLUSH 1 -#define TCIOFLUSH 2 - -/* Values for the OPTIONAL_ACTIONS argument to `tcsetattr'. 
*/ -#define TCSANOW 0 -#define TCSADRAIN 1 -#define TCSAFLUSH 2 - -#endif /* _PPC64_TERMBITS_H */ +#include diff -ruNp linus/include/asm-ppc64/termios.h linus-headers/include/asm-ppc64/termios.h --- linus/include/asm-ppc64/termios.h 2003-04-03 08:55:29.000000000 +1000 +++ linus-headers/include/asm-ppc64/termios.h 2005-03-09 02:13:20.000000000 +1100 @@ -1,235 +1 @@ -#ifndef _PPC64_TERMIOS_H -#define _PPC64_TERMIOS_H - -/* - * Liberally adapted from alpha/termios.h. In particular, the c_cc[] - * fields have been reordered so that termio & termios share the - * common subset in the same order (for brain dead programs that don't - * know or care about the differences). - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#include -#include - -struct sgttyb { - char sg_ispeed; - char sg_ospeed; - char sg_erase; - char sg_kill; - short sg_flags; -}; - -struct tchars { - char t_intrc; - char t_quitc; - char t_startc; - char t_stopc; - char t_eofc; - char t_brkc; -}; - -struct ltchars { - char t_suspc; - char t_dsuspc; - char t_rprntc; - char t_flushc; - char t_werasc; - char t_lnextc; -}; - -struct winsize { - unsigned short ws_row; - unsigned short ws_col; - unsigned short ws_xpixel; - unsigned short ws_ypixel; -}; - -#define NCC 10 -struct termio { - unsigned short c_iflag; /* input mode flags */ - unsigned short c_oflag; /* output mode flags */ - unsigned short c_cflag; /* control mode flags */ - unsigned short c_lflag; /* local mode flags */ - unsigned char c_line; /* line discipline */ - unsigned char c_cc[NCC]; /* control characters */ -}; - -/* c_cc characters */ -#define _VINTR 0 -#define _VQUIT 1 -#define _VERASE 2 -#define _VKILL 3 -#define _VEOF 4 -#define _VMIN 5 -#define _VEOL 6 -#define _VTIME 7 -#define _VEOL2 8 -#define _VSWTC 9 - -/* line 
disciplines */ -#define N_TTY 0 -#define N_SLIP 1 -#define N_MOUSE 2 -#define N_PPP 3 -#define N_STRIP 4 -#define N_AX25 5 -#define N_X25 6 /* X.25 async */ -#define N_6PACK 7 -#define N_MASC 8 /* Reserved for Mobitex module */ -#define N_R3964 9 /* Reserved for Simatic R3964 module */ -#define N_PROFIBUS_FDL 10 /* Reserved for Profibus */ -#define N_IRDA 11 /* Linux IrDa - http://www.cs.uit.no/~dagb/irda/irda.html */ -#define N_SMSBLOCK 12 /* SMS block mode - for talking to GSM data cards about SMS messages */ -#define N_HDLC 13 /* synchronous HDLC */ -#define N_SYNC_PPP 14 - -#ifdef __KERNEL__ -/* ^C ^\ del ^U ^D 1 0 0 0 0 ^W ^R ^Z ^Q ^S ^V ^U */ -#define INIT_C_CC "\003\034\177\025\004\001\000\000\000\000\027\022\032\021\023\026\025" -#endif - -#define FIOCLEX _IO('f', 1) -#define FIONCLEX _IO('f', 2) -#define FIOASYNC _IOW('f', 125, int) -#define FIONBIO _IOW('f', 126, int) -#define FIONREAD _IOR('f', 127, int) -#define TIOCINQ FIONREAD - -#define TIOCGETP _IOR('t', 8, struct sgttyb) -#define TIOCSETP _IOW('t', 9, struct sgttyb) -#define TIOCSETN _IOW('t', 10, struct sgttyb) /* TIOCSETP wo flush */ - -#define TIOCSETC _IOW('t', 17, struct tchars) -#define TIOCGETC _IOR('t', 18, struct tchars) -#define TCGETS _IOR('t', 19, struct termios) -#define TCSETS _IOW('t', 20, struct termios) -#define TCSETSW _IOW('t', 21, struct termios) -#define TCSETSF _IOW('t', 22, struct termios) - -#define TCGETA _IOR('t', 23, struct termio) -#define TCSETA _IOW('t', 24, struct termio) -#define TCSETAW _IOW('t', 25, struct termio) -#define TCSETAF _IOW('t', 28, struct termio) - -#define TCSBRK _IO('t', 29) -#define TCXONC _IO('t', 30) -#define TCFLSH _IO('t', 31) - -#define TIOCSWINSZ _IOW('t', 103, struct winsize) -#define TIOCGWINSZ _IOR('t', 104, struct winsize) -#define TIOCSTART _IO('t', 110) /* start output, like ^Q */ -#define TIOCSTOP _IO('t', 111) /* stop output, like ^S */ -#define TIOCOUTQ _IOR('t', 115, int) /* output queue size */ - -#define TIOCGLTC _IOR('t', 116, 
struct ltchars) -#define TIOCSLTC _IOW('t', 117, struct ltchars) -#define TIOCSPGRP _IOW('t', 118, int) -#define TIOCGPGRP _IOR('t', 119, int) - -#define TIOCEXCL 0x540C -#define TIOCNXCL 0x540D -#define TIOCSCTTY 0x540E - -#define TIOCSTI 0x5412 -#define TIOCMGET 0x5415 -#define TIOCMBIS 0x5416 -#define TIOCMBIC 0x5417 -#define TIOCMSET 0x5418 -#define TIOCGSOFTCAR 0x5419 -#define TIOCSSOFTCAR 0x541A -#define TIOCLINUX 0x541C -#define TIOCCONS 0x541D -#define TIOCGSERIAL 0x541E -#define TIOCSSERIAL 0x541F -#define TIOCPKT 0x5420 - -#define TIOCNOTTY 0x5422 -#define TIOCSETD 0x5423 -#define TIOCGETD 0x5424 -#define TCSBRKP 0x5425 /* Needed for POSIX tcsendbreak() */ - -#define TIOCSERCONFIG 0x5453 -#define TIOCSERGWILD 0x5454 -#define TIOCSERSWILD 0x5455 -#define TIOCGLCKTRMIOS 0x5456 -#define TIOCSLCKTRMIOS 0x5457 -#define TIOCSERGSTRUCT 0x5458 /* For debugging only */ -#define TIOCSERGETLSR 0x5459 /* Get line status register */ -#define TIOCSERGETMULTI 0x545A /* Get multiport config */ -#define TIOCSERSETMULTI 0x545B /* Set multiport config */ - -#define TIOCMIWAIT 0x545C /* wait for a change on serial input line(s) */ -#define TIOCGICOUNT 0x545D /* read serial port inline interrupt counts */ - -/* Used for packet mode */ -#define TIOCPKT_DATA 0 -#define TIOCPKT_FLUSHREAD 1 -#define TIOCPKT_FLUSHWRITE 2 -#define TIOCPKT_STOP 4 -#define TIOCPKT_START 8 -#define TIOCPKT_NOSTOP 16 -#define TIOCPKT_DOSTOP 32 - -/* modem lines */ -#define TIOCM_LE 0x001 -#define TIOCM_DTR 0x002 -#define TIOCM_RTS 0x004 -#define TIOCM_ST 0x008 -#define TIOCM_SR 0x010 -#define TIOCM_CTS 0x020 -#define TIOCM_CAR 0x040 -#define TIOCM_RNG 0x080 -#define TIOCM_DSR 0x100 -#define TIOCM_CD TIOCM_CAR -#define TIOCM_RI TIOCM_RNG -#define TIOCM_OUT1 0x2000 -#define TIOCM_OUT2 0x4000 -#define TIOCM_LOOP 0x8000 - -/* ioctl (fd, TIOCSERGETLSR, &result) where result may be as below */ -#define TIOCSER_TEMT 0x01 /* Transmitter physically empty */ - -#ifdef __KERNEL__ - -/* - * Translate a "termio" 
structure into a "termios". Ugh. - */ -#define SET_LOW_TERMIOS_BITS(termios, termio, x) { \ - unsigned short __tmp; \ - get_user(__tmp,&(termio)->x); \ - (termios)->x = (0xffff0000 & (termios)->x) | __tmp; \ -} - -#define user_termio_to_kernel_termios(termios, termio) \ -({ \ - SET_LOW_TERMIOS_BITS(termios, termio, c_iflag); \ - SET_LOW_TERMIOS_BITS(termios, termio, c_oflag); \ - SET_LOW_TERMIOS_BITS(termios, termio, c_cflag); \ - SET_LOW_TERMIOS_BITS(termios, termio, c_lflag); \ - copy_from_user((termios)->c_cc, (termio)->c_cc, NCC); \ -}) - -/* - * Translate a "termios" structure into a "termio". Ugh. - */ -#define kernel_termios_to_user_termio(termio, termios) \ -({ \ - put_user((termios)->c_iflag, &(termio)->c_iflag); \ - put_user((termios)->c_oflag, &(termio)->c_oflag); \ - put_user((termios)->c_cflag, &(termio)->c_cflag); \ - put_user((termios)->c_lflag, &(termio)->c_lflag); \ - put_user((termios)->c_line, &(termio)->c_line); \ - copy_to_user((termio)->c_cc, (termios)->c_cc, NCC); \ -}) - -#define user_termios_to_kernel_termios(k, u) copy_from_user(k, u, sizeof(struct termios)) -#define kernel_termios_to_user_termios(u, k) copy_to_user(u, k, sizeof(struct termios)) - -#endif /* __KERNEL__ */ - -#endif /* _PPC64_TERMIOS_H */ +#include diff -ruNp linus/include/asm-ppc64/unaligned.h linus-headers/include/asm-ppc64/unaligned.h --- linus/include/asm-ppc64/unaligned.h 2002-02-14 23:14:36.000000000 +1100 +++ linus-headers/include/asm-ppc64/unaligned.h 2005-03-09 02:16:30.000000000 +1100 @@ -1,21 +1 @@ -#ifndef __PPC64_UNALIGNED_H -#define __PPC64_UNALIGNED_H - -/* - * The PowerPC can do unaligned accesses itself in big endian mode. - * - * The strange macros are there to make sure these can't - * be misused in a way that makes them not work on other - * architectures where unaligned accesses aren't as simple. 
- * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#define get_unaligned(ptr) (*(ptr)) - -#define put_unaligned(val, ptr) ((void)( *(ptr) = (val) )) - -#endif /* __PPC64_UNALIGNED_H */ +#include From amodra at bigpond.net.au Wed Mar 9 12:19:29 2005 From: amodra at bigpond.net.au (Alan Modra) Date: Wed, 9 Mar 2005 11:49:29 +1030 Subject: [RFC][PATCH] combining header files In-Reply-To: <20050309120343.0c22eb0f.sfr@canb.auug.org.au> References: <20050309120343.0c22eb0f.sfr@canb.auug.org.au> Message-ID: <20050309011929.GI15642@bubble.modra.org> On Wed, Mar 09, 2005 at 12:03:43PM +1100, Stephen Rothwell wrote: > I would just like to start a discussion about consolidating (some of) the > ppc and ppc64 header files. Marvellous! In case it isn't completely obvious, you can often share structure definitions between ppc32 and ppc64 by judicious selection of types. eg. 
struct stays_the_same { long long some_64bit_var; int some_32bit_var; } struct bigger_in_64bit { long var_sized_by_arch; } -- Alan Modra IBM OzLabs - Linux Technology Centre From benh at kernel.crashing.org Wed Mar 9 14:02:01 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Wed, 09 Mar 2005 14:02:01 +1100 Subject: [PATCH 2/2] No-exec support for ppc64 In-Reply-To: <20050308171326.3d72363a.moilanen@austin.ibm.com> References: <20050308165904.0ce07112.moilanen@austin.ibm.com> <20050308171326.3d72363a.moilanen@austin.ibm.com> Message-ID: <1110337321.32556.26.camel@gaston> On Tue, 2005-03-08 at 17:13 -0600, Jake Moilanen wrote: > diff -puN arch/ppc64/kernel/iSeries_setup.c~nx-kernel-ppc64 arch/ppc64/kernel/iSeries_setup.c > --- linux-2.6-bk/arch/ppc64/kernel/iSeries_setup.c~nx-kernel-ppc64 2005-03-08 16:08:57 -06:00 > +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/iSeries_setup.c 2005-03-08 16:08:57 -06:00 > @@ -624,6 +624,7 @@ static void __init iSeries_bolt_kernel(u > { > unsigned long pa; > unsigned long mode_rw = _PAGE_ACCESSED | _PAGE_COHERENT | PP_RWXX; > + unsigned long tmp_mode; > HPTE hpte; > > for (pa = saddr; pa < eaddr ;pa += PAGE_SIZE) { > @@ -632,6 +633,12 @@ static void __init iSeries_bolt_kernel(u > unsigned long va = (vsid << 28) | (pa & 0xfffffff); > unsigned long vpn = va >> PAGE_SHIFT; > unsigned long slot = HvCallHpt_findValid(&hpte, vpn); > + > + tmp_mode = mode_rw; > + > + /* Make non-kernel text non-executable */ > + if (!is_kernel_text(ea)) > + tmp_mode = mode_rw | HW_NO_EXEC; > > if (hpte.dw0.dw0.v) { > /* HPTE exists, so just bolt it */ tmp_mode doesn't seem to be ever used here ... 
> /* Free memory returned from module_alloc */
> diff -puN arch/ppc64/mm/fault.c~nx-kernel-ppc64 arch/ppc64/mm/fault.c
> --- linux-2.6-bk/arch/ppc64/mm/fault.c~nx-kernel-ppc64 2005-03-08 16:08:57 -06:00
> +++ linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c 2005-03-08 16:08:57 -06:00
> @@ -76,6 +76,21 @@ static int store_updates_sp(struct pt_re
> 	return 0;
> }
>
> +pte_t *lookup_address(unsigned long address)
> +{
> +	pgd_t *pgd = pgd_offset_k(address);
> +	pmd_t *pmd;
> +
> +	if (pgd_none(*pgd))
> +		return NULL;
> +
> +	pmd = pmd_offset(pgd, address);
> +	if (pmd_none(*pmd))
> +		return NULL;
> +
> +	return pte_offset_kernel(pmd, address);
> +}

Use find_linux_pte() here (asm-ppc64/pgtable.h). It will return NULL if
the PTE is not present too, so no need to double-check that. That way, I
won't have to fix your copy of the function when I get the proper 4L
headers patch in ;)

> /*
> * The error_code parameter is
> * - DSISR for a non-SLB data access fault,
> @@ -94,6 +109,7 @@ int do_page_fault(struct pt_regs *regs,
> 	unsigned long is_write = error_code & 0x02000000;
> 	unsigned long trap = TRAP(regs);
> 	unsigned long is_exec = trap == 0x400;
> +	pte_t *ptep;
>
> 	BUG_ON((trap == 0x380) || (trap == 0x480));
>
> @@ -253,6 +269,15 @@ bad_area_nosemaphore:
> 		info.si_addr = (void __user *) address;
> 		force_sig_info(SIGSEGV, &info, current);
> 		return 0;
> +	}
> +
> +	ptep = lookup_address(address);
> +
> +	if (ptep && pte_present(*ptep) && !pte_exec(*ptep)) {
> +		if (printk_ratelimit())
> +			printk(KERN_CRIT "kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n", current->uid);
> +		show_stack(current, (unsigned long *)__get_SP());
> +		do_exit(SIGKILL);
> 	}

Can you try to limit to 80 columns ?
(I know, I'm not the best for that either, but I'm trying to cure myself
here, I promise my next rewrite of radeonfb will be fully 80-columns
safe :)

From flar at allandria.com Wed Mar 9 17:34:44 2005
From: flar at allandria.com (Brad Boyer)
Date: Tue, 8 Mar 2005 22:34:44 -0800
Subject: eeh.h compile warnings / adbhid.c build failure
In-Reply-To: <1110324608.32556.1.camel@gaston>
References: <20050302181206.GA2741@us.ibm.com> <1109806756.5680.127.camel@gaston> <20050308125635.GA19169@suse.de> <1110320815.13593.279.camel@gaston> <20050308231046.GA19175@pants.nu> <1110324608.32556.1.camel@gaston>
Message-ID: <20050309063443.GB20610@pants.nu>

On Wed, Mar 09, 2005 at 10:30:08AM +1100, Benjamin Herrenschmidt wrote:
> If we go that way, then we should finally bite the bullet and get ADB in
> the device model (define an adb bus_type, with proper drivers etc...).
> That would also allow a clean mechanism (sysfs properties) for things
> like trackpad settings, etc...

I agree. That's one of the reasons I haven't done it yet. I did start on
it, but other stuff took higher priority. Would people want to still be
able to use the older code? I also intend to make a new drivers/adb
directory, since it won't really be Mac specific at that point. That
would include the main bus_type definition, as well as stuff like adbhid,
but not via-cuda, via-pmu, and so on.

I won't be working on it until I get a couple other things done, but
it doesn't sound like anyone else will get to it first.

Brad Boyer
flar at allandria.com

From wangzyu at cn.ibm.com Wed Mar 9 17:40:35 2005
From: wangzyu at cn.ibm.com (Zhao Yu Wang)
Date: Wed, 9 Mar 2005 14:40:35 +0800
Subject: A question about dlpar add cpu failed
Message-ID:

Hi,
I hit a failure while trying to add a CPU to a partition dynamically. I
am not sure what's wrong. Thanks.

1.
about the shared_proc_pool:
hscroot at hmc6lte:~> lshwres -r proc -m fsp-pear --level pool
shared_proc_pool_id=0,configurable_pool_proc_units=null,curr_avail_pool_proc_units=2.0,pend_avail_pool_proc_units=null

2. partition "pearlp3 RH"'s config:
hscroot at hmc6lte:~> lshwres -r proc -m fsp-pear --level lpar --filter "lpar_names=pearlp3 RH"
lpar_name=pearlp3 RH,lpar_id=3,curr_shared_proc_pool_id=0,curr_proc_mode=shared,curr_min_proc_units=0.1,curr_proc_units=0.3,curr_max_proc_units=1.0,curr_min_procs=1,curr_procs=3,curr_max_procs=10,curr_sharing_mode=uncap,curr_uncap_weight=128,pend_shared_proc_pool_id=0,pend_proc_mode=shared,pend_min_proc_units=0.1,pend_proc_units=0.3,pend_max_proc_units=1.0,pend_min_procs=1,pend_procs=3,pend_max_procs=10,pend_sharing_mode=uncap,pend_uncap_weight=128,run_proc_units=0.3,run_procs=3,run_uncap_weight=128

3. The operations and results:

A add 1 procs:
hscroot at hmc6lte:~> chhwres -m fsp-pear -d 5 -r proc -o a -p "pearlp3 RH" --procs 1
HSCL145F Attempted to allocate processing units less than the minimum capacity allowed with the specified virtual processor setting.

B add 10 procs:
hscroot at hmc6lte:~> chhwres -m fsp-pear -d 5 -r proc -o a -p "pearlp3 RH" --procs 10
Your request exceeds the profile's maximum virtual processor limit. You can add or move up to 7 virtual processors. Please retry the operation.

C add 2 procs:
hscroot at hmc6lte:~> chhwres -m fsp-pear -d 5 -r proc -o a -p "pearlp3 RH" --procs 2
HSCL145F Attempted to allocate processing units less than the minimum capacity allowed with the specified virtual processor setting.

Thanks & Best regards,
--------------------------------------------
Wang Zhaoyu
Email: wangzyu at cn.ibm.com
Notes: Zhao Yu Wang/China/Contr/IBM at IBMCN

From benh at kernel.crashing.org Wed Mar 9 18:19:50 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Wed, 09 Mar 2005 18:19:50 +1100
Subject: eeh.h compile warnings / adbhid.c build failure
In-Reply-To: <20050309063443.GB20610@pants.nu>
References: <20050302181206.GA2741@us.ibm.com> <1109806756.5680.127.camel@gaston> <20050308125635.GA19169@suse.de> <1110320815.13593.279.camel@gaston> <20050308231046.GA19175@pants.nu> <1110324608.32556.1.camel@gaston> <20050309063443.GB20610@pants.nu>
Message-ID: <1110352790.32557.63.camel@gaston>

On Tue, 2005-03-08 at 22:34 -0800, Brad Boyer wrote:
> I won't be working on it until I get a couple other things done, but
> it doesn't sound like anyone else will get to it first.

Let me know when you start, I may beat you to it if I get bored one of
these week-ends :)

Ben.

From geert at linux-m68k.org Wed Mar 9 20:40:13 2005
From: geert at linux-m68k.org (Geert Uytterhoeven)
Date: Wed, 9 Mar 2005 10:40:13 +0100 (CET)
Subject: [RFC][PATCH] combining header files
In-Reply-To: <20050309011929.GI15642@bubble.modra.org>
References: <20050309120343.0c22eb0f.sfr@canb.auug.org.au> <20050309011929.GI15642@bubble.modra.org>
Message-ID:

On Wed, 9 Mar 2005, Alan Modra wrote:
> On Wed, Mar 09, 2005 at 12:03:43PM +1100, Stephen Rothwell wrote:
> > I would just like to start a discussion about consolidating (some of) the
> > ppc and ppc64 header files.
>
> Marvellous! In case it isn't completely obvious, you can often share
> structure definitions between ppc32 and ppc64 by judicious selection of
> types. eg.
>
> struct stays_the_same {
> 	long long some_64bit_var;
> 	int some_32bit_var;
> }

If size matters, why not use an explicitly sized type like s64 to make it
explicit?
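
[Editorial illustration, not part of the original message.] A minimal
user-space sketch of the point made in this exchange. It assumes the C99
<stdint.h> types int64_t and int32_t as stand-ins for the kernel's s64
and s32, which are not available outside the kernel. The struct names
come from Alan Modra's example; the second struct is the one he gave in
his original message. Field offsets in the first struct are fixed by the
explicit widths; the total size also matches between 32-bit and 64-bit
builds only on ABIs that align 64-bit integers the same way in both
modes, which is an assumption here.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Every field has an explicit width, so the layout does not depend on
 * sizeof(long) and can be shared by 32-bit and 64-bit builds. */
struct stays_the_same {
	int64_t some_64bit_var;	/* stand-in for the kernel's s64 */
	int32_t some_32bit_var;	/* stand-in for the kernel's s32 */
};

/* The field follows the ABI: long is 4 bytes on ILP32 targets and
 * 8 bytes on LP64 targets, so this struct changes size between them. */
struct bigger_in_64bit {
	long var_sized_by_arch;
};

/* Compile-time checks that hold on both kinds of target. */
_Static_assert(sizeof(int64_t) * CHAR_BIT == 64,
	       "int64_t is exactly 64 bits wide");
_Static_assert(sizeof(struct bigger_in_64bit) == sizeof(long),
	       "this struct is exactly as large as a long");
```

The explicit widths document the intent in the type itself, which is the
advantage suggested over relying on long long and int happening to have
the right sizes.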
Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds From amodra at bigpond.net.au Wed Mar 9 23:06:01 2005 From: amodra at bigpond.net.au (Alan Modra) Date: Wed, 9 Mar 2005 22:36:01 +1030 Subject: [RFC][PATCH] combining header files In-Reply-To: References: <20050309120343.0c22eb0f.sfr@canb.auug.org.au> <20050309011929.GI15642@bubble.modra.org> Message-ID: <20050309120601.GN15642@bubble.modra.org> On Wed, Mar 09, 2005 at 10:40:13AM +0100, Geert Uytterhoeven wrote: > On Wed, 9 Mar 2005, Alan Modra wrote: > > On Wed, Mar 09, 2005 at 12:03:43PM +1100, Stephen Rothwell wrote: > > > I would just like to start a discussion about consolidating (some of) the > > > ppc and ppc64 header files. > > > > Marvellous! In case it isn't completely obvious, you can often share > > structure definitions between ppc32 and ppc64 by judicious selection of > > types. eg. > > > > struct stays_the_same { > > long long some_64bit_var; > > int some_32bit_var; > > } > > If size matters, why not use an explicitly sized type like s64 to make it > explicit? Sure, that's even better. -- Alan Modra IBM OzLabs - Linux Technology Centre From arnd at arndb.de Thu Mar 10 00:01:16 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Wed, 9 Mar 2005 14:01:16 +0100 Subject: [PATCH] linking zImage with biarch ld Message-ID: <200503091401.17143.arnd@arndb.de> I noticed that with the vDSO patch in 2.6.11-bk, it's almost possible to build the kernel with the fedora biarch toolchain. However, I still get warnings from ld about zImage being the wrong architecture, unless I change the script as shown in this patch. I'm not sure if this breaks setups with old binutils that might not understand powerpc:common, otherwise please apply. 
Signed-off-by: Arnd Bergmann

--- 1.4/arch/ppc64/boot/zImage.lds 2004-09-17 00:34:55 -04:00
+++ edited/arch/ppc64/boot/zImage.lds 2005-03-08 11:03:50 -05:00
@@ -1,4 +1,4 @@
-OUTPUT_ARCH(powerpc)
+OUTPUT_ARCH(powerpc:common)
 SEARCH_DIR(/lib); SEARCH_DIR(/usr/lib); SEARCH_DIR(/usr/local/lib); SEARCH_DIR(/usr/local/powerpc-any-elf/lib);
 /* Do we need any of these for elf?
    __DYNAMIC = 0; */

From segher at kernel.crashing.org Thu Mar 10 02:30:12 2005
From: segher at kernel.crashing.org (Segher Boessenkool)
Date: Wed, 9 Mar 2005 16:30:12 +0100
Subject: [PATCH] linking zImage with biarch ld
In-Reply-To: <200503091401.17143.arnd@arndb.de>
References: <200503091401.17143.arnd@arndb.de>
Message-ID: <67866e6be990bc2d27d34b92df86d946@kernel.crashing.org>

> I'm not sure if this breaks setups with old binutils that might not
> understand
> powerpc:common, otherwise please apply.

powerpc:common is fine since, erm, forever (that is, at least three
years).

Segher

From linas at austin.ibm.com Thu Mar 10 07:01:09 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Wed, 9 Mar 2005 14:01:09 -0600
Subject: [RFC][PATCH] combining header files
In-Reply-To: <20050309120343.0c22eb0f.sfr@canb.auug.org.au>
References: <20050309120343.0c22eb0f.sfr@canb.auug.org.au>
Message-ID: <20050309200109.GG1220@austin.ibm.com>

On Wed, Mar 09, 2005 at 12:03:43PM +1100, Stephen Rothwell was heard to remark:
> Hi all,
>
> I would just like to start a discussion about consolidating (some of) the
> ppc and ppc64 header files. As a starting point (and I am not saying that
> this is the right way to go) the following patch replaces (semantically)
> equivalent ppc64 header files by just including the asm-ppc file.

Why not #include instead?

> We *could* use this method to make the journey incremental until there
> are no nontrivial files left in asm-ppc64 ....

sounds good to me.
--linas

From nathanl at austin.ibm.com Thu Mar 10 07:54:15 2005
From: nathanl at austin.ibm.com (Nathan Lynch)
Date: Wed, 09 Mar 2005 14:54:15 -0600
Subject: A question about dlpar add cpu failed
In-Reply-To:
References:
Message-ID: <1110401655.12027.7.camel@biclops>

On Wed, 2005-03-09 at 14:40 +0800, Zhao Yu Wang wrote:
> Hi,
> I hit a failure while trying to add a CPU to a partition dynamically. I
> am not sure what's wrong. Thanks.

...

> A add 1 procs:
> hscroot at hmc6lte:~> chhwres -m fsp-pear -d 5 -r proc -o a -p "pearlp3
> RH" --procs 1
> HSCL145F Attempted to allocate processing units less than the minimum
> capacity allowed with the specified virtual processor setting.
>
> B add 10 procs:
> hscroot at hmc6lte:~> chhwres -m fsp-pear -d 5 -r proc -o a -p "pearlp3
> RH" --procs 10
> Your request exceeds the profile's maximum virtual processor limit.
> You can add or move up to 7 virtual processors. Please retry the
> operation.
>
> C add 2 procs:
> hscroot at hmc6lte:~> chhwres -m fsp-pear -d 5 -r proc -o a -p "pearlp3
> RH" --procs 2
> HSCL145F Attempted to allocate processing units less than the minimum
> capacity allowed with the specified virtual processor setting.

All of these seem to indicate HMC or platform firmware issues (or
operator error); the OS and programs on the partition are not even
involved at this point.

Nathan

From benh at kernel.crashing.org Thu Mar 10 08:39:48 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Thu, 10 Mar 2005 08:39:48 +1100
Subject: [PATCH] linking zImage with biarch ld
In-Reply-To: <200503091401.17143.arnd@arndb.de>
References: <200503091401.17143.arnd@arndb.de>
Message-ID: <1110404388.32524.101.camel@gaston>

On Wed, 2005-03-09 at 14:01 +0100, Arnd Bergmann wrote:
> I noticed that with the vDSO patch in 2.6.11-bk, it's almost possible to build
> the kernel with the fedora biarch toolchain.
However, I still get warnings > from ld about zImage being the wrong architecture, unless I change the script > as shown in this patch. "Almost possible" ? What's wrong ? Only that ? > I'm not sure if this breaks setups with old binutils that might not understand > powerpc:common, otherwise please apply. > > Signed-off-by: Arnd Bergmann > > --- 1.4/arch/ppc64/boot/zImage.lds 2004-09-17 00:34:55 -04:00 > +++ edited/arch/ppc64/boot/zImage.lds 2005-03-08 11:03:50 -05:00 > @@ -1,4 +1,4 @@ > -OUTPUT_ARCH(powerpc) > +OUTPUT_ARCH(powerpc:common) > SEARCH_DIR(/lib); SEARCH_DIR(/usr/lib); SEARCH_DIR(/usr/local/lib); SEARCH_DIR(/usr/local/powerpc-any-elf/lib); > /* Do we need any of these for elf? > __DYNAMIC = 0; */ -- Benjamin Herrenschmidt From ntl at pobox.com Thu Mar 10 11:51:32 2005 From: ntl at pobox.com (Nathan Lynch) Date: Wed, 9 Mar 2005 18:51:32 -0600 (CST) Subject: [PATCH 0/8] reworked support for pSeries dynamic reconfiguration Message-ID: <20050310005132.31309.65485.31668@otto> Hi- This patch series reworks existing ppc64 architecture support for the "dynamic reconfiguration" option of RPA platforms. This includes PCI hotplug and dynamic logical partitioning (DLPAR). This was all motivated by my desire to add code for better handling of processor addition and removal, but I didn't want to just add to the growing mess in prom.c where we have duplicated code for boot and DLPAR/hotplug. This adds very little new function, but gets rid of much duplicated code and introduces a new pSeries-specific file, pSeries_reconfig.c, which contains the core support for dynamic reconfiguration and implements a more refined version of the notifier chain API I posted a few weeks ago. Code that needs to act upon device nodes that are being added or removed can register with this notifier chain. I've ported as much code as possible to that API, and I expect memory DLPAR will want to use it too. 
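
[Editorial illustration, not part of the original message.] The notifier
chain idea described above can be sketched in plain user-space C. This
is only a toy model under stated assumptions: every name below
(reconfig_notifier, reconfig_register, reconfig_unregister,
reconfig_notify, RECONFIG_ADD, RECONFIG_REMOVE, count_cpu_nodes) is
hypothetical and is not the API added by pSeries_reconfig.c, and device
nodes are reduced to name strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event codes for a node being added or removed. */
enum reconfig_action { RECONFIG_ADD, RECONFIG_REMOVE };

/* One subscriber on the chain. The callback returns 0 to continue or a
 * nonzero value to stop the walk and report a failure to the caller. */
struct reconfig_notifier {
	int (*call)(struct reconfig_notifier *self,
		    enum reconfig_action action, const char *node);
	struct reconfig_notifier *next;
};

static struct reconfig_notifier *reconfig_chain;

/* Append at the tail so callbacks run in registration order. */
void reconfig_register(struct reconfig_notifier *nb)
{
	struct reconfig_notifier **pp = &reconfig_chain;

	while (*pp)
		pp = &(*pp)->next;
	nb->next = NULL;
	*pp = nb;
}

/* Unlink a subscriber; does nothing if it is not on the chain. */
void reconfig_unregister(struct reconfig_notifier *nb)
{
	struct reconfig_notifier **pp;

	for (pp = &reconfig_chain; *pp; pp = &(*pp)->next) {
		if (*pp == nb) {
			*pp = nb->next;
			nb->next = NULL;
			return;
		}
	}
}

/* Tell every subscriber about the event; stop at the first error. */
int reconfig_notify(enum reconfig_action action, const char *node)
{
	struct reconfig_notifier *nb;

	for (nb = reconfig_chain; nb; nb = nb->next) {
		int rc = nb->call(nb, action, node);

		if (rc)
			return rc;
	}
	return 0;
}

/* Example subscriber: tracks how many "cpu" nodes are present and
 * ignores every other kind of node. */
int cpu_nodes_present;

int count_cpu_nodes(struct reconfig_notifier *self,
		    enum reconfig_action action, const char *node)
{
	(void)self;
	if (strncmp(node, "cpu", 3) != 0)
		return 0;
	cpu_nodes_present += (action == RECONFIG_ADD) ? 1 : -1;
	return 0;
}

struct reconfig_notifier cpu_counter = { .call = count_cpu_nodes };
```

In this model the code that adds or removes a node only calls
reconfig_notify(); it does not need to know which subsystems care, which
is the kind of decoupling the cover letter describes.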
The last couple of patches in the series modify the pSeries smp code so
that we properly manage cpu_present_map with respect to DLPAR, and
include the "make cpu hotplug play well with maxcpus and smt-enabled"
patch, which depends on this.

The following cases have been tested on a Power5 system:
* CPU add/remove
* Virtual I/O adapter add/remove
* Logical slot add/remove (thanks to John Rose)

I also checked the build against all defconfigs in arch/ppc64/configs.

diffstat for the combined series:

 arch/ppc64/kernel/Makefile           |    2
 arch/ppc64/kernel/pSeries_iommu.c    |   25
 arch/ppc64/kernel/pSeries_reconfig.c |  427 +++++++++++++
 arch/ppc64/kernel/pSeries_smp.c      |  231 +++++--
 arch/ppc64/kernel/pci_dn.c           |   22
 arch/ppc64/kernel/proc_ppc64.c       |  249 --------
 arch/ppc64/kernel/prom.c             |  466 ++++-----------
 arch/ppc64/kernel/setup.c            |   12
 arch/ppc64/kernel/smp.c              |   13
 include/asm-ppc64/machdep.h          |    1
 include/asm-ppc64/pSeries_reconfig.h |   25
 include/asm-ppc64/prom.h             |    4
 12 files changed, 812 insertions(+), 665 deletions(-)

Thanks,
Nathan

From ntl at pobox.com Thu Mar 10 11:51:37 2005
From: ntl at pobox.com (Nathan Lynch)
Date: Wed, 9 Mar 2005 18:51:37 -0600 (CST)
Subject: [PATCH 1/8] preliminary changes to OF fixup functions
In-Reply-To: <20050310005132.31309.65485.31668@otto>
References: <20050310005132.31309.65485.31668@otto>
Message-ID: <20050310005137.31309.32303.42763@otto>

Preliminary modifications to support using some of the interpret_func
family of functions at runtime. Changes the mem_start argument to be
passed by reference, and changes the return type to int so that error
handling can be implemented in following patches.
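
[Editorial illustration, not part of the original message.] A minimal
user-space sketch of the calling convention described above: the
allocation cursor is passed by reference and advanced in place, which
frees the return value to carry a status code. The names demo_range and
fill_ranges are hypothetical, the -EINVAL check is an assumed example of
the error handling that the description defers to later patches, and the
code assumes unsigned long can hold a pointer (true on the usual ILP32
and LP64 Linux targets).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical record type standing in for struct address_range. */
struct demo_range {
	unsigned long address;
	unsigned long size;
};

/*
 * Old convention: take the cursor by value and return the advanced
 * cursor, which leaves no room for an error code.
 * New convention (sketched here): advance *mem_start in place and
 * return 0 on success or a negative errno-style value on failure.
 *
 * With measure_only set, nothing is written; the caller only learns
 * how much space a later filling pass will need.
 */
int fill_ranges(unsigned long *mem_start, const unsigned long *sizes,
		int count, int measure_only)
{
	struct demo_range *adr = (struct demo_range *)(*mem_start);
	int i;

	if (count < 0)
		return -EINVAL;	/* assumed error case, for illustration */

	/* Reserve the space in both passes so that the measuring pass
	 * ends with the cursor advanced by the total size required. */
	*mem_start += count * sizeof(struct demo_range);
	if (measure_only)
		return 0;

	for (i = 0; i < count; i++) {
		adr[i].address = (unsigned long)&adr[i];
		adr[i].size = sizes[i];
	}
	return 0;
}
```

A caller would run the function once with measure_only set and the
cursor at zero to learn the size, allocate that much memory, then run it
again with the cursor pointing at the new buffer.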
Signed-off-by: Nathan Lynch prom.c | 135 ++++++++++++++++++++++++++++++++++------------------------------- 1 files changed, 71 insertions(+), 64 deletions(-) Index: linux-2.6.11-bk5/arch/ppc64/kernel/prom.c =================================================================== --- linux-2.6.11-bk5.orig/arch/ppc64/kernel/prom.c 2005-03-09 20:01:32.000000000 +0000 +++ linux-2.6.11-bk5/arch/ppc64/kernel/prom.c 2005-03-09 20:02:15.000000000 +0000 @@ -73,8 +73,8 @@ struct isa_reg_property { }; -typedef unsigned long interpret_func(struct device_node *, unsigned long, - int, int, int); +typedef int interpret_func(struct device_node *, unsigned long *, + int, int, int); extern struct rtas_t rtas; extern struct lmb lmb; @@ -255,9 +255,9 @@ static int __devinit map_interrupt(unsig return nintrc; } -static unsigned long __init finish_node_interrupts(struct device_node *np, - unsigned long mem_start, - int measure_only) +static int __init finish_node_interrupts(struct device_node *np, + unsigned long *mem_start, + int measure_only) { unsigned int *ints; int intlen, intrcells, intrcount; @@ -267,14 +267,14 @@ static unsigned long __init finish_node_ ints = (unsigned int *) get_property(np, "interrupts", &intlen); if (ints == NULL) - return mem_start; + return 0; intrcells = prom_n_intr_cells(np); intlen /= intrcells * sizeof(unsigned int); - np->intrs = (struct interrupt_info *) mem_start; - mem_start += intlen * sizeof(struct interrupt_info); + np->intrs = (struct interrupt_info *) (*mem_start); + (*mem_start) += intlen * sizeof(struct interrupt_info); if (measure_only) - return mem_start; + return 0; intrcount = 0; for (i = 0; i < intlen; ++i, ints += intrcells) { @@ -315,13 +315,13 @@ static unsigned long __init finish_node_ } np->n_intrs = intrcount; - return mem_start; + return 0; } -static unsigned long __init interpret_pci_props(struct device_node *np, - unsigned long mem_start, - int naddrc, int nsizec, - int measure_only) +static int __init interpret_pci_props(struct 
device_node *np, + unsigned long *mem_start, + int naddrc, int nsizec, + int measure_only) { struct address_range *adr; struct pci_reg_property *pci_addrs; @@ -331,7 +331,7 @@ static unsigned long __init interpret_pc get_property(np, "assigned-addresses", &l); if (pci_addrs != 0 && l >= sizeof(struct pci_reg_property)) { i = 0; - adr = (struct address_range *) mem_start; + adr = (struct address_range *) (*mem_start); while ((l -= sizeof(struct pci_reg_property)) >= 0) { if (!measure_only) { adr[i].space = pci_addrs[i].addr.a_hi; @@ -343,15 +343,15 @@ static unsigned long __init interpret_pc } np->addrs = adr; np->n_addrs = i; - mem_start += i * sizeof(struct address_range); + (*mem_start) += i * sizeof(struct address_range); } - return mem_start; + return 0; } -static unsigned long __init interpret_dbdma_props(struct device_node *np, - unsigned long mem_start, - int naddrc, int nsizec, - int measure_only) +static int __init interpret_dbdma_props(struct device_node *np, + unsigned long *mem_start, + int naddrc, int nsizec, + int measure_only) { struct reg_property32 *rp; struct address_range *adr; @@ -372,7 +372,7 @@ static unsigned long __init interpret_db rp = (struct reg_property32 *) get_property(np, "reg", &l); if (rp != 0 && l >= sizeof(struct reg_property32)) { i = 0; - adr = (struct address_range *) mem_start; + adr = (struct address_range *) (*mem_start); while ((l -= sizeof(struct reg_property32)) >= 0) { if (!measure_only) { adr[i].space = 2; @@ -383,16 +383,16 @@ static unsigned long __init interpret_db } np->addrs = adr; np->n_addrs = i; - mem_start += i * sizeof(struct address_range); + (*mem_start) += i * sizeof(struct address_range); } - return mem_start; + return 0; } -static unsigned long __init interpret_macio_props(struct device_node *np, - unsigned long mem_start, - int naddrc, int nsizec, - int measure_only) +static int __init interpret_macio_props(struct device_node *np, + unsigned long *mem_start, + int naddrc, int nsizec, + int measure_only) 
{ struct reg_property32 *rp; struct address_range *adr; @@ -413,7 +413,7 @@ static unsigned long __init interpret_ma rp = (struct reg_property32 *) get_property(np, "reg", &l); if (rp != 0 && l >= sizeof(struct reg_property32)) { i = 0; - adr = (struct address_range *) mem_start; + adr = (struct address_range *) (*mem_start); while ((l -= sizeof(struct reg_property32)) >= 0) { if (!measure_only) { adr[i].space = 2; @@ -424,16 +424,16 @@ static unsigned long __init interpret_ma } np->addrs = adr; np->n_addrs = i; - mem_start += i * sizeof(struct address_range); + (*mem_start) += i * sizeof(struct address_range); } - return mem_start; + return 0; } -static unsigned long __init interpret_isa_props(struct device_node *np, - unsigned long mem_start, - int naddrc, int nsizec, - int measure_only) +static int __init interpret_isa_props(struct device_node *np, + unsigned long *mem_start, + int naddrc, int nsizec, + int measure_only) { struct isa_reg_property *rp; struct address_range *adr; @@ -442,7 +442,7 @@ static unsigned long __init interpret_is rp = (struct isa_reg_property *) get_property(np, "reg", &l); if (rp != 0 && l >= sizeof(struct isa_reg_property)) { i = 0; - adr = (struct address_range *) mem_start; + adr = (struct address_range *) (*mem_start); while ((l -= sizeof(struct isa_reg_property)) >= 0) { if (!measure_only) { adr[i].space = rp[i].space; @@ -453,16 +453,16 @@ static unsigned long __init interpret_is } np->addrs = adr; np->n_addrs = i; - mem_start += i * sizeof(struct address_range); + (*mem_start) += i * sizeof(struct address_range); } - return mem_start; + return 0; } -static unsigned long __init interpret_root_props(struct device_node *np, - unsigned long mem_start, - int naddrc, int nsizec, - int measure_only) +static int __init interpret_root_props(struct device_node *np, + unsigned long *mem_start, + int naddrc, int nsizec, + int measure_only) { struct address_range *adr; int i, l; @@ -472,7 +472,7 @@ static unsigned long __init interpret_ro rp 
= (unsigned int *) get_property(np, "reg", &l); if (rp != 0 && l >= rpsize) { i = 0; - adr = (struct address_range *) mem_start; + adr = (struct address_range *) (*mem_start); while ((l -= rpsize) >= 0) { if (!measure_only) { adr[i].space = 0; @@ -484,26 +484,30 @@ static unsigned long __init interpret_ro } np->addrs = adr; np->n_addrs = i; - mem_start += i * sizeof(struct address_range); + (*mem_start) += i * sizeof(struct address_range); } - return mem_start; + return 0; } -static unsigned long __init finish_node(struct device_node *np, - unsigned long mem_start, - interpret_func *ifunc, - int naddrc, int nsizec, - int measure_only) +static int __init finish_node(struct device_node *np, + unsigned long *mem_start, + interpret_func *ifunc, + int naddrc, int nsizec, + int measure_only) { struct device_node *child; - int *ip; + int *ip, rc = 0; /* get the device addresses and interrupts */ if (ifunc != NULL) - mem_start = ifunc(np, mem_start, naddrc, nsizec, measure_only); + rc = ifunc(np, mem_start, naddrc, nsizec, measure_only); + if (rc) + goto out; - mem_start = finish_node_interrupts(np, mem_start, measure_only); + rc = finish_node_interrupts(np, mem_start, measure_only); + if (rc) + goto out; /* Look for #address-cells and #size-cells properties. 
*/
 	ip = (int *) get_property(np, "#address-cells", NULL);
@@ -539,11 +543,14 @@ static unsigned long __init finish_node(
 			|| !strcmp(np->type, "media-bay"))))
 		ifunc = NULL;

-	for (child = np->child; child != NULL; child = child->sibling)
-		mem_start = finish_node(child, mem_start, ifunc,
-					naddrc, nsizec, measure_only);
-
-	return mem_start;
+	for (child = np->child; child != NULL; child = child->sibling) {
+		rc = finish_node(child, mem_start, ifunc,
+				 naddrc, nsizec, measure_only);
+		if (rc)
+			goto out;
+	}
+out:
+	return rc;
 }

 /**
@@ -555,7 +562,7 @@ static unsigned long __init finish_node(
  */
 void __init finish_device_tree(void)
 {
-	unsigned long mem, size;
+	unsigned long start, end, size = 0;

 	DBG(" -> finish_device_tree\n");

@@ -568,11 +575,11 @@ void __init finish_device_tree(void)
 	virt_irq_init();

 	/* Finish device-tree (pre-parsing some properties etc...) */
-	size = finish_node(allnodes, 0, NULL, 0, 0, 1);
-	mem = (unsigned long)abs_to_virt(lmb_alloc(size, 128));
-	if (finish_node(allnodes, mem, NULL, 0, 0, 0) != mem + size)
-		BUG();
-
+	finish_node(allnodes, &size, NULL, 0, 0, 1);
+	end = start = (unsigned long)abs_to_virt(lmb_alloc(size, 128));
+	finish_node(allnodes, &end, NULL, 0, 0, 0);
+	BUG_ON(end != start + size);
+
 	DBG(" <- finish_device_tree\n");
 }

From ntl at pobox.com Thu Mar 10 11:51:42 2005
From: ntl at pobox.com (Nathan Lynch)
Date: Wed, 9 Mar 2005 18:51:42 -0600 (CST)
Subject: [PATCH 2/8] make OF node fixup code usable at runtime
In-Reply-To: <20050310005132.31309.65485.31668@otto>
References: <20050310005132.31309.65485.31668@otto>
Message-ID: <20050310005142.31309.45788.99418@otto>

At boot we recurse through the device tree "fixing up" various fields
and properties in the device nodes. Long ago, to support DLPAR and
hotplug, we largely duplicated some of this fixup code, the main
difference being that the new code used kmalloc for allocating various
data structures which are attached to the new device nodes.
This patch kills most of the duplicated code and makes finish_node, finish_node_interrupts, and interpret_pci_props suitable for use at runtime. These functions, if passed a null mem_start argument, will use kmalloc for allocating extra data structures for the device node being processed. Not terribly elegant, but it seems worth it to get rid of the duplicated code (and bugs). Signed-off-by: Nathan Lynch prom.c | 169 ++++++++++++++++++++--------------------------------------------- 1 files changed, 54 insertions(+), 115 deletions(-) Index: linux-2.6.11-bk5/arch/ppc64/kernel/prom.c =================================================================== --- linux-2.6.11-bk5.orig/arch/ppc64/kernel/prom.c 2005-03-09 20:02:15.000000000 +0000 +++ linux-2.6.11-bk5/arch/ppc64/kernel/prom.c 2005-03-09 20:08:28.000000000 +0000 @@ -255,9 +255,9 @@ static int __devinit map_interrupt(unsig return nintrc; } -static int __init finish_node_interrupts(struct device_node *np, - unsigned long *mem_start, - int measure_only) +static int __devinit finish_node_interrupts(struct device_node *np, + unsigned long *mem_start, + int measure_only) { unsigned int *ints; int intlen, intrcells, intrcount; @@ -270,8 +270,15 @@ static int __init finish_node_interrupts return 0; intrcells = prom_n_intr_cells(np); intlen /= intrcells * sizeof(unsigned int); - np->intrs = (struct interrupt_info *) (*mem_start); - (*mem_start) += intlen * sizeof(struct interrupt_info); + + if (mem_start) { + np->intrs = (struct interrupt_info *) (*mem_start); + (*mem_start) += intlen * sizeof(struct interrupt_info); + } else { + np->intrs = kmalloc(intlen * sizeof(*(np->intrs)), GFP_KERNEL); + if (!np->intrs) + return -ENOMEM; + } if (measure_only) return 0; @@ -318,33 +325,44 @@ static int __init finish_node_interrupts return 0; } -static int __init interpret_pci_props(struct device_node *np, - unsigned long *mem_start, - int naddrc, int nsizec, - int measure_only) +static int __devinit interpret_pci_props(struct 
device_node *np, + unsigned long *mem_start, + int naddrc, int nsizec, + int measure_only) { struct address_range *adr; struct pci_reg_property *pci_addrs; - int i, l; + int i, l, n_addrs; pci_addrs = (struct pci_reg_property *) get_property(np, "assigned-addresses", &l); - if (pci_addrs != 0 && l >= sizeof(struct pci_reg_property)) { - i = 0; - adr = (struct address_range *) (*mem_start); - while ((l -= sizeof(struct pci_reg_property)) >= 0) { - if (!measure_only) { - adr[i].space = pci_addrs[i].addr.a_hi; - adr[i].address = pci_addrs[i].addr.a_lo | - ((u64)pci_addrs[i].addr.a_mid << 32); - adr[i].size = pci_addrs[i].size_lo; - } - ++i; - } - np->addrs = adr; - np->n_addrs = i; - (*mem_start) += i * sizeof(struct address_range); + if (!pci_addrs) + return 0; + + n_addrs = l / sizeof(*pci_addrs); + + if (!mem_start) { + adr = kmalloc(n_addrs * sizeof(*adr), GFP_KERNEL); + if (!adr) + return -ENOMEM; + } else { + adr = (struct address_range *)(*mem_start); + (*mem_start) += n_addrs * sizeof(struct address_range); + } + + if (measure_only) + return 0; + + np->addrs = adr; + np->n_addrs = n_addrs; + + for (i = 0; i < n_addrs; i++) { + adr[i].space = pci_addrs[i].addr.a_hi; + adr[i].address = pci_addrs[i].addr.a_lo | + ((u64)pci_addrs[i].addr.a_mid << 32); + adr[i].size = pci_addrs[i].size_lo; } + return 0; } @@ -490,11 +508,12 @@ static int __init interpret_root_props(s return 0; } -static int __init finish_node(struct device_node *np, - unsigned long *mem_start, - interpret_func *ifunc, - int naddrc, int nsizec, - int measure_only) +/* If mem_start == NULL ifuncs should use kmalloc for allocations. 
*/ +static int __devinit finish_node(struct device_node *np, + unsigned long *mem_start, + interpret_func *ifunc, + int naddrc, int nsizec, + int measure_only) { struct device_node *child; int *ip, rc = 0; @@ -1627,54 +1646,6 @@ static void remove_node_proc_entries(str #endif /* CONFIG_PROC_DEVICETREE */ /* - * Fix up n_intrs and intrs fields in a new device node - * - */ -static int of_finish_dynamic_node_interrupts(struct device_node *node) -{ - int intrcells, intlen, i; - unsigned *irq, *ints, virq; - struct device_node *ic; - - ints = (unsigned int *)get_property(node, "interrupts", &intlen); - intrcells = prom_n_intr_cells(node); - intlen /= intrcells * sizeof(unsigned int); - node->n_intrs = intlen; - node->intrs = kmalloc(sizeof(struct interrupt_info) * intlen, - GFP_KERNEL); - if (!node->intrs) - return -ENOMEM; - - for (i = 0; i < intlen; ++i) { - int n, j; - node->intrs[i].line = 0; - node->intrs[i].sense = 1; - n = map_interrupt(&irq, &ic, node, ints, intrcells); - if (n <= 0) - continue; - virq = virt_irq_create_mapping(irq[0]); - if (virq == NO_IRQ) { - printk(KERN_CRIT "Could not allocate interrupt " - "number for %s\n", node->full_name); - return -ENOMEM; - } - node->intrs[i].line = irq_offset_up(virq); - if (n > 1) - node->intrs[i].sense = irq[1]; - if (n > 2) { - printk(KERN_DEBUG "hmmm, got %d intr cells for %s:", n, - node->full_name); - for (j = 0; j < n; ++j) - printk(" %d", irq[j]); - printk("\n"); - } - ints += intrcells; - } - return 0; -} - - -/* * Fix up the uninitialized fields in a new device node: * name, type, n_addrs, addrs, n_intrs, intrs, and pci-specific fields * @@ -1685,7 +1656,9 @@ static int of_finish_dynamic_node_interr * This should probably be split up into smaller chunks. 
*/ -static int of_finish_dynamic_node(struct device_node *node) +static int of_finish_dynamic_node(struct device_node *node, + unsigned long *unused1, int unused2, + int unused3, int unused4) { struct device_node *parent = of_get_parent(node); u32 *regs; @@ -1710,41 +1683,6 @@ static int of_finish_dynamic_node(struct if ((ibm_phandle = (unsigned int *)get_property(node, "ibm,phandle", NULL))) node->linux_phandle = *ibm_phandle; - /* do the work of interpret_pci_props */ - if (parent->type && !strcmp(parent->type, "pci")) { - struct address_range *adr; - struct pci_reg_property *pci_addrs; - int i, l; - - pci_addrs = (struct pci_reg_property *) - get_property(node, "assigned-addresses", &l); - if (pci_addrs != 0 && l >= sizeof(struct pci_reg_property)) { - i = 0; - adr = kmalloc(sizeof(struct address_range) * - (l / sizeof(struct pci_reg_property)), - GFP_KERNEL); - if (!adr) { - err = -ENOMEM; - goto out; - } - while ((l -= sizeof(struct pci_reg_property)) >= 0) { - adr[i].space = pci_addrs[i].addr.a_hi; - adr[i].address = pci_addrs[i].addr.a_lo | - ((u64)pci_addrs[i].addr.a_mid << 32); - adr[i].size = pci_addrs[i].size_lo; - ++i; - } - node->addrs = adr; - node->n_addrs = i; - } - } - - /* now do the work of finish_node_interrupts */ - if (get_property(node, "interrupts", NULL)) { - err = of_finish_dynamic_node_interrupts(node); - if (err) goto out; - } - /* now do the rough equivalent of update_dn_pci_info, this * probably is not correct for phb's, but should work for * IOAs and slots. 
@@ -1796,7 +1734,8 @@ int of_add_node(const char *path, struct
 		return -EINVAL; /* could also be ENOMEM, though */
 	}

-	if (0 != (err = of_finish_dynamic_node(np))) {
+	err = finish_node(np, NULL, of_finish_dynamic_node, 0, 0, 0);
+	if (err < 0) {
 		kfree(np);
 		return err;
 	}

From ntl at pobox.com Thu Mar 10 11:51:47 2005
From: ntl at pobox.com (Nathan Lynch)
Date: Wed, 9 Mar 2005 18:51:47 -0600 (CST)
Subject: [PATCH 3/8] introduce pSeries_reconfig.[ch]
In-Reply-To: <20050310005132.31309.65485.31668@otto>
References: <20050310005132.31309.65485.31668@otto>
Message-ID: <20050310005147.31309.61029.66648@otto>

Move as much pSeries-specific DLPAR/hotplug code as possible into its
own file, which is built only when pSeries support is enabled in the
config. This new file is intended to contain support code for the
"Dynamic Reconfiguration" option in the RISC Platform Architecture,
which encompasses both PCI hotplug and dynamic logical partitioning
(DLPAR).

This patch mostly just moves code around, but the device node addition
and removal API is slightly modified. In this way, of_add_node and
of_remove_node are now responsible only for safely updating the device
tree and global list, without all the other stuff like proc entries
etc.

This also adds the definitions and api for a notifier chain which is
meant to be used by code that must act upon device node addition or
removal. Patches to migrate code to the notifier api follow in this
series.
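
[Editorial sketch, not part of the patch.] The notifier-chain idea described
above can be mocked up in userspace C. This is only a sketch under stated
assumptions: the constants, struct fake_node, reconfig_notifier_register(),
reconfig_call_chain(), add_node() and remove_node() are hypothetical
stand-ins, and their names and values are not taken from the kernel headers.
It shows a subscriber being called on node addition and removal, and a veto
on addition causing the add to fail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins: names, values and layouts here are assumptions. */
enum { NOTIFY_DONE = 0, NOTIFY_OK = 1, NOTIFY_BAD = 2 };
enum { RECONFIG_ADD = 1, RECONFIG_REMOVE = 2 };
#define MAX_CPUS 2

struct fake_node {
	const char *type;
};

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *data);
	struct notifier_block *next;
};

static struct notifier_block *reconfig_chain;
static int cpus_present;

static void reconfig_notifier_register(struct notifier_block *nb)
{
	assert(nb && nb->notifier_call);
	nb->next = reconfig_chain;
	reconfig_chain = nb;
}

/* Calls each subscriber in turn; stops early if one vetoes. */
static int reconfig_call_chain(unsigned long action, void *data)
{
	struct notifier_block *nb;
	int rc = NOTIFY_DONE;

	for (nb = reconfig_chain; nb; nb = nb->next) {
		rc = nb->notifier_call(nb, action, data);
		if (rc == NOTIFY_BAD)
			break;
	}
	return rc;
}

/* Hypothetical subscriber: tracks cpu nodes, ignores everything else. */
static int cpu_reconfig_notify(struct notifier_block *nb,
			       unsigned long action, void *data)
{
	struct fake_node *np = data;

	(void)nb;
	if (strcmp(np->type, "cpu"))
		return NOTIFY_DONE;
	if (action == RECONFIG_ADD) {
		if (cpus_present >= MAX_CPUS)
			return NOTIFY_BAD;	/* veto the addition */
		cpus_present++;
	} else if (action == RECONFIG_REMOVE) {
		cpus_present--;
	}
	return NOTIFY_OK;
}

/* Subscribers are consulted first; a veto aborts the add. */
static int add_node(struct fake_node *np)
{
	if (reconfig_call_chain(RECONFIG_ADD, np) == NOTIFY_BAD)
		return -1;
	/* the node would be linked into the tree here */
	return 0;
}

static void remove_node(struct fake_node *np)
{
	reconfig_call_chain(RECONFIG_REMOVE, np);
	/* the node would be unlinked from the tree here */
}
```

The design point is that the code updating the tree needs no knowledge of
its subscribers; each interested subsystem registers one callback and
filters on the node it is handed.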
Signed-off-by: Nathan Lynch arch/ppc64/kernel/Makefile | 2 arch/ppc64/kernel/pSeries_reconfig.c | 439 +++++++++++++++++++++++++++++++++++ arch/ppc64/kernel/proc_ppc64.c | 249 ------------------- arch/ppc64/kernel/prom.c | 156 +----------- include/asm-ppc64/pSeries_reconfig.h | 25 + include/asm-ppc64/prom.h | 4 6 files changed, 480 insertions(+), 395 deletions(-) Index: linux-2.6.11-bk5/arch/ppc64/kernel/Makefile =================================================================== --- linux-2.6.11-bk5.orig/arch/ppc64/kernel/Makefile 2005-03-09 20:01:32.000000000 +0000 +++ linux-2.6.11-bk5/arch/ppc64/kernel/Makefile 2005-03-09 20:16:31.000000000 +0000 @@ -31,7 +31,7 @@ obj-$(CONFIG_PPC_ISERIES) += iSeries_irq obj-$(CONFIG_PPC_MULTIPLATFORM) += nvram.o i8259.o prom_init.o prom.o mpic.o obj-$(CONFIG_PPC_PSERIES) += pSeries_pci.o pSeries_lpar.o pSeries_hvCall.o \ - pSeries_nvram.o rtasd.o ras.o \ + pSeries_nvram.o rtasd.o ras.o pSeries_reconfig.o \ xics.o rtas.o pSeries_setup.o pSeries_iommu.o obj-$(CONFIG_EEH) += eeh.o Index: linux-2.6.11-bk5/arch/ppc64/kernel/proc_ppc64.c =================================================================== --- linux-2.6.11-bk5.orig/arch/ppc64/kernel/proc_ppc64.c 2005-03-09 20:01:32.000000000 +0000 +++ linux-2.6.11-bk5/arch/ppc64/kernel/proc_ppc64.c 2005-03-09 20:16:31.000000000 +0000 @@ -41,20 +41,6 @@ static struct file_operations page_map_f .mmap = page_map_mmap }; -#ifdef CONFIG_PPC_PSERIES -/* routines for /proc/ppc64/ofdt */ -static ssize_t ofdt_write(struct file *, const char __user *, size_t, loff_t *); -static void proc_ppc64_create_ofdt(void); -static int do_remove_node(char *); -static int do_add_node(char *, size_t); -static void release_prop_list(const struct property *); -static struct property *new_property(const char *, const int, const unsigned char *, struct property *); -static char * parse_next_property(char *, char *, char **, int *, unsigned char**); -static struct file_operations ofdt_fops = { - .write = ofdt_write 
-}; -#endif - /* * Create the ppc64 and ppc64/rtas directories early. This allows us to * assume that they have been previously created in drivers. @@ -92,11 +78,6 @@ static int __init proc_ppc64_init(void) pde->size = PAGE_SIZE; pde->proc_fops = &page_map_fops; -#ifdef CONFIG_PPC_PSERIES - if ((systemcfg->platform & PLATFORM_PSERIES)) - proc_ppc64_create_ofdt(); -#endif - return 0; } __initcall(proc_ppc64_init); @@ -145,233 +126,3 @@ static int page_map_mmap( struct file *f return 0; } -#ifdef CONFIG_PPC_PSERIES -/* create /proc/ppc64/ofdt write-only by root */ -static void proc_ppc64_create_ofdt(void) -{ - struct proc_dir_entry *ent; - - ent = create_proc_entry("ppc64/ofdt", S_IWUSR, NULL); - if (ent) { - ent->nlink = 1; - ent->data = NULL; - ent->size = 0; - ent->proc_fops = &ofdt_fops; - } -} - -/** - * ofdt_write - perform operations on the Open Firmware device tree - * - * @file: not used - * @buf: command and arguments - * @count: size of the command buffer - * @off: not used - * - * Operations supported at this time are addition and removal of - * whole nodes along with their properties. Operations on individual - * properties are not implemented (yet). - */ -static ssize_t ofdt_write(struct file *file, const char __user *buf, size_t count, - loff_t *off) -{ - int rv = 0; - char *kbuf; - char *tmp; - - if (!(kbuf = kmalloc(count + 1, GFP_KERNEL))) { - rv = -ENOMEM; - goto out; - } - if (copy_from_user(kbuf, buf, count)) { - rv = -EFAULT; - goto out; - } - - kbuf[count] = '\0'; - - tmp = strchr(kbuf, ' '); - if (!tmp) { - rv = -EINVAL; - goto out; - } - *tmp = '\0'; - tmp++; - - if (!strcmp(kbuf, "add_node")) - rv = do_add_node(tmp, count - (tmp - kbuf)); - else if (!strcmp(kbuf, "remove_node")) - rv = do_remove_node(tmp); - else - rv = -EINVAL; -out: - kfree(kbuf); - return rv ? 
rv : count; -} - -static int do_remove_node(char *buf) -{ - struct device_node *node; - int rv = -ENODEV; - - if ((node = of_find_node_by_path(buf))) - rv = of_remove_node(node); - - of_node_put(node); - return rv; -} - -static int do_add_node(char *buf, size_t bufsize) -{ - char *path, *end, *name; - struct device_node *np; - struct property *prop = NULL; - unsigned char* value; - int length, rv = 0; - - end = buf + bufsize; - path = buf; - buf = strchr(buf, ' '); - if (!buf) - return -EINVAL; - *buf = '\0'; - buf++; - - if ((np = of_find_node_by_path(path))) { - of_node_put(np); - return -EINVAL; - } - - /* rv = build_prop_list(tmp, bufsize - (tmp - buf), &proplist); */ - while (buf < end && - (buf = parse_next_property(buf, end, &name, &length, &value))) { - struct property *last = prop; - - prop = new_property(name, length, value, last); - if (!prop) { - rv = -ENOMEM; - prop = last; - goto out; - } - } - if (!buf) { - rv = -EINVAL; - goto out; - } - - rv = of_add_node(path, prop); - -out: - if (rv) - release_prop_list(prop); - return rv; -} - -static struct property *new_property(const char *name, const int length, - const unsigned char *value, struct property *last) -{ - struct property *new = kmalloc(sizeof(*new), GFP_KERNEL); - - if (!new) - return NULL; - memset(new, 0, sizeof(*new)); - - if (!(new->name = kmalloc(strlen(name) + 1, GFP_KERNEL))) - goto cleanup; - if (!(new->value = kmalloc(length + 1, GFP_KERNEL))) - goto cleanup; - - strcpy(new->name, name); - memcpy(new->value, value, length); - *(((char *)new->value) + length) = 0; - new->length = length; - new->next = last; - return new; - -cleanup: - if (new->name) - kfree(new->name); - if (new->value) - kfree(new->value); - kfree(new); - return NULL; -} - -/** - * parse_next_property - process the next property from raw input buffer - * @buf: input buffer, must be nul-terminated - * @end: end of the input buffer + 1, for validation - * @name: return value; set to property name in buf - * @length: 
return value; set to length of value - * @value: return value; set to the property value in buf - * - * Note that the caller must make copies of the name and value returned, - * this function does no allocation or copying of the data. Return value - * is set to the next name in buf, or NULL on error. - */ -static char * parse_next_property(char *buf, char *end, char **name, int *length, - unsigned char **value) -{ - char *tmp; - - *name = buf; - - tmp = strchr(buf, ' '); - if (!tmp) { - printk(KERN_ERR "property parse failed in %s at line %d\n", - __FUNCTION__, __LINE__); - return NULL; - } - *tmp = '\0'; - - if (++tmp >= end) { - printk(KERN_ERR "property parse failed in %s at line %d\n", - __FUNCTION__, __LINE__); - return NULL; - } - - /* now we're on the length */ - *length = -1; - *length = simple_strtoul(tmp, &tmp, 10); - if (*length == -1) { - printk(KERN_ERR "property parse failed in %s at line %d\n", - __FUNCTION__, __LINE__); - return NULL; - } - if (*tmp != ' ' || ++tmp >= end) { - printk(KERN_ERR "property parse failed in %s at line %d\n", - __FUNCTION__, __LINE__); - return NULL; - } - - /* now we're on the value */ - *value = tmp; - tmp += *length; - if (tmp > end) { - printk(KERN_ERR "property parse failed in %s at line %d\n", - __FUNCTION__, __LINE__); - return NULL; - } - else if (tmp < end && *tmp != ' ' && *tmp != '\0') { - printk(KERN_ERR "property parse failed in %s at line %d\n", - __FUNCTION__, __LINE__); - return NULL; - } - tmp++; - - /* and now we should be on the next name, or the end */ - return tmp; -} - -static void release_prop_list(const struct property *prop) -{ - struct property *next; - for (; prop; prop = next) { - next = prop->next; - kfree(prop->name); - kfree(prop->value); - kfree(prop); - } - -} -#endif /* defined(CONFIG_PPC_PSERIES) */ Index: linux-2.6.11-bk5/arch/ppc64/kernel/pSeries_reconfig.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ 
linux-2.6.11-bk5/arch/ppc64/kernel/pSeries_reconfig.c 2005-03-09 20:16:31.000000000 +0000 @@ -0,0 +1,439 @@ +/* + * pSeries_reconfig.c - support for dynamic reconfiguration (including PCI + * Hotplug and Dynamic Logical Partitioning on RPA platforms). + * + * Copyright (C) 2005 Nathan Lynch + * Copyright (C) 2005 IBM Corporation + * + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License version + * 2 as published by the Free Software Foundation. + */ + +#include +#include +#include +#include + +#include +#include +#include + + + +/* + * Routines for "runtime" addition and removal of device tree nodes. + */ +#ifdef CONFIG_PROC_DEVICETREE +/* + * Add a node to /proc/device-tree. + */ +static void add_node_proc_entries(struct device_node *np) +{ + struct proc_dir_entry *ent; + + ent = proc_mkdir(strrchr(np->full_name, '/') + 1, np->parent->pde); + if (ent) + proc_device_tree_add_node(np, ent); +} + +static void remove_node_proc_entries(struct device_node *np) +{ + struct property *pp = np->properties; + struct device_node *parent = np->parent; + + while (pp) { + remove_proc_entry(pp->name, np->pde); + pp = pp->next; + } + + /* Assuming that symlinks have the same parent directory as + * np->pde. + */ + if (np->name_link) + remove_proc_entry(np->name_link->name, parent->pde); + if (np->addr_link) + remove_proc_entry(np->addr_link->name, parent->pde); + if (np->pde) + remove_proc_entry(np->pde->name, parent->pde); +} +#else /* !CONFIG_PROC_DEVICETREE */ +static void add_node_proc_entries(struct device_node *np) +{ + return; +} + +static void remove_node_proc_entries(struct device_node *np) +{ + return; +} +#endif /* CONFIG_PROC_DEVICETREE */ + +/** + * derive_parent - basically like dirname(1) + * @path: the full_name of a node to be added to the tree + * + * Returns the node which should be the parent of the node + * described by path. 
E.g., for path = "/foo/bar", returns + * the node with full_name = "/foo". + */ +static struct device_node *derive_parent(const char *path) +{ + struct device_node *parent = NULL; + char *parent_path = "/"; + size_t parent_path_len = strrchr(path, '/') - path + 1; + + /* reject if path is "/" */ + if (!strcmp(path, "/")) + return NULL; + + if (strrchr(path, '/') != path) { + parent_path = kmalloc(parent_path_len, GFP_KERNEL); + if (!parent_path) + return NULL; + strlcpy(parent_path, path, parent_path_len); + } + parent = of_find_node_by_path(parent_path); + if (strcmp(parent_path, "/")) + kfree(parent_path); + return parent; +} + +static struct notifier_block *pSeries_reconfig_chain; + +int pSeries_reconfig_notifier_register(struct notifier_block *nb) +{ + return notifier_chain_register(&pSeries_reconfig_chain, nb); +} + +void pSeries_reconfig_notifier_unregister(struct notifier_block *nb) +{ + notifier_chain_unregister(&pSeries_reconfig_chain, nb); +} + +static int pSeries_reconfig_add_node(const char *path, struct property *proplist) +{ + struct device_node *np; + int err = -ENOMEM; + + np = kcalloc(1, sizeof(*np), GFP_KERNEL); + if (!np) + goto out_err; + + np->full_name = kmalloc(strlen(path) + 1, GFP_KERNEL); + if (!np->full_name) + goto out_err; + + strcpy(np->full_name, path); + + np->properties = proplist; + OF_MARK_DYNAMIC(np); + kref_init(&np->kref); + of_node_get(np); + np->parent = derive_parent(path); + if (!np->parent) + goto out_err; + + err = notifier_call_chain(&pSeries_reconfig_chain, + PSERIES_RECONFIG_ADD, np); + if (err == NOTIFY_BAD) { + printk(KERN_ERR "Failed to add device node %s\n", path); + goto out_err; + } + + of_add_node(np); + + add_node_proc_entries(np); + + of_node_put(np->parent); + of_node_put(np); + + return 0; + +out_err: + kfree(np->full_name); + kfree(np); + return err; +} + +/* + * Prepare an OF node for removal from system + * XXX move this to pSeries_iommu.c + */ +static void of_cleanup_node(struct device_node *np) +{ + if 
(np->iommu_table && get_property(np, "ibm,dma-window", NULL)) + iommu_free_table(np); +} + +static int pSeries_reconfig_remove_node(struct device_node *np) +{ + struct device_node *parent, *child; + + parent = of_get_parent(np); + if (!parent) + return -EINVAL; + + if ((child = of_get_next_child(np, NULL))) { + of_node_put(child); + return -EBUSY; + } + + of_cleanup_node(np); + + remove_node_proc_entries(np); + + notifier_call_chain(&pSeries_reconfig_chain, + PSERIES_RECONFIG_REMOVE, np); + of_remove_node(np); + + of_node_put(parent); + of_node_put(np); /* Must decrement the refcount */ + return 0; +} + +/* + * /proc/ppc64/ofdt - yucky binary interface for adding and removing + * OF device nodes. Should be deprecated as soon as we get an + * in-kernel wrapper for the RTAS ibm,configure-connector call. + */ + +static void release_prop_list(const struct property *prop) +{ + struct property *next; + for (; prop; prop = next) { + next = prop->next; + kfree(prop->name); + kfree(prop->value); + kfree(prop); + } + +} + +/** + * parse_next_property - process the next property from raw input buffer + * @buf: input buffer, must be nul-terminated + * @end: end of the input buffer + 1, for validation + * @name: return value; set to property name in buf + * @length: return value; set to length of value + * @value: return value; set to the property value in buf + * + * Note that the caller must make copies of the name and value returned, + * this function does no allocation or copying of the data. Return value + * is set to the next name in buf, or NULL on error. 
+ */ +static char * parse_next_property(char *buf, char *end, char **name, int *length, + unsigned char **value) +{ + char *tmp; + + *name = buf; + + tmp = strchr(buf, ' '); + if (!tmp) { + printk(KERN_ERR "property parse failed in %s at line %d\n", + __FUNCTION__, __LINE__); + return NULL; + } + *tmp = '\0'; + + if (++tmp >= end) { + printk(KERN_ERR "property parse failed in %s at line %d\n", + __FUNCTION__, __LINE__); + return NULL; + } + + /* now we're on the length */ + *length = -1; + *length = simple_strtoul(tmp, &tmp, 10); + if (*length == -1) { + printk(KERN_ERR "property parse failed in %s at line %d\n", + __FUNCTION__, __LINE__); + return NULL; + } + if (*tmp != ' ' || ++tmp >= end) { + printk(KERN_ERR "property parse failed in %s at line %d\n", + __FUNCTION__, __LINE__); + return NULL; + } + + /* now we're on the value */ + *value = tmp; + tmp += *length; + if (tmp > end) { + printk(KERN_ERR "property parse failed in %s at line %d\n", + __FUNCTION__, __LINE__); + return NULL; + } + else if (tmp < end && *tmp != ' ' && *tmp != '\0') { + printk(KERN_ERR "property parse failed in %s at line %d\n", + __FUNCTION__, __LINE__); + return NULL; + } + tmp++; + + /* and now we should be on the next name, or the end */ + return tmp; +} + +static struct property *new_property(const char *name, const int length, + const unsigned char *value, struct property *last) +{ + struct property *new = kmalloc(sizeof(*new), GFP_KERNEL); + + if (!new) + return NULL; + memset(new, 0, sizeof(*new)); + + if (!(new->name = kmalloc(strlen(name) + 1, GFP_KERNEL))) + goto cleanup; + if (!(new->value = kmalloc(length + 1, GFP_KERNEL))) + goto cleanup; + + strcpy(new->name, name); + memcpy(new->value, value, length); + *(((char *)new->value) + length) = 0; + new->length = length; + new->next = last; + return new; + +cleanup: + if (new->name) + kfree(new->name); + if (new->value) + kfree(new->value); + kfree(new); + return NULL; +} + +static int do_add_node(char *buf, size_t bufsize) +{ + 
char *path, *end, *name; + struct device_node *np; + struct property *prop = NULL; + unsigned char* value; + int length, rv = 0; + + end = buf + bufsize; + path = buf; + buf = strchr(buf, ' '); + if (!buf) + return -EINVAL; + *buf = '\0'; + buf++; + + if ((np = of_find_node_by_path(path))) { + of_node_put(np); + return -EINVAL; + } + + /* rv = build_prop_list(tmp, bufsize - (tmp - buf), &proplist); */ + while (buf < end && + (buf = parse_next_property(buf, end, &name, &length, &value))) { + struct property *last = prop; + + prop = new_property(name, length, value, last); + if (!prop) { + rv = -ENOMEM; + prop = last; + goto out; + } + } + if (!buf) { + rv = -EINVAL; + goto out; + } + + rv = pSeries_reconfig_add_node(path, prop); + +out: + if (rv) + release_prop_list(prop); + return rv; +} + +static int do_remove_node(char *buf) +{ + struct device_node *node; + int rv = -ENODEV; + + if ((node = of_find_node_by_path(buf))) + rv = pSeries_reconfig_remove_node(node); + + of_node_put(node); + return rv; +} + +/** + * ofdt_write - perform operations on the Open Firmware device tree + * + * @file: not used + * @buf: command and arguments + * @count: size of the command buffer + * @off: not used + * + * Operations supported at this time are addition and removal of + * whole nodes along with their properties. Operations on individual + * properties are not implemented (yet). 
+ */ +static ssize_t ofdt_write(struct file *file, const char __user *buf, size_t count, + loff_t *off) +{ + int rv = 0; + char *kbuf; + char *tmp; + + if (!(kbuf = kmalloc(count + 1, GFP_KERNEL))) { + rv = -ENOMEM; + goto out; + } + if (copy_from_user(kbuf, buf, count)) { + rv = -EFAULT; + goto out; + } + + kbuf[count] = '\0'; + + tmp = strchr(kbuf, ' '); + if (!tmp) { + rv = -EINVAL; + goto out; + } + *tmp = '\0'; + tmp++; + + if (!strcmp(kbuf, "add_node")) + rv = do_add_node(tmp, count - (tmp - kbuf)); + else if (!strcmp(kbuf, "remove_node")) + rv = do_remove_node(tmp); + else + rv = -EINVAL; +out: + kfree(kbuf); + return rv ? rv : count; +} + +static struct file_operations ofdt_fops = { + .write = ofdt_write +}; + +/* create /proc/ppc64/ofdt write-only by root */ +static int proc_ppc64_create_ofdt(void) +{ + struct proc_dir_entry *ent; + + if (!(systemcfg->platform & PLATFORM_PSERIES)) + return 0; + + ent = create_proc_entry("ppc64/ofdt", S_IWUSR, NULL); + if (ent) { + ent->nlink = 1; + ent->data = NULL; + ent->size = 0; + ent->proc_fops = &ofdt_fops; + } + + return 0; +} +__initcall(proc_ppc64_create_ofdt); Index: linux-2.6.11-bk5/arch/ppc64/kernel/prom.c =================================================================== --- linux-2.6.11-bk5.orig/arch/ppc64/kernel/prom.c 2005-03-09 20:08:28.000000000 +0000 +++ linux-2.6.11-bk5/arch/ppc64/kernel/prom.c 2005-03-09 20:16:31.000000000 +0000 @@ -27,7 +27,6 @@ #include #include #include -#include #include #include #include @@ -1567,84 +1566,6 @@ void of_node_put(struct device_node *nod } EXPORT_SYMBOL(of_node_put); -/** - * derive_parent - basically like dirname(1) - * @path: the full_name of a node to be added to the tree - * - * Returns the node which should be the parent of the node - * described by path. E.g., for path = "/foo/bar", returns - * the node with full_name = "/foo". 
- */ -static struct device_node *derive_parent(const char *path) -{ - struct device_node *parent = NULL; - char *parent_path = "/"; - size_t parent_path_len = strrchr(path, '/') - path + 1; - - /* reject if path is "/" */ - if (!strcmp(path, "/")) - return NULL; - - if (strrchr(path, '/') != path) { - parent_path = kmalloc(parent_path_len, GFP_KERNEL); - if (!parent_path) - return NULL; - strlcpy(parent_path, path, parent_path_len); - } - parent = of_find_node_by_path(parent_path); - if (strcmp(parent_path, "/")) - kfree(parent_path); - return parent; -} - -/* - * Routines for "runtime" addition and removal of device tree nodes. - */ -#ifdef CONFIG_PROC_DEVICETREE -/* - * Add a node to /proc/device-tree. - */ -static void add_node_proc_entries(struct device_node *np) -{ - struct proc_dir_entry *ent; - - ent = proc_mkdir(strrchr(np->full_name, '/') + 1, np->parent->pde); - if (ent) - proc_device_tree_add_node(np, ent); -} - -static void remove_node_proc_entries(struct device_node *np) -{ - struct property *pp = np->properties; - struct device_node *parent = np->parent; - - while (pp) { - remove_proc_entry(pp->name, np->pde); - pp = pp->next; - } - - /* Assuming that symlinks have the same parent directory as - * np->pde. 
- */ - if (np->name_link) - remove_proc_entry(np->name_link->name, parent->pde); - if (np->addr_link) - remove_proc_entry(np->addr_link->name, parent->pde); - if (np->pde) - remove_proc_entry(np->pde->name, parent->pde); -} -#else /* !CONFIG_PROC_DEVICETREE */ -static void add_node_proc_entries(struct device_node *np) -{ - return; -} - -static void remove_node_proc_entries(struct device_node *np) -{ - return; -} -#endif /* CONFIG_PROC_DEVICETREE */ - /* * Fix up the uninitialized fields in a new device node: * name, type, n_addrs, addrs, n_intrs, intrs, and pci-specific fields @@ -1702,43 +1623,18 @@ out: } /* - * Given a path and a property list, construct an OF device node, add - * it to the device tree and global list, and place it in - * /proc/device-tree. This function may sleep. + * Plug a device node into the tree and global list. */ -int of_add_node(const char *path, struct property *proplist) +void of_add_node(struct device_node *np) { - struct device_node *np; - int err = 0; - - np = kmalloc(sizeof(struct device_node), GFP_KERNEL); - if (!np) - return -ENOMEM; - - memset(np, 0, sizeof(*np)); - - np->full_name = kmalloc(strlen(path) + 1, GFP_KERNEL); - if (!np->full_name) { - kfree(np); - return -ENOMEM; - } - strcpy(np->full_name, path); - - np->properties = proplist; - OF_MARK_DYNAMIC(np); - kref_init(&np->kref); - of_node_get(np); - np->parent = derive_parent(path); - if (!np->parent) { - kfree(np); - return -EINVAL; /* could also be ENOMEM, though */ - } + int err; + /* This use of finish_node will be moved to a notifier so + * the error code can be used. 
+ */ err = finish_node(np, NULL, of_finish_dynamic_node, 0, 0, 0); - if (err < 0) { - kfree(np); - return err; - } + if (err < 0) + return; write_lock(&devtree_lock); np->sibling = np->parent->child; @@ -1746,21 +1642,6 @@ int of_add_node(const char *path, struct np->parent->child = np; allnodes = np; write_unlock(&devtree_lock); - - add_node_proc_entries(np); - - of_node_put(np->parent); - of_node_put(np); - return 0; -} - -/* - * Prepare an OF node for removal from system - */ -static void of_cleanup_node(struct device_node *np) -{ - if (np->iommu_table && get_property(np, "ibm,dma-window", NULL)) - iommu_free_table(np); } /* @@ -1768,23 +1649,14 @@ static void of_cleanup_node(struct devic * a reference to the node. The memory associated with the node * is not freed until its refcount goes to zero. */ -int of_remove_node(struct device_node *np) +void of_remove_node(const struct device_node *np) { - struct device_node *parent, *child; - - parent = of_get_parent(np); - if (!parent) - return -EINVAL; + struct device_node *parent; - if ((child = of_get_next_child(np, NULL))) { - of_node_put(child); - return -EBUSY; - } + write_lock(&devtree_lock); - of_cleanup_node(np); + parent = np->parent; - write_lock(&devtree_lock); - remove_node_proc_entries(np); if (allnodes == np) allnodes = np->allnext; else { @@ -1806,10 +1678,8 @@ int of_remove_node(struct device_node *n ; prevsib->sibling = np->sibling; } + write_unlock(&devtree_lock); - of_node_put(parent); - of_node_put(np); /* Must decrement the refcount */ - return 0; } /* Index: linux-2.6.11-bk5/include/asm-ppc64/prom.h =================================================================== --- linux-2.6.11-bk5.orig/include/asm-ppc64/prom.h 2005-03-09 20:01:34.000000000 +0000 +++ linux-2.6.11-bk5/include/asm-ppc64/prom.h 2005-03-09 20:16:31.000000000 +0000 @@ -209,8 +209,8 @@ extern struct device_node *of_node_get(s extern void of_node_put(struct device_node *node); /* For updating the device tree at runtime */ -extern 
int of_add_node(const char *path, struct property *proplist); -extern int of_remove_node(struct device_node *np); +extern void of_add_node(struct device_node *); +extern void of_remove_node(const struct device_node *); /* Other Prototypes */ extern unsigned long prom_init(unsigned long, unsigned long, unsigned long, Index: linux-2.6.11-bk5/include/asm-ppc64/pSeries_reconfig.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.11-bk5/include/asm-ppc64/pSeries_reconfig.h 2005-03-09 20:16:31.000000000 +0000 @@ -0,0 +1,25 @@ +#ifndef _PPC64_PSERIES_RECONFIG_H +#define _PPC64_PSERIES_RECONFIG_H + +#include + +/* + * Use this API if your code needs to know about OF device nodes being + * added or removed on pSeries systems. + */ + +#define PSERIES_RECONFIG_ADD 0x0001 +#define PSERIES_RECONFIG_REMOVE 0x0002 + +#ifdef CONFIG_PPC_PSERIES +extern int pSeries_reconfig_notifier_register(struct notifier_block *); +extern void pSeries_reconfig_notifier_unregister(struct notifier_block *); +#else /* !CONFIG_PPC_PSERIES */ +static inline int pSeries_reconfig_notifier_register(struct notifier_block *nb) +{ + return 0; +} +static inline void pSeries_reconfig_notifier_unregister(struct notifier_block *nb) { } +#endif /* CONFIG_PPC_PSERIES */ + +#endif /* _PPC64_PSERIES_RECONFIG_H */ From ntl at pobox.com Thu Mar 10 11:51:52 2005 From: ntl at pobox.com (Nathan Lynch) Date: Wed, 9 Mar 2005 18:51:52 -0600 (CST) Subject: [PATCH 4/8] prom.c: use pSeries reconfig notifier In-Reply-To: <20050310005132.31309.65485.31668@otto> References: <20050310005132.31309.65485.31668@otto> Message-ID: <20050310005152.31309.41959.21947@otto> Use the pSeries_reconfig notifier list to fix up a device node which is about to be added. 
Signed-off-by: Nathan Lynch prom.c | 40 +++++++++++++++++++++++++++++++--------- 1 files changed, 31 insertions(+), 9 deletions(-) Index: linux-2.6.11-bk4/arch/ppc64/kernel/prom.c =================================================================== --- linux-2.6.11-bk4.orig/arch/ppc64/kernel/prom.c 2005-03-09 04:22:07.000000000 +0000 +++ linux-2.6.11-bk4/arch/ppc64/kernel/prom.c 2005-03-09 06:12:30.000000000 +0000 @@ -52,6 +52,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -1627,15 +1628,6 @@ out: */ void of_add_node(struct device_node *np) { - int err; - - /* This use of finish_node will be moved to a notifier so - * the error code can be used. - */ - err = finish_node(np, NULL, of_finish_dynamic_node, 0, 0, 0); - if (err < 0) - return; - write_lock(&devtree_lock); np->sibling = np->parent->child; np->allnext = allnodes; @@ -1682,6 +1674,36 @@ void of_remove_node(const struct device_ write_unlock(&devtree_lock); } +static int prom_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *node) +{ + int err; + + switch (action) { + case PSERIES_RECONFIG_ADD: + err = finish_node(node, NULL, of_finish_dynamic_node, 0, 0, 0); + if (err < 0) { + printk(KERN_ERR "finish_node returned %d\n", err); + err = NOTIFY_BAD; + } + break; + default: + err = NOTIFY_DONE; + break; + } + return err; +} + +static struct notifier_block prom_reconfig_nb = { + .notifier_call = prom_reconfig_notifier, + .priority = 10, /* This one needs to run first */ +}; + +static int __init prom_reconfig_setup(void) +{ + return pSeries_reconfig_notifier_register(&prom_reconfig_nb); +} +__initcall(prom_reconfig_setup); + /* * Find a property with a given name for a given node * and return the value.
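Editorial note: the patch above relies on prom_reconfig_nb running before the other subscribers (hence .priority = 10) and on a failed fixup returning NOTIFY_BAD. A minimal userspace sketch of that ordering follows. It is not the kernel's implementation: struct notifier, chain_register, chain_call, fake_node and both callbacks are hypothetical stand-ins, and the sketch assumes that the chain is kept sorted by descending priority and that a return value carrying NOTIFY_STOP_MASK ends the walk.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's notifier return codes. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)

#define RECONFIG_ADD     0x0001  /* mirrors PSERIES_RECONFIG_ADD */

struct notifier {
	int (*call)(struct notifier *nb, unsigned long action, void *data);
	struct notifier *next;
	int priority;
};

/* Hypothetical node: 'broken' makes the fixup step fail. */
struct fake_node {
	const char *path;
	int broken;
};

/* Records which callbacks ran, in order ('F' = fixup, 'P' = pci). */
static char trace[8];
static int trace_len;

static void trace_add(char c)
{
	if (trace_len < (int)sizeof(trace))
		trace[trace_len++] = c;
}

/* Keep the list sorted so that higher priorities are called first. */
static void chain_register(struct notifier **head, struct notifier *nb)
{
	while (*head && (*head)->priority >= nb->priority)
		head = &(*head)->next;
	nb->next = *head;
	*head = nb;
}

/* Walk the chain; stop as soon as a callback sets NOTIFY_STOP_MASK. */
static int chain_call(struct notifier *head, unsigned long action, void *data)
{
	int ret = NOTIFY_DONE;

	for (; head; head = head->next) {
		ret = head->call(head, action, data);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Plays the role of prom_reconfig_notifier: fix up the node or veto. */
static int fixup_cb(struct notifier *nb, unsigned long action, void *data)
{
	struct fake_node *np = data;

	(void)nb;
	if (action != RECONFIG_ADD)
		return NOTIFY_DONE;
	trace_add('F');
	return np->broken ? NOTIFY_BAD : NOTIFY_OK;
}

/* Plays the role of a later subscriber that needs a fixed-up node. */
static int pci_cb(struct notifier *nb, unsigned long action, void *data)
{
	(void)nb;
	(void)data;
	if (action != RECONFIG_ADD)
		return NOTIFY_DONE;
	trace_add('P');
	return NOTIFY_OK;
}

/* Register both subscribers and announce one node addition. */
int add_node_demo(struct fake_node *np)
{
	struct notifier pci = { pci_cb, NULL, 0 };
	struct notifier fix = { fixup_cb, NULL, 10 };
	struct notifier *chain = NULL;

	chain_register(&chain, &pci);  /* registered first ... */
	chain_register(&chain, &fix);  /* ... but called first (priority 10) */
	trace_len = 0;
	return chain_call(chain, RECONFIG_ADD, np);
}
```

In this model the fixup callback always sees the node before the lower-priority subscriber, and a failing fixup keeps the later callback from running at all.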
From ntl at pobox.com Thu Mar 10 11:51:57 2005 From: ntl at pobox.com (Nathan Lynch) Date: Wed, 9 Mar 2005 18:51:57 -0600 (CST) Subject: [PATCH 5/8] pci_dn.c: use pSeries reconfig notifier In-Reply-To: <20050310005132.31309.65485.31668@otto> References: <20050310005132.31309.65485.31668@otto> Message-ID: <20050310005157.31309.82819.78506@otto> Use the pSeries_reconfig notifier list to handle newly added pci device nodes. Remove duplicated version of update_dn_pci_info from prom.c. Signed-off-by: Nathan Lynch pci_dn.c | 22 ++++++++++++++++++++++ prom.c | 14 -------------- 2 files changed, 22 insertions(+), 14 deletions(-) Index: linux-2.6.11-bk5/arch/ppc64/kernel/pci_dn.c =================================================================== --- linux-2.6.11-bk5.orig/arch/ppc64/kernel/pci_dn.c 2005-03-09 20:01:32.000000000 +0000 +++ linux-2.6.11-bk5/arch/ppc64/kernel/pci_dn.c 2005-03-09 20:16:54.000000000 +0000 @@ -27,6 +27,7 @@ #include #include #include +#include #include "pci.h" @@ -161,6 +162,25 @@ struct device_node *fetch_dev_dn(struct } EXPORT_SYMBOL(fetch_dev_dn); +static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *node) +{ + struct device_node *np = node; + int err = NOTIFY_OK; + + switch (action) { + case PSERIES_RECONFIG_ADD: + update_dn_pci_info(np, np->parent->phb); + break; + default: + err = NOTIFY_DONE; + break; + } + return err; +} + +static struct notifier_block pci_dn_reconfig_nb = { + .notifier_call = pci_dn_reconfig_notifier, +}; /* * Actually initialize the phbs. @@ -173,4 +193,6 @@ void __init pci_devs_phb_init(void) /* This must be done first so the device nodes have valid pci info! 
*/ list_for_each_entry_safe(phb, tmp, &hose_list, list_node) pci_devs_phb_init_dynamic(phb); + + pSeries_reconfig_notifier_register(&pci_dn_reconfig_nb); } Index: linux-2.6.11-bk5/arch/ppc64/kernel/prom.c =================================================================== --- linux-2.6.11-bk5.orig/arch/ppc64/kernel/prom.c 2005-03-09 20:16:43.000000000 +0000 +++ linux-2.6.11-bk5/arch/ppc64/kernel/prom.c 2005-03-09 20:16:54.000000000 +0000 @@ -1583,7 +1583,6 @@ static int of_finish_dynamic_node(struct int unused3, int unused4) { struct device_node *parent = of_get_parent(node); - u32 *regs; int err = 0; phandle *ibm_phandle; @@ -1605,19 +1604,6 @@ static int of_finish_dynamic_node(struct if ((ibm_phandle = (unsigned int *)get_property(node, "ibm,phandle", NULL))) node->linux_phandle = *ibm_phandle; - /* now do the rough equivalent of update_dn_pci_info, this - * probably is not correct for phb's, but should work for - * IOAs and slots. - */ - - node->phb = parent->phb; - - regs = (u32 *)get_property(node, "reg", NULL); - if (regs) { - node->busno = (regs[0] >> 16) & 0xff; - node->devfn = (regs[0] >> 8) & 0xff; - } - out: of_node_put(parent); return err; From ntl at pobox.com Thu Mar 10 11:52:02 2005 From: ntl at pobox.com (Nathan Lynch) Date: Wed, 9 Mar 2005 18:52:02 -0600 (CST) Subject: [PATCH 6/8] pSeries_iommu.c: use pSeries reconfig notifier In-Reply-To: <20050310005132.31309.65485.31668@otto> References: <20050310005132.31309.65485.31668@otto> Message-ID: <20050310005202.31309.17710.70791@otto> Use the pSeries_reconfig notifier chain for tearing down the iommu table when a device node is removed. 
Signed-off-by: Nathan Lynch pSeries_iommu.c | 25 +++++++++++++++++++++++++ pSeries_reconfig.c | 12 ------------ 2 files changed, 25 insertions(+), 12 deletions(-) Index: linux-2.6.11-bk5/arch/ppc64/kernel/pSeries_iommu.c =================================================================== --- linux-2.6.11-bk5.orig/arch/ppc64/kernel/pSeries_iommu.c 2005-03-09 20:01:04.000000000 +0000 +++ linux-2.6.11-bk5/arch/ppc64/kernel/pSeries_iommu.c 2005-03-09 20:17:09.000000000 +0000 @@ -43,6 +43,7 @@ #include #include #include +#include #include #include "pci.h" @@ -455,6 +456,28 @@ static void iommu_dev_setup_pSeries(stru } } +static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *node) +{ + int err = NOTIFY_OK; + struct device_node *np = node; + + switch (action) { + case PSERIES_RECONFIG_REMOVE: + if (np->iommu_table && + get_property(np, "ibm,dma-window", NULL)) + iommu_free_table(np); + break; + default: + err = NOTIFY_DONE; + break; + } + return err; +} + +static struct notifier_block iommu_reconfig_nb = { + .notifier_call = iommu_reconfig_notifier, +}; + static void iommu_bus_setup_null(struct pci_bus *b) { } static void iommu_dev_setup_null(struct pci_dev *d) { } @@ -487,6 +510,8 @@ void iommu_init_early_pSeries(void) ppc_md.iommu_dev_setup = iommu_dev_setup_pSeries; + pSeries_reconfig_notifier_register(&iommu_reconfig_nb); + pci_iommu_init(); } Index: linux-2.6.11-bk5/arch/ppc64/kernel/pSeries_reconfig.c =================================================================== --- linux-2.6.11-bk5.orig/arch/ppc64/kernel/pSeries_reconfig.c 2005-03-09 20:16:31.000000000 +0000 +++ linux-2.6.11-bk5/arch/ppc64/kernel/pSeries_reconfig.c 2005-03-09 20:17:09.000000000 +0000 @@ -157,16 +157,6 @@ out_err: return err; } -/* - * Prepare an OF node for removal from system - * XXX move this to pSeries_iommu.c - */ -static void of_cleanup_node(struct device_node *np) -{ - if (np->iommu_table && get_property(np, "ibm,dma-window", NULL)) - 
iommu_free_table(np); -} - static int pSeries_reconfig_remove_node(struct device_node *np) { struct device_node *parent, *child; @@ -180,8 +170,6 @@ static int pSeries_reconfig_remove_node( return -EBUSY; } - of_cleanup_node(np); - remove_node_proc_entries(np); notifier_call_chain(&pSeries_reconfig_chain, From ntl at pobox.com Thu Mar 10 11:52:07 2005 From: ntl at pobox.com (Nathan Lynch) Date: Wed, 9 Mar 2005 18:52:07 -0600 (CST) Subject: [PATCH 7/8] pSeries_smp.c: use pSeries reconfig notifier for cpu DLPAR In-Reply-To: <20050310005132.31309.65485.31668@otto> References: <20050310005132.31309.65485.31668@otto> Message-ID: <20050310005207.31309.32546.73375@otto> Use the pSeries_reconfig notifier API to handle processor addition and removal on pSeries LPAR. This is the "right" way to do it, as opposed to setting cpu_present_map = cpu_possible_map at boot (this is fixed in the next patch). Signed-off-by: Nathan Lynch pSeries_smp.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 126 insertions(+) Index: linux-2.6.11-bk5/arch/ppc64/kernel/pSeries_smp.c =================================================================== --- linux-2.6.11-bk5.orig/arch/ppc64/kernel/pSeries_smp.c 2005-03-09 20:01:32.000000000 +0000 +++ linux-2.6.11-bk5/arch/ppc64/kernel/pSeries_smp.c 2005-03-09 20:31:06.000000000 +0000 @@ -44,6 +44,7 @@ #include #include #include +#include #include "mpic.h" @@ -213,6 +214,127 @@ static inline int __devinit smp_startup_ } return 1; } + +/* + * Update cpu_present_map and paca(s) for a new cpu node. The wrinkle + * here is that a cpu device node may represent up to two logical cpus + * in the SMT case. We must honor the assumption in other code that + * the logical ids for sibling SMT threads x and y are adjacent, such + * that x^1 == y and y^1 == x. 
+ */ +static int pSeries_add_processor(struct device_node *np) +{ + unsigned int cpu; + cpumask_t candidate_map, tmp = CPU_MASK_NONE; + int err = -ENOSPC, len, nthreads, i; + u32 *intserv; + + intserv = (u32 *)get_property(np, "ibm,ppc-interrupt-server#s", &len); + if (!intserv) + return 0; + + nthreads = len / sizeof(u32); + for (i = 0; i < nthreads; i++) + cpu_set(i, tmp); + + lock_cpu_hotplug(); + + BUG_ON(!cpus_subset(cpu_present_map, cpu_possible_map)); + + /* Get a bitmap of unoccupied slots. */ + cpus_xor(candidate_map, cpu_possible_map, cpu_present_map); + if (cpus_empty(candidate_map)) { + /* If we get here, it most likely means that NR_CPUS is + * less than the partition's max processors setting. + */ + printk(KERN_ERR "Cannot add cpu %s; this system configuration" + " supports %d logical cpus.\n", np->full_name, + cpus_weight(cpu_possible_map)); + goto out_unlock; + } + + while (!cpus_empty(tmp)) + if (cpus_subset(tmp, candidate_map)) + /* Found a range where we can insert the new cpu(s) */ + break; + else + cpus_shift_left(tmp, tmp, nthreads); + + if (cpus_empty(tmp)) { + printk(KERN_ERR "Unable to find space in cpu_present_map for" + " processor %s with %d thread(s)\n", np->name, + nthreads); + goto out_unlock; + } + + for_each_cpu_mask(cpu, tmp) { + BUG_ON(cpu_isset(cpu, cpu_present_map)); + cpu_set(cpu, cpu_present_map); + set_hard_smp_processor_id(cpu, *intserv++); + } + err = 0; +out_unlock: + unlock_cpu_hotplug(); + return err; +} + +/* + * Update the present map for a cpu node which is going away, and set + * the hard id in the paca(s) to -1 to be consistent with boot time + * convention for non-present cpus. 
+ */ +static void pSeries_remove_processor(struct device_node *np) +{ + unsigned int cpu; + int len, nthreads, i; + u32 *intserv; + + intserv = (u32 *)get_property(np, "ibm,ppc-interrupt-server#s", &len); + if (!intserv) + return; + + nthreads = len / sizeof(u32); + + lock_cpu_hotplug(); + for (i = 0; i < nthreads; i++) { + for_each_present_cpu(cpu) { + if (get_hard_smp_processor_id(cpu) != intserv[i]) + continue; + BUG_ON(cpu_online(cpu)); + cpu_clear(cpu, cpu_present_map); + set_hard_smp_processor_id(cpu, -1); + break; + } + if (cpu == NR_CPUS) + printk(KERN_WARNING "Could not find cpu to remove " + "with physical id 0x%x\n", intserv[i]); + } + unlock_cpu_hotplug(); +} + +static int pSeries_smp_notifier(struct notifier_block *nb, unsigned long action, void *node) +{ + int err = NOTIFY_OK; + + switch (action) { + case PSERIES_RECONFIG_ADD: + if (pSeries_add_processor(node)) + err = NOTIFY_BAD; + break; + case PSERIES_RECONFIG_REMOVE: + pSeries_remove_processor(node); + break; + default: + err = NOTIFY_DONE; + break; + } + return err; +} + +static struct notifier_block pSeries_smp_nb = { + .notifier_call = pSeries_smp_notifier, +}; + #else /* ... 
CONFIG_HOTPLUG_CPU */ static inline int __devinit smp_startup_cpu(unsigned int lcpu) { @@ -336,6 +458,10 @@ void __init smp_init_pSeries(void) #ifdef CONFIG_HOTPLUG_CPU smp_ops->cpu_disable = pSeries_cpu_disable; smp_ops->cpu_die = pSeries_cpu_die; + + /* Processors can be added/removed only on LPAR */ + if (systemcfg->platform == PLATFORM_PSERIES_LPAR) + pSeries_reconfig_notifier_register(&pSeries_smp_nb); #endif /* Start secondary threads on SMT systems; primary threads From ntl at pobox.com Thu Mar 10 11:52:13 2005 From: ntl at pobox.com (Nathan Lynch) Date: Wed, 9 Mar 2005 18:52:13 -0600 (CST) Subject: [PATCH 8/8] make cpu hotplug play well with maxcpus and smt-enabled In-Reply-To: <20050310005132.31309.65485.31668@otto> References: <20050310005132.31309.65485.31668@otto> Message-ID: <20050310005212.31309.16616.43059@otto> This patch allows you to boot a pSeries system with maxcpus=x or smt-enabled=off (or both) and bring up the offline cpus later from userspace, assuming the kernel was built with CONFIG_HOTPLUG_CPU=y. - Record cpus which were started from OF in a cpu map and use that instead of system_state to decide how to start a cpu in smp_startup_cpu. - Change the smp bootup logic slightly so that the path for bringing up secondary threads is exactly the same as hotplugging a cpu later from userspace. - Add a new function to smp_ops - cpu_bootable. This is implemented only by pSeries to filter out secondary threads during boot with smt-enabled=off. Another way this could be done is to change the kick_cpu member to return int and we can check for this case in smp_pSeries_kick_cpu. - Remove the games we play with cpu_present_map and the hard_smp_processor_id to handle smt-enabled=off, since they're now unnecessary. - Remove find_physical_cpu_to_start; assigning threads to logical slots should be done at bootup and at DLPAR time, not during a cpu online operation. 
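Editorial note: the first and third items in the list above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: cpu maps are reduced to a plain unsigned long bitmask, struct boot_state and every function name here are hypothetical, and the model takes from the patch only the convention that sibling SMT threads have adjacent logical ids (x ^ 1) with odd ids being secondary threads.

```c
#include <stdbool.h>

/* Hypothetical summary of the state the real code reads from globals. */
struct boot_state {
	bool booting;              /* system_state < SYSTEM_RUNNING */
	bool smt_capable;          /* CPU_FTR_SMT present */
	bool smt_enabled_at_boot;  /* smt-enabled= boot parameter */
};

/* Under the adjacency assumption, a thread's sibling is id ^ 1. */
static unsigned int smt_sibling(unsigned int cpu)
{
	return cpu ^ 1;
}

/* Odd-numbered logical cpus are assumed to be secondary threads. */
static bool is_secondary_thread(unsigned int cpu)
{
	return cpu % 2 != 0;
}

/*
 * Model of the cpu_bootable hook: refuse secondary threads only while
 * booting on an SMT-capable system with SMT disabled by the user.
 */
static bool cpu_bootable_sketch(const struct boot_state *s, unsigned int cpu)
{
	if (s->booting && s->smt_capable && !s->smt_enabled_at_boot &&
	    is_secondary_thread(cpu))
		return false;
	return true;
}

/*
 * Model of the "started from OF" map: on SMT systems only primary
 * (even-numbered) present threads are spinning in hold loops; on
 * non-SMT systems every present cpu is. The boot cpu is excluded
 * because it is already running.
 */
static unsigned long of_spin_mask_sketch(unsigned long present,
					 bool smt_capable,
					 unsigned int boot_cpu)
{
	unsigned long mask = 0;
	unsigned int cpu;

	if (smt_capable) {
		for (cpu = 0; cpu < sizeof(mask) * 8; cpu += 2)
			if (present & (1UL << cpu))
				mask |= 1UL << cpu;
	} else {
		mask = present;
	}
	mask &= ~(1UL << boot_cpu);
	return mask;
}
```

With four present logical cpus on an SMT system and cpu 0 as the boot cpu, only cpu 2 ends up in the mask; cpus 1 and 3 would have to be started through RTAS, either during boot or later from userspace.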
One caveat: you need up-to-date firmware on Power5 for the maxcpus option to work on systems with more than one processor. Otherwise interrupts get misrouted, typically resulting in hangs or "unable to find root filesystem" problems. Tested on Power5 with and without CONFIG_HOTPLUG_CPU and with various combinations of the maxcpus= and smt-enabled= parameters. arch/ppc64/kernel/pSeries_smp.c | 183 +++++++++++++++------------------------- arch/ppc64/kernel/setup.c | 12 -- arch/ppc64/kernel/smp.c | 13 -- include/asm-ppc64/machdep.h | 1 4 files changed, 78 insertions(+), 131 deletions(-) Signed-off-by: Nathan Lynch Index: linux-2.6.11-bk5/arch/ppc64/kernel/pSeries_smp.c =================================================================== --- linux-2.6.11-bk5.orig/arch/ppc64/kernel/pSeries_smp.c 2005-03-09 20:31:06.000000000 +0000 +++ linux-2.6.11-bk5/arch/ppc64/kernel/pSeries_smp.c 2005-03-09 20:32:55.000000000 +0000 @@ -54,8 +54,16 @@ #define DBG(fmt...) #endif +/* + * The primary thread of each non-boot processor is recorded here before + * smp init. + */ +static cpumask_t of_spin_map; + extern void pSeries_secondary_smp_init(unsigned long); +#ifdef CONFIG_HOTPLUG_CPU + /* Get state of physical CPU. * Return codes: * 0 - The processor is in the RTAS stopped state @@ -82,9 +90,6 @@ static int query_cpu_stopped(unsigned in return cpu_status; } - -#ifdef CONFIG_HOTPLUG_CPU - int pSeries_cpu_disable(void) { systemcfg->processorCount--; @@ -123,98 +128,6 @@ void pSeries_cpu_die(unsigned int cpu) paca[cpu].cpu_start = 0; } -/* Search all cpu device nodes for an offline logical cpu. If a - * device node has a "ibm,my-drc-index" property (meaning this is an - * LPAR), paranoid-check whether we own the cpu. For each "thread" - * of a cpu, if it is offline and has the same hw index as before, - * grab that in preference. 
- */ -static unsigned int find_physical_cpu_to_start(unsigned int old_hwindex) -{ - struct device_node *np = NULL; - unsigned int best = -1U; - - while ((np = of_find_node_by_type(np, "cpu"))) { - int nr_threads, len; - u32 *index = (u32 *)get_property(np, "ibm,my-drc-index", NULL); - u32 *tid = (u32 *) - get_property(np, "ibm,ppc-interrupt-server#s", &len); - - if (!tid) - tid = (u32 *)get_property(np, "reg", &len); - - if (!tid) - continue; - - /* If there is a drc-index, make sure that we own - * the cpu. - */ - if (index) { - int state; - int rc = rtas_get_sensor(9003, *index, &state); - if (rc < 0 || state != 1) - continue; - } - - nr_threads = len / sizeof(u32); - - while (nr_threads--) { - if (0 == query_cpu_stopped(tid[nr_threads])) { - best = tid[nr_threads]; - if (best == old_hwindex) - goto out; - } - } - } -out: - of_node_put(np); - return best; -} - -/** - * smp_startup_cpu() - start the given cpu - * - * At boot time, there is nothing to do. At run-time, call RTAS with - * the appropriate start location, if the cpu is in the RTAS stopped - * state. - * - * Returns: - * 0 - failure - * 1 - success - */ -static inline int __devinit smp_startup_cpu(unsigned int lcpu) -{ - int status; - unsigned long start_here = __pa((u32)*((unsigned long *) - pSeries_secondary_smp_init)); - unsigned int pcpu; - - /* At boot time the cpus are already spinning in hold - * loops, so nothing to do. */ - if (system_state < SYSTEM_RUNNING) - return 1; - - pcpu = find_physical_cpu_to_start(get_hard_smp_processor_id(lcpu)); - if (pcpu == -1U) { - printk(KERN_INFO "No more cpus available, failing\n"); - return 0; - } - - /* Fixup atomic count: it exited inside IRQ handler. */ - paca[lcpu].__current->thread_info->preempt_count = 0; - - /* At boot this is done in prom.c. 
*/ - paca[lcpu].hw_cpu_id = pcpu; - - status = rtas_call(rtas_token("start-cpu"), 3, 1, NULL, - pcpu, start_here, lcpu); - if (status != 0) { - printk(KERN_ERR "start-cpu failed: %i\n", status); - return 0; - } - return 1; -} - /* * Update cpu_present_map and paca(s) for a new cpu node. The wrinkle * here is that a cpu device node may represent up to two logical cpus @@ -335,12 +248,43 @@ static struct notifier_block pSeries_smp .notifier_call = pSeries_smp_notifier, }; -#else /* ... CONFIG_HOTPLUG_CPU */ +#endif /* CONFIG_HOTPLUG_CPU */ + +/** + * smp_startup_cpu() - start the given cpu + * + * At boot time, there is nothing to do for primary threads which were + * started from Open Firmware. For anything else, call RTAS with the + * appropriate start location. + * + * Returns: + * 0 - failure + * 1 - success + */ static inline int __devinit smp_startup_cpu(unsigned int lcpu) { + int status; + unsigned long start_here = __pa((u32)*((unsigned long *) + pSeries_secondary_smp_init)); + unsigned int pcpu; + + if (cpu_isset(lcpu, of_spin_map)) + /* Already started by OF and sitting in spin loop */ + return 1; + + pcpu = get_hard_smp_processor_id(lcpu); + + /* Fixup atomic count: it exited inside IRQ handler. */ + paca[lcpu].__current->thread_info->preempt_count = 0; + + status = rtas_call(rtas_token("start-cpu"), 3, 1, NULL, + pcpu, start_here, lcpu); + if (status != 0) { + printk(KERN_ERR "start-cpu failed: %i\n", status); + return 0; + } return 1; } -#endif /* CONFIG_HOTPLUG_CPU */ static inline void smp_xics_do_message(int cpu, int msg) { @@ -380,6 +324,8 @@ static void __devinit smp_xics_setup_cpu if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) vpa_init(cpu); + cpu_clear(cpu, of_spin_map); + /* * Put the calling processor into the GIQ. 
This is really only * necessary from a secondary thread as the OF start-cpu interface @@ -429,6 +375,20 @@ static void __devinit smp_pSeries_kick_c paca[nr].cpu_start = 1; } +static int smp_pSeries_cpu_bootable(unsigned int nr) +{ + /* Special case - we inhibit secondary thread startup + * during boot if the user requests it. Odd-numbered + * cpus are assumed to be secondary threads. + */ + if (system_state < SYSTEM_RUNNING && + cur_cpu_spec->cpu_features & CPU_FTR_SMT && + !smt_enabled_at_boot && nr % 2 != 0) + return 0; + + return 1; +} + static struct smp_ops_t pSeries_mpic_smp_ops = { .message_pass = smp_mpic_message_pass, .probe = smp_mpic_probe, @@ -441,12 +401,13 @@ static struct smp_ops_t pSeries_xics_smp .probe = smp_xics_probe, .kick_cpu = smp_pSeries_kick_cpu, .setup_cpu = smp_xics_setup_cpu, + .cpu_bootable = smp_pSeries_cpu_bootable, }; /* This is called very early */ void __init smp_init_pSeries(void) { - int ret, i; + int i; DBG(" -> smp_init_pSeries()\n"); @@ -464,20 +425,20 @@ void __init smp_init_pSeries(void) pSeries_reconfig_notifier_register(&pSeries_smp_nb); #endif - /* Start secondary threads on SMT systems; primary threads - * are already in the running state. - */ - for_each_present_cpu(i) { - if (query_cpu_stopped(get_hard_smp_processor_id(i)) == 0) { - printk("%16.16x : starting thread\n", i); - DBG("%16.16x : starting thread\n", i); - rtas_call(rtas_token("start-cpu"), 3, 1, &ret, - get_hard_smp_processor_id(i), - __pa((u32)*((unsigned long *) - pSeries_secondary_smp_init)), - i); + /* Mark threads which are still spinning in hold loops. */ + if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + for_each_present_cpu(i) { + if (i % 2 == 0) + /* + * Even-numbered logical cpus correspond to + * primary threads. 
+ */ + cpu_set(i, of_spin_map); } - } + else + of_spin_map = cpu_present_map; + + cpu_clear(boot_cpuid, of_spin_map); /* Non-lpar has additional take/give timebase */ if (rtas_token("freeze-time-base") != RTAS_UNKNOWN_SERVICE) { Index: linux-2.6.11-bk5/include/asm-ppc64/machdep.h =================================================================== --- linux-2.6.11-bk5.orig/include/asm-ppc64/machdep.h 2005-03-09 20:30:34.000000000 +0000 +++ linux-2.6.11-bk5/include/asm-ppc64/machdep.h 2005-03-09 20:32:55.000000000 +0000 @@ -33,6 +33,7 @@ struct smp_ops_t { int (*cpu_enable)(unsigned int nr); int (*cpu_disable)(void); void (*cpu_die)(unsigned int nr); + int (*cpu_bootable)(unsigned int nr); }; #endif Index: linux-2.6.11-bk5/arch/ppc64/kernel/smp.c =================================================================== --- linux-2.6.11-bk5.orig/arch/ppc64/kernel/smp.c 2005-03-09 20:30:34.000000000 +0000 +++ linux-2.6.11-bk5/arch/ppc64/kernel/smp.c 2005-03-09 20:32:55.000000000 +0000 @@ -490,9 +490,8 @@ int __devinit __cpu_up(unsigned int cpu) if (!cpu_enable(cpu)) return 0; - /* At boot, don't bother with non-present cpus -JSCHOPP */ - if (system_state < SYSTEM_RUNNING && !cpu_present(cpu)) - return -ENOENT; + if (smp_ops->cpu_bootable && !smp_ops->cpu_bootable(cpu)) + return -EINVAL; paca[cpu].default_decr = tb_ticks_per_jiffy / decr_overclock; @@ -606,14 +605,6 @@ void __init smp_cpus_done(unsigned int m smp_ops->setup_cpu(boot_cpuid); set_cpus_allowed(current, old_mask); - - /* - * We know at boot the maximum number of cpus we can add to - * a partition and set cpu_possible_map accordingly. cpu_present_map - * needs to match for the hotplug code to allow us to hot add - * any offline cpus. 
- */ - cpu_present_map = cpu_possible_map; } #ifdef CONFIG_HOTPLUG_CPU Index: linux-2.6.11-bk5/arch/ppc64/kernel/setup.c =================================================================== --- linux-2.6.11-bk5.orig/arch/ppc64/kernel/setup.c 2005-03-09 20:30:34.000000000 +0000 +++ linux-2.6.11-bk5/arch/ppc64/kernel/setup.c 2005-03-09 20:32:55.000000000 +0000 @@ -269,15 +269,9 @@ static void __init setup_cpu_maps(void) nthreads = len / sizeof(u32); for (j = 0; j < nthreads && cpu < NR_CPUS; j++) { - /* - * Only spin up secondary threads if SMT is enabled. - * We must leave space in the logical map for the - * threads. - */ - if (j == 0 || smt_enabled_at_boot) { - cpu_set(cpu, cpu_present_map); - set_hard_smp_processor_id(cpu, intserv[j]); - } + cpu_set(cpu, cpu_present_map); + set_hard_smp_processor_id(cpu, intserv[j]); + if (intserv[j] == boot_cpuid_phys) swap_cpuid = cpu; cpu_set(cpu, cpu_possible_map); From ntl at pobox.com Thu Mar 10 12:40:13 2005 From: ntl at pobox.com (Nathan Lynch) Date: Wed, 9 Mar 2005 19:40:13 -0600 Subject: [PATCH] ppc64: fix xmon build break with non-SMP config Message-ID: <20050310014013.GC21853@otto> CC arch/ppc64/xmon/xmon.o arch/ppc64/xmon/xmon.c: In function `set_controlled_dabr': arch/ppc64/xmon/xmon.c:633: warning: implicit declaration of function `plpar_hcall_norets' arch/ppc64/xmon/xmon.c:633: error: `H_SET_DABR' undeclared (first use in this function) arch/ppc64/xmon/xmon.c:633: error: (Each undeclared identifier is reported only once arch/ppc64/xmon/xmon.c:633: error: for each function it appears in.) 
arch/ppc64/xmon/xmon.c:634: error: `H_Success' undeclared (first use in this function) Signed-off-by: Nathan Lynch xmon.c | 1 + 1 files changed, 1 insertion(+) Index: linux-2.6.11-bk5/arch/ppc64/xmon/xmon.c =================================================================== --- linux-2.6.11-bk5.orig/arch/ppc64/xmon/xmon.c 2005-03-09 20:01:32.000000000 +0000 +++ linux-2.6.11-bk5/arch/ppc64/xmon/xmon.c 2005-03-10 01:09:13.000000000 +0000 @@ -32,6 +32,7 @@ #include #include #include +#include #include "nonstdio.h" #include "privinst.h" From david at gibson.dropbear.id.au Thu Mar 10 13:18:48 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Thu, 10 Mar 2005 13:18:48 +1100 Subject: [PPC64] Allow emulation of mfpvr on ppc64 kernel Message-ID: <20050310021848.GD30435@localhost.localdomain> Andrew, please apply. Allow userspace programs on ppc64 to use the (privileged) mfpvr instruction to determine the processor type. At the moment it emulates the instruction to provide the real PVR value, though it could be made to lie in future if for some reason we wish to restrict what CPU features userspace uses. If nothing else this means that some existing ppc32 applications will now run on a 64-bit kernel (the 32-bit kernel has long supported this emulation). It will also be necessary for ppc64 perfctr support, where userspace requires finer-grained cpu type information than the kernel in order to correctly program the performance monitor control registers. Signed-off-by: David Gibson Index: working-2.6/arch/ppc64/kernel/traps.c =================================================================== --- working-2.6.orig/arch/ppc64/kernel/traps.c 2005-03-06 07:08:24.000000000 +1100 +++ working-2.6/arch/ppc64/kernel/traps.c 2005-03-10 13:05:25.000000000 +1100 @@ -279,6 +279,9 @@ * fault. Return zero on success. 
*/ +#define INST_MFSPR_PVR 0x7c1f42a6 +#define INST_MFSPR_PVR_MASK 0xfc1fffff + #define INST_DCBA 0x7c0005ec #define INST_DCBA_MASK 0x7c0007fe @@ -297,6 +300,15 @@ if (get_user(instword, (unsigned int __user *)(regs->nip))) return -EFAULT; + /* Emulate the mfspr rD, PVR. */ + if ((instword & INST_MFSPR_PVR_MASK) == INST_MFSPR_PVR) { + unsigned int rd; + + rd = (instword >> 21) & 0x1f; + regs->gpr[rd] = mfspr(SPRN_PVR); + return 0; + } + /* Emulating the dcba insn is just a no-op. */ if ((instword & INST_DCBA_MASK) == INST_DCBA) { static int warned; @@ -390,11 +402,6 @@ if (regs->msr & 0x100000) { /* IEEE FP exception */ parse_fpe(regs); - - } else if (regs->msr & 0x40000) { - /* Privileged instruction */ - _exception(SIGILL, regs, ILL_PRVOPC, regs->nip); - } else if (regs->msr & 0x20000) { /* trap exception */ @@ -411,7 +418,7 @@ _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip); } else { - /* Illegal instruction; try to emulate it. */ + /* Privileged or illegal instruction; try to emulate it. */ switch (emulate_instruction(regs)) { case 0: regs->nip += 4; @@ -423,7 +430,12 @@ break; default: - _exception(SIGILL, regs, ILL_ILLOPC, regs->nip); + if (regs->msr & 0x40000) + /* priveleged */ + _exception(SIGILL, regs, ILL_PRVOPC, regs->nip); + else + /* illegal */ + _exception(SIGILL, regs, ILL_ILLOPC, regs->nip); break; } } -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson From sfr at canb.auug.org.au Thu Mar 10 13:42:16 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Thu, 10 Mar 2005 13:42:16 +1100 Subject: [RFC][PATCH] combining header files In-Reply-To: <20050309200109.GG1220@austin.ibm.com> References: <20050309120343.0c22eb0f.sfr@canb.auug.org.au> <20050309200109.GG1220@austin.ibm.com> Message-ID: <20050310134216.5b9b27ef.sfr@canb.auug.org.au> On Wed, 9 Mar 2005 14:01:09 -0600 Linas Vepstas wrote: > > Why not #include instead? 
Because I am talking about similarities between ppc and ppc64 not ppc64 and the generic code (though there may be some of those to be exploited as well). -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ From olof at austin.ibm.com Thu Mar 10 14:25:07 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Wed, 9 Mar 2005 21:25:07 -0600 Subject: [PATCH 2/2] No-exec support for ppc64 In-Reply-To: <20050308171326.3d72363a.moilanen@austin.ibm.com> References: <20050308165904.0ce07112.moilanen@austin.ibm.com> <20050308171326.3d72363a.moilanen@austin.ibm.com> Message-ID: <20050310032507.GC20789@austin.ibm.com> Hi, On Tue, Mar 08, 2005 at 05:13:26PM -0600, Jake Moilanen wrote: > diff -puN arch/ppc64/mm/hash_utils.c~nx-kernel-ppc64 arch/ppc64/mm/hash_utils.c > --- linux-2.6-bk/arch/ppc64/mm/hash_utils.c~nx-kernel-ppc64 2005-03-08 16:08:57 -06:00 > +++ linux-2.6-bk-moilanen/arch/ppc64/mm/hash_utils.c 2005-03-08 16:08:57 -06:00 > @@ -89,12 +90,23 @@ static inline void loop_forever(void) > ; > } > > +int is_kernel_text(unsigned long addr) > +{ > + if (addr >= (unsigned long)_stext && addr < (unsigned long)__init_end) > + return 1; > + > + return 0; > +} This is used in two files, but never declared extern in the second file (iSeries_setup.c). Should it go in a header file as a static inline instead? There also seems to be a local static is_kernel_text() in kallsyms that overlaps (but it's not identical). Removing that redundancy can be taken care of as a janitorial patch outside of the noexec stuff.
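A minimal sketch of the header-based alternative raised above, written for user space so it can be compiled on its own. The names fake_stext and fake_init_end are hypothetical stand-ins for the linker symbols _stext and __init_end, and uintptr_t replaces the kernel's unsigned long; none of this is taken from the patch itself.

```c
#include <stdint.h>

/*
 * Stand-ins for the linker-provided section symbols. In the kernel these
 * would be declared roughly as "extern char _stext[], __init_end[];".
 * Here a small static buffer plays the role of the text region.
 */
static char fake_stext[256];
static char *const fake_init_end = fake_stext + sizeof(fake_stext);

/*
 * If this lived in a shared header as a static inline, every file that
 * includes the header would get its own copy, so no extern declaration
 * would be needed in the second user.
 */
static inline int is_kernel_text(uintptr_t addr)
{
	/* Half-open range check: start is inside, end is outside. */
	return addr >= (uintptr_t)fake_stext &&
	       addr < (uintptr_t)fake_init_end;
}
```

The range is half-open, matching the comparison operators in the quoted function: the start address counts as text, the end address does not.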
-Olof From olof at austin.ibm.com Thu Mar 10 14:22:13 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Wed, 9 Mar 2005 21:22:13 -0600 Subject: [PATCH 1/2] No-exec support for ppc64 In-Reply-To: <20050308170826.13a2299e.moilanen@austin.ibm.com> References: <20050308165904.0ce07112.moilanen@austin.ibm.com> <20050308170826.13a2299e.moilanen@austin.ibm.com> Message-ID: <20050310032213.GB20789@austin.ibm.com> On Tue, Mar 08, 2005 at 05:08:26PM -0600, Jake Moilanen wrote: > No-exec base and user space support for PPC64. Hi, a couple of comments below. -Olof > @@ -786,6 +786,7 @@ int hash_huge_page(struct mm_struct *mm, > pte_t old_pte, new_pte; > unsigned long hpteflags, prpn; > long slot; > + int is_exec; > int err = 1; > > spin_lock(&mm->page_table_lock); > @@ -796,6 +797,10 @@ int hash_huge_page(struct mm_struct *mm, > va = (vsid << 28) | (ea & 0x0fffffff); > vpn = va >> HPAGE_SHIFT; > > + is_exec = access & _PAGE_EXEC; > + if (unlikely(is_exec && !(pte_val(*ptep) & _PAGE_EXEC))) > + goto out; You only use is_exec this one time, you can probably skip it and just add the mask in the if statement. > @@ -898,6 +908,7 @@ repeat: > err = 0; > > out: > + > spin_unlock(&mm->page_table_lock); Whitespace change > diff -puN include/asm-ppc64/pgtable.h~nx-user-ppc64 include/asm-ppc64/pgtable.h > --- linux-2.6-bk/include/asm-ppc64/pgtable.h~nx-user-ppc64 2005-03-08 16:08:54 -06:00 > +++ linux-2.6-bk-moilanen/include/asm-ppc64/pgtable.h 2005-03-08 16:08:54 -06:00 > @@ -82,14 +82,14 @@ > #define _PAGE_PRESENT 0x0001 /* software: pte contains a translation */ > #define _PAGE_USER 0x0002 /* matches one of the PP bits */ > #define _PAGE_FILE 0x0002 /* (!present only) software: pte holds file offset */ > -#define _PAGE_RW 0x0004 /* software: user write access allowed */ > +#define _PAGE_EXEC 0x0004 /* No execute on POWER4 and newer (we invert) */ Good to see the comment there, I remember we talked about that earlier. It can be somewhat confusing. 
:-) > #define _PAGE_GUARDED 0x0008 > #define _PAGE_COHERENT 0x0010 /* M: enforce memory coherence (SMP systems) */ > #define _PAGE_NO_CACHE 0x0020 /* I: cache inhibit */ > #define _PAGE_WRITETHRU 0x0040 /* W: cache write-through */ > #define _PAGE_DIRTY 0x0080 /* C: page changed */ > #define _PAGE_ACCESSED 0x0100 /* R: page referenced */ > -#define _PAGE_EXEC 0x0200 /* software: i-cache coherence required */ > +#define _PAGE_RW 0x0200 /* software: user write access allowed */ > #define _PAGE_HASHPTE 0x0400 /* software: pte has an associated HPTE */ > #define _PAGE_BUSY 0x0800 /* software: PTE & hash are busy */ > #define _PAGE_SECONDARY 0x8000 /* software: HPTE is in secondary group */ > @@ -100,7 +100,7 @@ > /* PAGE_MASK gives the right answer below, but only by accident */ > /* It should be preserving the high 48 bits and then specifically */ > /* preserving _PAGE_SECONDARY | _PAGE_GROUP_IX */ > -#define _PAGE_CHG_MASK (PAGE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_HPTEFLAGS) > +#define _PAGE_CHG_MASK (_PAGE_GUARDED | _PAGE_COHERENT | _PAGE_NO_CACHE | _PAGE_WRITETHRU | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_HPTEFLAGS | PAGE_MASK) Can you break it into 80 columns with \ ? 
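To illustrate the inversion that the _PAGE_EXEC comment refers to, here is a small user-space sketch. PTE_EXEC reuses the 0x0004 value from the quoted hunk; HW_NO_EXEC_BIT and hw_noexec_from_pte() are hypothetical names chosen for this sketch, and the assumption is simply that the hardware bit at that position means "no execute" while the Linux PTE bit means "execute allowed".

```c
/* Linux-side view: bit set means the page may be executed. */
#define PTE_EXEC        0x0004UL
/* Hardware-side view (assumed): same bit position, opposite meaning. */
#define HW_NO_EXEC_BIT  PTE_EXEC

/*
 * Hypothetical helper: given a Linux PTE value, return the hardware
 * no-execute bit. The bit is set only when the PTE does NOT allow
 * execution, which is the inversion being discussed.
 */
static unsigned long hw_noexec_from_pte(unsigned long pte)
{
	return (pte & PTE_EXEC) ? 0UL : HW_NO_EXEC_BIT;
}
```

Giving the hardware meaning its own name, even though the value is identical, is one way to keep the two senses apart when reading the hashing code.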
From benh at kernel.crashing.org Thu Mar 10 18:15:34 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Thu, 10 Mar 2005 18:15:34 +1100 Subject: [PATCH 2/2] No-exec support for ppc64 In-Reply-To: <20050310032507.GC20789@austin.ibm.com> References: <20050308165904.0ce07112.moilanen@austin.ibm.com> <20050308171326.3d72363a.moilanen@austin.ibm.com> <20050310032507.GC20789@austin.ibm.com> Message-ID: <1110438934.32524.203.camel@gaston> On Wed, 2005-03-09 at 21:25 -0600, Olof Johansson wrote: > Hi, > > On Tue, Mar 08, 2005 at 05:13:26PM -0600, Jake Moilanen wrote: > > diff -puN arch/ppc64/mm/hash_utils.c~nx-kernel-ppc64 arch/ppc64/mm/hash_utils.c > > --- linux-2.6-bk/arch/ppc64/mm/hash_utils.c~nx-kernel-ppc64 2005-03-08 16:08:57 -06:00 > > +++ linux-2.6-bk-moilanen/arch/ppc64/mm/hash_utils.c 2005-03-08 16:08:57 -06:00 > > @@ -89,12 +90,23 @@ static inline void loop_forever(void) > > ; > > } > > > > +int is_kernel_text(unsigned long addr) > > +{ > > + if (addr >= (unsigned long)_stext && addr < (unsigned long)__init_end) > > + return 1; > > + > > + return 0; > > +} > > This is used in two files, but never declared extern in the second file > (iSeries_setup.c). Should it go in a header file as a static inline > instead? Yes, I think it should. > There also seems to be a local static is_kernel_text() in kallsyms that > overlaps (but it's not identical). Removing that redundancy can be taken > care of as a janitorial patch outside of the noexec stuff. 
> > > > -Olof > _______________________________________________ > Linuxppc64-dev mailing list > Linuxppc64-dev at ozlabs.org > https://ozlabs.org/cgi-bin/mailman/listinfo/linuxppc64-dev -- Benjamin Herrenschmidt From arnd at arndb.de Thu Mar 10 20:34:04 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Thu, 10 Mar 2005 10:34:04 +0100 Subject: [PATCH] linking zImage with biarch ld In-Reply-To: <1110404388.32524.101.camel@gaston> References: <200503091401.17143.arnd@arndb.de> <1110404388.32524.101.camel@gaston> Message-ID: <200503101034.05737.arnd@arndb.de> On Wednesday 09 March 2005 22:39, Benjamin Herrenschmidt wrote: > > I noticed that with the vDSO patch in 2.6.11-bk, it's almost possible to build > > the kernel with the fedora biarch toolchain. However, I still get warnings > > from ld about zImage being the wrong architecture, unless I change the script > > as shown in this patch. > > "Almost possible" ? What's wrong ? Only that ? Yes, that's the only problem. Arnd <>< From johnrose at austin.ibm.com Fri Mar 11 03:50:10 2005 From: johnrose at austin.ibm.com (John Rose) Date: Thu, 10 Mar 2005 10:50:10 -0600 Subject: [PATCH 2/8] make OF node fixup code usable at runtime In-Reply-To: <20050310005142.31309.45788.99418@otto> References: <20050310005132.31309.65485.31668@otto> <20050310005142.31309.45788.99418@otto> Message-ID: <1110473410.29353.4.camel@sinatra.austin.ibm.com> Hi Nathan- The patch series cleans things up nicely. One comment: > These functions, if passed a null mem_start argument, will > use kmalloc for allocating extra data structures for the device node > being processed. Might it be possible to use the dynamic flag of the device node to decide when to use kmalloc?
Thanks- John From johnrose at austin.ibm.com Fri Mar 11 04:20:00 2005 From: johnrose at austin.ibm.com (John Rose) Date: Thu, 10 Mar 2005 11:20:00 -0600 Subject: [PATCH 4/8] prom.c: use pSeries reconfig notifier In-Reply-To: <20050310005152.31309.41959.21947@otto> References: <20050310005132.31309.65485.31668@otto> <20050310005152.31309.41959.21947@otto> Message-ID: <1110475200.29353.12.camel@sinatra.austin.ibm.com> Quick comment on this- > void of_add_node(struct device_node *np) > { > - int err; > - > - /* This use of finish_node will be moved to a notifier so > - * the error code can be used. > - */ > - err = finish_node(np, NULL, of_finish_dynamic_node, 0, 0, 0); > - if (err < 0) > - return; > - > write_lock(&devtree_lock); > np->sibling = np->parent->child; > np->allnext = allnodes; > @@ -1682,6 +1674,36 @@ void of_remove_node(const struct device_ > write_unlock(&devtree_lock); > } If I understand correctly, of_add_node() now simply adds the node to relevant lists and sets relational pointers to position it in the device tree. The allocation and other setup has been moved out of the function. Might it be more clear to rename it to of_attach_node() or something similar? Thanks- John From arnd at arndb.de Fri Mar 11 06:05:28 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Thu, 10 Mar 2005 20:05:28 +0100 Subject: [PATCH 4/8] prom.c: use pSeries reconfig notifier In-Reply-To: <20050310005152.31309.41959.21947@otto> References: <20050310005132.31309.65485.31668@otto> <20050310005152.31309.41959.21947@otto> Message-ID: <200503102005.29526.arnd@arndb.de> On Thursday 10 March 2005 01:51, Nathan Lynch wrote: > void of_add_node(struct device_node *np) > { While looking at the of_add_node code, I noticed that there are some memory leaks in the error path of that function. They should be trivial to fix with the attached patch, but I can't test this because I don't have reconfigurable machines.
Unfortunately, this also conflicts with Nathan's patches, but I can submit a new patch when they show up in bitkeeper. Signed-off-by: Arnd Bergmann -------------- next part -------------- A non-text attachment was scrubbed... Name: of_add_node_leak.diff Type: text/x-diff Size: 1307 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050310/448cea66/attachment.diff From jschopp at austin.ibm.com Fri Mar 11 06:41:38 2005 From: jschopp at austin.ibm.com (Joel Schopp) Date: Thu, 10 Mar 2005 13:41:38 -0600 Subject: [PATCH 2/8] make OF node fixup code usable at runtime In-Reply-To: <20050310005142.31309.45788.99418@otto> References: <20050310005132.31309.65485.31668@otto> <20050310005142.31309.45788.99418@otto> Message-ID: <4230A2F2.7020403@austin.ibm.com> Nathan Lynch wrote: > At boot we recurse through the device tree "fixing up" various fields > and properties in the device nodes. Long ago, to support DLPAR and > hotplug, we largely duplicated some of this fixup code, the main > difference being that the new code used kmalloc for allocating various > data structures which are attached to the new device nodes. > > This patch kills most of the duplicated code and makes finish_node, > finish_node_interrupts, and interpret_pci_props suitable for use at > runtime. These functions, if passed a null mem_start argument, will > use kmalloc for allocating extra data structures for the device node > being processed. Not terribly elegant, but it seems worth it to get > rid of the duplicated code (and bugs). Good idea, I wholeheartedly agree. > -static int of_finish_dynamic_node(struct device_node *node) > +static int of_finish_dynamic_node(struct device_node *node, > + unsigned long *unused1, int unused2, > + int unused3, int unused4) > { Is there a reason for these 4 unused fields that I am just missing? 
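The allocation behaviour described in the quoted changelog (a null mem_start means "allocate dynamically") can be sketched as follows. This is a user-space illustration under stated assumptions, not the code from the patch: node_alloc() is a hypothetical helper, malloc() stands in for kmalloc(), uintptr_t stands in for the kernel's unsigned long, and alignment handling is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: hand out `size` bytes for per-node data.
 *
 * - mem_start != NULL: boot-time case. Carve the bytes out of the region
 *   that *mem_start points into and advance the cursor past them.
 * - mem_start == NULL: runtime case. Fall back to the heap; the caller
 *   owns the result and must free it.
 */
static void *node_alloc(uintptr_t *mem_start, size_t size)
{
	if (mem_start != NULL) {
		void *p = (void *)*mem_start;

		*mem_start += size;
		return p;
	}
	return malloc(size);
}
```

With a single helper of this shape, the boot-time and runtime fixup paths can share one body of code and differ only in the argument they pass, which is the deduplication the changelog argues for.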
From arnd at arndb.de Fri Mar 11 06:54:36 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Thu, 10 Mar 2005 20:54:36 +0100 Subject: [PATCH] ppc64: kill might_sleep() warnings in __copy_*_user_inatomic Message-ID: <200503102054.38123.arnd@arndb.de> I currently get warnings from futex resulting from Olofs futex+rwsem fix combined with the fact that ppc64 __copy_from_user has a might_sleep check in it: [ 9607.577071] Debug: sleeping function called from invalid context at include2/asm/uaccess.h:2 28 [ 9607.676181] in_atomic():1, irqs_disabled():0 [ 9607.724741] Call Trace: [ 9607.752058] [c00000000d68fab0] [c000000001f0fb80] 0xc000000001f0fb80 (unreliable) [ 9607.835030] [c00000000d68fb30] [c000000000042420] .__might_sleep+0xf8/0x108 [ 9607.912936] [c00000000d68fbd0] [c00000000006ac34] .do_futex+0x224/0x858 The fix is to do the check only in copy_*_user, not __copy_*_user. This is the same that most other architectures do. Signed-off-by: Arnd Bergmann --- arch/ppc64/lib/usercopy.c | 2 ++ include/asm-ppc64/uaccess.h | 6 ++---- 2 files changed, 4 insertions(+), 4 deletions(-) -------------- next part -------------- A non-text attachment was scrubbed... Name: uaccess-might-sleep.diff Type: text/x-diff Size: 2392 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050310/78bb7edf/attachment.diff From jschopp at austin.ibm.com Fri Mar 11 07:27:02 2005 From: jschopp at austin.ibm.com (Joel Schopp) Date: Thu, 10 Mar 2005 14:27:02 -0600 Subject: [PATCH 7/8] pSeries_smp.c: use pSeries reconfig notifier for cpu DLPAR In-Reply-To: <20050310005207.31309.32546.73375@otto> References: <20050310005132.31309.65485.31668@otto> <20050310005207.31309.32546.73375@otto> Message-ID: <4230AD96.7020508@austin.ibm.com> This patch seems fine. Just a couple trivial comments. > +static int pSeries_add_processor(struct device_node *np) This function doesn't really add a processor; it could use a better name. 
> +static void pSeries_remove_processor(struct device_node *np) This doesn't remove a processor; it could also use a better name. From moilanen at austin.ibm.com Fri Mar 11 09:25:13 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Thu, 10 Mar 2005 16:25:13 -0600 Subject: [PATCH 1/2] No-exec support for ppc64 In-Reply-To: <20050310032213.GB20789@austin.ibm.com> References: <20050308165904.0ce07112.moilanen@austin.ibm.com> <20050308170826.13a2299e.moilanen@austin.ibm.com> <20050310032213.GB20789@austin.ibm.com> Message-ID: <20050310162513.74191caa.moilanen@austin.ibm.com> On Wed, 9 Mar 2005 21:22:13 -0600 olof at austin.ibm.com (Olof Johansson) wrote: > On Tue, Mar 08, 2005 at 05:08:26PM -0600, Jake Moilanen wrote: > > No-exec base and user space support for PPC64. > > Hi, a couple of comments below. > Here's the revised user & base support for no-exec on ppc64 with Olof and Ben's comments. Signed-off-by: Jake Moilanen --- linux-2.6-bk-moilanen/arch/ppc64/kernel/head.S | 5 + linux-2.6-bk-moilanen/arch/ppc64/kernel/iSeries_htab.c | 4 + linux-2.6-bk-moilanen/arch/ppc64/kernel/pSeries_lpar.c | 2 linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c | 14 +++-- linux-2.6-bk-moilanen/arch/ppc64/mm/hash_low.S | 12 ++-- linux-2.6-bk-moilanen/arch/ppc64/mm/hugetlbpage.c | 10 +++ linux-2.6-bk-moilanen/fs/binfmt_elf.c | 2 linux-2.6-bk-moilanen/include/asm-ppc64/elf.h | 7 ++ linux-2.6-bk-moilanen/include/asm-ppc64/page.h | 19 ++++++- linux-2.6-bk-moilanen/include/asm-ppc64/pgtable.h | 46 +++++++++-------- 10 files changed, 85 insertions(+), 36 deletions(-) diff -puN arch/ppc64/kernel/head.S~nx-user-ppc64 arch/ppc64/kernel/head.S --- linux-2.6-bk/arch/ppc64/kernel/head.S~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/head.S 2005-03-08 16:08:54 -06:00 @@ -36,6 +36,7 @@ #include #include #include +#include #include #ifdef CONFIG_PPC_ISERIES @@ -950,11 +951,11 @@ END_FTR_SECTION_IFCLR(CPU_FTR_SLB) * accessing a userspace segment (even from the 
kernel). We assume * kernel addresses always have the high bit set. */ - rlwinm r4,r4,32-23,29,29 /* DSISR_STORE -> _PAGE_RW */ + rlwinm r4,r4,32-25+9,31-9,31-9 /* DSISR_STORE -> _PAGE_RW */ rotldi r0,r3,15 /* Move high bit into MSR_PR posn */ orc r0,r12,r0 /* MSR_PR | ~high_bit */ rlwimi r4,r0,32-13,30,30 /* becomes _PAGE_USER access bit */ - ori r4,r4,1 /* add _PAGE_PRESENT */ + rlwimi r4,r5,22+2,31-2,31-2 /* Set _PAGE_EXEC if trap is 0x400 */ /* * On iSeries, we soft-disable interrupts here, then diff -puN arch/ppc64/kernel/iSeries_htab.c~nx-user-ppc64 arch/ppc64/kernel/iSeries_htab.c --- linux-2.6-bk/arch/ppc64/kernel/iSeries_htab.c~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/iSeries_htab.c 2005-03-08 16:08:54 -06:00 @@ -144,6 +144,10 @@ static long iSeries_hpte_updatepp(unsign HvCallHpt_get(&hpte, slot); if ((hpte.dw0.dw0.avpn == avpn) && (hpte.dw0.dw0.v)) { + /* + * Hypervisor expects bit's as NPPP, which is + * different from how they are mapped in our PP. 
+ */ HvCallHpt_setPp(slot, (newpp & 0x3) | ((newpp & 0x4) << 1)); iSeries_hunlock(slot); return 0; diff -puN arch/ppc64/kernel/pSeries_lpar.c~nx-user-ppc64 arch/ppc64/kernel/pSeries_lpar.c --- linux-2.6-bk/arch/ppc64/kernel/pSeries_lpar.c~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/pSeries_lpar.c 2005-03-08 16:08:54 -06:00 @@ -470,7 +470,7 @@ static void pSeries_lpar_hpte_updatebolt slot = pSeries_lpar_hpte_find(vpn); BUG_ON(slot == -1); - flags = newpp & 3; + flags = newpp & 7; lpar_rc = plpar_pte_protect(flags, slot, 0); BUG_ON(lpar_rc != H_Success); diff -puN arch/ppc64/mm/fault.c~nx-user-ppc64 arch/ppc64/mm/fault.c --- linux-2.6-bk/arch/ppc64/mm/fault.c~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c 2005-03-10 16:14:45 -06:00 @@ -93,6 +93,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long code = SEGV_MAPERR; unsigned long is_write = error_code & 0x02000000; unsigned long trap = TRAP(regs); + unsigned long is_exec = trap == 0x400; BUG_ON((trap == 0x380) || (trap == 0x480)); @@ -199,16 +200,19 @@ int do_page_fault(struct pt_regs *regs, good_area: code = SEGV_ACCERR; + if (is_exec) { + /* protection fault */ + if (error_code & 0x08000000) + goto bad_area; + if (!(vma->vm_flags & VM_EXEC)) + goto bad_area; /* a write */ - if (is_write) { + } else if (is_write) { if (!(vma->vm_flags & VM_WRITE)) goto bad_area; /* a read */ } else { - /* protection fault */ - if (error_code & 0x08000000) - goto bad_area; - if (!(vma->vm_flags & (VM_READ | VM_EXEC))) + if (!(vma->vm_flags & VM_READ)) goto bad_area; } diff -puN arch/ppc64/mm/hash_low.S~nx-user-ppc64 arch/ppc64/mm/hash_low.S --- linux-2.6-bk/arch/ppc64/mm/hash_low.S~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/hash_low.S 2005-03-08 16:08:54 -06:00 @@ -89,7 +89,7 @@ _GLOBAL(__hash_page) /* Prepare new PTE value (turn access RW into DIRTY, then * add BUSY,HASHPTE and ACCESSED) */ - rlwinm 
r30,r4,5,24,24 /* _PAGE_RW -> _PAGE_DIRTY */ + rlwinm r30,r4,32-9+7,31-7,31-7 /* _PAGE_RW -> _PAGE_DIRTY */ or r30,r30,r31 ori r30,r30,_PAGE_BUSY | _PAGE_ACCESSED | _PAGE_HASHPTE /* Write the linux PTE atomically (setting busy) */ @@ -112,11 +112,11 @@ _GLOBAL(__hash_page) rldicl r5,r5,0,25 /* vsid & 0x0000007fffffffff */ rldicl r0,r3,64-12,48 /* (ea >> 12) & 0xffff */ xor r28,r5,r0 - - /* Convert linux PTE bits into HW equivalents - */ - andi. r3,r30,0x1fa /* Get basic set of flags */ - rlwinm r0,r30,32-2+1,30,30 /* _PAGE_RW -> _PAGE_USER (r0) */ + + /* Convert linux PTE bits into HW equivalents */ + andi. r3,r30,0x1fe /* Get basic set of flags */ + xori r3,r3,HW_NO_EXEC /* _PAGE_EXEC -> NOEXEC */ + rlwinm r0,r30,32-9+1,30,30 /* _PAGE_RW -> _PAGE_USER (r0) */ rlwinm r4,r30,32-7+1,30,30 /* _PAGE_DIRTY -> _PAGE_USER (r4) */ and r0,r0,r4 /* _PAGE_RW & _PAGE_DIRTY -> r0 bit 30 */ andc r0,r30,r0 /* r0 = pte & ~r0 */ diff -puN arch/ppc64/mm/hugetlbpage.c~nx-user-ppc64 arch/ppc64/mm/hugetlbpage.c --- linux-2.6-bk/arch/ppc64/mm/hugetlbpage.c~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/hugetlbpage.c 2005-03-10 13:46:08 -06:00 @@ -796,6 +796,9 @@ int hash_huge_page(struct mm_struct *mm, va = (vsid << 28) | (ea & 0x0fffffff); vpn = va >> HPAGE_SHIFT; + if (unlikely((access & _PAGE_EXEC) && !(pte_val(*ptep) & _PAGE_EXEC))) + goto out; + /* * If no pte found or not present, send the problem up to * do_page_fault @@ -828,7 +831,12 @@ int hash_huge_page(struct mm_struct *mm, old_pte = *ptep; new_pte = old_pte; - hpteflags = 0x2 | (! (pte_val(new_pte) & _PAGE_RW)); + hpteflags = (pte_val(new_pte) & _PAGE_RW) | + (!(pte_val(new_pte) & _PAGE_RW)) | + _PAGE_USER; + + /* _PAGE_EXEC -> HW_NO_EXEC since it's inverted */ + hpteflags |= ((pte_val(new_pte) & _PAGE_EXEC) ? 
0 : HW_NO_EXEC); /* Check if pte already has an hpte (case 2) */ if (unlikely(pte_val(old_pte) & _PAGE_HASHPTE)) { diff -puN fs/binfmt_elf.c~nx-user-ppc64 fs/binfmt_elf.c --- linux-2.6-bk/fs/binfmt_elf.c~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/fs/binfmt_elf.c 2005-03-08 16:08:54 -06:00 @@ -99,6 +99,8 @@ static int set_brk(unsigned long start, up_write(&current->mm->mmap_sem); if (BAD_ADDR(addr)) return addr; + + sys_mprotect(start, end-start, PROT_READ|PROT_WRITE|PROT_EXEC); } current->mm->start_brk = current->mm->brk = end; return 0; diff -puN include/asm-ppc64/elf.h~nx-user-ppc64 include/asm-ppc64/elf.h --- linux-2.6-bk/include/asm-ppc64/elf.h~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/include/asm-ppc64/elf.h 2005-03-08 16:23:37 -06:00 @@ -226,6 +226,13 @@ do { \ else if (current->personality != PER_LINUX32) \ set_personality(PER_LINUX); \ } while (0) + +/* + * An executable for which elf_read_implies_exec() returns TRUE will + * have the READ_IMPLIES_EXEC personality flag set automatically.
+ */ +#define elf_read_implies_exec(ex, have_pt_gnu_stack) (!(have_pt_gnu_stack)) + #endif /* diff -puN include/asm-ppc64/page.h~nx-user-ppc64 include/asm-ppc64/page.h --- linux-2.6-bk/include/asm-ppc64/page.h~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/include/asm-ppc64/page.h 2005-03-08 16:08:54 -06:00 @@ -235,8 +235,25 @@ extern u64 ppc64_pft_size; /* Log 2 of #define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) -#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_EXEC | \ +#define VM_DATA_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | \ VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS32 (VM_READ | VM_WRITE | VM_EXEC | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_DATA_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_STACK_DEFAULT_FLAGS64 (VM_READ | VM_WRITE | VM_EXEC | \ + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) + +#define VM_DATA_DEFAULT_FLAGS \ + (test_thread_flag(TIF_32BIT) ? \ + VM_DATA_DEFAULT_FLAGS32 : VM_DATA_DEFAULT_FLAGS64) + +#define VM_STACK_DEFAULT_FLAGS \ + (test_thread_flag(TIF_32BIT) ? 
\ + VM_STACK_DEFAULT_FLAGS32 : VM_STACK_DEFAULT_FLAGS64) #endif /* __KERNEL__ */ #endif /* _PPC64_PAGE_H */ diff -puN include/asm-ppc64/pgtable.h~nx-user-ppc64 include/asm-ppc64/pgtable.h --- linux-2.6-bk/include/asm-ppc64/pgtable.h~nx-user-ppc64 2005-03-08 16:08:54 -06:00 +++ linux-2.6-bk-moilanen/include/asm-ppc64/pgtable.h 2005-03-10 16:14:45 -06:00 @@ -82,14 +82,14 @@ #define _PAGE_PRESENT 0x0001 /* software: pte contains a translation */ #define _PAGE_USER 0x0002 /* matches one of the PP bits */ #define _PAGE_FILE 0x0002 /* (!present only) software: pte holds file offset */ -#define _PAGE_RW 0x0004 /* software: user write access allowed */ +#define _PAGE_EXEC 0x0004 /* No execute on POWER4 and newer (we invert) */ #define _PAGE_GUARDED 0x0008 #define _PAGE_COHERENT 0x0010 /* M: enforce memory coherence (SMP systems) */ #define _PAGE_NO_CACHE 0x0020 /* I: cache inhibit */ #define _PAGE_WRITETHRU 0x0040 /* W: cache write-through */ #define _PAGE_DIRTY 0x0080 /* C: page changed */ #define _PAGE_ACCESSED 0x0100 /* R: page referenced */ -#define _PAGE_EXEC 0x0200 /* software: i-cache coherence required */ +#define _PAGE_RW 0x0200 /* software: user write access allowed */ #define _PAGE_HASHPTE 0x0400 /* software: pte has an associated HPTE */ #define _PAGE_BUSY 0x0800 /* software: PTE & hash are busy */ #define _PAGE_SECONDARY 0x8000 /* software: HPTE is in secondary group */ @@ -100,7 +100,8 @@ /* PAGE_MASK gives the right answer below, but only by accident */ /* It should be preserving the high 48 bits and then specifically */ /* preserving _PAGE_SECONDARY | _PAGE_GROUP_IX */ -#define _PAGE_CHG_MASK (PAGE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_HPTEFLAGS) +#define _PAGE_CHG_MASK (_PAGE_GUARDED | _PAGE_COHERENT | _PAGE_NO_CACHE | _PAGE_WRITETHRU | \ + _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_HPTEFLAGS | PAGE_MASK) #define _PAGE_BASE (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_COHERENT) @@ -116,31 +117,38 @@ #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER) 
#define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) #define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_WRENABLE) -#define PAGE_KERNEL_CI __pgprot(_PAGE_PRESENT | _PAGE_ACCESSED | \ - _PAGE_WRENABLE | _PAGE_NO_CACHE | _PAGE_GUARDED) + +#define HW_NO_EXEC _PAGE_EXEC /* This is used when the bit is + * inverted, even though it's the + * same value, hopefully it will be + * clearer in the code what is + * going on. */ /* - * The PowerPC can only do execute protection on a segment (256MB) basis, - * not on a page basis. So we consider execute permission the same as read. + * POWER4 and newer have per page execute protection, older chips can only + * do this on a segment (256MB) basis. + * * Also, write permissions imply read permissions. * This is the closest we can get.. + * + * Note due to the way vm flags are laid out, the bits are XWR */ #define __P000 PAGE_NONE -#define __P001 PAGE_READONLY_X +#define __P001 PAGE_READONLY #define __P010 PAGE_COPY -#define __P011 PAGE_COPY_X -#define __P100 PAGE_READONLY +#define __P011 PAGE_COPY +#define __P100 PAGE_READONLY_X #define __P101 PAGE_READONLY_X -#define __P110 PAGE_COPY +#define __P110 PAGE_COPY_X #define __P111 PAGE_COPY_X #define __S000 PAGE_NONE -#define __S001 PAGE_READONLY_X +#define __S001 PAGE_READONLY #define __S010 PAGE_SHARED -#define __S011 PAGE_SHARED_X -#define __S100 PAGE_READONLY +#define __S011 PAGE_SHARED +#define __S100 PAGE_READONLY_X #define __S101 PAGE_READONLY_X -#define __S110 PAGE_SHARED +#define __S110 PAGE_SHARED_X #define __S111 PAGE_SHARED_X #ifndef __ASSEMBLY__ @@ -197,7 +205,8 @@ void hugetlb_mm_free_pgd(struct mm_struc }) #define pte_modify(_pte, newprot) \ - (__pte((pte_val(_pte) & _PAGE_CHG_MASK) | pgprot_val(newprot))) + (__pte((pte_val(_pte) & _PAGE_CHG_MASK) | \ + (pgprot_val(newprot) & ~_PAGE_CHG_MASK))) #define pte_none(pte) ((pte_val(pte) & ~_PAGE_HPTEFLAGS) == 0) #define pte_present(pte) (pte_val(pte) & _PAGE_PRESENT) @@ -266,9 +275,6 @@ static inline int 
pte_young(pte_t pte) { static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE;} static inline int pte_huge(pte_t pte) { return pte_val(pte) & _PAGE_HUGE;} -static inline void pte_uncache(pte_t pte) { pte_val(pte) |= _PAGE_NO_CACHE; } -static inline void pte_cache(pte_t pte) { pte_val(pte) &= ~_PAGE_NO_CACHE; } - static inline pte_t pte_rdprotect(pte_t pte) { pte_val(pte) &= ~_PAGE_USER; return pte; } static inline pte_t pte_exprotect(pte_t pte) { @@ -438,7 +444,7 @@ static inline void set_pte_at(struct mm_ static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry, int dirty) { unsigned long bits = pte_val(entry) & - (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW); + (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC); unsigned long old, tmp; __asm__ __volatile__( _ From moilanen at austin.ibm.com Fri Mar 11 09:27:21 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Thu, 10 Mar 2005 16:27:21 -0600 Subject: [PATCH 2/2] No-exec support for ppc64 In-Reply-To: <1110438934.32524.203.camel@gaston> References: <20050308165904.0ce07112.moilanen@austin.ibm.com> <20050308171326.3d72363a.moilanen@austin.ibm.com> <20050310032507.GC20789@austin.ibm.com> <1110438934.32524.203.camel@gaston> Message-ID: <20050310162721.19003dac.moilanen@austin.ibm.com> On Thu, 10 Mar 2005 18:15:34 +1100 Benjamin Herrenschmidt wrote: > On Wed, 2005-03-09 at 21:25 -0600, Olof Johansson wrote: > > Hi, > > > > On Tue, Mar 08, 2005 at 05:13:26PM -0600, Jake Moilanen wrote: > > > diff -puN arch/ppc64/mm/hash_utils.c~nx-kernel-ppc64 arch/ppc64/mm/hash_utils.c > > > --- linux-2.6-bk/arch/ppc64/mm/hash_utils.c~nx-kernel-ppc64 2005-03-08 16:08:57 -06:00 > > > +++ linux-2.6-bk-moilanen/arch/ppc64/mm/hash_utils.c 2005-03-08 16:08:57 -06:00 > > > @@ -89,12 +90,23 @@ static inline void loop_forever(void) > > > ; > > > } > > > > > > +int is_kernel_text(unsigned long addr) > > > +{ > > > + if (addr >= (unsigned long)_stext && addr < (unsigned long)__init_end) > > > + return 1; > 
> > + > > > + return 0; > > > +} > > > > This is used in two files, but never declared extern in the second file > > (iSeries_setup.c). Should it go in a header file as a static inline > > instead? > > Yes, I think it should. > Here is the revised no-exec for the kernel on ppc64 w/ Olof and Ben's comments. Signed-off-by: Jake Moilanen --- linux-2.6-bk-moilanen/arch/ppc64/kernel/iSeries_setup.c | 4 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/module.c | 3 +- linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c | 19 ++++++++++++++ linux-2.6-bk-moilanen/arch/ppc64/mm/hash_utils.c | 21 ++++++++++------ linux-2.6-bk-moilanen/include/asm-ppc64/pgtable.h | 1 linux-2.6-bk-moilanen/include/asm-ppc64/sections.h | 9 ++++++ 6 files changed, 49 insertions(+), 8 deletions(-) diff -puN arch/ppc64/kernel/iSeries_setup.c~nx-kernel-ppc64 arch/ppc64/kernel/iSeries_setup.c --- linux-2.6-bk/arch/ppc64/kernel/iSeries_setup.c~nx-kernel-ppc64 2005-03-10 13:54:14 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/iSeries_setup.c 2005-03-10 13:59:12 -06:00 @@ -633,6 +633,10 @@ static void __init iSeries_bolt_kernel(u unsigned long vpn = va >> PAGE_SHIFT; unsigned long slot = HvCallHpt_findValid(&hpte, vpn); + /* Make non-kernel text non-executable */ + if (!in_kernel_text(ea)) + mode_rw |= HW_NO_EXEC; + if (hpte.dw0.dw0.v) { /* HPTE exists, so just bolt it */ HvCallHpt_setSwBits(slot, 0x10, 0); diff -puN arch/ppc64/kernel/module.c~nx-kernel-ppc64 arch/ppc64/kernel/module.c --- linux-2.6-bk/arch/ppc64/kernel/module.c~nx-kernel-ppc64 2005-03-10 13:54:14 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/module.c 2005-03-10 13:54:14 -06:00 @@ -102,7 +102,8 @@ void *module_alloc(unsigned long size) { if (size == 0) return NULL; - return vmalloc(size); + + return vmalloc_exec(size); } /* Free memory returned from module_alloc */ diff -puN arch/ppc64/mm/fault.c~nx-kernel-ppc64 arch/ppc64/mm/fault.c --- linux-2.6-bk/arch/ppc64/mm/fault.c~nx-kernel-ppc64 2005-03-10 13:54:14 -06:00 +++ 
linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c 2005-03-10 13:54:14 -06:00 @@ -76,6 +76,13 @@ static int store_updates_sp(struct pt_re return 0; } +pte_t *lookup_address(unsigned long address) +{ + pgd_t *pgd = pgd_offset_k(address); + + return find_linux_pte(pgd, address); +} + /* * The error_code parameter is * - DSISR for a non-SLB data access fault, @@ -94,6 +101,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long is_write = error_code & 0x02000000; unsigned long trap = TRAP(regs); unsigned long is_exec = trap == 0x400; + pte_t *ptep; BUG_ON((trap == 0x380) || (trap == 0x480)); @@ -253,6 +261,17 @@ bad_area_nosemaphore: info.si_addr = (void __user *) address; force_sig_info(SIGSEGV, &info, current); return 0; + } + + ptep = lookup_address(address); + + if (ptep && pte_present(*ptep) && !pte_exec(*ptep)) { + if (printk_ratelimit()) + printk(KERN_CRIT "kernel tried to execute NX-protected " + "page - exploit attempt? (uid: %d)\n", + current->uid); + show_stack(current, (unsigned long *)__get_SP()); + do_exit(SIGKILL); } return SIGSEGV; diff -puN arch/ppc64/mm/hash_utils.c~nx-kernel-ppc64 arch/ppc64/mm/hash_utils.c --- linux-2.6-bk/arch/ppc64/mm/hash_utils.c~nx-kernel-ppc64 2005-03-10 13:54:14 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/hash_utils.c 2005-03-10 13:58:37 -06:00 @@ -51,6 +51,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) 
udbg_printf(fmt) @@ -95,6 +96,7 @@ static inline void create_pte_mapping(un { unsigned long addr; unsigned int step; + unsigned long tmp_mode; if (large) step = 16*MB; @@ -112,6 +114,13 @@ static inline void create_pte_mapping(un else vpn = va >> PAGE_SHIFT; + + tmp_mode = mode; + + /* Make non-kernel text non-executable */ + if (!in_kernel_text(addr)) + tmp_mode = mode | HW_NO_EXEC; + hash = hpt_hash(vpn, large); hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP); @@ -120,12 +129,12 @@ static inline void create_pte_mapping(un if (systemcfg->platform & PLATFORM_LPAR) ret = pSeries_lpar_hpte_insert(hpteg, va, virt_to_abs(addr) >> PAGE_SHIFT, - 0, mode, 1, large); + 0, tmp_mode, 1, large); else #endif /* CONFIG_PPC_PSERIES */ ret = native_hpte_insert(hpteg, va, virt_to_abs(addr) >> PAGE_SHIFT, - 0, mode, 1, large); + 0, tmp_mode, 1, large); if (ret == -1) { ppc64_terminate_msg(0x20, "create_pte_mapping"); @@ -238,8 +247,6 @@ unsigned int hash_page_do_lazy_icache(un { struct page *page; -#define PPC64_HWNOEXEC (1 << 2) - if (!pfn_valid(pte_pfn(pte))) return pp; @@ -250,8 +257,8 @@ unsigned int hash_page_do_lazy_icache(un if (trap == 0x400) { __flush_dcache_icache(page_address(page)); set_bit(PG_arch_1, &page->flags); - } else - pp |= PPC64_HWNOEXEC; + } else + pp |= HW_NO_EXEC; } return pp; } @@ -271,7 +278,7 @@ int hash_page(unsigned long ea, unsigned int user_region = 0; int local = 0; cpumask_t tmp; - + switch (REGION_ID(ea)) { case USER_REGION_ID: user_region = 1; diff -puN include/asm-ppc64/pgtable.h~nx-kernel-ppc64 include/asm-ppc64/pgtable.h --- linux-2.6-bk/include/asm-ppc64/pgtable.h~nx-kernel-ppc64 2005-03-10 13:54:14 -06:00 +++ linux-2.6-bk-moilanen/include/asm-ppc64/pgtable.h 2005-03-10 13:54:14 -06:00 @@ -117,6 +117,7 @@ #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER) #define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) #define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_WRENABLE) +#define PAGE_KERNEL_EXEC __pgprot(_PAGE_BASE | 
_PAGE_WRENABLE | _PAGE_EXEC) #define HW_NO_EXEC _PAGE_EXEC /* This is used when the bit is * inverted, even though it's the diff -puN include/asm-ppc64/sections.h~nx-kernel-ppc64 include/asm-ppc64/sections.h --- linux-2.6-bk/include/asm-ppc64/sections.h~nx-kernel-ppc64 2005-03-10 13:54:14 -06:00 +++ linux-2.6-bk-moilanen/include/asm-ppc64/sections.h 2005-03-10 13:58:12 -06:00 @@ -17,4 +17,13 @@ extern char _end[]; #define __openfirmware #define __openfirmwaredata + +static inline int in_kernel_text(unsigned long addr) +{ + if (addr >= (unsigned long)_stext && addr < (unsigned long)__init_end) + return 1; + + return 0; +} + #endif _ From benh at kernel.crashing.org Fri Mar 11 09:44:28 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 11 Mar 2005 09:44:28 +1100 Subject: [PATCH 2/2] No-exec support for ppc64 In-Reply-To: <20050310162721.19003dac.moilanen@austin.ibm.com> References: <20050308165904.0ce07112.moilanen@austin.ibm.com> <20050308171326.3d72363a.moilanen@austin.ibm.com> <20050310032507.GC20789@austin.ibm.com> <1110438934.32524.203.camel@gaston> <20050310162721.19003dac.moilanen@austin.ibm.com> Message-ID: <1110494668.32525.283.camel@gaston> > /* Free memory returned from module_alloc */ > diff -puN arch/ppc64/mm/fault.c~nx-kernel-ppc64 arch/ppc64/mm/fault.c > --- linux-2.6-bk/arch/ppc64/mm/fault.c~nx-kernel-ppc64 2005-03-10 13:54:14 -06:00 > +++ linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c 2005-03-10 13:54:14 -06:00 > @@ -76,6 +76,13 @@ static int store_updates_sp(struct pt_re > return 0; > } > > +pte_t *lookup_address(unsigned long address) > +{ > + pgd_t *pgd = pgd_offset_k(address); > + > + return find_linux_pte(pgd, address); > +} static please, even inline in this case. I've removed Andrew from CC upon his request, Paul, Anton or I will forward to him when it's ready, no need to clobber his mailbox in the meantime. Ben. 
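[Illustration, not part of the thread] The review comments above converge on one pattern: a small address-range predicate that is needed in more than one file is best defined once as a static inline in a shared header, rather than as a global function that the second file never declares. Below is a minimal user-space sketch of that pattern. Everything named fake_* or FAKE_* is a hypothetical stand-in (the real helper compares against the linker symbols _stext and __init_end), so read it as a sketch under those assumptions, not as the kernel code.

```c
#include <assert.h>   /* only needed for the usage checks shown below */
#include <stdint.h>

/* Hypothetical stand-in for the kernel image; not a real kernel symbol. */
static char fake_image[4096];

/* Hypothetical stand-ins for the linker symbols _stext and __init_end. */
#define FAKE_STEXT    (&fake_image[0])
#define FAKE_INIT_END (&fake_image[1024])

/* Half-open range check: start is inside the range, end is outside. */
static inline int in_text_range(uintptr_t addr, uintptr_t start, uintptr_t end)
{
	return addr >= start && addr < end;
}

/*
 * Because this is static inline, every file that includes the header
 * gets its own copy, so no extern declaration is needed anywhere.
 */
static inline int in_fake_kernel_text(uintptr_t addr)
{
	return in_text_range(addr, (uintptr_t)FAKE_STEXT,
			     (uintptr_t)FAKE_INIT_END);
}
```

With these definitions, in_fake_kernel_text((uintptr_t)&fake_image[1023]) is true while in_fake_kernel_text((uintptr_t)&fake_image[1024]) is false, which mirrors the "addr < __init_end" upper bound used in the patch.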
From olof at austin.ibm.com Fri Mar 11 10:39:32 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Thu, 10 Mar 2005 17:39:32 -0600 Subject: [PATCH] ppc64: kill might_sleep() warnings in __copy_*_user_inatomic In-Reply-To: <200503102054.38123.arnd@arndb.de> References: <200503102054.38123.arnd@arndb.de> Message-ID: <20050310233932.GA26823@austin.ibm.com> On Thu, Mar 10, 2005 at 08:54:36PM +0100, Arnd Bergmann wrote: > I currently get warnings from futex resulting from Olofs futex+rwsem fix > combined with the fact that ppc64 __copy_from_user has a might_sleep > check in it: > > [ 9607.577071] Debug: sleeping function called from invalid context at include2/asm/uaccess.h:2 > 28 > [ 9607.676181] in_atomic():1, irqs_disabled():0 > [ 9607.724741] Call Trace: > [ 9607.752058] [c00000000d68fab0] [c000000001f0fb80] 0xc000000001f0fb80 (unreliable) > [ 9607.835030] [c00000000d68fb30] [c000000000042420] .__might_sleep+0xf8/0x108 > [ 9607.912936] [c00000000d68fbd0] [c00000000006ac34] .do_futex+0x224/0x858 > > The fix is to do the check only in copy_*_user, not __copy_*_user. This is the > same that most other architectures do. Actually, I think I would prefer the following. It renames current __copy_{to,from}_user to __copy_{to,from}_user_inatomic, adds the old ones as inlines doing the might_sleep() and calling the inatomics afterwards. This way the calls to __copy_{to,from}_user() will be caught if called under lock or preemption as well. This is also how i386 does it. This was coded up during travelling, so I haven't been able to boot the patch, only build it. Dave Jones made me aware of it since he hit exactly the above on ppc64 himself, but it was right before I left town. -Olof --- This implements the __copy_{to,from}_user_inatomic() functions on ppc64. The only difference between the inatomic and regular version is that inatomic does not call might_sleep() to detect possible faults while holding locks/elevated preempt counts. 
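[Illustration, not part of the patch] The layering described above can be sketched in user space as follows. This is a minimal model, assuming a plain memcpy in place of the real user-access routine and a counter in place of the kernel's warning; every fake_* name is hypothetical and exists only for this example.

```c
#include <assert.h>   /* only needed for the usage checks shown below */
#include <string.h>

/* Hypothetical stand-in for "we are in atomic context". */
static int fake_in_atomic;

/* Counts how often the debug check fired; the kernel would print a warning. */
static int fake_warnings;

/* Stand-in for might_sleep(): complain when called in atomic context. */
static void fake_might_sleep(void)
{
	if (fake_in_atomic)
		fake_warnings++;
}

/*
 * "inatomic" variant: performs the copy and never runs the sleep check.
 * Returns the number of bytes that could not be copied (always 0 here).
 */
static inline unsigned long
fake_copy_inatomic(void *to, const void *from, unsigned long n)
{
	memcpy(to, from, n);
	return 0;
}

/* Regular variant: run the debug check first, then delegate. */
static inline unsigned long
fake_copy(void *to, const void *from, unsigned long n)
{
	fake_might_sleep();
	return fake_copy_inatomic(to, from, n);
}
```

With fake_in_atomic set to 1, a call to fake_copy_inatomic() leaves fake_warnings untouched, while a call to fake_copy() increments it by one. That is the behaviour the description asks for: callers that know they are atomic use the inatomic variant, and everyone else keeps the check.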
Signed-off-by: Olof Johansson Index: linux-2.5/include/asm-ppc64/uaccess.h =================================================================== --- linux-2.5.orig/include/asm-ppc64/uaccess.h 2005-03-09 17:17:31.000000000 -0600 +++ linux-2.5/include/asm-ppc64/uaccess.h 2005-03-09 17:21:01.000000000 -0600 @@ -223,9 +223,8 @@ extern unsigned long __copy_tofrom_user( unsigned long size); static inline unsigned long -__copy_from_user(void *to, const void __user *from, unsigned long n) +__copy_from_user_inatomic(void *to, const void __user *from, unsigned long n) { - might_sleep(); if (__builtin_constant_p(n)) { unsigned long ret; @@ -248,9 +247,15 @@ __copy_from_user(void *to, const void __ } static inline unsigned long -__copy_to_user(void __user *to, const void *from, unsigned long n) +__copy_from_user(void *to, const void __user *from, unsigned long n) { might_sleep(); + return __copy_from_user_inatomic(to, from, n); +} + +static inline unsigned long +__copy_to_user_inatomic(void __user *to, const void *from, unsigned long n) +{ if (__builtin_constant_p(n)) { unsigned long ret; @@ -272,6 +277,13 @@ __copy_to_user(void __user *to, const vo return __copy_tofrom_user(to, (__force const void __user *) from, n); } +static inline unsigned long +__copy_to_user(void __user *to, const void *from, unsigned long n) +{ + might_sleep(); + return __copy_to_user_inatomic(to, from, n); +} + #define __copy_in_user(to, from, size) \ __copy_tofrom_user((to), (from), (size)) @@ -284,9 +296,6 @@ extern unsigned long copy_in_user(void _ extern unsigned long __clear_user(void __user *addr, unsigned long size); -#define __copy_to_user_inatomic __copy_to_user -#define __copy_from_user_inatomic __copy_from_user - static inline unsigned long clear_user(void __user *addr, unsigned long size) { From kravetz at us.ibm.com Fri Mar 11 10:42:47 2005 From: kravetz at us.ibm.com (mike kravetz) Date: Thu, 10 Mar 2005 15:42:47 -0800 Subject: [PATCH] PPC64 NUMA memory fixup In-Reply-To: 
<20050310023613.23499386.akpm@osdl.org> References: <16942.30144.513313.26103@cargo.ozlabs.ibm.com> <20050310023613.23499386.akpm@osdl.org> Message-ID: <20050310234247.GA8276@w-mikek2.ibm.com> On Thu, Mar 10, 2005 at 02:36:13AM -0800, Andrew Morton wrote: > Paul Mackerras wrote: > > > > When I booted my new 720 on a kernel configured for NUMA, I received > > the following during bootup: > > > > WARNING: Unexpected node layout: region start 44000000 length 2000000 > > NUMA is disabled > > > > This is due to memory 'holes' within nodes. If such holes are > > encountered, then NUMA is disabled. The following patch adds support > > for such configurations. My 720 now boots with the following message: > > This patch causes the non-numa G5 to oops very early in boot in > smp_call_function(). > I can't recreate this on my system here even if I start with Andrew's config file. In addition, I don't have access to a PMAC to even try and figure out the flow at boot time. My guess is that this is related to the extra scan of memory sections (via of_find_node_by_type()) in do_init_bootmem. Is there something inherently different between making these calls on a LPAR as opposed to PMAC? I'm going to start looking for a PMAC so I can get more info. Any other suggestions on how to track this down are appreciated. Thanks, -- Mike From david at gibson.dropbear.id.au Fri Mar 11 11:11:20 2005 From: david at gibson.dropbear.id.au (David Gibson) Date: Fri, 11 Mar 2005 11:11:20 +1100 Subject: [PPC64] Allow emulation of mfpvr on ppc64 kernel In-Reply-To: <200503102317.04027.ioe-lkml@axxeo.de> References: <20050310021848.GD30435@localhost.localdomain> <200503102317.04027.ioe-lkml@axxeo.de> Message-ID: <20050311001120.GA6512@localhost.localdomain> On Thu, Mar 10, 2005 at 11:17:03PM +0100, Ingo Oeser wrote: > David Gibson wrote: > > Andrew, please apply. > > > > Allow userspace programs on ppc64 to use the (privileged) mfpvr > > instruction to determine the processor type. 
At the moment it > > emulates the instruction to provide the real PVR value, though it > > could be made to lie in future if for some reason we wish to restrict > > what CPU features userspace uses. > > Why not putting the required information into the AUX table > when executing your ELF programs? I loved this feature in the > ix86 arch. Because this is easy and is the way we already do it on ppc32..? -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson From ntl at pobox.com Fri Mar 11 12:24:20 2005 From: ntl at pobox.com (Nathan Lynch) Date: Thu, 10 Mar 2005 19:24:20 -0600 Subject: [PATCH 2/8] make OF node fixup code usable at runtime In-Reply-To: <1110473410.29353.4.camel@sinatra.austin.ibm.com> References: <20050310005132.31309.65485.31668@otto> <20050310005142.31309.45788.99418@otto> <1110473410.29353.4.camel@sinatra.austin.ibm.com> Message-ID: <20050311012420.GD21853@otto> On Thu, Mar 10, 2005 at 10:50:10AM -0600, John Rose wrote: > > > > These functions, if passed a null mem_start argument, will > > use kmalloc for allocating extra data structures for the device node > > being processed. > > Might it be possible to use the dynamic flag of the device node to > decide when to use kmalloc? D'oh... that didn't occur to me. I think I will use this idea, it should reduce the total size of the changes. 
Nathan From ntl at pobox.com Fri Mar 11 12:30:47 2005 From: ntl at pobox.com (Nathan Lynch) Date: Thu, 10 Mar 2005 19:30:47 -0600 Subject: [PATCH 2/8] make OF node fixup code usable at runtime In-Reply-To: <4230A2F2.7020403@austin.ibm.com> References: <20050310005132.31309.65485.31668@otto> <20050310005142.31309.45788.99418@otto> <4230A2F2.7020403@austin.ibm.com> Message-ID: <20050311013047.GE21853@otto> On Thu, Mar 10, 2005 at 01:41:38PM -0600, Joel Schopp wrote: > Nathan Lynch wrote: > > >-static int of_finish_dynamic_node(struct device_node *node) > >+static int of_finish_dynamic_node(struct device_node *node, > >+ unsigned long *unused1, int unused2, > >+ int unused3, int unused4) > > { > > > Is there a reason for these 4 unused fields that I am just missing? > In order for it to be correctly used as an argument to finish_node, of_finish_dynamic_node needs to have a definition compatible with the interpret_func typedef. Nathan From ntl at pobox.com Fri Mar 11 12:37:53 2005 From: ntl at pobox.com (Nathan Lynch) Date: Thu, 10 Mar 2005 19:37:53 -0600 Subject: [PATCH 4/8] prom.c: use pSeries reconfig notifier In-Reply-To: <1110475200.29353.12.camel@sinatra.austin.ibm.com> References: <20050310005132.31309.65485.31668@otto> <20050310005152.31309.41959.21947@otto> <1110475200.29353.12.camel@sinatra.austin.ibm.com> Message-ID: <20050311013753.GF21853@otto> On Thu, Mar 10, 2005 at 11:20:00AM -0600, John Rose wrote: > Quick comment on this- > > > void of_add_node(struct device_node *np) > > { > > - int err; > > - > > - /* This use of finish_node will be moved to a notifier so > > - * the error code can be used. 
> > - */ > > - err = finish_node(np, NULL, of_finish_dynamic_node, 0, 0, 0); > > - if (err < 0) > > - return; > > - > > write_lock(&devtree_lock); > > np->sibling = np->parent->child; > > np->allnext = allnodes; > > @@ -1682,6 +1674,36 @@ void of_remove_node(const struct device_ > > write_unlock(&devtree_lock); > > } > > If I understand correctly, of_add_node() now simply adds the node to > relevant lists and sets relational pointers to position it in the device > tree. The allocation and other setup has been moved out of the > function. Might it be more clear to rename it to of_attach_node() or > something similar? > Yup, and perhaps rename of_remove_node to of_detach_node or similar, since the semantics have changed slightly there also. Nathan From ntl at pobox.com Fri Mar 11 12:43:26 2005 From: ntl at pobox.com (Nathan Lynch) Date: Thu, 10 Mar 2005 19:43:26 -0600 Subject: [PATCH 4/8] prom.c: use pSeries reconfig notifier In-Reply-To: <200503102005.29526.arnd@arndb.de> References: <20050310005132.31309.65485.31668@otto> <20050310005152.31309.41959.21947@otto> <200503102005.29526.arnd@arndb.de> Message-ID: <20050311014326.GG21853@otto> On Thu, Mar 10, 2005 at 08:05:28PM +0100, Arnd Bergmann wrote: > On Dunnersdag 10 März 2005 01:51, Nathan Lynch wrote: > >  void of_add_node(struct device_node *np) > >  { > > While looking at the of_add_node code, I noticed that there are some > memory holes in the error path of that function. They should be trivial > to fix with the attached patch, but I can't test this because I don't > have reconfigurable machines. > I noticed this too and tried to fix it up in pSeries_reconfig_add_node in the previous (#3) patch, but I made a mess of it and will need to rework it. Thanks for pointing this out. 
Nathan From ntl at pobox.com Fri Mar 11 12:46:36 2005 From: ntl at pobox.com (Nathan Lynch) Date: Thu, 10 Mar 2005 19:46:36 -0600 Subject: [PATCH 3/8] introduce pSeries_reconfig.[ch] In-Reply-To: <20050310005147.31309.61029.66648@otto> References: <20050310005132.31309.65485.31668@otto> <20050310005147.31309.61029.66648@otto> Message-ID: <20050311014636.GH21853@otto> > +static int pSeries_reconfig_add_node(const char *path, struct property *proplist) > +{ > + struct device_node *np; > + int err = -ENOMEM; > + > + np = kcalloc(1, sizeof(*np), GFP_KERNEL); > + if (!np) > + goto out_err; > + > + np->full_name = kmalloc(strlen(path) + 1, GFP_KERNEL); > + if (!np->full_name) > + goto out_err; > + ... > + > +out_err: > + kfree(np->full_name); > + kfree(np); > + return err; > +} Bah, potential null pointer dereference in the first kfree, I'll need to fix that up. Nathan From ntl at pobox.com Fri Mar 11 12:53:39 2005 From: ntl at pobox.com (Nathan Lynch) Date: Thu, 10 Mar 2005 19:53:39 -0600 Subject: [PATCH 7/8] pSeries_smp.c: use pSeries reconfig notifier for cpu DLPAR In-Reply-To: <4230AD96.7020508@austin.ibm.com> References: <20050310005132.31309.65485.31668@otto> <20050310005207.31309.32546.73375@otto> <4230AD96.7020508@austin.ibm.com> Message-ID: <20050311015339.GI21853@otto> On Thu, Mar 10, 2005 at 02:27:02PM -0600, Joel Schopp wrote: > This patch seems fine. Just a couple trivial comments. > > >+static int pSeries_add_processor(struct device_node *np) > > This function doesn't really add a processor; it could use a better name. > > >+static void pSeries_remove_processor(struct device_node *np) > > This doesn't remove a processor; it could also use a better name. > They do add and remove processors in the sense that they update cpu_present_map, which is the kernel's logical representation of the cpus actually resident in the system. I'm sort of drawing a blank trying to think of alternatives. If you've better ideas please share. 
Nathan From paulus at samba.org Fri Mar 11 13:34:28 2005 From: paulus at samba.org (Paul Mackerras) Date: Fri, 11 Mar 2005 13:34:28 +1100 Subject: [PPC64] Allow emulation of mfpvr on ppc64 kernel In-Reply-To: <200503102317.04027.ioe-lkml@axxeo.de> References: <20050310021848.GD30435@localhost.localdomain> <200503102317.04027.ioe-lkml@axxeo.de> Message-ID: <16945.948.350317.549743@cargo.ozlabs.ibm.com> Ingo Oeser writes: > Why not putting the required information into the AUX table > when executing your ELF programs? I loved this feature in the > ix86 arch. We do put an AT_HWCAP entry in the aux table, which is a bitmap of features supported by the cpu. But for some applications, such as programming the performance monitor hardware, you need to know the specific CPU model and version, and this is a way to provide that information. Paul. From paulus at samba.org Fri Mar 11 14:02:17 2005 From: paulus at samba.org (Paul Mackerras) Date: Fri, 11 Mar 2005 14:02:17 +1100 Subject: [PATCH] AGP support for powermac G5 Message-ID: <16945.2617.625095.404994@cargo.ozlabs.ibm.com> This patch adds AGP support for the U3 northbridge used in Apple G5 machines to drivers/char/agp/uninorth-agp.c. This patch is based on earlier work by Jerome Glisse. With this patch, the driver works in both ppc32 and ppc64 kernels. 
Signed-off-by: Paul Mackerras diff -urN linux-2.5/drivers/char/agp/Kconfig g5/drivers/char/agp/Kconfig --- linux-2.5/drivers/char/agp/Kconfig 2005-03-07 14:01:44.000000000 +1100 +++ g5/drivers/char/agp/Kconfig 2005-03-11 13:53:47.000000000 +1100 @@ -1,6 +1,6 @@ config AGP tristate "/dev/agpgart (AGP Support)" if !GART_IOMMU - depends on ALPHA || IA64 || PPC32 || X86 + depends on ALPHA || IA64 || PPC || X86 default y if GART_IOMMU ---help--- AGP (Accelerated Graphics Port) is a bus system mainly used to @@ -146,11 +146,11 @@ default AGP config AGP_UNINORTH - tristate "Apple UniNorth AGP support" + tristate "Apple UniNorth & U3 AGP support" depends on AGP && PPC_PMAC help This option gives you AGP support for Apple machines with a - UniNorth bridge. + UniNorth or U3 (Apple G5) bridge. config AGP_EFFICEON tristate "Transmeta Efficeon support" diff -urN linux-2.5/drivers/char/agp/uninorth-agp.c g5/drivers/char/agp/uninorth-agp.c --- linux-2.5/drivers/char/agp/uninorth-agp.c 2005-03-11 11:47:37.000000000 +1100 +++ g5/drivers/char/agp/uninorth-agp.c 2005-03-11 11:54:54.000000000 +1100 @@ -9,8 +9,23 @@ #include #include #include +#include #include "agp.h" +/* + * NOTES for uninorth3 (G5 AGP) supports : + * + * There maybe also possibility to have bigger cache line size for + * agp (see pmac_pci.c and look for cache line). Need to be investigated + * by someone. + * + * PAGE size are hardcoded but this may change, see asm/page.h. 
+ * + * Jerome Glisse + */ +static int uninorth_rev; +static int is_u3; + static int uninorth_fetch_size(void) { int i; @@ -40,14 +55,20 @@ static void uninorth_tlbflush(struct agp_memory *mem) { + u32 ctrl = UNI_N_CFG_GART_ENABLE; + + if (is_u3) + ctrl |= U3_N_CFG_GART_PERFRD; pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, - UNI_N_CFG_GART_ENABLE | UNI_N_CFG_GART_INVAL); - pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, - UNI_N_CFG_GART_ENABLE); - pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, - UNI_N_CFG_GART_ENABLE | UNI_N_CFG_GART_2xRESET); - pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, - UNI_N_CFG_GART_ENABLE); + ctrl | UNI_N_CFG_GART_INVAL); + pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, ctrl); + + if (uninorth_rev <= 0x30) { + pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, + ctrl | UNI_N_CFG_GART_2xRESET); + pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, + ctrl); + } } static void uninorth_cleanup(void) @@ -57,14 +78,16 @@ pci_read_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, &tmp); if (!(tmp & UNI_N_CFG_GART_ENABLE)) return; - pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, - UNI_N_CFG_GART_ENABLE | UNI_N_CFG_GART_INVAL); - pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, - 0); - pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, - UNI_N_CFG_GART_2xRESET); - pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, - 0); + tmp |= UNI_N_CFG_GART_INVAL; + pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, tmp); + pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, 0); + + if (uninorth_rev <= 0x30) { + pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, + UNI_N_CFG_GART_2xRESET); + pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_GART_CTRL, + 0); + } } static int uninorth_configure(void) @@ -87,8 +110,21 @@ * the AGP aperture isn't mapped at bus physical address 0 */ 
agp_bridge->gart_bus_addr = 0; +#ifdef CONFIG_PPC64 + /* Assume U3 or later on PPC64 systems */ + /* high 4 bits of GART physical address go in UNI_N_CFG_AGP_BASE */ + pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_AGP_BASE, + (agp_bridge->gatt_bus_addr >> 32) & 0xf); +#else pci_write_config_dword(agp_bridge->dev, UNI_N_CFG_AGP_BASE, agp_bridge->gart_bus_addr); +#endif + + if (is_u3) { + pci_write_config_dword(agp_bridge->dev, + UNI_N_CFG_GART_DUMMY_PAGE, + agp_bridge->scratch_page_real >> 12); + } return 0; } @@ -111,13 +147,14 @@ j = pg_start; while (j < (pg_start + mem->page_count)) { - if (!PGE_EMPTY(agp_bridge, agp_bridge->gatt_table[j])) + if (agp_bridge->gatt_table[j]) return -EBUSY; j++; } for (i = 0, j = pg_start; i < mem->page_count; i++, j++) { - agp_bridge->gatt_table[j] = cpu_to_le32((mem->memory[i] & 0xfffff000) | 0x00000001UL); + agp_bridge->gatt_table[j] = + cpu_to_le32((mem->memory[i] & 0xFFFFF000UL) | 0x1UL); flush_dcache_range((unsigned long)__va(mem->memory[i]), (unsigned long)__va(mem->memory[i])+0x1000); } @@ -130,17 +167,90 @@ return 0; } +static int u3_insert_memory(struct agp_memory *mem, off_t pg_start, int type) +{ + int i, num_entries; + void *temp; + u32 *gp; + + temp = agp_bridge->current_size; + num_entries = A_SIZE_32(temp)->num_entries; + + if (type != 0 || mem->type != 0) + /* We know nothing of memory types */ + return -EINVAL; + if ((pg_start + mem->page_count) > num_entries) + return -EINVAL; + + gp = (u32 *) &agp_bridge->gatt_table[pg_start]; + for (i = 0; i < mem->page_count; ++i) { + if (gp[i]) { + printk("u3_insert_memory: entry 0x%x occupied (%x)\n", + i, gp[i]); + return -EBUSY; + } + } + + for (i = 0; i < mem->page_count; i++) { + gp[i] = (mem->memory[i] >> PAGE_SHIFT) | 0x80000000UL; + flush_dcache_range((unsigned long)__va(mem->memory[i]), + (unsigned long)__va(mem->memory[i])+0x1000); + } + mb(); + flush_dcache_range((unsigned long)gp, (unsigned long) &gp[i]); + uninorth_tlbflush(mem); + + return 0; +} + +int 
u3_remove_memory(struct agp_memory *mem, off_t pg_start, int type) +{ + size_t i; + u32 *gp; + + if (type != 0 || mem->type != 0) + /* We know nothing of memory types */ + return -EINVAL; + + gp = (u32 *) &agp_bridge->gatt_table[pg_start]; + for (i = 0; i < mem->page_count; ++i) + gp[i] = 0; + mb(); + flush_dcache_range((unsigned long)gp, (unsigned long) &gp[i]); + uninorth_tlbflush(mem); + + return 0; +} + static void uninorth_agp_enable(struct agp_bridge_data *bridge, u32 mode) { - u32 command, scratch; + u32 command, scratch, status; int timeout; pci_read_config_dword(bridge->dev, bridge->capndx + PCI_AGP_STATUS, - &command); + &status); - command = agp_collect_device_status(bridge, mode, command); - command |= 0x100; + command = agp_collect_device_status(bridge, mode, status); + command |= PCI_AGP_COMMAND_AGP; + + if (uninorth_rev == 0x21) { + /* + * Darwin disable AGP 4x on this revision, thus we + * may assume it's broken. This is an AGP2 controller. + */ + command &= ~AGPSTAT2_4X; + } + + if ((uninorth_rev >= 0x30) && (uninorth_rev <= 0x33)) { + /* + * We need to to set REQ_DEPTH to 7 for U3 versions 1.0, 2.1, + * 2.2 and 2.3, Darwin do so. 
+ */ + if ((command >> AGPSTAT_RQ_DEPTH_SHIFT) > 7) + command = (command & ~AGPSTAT_RQ_DEPTH) + | (7 << AGPSTAT_RQ_DEPTH_SHIFT); + } uninorth_tlbflush(NULL); @@ -152,11 +262,17 @@ pci_read_config_dword(bridge->dev, bridge->capndx + PCI_AGP_COMMAND, &scratch); - } while ((scratch & 0x100) == 0 && ++timeout < 1000); - if ((scratch & 0x100) == 0) + } while ((scratch & PCI_AGP_COMMAND_AGP) == 0 && ++timeout < 1000); + if ((scratch & PCI_AGP_COMMAND_AGP) == 0) printk(KERN_ERR PFX "failed to write UniNorth AGP command reg\n"); - agp_device_command(command, 0); + if (uninorth_rev >= 0x30) { + /* This is an AGP V3 */ + agp_device_command(command, (status & AGPSTAT_MODE_3_0)); + } else { + /* AGP V2 */ + agp_device_command(command, 0); + } uninorth_tlbflush(NULL); } @@ -229,12 +345,12 @@ struct page *page; /* We can't handle 2 level gatt's */ - if (agp_bridge->driver->size_type == LVL2_APER_SIZE) + if (bridge->driver->size_type == LVL2_APER_SIZE) return -EINVAL; table = NULL; - i = agp_bridge->aperture_size_idx; - temp = agp_bridge->current_size; + i = bridge->aperture_size_idx; + temp = bridge->current_size; size = page_order = num_entries = 0; do { @@ -246,11 +362,11 @@ if (table == NULL) { i++; - agp_bridge->current_size = A_IDX32(agp_bridge); + bridge->current_size = A_IDX32(bridge); } else { - agp_bridge->aperture_size_idx = i; + bridge->aperture_size_idx = i; } - } while (!table && (i < agp_bridge->driver->num_aperture_sizes)); + } while (!table && (i < bridge->driver->num_aperture_sizes)); if (table == NULL) return -ENOMEM; @@ -260,14 +376,12 @@ for (page = virt_to_page(table); page <= virt_to_page(table_end); page++) SetPageReserved(page); - agp_bridge->gatt_table_real = (u32 *) table; - agp_bridge->gatt_table = (u32 *)table; - agp_bridge->gatt_bus_addr = virt_to_phys(table); - - for (i = 0; i < num_entries; i++) { - agp_bridge->gatt_table[i] = - (unsigned long) agp_bridge->scratch_page; - } + bridge->gatt_table_real = (u32 *) table; + bridge->gatt_table = (u32 
*)table; + bridge->gatt_bus_addr = virt_to_phys(table); + + for (i = 0; i < num_entries; i++) + bridge->gatt_table[i] = 0; flush_dcache_range((unsigned long)table, (unsigned long)table_end); @@ -281,7 +395,7 @@ void *temp; struct page *page; - temp = agp_bridge->current_size; + temp = bridge->current_size; page_order = A_SIZE_32(temp)->page_order; /* Do not worry about freeing memory, because if this is @@ -289,13 +403,13 @@ * from the table. */ - table = (char *) agp_bridge->gatt_table_real; + table = (char *) bridge->gatt_table_real; table_end = table + ((PAGE_SIZE * (1 << page_order)) - 1); for (page = virt_to_page(table); page <= virt_to_page(table_end); page++) ClearPageReserved(page); - free_pages((unsigned long) agp_bridge->gatt_table_real, page_order); + free_pages((unsigned long) bridge->gatt_table_real, page_order); return 0; } @@ -320,6 +434,22 @@ {4, 1024, 0, 1} }; +/* + * Not sure that u3 supports that high aperture sizes but it + * would strange if it did not :) + */ +static struct aper_size_info_32 u3_sizes[8] = +{ + {512, 131072, 7, 128}, + {256, 65536, 6, 64}, + {128, 32768, 5, 32}, + {64, 16384, 4, 16}, + {32, 8192, 3, 8}, + {16, 4096, 2, 4}, + {8, 2048, 1, 2}, + {4, 1024, 0, 1} +}; + struct agp_bridge_driver uninorth_agp_driver = { .owner = THIS_MODULE, .aperture_sizes = (void *)uninorth_sizes, @@ -344,6 +474,31 @@ .cant_use_aperture = 1, }; +struct agp_bridge_driver u3_agp_driver = { + .owner = THIS_MODULE, + .aperture_sizes = (void *)u3_sizes, + .size_type = U32_APER_SIZE, + .num_aperture_sizes = 8, + .configure = uninorth_configure, + .fetch_size = uninorth_fetch_size, + .cleanup = uninorth_cleanup, + .tlb_flush = uninorth_tlbflush, + .mask_memory = agp_generic_mask_memory, + .masks = NULL, + .cache_flush = null_cache_flush, + .agp_enable = uninorth_agp_enable, + .create_gatt_table = uninorth_create_gatt_table, + .free_gatt_table = uninorth_free_gatt_table, + .insert_memory = u3_insert_memory, + .remove_memory = u3_remove_memory, + 
.alloc_by_type = agp_generic_alloc_by_type, + .free_by_type = agp_generic_free_by_type, + .agp_alloc_page = agp_generic_alloc_page, + .agp_destroy_page = agp_generic_destroy_page, + .cant_use_aperture = 1, + .needs_scratch_page = 1, +}; + static struct agp_device_ids uninorth_agp_device_ids[] __devinitdata = { { .device_id = PCI_DEVICE_ID_APPLE_UNI_N_AGP, @@ -361,6 +516,18 @@ .device_id = PCI_DEVICE_ID_APPLE_UNI_N_AGP2, .chipset_name = "UniNorth 2", }, + { + .device_id = PCI_DEVICE_ID_APPLE_U3_AGP, + .chipset_name = "U3", + }, + { + .device_id = PCI_DEVICE_ID_APPLE_U3L_AGP, + .chipset_name = "U3L", + }, + { + .device_id = PCI_DEVICE_ID_APPLE_U3H_AGP, + .chipset_name = "U3H", + }, }; static int __devinit agp_uninorth_probe(struct pci_dev *pdev, @@ -368,6 +535,7 @@ { struct agp_device_ids *devs = uninorth_agp_device_ids; struct agp_bridge_data *bridge; + struct device_node *uninorth_node; u8 cap_ptr; int j; @@ -389,13 +557,36 @@ return -ENODEV; found: + /* Set revision to 0 if we could not read it. 
*/ + uninorth_rev = 0; + is_u3 = 0; + /* Locate core99 Uni-N */ + uninorth_node = of_find_node_by_name(NULL, "uni-n"); + /* Locate G5 u3 */ + if (uninorth_node == NULL) { + is_u3 = 1; + uninorth_node = of_find_node_by_name(NULL, "u3"); + } + if (uninorth_node) { + int *revprop = (int *) + get_property(uninorth_node, "device-rev", NULL); + if (revprop != NULL) + uninorth_rev = *revprop & 0x3f; + of_node_put(uninorth_node); + } + bridge = agp_alloc_bridge(); if (!bridge) return -ENOMEM; - bridge->driver = &uninorth_agp_driver; + if (is_u3) + bridge->driver = &u3_agp_driver; + else + bridge->driver = &uninorth_agp_driver; + bridge->dev = pdev; bridge->capndx = cap_ptr; + bridge->flags = AGP_ERRATA_FASTWRITES; /* Fill in the mode register */ pci_read_config_dword(pdev, cap_ptr+PCI_AGP_STATUS, &bridge->mode); diff -urN linux-2.5/include/asm-ppc/uninorth.h g5/include/asm-ppc/uninorth.h --- linux-2.5/include/asm-ppc/uninorth.h 2005-01-31 17:22:37.000000000 +1100 +++ g5/include/asm-ppc/uninorth.h 2005-03-11 11:54:54.000000000 +1100 @@ -27,13 +27,18 @@ #define UNI_N_CFG_AGP_BASE 0x90 #define UNI_N_CFG_GART_CTRL 0x94 #define UNI_N_CFG_INTERNAL_STATUS 0x98 +#define UNI_N_CFG_GART_DUMMY_PAGE 0xa4 /* UNI_N_CFG_GART_CTRL bits definitions */ -/* Not U3 */ #define UNI_N_CFG_GART_INVAL 0x00000001 #define UNI_N_CFG_GART_ENABLE 0x00000100 #define UNI_N_CFG_GART_2xRESET 0x00010000 #define UNI_N_CFG_GART_DISSBADET 0x00020000 +/* The following seems to only be used only on U3 */ +#define U3_N_CFG_GART_SYNCMODE 0x00040000 +#define U3_N_CFG_GART_PERFRD 0x00080000 +#define U3_N_CFG_GART_B2BGNT 0x00200000 +#define U3_N_CFG_GART_FASTDDR 0x00400000 /* My understanding of UniNorth AGP as of UniNorth rev 1.0x, * revision 1.5 (x4 AGP) may need further changes. 
diff -urN linux-2.5/include/asm-ppc64/agp.h g5/include/asm-ppc64/agp.h --- /dev/null 2005-03-10 17:27:14.905983648 +1100 +++ g5/include/asm-ppc64/agp.h 2005-03-11 11:54:54.000000000 +1100 @@ -0,0 +1,13 @@ +#ifndef AGP_H +#define AGP_H 1 + +#include + +/* nothing much needed here */ + +#define map_page_into_agp(page) +#define unmap_page_from_agp(page) +#define flush_agp_mappings() +#define flush_agp_cache() mb() + +#endif diff -urN linux-2.5/include/linux/pci_ids.h g5/include/linux/pci_ids.h --- linux-2.5/include/linux/pci_ids.h 2005-03-11 11:47:38.000000000 +1100 +++ g5/include/linux/pci_ids.h 2005-03-11 11:54:54.000000000 +1100 @@ -876,10 +876,13 @@ #define PCI_DEVICE_ID_APPLE_IPID_ATA100 0x003b #define PCI_DEVICE_ID_APPLE_KEYLARGO_I 0x003e #define PCI_DEVICE_ID_APPLE_K2_ATA100 0x0043 +#define PCI_DEVICE_ID_APPLE_U3_AGP 0x004b #define PCI_DEVICE_ID_APPLE_K2_GMAC 0x004c #define PCI_DEVICE_ID_APPLE_SH_ATA 0x0050 #define PCI_DEVICE_ID_APPLE_SH_SUNGEM 0x0051 #define PCI_DEVICE_ID_APPLE_SH_FW 0x0052 +#define PCI_DEVICE_ID_APPLE_U3L_AGP 0x0058 +#define PCI_DEVICE_ID_APPLE_U3H_AGP 0x0059 #define PCI_DEVICE_ID_APPLE_TIGON3 0x1645 #define PCI_VENDOR_ID_YAMAHA 0x1073 From benh at kernel.crashing.org Fri Mar 11 14:01:26 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 11 Mar 2005 14:01:26 +1100 Subject: [PATCH 2/8] make OF node fixup code usable at runtime In-Reply-To: <20050310005142.31309.45788.99418@otto> References: <20050310005132.31309.65485.31668@otto> <20050310005142.31309.45788.99418@otto> Message-ID: <1110510087.32525.334.camel@gaston> On Wed, 2005-03-09 at 18:51 -0600, Nathan Lynch wrote: > At boot we recurse through the device tree "fixing up" various fields > and properties in the device nodes. Long ago, to support DLPAR and > hotplug, we largely duplicated some of this fixup code, the main > difference being that the new code used kmalloc for allocating various > data structures which are attached to the new device nodes. 
> > This patch kills most of the duplicated code and makes finish_node, > finish_node_interrupts, and interpret_pci_props suitable for use at > runtime. These functions, if passed a null mem_start argument, will > use kmalloc for allocating extra data structures for the device node > being processed. Not terribly elegant, but it seems worth it to get > rid of the duplicated code (and bugs). Maybe hide that logic in a macro or inline ? Ben. From ncunningham at cyclades.com Fri Mar 11 14:52:47 2005 From: ncunningham at cyclades.com (Nigel Cunningham) Date: Fri, 11 Mar 2005 14:52:47 +1100 Subject: [PATCH] AGP support for powermac G5 In-Reply-To: <16945.2617.625095.404994@cargo.ozlabs.ibm.com> References: <16945.2617.625095.404994@cargo.ozlabs.ibm.com> Message-ID: <1110513167.3049.45.camel@desktop.cunningham.myip.net.au> Hi. On Fri, 2005-03-11 at 14:02, Paul Mackerras wrote: > +struct agp_bridge_driver u3_agp_driver = { > + .owner = THIS_MODULE, > + .aperture_sizes = (void *)u3_sizes, > + .size_type = U32_APER_SIZE, > + .num_aperture_sizes = 8, > + .configure = uninorth_configure, > + .fetch_size = uninorth_fetch_size, > + .cleanup = uninorth_cleanup, > + .tlb_flush = uninorth_tlbflush, > + .mask_memory = agp_generic_mask_memory, > + .masks = NULL, > + .cache_flush = null_cache_flush, > + .agp_enable = uninorth_agp_enable, > + .create_gatt_table = uninorth_create_gatt_table, > + .free_gatt_table = uninorth_free_gatt_table, > + .insert_memory = u3_insert_memory, > + .remove_memory = u3_remove_memory, > + .alloc_by_type = agp_generic_alloc_by_type, > + .free_by_type = agp_generic_free_by_type, > + .agp_alloc_page = agp_generic_alloc_page, > + .agp_destroy_page = agp_generic_destroy_page, > + .cant_use_aperture = 1, > + .needs_scratch_page = 1, > +}; > + No power management support? 
:> Regards, Nigel -- Nigel Cunningham Software Engineer, Canberra, Australia http://www.cyclades.com Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028; Mob: +61 (417) 100 574 Maintainer of Suspend2 Kernel Patches http://suspend2.net From paulus at samba.org Fri Mar 11 15:02:11 2005 From: paulus at samba.org (Paul Mackerras) Date: Fri, 11 Mar 2005 15:02:11 +1100 Subject: [PATCH] AGP support for powermac G5 In-Reply-To: <1110513167.3049.45.camel@desktop.cunningham.myip.net.au> References: <16945.2617.625095.404994@cargo.ozlabs.ibm.com> <1110513167.3049.45.camel@desktop.cunningham.myip.net.au> Message-ID: <16945.6211.331369.393573@cargo.ozlabs.ibm.com> Nigel Cunningham writes: > No power management support? :> The suspend/resume methods are in the pci_driver struct, not the agp_bridge_driver struct. Not that we have suspend/resume on the G5 yet. Paul. From ncunningham at cyclades.com Fri Mar 11 15:08:19 2005 From: ncunningham at cyclades.com (Nigel Cunningham) Date: Fri, 11 Mar 2005 15:08:19 +1100 Subject: [PATCH] AGP support for powermac G5 In-Reply-To: <16945.6211.331369.393573@cargo.ozlabs.ibm.com> References: <16945.2617.625095.404994@cargo.ozlabs.ibm.com> <1110513167.3049.45.camel@desktop.cunningham.myip.net.au> <16945.6211.331369.393573@cargo.ozlabs.ibm.com> Message-ID: <1110514099.3049.47.camel@desktop.cunningham.myip.net.au> Hi. On Fri, 2005-03-11 at 15:02, Paul Mackerras wrote: > Nigel Cunningham writes: > > > No power management support? :> > > The suspend/resume methods are in the pci_driver struct, not the > agp_bridge_driver struct. Not that we have suspend/resume on the G5 > yet. Ah. Thought I'd seen some in others. Humble apologies. 
Nigel -- Nigel Cunningham Software Engineer, Canberra, Australia http://www.cyclades.com Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028; Mob: +61 (417) 100 574 Maintainer of Suspend2 Kernel Patches http://suspend2.net From benh at kernel.crashing.org Fri Mar 11 15:29:02 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 11 Mar 2005 15:29:02 +1100 Subject: [PATCH] AGP support for powermac G5 In-Reply-To: <1110513167.3049.45.camel@desktop.cunningham.myip.net.au> References: <16945.2617.625095.404994@cargo.ozlabs.ibm.com> <1110513167.3049.45.camel@desktop.cunningham.myip.net.au> Message-ID: <1110515343.32524.343.camel@gaston> > > No power management support? :> Heh, not yet :) We can't really put a G5 to sleep yet. I haven't figured out the magic incantations for the PMU chip on those. Ben. From sfr at canb.auug.org.au Fri Mar 11 17:25:11 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Fri, 11 Mar 2005 17:25:11 +1100 Subject: inappropriate use of in_atomic() In-Reply-To: <20050310204006.48286d17.akpm@osdl.org> References: <20050310204006.48286d17.akpm@osdl.org> Message-ID: <20050311172511.1fa0919e.sfr@canb.auug.org.au> Hi Andrew, On Thu, 10 Mar 2005 20:40:06 -0800 Andrew Morton wrote: > > in_atomic() is not a reliable indication of whether it is currently safe > to call schedule(). > > arch/ppc64/kernel/viopath.c in_atomic() in viopath.c was just used to determine if we had initialised enough to be able to wait in a semaphore (i.e. schedule). Thus it can be replaced now with checking system_state for SYSTEM_RUNNING. Signed-off-by: Stephen Rothwell Test booted on iSeries (which is the only place it is used). 
-- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruNp linus/arch/ppc64/kernel/viopath.c linus-in_atomic/arch/ppc64/kernel/viopath.c --- linus/arch/ppc64/kernel/viopath.c 2005-01-22 06:09:01.000000000 +1100 +++ linus-in_atomic/arch/ppc64/kernel/viopath.c 2005-03-11 17:19:45.000000000 +1100 @@ -79,7 +79,7 @@ static void handleMonitorEvent(struct Hv /* * We use this structure to handle asynchronous responses. The caller * blocks on the semaphore and the handler posts the semaphore. However, - * if in_atomic() is true in the caller, then wait_atomic is used ... + * if system_state is not SYSTEM_RUNNING, then wait_atomic is used ... */ struct doneAllocParms_t { struct semaphore *sem; @@ -465,7 +465,7 @@ static int allocateEvents(HvLpIndex remo DECLARE_MUTEX_LOCKED(Semaphore); atomic_t wait_atomic; - if (in_atomic()) { + if (system_state != SYSTEM_RUNNING) { parms.used_wait_atomic = 1; atomic_set(&wait_atomic, 1); parms.wait_atomic = &wait_atomic; @@ -475,7 +475,7 @@ static int allocateEvents(HvLpIndex remo } mf_allocate_lp_events(remoteLp, HvLpEvent_Type_VirtualIo, 250, /* It would be nice to put a real number here! */ numEvents, &viopath_donealloc, &parms); - if (in_atomic()) { + if (system_state != SYSTEM_RUNNING) { while (atomic_read(&wait_atomic)) mb(); } else -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050311/266c2f0e/attachment.pgp From arnd at arndb.de Fri Mar 11 22:45:37 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Fri, 11 Mar 2005 12:45:37 +0100 Subject: [PATCH] ppc64: kill might_sleep() warnings in __copy_*_user_inatomic In-Reply-To: <20050310233932.GA26823@austin.ibm.com> References: <200503102054.38123.arnd@arndb.de> <20050310233932.GA26823@austin.ibm.com> Message-ID: <200503111245.39257.arnd@arndb.de> On Freedag 11 März 2005 00:39, Olof Johansson wrote: > Actually, I think I would prefer the following. It renames current > __copy_{to,from}_user to __copy_{to,from}_user_inatomic, adds the > old ones as inlines doing the might_sleep() and calling the inatomics > afterwards. This way the calls to __copy_{to,from}_user() will be caught > if called under lock or preemption as well. This is also how i386 does it. Yes, that solution is better than mine. However, you missed the case where __copy_{to,from}_user_inatomic calls __{get,put}_user_size, which in turn does might_sleep(). I now changed the {get,put}_user path accordingly. I have checked that this version boots and does not warn about futex. Arnd <>< --- This implements the __copy_{to,from}_user_inatomic() functions on ppc64. The only difference between the inatomic and regular version is that inatomic does not call might_sleep() to detect possible faults while holding locks/elevated preempt counts. Signed-off-by: Olof Johansson Signed-off-by: Arnd Bergmann -------------- next part -------------- A non-text attachment was scrubbed...
Name: uaccess-might-sleep-3.diff Type: text/x-diff Size: 3367 bytes Desc: not available Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050311/57d29cec/attachment.diff From moilanen at austin.ibm.com Sat Mar 12 01:01:31 2005 From: moilanen at austin.ibm.com (Jake Moilanen) Date: Fri, 11 Mar 2005 08:01:31 -0600 Subject: [PATCH 2/2] No-exec support for ppc64 In-Reply-To: <1110494668.32525.283.camel@gaston> References: <20050308165904.0ce07112.moilanen@austin.ibm.com> <20050308171326.3d72363a.moilanen@austin.ibm.com> <20050310032507.GC20789@austin.ibm.com> <1110438934.32524.203.camel@gaston> <20050310162721.19003dac.moilanen@austin.ibm.com> <1110494668.32525.283.camel@gaston> Message-ID: <20050311080131.24419bd4.moilanen@austin.ibm.com> On Fri, 11 Mar 2005 09:44:28 +1100 Benjamin Herrenschmidt wrote: > > > > /* Free memory returned from module_alloc */ > > diff -puN arch/ppc64/mm/fault.c~nx-kernel-ppc64 arch/ppc64/mm/fault.c > > --- linux-2.6-bk/arch/ppc64/mm/fault.c~nx-kernel-ppc64 2005-03-10 13:54:14 -06:00 > > +++ linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c 2005-03-10 13:54:14 -06:00 > > @@ -76,6 +76,13 @@ static int store_updates_sp(struct pt_re > > return 0; > > } > > > > +pte_t *lookup_address(unsigned long address) > > +{ > > + pgd_t *pgd = pgd_offset_k(address); > > + > > + return find_linux_pte(pgd, address); > > +} > > static please, even inline in this case. > > I've removed Andrew from CC upon his request, Paul, Anton or I will > forward to him when it's ready, no need to clobber his mailbox in the > meantime. 3rd time is a charm. 
Signed-off-by: Jake Moilanen --- linux-2.6-bk-moilanen/arch/ppc64/kernel/iSeries_setup.c | 4 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/module.c | 3 +- linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c | 19 ++++++++++++++++ linux-2.6-bk-moilanen/arch/ppc64/mm/hash_utils.c | 19 ++++++++++------ linux-2.6-bk-moilanen/include/asm-ppc64/pgtable.h | 1 linux-2.6-bk-moilanen/include/asm-ppc64/sections.h | 9 +++++++ 6 files changed, 48 insertions(+), 7 deletions(-) diff -puN arch/ppc64/kernel/iSeries_setup.c~nx-kernel-ppc64 arch/ppc64/kernel/iSeries_setup.c --- linux-2.6-bk/arch/ppc64/kernel/iSeries_setup.c~nx-kernel-ppc64 2005-03-11 07:50:39 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/iSeries_setup.c 2005-03-11 07:50:39 -06:00 @@ -633,6 +633,10 @@ static void __init iSeries_bolt_kernel(u unsigned long vpn = va >> PAGE_SHIFT; unsigned long slot = HvCallHpt_findValid(&hpte, vpn); + /* Make non-kernel text non-executable */ + if (!in_kernel_text(ea)) + mode_rw |= HW_NO_EXEC; + if (hpte.dw0.dw0.v) { /* HPTE exists, so just bolt it */ HvCallHpt_setSwBits(slot, 0x10, 0); diff -puN arch/ppc64/kernel/module.c~nx-kernel-ppc64 arch/ppc64/kernel/module.c --- linux-2.6-bk/arch/ppc64/kernel/module.c~nx-kernel-ppc64 2005-03-11 07:50:39 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/kernel/module.c 2005-03-11 07:50:39 -06:00 @@ -102,7 +102,8 @@ void *module_alloc(unsigned long size) { if (size == 0) return NULL; - return vmalloc(size); + + return vmalloc_exec(size); } /* Free memory returned from module_alloc */ diff -puN arch/ppc64/mm/fault.c~nx-kernel-ppc64 arch/ppc64/mm/fault.c --- linux-2.6-bk/arch/ppc64/mm/fault.c~nx-kernel-ppc64 2005-03-11 07:50:39 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/fault.c 2005-03-11 07:50:57 -06:00 @@ -76,6 +76,13 @@ static int store_updates_sp(struct pt_re return 0; } +static inline pte_t *lookup_address(unsigned long address) +{ + pgd_t *pgd = pgd_offset_k(address); + + return find_linux_pte(pgd, address); +} + /* * The error_code parameter is * - 
DSISR for a non-SLB data access fault, @@ -94,6 +101,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long is_write = error_code & 0x02000000; unsigned long trap = TRAP(regs); unsigned long is_exec = trap == 0x400; + pte_t *ptep; BUG_ON((trap == 0x380) || (trap == 0x480)); @@ -253,6 +261,17 @@ bad_area_nosemaphore: info.si_addr = (void __user *) address; force_sig_info(SIGSEGV, &info, current); return 0; + } + + ptep = lookup_address(address); + + if (ptep && pte_present(*ptep) && !pte_exec(*ptep)) { + if (printk_ratelimit()) + printk(KERN_CRIT "kernel tried to execute NX-protected " + "page - exploit attempt? (uid: %d)\n", + current->uid); + show_stack(current, (unsigned long *)__get_SP()); + do_exit(SIGKILL); } return SIGSEGV; diff -puN arch/ppc64/mm/hash_utils.c~nx-kernel-ppc64 arch/ppc64/mm/hash_utils.c --- linux-2.6-bk/arch/ppc64/mm/hash_utils.c~nx-kernel-ppc64 2005-03-11 07:50:39 -06:00 +++ linux-2.6-bk-moilanen/arch/ppc64/mm/hash_utils.c 2005-03-11 07:59:53 -06:00 @@ -51,6 +51,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) 
udbg_printf(fmt) @@ -95,6 +96,7 @@ static inline void create_pte_mapping(un { unsigned long addr; unsigned int step; + unsigned long tmp_mode; if (large) step = 16*MB; @@ -112,6 +114,13 @@ static inline void create_pte_mapping(un else vpn = va >> PAGE_SHIFT; + + tmp_mode = mode; + + /* Make non-kernel text non-executable */ + if (!in_kernel_text(addr)) + tmp_mode = mode | HW_NO_EXEC; + hash = hpt_hash(vpn, large); hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP); @@ -120,12 +129,12 @@ static inline void create_pte_mapping(un if (systemcfg->platform & PLATFORM_LPAR) ret = pSeries_lpar_hpte_insert(hpteg, va, virt_to_abs(addr) >> PAGE_SHIFT, - 0, mode, 1, large); + 0, tmp_mode, 1, large); else #endif /* CONFIG_PPC_PSERIES */ ret = native_hpte_insert(hpteg, va, virt_to_abs(addr) >> PAGE_SHIFT, - 0, mode, 1, large); + 0, tmp_mode, 1, large); if (ret == -1) { ppc64_terminate_msg(0x20, "create_pte_mapping"); @@ -238,8 +247,6 @@ unsigned int hash_page_do_lazy_icache(un { struct page *page; -#define PPC64_HWNOEXEC (1 << 2) - if (!pfn_valid(pte_pfn(pte))) return pp; @@ -250,8 +257,8 @@ unsigned int hash_page_do_lazy_icache(un if (trap == 0x400) { __flush_dcache_icache(page_address(page)); set_bit(PG_arch_1, &page->flags); - } else - pp |= PPC64_HWNOEXEC; + } else + pp |= HW_NO_EXEC; } return pp; } diff -puN include/asm-ppc64/pgtable.h~nx-kernel-ppc64 include/asm-ppc64/pgtable.h --- linux-2.6-bk/include/asm-ppc64/pgtable.h~nx-kernel-ppc64 2005-03-11 07:50:39 -06:00 +++ linux-2.6-bk-moilanen/include/asm-ppc64/pgtable.h 2005-03-11 07:50:39 -06:00 @@ -117,6 +117,7 @@ #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER) #define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC) #define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_WRENABLE) +#define PAGE_KERNEL_EXEC __pgprot(_PAGE_BASE | _PAGE_WRENABLE | _PAGE_EXEC) #define HW_NO_EXEC _PAGE_EXEC /* This is used when the bit is * inverted, even though it's the diff -puN include/asm-ppc64/sections.h~nx-kernel-ppc64 
include/asm-ppc64/sections.h --- linux-2.6-bk/include/asm-ppc64/sections.h~nx-kernel-ppc64 2005-03-11 07:50:39 -06:00 +++ linux-2.6-bk-moilanen/include/asm-ppc64/sections.h 2005-03-11 07:50:39 -06:00 @@ -17,4 +17,13 @@ extern char _end[]; #define __openfirmware #define __openfirmwaredata + +static inline int in_kernel_text(unsigned long addr) +{ + if (addr >= (unsigned long)_stext && addr < (unsigned long)__init_end) + return 1; + + return 0; +} + #endif _ From jschopp at austin.ibm.com Sat Mar 12 03:15:01 2005 From: jschopp at austin.ibm.com (Joel Schopp) Date: Fri, 11 Mar 2005 10:15:01 -0600 Subject: [PATCH 2/8] make OF node fixup code usable at runtime In-Reply-To: <20050311013047.GE21853@otto> References: <20050310005132.31309.65485.31668@otto> <20050310005142.31309.45788.99418@otto> <4230A2F2.7020403@austin.ibm.com> <20050311013047.GE21853@otto> Message-ID: <4231C405.8040703@austin.ibm.com> >> >>>-static int of_finish_dynamic_node(struct device_node *node) >>>+static int of_finish_dynamic_node(struct device_node *node, >>>+ unsigned long *unused1, int unused2, >>>+ int unused3, int unused4) >>>{ >> >> >>Is there a reason for these 4 unused fields that I am just missing? >> > > > In order for it to be correctly used as an argument to finish_node, > of_finish_dynamic_node needs to have a definition compatible with the > interpret_func typedef. OK. I'm good with the patch then. 
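The point settled in the exchange above, that a callback must accept every argument its function-pointer type lists even when it ignores most of them, can be sketched in plain C. This is only an illustration under stated assumptions: struct node, node_fixup_fn, finish_dynamic_node and finish_one are hypothetical stand-ins, not the kernel's struct device_node, interpret_func, of_finish_dynamic_node or finish_node, and the argument names are guesses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_node; not the kernel's definition. */
struct node {
	const char *name;
	int fixed_up;
};

/* Hypothetical callback type, playing the role the thread gives interpret_func. */
typedef int (*node_fixup_fn)(struct node *np, unsigned long *mem_start,
			     int naddrc, int nsizec, int measure_only);

/*
 * This fixup needs only the node, but it has to take all five arguments
 * so that it can be passed wherever a node_fixup_fn is expected.
 */
static int finish_dynamic_node(struct node *np, unsigned long *unused1,
			       int unused2, int unused3, int unused4)
{
	(void)unused1;
	(void)unused2;
	(void)unused3;
	(void)unused4;

	if (np == NULL)
		return -1;
	np->fixed_up = 1;
	return 0;
}

/* Generic caller: runs whichever fixup it is handed on one node. */
static int finish_one(struct node *np, unsigned long *mem_start,
		      node_fixup_fn fixup)
{
	assert(fixup != NULL);
	return fixup(np, mem_start, 0, 0, 0);
}
```

A caller would write finish_one(&n, NULL, finish_dynamic_node); the NULL mem_start here only loosely mirrors the "null mem_start means use kmalloc" convention described earlier in the thread, and the sketch does no allocation at all.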
From olof at austin.ibm.com Sat Mar 12 07:30:44 2005 From: olof at austin.ibm.com (Olof Johansson) Date: Fri, 11 Mar 2005 14:30:44 -0600 Subject: [PATCH] ppc64: kill might_sleep() warnings in __copy_*_user_inatomic In-Reply-To: <200503111245.39257.arnd@arndb.de> References: <200503102054.38123.arnd@arndb.de> <20050310233932.GA26823@austin.ibm.com> <200503111245.39257.arnd@arndb.de> Message-ID: <20050311203044.GC6086@austin.ibm.com> On Fri, Mar 11, 2005 at 12:45:37PM +0100, Arnd Bergmann wrote: > On Freedag 11 März 2005 00:39, Olof Johansson wrote: > > Actually, I think I would prefer the following. It renames current > > __copy_{to,from}_user to __copy_{to,from}_user_inatomic, adds the > > old ones as inlines doing the might_sleep() and calling the inatomics > > afterwards. This way the calls to __copy_{to,from}_user() will be caught > > if called under lock or preemption as well. This is also how i386 does it. > > Yes, that solution is better than mine. However, you missed the case where > __copy_{to,from}_user_inatomic calls __{get,put}_user_size, which in turn > does might_sleep(). I now changed the {get,put}_user path accordingly. > > I have checked that this version boots and does not warn about futex. Doh! Great, thanks. > --- > This implements the __copy_{to,from}_user_inatomic() functions on ppc64. > The only difference between the inatomic and regular version is that > inatomic does not call might_sleep() to detect possible faults while > holding locks/elevated preempt counts.
> > Signed-off-by: Olof Johansson > Signed-off-by: Arnd Bergmann Acked-by: Olof Johansson From linas at austin.ibm.com Sat Mar 12 08:22:16 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Fri, 11 Mar 2005 15:22:16 -0600 Subject: [PATCH] allow dynamic enablement of EEH In-Reply-To: <1109371857.27183.28.camel@sinatra.austin.ibm.com> References: <1109371857.27183.28.camel@sinatra.austin.ibm.com> Message-ID: <20050311212216.GJ1220@austin.ibm.com> Hi John, I just unearthed this patch .. sorry it took so long ... FWIW, it's good to me... Signed-off-by: Linas Vepstas or should that be Approved-by: Linas Vepstas --linas On Fri, Feb 25, 2005 at 04:50:57PM -0600, John Rose was heard to remark: > EEH scans the system I/O adapters at boot for EEH-capabilities. If no > EEH-capable adapters are found, the subsystem is marked disabled for the > life of the system. EEH should allow dynamic enabling of the EEH > subsystem when hotplug-adding an adapter. > > Please apply, if appropriate. > > Thanks- > John > > Signed-off-by: John Rose > > diff -puN arch/ppc64/kernel/eeh.c~04_eeh_add_early arch/ppc64/kernel/eeh.c > --- 2_6_linus_2/arch/ppc64/kernel/eeh.c~04_eeh_add_early 2005-02-25 16:29:51.000000000 -0600 > +++ 2_6_linus_2-johnrose/arch/ppc64/kernel/eeh.c 2005-02-25 16:29:51.000000000 -0600 > @@ -808,7 +808,7 @@ void eeh_add_device_early(struct device_ > struct pci_controller *phb; > struct eeh_early_enable_info info; > > - if (!dn || !eeh_subsystem_enabled) > + if (!dn) > return; > phb = dn->phb; > if (NULL == phb || 0 == phb->buid) { > > _ > From linas at austin.ibm.com Sat Mar 12 12:32:51 2005 From: linas at austin.ibm.com (Linas Vepstas) Date: Fri, 11 Mar 2005 19:32:51 -0600 Subject: [PATCH/RFC] PCI Error Recovery In-Reply-To: <421E9D16.3000606@jp.fujitsu.com> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <20050224011409.GE2088@austin.ibm.com> <421DDEF7.7080103@jp.fujitsu.com> <20050224231455.GH2088@austin.ibm.com>
<421E9D16.3000606@jp.fujitsu.com> Message-ID: <20050312013251.GA2609@austin.ibm.com> Hi, Appended is my current draft PCI Error Recovery patch. Per previous conversations, it has moved some of the ppc64-specific error reporting code into generic PCI structures: see changes to include/linux/pci.h and a new file drivers/pci/pci-error.c. Note in particular the pci bus states enumerated in "enum pci_device_io_state"; BenH was suggesting having more of these ... BenH do you want to propose a "final list"? I named the generic pci error recovery routines "peh" because my brain froze. Better suggestions invited. The patch includes error recovery code for the IPR scsi device driver that uses the new generic PCI interfaces. There's also some prototype symbios scsi recovery code, but I haven't had a chance to test it due to hardware issues. Ignore the debug statements. The last chunk of this patch is ppc64 specific code; it uses the new generic interfaces where it can. Please review, comment, criticize and suggest. I am eager to get the pci-generic parts nailed down, and want to really start moving in a direction that would let this go into mainline. --linas p.s. It should apply cleanly to kernel.org 2.6.11 and will recover from pci errors sent to IPR and ethernet on power5 boxes. I haven't tested on power4. -------------- next part -------------- --- include/linux/pci.h.linas-orig 2005-03-09 02:11:40.000000000 -0600 +++ include/linux/pci.h 2005-03-11 18:00:46.000000000 -0600 @@ -659,6 +659,63 @@ struct pci_dynids { unsigned int use_driver_data:1; /* pci_driver->driver_data is used */ }; +/* ---------------------------------------------------------------- */ +/** PCI error recovery state. Whenever the PCI bus state changes, + * the io_state_change() callback will be called to notify the + * device driver of state changes.
+ */ + +enum pci_device_io_state { + pci_device_io_frozen = 1, /* I/O to device is blocked */ + pci_device_io_thawed, /* I/O to device is (re-)enabled */ + pci_device_io_perm_failure, /* pci card is dead */ +}; + +/** + * PCI Error notifier event flags. + */ +#define PEH_NOTIFY_ERROR 1 + +/** PEH event -- structure holding pci controller data that describes + * a change in the isolation status of a PCI slot. A pointer + * to this struct is passed as the data pointer in a notify callback. + */ +struct peh_event { + struct list_head list; + struct pci_dev *dev; /* affected device */ + enum pci_device_io_state state; /* PCI bus state for the affected device */ + int time_unavail; /* milliseconds until device might be available */ +}; + +/** + * peh_send_failure_event - generate a PCI error event + * @dev pci device + * + * This routine builds a PCI error event which will be delivered + * to all listeners on the peh_notifier_chain. + * + * This routine can be called within an interrupt context; + * the actual event will be delivered in a normal context + * (from a workqueue). + */ +int peh_send_failure_event (struct pci_dev *dev, + enum pci_device_io_state state, + int time_unavail); + +/** + * peh_register_notifier - Register to find out about EEH events. + * @nb: notifier block to callback on events + */ +int peh_register_notifier(struct notifier_block *nb); + +/** + * peh_unregister_notifier - Unregister to an EEH event notifier.
+ * @nb: notifier block to callback on events + */ +int peh_unregister_notifier(struct notifier_block *nb); + +/* ---------------------------------------------------------------- */ + struct module; struct pci_driver { struct list_head node; @@ -670,6 +727,7 @@ struct pci_driver { int (*suspend) (struct pci_dev *dev, pm_message_t state); /* Device suspended */ int (*resume) (struct pci_dev *dev); /* Device woken up */ int (*enable_wake) (struct pci_dev *dev, u32 state, int enable); /* Enable wake event */ + int (*io_state_change) (struct pci_dev *, enum pci_device_io_state); /* state change */ struct device_driver driver; struct pci_dynids dynids; --- drivers/pci/Makefile.linas-orig 2005-03-09 02:12:50.000000000 -0600 +++ drivers/pci/Makefile 2005-03-11 18:19:29.000000000 -0600 @@ -3,7 +3,7 @@ # obj-y += access.o bus.o probe.o remove.o pci.o quirks.o \ - names.o pci-driver.o search.o pci-sysfs.o \ + names.o pci-driver.o pci-error.o search.o pci-sysfs.o \ rom.o obj-$(CONFIG_PROC_FS) += proc.o --- drivers/pci/pci-error.c.linas-orig 2005-03-11 18:21:20.000000000 -0600 +++ drivers/pci/pci-error.c 2005-03-11 18:23:47.000000000 -0600 @@ -0,0 +1,152 @@ +/* + * pci-error.c + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include +#include +#include + +#undef DEBUG + +/** Overview: + * PEH, or "PCI Error Handling" is a PCI bridge technology for + * dealing with PCI bus errors that can't be dealt with within the + * usual PCI framework, except by check-stopping the CPU. Systems + * that are designed for high-availability/reliability cannot afford + * to crash due to a "mere" PCI error, thus the need for PEH. + * An PEH-capable bridge operates by converting a detected error + * into a "slot freeze", taking the PCI adapter off-line, making + * the slot behave, from the OS'es point of view, as if the slot + * were "empty": all reads return 0xff's and all writes are silently + * ignored. PEH slot isolation events can be triggered by parity + * errors on the address or data busses (e.g. during posted writes), + * which in turn might be caused by low voltage on the bus, dust, + * vibration, humidity, radioactivity or plain-old failed hardware. + * + * Note, however, that one of the leading causes of PEH slot + * freeze events are buggy device drivers, buggy device microcode, + * or buggy device hardware. This is because any attempt by the + * device to bus-master data to a memory address that is not + * assigned to the device will trigger a slot freeze. (The idea + * is to prevent devices-gone-wild from corrupting system memory). + * Buggy hardware/drivers will have a miserable time co-existing + * with PEH. + */ + +/* PEH event workqueue setup. */ +static spinlock_t peh_eventlist_lock = SPIN_LOCK_UNLOCKED; +LIST_HEAD(peh_eventlist); +static void peh_event_handler(void *); +DECLARE_WORK(peh_event_wq, peh_event_handler, NULL); + +static struct notifier_block *peh_notifier_chain; + +/** + * peh_event_handler - dispatch PEH events. 
The detection of a frozen + * slot can occur inside an interrupt, where it can be hard to do + * anything about it. The goal of this routine is to pull these + * detection events out of the context of the interrupt handler, and + * re-dispatch them for processing at a later time in a normal context. + * + * @dummy - unused + */ +static void peh_event_handler(void *dummy) +{ + unsigned long flags; + struct peh_event *event; + + while (1) { + spin_lock_irqsave(&peh_eventlist_lock, flags); + event = NULL; + if (!list_empty(&peh_eventlist)) { + event = list_entry(peh_eventlist.next, struct peh_event, list); + list_del(&event->list); + } + spin_unlock_irqrestore(&peh_eventlist_lock, flags); + if (event == NULL) + break; + + printk(KERN_INFO "PEH: Detected PCI bus error on device " + "%s %s\n", + pci_name(event->dev), pci_pretty_name(event->dev)); + + notifier_call_chain (&peh_notifier_chain, + PEH_NOTIFY_ERROR, event); + + pci_dev_put(event->dev); + kfree(event); + } +} + + +/** + * peh_send_failure_event - generate a PCI error event + * @dev pci device + * + * This routine builds a PCI error event which will be delivered + * to all listeners on the peh_notifier_chain. + * + * This routine can be called within an interrupt context; + * the actual event will be delivered in a normal context + * (from a workqueue). 
+ */ +int peh_send_failure_event (struct pci_dev *dev, + enum pci_device_io_state state, + int time_unavail) +{ + unsigned long flags; + struct peh_event *event; + + event = kmalloc(sizeof(*event), GFP_ATOMIC); + if (event == NULL) { + printk (KERN_ERR "PEH: out of memory, event not handled\n"); + return 1; + } + + event->dev = dev; + event->state = state; + event->time_unavail = time_unavail; + + /* We may or may not be called in an interrupt context */ + spin_lock_irqsave(&peh_eventlist_lock, flags); + list_add(&event->list, &peh_eventlist); + spin_unlock_irqrestore(&peh_eventlist_lock, flags); + + schedule_work(&peh_event_wq); + + return 0; +} + +/** + * peh_register_notifier - Register to find out about EEH events. + * @nb: notifier block to callback on events + */ +int peh_register_notifier(struct notifier_block *nb) +{ + return notifier_chain_register(&peh_notifier_chain, nb); +} + +/** + * peh_unregister_notifier - Unregister to an EEH event notifier. + * @nb: notifier block to callback on events + */ +int peh_unregister_notifier(struct notifier_block *nb) +{ + return notifier_chain_unregister(&peh_notifier_chain, nb); +} + + --- drivers/scsi/ipr.c.linas-orig 2005-03-09 02:13:17.000000000 -0600 +++ drivers/scsi/ipr.c 2005-03-10 14:54:27.000000000 -0600 @@ -80,6 +80,8 @@ #include #include #include + +#define CONFIG_SCSI_IPR_EEH #include "ipr.h" /* @@ -4993,6 +4995,7 @@ static int ipr_reset_start_bist(struct i return rc; } + /** * ipr_reset_allowed - Query whether or not IOA can be reset * @ioa_cfg: ioa config struct @@ -5306,6 +5309,68 @@ static void ipr_initiate_ioa_reset(struc shutdown_type); } +#ifdef CONFIG_SCSI_IPR_EEH + +/** If the PCI slot is frozen, hold off all i/o + * activity; then, as soon as the slot is available again, + * initiate an adapter reset. 
+ */ +static int ipr_reset_freeze(struct ipr_cmnd *ipr_cmd) +{ + list_add_tail(&ipr_cmd->queue, &ipr_cmd->ioa_cfg->pending_q); + ipr_cmd->done = ipr_reset_ioa_job; + return IPR_RC_JOB_RETURN; +} + +static void ipr_eeh_frozen (struct pci_dev *pdev) +{ + unsigned long flags = 0; + struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev); + + spin_lock_irqsave(ioa_cfg->host->host_lock, flags); + _ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_freeze, IPR_SHUTDOWN_NONE); + spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags); +} + +static void ipr_eeh_thawed (struct pci_dev *pdev) +{ + unsigned long flags = 0; + struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev); + + spin_lock_irqsave(ioa_cfg->host->host_lock, flags); + _ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_restore_cfg_space, + IPR_SHUTDOWN_NONE); + spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags); +} + +static void ipr_eeh_perm_failure (struct pci_dev *pdev) +{ +#if 0 // XXXXXXXXXXXXXXXXXXXXXXX + ipr_cmd->job_step = ipr_reset_shutdown_ioa; + rc = IPR_RC_JOB_CONTINUE; +#endif +} + +static int ipr_io_state_change (struct pci_dev *pdev, + enum pci_device_io_state state) +{ + switch (state) { + case pci_device_io_frozen: + ipr_eeh_frozen (pdev); + break; + case pci_device_io_thawed: + ipr_eeh_thawed (pdev); + break; + case pci_device_io_perm_failure: + ipr_eeh_perm_failure (pdev); + break; + default: + break; + } + return 0; +} +#endif + /** * ipr_probe_ioa_part2 - Initializes IOAs found in ipr_probe_ioa(..) 
* @ioa_cfg: ioa cfg struct @@ -6015,6 +6080,7 @@ static struct pci_driver ipr_driver = { .id_table = ipr_pci_table, .probe = ipr_probe, .remove = ipr_remove, + .io_state_change = ipr_io_state_change, .driver = { .shutdown = ipr_shutdown, }, --- drivers/scsi/ipr.h.linas-orig 2005-03-09 02:11:12.000000000 -0600 +++ drivers/scsi/ipr.h 2005-03-10 14:54:27.000000000 -0600 @@ -1132,9 +1132,11 @@ struct ipr_ucode_image_header { #define ipr_trace ipr_dbg("%s: %s: Line: %d\n",\ __FILE__, __FUNCTION__, __LINE__) +#undef IPR_DBG_TRACE +#define IPR_DBG_TRACE 1 #if IPR_DBG_TRACE -#define ENTER printk(KERN_INFO IPR_NAME": Entering %s\n", __FUNCTION__) -#define LEAVE printk(KERN_INFO IPR_NAME": Leaving %s\n", __FUNCTION__) +#define ENTER printk(KERN_INFO IPR_NAME": Entering %s jiffies=%lu\n", __FUNCTION__, jiffies) +#define LEAVE printk(KERN_INFO IPR_NAME": Leaving %s jiffies=%lu\n", __FUNCTION__, jiffies) #else #define ENTER #define LEAVE --- drivers/scsi/sym53c8xx_2/sym_glue.c.linas-orig 2005-03-09 02:13:09.000000000 -0600 +++ drivers/scsi/sym53c8xx_2/sym_glue.c 2005-03-11 18:54:19.000000000 -0600 @@ -770,6 +770,11 @@ static irqreturn_t sym53c8xx_intr(int ir struct sym_hcb *np = (struct sym_hcb *)dev_id; if (DEBUG_FLAGS & DEBUG_TINY) printf_debug ("["); +#define CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY +#ifdef CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY + if (np->s.io_state != pci_device_io_thawed) + return IRQ_HANDLED; +#endif /* CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY */ spin_lock_irqsave(np->s.host->host_lock, flags); sym_interrupt(np); @@ -844,6 +849,27 @@ static void sym_eh_done(struct scsi_cmnd */ static void sym_eh_timeout(u_long p) { __sym_eh_done((struct scsi_cmnd *)p, 1); } +#ifdef CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY +static void sym_eeh_timeout(u_long p) +{ + struct sym_eh_wait *ep = (struct sym_eh_wait *) p; + if (!ep) + return; + complete(&ep->done); +} + +static void sym_eeh_done(struct sym_eh_wait *ep) +{ + if (!ep) + return; + ep->timed_out = 0; + if (!del_timer(&ep->timer)) + 
return; + + complete(&ep->done); +} +#endif /* CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY */ + /* * Generic method for our eh processing. * The 'op' argument tells what we have to do. @@ -905,6 +931,35 @@ prepare: sts = 0; break; case SYM_EH_HOST_RESET: +#ifdef CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY +printk("duuuuuude attempting symbios recovery\n"); +dump_stack(); + int rc = eeh_slot_is_isolated (np->s.device); + +printk ("duude symbios is isolated ??=%d\n", rc); +printk ("duuude the current io state is %d\n", np->s.io_state); + if (rc) { + struct sym_eh_wait eeh, *eep = &eeh; + np->s.io_reset_wait = eep; + init_completion(&eep->done); + init_timer(&eep->timer); + eep->to_do = SYM_EH_DO_WAIT; + eep->timer.expires = jiffies + (10*HZ); + eep->timer.function = sym_eeh_timeout; + eep->timer.data = (u_long)eep; + eep->timed_out = 1; /* Be pessimistic for once :) */ + add_timer(&eep->timer); + spin_unlock_irq(np->s.host->host_lock); + wait_for_completion(&eep->done); + spin_lock_irq(np->s.host->host_lock); + if (eep->timed_out) { +printk ("duude symbios timed out\n"); + } else { +printk ("duude symbios waited for completion\n"); + } + np->s.io_reset_wait = NULL; + } +#endif /* CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY */ sym_reset_scsi_bus(np, 0); sym_start_up (np, 1); sts = 0; @@ -1577,6 +1632,23 @@ static int sym_setup_bus_dma_mask(struct return -1; } +#ifdef CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY +int sym2_io_state_change (struct pci_dev *pdev, enum pci_device_io_state state) +{ + struct sym_hcb *np = pci_get_drvdata(pdev); +printk ("duude symbios got this state change %d jiffies=%ld\n", state, jiffies); + + np->s.io_state = state; + if (state == pci_device_io_thawed) { + sym_eeh_done (np->s.io_reset_wait); + } + + // XXX if perm frozen, then ...? + + return 0; +} +#endif /* CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY */ + /* * Host attach and initialisations. 
* @@ -1625,6 +1697,8 @@ static struct Scsi_Host * __devinit sym_ if (!np) goto attach_failed; np->s.device = dev->pdev; + np->s.io_state = pci_device_io_thawed; + np->s.io_reset_wait = NULL; np->bus_dmat = dev->pdev; /* Result in 1 DMA pool per HBA */ host_data->ncb = np; np->s.host = instance; @@ -2359,6 +2433,7 @@ static struct pci_driver sym2_driver = { .id_table = sym2_id_table, .probe = sym2_probe, .remove = __devexit_p(sym2_remove), + .io_state_change = sym2_io_state_change, }; static int __init sym2_init(void) --- drivers/scsi/sym53c8xx_2/sym_glue.h.linas-orig 2005-03-09 02:13:03.000000000 -0600 +++ drivers/scsi/sym53c8xx_2/sym_glue.h 2005-03-10 14:54:27.000000000 -0600 @@ -358,6 +358,10 @@ struct sym_shcb { char chip_name[8]; struct pci_dev *device; + /* pci bus i/o state; waiter for clearing of i/o state */ + enum pci_device_io_state io_state; + struct sym_eh_wait *io_reset_wait; + struct Scsi_Host *host; void __iomem * mmio_va; /* MMIO kernel virtual address */ --- drivers/scsi/sym53c8xx_2/sym_hipd.c.linas-orig 2005-03-09 02:11:01.000000000 -0600 +++ drivers/scsi/sym53c8xx_2/sym_hipd.c 2005-03-11 19:06:17.000000000 -0600 @@ -2836,6 +2836,7 @@ void sym_interrupt (struct sym_hcb *np) u_char istat, istatc; u_char dstat; u_short sist; + u_int icnt; /* * interrupt on the fly ? 
@@ -2877,6 +2878,7 @@ void sym_interrupt (struct sym_hcb *np) sist = 0; dstat = 0; istatc = istat; + icnt = 0; do { if (istatc & SIP) sist |= INW (nc_sist); @@ -2884,6 +2886,14 @@ void sym_interrupt (struct sym_hcb *np) dstat |= INB (nc_dstat); istatc = INB (nc_istat); istat |= istatc; + icnt ++; + if (100 < icnt) { +#define CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY +#ifdef CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY + if(eeh_slot_is_isolated (np->s.device)) + return; +#endif /* CONFIG_SCSI_SYM53C8XX_EEH_RECOVERY */ + } } while (istatc & (SIP|DIP)); if (DEBUG_FLAGS & DEBUG_TINY) --- include/asm-ppc64/eeh.h.linas-orig 2005-03-09 02:13:21.000000000 -0600 +++ include/asm-ppc64/eeh.h 2005-03-11 18:01:19.000000000 -0600 @@ -23,6 +23,7 @@ #include #include #include +#include #include struct pci_dev; @@ -36,6 +37,11 @@ struct notifier_block; #define EEH_MODE_SUPPORTED (1<<0) #define EEH_MODE_NOCHECK (1<<1) #define EEH_MODE_ISOLATED (1<<2) +#define EEH_MODE_RECOVERING (1<<3) + +/* Max number of EEH freezes allowed before we consider the device + * to be permanently disabled. */ +#define EEH_MAX_ALLOWED_FREEZES 5 void __init eeh_init(void); unsigned long eeh_check_failure(const volatile void __iomem *token, @@ -59,35 +65,82 @@ void eeh_add_device_late(struct pci_dev * eeh_remove_device - undo EEH setup for the indicated pci device * @dev: pci device to be removed * - * This routine should be when a device is removed from a running - * system (e.g. by hotplug or dlpar). + * This routine should be called when a device is removed from + * a running system (e.g. by hotplug or dlpar). It unregisters + * the PCI device from the EEH subsystem. I/O errors affecting + * this device will no longer be detected after this call; thus, + * i/o errors affecting this slot may leave this device unusable. 
*/ void eeh_remove_device(struct pci_dev *); -#define EEH_DISABLE 0 -#define EEH_ENABLE 1 -#define EEH_RELEASE_LOADSTORE 2 -#define EEH_RELEASE_DMA 3 +/** + * eeh_slot_is_isolated -- return non-zero value if slot is frozen + */ +int eeh_slot_is_isolated (struct pci_dev *dev); /** - * Notifier event flags. + * eeh_ioaddr_is_isolated -- return non-zero value if device at + * io address is frozen. */ -#define EEH_NOTIFY_FREEZE 1 +int eeh_ioaddr_is_isolated(const volatile void __iomem *token); -/** EEH event -- structure holding pci slot data that describes - * a change in the isolation status of a PCI slot. A pointer - * to this struct is passed as the data pointer in a notify callback. - */ -struct eeh_event { - struct list_head list; - struct pci_dev *dev; - struct device_node *dn; - int reset_state; -}; - -/** Register to find out about EEH events. */ -int eeh_register_notifier(struct notifier_block *nb); -int eeh_unregister_notifier(struct notifier_block *nb); +/** + * eeh_slot_error_detail -- record an EEH error condition to the log + * @severity: 1 if temporary, 2 if permanent failure. + * + * Obtains the EEH error details from the RTAS subsystem, + * and then logs these details with the RTAS error log system. + */ +void eeh_slot_error_detail (struct device_node *dn, int severity); + +/** + * rtas_set_slot_reset -- unfreeze a frozen slot + * + * Clear the EEH-frozen condition on a slot. This routine + * does this by asserting the PCI #RST line for 1/4 of + * a second; this routine will sleep while the adapter is + * being reset. + */ +void rtas_set_slot_reset (struct device_node *dn); + +/** rtas_pci_slot_reset raises/lowers the pci #RST line + * state: 1/0 to raise/lower the #RST + * + * Clear the EEH-frozen condition on a slot. This routine + * asserts the PCI #RST line if the 'state' argument is '1', + * and drops the #RST line if 'state' is '0'. This routine is + * safe to call in an interrupt context. 
+ * + */ +void rtas_pci_slot_reset(struct device_node *dn, int state); +void eeh_pci_slot_reset(struct pci_dev *dev, int state); + +/** eeh_pci_slot_availability -- Indicates whether a PCI + * slot is ready to be used. After a PCI reset, it may take a while + * for the PCI fabric to fully reset the communications path to the + * given PCI card. This routine can be used to determine how long + * to wait before a PCI slot might become usable. + * + * This routine returns how long to wait (in milliseconds) before + * the slot is expected to be usable. A value of zero means the + * slot is immediately usable. A negative value means that the + * slot is permanently disabled. + */ +int eeh_pci_slot_availability(struct pci_dev *dev); + +/** Restore device configuration info across device resets. + */ +void eeh_restore_bars(struct device_node *); +void eeh_pci_restore_bars(struct pci_dev *dev); + +/** + * rtas_configure_bridge -- firmware initialization of pci bridge + * + * Ask the firmware to configure any PCI bridge devices + * located behind the indicated node. Required after a + * pci device reset. + */ +void rtas_configure_bridge(struct device_node *dn); /** * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure. --- include/asm-ppc64/prom.h.linas-orig 2005-03-09 02:13:03.000000000 -0600 +++ include/asm-ppc64/prom.h 2005-03-10 14:54:27.000000000 -0600 @@ -119,6 +119,7 @@ struct property { */ struct pci_controller; struct iommu_table; +struct eeh_recovery_ops; struct device_node { char *name; @@ -137,8 +138,12 @@ struct device_node { int devfn; /* for pci devices */ int eeh_mode; /* See eeh.h for possible EEH_MODEs */ int eeh_config_addr; + int eeh_check_count; /* number of times device driver ignored error */ + int eeh_freeze_count; /* number of times this device froze up. 
*/ + int eeh_is_bridge; /* device is pci-to-pci bridge */ struct pci_controller *phb; /* for pci devices */ struct iommu_table *iommu_table; /* for phb's or bridges */ + u32 config_space[16]; /* saved PCI config space */ struct property *properties; struct device_node *parent; --- include/asm-ppc64/rtas.h.linas-orig 2005-03-09 02:13:00.000000000 -0600 +++ include/asm-ppc64/rtas.h 2005-03-10 14:54:27.000000000 -0600 @@ -243,4 +243,6 @@ extern unsigned long rtas_rmo_buf; #define GLOBAL_INTERRUPT_QUEUE 9005 +extern int rtas_write_config(struct device_node *dn, int where, int size, u32 val); + #endif /* _PPC64_RTAS_H */ --- arch/ppc64/kernel/eeh.c.linas-orig 2005-03-09 02:12:13.000000000 -0600 +++ arch/ppc64/kernel/eeh.c 2005-03-11 18:58:50.000000000 -0600 @@ -17,16 +17,17 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ -#include +#include #include +#include #include -#include #include #include #include #include #include #include +#include #include #include #include @@ -49,8 +50,8 @@ * were "empty": all reads return 0xff's and all writes are silently * ignored. EEH slot isolation events can be triggered by parity * errors on the address or data busses (e.g. during posted writes), - * which in turn might be caused by dust, vibration, humidity, - * radioactivity or plain-old failed hardware. + * which in turn might be caused by low voltage on the bus, dust, + * vibration, humidity, radioactivity or plain-old failed hardware. * * Note, however, that one of the leading causes of EEH slot * freeze events are buggy device drivers, buggy device microcode, @@ -75,22 +76,13 @@ #define BUID_HI(buid) ((buid) >> 32) #define BUID_LO(buid) ((buid) & 0xffffffff) -/* EEH event workqueue setup. 
*/ -static DEFINE_SPINLOCK(eeh_eventlist_lock); -LIST_HEAD(eeh_eventlist); -static void eeh_event_handler(void *); -DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL); - -static struct notifier_block *eeh_notifier_chain; - /* * If a device driver keeps reading an MMIO register in an interrupt * handler after a slot isolation event has occurred, we assume it * is broken and panic. This sets the threshold for how many read * attempts we allow before panicking. */ -#define EEH_MAX_FAILS 1000 -static atomic_t eeh_fail_count; +#define EEH_MAX_FAILS 100000 /* RTAS tokens */ static int ibm_set_eeh_option; @@ -107,6 +99,10 @@ static DEFINE_SPINLOCK(slot_errbuf_lock) static int eeh_error_buf_size; /* System monitoring statistics */ +static DEFINE_PER_CPU(unsigned long, no_device); +static DEFINE_PER_CPU(unsigned long, no_dn); +static DEFINE_PER_CPU(unsigned long, no_cfg_addr); +static DEFINE_PER_CPU(unsigned long, ignored_check); static DEFINE_PER_CPU(unsigned long, total_mmio_ffs); static DEFINE_PER_CPU(unsigned long, false_positives); static DEFINE_PER_CPU(unsigned long, ignored_failures); @@ -225,9 +221,9 @@ pci_addr_cache_insert(struct pci_dev *de while (*p) { parent = *p; piar = rb_entry(parent, struct pci_io_addr_range, rb_node); - if (alo < piar->addr_lo) { + if (ahi < piar->addr_lo) { p = &parent->rb_left; - } else if (ahi > piar->addr_hi) { + } else if (alo > piar->addr_hi) { p = &parent->rb_right; } else { if (dev != piar->pcidev || @@ -245,6 +241,11 @@ pci_addr_cache_insert(struct pci_dev *de piar->addr_hi = ahi; piar->pcidev = dev; piar->flags = flags; + +#ifdef DEBUG + printk (KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n", + alo, ahi, pci_name (dev)); +#endif rb_link_node(&piar->rb_node, parent, p); rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root); @@ -369,6 +370,7 @@ void pci_addr_cache_remove_device(struct */ void __init pci_addr_cache_build(void) { + struct device_node *dn; struct pci_dev *dev = NULL; 
spin_lock_init(&pci_io_addr_cache_root.piar_lock); @@ -379,6 +381,17 @@ void __init pci_addr_cache_build(void) continue; } pci_addr_cache_insert_device(dev); + + /* Save the BAR's; firmware doesn't restore these after EEH reset */ + dn = pci_device_to_OF_node(dev); + if (dn) { + int i; + for (i = 0; i < 16; i++) + pci_read_config_dword(dev, i * 4, &dn->config_space[i]); + + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) + dn->eeh_is_bridge = 1; + } } #ifdef DEBUG @@ -390,24 +403,32 @@ void __init pci_addr_cache_build(void) /* --------------------------------------------------------------- */ /* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */ -/** - * eeh_register_notifier - Register to find out about EEH events. - * @nb: notifier block to callback on events - */ -int eeh_register_notifier(struct notifier_block *nb) +void eeh_slot_error_detail (struct device_node *dn, int severity) { - return notifier_chain_register(&eeh_notifier_chain, nb); -} + unsigned long flags; + int rc; -/** - * eeh_unregister_notifier - Unregister to an EEH event notifier. 
- * @nb: notifier block to callback on events - */ -int eeh_unregister_notifier(struct notifier_block *nb) -{ - return notifier_chain_unregister(&eeh_notifier_chain, nb); + if (!dn) return; + + /* Log the error with the rtas logger */ + spin_lock_irqsave(&slot_errbuf_lock, flags); + memset(slot_errbuf, 0, eeh_error_buf_size); + + rc = rtas_call(ibm_slot_error_detail, + 8, 1, NULL, dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid), NULL, 0, + virt_to_phys(slot_errbuf), + eeh_error_buf_size, + severity); + + if (rc == 0) + log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); + spin_unlock_irqrestore(&slot_errbuf_lock, flags); } +EXPORT_SYMBOL(eeh_slot_error_detail); + /** * read_slot_reset_state - Read the reset state of a device node's slot * @dn: device node to read @@ -422,6 +443,7 @@ static int read_slot_reset_state(struct outputs = 4; } else { token = ibm_read_slot_reset_state; + rets[2] = 0; /* fake PE Unavailable info */ outputs = 3; } @@ -430,75 +452,8 @@ static int read_slot_reset_state(struct } /** - * eeh_panic - call panic() for an eeh event that cannot be handled. - * The philosophy of this routine is that it is better to panic and - * halt the OS than it is to risk possible data corruption by - * oblivious device drivers that don't know better. - * - * @dev pci device that had an eeh event - * @reset_state current reset state of the device slot - */ -static void eeh_panic(struct pci_dev *dev, int reset_state) -{ - /* - * XXX We should create a separate sysctl for this. - * - * Since the panic_on_oops sysctl is used to halt the system - * in light of potential corruption, we can use it here. 
- */ - if (panic_on_oops) - panic("EEH: MMIO failure (%d) on device:%s %s\n", reset_state, - pci_name(dev), pci_pretty_name(dev)); - else { - __get_cpu_var(ignored_failures)++; - printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s %s\n", - reset_state, pci_name(dev), pci_pretty_name(dev)); - } -} - -/** - * eeh_event_handler - dispatch EEH events. The detection of a frozen - * slot can occur inside an interrupt, where it can be hard to do - * anything about it. The goal of this routine is to pull these - * detection events out of the context of the interrupt handler, and - * re-dispatch them for processing at a later time in a normal context. - * - * @dummy - unused - */ -static void eeh_event_handler(void *dummy) -{ - unsigned long flags; - struct eeh_event *event; - - while (1) { - spin_lock_irqsave(&eeh_eventlist_lock, flags); - event = NULL; - if (!list_empty(&eeh_eventlist)) { - event = list_entry(eeh_eventlist.next, struct eeh_event, list); - list_del(&event->list); - } - spin_unlock_irqrestore(&eeh_eventlist_lock, flags); - if (event == NULL) - break; - - printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device " - "%s %s\n", event->reset_state, - pci_name(event->dev), pci_pretty_name(event->dev)); - - atomic_set(&eeh_fail_count, 0); - notifier_call_chain (&eeh_notifier_chain, - EEH_NOTIFY_FREEZE, event); - - __get_cpu_var(slot_resets)++; - - pci_dev_put(event->dev); - kfree(event); - } -} - -/** - * eeh_token_to_phys - convert EEH address token to phys address - * @token i/o token, should be address in the form 0xE.... + * eeh_token_to_phys - convert I/O address to phys address + * @token i/o address, should be address in the form 0xA.... 
*/ static inline unsigned long eeh_token_to_phys(unsigned long token) { @@ -513,6 +468,18 @@ static inline unsigned long eeh_token_to return pa | (token & (PAGE_SIZE-1)); } + +static inline struct pci_dev * eeh_find_pci_dev(struct device_node *dn) +{ + struct pci_dev *dev = NULL; + for_each_pci_dev(dev) { + if (pci_device_to_OF_node(dev) == dn) + return dev; + } + return NULL; +} + + /** * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze * @dn device node @@ -528,29 +495,33 @@ static inline unsigned long eeh_token_to * * It is safe to call this routine in an interrupt context. */ +extern void disable_irq_nosync(unsigned int); + int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev) { int ret; int rets[3]; - unsigned long flags; - int rc, reset_state; - struct eeh_event *event; + enum pci_device_io_state state; __get_cpu_var(total_mmio_ffs)++; if (!eeh_subsystem_enabled) return 0; - if (!dn) + if (!dn) { + __get_cpu_var(no_dn)++; return 0; + } /* Access to IO BARs might get this far and still not want checking. */ if (!(dn->eeh_mode & EEH_MODE_SUPPORTED) || dn->eeh_mode & EEH_MODE_NOCHECK) { + __get_cpu_var(ignored_check)++; return 0; } if (!dn->eeh_config_addr) { + __get_cpu_var(no_cfg_addr)++; return 0; } @@ -559,12 +530,18 @@ int eeh_dn_check_failure(struct device_n * slot, we know it's bad already, we don't need to check... */ if (dn->eeh_mode & EEH_MODE_ISOLATED) { - atomic_inc(&eeh_fail_count); - if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) { + dn->eeh_check_count ++; + if (dn->eeh_check_count >= EEH_MAX_FAILS) { + printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicking\n", + dn->eeh_check_count); + dump_stack(); /* re-read the slot reset state */ if (read_slot_reset_state(dn, rets) != 0) rets[0] = -1; /* reset state unknown */ - eeh_panic(dev, rets[0]); + + /* If we are here, then we hit an infinite loop. Stop. 
*/ + panic("EEH: MMIO halt (%d) on device:%s %s\n", rets[0], + pci_name(dev), pci_pretty_name(dev)); } return 0; } @@ -577,53 +554,42 @@ int eeh_dn_check_failure(struct device_n * In any case they must share a common PHB. */ ret = read_slot_reset_state(dn, rets); - if (!(ret == 0 && rets[1] == 1 && (rets[0] == 2 || rets[0] == 4))) { + if (!(ret == 0 && ((rets[1] == 1 && (rets[0] == 2 || rets[0] >= 4)) + || (rets[0] == 5)))) { __get_cpu_var(false_positives)++; return 0; } - /* prevent repeated reports of this failure */ - dn->eeh_mode |= EEH_MODE_ISOLATED; + /* Note that empty slots will fail; empty slots don't have children... */ + if ((rets[0] == 5) && (dn->child == NULL)) { + __get_cpu_var(false_positives)++; + return 0; + } - reset_state = rets[0]; + /* Prevent repeated reports of this failure */ + dn->eeh_mode |= EEH_MODE_ISOLATED; - spin_lock_irqsave(&slot_errbuf_lock, flags); - memset(slot_errbuf, 0, eeh_error_buf_size); + /* Some devices go crazy if irq's are not ack'ed; disable irq now */ + disable_irq_nosync (dev->irq); +// get_irq_desc (dev->irq)->handler->disable (dev->irq); + + __get_cpu_var(slot_resets)++; - rc = rtas_call(ibm_slot_error_detail, - 8, 1, NULL, dn->eeh_config_addr, - BUID_HI(dn->phb->buid), - BUID_LO(dn->phb->buid), NULL, 0, - virt_to_phys(slot_errbuf), - eeh_error_buf_size, - 1 /* Temporary Error */); + if (!dev) + dev = eeh_find_pci_dev (dn); - if (rc == 0) - log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0); - spin_unlock_irqrestore(&slot_errbuf_lock, flags); + state = pci_device_io_thawed; + if ((rets[0] == 2) || (rets[0] == 4)) + state = pci_device_io_frozen; + if (rets[0] == 5) + state = pci_device_io_perm_failure; - printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n", - rets[0], dn->name, dn->full_name); - event = kmalloc(sizeof(*event), GFP_ATOMIC); - if (event == NULL) { - eeh_panic(dev, reset_state); - return 1; - } - - event->dev = dev; - event->dn = dn; - event->reset_state = reset_state; - - /* We may or may not be 
called in an interrupt context */ - spin_lock_irqsave(&eeh_eventlist_lock, flags); - list_add(&event->list, &eeh_eventlist); - spin_unlock_irqrestore(&eeh_eventlist_lock, flags); + peh_send_failure_event (dev, state, rets[2]); /* Most EEH events are due to device driver bugs. Having * a stack trace will help the device-driver authors figure * out what happened. So print that out. */ - dump_stack(); - schedule_work(&eeh_event_wq); + if (rets[0] != 5) dump_stack(); return 0; } @@ -635,7 +601,6 @@ EXPORT_SYMBOL(eeh_dn_check_failure); * @token i/o token, should be address in the form 0xA.... * @val value, should be all 1's (XXX why do we need this arg??) * - * Check for an eeh failure at the given token address. * Check for an EEH failure at the given token address. Call this * routine if the result of a read was all 0xff's and you want to * find out if this is due to an EEH slot freeze event. This routine @@ -643,6 +608,7 @@ EXPORT_SYMBOL(eeh_dn_check_failure); * * Note this routine is safe to call in an interrupt context. */ + unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val) { unsigned long addr; @@ -652,8 +618,10 @@ unsigned long eeh_check_failure(const vo /* Finding the phys addr + pci device; this is pretty quick. 
*/ addr = eeh_token_to_phys((unsigned long __force) token); dev = pci_get_device_by_addr(addr); - if (!dev) + if (!dev) { + __get_cpu_var(no_device)++; return val; + } dn = pci_device_to_OF_node(dev); eeh_dn_check_failure (dn, dev); @@ -664,6 +632,235 @@ unsigned long eeh_check_failure(const vo EXPORT_SYMBOL(eeh_check_failure); +/* ------------------------------------------------------------- */ +/* The code below deals with error recovery */ + +int +eeh_slot_is_isolated(struct pci_dev *dev) +{ + struct device_node *dn; + dn = pci_device_to_OF_node(dev); + return (dn->eeh_mode & EEH_MODE_ISOLATED); +} + +int +eeh_ioaddr_is_isolated(const volatile void __iomem *token) +{ + unsigned long addr; + struct pci_dev *dev; + int rc; + + addr = eeh_token_to_phys((unsigned long __force) token); + dev = pci_get_device_by_addr(addr); + if (!dev) + return 0; + rc = eeh_slot_is_isolated(dev); + pci_dev_put(dev); + return rc; +} + +/** eeh_pci_slot_reset -- raises/lowers the pci #RST line + * state: 1/0 to raise/lower the #RST + */ +void +eeh_pci_slot_reset(struct pci_dev *dev, int state) +{ + struct device_node *dn = pci_device_to_OF_node(dev); + rtas_pci_slot_reset (dn, state); +} + +/** Return negative value if a permanent error, else return + * a number of milliseconds to wait until the PCI slot is + * ready to be used. 
+ */ +static int +eeh_slot_availability(struct device_node *dn) +{ + int rc; + int rets[3]; + + rc = read_slot_reset_state(dn, rets); + if (rc) return rc; + + if (rets[1] == 0) return -1; /* EEH is not supported */ + if (rets[0] == 0) return 0; /* Oll Korrect */ + if (rets[0] == 5) { + if (rets[2] == 0) return -1; /* permanently unavailable */ + return rets[2]; /* number of millisecs to wait */ + } + return -1; +} + +int +eeh_pci_slot_availability(struct pci_dev *dev) +{ + struct device_node *dn = pci_device_to_OF_node(dev); + if (!dn) return -1; + + BUG_ON (dn->phb==NULL); + if (dn->phb==NULL) { + printk (KERN_ERR "EEH, checking on slot with no phb dn=%s dev=%s:%s\n", + dn->full_name, pci_name(dev), pci_pretty_name (dev)); + return -1; + } + return eeh_slot_availability (dn); +} + +void +rtas_pci_slot_reset(struct device_node *dn, int state) +{ + int rc; + + if (!dn) + return; + if (!dn->phb) { + printk (KERN_WARNING "EEH: in slot reset, device node %s has no phb\n", dn->full_name); + return; + } + + dn->eeh_mode |= EEH_MODE_RECOVERING; + rc = rtas_call(ibm_set_slot_reset,4,1, NULL, + dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid), + state); + if (rc) { + printk (KERN_WARNING "EEH: Unable to reset the failed slot, (%d) #RST=%d\n", rc, state); + return; + } + + if (state == 0) + dn->eeh_mode &= ~(EEH_MODE_RECOVERING|EEH_MODE_ISOLATED); +} + +/** rtas_set_slot_reset -- assert the pci #RST line for 1/4 second + * dn -- device node to be reset. + */ + +void +rtas_set_slot_reset(struct device_node *dn) +{ + int i, rc; + + rtas_pci_slot_reset (dn, 1); + + /* The PCI bus requires that the reset be held high for at least + * 100 milliseconds. We wait a bit longer 'just in case'. */ + +#define PCI_BUS_RST_HOLD_TIME_MSEC 250 + msleep (PCI_BUS_RST_HOLD_TIME_MSEC); + rtas_pci_slot_reset (dn, 0); + + /* After a PCI slot has been reset, the PCI Express spec requires + * a 1.5 second idle time for the bus to stabilize, before starting + * up traffic. 
*/ +#define PCI_BUS_SETTLE_TIME_MSEC 1800 + msleep (PCI_BUS_SETTLE_TIME_MSEC); + + /* Now double check with the firmware to make sure the device is + * ready to be used; if not, wait for recovery. */ + for (i=0; i<10; i++) { + rc = eeh_slot_availability (dn); + if (rc <= 0) return; + + msleep (rc+100); + } +} + +EXPORT_SYMBOL(rtas_set_slot_reset); + +void +rtas_configure_bridge(struct device_node *dn) +{ + int token = rtas_token ("ibm,configure-bridge"); + int rc; + + if (token == RTAS_UNKNOWN_SERVICE) + return; + rc = rtas_call(token,3,1, NULL, + dn->eeh_config_addr, + BUID_HI(dn->phb->buid), + BUID_LO(dn->phb->buid)); + if (rc) { + printk (KERN_WARNING "EEH: Unable to configure device bridge (%d) for %s\n", + rc, dn->full_name); + } +} + +EXPORT_SYMBOL(rtas_configure_bridge); + +/* ------------------------------------------------------- */ +/** Save and restore of PCI BARs + * + * Although firmware will set up BARs during boot, it doesn't + * set up device BAR's after a device reset, although it will, + * if requested, set up bridge configuration. Thus, we need to + * configure the PCI devices ourselves. Config-space setup is + * stored in the PCI structures which are normally deleted during + * device removal. Thus, the "save" routine references the + * structures so that they aren't deleted. + */ + +/** + * __restore_bars - Restore the Base Address Registers + * Loads the PCI configuration space base address registers, + * the expansion ROM base address, the latency timer, and etc. + * from the saved values in the device node. 
+ */ +static inline void __restore_bars (struct device_node *dn) +{ + int i; + + if (NULL==dn->phb) return; + for (i=4; i<10; i++) { + rtas_write_config(dn, i*4, 4, dn->config_space[i]); + } + + /* 12 == Expansion ROM Address */ + rtas_write_config(dn, 12*4, 4, dn->config_space[12]); + +#define SAVED_BYTE(OFF) (((u8 *)(dn->config_space))[OFF]) + + rtas_write_config (dn, PCI_CACHE_LINE_SIZE, 1, + SAVED_BYTE(PCI_CACHE_LINE_SIZE)); + + rtas_write_config (dn, PCI_LATENCY_TIMER, 1, + SAVED_BYTE(PCI_LATENCY_TIMER)); + + rtas_write_config (dn, PCI_INTERRUPT_LINE, 1, + SAVED_BYTE(PCI_INTERRUPT_LINE)); +} + +/** + * eeh_restore_bars - restore the PCI config space info + */ +void eeh_restore_bars(struct device_node *dn) +{ + if (! dn->eeh_is_bridge) + __restore_bars (dn); + + if (dn->child) + eeh_restore_bars (dn->child); +#if DO_SIBLINGS + if (dn->sibling) + eeh_restore_bars (dn->sibling); +#endif +} + +void eeh_pci_restore_bars(struct pci_dev *dev) +{ + struct device_node *dn = pci_device_to_OF_node(dev); + eeh_restore_bars (dn); +} + +/* ------------------------------------------------------------- */ +/* The code below deals with enabling EEH for devices during the + * early boot sequence. EEH must be enabled before any PCI probing + * can be done. 
+ */ + +#define EEH_ENABLE 1 + struct eeh_early_enable_info { unsigned int buid_hi; unsigned int buid_lo; @@ -682,6 +879,8 @@ static void *early_enable_eeh(struct dev int enable; dn->eeh_mode = 0; + dn->eeh_check_count = 0; + dn->eeh_freeze_count = 0; if (status && strcmp(status, "ok") != 0) return NULL; /* ignore devices with bad status */ @@ -743,7 +942,7 @@ static void *early_enable_eeh(struct dev dn->full_name); } - return NULL; + return NULL; } /* @@ -824,11 +1023,13 @@ void eeh_add_device_early(struct device_ struct pci_controller *phb; struct eeh_early_enable_info info; - if (!dn || !eeh_subsystem_enabled) + if (!dn) return; phb = dn->phb; if (NULL == phb || 0 == phb->buid) { - printk(KERN_WARNING "EEH: Expected buid but found none\n"); + printk(KERN_WARNING "EEH: Expected buid but found none for %s\n", + dn->full_name); + dump_stack(); return; } @@ -847,6 +1048,9 @@ EXPORT_SYMBOL(eeh_add_device_early); */ void eeh_add_device_late(struct pci_dev *dev) { + int i; + struct device_node *dn; + if (!dev || !eeh_subsystem_enabled) return; @@ -856,6 +1060,14 @@ void eeh_add_device_late(struct pci_dev #endif pci_addr_cache_insert_device (dev); + + /* Save the BAR's; firmware doesn't restore these after EEH reset */ + dn = pci_device_to_OF_node(dev); + for (i = 0; i < 16; i++) + pci_read_config_dword(dev, i * 4, &dn->config_space[i]); + + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) + dn->eeh_is_bridge = 1; } EXPORT_SYMBOL(eeh_add_device_late); @@ -885,12 +1097,17 @@ static int proc_eeh_show(struct seq_file unsigned int cpu; unsigned long ffs = 0, positives = 0, failures = 0; unsigned long resets = 0; + unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0; for_each_cpu(cpu) { ffs += per_cpu(total_mmio_ffs, cpu); positives += per_cpu(false_positives, cpu); failures += per_cpu(ignored_failures, cpu); resets += per_cpu(slot_resets, cpu); + no_dev += per_cpu(no_device, cpu); + no_dn += per_cpu(no_dn, cpu); + no_cfg += per_cpu(no_cfg_addr, cpu); + no_check += 
per_cpu(ignored_check, cpu); } if (0 == eeh_subsystem_enabled) { @@ -898,13 +1115,17 @@ static int proc_eeh_show(struct seq_file seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs); } else { seq_printf(m, "EEH Subsystem is enabled\n"); - seq_printf(m, "eeh_total_mmio_ffs=%ld\n" + seq_printf(m, + "no device=%ld\n" + "no device node=%ld\n" + "no config address=%ld\n" + "check not wanted=%ld\n" + "eeh_total_mmio_ffs=%ld\n" "eeh_false_positives=%ld\n" "eeh_ignored_failures=%ld\n" - "eeh_slot_resets=%ld\n" - "eeh_fail_count=%d\n", - ffs, positives, failures, resets, - eeh_fail_count.counter); + "eeh_slot_resets=%ld\n", + no_dev, no_dn, no_cfg, no_check, + ffs, positives, failures, resets); } return 0; --- arch/ppc64/kernel/pSeries_pci.c.linas-orig 2005-03-09 02:13:08.000000000 -0600 +++ arch/ppc64/kernel/pSeries_pci.c 2005-03-10 14:54:27.000000000 -0600 @@ -101,7 +101,7 @@ static int rtas_pci_read_config(struct p return PCIBIOS_DEVICE_NOT_FOUND; } -static int rtas_write_config(struct device_node *dn, int where, int size, u32 val) +int rtas_write_config(struct device_node *dn, int where, int size, u32 val) { unsigned long buid, addr; int ret; --- drivers/pci/hotplug/rpaphp.h.linas-orig 2005-03-09 02:11:19.000000000 -0600 +++ drivers/pci/hotplug/rpaphp.h 2005-03-10 14:54:27.000000000 -0600 @@ -118,7 +118,8 @@ extern int rpaphp_enable_pci_slot(struct extern int register_pci_slot(struct slot *slot); extern int rpaphp_unconfig_pci_adapter(struct slot *slot); extern int rpaphp_get_pci_adapter_status(struct slot *slot, int is_init, u8 * value); -extern struct hotplug_slot *rpaphp_find_hotplug_slot(struct pci_dev *dev); +extern void init_eeh_handler (void); +extern void exit_eeh_handler (void); /* rpaphp_core.c */ extern int rpaphp_add_slot(struct device_node *dn); --- drivers/pci/hotplug/rpaphp_core.c.linas-orig 2005-03-09 02:12:58.000000000 -0600 +++ drivers/pci/hotplug/rpaphp_core.c 2005-03-10 14:54:27.000000000 -0600 @@ -460,12 +460,18 @@ static int __init rpaphp_init(void) { 
info(DRIVER_DESC " version: " DRIVER_VERSION "\n"); + /* Get set to handle EEH events. */ + init_eeh_handler(); + /* read all the PRA info from the system */ return init_rpa(); } static void __exit rpaphp_exit(void) { + /* Let EEH know we are going away. */ + exit_eeh_handler(); + cleanup_slots(); } --- drivers/pci/hotplug/rpaphp_pci.c.linas-orig 2005-03-09 02:11:01.000000000 -0600 +++ drivers/pci/hotplug/rpaphp_pci.c 2005-03-11 18:40:28.000000000 -0600 @@ -22,8 +22,13 @@ * Send feedback to * */ +#include +#include +#include #include +#include #include +#include #include #include #include "../pci.h" /* for pci_add_new_bus */ @@ -63,6 +68,7 @@ int rpaphp_claim_resource(struct pci_dev root ? "Address space collision on" : "No parent found for", resource, dtype, pci_name(dev), res->start, res->end); + dump_stack(); } return err; } @@ -188,6 +194,19 @@ rpaphp_fixup_new_pci_devices(struct pci_ static int rpaphp_pci_config_bridge(struct pci_dev *dev); +static void rpaphp_eeh_add_bus_device(struct pci_bus *bus) +{ + struct pci_dev *dev; + list_for_each_entry(dev, &bus->devices, bus_list) { + eeh_add_device_late(dev); + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { + struct pci_bus *subbus = dev->subordinate; + if (bus) + rpaphp_eeh_add_bus_device (subbus); + } + } +} + /***************************************************************************** rpaphp_pci_config_slot() will configure all devices under the given slot->dn and return the the first pci_dev. 
@@ -215,6 +234,8 @@ rpaphp_pci_config_slot(struct device_nod } if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) rpaphp_pci_config_bridge(dev); + + rpaphp_eeh_add_bus_device(bus); } return dev; } @@ -223,7 +244,6 @@ static int rpaphp_pci_config_bridge(stru { u8 sec_busno; struct pci_bus *child_bus; - struct pci_dev *child_dev; dbg("Enter %s: BRIDGE dev=%s\n", __FUNCTION__, pci_name(dev)); @@ -240,11 +260,7 @@ static int rpaphp_pci_config_bridge(stru /* do pci_scan_child_bus */ pci_scan_child_bus(child_bus); - list_for_each_entry(child_dev, &child_bus->devices, bus_list) { - eeh_add_device_late(child_dev); - } - - /* fixup new pci devices without touching bus struct */ + /* Fixup new pci devices without touching bus struct */ rpaphp_fixup_new_pci_devices(child_bus, 0); /* Make the discovered devices available */ @@ -282,7 +298,7 @@ static void print_slot_pci_funcs(struct return; } #else -static void print_slot_pci_funcs(struct slot *slot) +static inline void print_slot_pci_funcs(struct slot *slot) { return; } @@ -364,7 +380,6 @@ static void rpaphp_eeh_remove_bus_device if (pdev) rpaphp_eeh_remove_bus_device(pdev); } - } return; } @@ -566,36 +581,265 @@ exit: return retval; } -struct hotplug_slot *rpaphp_find_hotplug_slot(struct pci_dev *dev) +/** + * rpaphp_search_bus_for_dev - return 1 if device is under this bus, else 0 + * @bus: the bus to search for this device. + * @dev: the pci device we are looking for. + */ +static int rpaphp_search_bus_for_dev (struct pci_bus *bus, struct pci_dev *dev) +{ + struct list_head *ln; + + if (!bus) return 0; + + for (ln = bus->devices.next; ln != &bus->devices; ln = ln->next) { + struct pci_dev *pdev = pci_dev_b(ln); + if (pdev == dev) + return 1; + if (pdev->subordinate) { + int rc; + rc = rpaphp_search_bus_for_dev (pdev->subordinate, dev); + if (rc) + return 1; + } + } + return 0; +} + +/** + * rpaphp_find_slot - find and return the slot holding the device + * @dev: pci device for which we want the slot structure. 
+ */ +static struct slot *rpaphp_find_slot(struct pci_dev *dev) { - struct list_head *tmp, *n; - struct slot *slot; + struct list_head *tmp, *n; + struct slot *slot; list_for_each_safe(tmp, n, &rpaphp_slot_head) { struct pci_bus *bus; - struct list_head *ln; slot = list_entry(tmp, struct slot, rpaphp_slot_list); - if (slot->bridge == NULL) { - if (slot->dev_type == PCI_DEV) { - printk(KERN_WARNING "PCI slot missing bridge %s %s \n", - slot->name, slot->location); - } + + /* PHB's don't have bridges. */ + if (slot->bridge == NULL) continue; - } + + /* The PCI device could be the slot itself. */ + if (slot->bridge == dev) + return slot; bus = slot->bridge->subordinate; if (!bus) { + printk (KERN_WARNING "PCI bridge is missing bus: %s %s\n", + pci_name (slot->bridge), pci_pretty_name (slot->bridge)); continue; /* should never happen? */ } - for (ln = bus->devices.next; ln != &bus->devices; ln = ln->next) { - struct pci_dev *pdev = pci_dev_b(ln); - if (pdev == dev) - return slot->hotplug_slot; - } + + if (rpaphp_search_bus_for_dev (bus, dev)) + return slot; } + return NULL; +} + +/** get_phb_of_device -- find the pci controller for the device + * @dev the pci device + * This routine returns a pointer to the device node that + * describes the pci controller for the indicated slot. + */ +static struct device_node * +get_phb_of_device (struct pci_dev *dev) +{ + struct device_node *dn; + struct pci_bus *bus; + + while (1) { + bus = dev->bus; + if (!bus) + break; + dn = pci_bus_to_OF_node(bus); + + if (dn->phb) + return dn; + + dev = bus->self; + BUG_ON (dev==NULL); + if (dev == NULL) + return NULL; + } return NULL; } -EXPORT_SYMBOL_GPL(rpaphp_find_hotplug_slot); +/* ------------------------------------------------------- */ +/** + * handle_eeh_events -- reset a PCI device after hard lockup. 
+ * + * pSeries systems will isolate a PCI slot if the PCI-Host + * bridge detects address or data parity errors, DMA's + * occuring to wild addresses (which usually happen due to + * bugs in device drivers or in PCI adapter firmware). + * Slot isolations also occur if #SERR, #PERR or other misc + * PCI-related errors are detected. + * + * Recovery process consists of unplugging the device driver + * (which generated hotplug events to userspace), then issuing + * a PCI #RST to the device, then reconfiguring the PCI config + * space for all bridges & devices under this slot, and then + * finally restarting the device drivers (which cause a second + * set of hotplug events to go out to userspace). + */ + +int eeh_reset_device (struct pci_dev *dev, struct device_node *dn, int reconfig) +{ + struct slot *frozen_slot= NULL; + + if (!dev) + return 1; + + if (reconfig) + frozen_slot = rpaphp_find_slot(dev); + + if (reconfig && frozen_slot) rpaphp_unconfig_pci_adapter (frozen_slot); + + /* Reset the pci controller. (Asserts RST#; resets config space). + * Reconfigure bridges and devices */ + rtas_set_slot_reset (dn->child); + rtas_configure_bridge(dn); + eeh_restore_bars(dn->child); + enable_irq (dev->irq); + + /* Give the system 5 seconds to finish running the user-space + * hotplug scripts, e.g. ifdown for ethernet. Yes, this is a hack, + * but if we don't do this, weird things happen. + */ + if (reconfig && frozen_slot) { + ssleep (5); + rpaphp_enable_pci_slot (frozen_slot); + } + return 0; +} + +/* The longest amount of time to wait for a pci device + * to come back on line, in seconds. 
+ */ +#define MAX_WAIT_FOR_RECOVERY 15 + +int handle_eeh_events (struct notifier_block *self, + unsigned long reason, void *ev) +{ + int freeze_count=0; + struct device_node *frozen_device; + struct peh_event *event = ev; + struct pci_dev *dev = event->dev; + int perm_failure = 0; + int rc; + + if (!dev) + { + printk ("EEH: EEH error caught, but no PCI device specified!\n"); + return 1; + } + + frozen_device = get_phb_of_device (dev); + + if (!frozen_device) + { + printk (KERN_ERR "EEH: Cannot find PCI conroller for %s %s\n", + pci_name(dev), pci_pretty_name (dev)); + + return 1; + } + + /* We get "permanent failure" messages on empty slots. + * These are false alarms. Empty slots have no child dn. */ + if ((event->state == pci_device_io_perm_failure) && (frozen_device == NULL)) + return 0; + + if (frozen_device) + freeze_count = frozen_device->eeh_freeze_count; + freeze_count ++; + if (freeze_count > EEH_MAX_ALLOWED_FREEZES) + perm_failure = 1; + + /* If the reset state is a '5' and the time to reset is 0 (infinity) + * or is more then 15 seconds, then mark this as a permanent failure. + */ + if ((event->state == pci_device_io_perm_failure) && + ((event->time_unavail <= 0) || + (event->time_unavail > MAX_WAIT_FOR_RECOVERY*1000))) + perm_failure = 1; + + /* Log the error with the rtas logger. */ + if (perm_failure) { + /* + * About 90% of all real-life EEH failures in the field + * are due to poorly seated PCI cards. Only 10% or so are + * due to actual, failed cards. + */ + printk (KERN_ERR + "EEH: device %s:%s has failed %d times \n" + "and has been permanently disabled. Please try reseating\n" + "this device or replacing it.\n", + pci_name (dev), + pci_pretty_name (dev), + freeze_count); + + eeh_slot_error_detail (frozen_device, 2 /* Permanent Error */); + + /* Notify the device that its about to go down. 
*/ + /* XXX this should be a recursive walk to children for + * multi-function devices */ + if (dev->driver->io_state_change) { + dev->driver->io_state_change (dev, pci_device_io_perm_failure); + } + + /* If there's a hotplug slot, unconfigure it */ + struct slot * frozen_slot = rpaphp_find_slot(dev); + rpaphp_unconfig_pci_adapter (frozen_slot); + return 1; + } else { + eeh_slot_error_detail (frozen_device, 1 /* Temporary Error */); + } + + printk (KERN_WARNING + "EEH: This device has failed %d times since last reboot: %s:%s\n", + freeze_count, + pci_name (dev), + pci_pretty_name (dev)); + + /* Walk the various device drivers attached to this slot through + * a reset sequence, giving each an opportunity to do what it needs + * to accomplish the reset */ + /* XXX this should be a recursive walk to children for + * multi-function devices; each child should get to report + * status too, if needed ... if any child can't handle the reset, + * then need to hotplug it. */ + if (dev->driver->io_state_change) { + dev->driver->io_state_change (dev, pci_device_io_frozen); + rc = eeh_reset_device (dev, frozen_device, 0); + dev->driver->io_state_change (dev, pci_device_io_thawed); + } else { + rc = eeh_reset_device (dev, frozen_device, 1); + } + + /* Store the freeze count with the pci adapter, and not the slot. + * This way, if the device is replaced, the count is cleared. + */ + frozen_device->eeh_freeze_count = freeze_count; + + return rc; +} + +static struct notifier_block eeh_block; + +void __init init_eeh_handler (void) +{ + eeh_block.notifier_call = handle_eeh_events; + peh_register_notifier (&eeh_block); +} + +void __exit exit_eeh_handler (void) +{ + peh_unregister_notifier (&eeh_block); +} +
From ak at muc.de Sat Mar 12 20:52:32 2005 From: ak at muc.de (Andi Kleen) Date: 12 Mar 2005 10:52:32 +0100 Subject: [PATCH/RFC] PCI Error Recovery In-Reply-To: <20050312013251.GA2609@austin.ibm.com> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <20050224011409.GE2088@austin.ibm.com> <421DDEF7.7080103@jp.fujitsu.com> <20050224231455.GH2088@austin.ibm.com> <421E9D16.3000606@jp.fujitsu.com> <20050312013251.GA2609@austin.ibm.com> Message-ID: <20050312095232.GA31444@muc.de> On Fri, Mar 11, 2005 at 07:32:51PM -0600, Linas Vepstas wrote: > > Hi, > > Appended is my current draft PCI Error Recovery patch. > Per previous conversatios, it has moved some of the ppc64-specific > error reporting code into generic PCI structures: see changes to > include/linux/pci.h and a new file drivers/pci/pci-error.c. Note > in particular the pci bus states enumerated in > "enum pci_device_io_state"; BenH was suggesting having > more of these ... BenH do you want to propose a "final list"? I don't like it very much that the frozen state is exposed so clearly in the API. e.g. on typical PCI Express chipsets there is no such concept. With error reporting there you just get told "one of your previous accesses or DMAs failed", but the device is not completely gone. IMHO the concept of "slot freeze" is too PPC64 specific to be exposed like this. While there are other chipsets that do similar things a lot of widely used ones will not. Can you reformulate it in terms of "error states" ? Just report an error occurred and the device is unreliable.
-Andi From arnd at arndb.de Sat Mar 12 20:51:37 2005 From: arnd at arndb.de (Arnd Bergmann) Date: Sat, 12 Mar 2005 10:51:37 +0100 Subject: [PATCH/RFC] PCI Error Recovery In-Reply-To: <20050312013251.GA2609@austin.ibm.com> References: <20050223002409.GA10909@austin.ibm.com> <421E9D16.3000606@jp.fujitsu.com> <20050312013251.GA2609@austin.ibm.com> Message-ID: <200503121051.39015.arnd@arndb.de> On Saturday 12 March 2005 02:32, Linas Vepstas wrote: > Appended is my current draft PCI Error Recovery patch. > Per previous conversations, it has moved some of the ppc64-specific > error reporting code into generic PCI structures: see changes to > include/linux/pci.h and a new file drivers/pci/pci-error.c. Note How does that relate to the stuff that Long sent about PCIe advanced error handling yesterday [1]? Is there an overlap? Arnd <>< [1] http://marc.theaimsgroup.com/?l=linux-kernel&m=111058289917142 http://marc.theaimsgroup.com/?l=linux-kernel&m=111058232501296 http://marc.theaimsgroup.com/?l=linux-kernel&m=111058232501527 http://marc.theaimsgroup.com/?l=linux-kernel&m=111058345605486 http://marc.theaimsgroup.com/?l=linux-kernel&m=111058345701999 http://marc.theaimsgroup.com/?l=linux-kernel&m=111058475121258 http://marc.theaimsgroup.com/?l=linux-kernel&m=111058536826662 From benh at kernel.crashing.org Sat Mar 12 22:16:34 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 12 Mar 2005 22:16:34 +1100 Subject: [PATCH/RFC] PCI Error Recovery In-Reply-To: <20050312095232.GA31444@muc.de> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <20050224011409.GE2088@austin.ibm.com> <421DDEF7.7080103@jp.fujitsu.com> <20050224231455.GH2088@austin.ibm.com> <421E9D16.3000606@jp.fujitsu.com> <20050312013251.GA2609@austin.ibm.com> <20050312095232.GA31444@muc.de> Message-ID: <1110626195.5787.12.camel@gaston> > > e.g. on typical PCI Express chipsets there is no such concept.
> With error reporting there you just get told "one of your > previous accesses or DMAs failed", but the device is not > completely gone. > > IMHO the concept of "slot freeze" is too PPC64 specific to > be exposed like this. While there are other chipsets > that do similar things a lot of widely used ones will not. > > Can you reformulate it in terms of "error states" ? > Just report an error occurred and the device is unreliable. I don't want to expose it that way either, but the fact is we can't just have "generic" states that apply to every architecture. I haven't yet looked at Linas' latest patch though, but at one point, we need to define a few states that may or may not apply to a given architecture and give enough info to drivers to deal with them as much as they can. I'm afraid we can't completely avoid some of the complexity here. Ben. From ak at muc.de Sat Mar 12 22:30:16 2005 From: ak at muc.de (Andi Kleen) Date: 12 Mar 2005 12:30:16 +0100 Subject: [PATCH/RFC] PCI Error Recovery In-Reply-To: <1110626195.5787.12.camel@gaston> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <20050224011409.GE2088@austin.ibm.com> <421DDEF7.7080103@jp.fujitsu.com> <20050224231455.GH2088@austin.ibm.com> <421E9D16.3000606@jp.fujitsu.com> <20050312013251.GA2609@austin.ibm.com> <20050312095232.GA31444@muc.de> <1110626195.5787.12.camel@gaston> Message-ID: <20050312113016.GA47310@muc.de> > I don't want to expose it that way either, but the fact is we can't > just have "generic" states that apply to every architecture. I haven't > yet looked at Linas' latest patch though, but at one point, we need to > define a few states that may or may not apply to a given architecture > and give enough info to drivers to deal with them as much as they can. > I'm afraid we can't completely avoid some of the complexity here. Perhaps, but Linas' version seems to be far too PPC64 centric to me.
It's really not in your interest either to have a too ppc64 specific solution because it means much additional work to fix drivers for ppc64 that have been developed on other architectures. What's wrong with just simply telling the driver: an error occurred, all your recent transactions may be broken. To use the device again call "foo" first to fix the device. foo then returns if it fixed the device or not. I don't get why the driver even needs to know about isolation or not. It's not fundamentally different from a bus abort on other systems, just that it lasts longer. -Andi From benh at kernel.crashing.org Sat Mar 12 22:50:35 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sat, 12 Mar 2005 22:50:35 +1100 Subject: [PATCH/RFC] PCI Error Recovery In-Reply-To: <20050312113016.GA47310@muc.de> References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <20050224011409.GE2088@austin.ibm.com> <421DDEF7.7080103@jp.fujitsu.com> <20050224231455.GH2088@austin.ibm.com> <421E9D16.3000606@jp.fujitsu.com> <20050312013251.GA2609@austin.ibm.com> <20050312095232.GA31444@muc.de> <1110626195.5787.12.camel@gaston> <20050312113016.GA47310@muc.de> Message-ID: <1110628235.19810.16.camel@gaston> > Perhaps, but Linas' version seems to be far too PPC64 centric to me. > > It's really not in your interest either to have a too ppc64 specific > solution because it means much additional work to fix drivers for > ppc64 that have been developed on other architectures. > > What's wrong with just simply telling the driver. > > an error occurred. all your recent transactions may be broken. > To use the device again call "foo" first to fix the device. > foo then returns if it fixed the device or not. I want something along those lines, except that I want it asynchronous because of the issue of drivers sharing the same bus segment that all need to be notified before we can re-enable things.
Also, I can either just re-enable IOs, re-enable DMA, both, reset the slot, etc.... I may not offer that rich functionality in the generic API, but I need to find the right "cutting point". Just re-enabling IOs is useful for drivers who can extract diagnostic infos from the device, for example after a DMA error. Resetting the slot may be necessary to get some devices back. > I don't get why the driver even needs to know about isolation > or not. It's not fundamentally different from a bus abort > on other systems, just that it lasts longer. From grundler at parisc-linux.org Sun Mar 13 04:22:25 2005 From: grundler at parisc-linux.org (Grant Grundler) Date: Sat, 12 Mar 2005 10:22:25 -0700 Subject: [PATCH/RFC] PCI Error Recovery In-Reply-To: <1110628235.19810.16.camel@gaston> References: <20050223174356.GH13081@kroah.com> <20050224011409.GE2088@austin.ibm.com> <421DDEF7.7080103@jp.fujitsu.com> <20050224231455.GH2088@austin.ibm.com> <421E9D16.3000606@jp.fujitsu.com> <20050312013251.GA2609@austin.ibm.com> <20050312095232.GA31444@muc.de> <1110626195.5787.12.camel@gaston> <20050312113016.GA47310@muc.de> <1110628235.19810.16.camel@gaston> Message-ID: <20050312172225.GA1978@colo.lackof.org> On Sat, Mar 12, 2005 at 10:50:35PM +1100, Benjamin Herrenschmidt wrote: ... > > To use the device again call "foo" first to fix the device. > > foo then returns if it fixed the device or not. > > I want something along those lines, except that I want it asynchronous > because of the issue of drivers sharing the same bus segment that all > need to be notified before we can re-enable things. Also, I can either > just re-enable IOs, re-enable DMA, both, reset the slot, etc.... > > I may not offer that rich functionality in the generic API, Why not? Can't we do that today with various PCI initialization routines that provide arch (pcibios) specific hooks? e.g.
pci_set_master vs pci_enable_device I'm wondering if the second part of the error recovery path in the driver can use its "normal" initialization sequence. Probably needs adjusting to look for error states and the first part will need to clean up pending IO requests. > but I need > to find the right "cutting point". Just re-enabling IOs is useful for > drivers who can extract diagnostic infos from the device, for example > after a DMA error. By "IO", I'm guessing you mean MMIO or IO Port space access. This implies only the device driver knows what/where any diag info lives. But some of the info is architected in PCI: SERR and PERR status bits. PCIe seems to be richer in error reporting but I don't know details. I think the majority of the error info is much more likely to be held in driver state and platform chipset state. E.g. only the driver will be able to associate a particular IO request with the invalid DMA or MMIO address that the chipset captured. The driver can reject that IO (with extreme prejudice so it doesn't get retried) and restart the PCI device. In case it's not obvious, this is all just hand waving and maybe it will inspire something more realistic... > Resetting the slot may be necessary to get some devices back. *nod* Or even several slots. > > I don't get why the driver even needs to know about isolation > > or not. It's not fundamentally different from a bus abort > > on other systems, just that it lasts longer. I think the driver just needs to know if it's ok to do MMIO/IO Port access to the device or not at any given point in time. A simpler strategy could be to just blow away (PCI Bus reset) the failed device(s) and reconfigure the PCI bus. Then call back into the drivers to tell them their devices suffered an "event". But then finer grain recovery isn't really possible.
grant From benh at kernel.crashing.org Sun Mar 13 10:05:05 2005 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Sun, 13 Mar 2005 10:05:05 +1100 Subject: [PATCH/RFC] PCI Error Recovery In-Reply-To: <20050312172225.GA1978@colo.lackof.org> References: <20050223174356.GH13081@kroah.com> <20050224011409.GE2088@austin.ibm.com> <421DDEF7.7080103@jp.fujitsu.com> <20050224231455.GH2088@austin.ibm.com> <421E9D16.3000606@jp.fujitsu.com> <20050312013251.GA2609@austin.ibm.com> <20050312095232.GA31444@muc.de> <1110626195.5787.12.camel@gaston> <20050312113016.GA47310@muc.de> <1110628235.19810.16.camel@gaston> <20050312172225.GA1978@colo.lackof.org> Message-ID: <1110668705.5787.31.camel@gaston> > Why not? > Can't we do that today with various PCI initialization > routines that provide arch (pcibios) specific hooks? > e.g. pci_set_master vs pci_enable_device Well, it gets complicated. For example, the driver may try to re-enable IO to check for some status, but that itself triggers a new error right away because the HW is dead ... Maybe the driver should "assume" by default indeed that the slot is isolated by default (even if it's not on non-ppc64 archs) and thus has to call pci_enable_device() and pci_set_master() again. It adds some burden to the underlying code though to figure out if those calls are emitted in the context of an error or not, since the operations are completely different at the firmware level. Then, there is need to inform the driver as well of the capability to reset the slot, to be used if the driver decides it can't recover. Finally, I'm not fan at _ALL_ of providing synchronous APIs like pci_enable_device() or pci_set_master().
In fact, those two would be not _too_ bad, but the slot reset is more nasty. The problem is that we have potentially more than one driver affected. Even if the error was triggered by one card/function, several cards/functions may have been isolated etc... We need to "notify" all drivers, give them a chance to re-enable device & gather diagnostic data, etc... before we try to reset the slot if a driver decides it requires that to happen. Also, what if a driver is ok after just enabling the device and re-initializing itself, but its sibling decides it needs to reset the slot ? This is why I'm more inclined toward a callback that acts like a state machine. > I'm wondering if the second part of the error recovery path in > the driver can use its "normal" initialization sequence. > Probably needs adjusting to look for error states and the first > part will need to clean up pending IO requests. Oh it could, but I wouldn't make it mandatory by calling probe() or whatever. It's up to each driver to decide, easy enough to move their init code into a function called by both code paths. > > but I need > > to find the right "cutting point". Just re-enabling IOs is useful for > > drivers who can extract diagnostic infos from the device, for example > > after a DMA error. > > By "IO", I'm guessing you mean MMIO or IO Port space access. Yes. > This implies only the device driver knows what/where any diag info lives. Yes. > But some of the info is architected in PCI: SERR and PERR status bits. > PCIe seems to be richer in error reporting but I don't know details. Oh, sure, and that's why it may not be worth bothering about this "step" and just always reset the slot when we can. But we then need to inform the driver of what happened since not all platforms will be able to do that.
That would definitely simplify the above problem, and this is what I meant by "I may not offer that rich functionality in the generic API" > I think the majority of the error info is much more likely to be held > in driver state and platform chipset state. E.g. only the driver will > be able to associate a particular IO request with the invalid DMA or > MMIO address that the chipset captured. The driver can reject that IO > (with extreme prejudice so it doesn't get retried) and restart the PCI > device. > > In case it's not obvious, this is all just hand waving and maybe > it will inspire something more realistic... > > > Resetting the slot may be necessary to get some devices back. > > *nod* Or even several slots. > > > > I don't get why the driver even needs to know about isolation > > > or not. It's not fundamentally different from a bus abort > > > on other systems, just that it lasts longer. > > I think the driver just needs to know if it's ok to do MMIO/IO Port > access to the device or not at any given point in time. > > A simpler strategy could be to just blow away (PCI Bus reset) the failed > device(s) and reconfigure the PCI bus. Then call back into the drivers > to tell them their devices suffered an "event". But then finer grain > recovery isn't really possible. Yes, but I think fine grained recovery ends up being an API nightmare when you start dealing with several drivers on the same segment with conflicting requirements for recovery. Now, the problem is that we have to provide both approaches anyway, as a lot of platforms can't do anything but clear the SERR/PERR state and hope the driver can go on. So we need to inform the driver of the platform capability in a way. Ben.
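[Editor's sketch, not part of the thread.] The "callback that acts like a state machine" idea discussed above, where every driver on an affected segment is notified before anything is re-enabled or reset, might look roughly like the following in plain C. All names here (recovery_vote, segment_driver, collect_votes, and the two toy callbacks) are hypothetical and invented for illustration; they are not taken from Linas' patch or from any kernel API. The sketch also assumes that a driver without a callback counts as asking for a reset, which is a choice of this example rather than something stated in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical recovery requests, ordered from least to most drastic. */
enum recovery_vote {
	VOTE_RECOVERED = 0,	/* driver is fine once MMIO is re-enabled */
	VOTE_NEED_RESET = 1,	/* driver wants the slot reset */
	VOTE_DISCONNECT = 2	/* driver gives up on the device */
};

/* Hypothetical per-driver callback: told of the error, answers with a vote. */
typedef enum recovery_vote (*error_cb)(void *drvdata);

struct segment_driver {
	error_cb on_error;	/* NULL if the driver has no error handling */
	void *drvdata;
};

/*
 * Notify every driver sharing the bus segment and return the most
 * drastic vote.  Nothing is re-enabled or reset until all drivers
 * have answered, so one driver's recovery cannot race a sibling's
 * request for a slot reset.
 */
enum recovery_vote collect_votes(const struct segment_driver *drv, size_t n)
{
	enum recovery_vote worst = VOTE_RECOVERED;
	size_t i;

	assert(drv != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* Assumption of this sketch: no callback means "reset me". */
		enum recovery_vote v = drv[i].on_error
			? drv[i].on_error(drv[i].drvdata)
			: VOTE_NEED_RESET;

		if (v > worst)
			worst = v;
	}
	return worst;
}

/* Two toy drivers used to exercise the sketch. */
enum recovery_vote vote_ok(void *d)    { (void)d; return VOTE_RECOVERED; }
enum recovery_vote vote_reset(void *d) { (void)d; return VOTE_NEED_RESET; }
```

With one driver voting VOTE_RECOVERED and its sibling voting VOTE_NEED_RESET, collect_votes() yields VOTE_NEED_RESET, which is the conflicting-requirements case raised in the message above: the platform would reset the slot and then call both drivers again for the next step.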
From paulus at samba.org Mon Mar 14 08:47:15 2005
From: paulus at samba.org (Paul Mackerras)
Date: Mon, 14 Mar 2005 08:47:15 +1100
Subject: [PATCH] enable DEBUG via config option
In-Reply-To: <20050211105453.GA31718@suse.de>
References: <20050211105453.GA31718@suse.de>
Message-ID: <16948.46307.76370.206088@cargo.ozlabs.ibm.com>

Olaf Hering writes:

> Its always boring to edit each file and turn the #undef DEBUG into
> #define DEBUG. This patch makes it a simple config option.
> Now the question is, how verbose will the boot be when all the printk
> are enabled? appears to be ok so far on a p620.

Having it as a config option seems to be of use only to a few kernel
developers. Why don't you just edit the Makefile and add -DDEBUG to
the CFLAGS when you want to do that?

Paul.

From paulus at samba.org Mon Mar 14 20:27:29 2005
From: paulus at samba.org (Paul Mackerras)
Date: Mon, 14 Mar 2005 20:27:29 +1100
Subject: [PATCH] ppc64: fix nvram partition scan
In-Reply-To:
References:
Message-ID: <16949.22785.701996.273162@cargo.ozlabs.ibm.com>

Utz Bacher writes:

> the following patch against 2.6.11-rc4 corrects some problems with bad NVRAM
> contents:
> - when the checksum is incorrect, better do not trust anything (instead
>   of assuming the length is correct)
> - when the partition length is zero, stop looking for more partitions
>   instead of looping

I have tidied up some of the messages and changed the loop exits to
conform to the usual kernel style. Any comments on this version?

Paul.
diff -urN linux-2.5/arch/ppc64/kernel/nvram.c test/arch/ppc64/kernel/nvram.c --- linux-2.5/arch/ppc64/kernel/nvram.c 2005-03-14 18:03:26.000000000 +1100 +++ test/arch/ppc64/kernel/nvram.c 2005-03-14 20:26:55.000000000 +1100 @@ -507,8 +507,8 @@ struct nvram_partition * tmp_part; unsigned char c_sum; char * header; - long size; int total_size; + int err; if (ppc_md.nvram_size == NULL) return -ENODEV; @@ -522,29 +522,37 @@ while (cur_index < total_size) { - size = ppc_md.nvram_read(header, NVRAM_HEADER_LEN, &cur_index); - if (size != NVRAM_HEADER_LEN) { + err = ppc_md.nvram_read(header, NVRAM_HEADER_LEN, &cur_index); + if (err != NVRAM_HEADER_LEN) { printk(KERN_ERR "nvram_scan_partitions: Error parsing " "nvram partitions\n"); - kfree(header); - return size; + goto out; } cur_index -= NVRAM_HEADER_LEN; /* nvram_read will advance us */ memcpy(&phead, header, NVRAM_HEADER_LEN); + err = 0; c_sum = nvram_checksum(&phead); - if (c_sum != phead.checksum) - printk(KERN_WARNING "WARNING: nvram partition checksum " - "was %02x, should be %02x!\n", phead.checksum, c_sum); - + if (c_sum != phead.checksum) { + printk(KERN_WARNING "WARNING: nvram partition checksum" + " was %02x, should be %02x!\n", + phead.checksum, c_sum); + printk(KERN_WARNING "Terminating nvram partition scan\n"); + goto out; + } + if (!phead.length) { + printk(KERN_WARNING "WARNING: nvram corruption " + "detected: 0-length partition\n"); + goto out; + } tmp_part = (struct nvram_partition *) kmalloc(sizeof(struct nvram_partition), GFP_KERNEL); + err = -ENOMEM; if (!tmp_part) { printk(KERN_ERR "nvram_scan_partitions: kmalloc failed\n"); - kfree(header); - return -ENOMEM; + goto out; } memcpy(&tmp_part->header, &phead, NVRAM_HEADER_LEN); @@ -553,9 +561,11 @@ cur_index += phead.length * NVRAM_BLOCK_LEN; } + err = 0; + out: kfree(header); - return 0; + return err; } static int __init nvram_init(void) From paulus at samba.org Mon Mar 14 20:28:47 2005 From: paulus at samba.org (Paul Mackerras) Date: Mon, 14 Mar 2005 
20:28:47 +1100
Subject: [RFC/PATCH] Updated: ppc64: Add mem=X option
In-Reply-To: <20050225191408.599c613d.michael@ellerman.id.au>
References: <20050222192423.727023f7.michael@ellerman.id.au> <20050225191408.599c613d.michael@ellerman.id.au>
Message-ID: <16949.22863.622912.175918@cargo.ozlabs.ibm.com>

Michael Ellerman writes:

> Here is an updated patch for adding support for the mem=X boot option.

It gets rejects now that Mike Kravetz's NUMA patch has gone in. Care
to respin it?

Paul.

From paulus at samba.org Mon Mar 14 20:47:10 2005
From: paulus at samba.org (Paul Mackerras)
Date: Mon, 14 Mar 2005 20:47:10 +1100
Subject: [PATCH][RFC] unlikely spinlocks
In-Reply-To: <20050302163412.0fa52c4b.moilanen@austin.ibm.com>
References: <20050302163412.0fa52c4b.moilanen@austin.ibm.com>
Message-ID: <16949.23966.756568.902508@cargo.ozlabs.ibm.com>

Jake Moilanen writes:

> On our raw spinlocks, we currently have an attempt at the lock, and if
> we do not get it we enter a spin loop. This spinloop will likely
> continue for awhile, and we predict likely.
>
> Shouldn't we predict that we will get out of the loop so our next
> instructions are already prefetched. Even when we miss because the lock
> is still held, it won't matter since we are waiting anyways.

Possibly the best thing is not to put a static prediction on it at
all, and let the machine's dynamic branch prediction decide which path
to predict?

Paul.
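
[Editorial sketch, not part of the original message.] The ppc64 spinlocks
under discussion are written in assembly, where the branch hint is part
of the instruction. As a rough C analogue only, the two choices (hinting
the first attempt, leaving the spin loop unhinted) can be pictured with
C11 atomics and the GCC/Clang __builtin_expect builtin. The toy_lock type
and function names are hypothetical, and a compiler remains free to lay
out the unhinted loop according to its own heuristics.

```c
#include <stdatomic.h>

/* GCC/Clang builtin; the hint only influences static code layout. */
#define likely(x)	__builtin_expect(!!(x), 1)

/* Hypothetical toy lock, not the kernel's spinlock_t. */
struct toy_lock {
	atomic_flag held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	/* First attempt: hinted as likely to succeed (uncontended case). */
	if (likely(!atomic_flag_test_and_set_explicit(&l->held,
						      memory_order_acquire)))
		return;

	/*
	 * Contended case: the loop branch carries no likely()/unlikely()
	 * annotation, so the source expresses no static preference and
	 * the hardware predictor is left to adapt.
	 */
	while (atomic_flag_test_and_set_explicit(&l->held,
						 memory_order_acquire))
		;
}

static void toy_lock_release(struct toy_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A lock would be declared as `struct toy_lock l = { ATOMIC_FLAG_INIT };`
and then passed to toy_lock_acquire() and toy_lock_release().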
From paulus at samba.org Mon Mar 14 21:13:36 2005
From: paulus at samba.org (Paul Mackerras)
Date: Mon, 14 Mar 2005 21:13:36 +1100
Subject: [PATCH 1/2] No-exec support for ppc64
In-Reply-To: <20050310162513.74191caa.moilanen@austin.ibm.com>
References: <20050308165904.0ce07112.moilanen@austin.ibm.com> <20050308170826.13a2299e.moilanen@austin.ibm.com> <20050310032213.GB20789@austin.ibm.com> <20050310162513.74191caa.moilanen@austin.ibm.com>
Message-ID: <16949.25552.640180.677985@cargo.ozlabs.ibm.com>

Jake Moilanen writes:

> diff -puN fs/binfmt_elf.c~nx-user-ppc64 fs/binfmt_elf.c
> --- linux-2.6-bk/fs/binfmt_elf.c~nx-user-ppc64	2005-03-08 16:08:54 -06:00
> +++ linux-2.6-bk-moilanen/fs/binfmt_elf.c	2005-03-08 16:08:54 -06:00
> @@ -99,6 +99,8 @@ static int set_brk(unsigned long start,
>  	up_write(&current->mm->mmap_sem);
>  	if (BAD_ADDR(addr))
>  		return addr;
> +
> +	sys_mprotect(start, end-start, PROT_READ|PROT_WRITE|PROT_EXEC);

I don't think I can push that upstream. What happens if you leave
that out?

More generally, we are making a user-visible change, even for programs
that aren't marked as having non-executable stack or heap, because we
are now enforcing that the program can't execute from mappings that
don't have PROT_EXEC. Perhaps we should enforce the requirement for
execute permission only on those programs that indicate somehow that
they can handle it?

Paul.
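
[Editorial sketch, not part of the original message.] The opt-in policy
suggested in the last paragraph could be pictured as a small helper that
widens a mapping's protection unless the program has declared it can cope
with strict no-exec. The function name effective_prot and its boolean
parameter are hypothetical; how a program would actually signal the
opt-in is exactly the open question in this thread.

```c
#include <stdbool.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: compute the protection actually applied to a
 * mapping. Programs that have not opted in keep the old behaviour,
 * where any readable mapping is also executable.
 */
static int effective_prot(int requested, bool opted_in_to_noexec)
{
	if (!opted_in_to_noexec && (requested & PROT_READ))
		return requested | PROT_EXEC;
	return requested;
}
```

Under this sketch a legacy binary asking for PROT_READ|PROT_WRITE still
gets an executable heap, while an opted-in binary gets exactly what it
asked for.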
From seto.hidetoshi at jp.fujitsu.com Mon Mar 14 23:33:03 2005
From: seto.hidetoshi at jp.fujitsu.com (Hidetoshi Seto)
Date: Mon, 14 Mar 2005 21:33:03 +0900
Subject: [PATCH/RFC] PCI Error Recovery
In-Reply-To: <20050312013251.GA2609@austin.ibm.com>
References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <20050224011409.GE2088@austin.ibm.com> <421DDEF7.7080103@jp.fujitsu.com> <20050224231455.GH2088@austin.ibm.com> <421E9D16.3000606@jp.fujitsu.com> <20050312013251.GA2609@austin.ibm.com>
Message-ID: <4235847F.3080705@jp.fujitsu.com>

Linas Vepstas wrote:
> "enum pci_device_io_state"; BenH was suggesting having
> more of these ... BenH do you want to propose a "final list"?
> (snip)
> +/* ---------------------------------------------------------------- */
> +/** PCI error recovery state. Whenever the PCI bus state changes,
> + * the io_state_change() callback will be called to notify the
> + * device driver os state changes.
> + */
> +
> +enum pci_device_io_state {
> +	pci_device_io_frozen = 1,	/* I/O to device is blocked */
> +	pci_device_io_thawed,		/* I/O te device is (re-)enabled */
> +	pci_device_io_perm_failure,	/* pci card is dead */
> +};

I'm not BenH... but I think it's of value to have the list of states.
(Even though it seems that the list you originally wanted isn't a
"state list" but an "event list".)

IMHO, (according to the current list) there will be at least 3 states:

  - NORMAL:
      Standard, usual, healthy state. Strictly speaking, this doesn't
      mean "everything works well." IOW, it may be unreliable: "works
      but occasionally fails." You can access the device, but checking
      the result is recommended.
  - ISOLATED:
      Physically connected, but accesses are temporarily blocked.
      Devices may be unstable but are believed to be recoverable.
      Error info on the platform or device would be inaccessible.
      The system could attempt to recover, i.e. change the state to
      NORMAL.
  - DEAD:
      Physically connected, but accesses are permanently blocked.
      No recovery attempt is required any more.

How many other states will there be?

And, I guess you would need at least 3 types of event:

  - ERROR_DETECTED:
      An error was detected. The notified driver could test the device,
      collect advanced/extra error info and log it.
  - STATE_CHANGED:
      The I/O state was changed. The new state will be indicated in the
      param passed with this event.
  - TRY_RECOVER:
      The OS asks drivers to attempt any device-specific recovery.
      After gathering all results, the OS will decide whether the
      device has recovered or not.

Depending on the arch's facilities and implementation, the behavior of
the system changes drastically. For example, if we get an error while in
the NORMAL state:

  case 1) NORMAL -> NORMAL
      The state isn't changed.
      The error will be reported by some kind of exception, read() will
      return broken (or poisoned) data, and write will be ignored.
      Even if subsequent I/O also fails, we can continue to access the
      device.
      # ex. ia32

  case 2) NORMAL -> ISOLATED/DEAD
      Even if it was a temporary soft error, the system isolates the
      affected bus and devices.
      All subsequent I/O will be blocked (or poisoned/ignored).
      # ex. ppc64

  case 3) System reset
      Even if it was a temporary soft error, the system reboots
      immediately.
      All subsequent/pending I/O will be dismissed.
      # ex. ia64 (too sensitive...so now I'm engaged in :-p

Therefore, you will (case 1) get a lot of ERROR_DETECTED events, or
(case 2) get a STATE_CHANGED event with a param indicating "ISOLATED," or
(case 3) get nothing.
Again, currently most archs don't use states other than NORMAL...

Now your intent:
> +	pci_device_io_frozen = 1,	/* I/O to device is blocked */
> +	pci_device_io_thawed,		/* I/O te device is (re-)enabled */
> +	pci_device_io_perm_failure,	/* pci card is dead */
would be realized by:
	event(STATE_CHANGED,ISOLATED) + event(TRY_RECOVER,*data)
	event(STATE_CHANGED,NORMAL)
	event(STATE_CHANGED,DEAD)
I think the latter style is more generic.

Do these ideas give you a clue for how to go on?
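
[Editorial sketch, not part of the original message.] The state list and
event list proposed above can be written down as two enums plus a
transition function, which makes the mapping from "frozen / thawed /
perm_failure" onto STATE_CHANGED events concrete. All identifiers here
(io_state, io_event, apply_event) are hypothetical and do not come from
any posted patch; treating DEAD as a final state follows the "no recovery
attempt is required any more" remark and is otherwise an assumption.

```c
/* Hypothetical names; a sketch of the proposal, not a kernel interface. */
enum io_state {
	IO_NORMAL,	/* accessible, though results should be checked */
	IO_ISOLATED,	/* accesses temporarily blocked, may recover */
	IO_DEAD		/* accesses permanently blocked */
};

enum io_event_type {
	EV_ERROR_DETECTED,	/* an error was seen; state may be unchanged */
	EV_STATE_CHANGED,	/* new_state carries the new I/O state */
	EV_TRY_RECOVER		/* driver is asked to attempt recovery */
};

struct io_event {
	enum io_event_type type;
	enum io_state new_state;	/* meaningful for EV_STATE_CHANGED only */
};

/* Compute the state a driver should track after receiving one event. */
static enum io_state apply_event(enum io_state cur, struct io_event ev)
{
	if (cur == IO_DEAD)
		return IO_DEAD;		/* assumed final: no way back */
	if (ev.type == EV_STATE_CHANGED)
		return ev.new_state;
	return cur;			/* other events carry no state change */
}
```

In these terms "frozen" is { EV_STATE_CHANGED, IO_ISOLATED }, "thawed" is
{ EV_STATE_CHANGED, IO_NORMAL } and "perm_failure" is
{ EV_STATE_CHANGED, IO_DEAD }, while case 1 above (ia32) would only ever
deliver EV_ERROR_DETECTED and leave the state at IO_NORMAL.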
Thanks,
H.Seto

From linas at austin.ibm.com Tue Mar 15 04:49:06 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Mon, 14 Mar 2005 11:49:06 -0600
Subject: [PATCH/RFC] PCI Error Recovery
In-Reply-To: <1110668705.5787.31.camel@gaston>
References: <421DDEF7.7080103@jp.fujitsu.com> <20050224231455.GH2088@austin.ibm.com> <421E9D16.3000606@jp.fujitsu.com> <20050312013251.GA2609@austin.ibm.com> <20050312095232.GA31444@muc.de> <1110626195.5787.12.camel@gaston> <20050312113016.GA47310@muc.de> <1110628235.19810.16.camel@gaston> <20050312172225.GA1978@colo.lackof.org> <1110668705.5787.31.camel@gaston>
Message-ID: <20050314174906.GA498@austin.ibm.com>

Hi,

> The problem is that we have
> potentially more than one driver affected. Even if the error was
> triggered by one card/function, several cards/functions may have been
> isolated etc...

To be specific, on PPC64 we have PCI buses that are physical cables that
run from one rack-mounted drawer to the other rack cage that contains
the cpu (the "CEC", central electronics complex). Each rack-mounted cage
may hold 4 or 8 or 16 PCI cards, and a failure on that bus could take out
multiple PCI cards at once.

Even on a plain-jane desktop system, one is confronted with
"multi-function pci cards" which can cause multiple drivers to be loaded.

> We need to "notify" all drivers, give them a chance to re-enable device
> & gather diagnostic data, etc... before we try to reset the slot if a
> driver decides it requires that to happen. Also, if a driver is ok after
> just enabling the device() re-initializes itself, but its sibling
> decides it needs to reset the slot ?

[...]

> Yes, but I think fine grained recovery ends up being an API nightmare
> when you start dealing with several drivers on the same segment with
> conflicting requirements for recovery.

I'm thinking of having a way of asking all affected drivers "what can
you deal with?" and then playing to the lowest common denominator.
For example, the current reset sequence tries to do a hotplug add-remove
if the driver is ignorant. At this time, I distinguish "ignorant" from
"not ignorant" based on whether the 'state change' callback is null or not.
I'll try to think of something just a tiny bit more fine-grained than this.

--linas

From linas at austin.ibm.com Tue Mar 15 04:55:11 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Mon, 14 Mar 2005 11:55:11 -0600
Subject: [PATCH/RFC] PCI Error Recovery
In-Reply-To: <20050312113016.GA47310@muc.de>
References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <20050224011409.GE2088@austin.ibm.com> <421DDEF7.7080103@jp.fujitsu.com> <20050224231455.GH2088@austin.ibm.com> <421E9D16.3000606@jp.fujitsu.com> <20050312013251.GA2609@austin.ibm.com> <20050312095232.GA31444@muc.de> <1110626195.5787.12.camel@gaston> <20050312113016.GA47310@muc.de>
Message-ID: <20050314175511.GB498@austin.ibm.com>

On Sat, Mar 12, 2005 at 12:30:16PM +0100, Andi Kleen was heard to remark:
>
> Perhaps, but Linas' version seems to be far too PPC64 centric to me.

I'm trying to expand my horizons, and part of this includes asking for
advice on this mailing list.

I'd like to have Long Nguyen from Intel be a bit more involved in the
conversation, so that I don't get surprised by patches that do almost the
same thing, but differently (as the last PCI express patch seems to
sort-of/somehow be doing).
--linas

From linas at austin.ibm.com Tue Mar 15 05:00:31 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Mon, 14 Mar 2005 12:00:31 -0600
Subject: [PATCH/RFC] PCI Error Recovery
In-Reply-To: <200503121051.39015.arnd@arndb.de>
References: <20050223002409.GA10909@austin.ibm.com> <421E9D16.3000606@jp.fujitsu.com> <20050312013251.GA2609@austin.ibm.com> <200503121051.39015.arnd@arndb.de>
Message-ID: <20050314180031.GC498@austin.ibm.com>

On Sat, Mar 12, 2005 at 10:51:37AM +0100, Arnd Bergmann was heard to remark:
> On Saturday 12 March 2005 02:32, Linas Vepstas wrote:
>
> > Appended is my current draft PCI Error Recovery patch.
> > Per previous conversations, it has moved some of the ppc64-specific
> > error reporting code into generic PCI structures: see changes to
> > include/linux/pci.h and a new file drivers/pci/pci-error.c. Note
>
> How does that relate to the stuff that Long sent about PCIe
> advanced error handling yesterday [1]? Is there an overlap?

Dunno, I'm looking. I was surprised to see this patch; I invite Long
to join the conversation and describe the situation in his view.

--linas

From linas at austin.ibm.com Tue Mar 15 05:14:20 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Mon, 14 Mar 2005 12:14:20 -0600
Subject: [PATCH/RFC] PCI Error Recovery
In-Reply-To: <4235847F.3080705@jp.fujitsu.com>
References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <20050224011409.GE2088@austin.ibm.com> <421DDEF7.7080103@jp.fujitsu.com> <20050224231455.GH2088@austin.ibm.com> <421E9D16.3000606@jp.fujitsu.com> <20050312013251.GA2609@austin.ibm.com> <4235847F.3080705@jp.fujitsu.com>
Message-ID: <20050314181420.GD498@austin.ibm.com>

On Mon, Mar 14, 2005 at 09:33:03PM +0900, Hidetoshi Seto was heard to remark:
> Linas Vepstas wrote:
> >+enum pci_device_io_state {
>
> ... but I think it's of value to have the list of states.
> (Even it seems that the list what originally you want isn't "state list"
> but "event list".)
Sorry, you are right, I confused the concept of "state transition" with
the concept of "state"; I will try to clarify the difference in the next
patch.

> would be realized by:
> 	event(STATE_CHANGED,ISOLATED) + event(TRY_RECOVER,*data)
> 	event(STATE_CHANGED,NORMAL)
> 	event(STATE_CHANGED,DEAD)
> I think the latter style is more generic.

Hmm, are you suggesting that there **shouldn't** be a callback function
in struct pci_driver, and that instead, all state changes should be
delivered as events? (i.e. by means of the notifier_chain mechanism?)

Hmm ... that's possible, I'd have to rearrange the code a bit.

Is there a long-term philosophy for the Linux kernel on a question like
this? That is, when should changes add callbacks to structures, as
opposed to notifier-chain based events? The callback is a bit simpler,
and maybe a tiny bit faster, but it's less flexible in the long run
(e.g. anyone can listen for the events, but only device drivers can
get callbacks).

Comments, please?

--linas

From linas at austin.ibm.com Tue Mar 15 06:36:40 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Mon, 14 Mar 2005 13:36:40 -0600
Subject: [RFC][PATCH] combining header files
In-Reply-To: <20050310134216.5b9b27ef.sfr@canb.auug.org.au>
References: <20050309120343.0c22eb0f.sfr@canb.auug.org.au> <20050309200109.GG1220@austin.ibm.com> <20050310134216.5b9b27ef.sfr@canb.auug.org.au>
Message-ID: <20050314193640.GE498@austin.ibm.com>

On Thu, Mar 10, 2005 at 01:42:16PM +1100, Stephen Rothwell was heard to remark:
> On Wed, 9 Mar 2005 14:01:09 -0600 Linas Vepstas wrote:
> >
> > Why not #include instead?
>
> Because I am talking about similarities between ppc and ppc64 not ppc64
> and the generic code (though there may be some of those to be exploited as
> well).

Hmm. Well, yes. I just figured that since you're looking at this anyway,
you may as well look to see if it can be made generic.
--linas

From utz.bacher at de.ibm.com Tue Mar 15 06:26:39 2005
From: utz.bacher at de.ibm.com (Utz Bacher)
Date: Mon, 14 Mar 2005 20:26:39 +0100
Subject: [PATCH] ppc64: fix nvram partition scan
In-Reply-To: <16949.22785.701996.273162@cargo.ozlabs.ibm.com>
Message-ID:

Paul Mackerras wrote:
> I have tidied up some of the messages and changed the loop exits to
> conform to the usual kernel style. Any comments on this version?

Even better!

Thanks,
Utz

:wq
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050314/490c4197/attachment.htm

From moilanen at austin.ibm.com Tue Mar 15 08:51:25 2005
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Mon, 14 Mar 2005 15:51:25 -0600
Subject: [PATCH 1/2] No-exec support for ppc64
In-Reply-To: <16949.25552.640180.677985@cargo.ozlabs.ibm.com>
References: <20050308165904.0ce07112.moilanen@austin.ibm.com> <20050308170826.13a2299e.moilanen@austin.ibm.com> <20050310032213.GB20789@austin.ibm.com> <20050310162513.74191caa.moilanen@austin.ibm.com> <16949.25552.640180.677985@cargo.ozlabs.ibm.com>
Message-ID: <20050314155125.68dcff70.moilanen@austin.ibm.com>

On Mon, 14 Mar 2005 21:13:36 +1100 Paul Mackerras wrote:

> Jake Moilanen writes:
>
> > diff -puN fs/binfmt_elf.c~nx-user-ppc64 fs/binfmt_elf.c
> > --- linux-2.6-bk/fs/binfmt_elf.c~nx-user-ppc64	2005-03-08 16:08:54 -06:00
> > +++ linux-2.6-bk-moilanen/fs/binfmt_elf.c	2005-03-08 16:08:54 -06:00
> > @@ -99,6 +99,8 @@ static int set_brk(unsigned long start,
> >  	up_write(&current->mm->mmap_sem);
> >  	if (BAD_ADDR(addr))
> >  		return addr;
> > +
> > +	sys_mprotect(start, end-start, PROT_READ|PROT_WRITE|PROT_EXEC);
>
> I don't think I can push that upstream. What happens if you leave
> that out?

The bss and the plt are in the same segment, and plt obviously needs to
be executable.
Section Headers:
  [Nr] Name      Type      Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]           NULL      00000000 000000 000000 00      0   0  0
  [ 1] .interp   PROGBITS  10000154 000154 00000d 00   A  0   0  1
...
...
  [26] .plt      NOBITS    10013c5c 003c34 000210 00 WAX  0   0  4
  [27] .bss      NOBITS    10013e6c 003c34 000128 00  WA  0   0  4

  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.SuSE .hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .text .fini .rodata
   03     .data .eh_frame .got2 .dynamic .ctors .dtors .jcr .got .sdata .sbss .plt .bss
   04     .dynamic
   05     .note.ABI-tag
   06     .note.SuSE
   07

Anton mentioned that Alan was considering putting plt into a new
segment.

> More generally, we are making a user-visible change, even for programs
> that aren't marked as having non-executable stack or heap, because we
> are now enforcing that the program can't execute from mappings that
> don't have PROT_EXEC. Perhaps we should enforce the requirement for
> execute permission only on those programs that indicate somehow that
> they can handle it?

Unless a program is compiled with PT_GNU_STACK, we will set the
READ_IMPLIES_EXEC personality, and those applications should still work
as normal.

Jake

From johnrose at austin.ibm.com Tue Mar 15 09:04:25 2005
From: johnrose at austin.ibm.com (John Rose)
Date: Mon, 14 Mar 2005 16:04:25 -0600
Subject: [PATCH] remove unnecessary ISA ioports
Message-ID: <1110837865.3586.28.camel@sinatra.austin.ibm.com>

During boot, pSeries_request_regions() should only request I/O ports
for legacy ISA in the case that ISA exists on the system. Add a check
for this.

This patch was suggested by Anton. Please apply, if appropriate.
Thanks-
John

Signed-off-by: John Rose

diff -puN arch/ppc64/kernel/pSeries_pci.c~02_ppc64_request_regions arch/ppc64/kernel/pSeries_pci.c
--- 2_6_linus_4/arch/ppc64/kernel/pSeries_pci.c~02_ppc64_request_regions	2005-03-14 15:59:44.000000000 -0600
+++ 2_6_linus_4-johnrose/arch/ppc64/kernel/pSeries_pci.c	2005-03-14 15:59:44.000000000 -0600
@@ -540,6 +540,9 @@ EXPORT_SYMBOL(pcibios_remove_root_bus);
 
 static void __init pSeries_request_regions(void)
 {
+	if (!isa_io_base)
+		return;
+
 	request_region(0x20,0x20,"pic1");
 	request_region(0xa0,0x20,"pic2");
 	request_region(0x00,0x20,"dma1");
_

From paulus at samba.org Tue Mar 15 09:18:04 2005
From: paulus at samba.org (Paul Mackerras)
Date: Tue, 15 Mar 2005 09:18:04 +1100
Subject: [PATCH 1/2] No-exec support for ppc64
In-Reply-To: <20050314155125.68dcff70.moilanen@austin.ibm.com>
References: <20050308165904.0ce07112.moilanen@austin.ibm.com> <20050308170826.13a2299e.moilanen@austin.ibm.com> <20050310032213.GB20789@austin.ibm.com> <20050310162513.74191caa.moilanen@austin.ibm.com> <16949.25552.640180.677985@cargo.ozlabs.ibm.com> <20050314155125.68dcff70.moilanen@austin.ibm.com>
Message-ID: <16950.3484.416343.832453@cargo.ozlabs.ibm.com>

Jake Moilanen writes:

> > I don't think I can push that upstream. What happens if you leave
> > that out?
>
> The bss and the plt are in the same segment, and plt obviously needs to
> be executable.

Yes... what I was asking was "do things actually break if you leave
that out, or does the binfmt_elf loader honour the 'x' permission on
the PT_LOAD entry for the data/bss region, meaning that it all just
works anyway?"

I did an objdump -p on some random 32-bit binaries, and they all have
"rwx" flags on the data/bss segment (the second PT_LOAD entry). And
when I look in /proc/<pid>/maps, it seems that the heap is in fact
marked executable (this is without your patch). So why do we need the
hack in binfmt_elf.c?

Paul.
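
[Editorial sketch, not part of the original message.] The question above
turns on how a loader derives a mapping's protection from the flags of a
PT_LOAD program header. A minimal user-space illustration of that
translation is given below. The ELF_PF_* macros are local stand-ins that
mirror the PF_X, PF_W and PF_R values from the ELF specification, and
phdr_flags_to_prot is a hypothetical name, not the function used in
fs/binfmt_elf.c.

```c
#include <sys/mman.h>

/* Local copies of the ELF segment permission bits (PF_X, PF_W, PF_R). */
#define ELF_PF_X 0x1
#define ELF_PF_W 0x2
#define ELF_PF_R 0x4

/* Translate a program header's p_flags into mmap/mprotect protection bits. */
static int phdr_flags_to_prot(unsigned int p_flags)
{
	int prot = 0;

	if (p_flags & ELF_PF_R)
		prot |= PROT_READ;
	if (p_flags & ELF_PF_W)
		prot |= PROT_WRITE;
	if (p_flags & ELF_PF_X)
		prot |= PROT_EXEC;
	return prot;
}
```

If the data/bss PT_LOAD entry really carries "rwx", a loader doing this
translation would already map it executable, which is the point of asking
whether the extra sys_mprotect() call changes anything.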
From ntl at pobox.com Tue Mar 15 13:49:23 2005
From: ntl at pobox.com (Nathan Lynch)
Date: Mon, 14 Mar 2005 20:49:23 -0600 (CST)
Subject: [PATCH 0/8] reworked support for pSeries dynamic reconfiguration (v2)
Message-ID: <20050315024923.11665.85622.82498@otto>

Thanks to those who gave feedback on the previous submission of this
patch series. I've noted the changes I've made in the changelogs of
the individual patch mails to follow.

This patch series reworks existing ppc64 architecture support for the
"dynamic reconfiguration" option of RPA platforms. This includes PCI
hotplug and dynamic logical partitioning (DLPAR).

This was all motivated by my desire to add code for better handling of
processor addition and removal, but I didn't want to just add to the
growing mess in prom.c where we have duplicated code for boot and
DLPAR/hotplug.

This adds very little new function, but gets rid of much duplicated
code and introduces a new pSeries-specific file, pSeries_reconfig.c,
which contains the core support for dynamic reconfiguration and
implements a more refined version of the notifier chain API I posted a
few weeks ago. Code that needs to act upon device nodes that are
being added or removed can register with this notifier chain. I've
ported as much code as possible to that API, and I expect memory DLPAR
will want to use it too.

The last couple of patches in the series modify the pSeries smp code
so that we properly manage cpu_present_map with respect to DLPAR, and
include the "make cpu hotplug play well with maxcpus and smt-enabled"
patch, which depends on this.

The following cases have been tested on a Power5 system:
* CPU add/remove
* Virtual I/O adapter add/remove
* Logical slot add/remove (thanks to John Rose)

I also checked the build against all defconfigs in arch/ppc64/configs.
diffstat for the combined series:

 arch/ppc64/kernel/Makefile           |    2
 arch/ppc64/kernel/pSeries_iommu.c    |   25
 arch/ppc64/kernel/pSeries_reconfig.c |  434 +++++++++++++
 arch/ppc64/kernel/pSeries_smp.c      |  231 +++++--
 arch/ppc64/kernel/pci_dn.c           |   22
 arch/ppc64/kernel/proc_ppc64.c       |  249 -------
 arch/ppc64/kernel/prom.c             |  474 ++++-----------
 arch/ppc64/kernel/setup.c            |   12
 arch/ppc64/kernel/smp.c              |   13
 include/asm-ppc64/machdep.h          |    1
 include/asm-ppc64/pSeries_reconfig.h |   25
 include/asm-ppc64/prom.h             |    4
 12 files changed, 827 insertions(+), 665 deletions(-)

Thanks,
Nathan

From ntl at pobox.com Tue Mar 15 13:49:28 2005
From: ntl at pobox.com (Nathan Lynch)
Date: Mon, 14 Mar 2005 20:49:28 -0600 (CST)
Subject: [PATCH 1/8] preliminary changes to OF fixup functions
In-Reply-To: <20050315024923.11665.85622.82498@otto>
References: <20050315024923.11665.85622.82498@otto>
Message-ID: <20050315024928.11665.31398.52683@otto>

Preliminary modifications to support using some of the interpret_func
family of functions at runtime. Changes the mem_start argument to be
passed by reference, and the return type to int for error handling to
be implemented in following patches.
Signed-off-by: Nathan Lynch arch/ppc64/kernel/prom.c | 135 ++++++++++++++------------- 1 files changed, 71 insertions(+), 64 deletions(-) Index: linux-2.6.11-bk10/arch/ppc64/kernel/prom.c =================================================================== --- linux-2.6.11-bk10.orig/arch/ppc64/kernel/prom.c 2005-03-14 21:49:40.000000000 +0000 +++ linux-2.6.11-bk10/arch/ppc64/kernel/prom.c 2005-03-14 21:49:46.000000000 +0000 @@ -73,8 +73,8 @@ struct isa_reg_property { }; -typedef unsigned long interpret_func(struct device_node *, unsigned long, - int, int, int); +typedef int interpret_func(struct device_node *, unsigned long *, + int, int, int); extern struct rtas_t rtas; extern struct lmb lmb; @@ -255,9 +255,9 @@ static int __devinit map_interrupt(unsig return nintrc; } -static unsigned long __init finish_node_interrupts(struct device_node *np, - unsigned long mem_start, - int measure_only) +static int __init finish_node_interrupts(struct device_node *np, + unsigned long *mem_start, + int measure_only) { unsigned int *ints; int intlen, intrcells, intrcount; @@ -267,14 +267,14 @@ static unsigned long __init finish_node_ ints = (unsigned int *) get_property(np, "interrupts", &intlen); if (ints == NULL) - return mem_start; + return 0; intrcells = prom_n_intr_cells(np); intlen /= intrcells * sizeof(unsigned int); - np->intrs = (struct interrupt_info *) mem_start; - mem_start += intlen * sizeof(struct interrupt_info); + np->intrs = (struct interrupt_info *) (*mem_start); + (*mem_start) += intlen * sizeof(struct interrupt_info); if (measure_only) - return mem_start; + return 0; intrcount = 0; for (i = 0; i < intlen; ++i, ints += intrcells) { @@ -315,13 +315,13 @@ static unsigned long __init finish_node_ } np->n_intrs = intrcount; - return mem_start; + return 0; } -static unsigned long __init interpret_pci_props(struct device_node *np, - unsigned long mem_start, - int naddrc, int nsizec, - int measure_only) +static int __init interpret_pci_props(struct device_node *np, + 
unsigned long *mem_start, + int naddrc, int nsizec, + int measure_only) { struct address_range *adr; struct pci_reg_property *pci_addrs; @@ -331,7 +331,7 @@ static unsigned long __init interpret_pc get_property(np, "assigned-addresses", &l); if (pci_addrs != 0 && l >= sizeof(struct pci_reg_property)) { i = 0; - adr = (struct address_range *) mem_start; + adr = (struct address_range *) (*mem_start); while ((l -= sizeof(struct pci_reg_property)) >= 0) { if (!measure_only) { adr[i].space = pci_addrs[i].addr.a_hi; @@ -343,15 +343,15 @@ static unsigned long __init interpret_pc } np->addrs = adr; np->n_addrs = i; - mem_start += i * sizeof(struct address_range); + (*mem_start) += i * sizeof(struct address_range); } - return mem_start; + return 0; } -static unsigned long __init interpret_dbdma_props(struct device_node *np, - unsigned long mem_start, - int naddrc, int nsizec, - int measure_only) +static int __init interpret_dbdma_props(struct device_node *np, + unsigned long *mem_start, + int naddrc, int nsizec, + int measure_only) { struct reg_property32 *rp; struct address_range *adr; @@ -372,7 +372,7 @@ static unsigned long __init interpret_db rp = (struct reg_property32 *) get_property(np, "reg", &l); if (rp != 0 && l >= sizeof(struct reg_property32)) { i = 0; - adr = (struct address_range *) mem_start; + adr = (struct address_range *) (*mem_start); while ((l -= sizeof(struct reg_property32)) >= 0) { if (!measure_only) { adr[i].space = 2; @@ -383,16 +383,16 @@ static unsigned long __init interpret_db } np->addrs = adr; np->n_addrs = i; - mem_start += i * sizeof(struct address_range); + (*mem_start) += i * sizeof(struct address_range); } - return mem_start; + return 0; } -static unsigned long __init interpret_macio_props(struct device_node *np, - unsigned long mem_start, - int naddrc, int nsizec, - int measure_only) +static int __init interpret_macio_props(struct device_node *np, + unsigned long *mem_start, + int naddrc, int nsizec, + int measure_only) { struct 
reg_property32 *rp; struct address_range *adr; @@ -413,7 +413,7 @@ static unsigned long __init interpret_ma rp = (struct reg_property32 *) get_property(np, "reg", &l); if (rp != 0 && l >= sizeof(struct reg_property32)) { i = 0; - adr = (struct address_range *) mem_start; + adr = (struct address_range *) (*mem_start); while ((l -= sizeof(struct reg_property32)) >= 0) { if (!measure_only) { adr[i].space = 2; @@ -424,16 +424,16 @@ static unsigned long __init interpret_ma } np->addrs = adr; np->n_addrs = i; - mem_start += i * sizeof(struct address_range); + (*mem_start) += i * sizeof(struct address_range); } - return mem_start; + return 0; } -static unsigned long __init interpret_isa_props(struct device_node *np, - unsigned long mem_start, - int naddrc, int nsizec, - int measure_only) +static int __init interpret_isa_props(struct device_node *np, + unsigned long *mem_start, + int naddrc, int nsizec, + int measure_only) { struct isa_reg_property *rp; struct address_range *adr; @@ -442,7 +442,7 @@ static unsigned long __init interpret_is rp = (struct isa_reg_property *) get_property(np, "reg", &l); if (rp != 0 && l >= sizeof(struct isa_reg_property)) { i = 0; - adr = (struct address_range *) mem_start; + adr = (struct address_range *) (*mem_start); while ((l -= sizeof(struct isa_reg_property)) >= 0) { if (!measure_only) { adr[i].space = rp[i].space; @@ -453,16 +453,16 @@ static unsigned long __init interpret_is } np->addrs = adr; np->n_addrs = i; - mem_start += i * sizeof(struct address_range); + (*mem_start) += i * sizeof(struct address_range); } - return mem_start; + return 0; } -static unsigned long __init interpret_root_props(struct device_node *np, - unsigned long mem_start, - int naddrc, int nsizec, - int measure_only) +static int __init interpret_root_props(struct device_node *np, + unsigned long *mem_start, + int naddrc, int nsizec, + int measure_only) { struct address_range *adr; int i, l; @@ -472,7 +472,7 @@ static unsigned long __init interpret_ro rp = 
(unsigned int *) get_property(np, "reg", &l); if (rp != 0 && l >= rpsize) { i = 0; - adr = (struct address_range *) mem_start; + adr = (struct address_range *) (*mem_start); while ((l -= rpsize) >= 0) { if (!measure_only) { adr[i].space = 0; @@ -484,26 +484,30 @@ static unsigned long __init interpret_ro } np->addrs = adr; np->n_addrs = i; - mem_start += i * sizeof(struct address_range); + (*mem_start) += i * sizeof(struct address_range); } - return mem_start; + return 0; } -static unsigned long __init finish_node(struct device_node *np, - unsigned long mem_start, - interpret_func *ifunc, - int naddrc, int nsizec, - int measure_only) +static int __init finish_node(struct device_node *np, + unsigned long *mem_start, + interpret_func *ifunc, + int naddrc, int nsizec, + int measure_only) { struct device_node *child; - int *ip; + int *ip, rc = 0; /* get the device addresses and interrupts */ if (ifunc != NULL) - mem_start = ifunc(np, mem_start, naddrc, nsizec, measure_only); + rc = ifunc(np, mem_start, naddrc, nsizec, measure_only); + if (rc) + goto out; - mem_start = finish_node_interrupts(np, mem_start, measure_only); + rc = finish_node_interrupts(np, mem_start, measure_only); + if (rc) + goto out; /* Look for #address-cells and #size-cells properties. 
*/ ip = (int *) get_property(np, "#address-cells", NULL); @@ -539,11 +543,14 @@ static unsigned long __init finish_node( || !strcmp(np->type, "media-bay")))) ifunc = NULL; - for (child = np->child; child != NULL; child = child->sibling) - mem_start = finish_node(child, mem_start, ifunc, - naddrc, nsizec, measure_only); - - return mem_start; + for (child = np->child; child != NULL; child = child->sibling) { + rc = finish_node(child, mem_start, ifunc, + naddrc, nsizec, measure_only); + if (rc) + goto out; + } +out: + return rc; } /** @@ -555,7 +562,7 @@ static unsigned long __init finish_node( */ void __init finish_device_tree(void) { - unsigned long mem, size; + unsigned long start, end, size = 0; DBG(" -> finish_device_tree\n"); @@ -568,11 +575,11 @@ void __init finish_device_tree(void) virt_irq_init(); /* Finish device-tree (pre-parsing some properties etc...) */ - size = finish_node(allnodes, 0, NULL, 0, 0, 1); - mem = (unsigned long)abs_to_virt(lmb_alloc(size, 128)); - if (finish_node(allnodes, mem, NULL, 0, 0, 0) != mem + size) - BUG(); - + finish_node(allnodes, &size, NULL, 0, 0, 1); + end = start = (unsigned long)abs_to_virt(lmb_alloc(size, 128)); + finish_node(allnodes, &end, NULL, 0, 0, 0); + BUG_ON(end != start + size); + DBG(" <- finish_device_tree\n"); } From ntl at pobox.com Tue Mar 15 13:49:33 2005 From: ntl at pobox.com (Nathan Lynch) Date: Mon, 14 Mar 2005 20:49:33 -0600 (CST) Subject: [PATCH 2/8] make OF node fixup code usable at runtime In-Reply-To: <20050315024923.11665.85622.82498@otto> References: <20050315024923.11665.85622.82498@otto> Message-ID: <20050315024933.11665.3281.50892@otto> Updates since last submission: o I decided to use Ben's suggestion to introduce a small wrapper function for handling allocations. At first I thought I would use John's idea of checking the dynamic flag in the node to decide whether to use kmalloc, but I think this way is better since it keeps all that logic out of the interpret_func-style routines. 
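
A minimal userspace sketch of the wrapper idea described above, assuming a plain integer cursor stands in for the boot-time reserved region and malloc stands in for kmalloc. The names sketch_alloc and sketch_measure are hypothetical and are not part of the patch; the point is only to show how one helper can serve both the "measure, then carve" boot passes and the runtime case.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the wrapper: with a cursor, carve from a
 * pre-reserved region by advancing the cursor; without one, fall back
 * to the heap (kmalloc in the kernel, malloc in this sketch).
 */
static void *sketch_alloc(size_t size, uintptr_t *cursor)
{
	uintptr_t p;

	if (!cursor)
		return malloc(size);	/* runtime path */

	p = *cursor;
	*cursor += size;		/* boot path: just bump the cursor */
	return (void *)p;
}

/*
 * Measuring pass: start the cursor at 0 and add up every request.
 * The returned pointers are never dereferenced in this pass.
 */
static size_t sketch_measure(const size_t *sizes, size_t n)
{
	uintptr_t cursor = 0;
	size_t i;

	assert(sizes != NULL || n == 0);
	for (i = 0; i < n; i++)
		(void)sketch_alloc(sizes[i], &cursor);
	return (size_t)cursor;
}
```

A caller would run sketch_measure first, reserve a region of the returned size, then repeat the same requests with the cursor set to the start of that region. Passing NULL as the cursor selects the heap path, which is what lets the same fixup routines run after boot.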
At boot we recurse through the device tree "fixing up" various fields and properties in the device nodes. Long ago, to support DLPAR and hotplug, we largely duplicated some of this fixup code, the main difference being that the new code used kmalloc for allocating various data structures which are attached to the new device nodes. This patch introduces a helper function (prom_alloc) for handling allocations at both boot and runtime, kills most of the duplicated code, and makes finish_node, finish_node_interrupts, and interpret_pci_props suitable for use at runtime by converting them to use prom_alloc. Signed-off-by: Nathan Lynch arch/ppc64/kernel/prom.c | 177 +++++++++------------------ 1 files changed, 62 insertions(+), 115 deletions(-) Index: linux-2.6.11-bk10/arch/ppc64/kernel/prom.c =================================================================== --- linux-2.6.11-bk10.orig/arch/ppc64/kernel/prom.c 2005-03-14 21:49:46.000000000 +0000 +++ linux-2.6.11-bk10/arch/ppc64/kernel/prom.c 2005-03-14 21:54:08.000000000 +0000 @@ -103,6 +103,25 @@ static DEFINE_RWLOCK(devtree_lock); struct device_node *of_chosen; /* + * Wrapper for allocating memory for various data that needs to be + * attached to device nodes as they are processed at boot or when + * added to the device tree later (e.g. DLPAR). At boot there is + * already a region reserved so we just increment *mem_start by size; + * otherwise we call kmalloc. + */ +static void * prom_alloc(unsigned long size, unsigned long *mem_start) +{ + unsigned long tmp; + + if (!mem_start) + return kmalloc(size, GFP_KERNEL); + + tmp = *mem_start; + *mem_start += size; + return (void *)tmp; +} + +/* * Find the device_node with a given phandle. 
*/ static struct device_node * find_phandle(phandle ph) @@ -255,9 +274,9 @@ static int __devinit map_interrupt(unsig return nintrc; } -static int __init finish_node_interrupts(struct device_node *np, - unsigned long *mem_start, - int measure_only) +static int __devinit finish_node_interrupts(struct device_node *np, + unsigned long *mem_start, + int measure_only) { unsigned int *ints; int intlen, intrcells, intrcount; @@ -270,8 +289,10 @@ static int __init finish_node_interrupts return 0; intrcells = prom_n_intr_cells(np); intlen /= intrcells * sizeof(unsigned int); - np->intrs = (struct interrupt_info *) (*mem_start); - (*mem_start) += intlen * sizeof(struct interrupt_info); + + np->intrs = prom_alloc(intlen * sizeof(*(np->intrs)), mem_start); + if (!np->intrs) + return -ENOMEM; if (measure_only) return 0; @@ -318,33 +339,39 @@ static int __init finish_node_interrupts return 0; } -static int __init interpret_pci_props(struct device_node *np, - unsigned long *mem_start, - int naddrc, int nsizec, - int measure_only) +static int __devinit interpret_pci_props(struct device_node *np, + unsigned long *mem_start, + int naddrc, int nsizec, + int measure_only) { struct address_range *adr; struct pci_reg_property *pci_addrs; - int i, l; + int i, l, n_addrs; pci_addrs = (struct pci_reg_property *) get_property(np, "assigned-addresses", &l); - if (pci_addrs != 0 && l >= sizeof(struct pci_reg_property)) { - i = 0; - adr = (struct address_range *) (*mem_start); - while ((l -= sizeof(struct pci_reg_property)) >= 0) { - if (!measure_only) { - adr[i].space = pci_addrs[i].addr.a_hi; - adr[i].address = pci_addrs[i].addr.a_lo | - ((u64)pci_addrs[i].addr.a_mid << 32); - adr[i].size = pci_addrs[i].size_lo; - } - ++i; - } - np->addrs = adr; - np->n_addrs = i; - (*mem_start) += i * sizeof(struct address_range); + if (!pci_addrs) + return 0; + + n_addrs = l / sizeof(*pci_addrs); + + adr = prom_alloc(n_addrs * sizeof(*adr), mem_start); + if (!adr) + return -ENOMEM; + + if (measure_only) + 
return 0; + + np->addrs = adr; + np->n_addrs = n_addrs; + + for (i = 0; i < n_addrs; i++) { + adr[i].space = pci_addrs[i].addr.a_hi; + adr[i].address = pci_addrs[i].addr.a_lo | + ((u64)pci_addrs[i].addr.a_mid << 32); + adr[i].size = pci_addrs[i].size_lo; } + return 0; } @@ -490,11 +517,11 @@ static int __init interpret_root_props(s return 0; } -static int __init finish_node(struct device_node *np, - unsigned long *mem_start, - interpret_func *ifunc, - int naddrc, int nsizec, - int measure_only) +static int __devinit finish_node(struct device_node *np, + unsigned long *mem_start, + interpret_func *ifunc, + int naddrc, int nsizec, + int measure_only) { struct device_node *child; int *ip, rc = 0; @@ -1627,54 +1654,6 @@ static void remove_node_proc_entries(str #endif /* CONFIG_PROC_DEVICETREE */ /* - * Fix up n_intrs and intrs fields in a new device node - * - */ -static int of_finish_dynamic_node_interrupts(struct device_node *node) -{ - int intrcells, intlen, i; - unsigned *irq, *ints, virq; - struct device_node *ic; - - ints = (unsigned int *)get_property(node, "interrupts", &intlen); - intrcells = prom_n_intr_cells(node); - intlen /= intrcells * sizeof(unsigned int); - node->n_intrs = intlen; - node->intrs = kmalloc(sizeof(struct interrupt_info) * intlen, - GFP_KERNEL); - if (!node->intrs) - return -ENOMEM; - - for (i = 0; i < intlen; ++i) { - int n, j; - node->intrs[i].line = 0; - node->intrs[i].sense = 1; - n = map_interrupt(&irq, &ic, node, ints, intrcells); - if (n <= 0) - continue; - virq = virt_irq_create_mapping(irq[0]); - if (virq == NO_IRQ) { - printk(KERN_CRIT "Could not allocate interrupt " - "number for %s\n", node->full_name); - return -ENOMEM; - } - node->intrs[i].line = irq_offset_up(virq); - if (n > 1) - node->intrs[i].sense = irq[1]; - if (n > 2) { - printk(KERN_DEBUG "hmmm, got %d intr cells for %s:", n, - node->full_name); - for (j = 0; j < n; ++j) - printk(" %d", irq[j]); - printk("\n"); - } - ints += intrcells; - } - return 0; -} - - -/* * Fix 
up the uninitialized fields in a new device node: * name, type, n_addrs, addrs, n_intrs, intrs, and pci-specific fields * @@ -1685,7 +1664,9 @@ static int of_finish_dynamic_node_interr * This should probably be split up into smaller chunks. */ -static int of_finish_dynamic_node(struct device_node *node) +static int of_finish_dynamic_node(struct device_node *node, + unsigned long *unused1, int unused2, + int unused3, int unused4) { struct device_node *parent = of_get_parent(node); u32 *regs; @@ -1710,41 +1691,6 @@ static int of_finish_dynamic_node(struct if ((ibm_phandle = (unsigned int *)get_property(node, "ibm,phandle", NULL))) node->linux_phandle = *ibm_phandle; - /* do the work of interpret_pci_props */ - if (parent->type && !strcmp(parent->type, "pci")) { - struct address_range *adr; - struct pci_reg_property *pci_addrs; - int i, l; - - pci_addrs = (struct pci_reg_property *) - get_property(node, "assigned-addresses", &l); - if (pci_addrs != 0 && l >= sizeof(struct pci_reg_property)) { - i = 0; - adr = kmalloc(sizeof(struct address_range) * - (l / sizeof(struct pci_reg_property)), - GFP_KERNEL); - if (!adr) { - err = -ENOMEM; - goto out; - } - while ((l -= sizeof(struct pci_reg_property)) >= 0) { - adr[i].space = pci_addrs[i].addr.a_hi; - adr[i].address = pci_addrs[i].addr.a_lo | - ((u64)pci_addrs[i].addr.a_mid << 32); - adr[i].size = pci_addrs[i].size_lo; - ++i; - } - node->addrs = adr; - node->n_addrs = i; - } - } - - /* now do the work of finish_node_interrupts */ - if (get_property(node, "interrupts", NULL)) { - err = of_finish_dynamic_node_interrupts(node); - if (err) goto out; - } - /* now do the rough equivalent of update_dn_pci_info, this * probably is not correct for phb's, but should work for * IOAs and slots. 
@@ -1796,7 +1742,8 @@ int of_add_node(const char *path, struct
 		return -EINVAL; /* could also be ENOMEM, though */
 	}
 
-	if (0 != (err = of_finish_dynamic_node(np))) {
+	err = finish_node(np, NULL, of_finish_dynamic_node, 0, 0, 0);
+	if (err < 0) {
 		kfree(np);
 		return err;
 	}

From ntl at pobox.com Tue Mar 15 13:49:38 2005
From: ntl at pobox.com (Nathan Lynch)
Date: Mon, 14 Mar 2005 20:49:38 -0600 (CST)
Subject: [PATCH 3/8] introduce pSeries_reconfig.[ch]
In-Reply-To: <20050315024923.11665.85622.82498@otto>
References: <20050315024923.11665.85622.82498@otto>
Message-ID: <20050315024938.11665.274.97750@otto>

Updates since last submission:

o The memory leaks in the error path of of_add_node which Arnd
  pointed out should be gone (see pSeries_reconfig_add_node).

o Fixed the potential null pointer dereference in the error path of
  pSeries_reconfig_add_node, where the code would try to
  kfree(np->full_name) even though the allocation for np had failed.

o As suggested by John, changed the names of of_add_node and
  of_remove_node to of_attach_node and of_detach_node, respectively,
  to reflect the changes in their meanings.

Move as much pSeries-specific DLPAR/hotplug code as possible into its
own file, which is built only when pSeries support is enabled in the
config. This new file is intended to contain support code for the
"Dynamic Reconfiguration" option in the RISC Platform Architecture,
which encompasses both PCI hotplug and dynamic logical partitioning
(DLPAR).

This patch mostly just moves code around, but the device node addition
and removal API is slightly modified. In this way, of_add_node and
of_remove_node are now responsible only for safely updating the device
tree and global list, without all the other stuff like proc entries
etc. of_add_node and of_remove_node have been renamed to
of_attach_node and of_detach_node, respectively.
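
The narrowed attach/detach responsibility described above can be sketched in userspace as follows. This is only an illustration under stated assumptions: struct sketch_node, sketch_attach and sketch_detach are hypothetical names, and the sketch leaves out the devtree_lock, the global allnodes list and reference counting that the real functions deal with.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down node: just the tree links. */
struct sketch_node {
	struct sketch_node *parent;
	struct sketch_node *child;	/* first child */
	struct sketch_node *sibling;	/* next sibling */
};

/* Link np at the head of its parent's child list; np->parent must be set. */
static void sketch_attach(struct sketch_node *np)
{
	assert(np && np->parent);
	np->sibling = np->parent->child;
	np->parent->child = np;
}

/* Unlink np from its parent's child list; nothing is freed here. */
static void sketch_detach(struct sketch_node *np)
{
	struct sketch_node **link;

	assert(np && np->parent);
	for (link = &np->parent->child; *link; link = &(*link)->sibling) {
		if (*link == np) {
			*link = np->sibling;
			np->sibling = NULL;
			return;
		}
	}
}
```

Everything else (building the node, proc entries, cleanup, dropping references) stays with the caller, which is the split the renaming is meant to reflect.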
This also adds the definitions and api for a notifier chain which is meant to be used by code that must act upon device node addition or removal. Patches to migrate code to the notifier api follow in this series. Signed-off-by: Nathan Lynch arch/ppc64/kernel/Makefile | 2 arch/ppc64/kernel/pSeries_reconfig.c | 446 +++++++++++++++ arch/ppc64/kernel/proc_ppc64.c | 249 -------- arch/ppc64/kernel/prom.c | 156 ----- include/asm-ppc64/pSeries_reconfig.h | 25 include/asm-ppc64/prom.h | 4 6 files changed, 487 insertions(+), 395 deletions(-) Index: linux-2.6.11-bk10/arch/ppc64/kernel/Makefile =================================================================== --- linux-2.6.11-bk10.orig/arch/ppc64/kernel/Makefile 2005-03-14 21:28:14.000000000 +0000 +++ linux-2.6.11-bk10/arch/ppc64/kernel/Makefile 2005-03-14 22:06:42.000000000 +0000 @@ -31,7 +31,7 @@ obj-$(CONFIG_PPC_ISERIES) += iSeries_irq obj-$(CONFIG_PPC_MULTIPLATFORM) += nvram.o i8259.o prom_init.o prom.o mpic.o obj-$(CONFIG_PPC_PSERIES) += pSeries_pci.o pSeries_lpar.o pSeries_hvCall.o \ - pSeries_nvram.o rtasd.o ras.o \ + pSeries_nvram.o rtasd.o ras.o pSeries_reconfig.o \ xics.o rtas.o pSeries_setup.o pSeries_iommu.o obj-$(CONFIG_EEH) += eeh.o Index: linux-2.6.11-bk10/arch/ppc64/kernel/proc_ppc64.c =================================================================== --- linux-2.6.11-bk10.orig/arch/ppc64/kernel/proc_ppc64.c 2005-03-14 21:28:14.000000000 +0000 +++ linux-2.6.11-bk10/arch/ppc64/kernel/proc_ppc64.c 2005-03-14 22:06:42.000000000 +0000 @@ -41,20 +41,6 @@ static struct file_operations page_map_f .mmap = page_map_mmap }; -#ifdef CONFIG_PPC_PSERIES -/* routines for /proc/ppc64/ofdt */ -static ssize_t ofdt_write(struct file *, const char __user *, size_t, loff_t *); -static void proc_ppc64_create_ofdt(void); -static int do_remove_node(char *); -static int do_add_node(char *, size_t); -static void release_prop_list(const struct property *); -static struct property *new_property(const char *, const int, const unsigned 
char *, struct property *); -static char * parse_next_property(char *, char *, char **, int *, unsigned char**); -static struct file_operations ofdt_fops = { - .write = ofdt_write -}; -#endif - /* * Create the ppc64 and ppc64/rtas directories early. This allows us to * assume that they have been previously created in drivers. @@ -92,11 +78,6 @@ static int __init proc_ppc64_init(void) pde->size = PAGE_SIZE; pde->proc_fops = &page_map_fops; -#ifdef CONFIG_PPC_PSERIES - if ((systemcfg->platform & PLATFORM_PSERIES)) - proc_ppc64_create_ofdt(); -#endif - return 0; } __initcall(proc_ppc64_init); @@ -145,233 +126,3 @@ static int page_map_mmap( struct file *f return 0; } -#ifdef CONFIG_PPC_PSERIES -/* create /proc/ppc64/ofdt write-only by root */ -static void proc_ppc64_create_ofdt(void) -{ - struct proc_dir_entry *ent; - - ent = create_proc_entry("ppc64/ofdt", S_IWUSR, NULL); - if (ent) { - ent->nlink = 1; - ent->data = NULL; - ent->size = 0; - ent->proc_fops = &ofdt_fops; - } -} - -/** - * ofdt_write - perform operations on the Open Firmware device tree - * - * @file: not used - * @buf: command and arguments - * @count: size of the command buffer - * @off: not used - * - * Operations supported at this time are addition and removal of - * whole nodes along with their properties. Operations on individual - * properties are not implemented (yet). - */ -static ssize_t ofdt_write(struct file *file, const char __user *buf, size_t count, - loff_t *off) -{ - int rv = 0; - char *kbuf; - char *tmp; - - if (!(kbuf = kmalloc(count + 1, GFP_KERNEL))) { - rv = -ENOMEM; - goto out; - } - if (copy_from_user(kbuf, buf, count)) { - rv = -EFAULT; - goto out; - } - - kbuf[count] = '\0'; - - tmp = strchr(kbuf, ' '); - if (!tmp) { - rv = -EINVAL; - goto out; - } - *tmp = '\0'; - tmp++; - - if (!strcmp(kbuf, "add_node")) - rv = do_add_node(tmp, count - (tmp - kbuf)); - else if (!strcmp(kbuf, "remove_node")) - rv = do_remove_node(tmp); - else - rv = -EINVAL; -out: - kfree(kbuf); - return rv ? 
rv : count; -} - -static int do_remove_node(char *buf) -{ - struct device_node *node; - int rv = -ENODEV; - - if ((node = of_find_node_by_path(buf))) - rv = of_remove_node(node); - - of_node_put(node); - return rv; -} - -static int do_add_node(char *buf, size_t bufsize) -{ - char *path, *end, *name; - struct device_node *np; - struct property *prop = NULL; - unsigned char* value; - int length, rv = 0; - - end = buf + bufsize; - path = buf; - buf = strchr(buf, ' '); - if (!buf) - return -EINVAL; - *buf = '\0'; - buf++; - - if ((np = of_find_node_by_path(path))) { - of_node_put(np); - return -EINVAL; - } - - /* rv = build_prop_list(tmp, bufsize - (tmp - buf), &proplist); */ - while (buf < end && - (buf = parse_next_property(buf, end, &name, &length, &value))) { - struct property *last = prop; - - prop = new_property(name, length, value, last); - if (!prop) { - rv = -ENOMEM; - prop = last; - goto out; - } - } - if (!buf) { - rv = -EINVAL; - goto out; - } - - rv = of_add_node(path, prop); - -out: - if (rv) - release_prop_list(prop); - return rv; -} - -static struct property *new_property(const char *name, const int length, - const unsigned char *value, struct property *last) -{ - struct property *new = kmalloc(sizeof(*new), GFP_KERNEL); - - if (!new) - return NULL; - memset(new, 0, sizeof(*new)); - - if (!(new->name = kmalloc(strlen(name) + 1, GFP_KERNEL))) - goto cleanup; - if (!(new->value = kmalloc(length + 1, GFP_KERNEL))) - goto cleanup; - - strcpy(new->name, name); - memcpy(new->value, value, length); - *(((char *)new->value) + length) = 0; - new->length = length; - new->next = last; - return new; - -cleanup: - if (new->name) - kfree(new->name); - if (new->value) - kfree(new->value); - kfree(new); - return NULL; -} - -/** - * parse_next_property - process the next property from raw input buffer - * @buf: input buffer, must be nul-terminated - * @end: end of the input buffer + 1, for validation - * @name: return value; set to property name in buf - * @length: 
return value; set to length of value - * @value: return value; set to the property value in buf - * - * Note that the caller must make copies of the name and value returned, - * this function does no allocation or copying of the data. Return value - * is set to the next name in buf, or NULL on error. - */ -static char * parse_next_property(char *buf, char *end, char **name, int *length, - unsigned char **value) -{ - char *tmp; - - *name = buf; - - tmp = strchr(buf, ' '); - if (!tmp) { - printk(KERN_ERR "property parse failed in %s at line %d\n", - __FUNCTION__, __LINE__); - return NULL; - } - *tmp = '\0'; - - if (++tmp >= end) { - printk(KERN_ERR "property parse failed in %s at line %d\n", - __FUNCTION__, __LINE__); - return NULL; - } - - /* now we're on the length */ - *length = -1; - *length = simple_strtoul(tmp, &tmp, 10); - if (*length == -1) { - printk(KERN_ERR "property parse failed in %s at line %d\n", - __FUNCTION__, __LINE__); - return NULL; - } - if (*tmp != ' ' || ++tmp >= end) { - printk(KERN_ERR "property parse failed in %s at line %d\n", - __FUNCTION__, __LINE__); - return NULL; - } - - /* now we're on the value */ - *value = tmp; - tmp += *length; - if (tmp > end) { - printk(KERN_ERR "property parse failed in %s at line %d\n", - __FUNCTION__, __LINE__); - return NULL; - } - else if (tmp < end && *tmp != ' ' && *tmp != '\0') { - printk(KERN_ERR "property parse failed in %s at line %d\n", - __FUNCTION__, __LINE__); - return NULL; - } - tmp++; - - /* and now we should be on the next name, or the end */ - return tmp; -} - -static void release_prop_list(const struct property *prop) -{ - struct property *next; - for (; prop; prop = next) { - next = prop->next; - kfree(prop->name); - kfree(prop->value); - kfree(prop); - } - -} -#endif /* defined(CONFIG_PPC_PSERIES) */ Index: linux-2.6.11-bk10/arch/ppc64/kernel/pSeries_reconfig.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ 
linux-2.6.11-bk10/arch/ppc64/kernel/pSeries_reconfig.c 2005-03-14 22:16:09.000000000 +0000 @@ -0,0 +1,446 @@ +/* + * pSeries_reconfig.c - support for dynamic reconfiguration (including PCI + * Hotplug and Dynamic Logical Partitioning on RPA platforms). + * + * Copyright (C) 2005 Nathan Lynch + * Copyright (C) 2005 IBM Corporation + * + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License version + * 2 as published by the Free Software Foundation. + */ + +#include +#include +#include +#include + +#include +#include +#include + + + +/* + * Routines for "runtime" addition and removal of device tree nodes. + */ +#ifdef CONFIG_PROC_DEVICETREE +/* + * Add a node to /proc/device-tree. + */ +static void add_node_proc_entries(struct device_node *np) +{ + struct proc_dir_entry *ent; + + ent = proc_mkdir(strrchr(np->full_name, '/') + 1, np->parent->pde); + if (ent) + proc_device_tree_add_node(np, ent); +} + +static void remove_node_proc_entries(struct device_node *np) +{ + struct property *pp = np->properties; + struct device_node *parent = np->parent; + + while (pp) { + remove_proc_entry(pp->name, np->pde); + pp = pp->next; + } + + /* Assuming that symlinks have the same parent directory as + * np->pde. + */ + if (np->name_link) + remove_proc_entry(np->name_link->name, parent->pde); + if (np->addr_link) + remove_proc_entry(np->addr_link->name, parent->pde); + if (np->pde) + remove_proc_entry(np->pde->name, parent->pde); +} +#else /* !CONFIG_PROC_DEVICETREE */ +static void add_node_proc_entries(struct device_node *np) +{ + return; +} + +static void remove_node_proc_entries(struct device_node *np) +{ + return; +} +#endif /* CONFIG_PROC_DEVICETREE */ + +/** + * derive_parent - basically like dirname(1) + * @path: the full_name of a node to be added to the tree + * + * Returns the node which should be the parent of the node + * described by path. 
E.g., for path = "/foo/bar", returns + * the node with full_name = "/foo". + */ +static struct device_node *derive_parent(const char *path) +{ + struct device_node *parent = NULL; + char *parent_path = "/"; + size_t parent_path_len = strrchr(path, '/') - path + 1; + + /* reject if path is "/" */ + if (!strcmp(path, "/")) + return ERR_PTR(-EINVAL); + + if (strrchr(path, '/') != path) { + parent_path = kmalloc(parent_path_len, GFP_KERNEL); + if (!parent_path) + return ERR_PTR(-ENOMEM); + strlcpy(parent_path, path, parent_path_len); + } + parent = of_find_node_by_path(parent_path); + if (!parent) + return ERR_PTR(-EINVAL); + if (strcmp(parent_path, "/")) + kfree(parent_path); + return parent; +} + +static struct notifier_block *pSeries_reconfig_chain; + +int pSeries_reconfig_notifier_register(struct notifier_block *nb) +{ + return notifier_chain_register(&pSeries_reconfig_chain, nb); +} + +void pSeries_reconfig_notifier_unregister(struct notifier_block *nb) +{ + notifier_chain_unregister(&pSeries_reconfig_chain, nb); +} + +static int pSeries_reconfig_add_node(const char *path, struct property *proplist) +{ + struct device_node *np; + int err = -ENOMEM; + + np = kcalloc(1, sizeof(*np), GFP_KERNEL); + if (!np) + goto out_err; + + np->full_name = kmalloc(strlen(path) + 1, GFP_KERNEL); + if (!np->full_name) + goto out_err; + + strcpy(np->full_name, path); + + np->properties = proplist; + OF_MARK_DYNAMIC(np); + kref_init(&np->kref); + + np->parent = derive_parent(path); + if (IS_ERR(np->parent)) { + err = PTR_ERR(np->parent); + goto out_err; + } + + err = notifier_call_chain(&pSeries_reconfig_chain, + PSERIES_RECONFIG_ADD, np); + if (err == NOTIFY_BAD) { + printk(KERN_ERR "Failed to add device node %s\n", path); + err = -ENOMEM; /* For now, safe to assume kmalloc failure */ + goto out_err; + } + + of_attach_node(np); + + add_node_proc_entries(np); + + of_node_put(np->parent); + + return 0; + +out_err: + if (np) { + of_node_put(np->parent); + kfree(np->full_name); + 
kfree(np); + } + return err; +} + +/* + * Prepare an OF node for removal from system + * XXX move this to pSeries_iommu.c + */ +static void of_cleanup_node(struct device_node *np) +{ + if (np->iommu_table && get_property(np, "ibm,dma-window", NULL)) + iommu_free_table(np); +} + +static int pSeries_reconfig_remove_node(struct device_node *np) +{ + struct device_node *parent, *child; + + parent = of_get_parent(np); + if (!parent) + return -EINVAL; + + if ((child = of_get_next_child(np, NULL))) { + of_node_put(child); + return -EBUSY; + } + + of_cleanup_node(np); + + remove_node_proc_entries(np); + + notifier_call_chain(&pSeries_reconfig_chain, + PSERIES_RECONFIG_REMOVE, np); + of_detach_node(np); + + of_node_put(parent); + of_node_put(np); /* Must decrement the refcount */ + return 0; +} + +/* + * /proc/ppc64/ofdt - yucky binary interface for adding and removing + * OF device nodes. Should be deprecated as soon as we get an + * in-kernel wrapper for the RTAS ibm,configure-connector call. + */ + +static void release_prop_list(const struct property *prop) +{ + struct property *next; + for (; prop; prop = next) { + next = prop->next; + kfree(prop->name); + kfree(prop->value); + kfree(prop); + } + +} + +/** + * parse_next_property - process the next property from raw input buffer + * @buf: input buffer, must be nul-terminated + * @end: end of the input buffer + 1, for validation + * @name: return value; set to property name in buf + * @length: return value; set to length of value + * @value: return value; set to the property value in buf + * + * Note that the caller must make copies of the name and value returned, + * this function does no allocation or copying of the data. Return value + * is set to the next name in buf, or NULL on error. 
+ */ +static char * parse_next_property(char *buf, char *end, char **name, int *length, + unsigned char **value) +{ + char *tmp; + + *name = buf; + + tmp = strchr(buf, ' '); + if (!tmp) { + printk(KERN_ERR "property parse failed in %s at line %d\n", + __FUNCTION__, __LINE__); + return NULL; + } + *tmp = '\0'; + + if (++tmp >= end) { + printk(KERN_ERR "property parse failed in %s at line %d\n", + __FUNCTION__, __LINE__); + return NULL; + } + + /* now we're on the length */ + *length = -1; + *length = simple_strtoul(tmp, &tmp, 10); + if (*length == -1) { + printk(KERN_ERR "property parse failed in %s at line %d\n", + __FUNCTION__, __LINE__); + return NULL; + } + if (*tmp != ' ' || ++tmp >= end) { + printk(KERN_ERR "property parse failed in %s at line %d\n", + __FUNCTION__, __LINE__); + return NULL; + } + + /* now we're on the value */ + *value = tmp; + tmp += *length; + if (tmp > end) { + printk(KERN_ERR "property parse failed in %s at line %d\n", + __FUNCTION__, __LINE__); + return NULL; + } + else if (tmp < end && *tmp != ' ' && *tmp != '\0') { + printk(KERN_ERR "property parse failed in %s at line %d\n", + __FUNCTION__, __LINE__); + return NULL; + } + tmp++; + + /* and now we should be on the next name, or the end */ + return tmp; +} + +static struct property *new_property(const char *name, const int length, + const unsigned char *value, struct property *last) +{ + struct property *new = kmalloc(sizeof(*new), GFP_KERNEL); + + if (!new) + return NULL; + memset(new, 0, sizeof(*new)); + + if (!(new->name = kmalloc(strlen(name) + 1, GFP_KERNEL))) + goto cleanup; + if (!(new->value = kmalloc(length + 1, GFP_KERNEL))) + goto cleanup; + + strcpy(new->name, name); + memcpy(new->value, value, length); + *(((char *)new->value) + length) = 0; + new->length = length; + new->next = last; + return new; + +cleanup: + if (new->name) + kfree(new->name); + if (new->value) + kfree(new->value); + kfree(new); + return NULL; +} + +static int do_add_node(char *buf, size_t bufsize) +{ + 
char *path, *end, *name; + struct device_node *np; + struct property *prop = NULL; + unsigned char* value; + int length, rv = 0; + + end = buf + bufsize; + path = buf; + buf = strchr(buf, ' '); + if (!buf) + return -EINVAL; + *buf = '\0'; + buf++; + + if ((np = of_find_node_by_path(path))) { + of_node_put(np); + return -EINVAL; + } + + /* rv = build_prop_list(tmp, bufsize - (tmp - buf), &proplist); */ + while (buf < end && + (buf = parse_next_property(buf, end, &name, &length, &value))) { + struct property *last = prop; + + prop = new_property(name, length, value, last); + if (!prop) { + rv = -ENOMEM; + prop = last; + goto out; + } + } + if (!buf) { + rv = -EINVAL; + goto out; + } + + rv = pSeries_reconfig_add_node(path, prop); + +out: + if (rv) + release_prop_list(prop); + return rv; +} + +static int do_remove_node(char *buf) +{ + struct device_node *node; + int rv = -ENODEV; + + if ((node = of_find_node_by_path(buf))) + rv = pSeries_reconfig_remove_node(node); + + of_node_put(node); + return rv; +} + +/** + * ofdt_write - perform operations on the Open Firmware device tree + * + * @file: not used + * @buf: command and arguments + * @count: size of the command buffer + * @off: not used + * + * Operations supported at this time are addition and removal of + * whole nodes along with their properties. Operations on individual + * properties are not implemented (yet). 
+ */ +static ssize_t ofdt_write(struct file *file, const char __user *buf, size_t count, + loff_t *off) +{ + int rv = 0; + char *kbuf; + char *tmp; + + if (!(kbuf = kmalloc(count + 1, GFP_KERNEL))) { + rv = -ENOMEM; + goto out; + } + if (copy_from_user(kbuf, buf, count)) { + rv = -EFAULT; + goto out; + } + + kbuf[count] = '\0'; + + tmp = strchr(kbuf, ' '); + if (!tmp) { + rv = -EINVAL; + goto out; + } + *tmp = '\0'; + tmp++; + + if (!strcmp(kbuf, "add_node")) + rv = do_add_node(tmp, count - (tmp - kbuf)); + else if (!strcmp(kbuf, "remove_node")) + rv = do_remove_node(tmp); + else + rv = -EINVAL; +out: + kfree(kbuf); + return rv ? rv : count; +} + +static struct file_operations ofdt_fops = { + .write = ofdt_write +}; + +/* create /proc/ppc64/ofdt write-only by root */ +static int proc_ppc64_create_ofdt(void) +{ + struct proc_dir_entry *ent; + + if (!(systemcfg->platform & PLATFORM_PSERIES)) + return 0; + + ent = create_proc_entry("ppc64/ofdt", S_IWUSR, NULL); + if (ent) { + ent->nlink = 1; + ent->data = NULL; + ent->size = 0; + ent->proc_fops = &ofdt_fops; + } + + return 0; +} +__initcall(proc_ppc64_create_ofdt); Index: linux-2.6.11-bk10/arch/ppc64/kernel/prom.c =================================================================== --- linux-2.6.11-bk10.orig/arch/ppc64/kernel/prom.c 2005-03-14 21:54:08.000000000 +0000 +++ linux-2.6.11-bk10/arch/ppc64/kernel/prom.c 2005-03-14 22:15:45.000000000 +0000 @@ -27,7 +27,6 @@ #include #include #include -#include #include #include #include @@ -1575,84 +1574,6 @@ void of_node_put(struct device_node *nod } EXPORT_SYMBOL(of_node_put); -/** - * derive_parent - basically like dirname(1) - * @path: the full_name of a node to be added to the tree - * - * Returns the node which should be the parent of the node - * described by path. E.g., for path = "/foo/bar", returns - * the node with full_name = "/foo". 
- */ -static struct device_node *derive_parent(const char *path) -{ - struct device_node *parent = NULL; - char *parent_path = "/"; - size_t parent_path_len = strrchr(path, '/') - path + 1; - - /* reject if path is "/" */ - if (!strcmp(path, "/")) - return NULL; - - if (strrchr(path, '/') != path) { - parent_path = kmalloc(parent_path_len, GFP_KERNEL); - if (!parent_path) - return NULL; - strlcpy(parent_path, path, parent_path_len); - } - parent = of_find_node_by_path(parent_path); - if (strcmp(parent_path, "/")) - kfree(parent_path); - return parent; -} - -/* - * Routines for "runtime" addition and removal of device tree nodes. - */ -#ifdef CONFIG_PROC_DEVICETREE -/* - * Add a node to /proc/device-tree. - */ -static void add_node_proc_entries(struct device_node *np) -{ - struct proc_dir_entry *ent; - - ent = proc_mkdir(strrchr(np->full_name, '/') + 1, np->parent->pde); - if (ent) - proc_device_tree_add_node(np, ent); -} - -static void remove_node_proc_entries(struct device_node *np) -{ - struct property *pp = np->properties; - struct device_node *parent = np->parent; - - while (pp) { - remove_proc_entry(pp->name, np->pde); - pp = pp->next; - } - - /* Assuming that symlinks have the same parent directory as - * np->pde. 
- */ - if (np->name_link) - remove_proc_entry(np->name_link->name, parent->pde); - if (np->addr_link) - remove_proc_entry(np->addr_link->name, parent->pde); - if (np->pde) - remove_proc_entry(np->pde->name, parent->pde); -} -#else /* !CONFIG_PROC_DEVICETREE */ -static void add_node_proc_entries(struct device_node *np) -{ - return; -} - -static void remove_node_proc_entries(struct device_node *np) -{ - return; -} -#endif /* CONFIG_PROC_DEVICETREE */ - /* * Fix up the uninitialized fields in a new device node: * name, type, n_addrs, addrs, n_intrs, intrs, and pci-specific fields @@ -1710,43 +1631,18 @@ out: } /* - * Given a path and a property list, construct an OF device node, add - * it to the device tree and global list, and place it in - * /proc/device-tree. This function may sleep. + * Plug a device node into the tree and global list. */ -int of_add_node(const char *path, struct property *proplist) +void of_attach_node(struct device_node *np) { - struct device_node *np; - int err = 0; - - np = kmalloc(sizeof(struct device_node), GFP_KERNEL); - if (!np) - return -ENOMEM; - - memset(np, 0, sizeof(*np)); - - np->full_name = kmalloc(strlen(path) + 1, GFP_KERNEL); - if (!np->full_name) { - kfree(np); - return -ENOMEM; - } - strcpy(np->full_name, path); - - np->properties = proplist; - OF_MARK_DYNAMIC(np); - kref_init(&np->kref); - of_node_get(np); - np->parent = derive_parent(path); - if (!np->parent) { - kfree(np); - return -EINVAL; /* could also be ENOMEM, though */ - } + int err; + /* This use of finish_node will be moved to a notifier so + * the error code can be used. 
+ */ err = finish_node(np, NULL, of_finish_dynamic_node, 0, 0, 0); - if (err < 0) { - kfree(np); - return err; - } + if (err < 0) + return; write_lock(&devtree_lock); np->sibling = np->parent->child; @@ -1754,21 +1650,6 @@ int of_add_node(const char *path, struct np->parent->child = np; allnodes = np; write_unlock(&devtree_lock); - - add_node_proc_entries(np); - - of_node_put(np->parent); - of_node_put(np); - return 0; -} - -/* - * Prepare an OF node for removal from system - */ -static void of_cleanup_node(struct device_node *np) -{ - if (np->iommu_table && get_property(np, "ibm,dma-window", NULL)) - iommu_free_table(np); } /* @@ -1776,23 +1657,14 @@ static void of_cleanup_node(struct devic * a reference to the node. The memory associated with the node * is not freed until its refcount goes to zero. */ -int of_remove_node(struct device_node *np) +void of_detach_node(const struct device_node *np) { - struct device_node *parent, *child; + struct device_node *parent; - parent = of_get_parent(np); - if (!parent) - return -EINVAL; - - if ((child = of_get_next_child(np, NULL))) { - of_node_put(child); - return -EBUSY; - } + write_lock(&devtree_lock); - of_cleanup_node(np); + parent = np->parent; - write_lock(&devtree_lock); - remove_node_proc_entries(np); if (allnodes == np) allnodes = np->allnext; else { @@ -1814,10 +1686,8 @@ int of_remove_node(struct device_node *n ; prevsib->sibling = np->sibling; } + write_unlock(&devtree_lock); - of_node_put(parent); - of_node_put(np); /* Must decrement the refcount */ - return 0; } /* Index: linux-2.6.11-bk10/include/asm-ppc64/prom.h =================================================================== --- linux-2.6.11-bk10.orig/include/asm-ppc64/prom.h 2005-03-14 21:28:20.000000000 +0000 +++ linux-2.6.11-bk10/include/asm-ppc64/prom.h 2005-03-14 22:15:17.000000000 +0000 @@ -209,8 +209,8 @@ extern struct device_node *of_node_get(s extern void of_node_put(struct device_node *node); /* For updating the device tree at runtime */ 
-extern int of_add_node(const char *path, struct property *proplist); -extern int of_remove_node(struct device_node *np); +extern void of_attach_node(struct device_node *); +extern void of_detach_node(const struct device_node *); /* Other Prototypes */ extern unsigned long prom_init(unsigned long, unsigned long, unsigned long, Index: linux-2.6.11-bk10/include/asm-ppc64/pSeries_reconfig.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6.11-bk10/include/asm-ppc64/pSeries_reconfig.h 2005-03-14 22:06:42.000000000 +0000 @@ -0,0 +1,25 @@ +#ifndef _PPC64_PSERIES_RECONFIG_H +#define _PPC64_PSERIES_RECONFIG_H + +#include + +/* + * Use this API if your code needs to know about OF device nodes being + * added or removed on pSeries systems. + */ + +#define PSERIES_RECONFIG_ADD 0x0001 +#define PSERIES_RECONFIG_REMOVE 0x0002 + +#ifdef CONFIG_PPC_PSERIES +extern int pSeries_reconfig_notifier_register(struct notifier_block *); +extern void pSeries_reconfig_notifier_unregister(struct notifier_block *); +#else /* !CONFIG_PPC_PSERIES */ +static inline int pSeries_reconfig_notifier_register(struct notifier_block *nb) +{ + return 0; +} +static inline void pSeries_reconfig_notifier_unregister(struct notifier_block *nb) { } +#endif /* CONFIG_PPC_PSERIES */ + +#endif /* _PPC64_PSERIES_RECONFIG_H */ From ntl at pobox.com Tue Mar 15 13:49:43 2005 From: ntl at pobox.com (Nathan Lynch) Date: Mon, 14 Mar 2005 20:49:43 -0600 (CST) Subject: [PATCH 4/8] prom.c: use pSeries reconfig notifier In-Reply-To: <20050315024923.11665.85622.82498@otto> References: <20050315024923.11665.85622.82498@otto> Message-ID: <20050315024943.11665.41759.40955@otto> Use the pSeries_reconfig notifier list to fix up a device node which is about to be added. 
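The ordering this series relies on (the prom.c fixup runs before other subscribers, and a failed fixup can veto the add) can be sketched outside the kernel. The following is a minimal userspace model under stated assumptions, not the kernel's notifier implementation: `struct notifier`, `chain_register`, `chain_call`, `struct mock_node` and both callbacks are hypothetical names invented for illustration, and the rule that a `NOTIFY_BAD` return stops the walk is an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel API. */
enum { NOTIFY_DONE = 0, NOTIFY_OK = 1, NOTIFY_BAD = 2 };
enum { RECONFIG_ADD = 1, RECONFIG_REMOVE = 2 };

struct notifier {
	int (*call)(struct notifier *nb, unsigned long action, void *node);
	int priority;			/* higher priority runs earlier */
	struct notifier *next;
};

/* Mock device node: just enough state to observe callback ordering. */
struct mock_node {
	int fixed_up;		/* set by the fixup callback */
	int seen_by_pci;	/* 1 if pci callback ran after fixup, -1 if before */
	int fail_fixup;		/* force the fixup callback to fail */
};

/* Insert nb so that the list stays sorted by descending priority. */
static void chain_register(struct notifier **head, struct notifier *nb)
{
	while (*head && (*head)->priority >= nb->priority)
		head = &(*head)->next;
	nb->next = *head;
	*head = nb;
}

/* Call each subscriber in order; stop at the first NOTIFY_BAD (model assumption). */
static int chain_call(struct notifier *head, unsigned long action, void *node)
{
	int ret = NOTIFY_DONE;

	for (; head; head = head->next) {
		ret = head->call(head, action, node);
		if (ret == NOTIFY_BAD)
			break;
	}
	return ret;
}

/* Plays the role of the high-priority fixup subscriber. */
static int fixup_cb(struct notifier *nb, unsigned long action, void *node)
{
	struct mock_node *np = node;

	(void)nb;
	if (action != RECONFIG_ADD)
		return NOTIFY_DONE;
	if (np->fail_fixup)
		return NOTIFY_BAD;
	np->fixed_up = 1;
	return NOTIFY_OK;
}

/* Plays the role of a later subscriber that needs a fixed-up node. */
static int pci_cb(struct notifier *nb, unsigned long action, void *node)
{
	struct mock_node *np = node;

	(void)nb;
	if (action != RECONFIG_ADD)
		return NOTIFY_DONE;
	np->seen_by_pci = np->fixed_up ? 1 : -1;
	return NOTIFY_OK;
}
```

Registering the low-priority subscriber first and the priority-10 one second still yields fixup-then-pci order in this model, which is the property the `.priority = 10` comment in the patch is after.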
Signed-off-by: Nathan Lynch arch/ppc64/kernel/prom.c | 40 ++++++++++++++++++++------- 1 files changed, 31 insertions(+), 9 deletions(-) Index: linux-2.6.11-bk10/arch/ppc64/kernel/prom.c =================================================================== --- linux-2.6.11-bk10.orig/arch/ppc64/kernel/prom.c 2005-03-14 22:15:45.000000000 +0000 +++ linux-2.6.11-bk10/arch/ppc64/kernel/prom.c 2005-03-14 22:28:19.000000000 +0000 @@ -52,6 +52,7 @@ #include #include #include +#include #ifdef DEBUG #define DBG(fmt...) udbg_printf(fmt) @@ -1635,15 +1636,6 @@ out: */ void of_attach_node(struct device_node *np) { - int err; - - /* This use of finish_node will be moved to a notifier so - * the error code can be used. - */ - err = finish_node(np, NULL, of_finish_dynamic_node, 0, 0, 0); - if (err < 0) - return; - write_lock(&devtree_lock); np->sibling = np->parent->child; np->allnext = allnodes; @@ -1690,6 +1682,36 @@ void of_detach_node(const struct device_ write_unlock(&devtree_lock); } +static int prom_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *node) +{ + int err; + + switch (action) { + case PSERIES_RECONFIG_ADD: + err = finish_node(node, NULL, of_finish_dynamic_node, 0, 0, 0); + if (err < 0) { + printk(KERN_ERR "finish_node returned %d\n", err); + err = NOTIFY_BAD; + } + break; + default: + err = NOTIFY_DONE; + break; + } + return err; +} + +static struct notifier_block prom_reconfig_nb = { + .notifier_call = prom_reconfig_notifier, + .priority = 10, /* This one needs to run first */ +}; + +static int __init prom_reconfig_setup(void) +{ + return pSeries_reconfig_notifier_register(&prom_reconfig_nb); +} +__initcall(prom_reconfig_setup); + /* * Find a property with a given name for a given node * and return the value.
From ntl at pobox.com Tue Mar 15 13:49:49 2005 From: ntl at pobox.com (Nathan Lynch) Date: Mon, 14 Mar 2005 20:49:49 -0600 (CST) Subject: [PATCH 5/8] pci_dn.c: use pSeries reconfig notifier In-Reply-To: <20050315024923.11665.85622.82498@otto> References: <20050315024923.11665.85622.82498@otto> Message-ID: <20050315024949.11665.81330.76171@otto> Use the pSeries_reconfig notifier list to handle newly added pci device nodes. Signed-off-by: Nathan Lynch arch/ppc64/kernel/pci_dn.c | 22 ++++++++++++++++++++++ arch/ppc64/kernel/prom.c | 14 -------------- 2 files changed, 22 insertions(+), 14 deletions(-) Index: linux-2.6.11-bk10/arch/ppc64/kernel/pci_dn.c =================================================================== --- linux-2.6.11-bk10.orig/arch/ppc64/kernel/pci_dn.c 2005-03-14 21:28:14.000000000 +0000 +++ linux-2.6.11-bk10/arch/ppc64/kernel/pci_dn.c 2005-03-14 22:29:03.000000000 +0000 @@ -27,6 +27,7 @@ #include #include #include +#include #include "pci.h" @@ -161,6 +162,25 @@ struct device_node *fetch_dev_dn(struct } EXPORT_SYMBOL(fetch_dev_dn); +static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *node) +{ + struct device_node *np = node; + int err = NOTIFY_OK; + + switch (action) { + case PSERIES_RECONFIG_ADD: + update_dn_pci_info(np, np->parent->phb); + break; + default: + err = NOTIFY_DONE; + break; + } + return err; +} + +static struct notifier_block pci_dn_reconfig_nb = { + .notifier_call = pci_dn_reconfig_notifier, +}; /* * Actually initialize the phbs. @@ -173,4 +193,6 @@ void __init pci_devs_phb_init(void) /* This must be done first so the device nodes have valid pci info! 
*/ list_for_each_entry_safe(phb, tmp, &hose_list, list_node) pci_devs_phb_init_dynamic(phb); + + pSeries_reconfig_notifier_register(&pci_dn_reconfig_nb); } Index: linux-2.6.11-bk10/arch/ppc64/kernel/prom.c =================================================================== --- linux-2.6.11-bk10.orig/arch/ppc64/kernel/prom.c 2005-03-14 22:28:19.000000000 +0000 +++ linux-2.6.11-bk10/arch/ppc64/kernel/prom.c 2005-03-14 22:29:03.000000000 +0000 @@ -1591,7 +1591,6 @@ static int of_finish_dynamic_node(struct int unused3, int unused4) { struct device_node *parent = of_get_parent(node); - u32 *regs; int err = 0; phandle *ibm_phandle; @@ -1613,19 +1612,6 @@ static int of_finish_dynamic_node(struct if ((ibm_phandle = (unsigned int *)get_property(node, "ibm,phandle", NULL))) node->linux_phandle = *ibm_phandle; - /* now do the rough equivalent of update_dn_pci_info, this - * probably is not correct for phb's, but should work for - * IOAs and slots. - */ - - node->phb = parent->phb; - - regs = (u32 *)get_property(node, "reg", NULL); - if (regs) { - node->busno = (regs[0] >> 16) & 0xff; - node->devfn = (regs[0] >> 8) & 0xff; - } - out: of_node_put(parent); return err; From ntl at pobox.com Tue Mar 15 13:49:54 2005 From: ntl at pobox.com (Nathan Lynch) Date: Mon, 14 Mar 2005 20:49:54 -0600 (CST) Subject: [PATCH 6/8] pSeries_iommu.c: use pSeries reconfig notifier In-Reply-To: <20050315024923.11665.85622.82498@otto> References: <20050315024923.11665.85622.82498@otto> Message-ID: <20050315024954.11665.81666.16106@otto> Use the pSeries_reconfig notifier chain for tearing down the iommu table when a device node is removed. 
Signed-off-by: Nathan Lynch arch/ppc64/kernel/pSeries_iommu.c | 25 +++++++++++++++ arch/ppc64/kernel/pSeries_reconfig.c | 12 ------- 2 files changed, 25 insertions(+), 12 deletions(-) Index: linux-2.6.11-bk10/arch/ppc64/kernel/pSeries_iommu.c =================================================================== --- linux-2.6.11-bk10.orig/arch/ppc64/kernel/pSeries_iommu.c 2005-03-13 02:51:53.000000000 +0000 +++ linux-2.6.11-bk10/arch/ppc64/kernel/pSeries_iommu.c 2005-03-14 22:29:30.000000000 +0000 @@ -43,6 +43,7 @@ #include #include #include +#include #include #include "pci.h" @@ -455,6 +456,28 @@ static void iommu_dev_setup_pSeries(stru } } +static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long action, void *node) +{ + int err = NOTIFY_OK; + struct device_node *np = node; + + switch (action) { + case PSERIES_RECONFIG_REMOVE: + if (np->iommu_table && + get_property(np, "ibm,dma-window", NULL)) + iommu_free_table(np); + break; + default: + err = NOTIFY_DONE; + break; + } + return err; +} + +static struct notifier_block iommu_reconfig_nb = { + .notifier_call = iommu_reconfig_notifier, +}; + static void iommu_bus_setup_null(struct pci_bus *b) { } static void iommu_dev_setup_null(struct pci_dev *d) { } @@ -487,6 +510,8 @@ void iommu_init_early_pSeries(void) ppc_md.iommu_dev_setup = iommu_dev_setup_pSeries; + pSeries_reconfig_notifier_register(&iommu_reconfig_nb); + pci_iommu_init(); } Index: linux-2.6.11-bk10/arch/ppc64/kernel/pSeries_reconfig.c =================================================================== --- linux-2.6.11-bk10.orig/arch/ppc64/kernel/pSeries_reconfig.c 2005-03-14 22:16:09.000000000 +0000 +++ linux-2.6.11-bk10/arch/ppc64/kernel/pSeries_reconfig.c 2005-03-14 22:29:30.000000000 +0000 @@ -164,16 +164,6 @@ out_err: return err; } -/* - * Prepare an OF node for removal from system - * XXX move this to pSeries_iommu.c - */ -static void of_cleanup_node(struct device_node *np) -{ - if (np->iommu_table && get_property(np, 
"ibm,dma-window", NULL)) - iommu_free_table(np); -} - static int pSeries_reconfig_remove_node(struct device_node *np) { struct device_node *parent, *child; @@ -187,8 +177,6 @@ static int pSeries_reconfig_remove_node( return -EBUSY; } - of_cleanup_node(np); - remove_node_proc_entries(np); notifier_call_chain(&pSeries_reconfig_chain, From ntl at pobox.com Tue Mar 15 13:49:59 2005 From: ntl at pobox.com (Nathan Lynch) Date: Mon, 14 Mar 2005 20:49:59 -0600 (CST) Subject: [PATCH 7/8] pSeries_smp.c: use pSeries reconfig notifier for cpu DLPAR In-Reply-To: <20050315024923.11665.85622.82498@otto> References: <20050315024923.11665.85622.82498@otto> Message-ID: <20050315024959.11665.79221.22369@otto> Use the pSeries_reconfig notifier API to handle processor addition and removal on pSeries LPAR. This is the "right" way to do it, as opposed to setting cpu_present_map = cpu_possible_map at boot (this is fixed in a following patch). Signed-off-by: Nathan Lynch arch/ppc64/kernel/pSeries_smp.c | 126 ++++++++++++++++++++ 1 files changed, 126 insertions(+) Index: linux-2.6.11-bk10/arch/ppc64/kernel/pSeries_smp.c =================================================================== --- linux-2.6.11-bk10.orig/arch/ppc64/kernel/pSeries_smp.c 2005-03-14 21:28:14.000000000 +0000 +++ linux-2.6.11-bk10/arch/ppc64/kernel/pSeries_smp.c 2005-03-14 22:29:53.000000000 +0000 @@ -44,6 +44,7 @@ #include #include #include +#include #include "mpic.h" @@ -213,6 +214,127 @@ static inline int __devinit smp_startup_ } return 1; } + +/* + * Update cpu_present_map and paca(s) for a new cpu node. The wrinkle + * here is that a cpu device node may represent up to two logical cpus + * in the SMT case. We must honor the assumption in other code that + * the logical ids for sibling SMT threads x and y are adjacent, such + * that x^1 == y and y^1 == x. 
+ */ +static int pSeries_add_processor(struct device_node *np) +{ + unsigned int cpu; + cpumask_t candidate_map, tmp = CPU_MASK_NONE; + int err = -ENOSPC, len, nthreads, i; + u32 *intserv; + + intserv = (u32 *)get_property(np, "ibm,ppc-interrupt-server#s", &len); + if (!intserv) + return 0; + + nthreads = len / sizeof(u32); + for (i = 0; i < nthreads; i++) + cpu_set(i, tmp); + + lock_cpu_hotplug(); + + BUG_ON(!cpus_subset(cpu_present_map, cpu_possible_map)); + + /* Get a bitmap of unoccupied slots. */ + cpus_xor(candidate_map, cpu_possible_map, cpu_present_map); + if (cpus_empty(candidate_map)) { + /* If we get here, it most likely means that NR_CPUS is + * less than the partition's max processors setting. + */ + printk(KERN_ERR "Cannot add cpu %s; this system configuration" + " supports %d logical cpus.\n", np->full_name, + cpus_weight(cpu_possible_map)); + goto out_unlock; + } + + while (!cpus_empty(tmp)) + if (cpus_subset(tmp, candidate_map)) + /* Found a range where we can insert the new cpu(s) */ + break; + else + cpus_shift_left(tmp, tmp, nthreads); + + if (cpus_empty(tmp)) { + printk(KERN_ERR "Unable to find space in cpu_present_map for" + " processor %s with %d thread(s)\n", np->name, + nthreads); + goto out_unlock; + } + + for_each_cpu_mask(cpu, tmp) { + BUG_ON(cpu_isset(cpu, cpu_present_map)); + cpu_set(cpu, cpu_present_map); + set_hard_smp_processor_id(cpu, *intserv++); + } + err = 0; +out_unlock: + unlock_cpu_hotplug(); + return err; +} + +/* + * Update the present map for a cpu node which is going away, and set + * the hard id in the paca(s) to -1 to be consistent with boot time + * convention for non-present cpus. 
+ */ +static void pSeries_remove_processor(struct device_node *np) +{ + unsigned int cpu; + int len, nthreads, i; + u32 *intserv; + + intserv = (u32 *)get_property(np, "ibm,ppc-interrupt-server#s", &len); + if (!intserv) + return; + + nthreads = len / sizeof(u32); + + lock_cpu_hotplug(); + for (i = 0; i < nthreads; i++) { + for_each_present_cpu(cpu) { + if (get_hard_smp_processor_id(cpu) != intserv[i]) + continue; + BUG_ON(cpu_online(cpu)); + cpu_clear(cpu, cpu_present_map); + set_hard_smp_processor_id(cpu, -1); + break; + } + if (cpu == NR_CPUS) + printk(KERN_WARNING "Could not find cpu to remove " + "with physical id 0x%x\n", intserv[i]); + } + unlock_cpu_hotplug(); +} + +static int pSeries_smp_notifier(struct notifier_block *nb, unsigned long action, void *node) +{ + int err = NOTIFY_OK; + + switch (action) { + case PSERIES_RECONFIG_ADD: + if (pSeries_add_processor(node)) + err = NOTIFY_BAD; + break; + case PSERIES_RECONFIG_REMOVE: + pSeries_remove_processor(node); + break; + default: + err = NOTIFY_DONE; + break; + } + return err; +} + +static struct notifier_block pSeries_smp_nb = { + .notifier_call = pSeries_smp_notifier, +}; + #else /* ... 
CONFIG_HOTPLUG_CPU */ static inline int __devinit smp_startup_cpu(unsigned int lcpu) { @@ -336,6 +458,10 @@ void __init smp_init_pSeries(void) #ifdef CONFIG_HOTPLUG_CPU smp_ops->cpu_disable = pSeries_cpu_disable; smp_ops->cpu_die = pSeries_cpu_die; + + /* Processors can be added/removed only on LPAR */ + if (systemcfg->platform == PLATFORM_PSERIES_LPAR) + pSeries_reconfig_notifier_register(&pSeries_smp_nb); #endif /* Start secondary threads on SMT systems; primary threads From ntl at pobox.com Tue Mar 15 13:50:04 2005 From: ntl at pobox.com (Nathan Lynch) Date: Mon, 14 Mar 2005 20:50:04 -0600 (CST) Subject: [PATCH 8/8] make cpu hotplug play well with maxcpus and smt-enabled In-Reply-To: <20050315024923.11665.85622.82498@otto> References: <20050315024923.11665.85622.82498@otto> Message-ID: <20050315025004.11665.99923.55129@otto> This patch allows you to boot a pSeries system with maxcpus=x or smt-enabled=off (or both) and bring up the offline cpus later from userspace, assuming the kernel was built with CONFIG_HOTPLUG_CPU=y. - Record cpus which were started from OF in a cpu map and use that instead of system_state to decide how to start a cpu in smp_startup_cpu. - Change the smp bootup logic slightly so that the path for bringing up secondary threads is exactly the same as hotplugging a cpu later from userspace. - Add a new function to smp_ops - cpu_bootable. This is implemented only by pSeries to filter out secondary threads during boot with smt-enabled=off. Another way this could be done is to change the kick_cpu member to return int and we can check for this case in smp_pSeries_kick_cpu. - Remove the games we play with cpu_present_map and the hard_smp_processor_id to handle smt-enabled=off, since they're now unnecessary. - Remove find_physical_cpu_to_start; assigning threads to logical slots should be done at bootup and at DLPAR time, not during a cpu online operation. 
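As a rough illustration of the points above, here is a standalone sketch of the boot-time filter and of how the set of cpus still spinning in OF hold loops might be computed. It is a model under stated assumptions rather than the patch's code: kernel globals such as `system_state`, `cur_cpu_spec` and `smt_enabled_at_boot` are replaced by plain parameters, a single `unsigned long` stands in for `cpumask_t`, and the function names `cpu_bootable_model`, `smt_sibling` and `of_spin_mask` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the cpu_bootable hook: during boot, on SMT-capable hardware
 * with smt-enabled=off, refuse odd-numbered (secondary) threads.
 * After boot, every cpu may be brought up from userspace.
 */
static bool cpu_bootable_model(unsigned int nr, bool booting,
			       bool smt_capable, bool smt_enabled_at_boot)
{
	if (booting && smt_capable && !smt_enabled_at_boot && (nr % 2) != 0)
		return false;
	return true;
}

/* Sibling threads are assumed adjacent: x ^ 1 == y and y ^ 1 == x. */
static unsigned int smt_sibling(unsigned int nr)
{
	return nr ^ 1u;
}

/*
 * Model of the of_spin_map setup: with SMT, only even-numbered (primary)
 * present threads were started by firmware; without SMT, every present
 * cpu was. The boot cpu is already running, so it is cleared.
 * boot_cpu must be smaller than the number of bits in unsigned long.
 */
static unsigned long of_spin_mask(unsigned long present, bool smt_capable,
				  unsigned int boot_cpu)
{
	unsigned long mask = 0;
	unsigned int i;

	for (i = 0; i < 8 * sizeof(unsigned long); i++) {
		if (!(present & (1UL << i)))
			continue;
		if (!smt_capable || i % 2 == 0)
			mask |= 1UL << i;
	}
	mask &= ~(1UL << boot_cpu);
	return mask;
}
```

In this model a cpu that is present but absent from the mask is one that would need an explicit start request, which mirrors the `cpu_isset(lcpu, of_spin_map)` early return in `smp_startup_cpu`.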
One caveat: you need up-to-date firmware on Power5 for the maxcpus option to work on systems with more than one processor. Otherwise interrupts get misrouted, typically resulting in hangs or "unable to find root filesystem" problems. Tested on Power5 with and without CONFIG_HOTPLUG_CPU and with various combinations of the maxcpus= and smt-enabled= parameters. arch/ppc64/kernel/pSeries_smp.c | 183 +++++++++++++++------------------------- arch/ppc64/kernel/setup.c | 12 -- arch/ppc64/kernel/smp.c | 13 -- include/asm-ppc64/machdep.h | 1 4 files changed, 78 insertions(+), 131 deletions(-) Signed-off-by: Nathan Lynch Index: linux-2.6.11-bk5/arch/ppc64/kernel/pSeries_smp.c =================================================================== --- linux-2.6.11-bk5.orig/arch/ppc64/kernel/pSeries_smp.c 2005-03-09 20:31:06.000000000 +0000 +++ linux-2.6.11-bk5/arch/ppc64/kernel/pSeries_smp.c 2005-03-09 20:32:55.000000000 +0000 @@ -54,8 +54,16 @@ #define DBG(fmt...) #endif +/* + * The primary thread of each non-boot processor is recorded here before + * smp init. + */ +static cpumask_t of_spin_map; + extern void pSeries_secondary_smp_init(unsigned long); +#ifdef CONFIG_HOTPLUG_CPU + /* Get state of physical CPU. * Return codes: * 0 - The processor is in the RTAS stopped state @@ -82,9 +90,6 @@ static int query_cpu_stopped(unsigned in return cpu_status; } - -#ifdef CONFIG_HOTPLUG_CPU - int pSeries_cpu_disable(void) { systemcfg->processorCount--; @@ -123,98 +128,6 @@ void pSeries_cpu_die(unsigned int cpu) paca[cpu].cpu_start = 0; } -/* Search all cpu device nodes for an offline logical cpu. If a - * device node has a "ibm,my-drc-index" property (meaning this is an - * LPAR), paranoid-check whether we own the cpu. For each "thread" - * of a cpu, if it is offline and has the same hw index as before, - * grab that in preference. 
- */ -static unsigned int find_physical_cpu_to_start(unsigned int old_hwindex) -{ - struct device_node *np = NULL; - unsigned int best = -1U; - - while ((np = of_find_node_by_type(np, "cpu"))) { - int nr_threads, len; - u32 *index = (u32 *)get_property(np, "ibm,my-drc-index", NULL); - u32 *tid = (u32 *) - get_property(np, "ibm,ppc-interrupt-server#s", &len); - - if (!tid) - tid = (u32 *)get_property(np, "reg", &len); - - if (!tid) - continue; - - /* If there is a drc-index, make sure that we own - * the cpu. - */ - if (index) { - int state; - int rc = rtas_get_sensor(9003, *index, &state); - if (rc < 0 || state != 1) - continue; - } - - nr_threads = len / sizeof(u32); - - while (nr_threads--) { - if (0 == query_cpu_stopped(tid[nr_threads])) { - best = tid[nr_threads]; - if (best == old_hwindex) - goto out; - } - } - } -out: - of_node_put(np); - return best; -} - -/** - * smp_startup_cpu() - start the given cpu - * - * At boot time, there is nothing to do. At run-time, call RTAS with - * the appropriate start location, if the cpu is in the RTAS stopped - * state. - * - * Returns: - * 0 - failure - * 1 - success - */ -static inline int __devinit smp_startup_cpu(unsigned int lcpu) -{ - int status; - unsigned long start_here = __pa((u32)*((unsigned long *) - pSeries_secondary_smp_init)); - unsigned int pcpu; - - /* At boot time the cpus are already spinning in hold - * loops, so nothing to do. */ - if (system_state < SYSTEM_RUNNING) - return 1; - - pcpu = find_physical_cpu_to_start(get_hard_smp_processor_id(lcpu)); - if (pcpu == -1U) { - printk(KERN_INFO "No more cpus available, failing\n"); - return 0; - } - - /* Fixup atomic count: it exited inside IRQ handler. */ - paca[lcpu].__current->thread_info->preempt_count = 0; - - /* At boot this is done in prom.c. 
*/ - paca[lcpu].hw_cpu_id = pcpu; - - status = rtas_call(rtas_token("start-cpu"), 3, 1, NULL, - pcpu, start_here, lcpu); - if (status != 0) { - printk(KERN_ERR "start-cpu failed: %i\n", status); - return 0; - } - return 1; -} - /* * Update cpu_present_map and paca(s) for a new cpu node. The wrinkle * here is that a cpu device node may represent up to two logical cpus @@ -335,12 +248,43 @@ static struct notifier_block pSeries_smp .notifier_call = pSeries_smp_notifier, }; -#else /* ... CONFIG_HOTPLUG_CPU */ +#endif /* CONFIG_HOTPLUG_CPU */ + +/** + * smp_startup_cpu() - start the given cpu + * + * At boot time, there is nothing to do for primary threads which were + * started from Open Firmware. For anything else, call RTAS with the + * appropriate start location. + * + * Returns: + * 0 - failure + * 1 - success + */ static inline int __devinit smp_startup_cpu(unsigned int lcpu) { + int status; + unsigned long start_here = __pa((u32)*((unsigned long *) + pSeries_secondary_smp_init)); + unsigned int pcpu; + + if (cpu_isset(lcpu, of_spin_map)) + /* Already started by OF and sitting in spin loop */ + return 1; + + pcpu = get_hard_smp_processor_id(lcpu); + + /* Fixup atomic count: it exited inside IRQ handler. */ + paca[lcpu].__current->thread_info->preempt_count = 0; + + status = rtas_call(rtas_token("start-cpu"), 3, 1, NULL, + pcpu, start_here, lcpu); + if (status != 0) { + printk(KERN_ERR "start-cpu failed: %i\n", status); + return 0; + } return 1; } -#endif /* CONFIG_HOTPLUG_CPU */ static inline void smp_xics_do_message(int cpu, int msg) { @@ -380,6 +324,8 @@ static void __devinit smp_xics_setup_cpu if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) vpa_init(cpu); + cpu_clear(cpu, of_spin_map); + /* * Put the calling processor into the GIQ. 
This is really only * necessary from a secondary thread as the OF start-cpu interface @@ -429,6 +375,20 @@ static void __devinit smp_pSeries_kick_c paca[nr].cpu_start = 1; } +static int smp_pSeries_cpu_bootable(unsigned int nr) +{ + /* Special case - we inhibit secondary thread startup + * during boot if the user requests it. Odd-numbered + * cpus are assumed to be secondary threads. + */ + if (system_state < SYSTEM_RUNNING && + cur_cpu_spec->cpu_features & CPU_FTR_SMT && + !smt_enabled_at_boot && nr % 2 != 0) + return 0; + + return 1; +} + static struct smp_ops_t pSeries_mpic_smp_ops = { .message_pass = smp_mpic_message_pass, .probe = smp_mpic_probe, @@ -441,12 +401,13 @@ static struct smp_ops_t pSeries_xics_smp .probe = smp_xics_probe, .kick_cpu = smp_pSeries_kick_cpu, .setup_cpu = smp_xics_setup_cpu, + .cpu_bootable = smp_pSeries_cpu_bootable, }; /* This is called very early */ void __init smp_init_pSeries(void) { - int ret, i; + int i; DBG(" -> smp_init_pSeries()\n"); @@ -464,20 +425,20 @@ void __init smp_init_pSeries(void) pSeries_reconfig_notifier_register(&pSeries_smp_nb); #endif - /* Start secondary threads on SMT systems; primary threads - * are already in the running state. - */ - for_each_present_cpu(i) { - if (query_cpu_stopped(get_hard_smp_processor_id(i)) == 0) { - printk("%16.16x : starting thread\n", i); - DBG("%16.16x : starting thread\n", i); - rtas_call(rtas_token("start-cpu"), 3, 1, &ret, - get_hard_smp_processor_id(i), - __pa((u32)*((unsigned long *) - pSeries_secondary_smp_init)), - i); + /* Mark threads which are still spinning in hold loops. */ + if (cur_cpu_spec->cpu_features & CPU_FTR_SMT) + for_each_present_cpu(i) { + if (i % 2 == 0) + /* + * Even-numbered logical cpus correspond to + * primary threads. 
+ */ + cpu_set(i, of_spin_map); } - } + else + of_spin_map = cpu_present_map; + + cpu_clear(boot_cpuid, of_spin_map); /* Non-lpar has additional take/give timebase */ if (rtas_token("freeze-time-base") != RTAS_UNKNOWN_SERVICE) { Index: linux-2.6.11-bk5/include/asm-ppc64/machdep.h =================================================================== --- linux-2.6.11-bk5.orig/include/asm-ppc64/machdep.h 2005-03-09 20:30:34.000000000 +0000 +++ linux-2.6.11-bk5/include/asm-ppc64/machdep.h 2005-03-09 20:32:55.000000000 +0000 @@ -33,6 +33,7 @@ struct smp_ops_t { int (*cpu_enable)(unsigned int nr); int (*cpu_disable)(void); void (*cpu_die)(unsigned int nr); + int (*cpu_bootable)(unsigned int nr); }; #endif Index: linux-2.6.11-bk5/arch/ppc64/kernel/smp.c =================================================================== --- linux-2.6.11-bk5.orig/arch/ppc64/kernel/smp.c 2005-03-09 20:30:34.000000000 +0000 +++ linux-2.6.11-bk5/arch/ppc64/kernel/smp.c 2005-03-09 20:32:55.000000000 +0000 @@ -490,9 +490,8 @@ int __devinit __cpu_up(unsigned int cpu) if (!cpu_enable(cpu)) return 0; - /* At boot, don't bother with non-present cpus -JSCHOPP */ - if (system_state < SYSTEM_RUNNING && !cpu_present(cpu)) - return -ENOENT; + if (smp_ops->cpu_bootable && !smp_ops->cpu_bootable(cpu)) + return -EINVAL; paca[cpu].default_decr = tb_ticks_per_jiffy / decr_overclock; @@ -606,14 +605,6 @@ void __init smp_cpus_done(unsigned int m smp_ops->setup_cpu(boot_cpuid); set_cpus_allowed(current, old_mask); - - /* - * We know at boot the maximum number of cpus we can add to - * a partition and set cpu_possible_map accordingly. cpu_present_map - * needs to match for the hotplug code to allow us to hot add - * any offline cpus. 
- */ - cpu_present_map = cpu_possible_map; } #ifdef CONFIG_HOTPLUG_CPU Index: linux-2.6.11-bk5/arch/ppc64/kernel/setup.c =================================================================== --- linux-2.6.11-bk5.orig/arch/ppc64/kernel/setup.c 2005-03-09 20:30:34.000000000 +0000 +++ linux-2.6.11-bk5/arch/ppc64/kernel/setup.c 2005-03-09 20:32:55.000000000 +0000 @@ -269,15 +269,9 @@ static void __init setup_cpu_maps(void) nthreads = len / sizeof(u32); for (j = 0; j < nthreads && cpu < NR_CPUS; j++) { - /* - * Only spin up secondary threads if SMT is enabled. - * We must leave space in the logical map for the - * threads. - */ - if (j == 0 || smt_enabled_at_boot) { - cpu_set(cpu, cpu_present_map); - set_hard_smp_processor_id(cpu, intserv[j]); - } + cpu_set(cpu, cpu_present_map); + set_hard_smp_processor_id(cpu, intserv[j]); + if (intserv[j] == boot_cpuid_phys) swap_cpuid = cpu; cpu_set(cpu, cpu_possible_map); From sfr at canb.auug.org.au Tue Mar 15 14:34:12 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 15 Mar 2005 14:34:12 +1100 Subject: [PATCH] PPC64 iSeries: cleanup viopath Message-ID: <20050315143412.0c60690a.sfr@canb.auug.org.au> Hi Andrew, Since you brought this file to my attention, I figured I might as well do some simple cleanups. This patch does: - single bit int bitfields are a bit suspect and Andrew pointed out recently that they are probably slower to access than ints - get rid of some more studly caps - define the semaphore and the atomic in struct alloc_parms rather than pointers to them since we just allocate them on the stack anyway. - one small white space cleanup - use the HvLpIndexInvalid constant instead of its value Built and booted on iSeries (which is the only place it is used).
Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruNp linus/arch/ppc64/kernel/viopath.c linus-cleanup.1/arch/ppc64/kernel/viopath.c --- linus/arch/ppc64/kernel/viopath.c 2005-03-13 04:07:42.000000000 +1100 +++ linus-cleanup.1/arch/ppc64/kernel/viopath.c 2005-03-15 14:02:48.000000000 +1100 @@ -42,6 +42,7 @@ #include #include +#include #include #include #include @@ -56,8 +57,8 @@ * But this allows for other support in the future. */ static struct viopathStatus { - int isOpen:1; /* Did we open the path? */ - int isActive:1; /* Do we have a mon msg outstanding */ + int isOpen; /* Did we open the path? */ + int isActive; /* Do we have a mon msg outstanding */ int users[VIO_MAX_SUBTYPES]; HvLpInstanceId mSourceInst; HvLpInstanceId mTargetInst; @@ -81,10 +82,10 @@ static void handleMonitorEvent(struct Hv * blocks on the semaphore and the handler posts the semaphore. However, * if system_state is not SYSTEM_RUNNING, then wait_atomic is used ... */ -struct doneAllocParms_t { - struct semaphore *sem; +struct alloc_parms { + struct semaphore sem; int number; - atomic_t *wait_atomic; + atomic_t wait_atomic; int used_wait_atomic; }; @@ -97,9 +98,9 @@ static u8 viomonseq = 22; /* Our hosting logical partition. We get this at startup * time, and different modules access this variable directly. */ -HvLpIndex viopath_hostLp = 0xff; /* HvLpIndexInvalid */ +HvLpIndex viopath_hostLp = HvLpIndexInvalid; EXPORT_SYMBOL(viopath_hostLp); -HvLpIndex viopath_ourLp = 0xff; +HvLpIndex viopath_ourLp = HvLpIndexInvalid; EXPORT_SYMBOL(viopath_ourLp); /* For each kind of incoming event we set a pointer to a @@ -200,7 +201,7 @@ EXPORT_SYMBOL(viopath_isactive); /* * We cache the source and target instance ids for each - * partition. + * partition. 
*/ HvLpInstanceId viopath_sourceinst(HvLpIndex lp) { @@ -450,36 +451,33 @@ static void vio_handleEvent(struct HvLpE static void viopath_donealloc(void *parm, int number) { - struct doneAllocParms_t *parmsp = (struct doneAllocParms_t *)parm; + struct alloc_parms *parmsp = parm; parmsp->number = number; if (parmsp->used_wait_atomic) - atomic_set(parmsp->wait_atomic, 0); + atomic_set(&parmsp->wait_atomic, 0); else - up(parmsp->sem); + up(&parmsp->sem); } static int allocateEvents(HvLpIndex remoteLp, int numEvents) { - struct doneAllocParms_t parms; - DECLARE_MUTEX_LOCKED(Semaphore); - atomic_t wait_atomic; + struct alloc_parms parms; if (system_state != SYSTEM_RUNNING) { parms.used_wait_atomic = 1; - atomic_set(&wait_atomic, 1); - parms.wait_atomic = &wait_atomic; + atomic_set(&parms.wait_atomic, 1); } else { parms.used_wait_atomic = 0; - parms.sem = &Semaphore; + init_MUTEX_LOCKED(&parms.sem); } mf_allocate_lp_events(remoteLp, HvLpEvent_Type_VirtualIo, 250, /* It would be nice to put a real number here! 
*/ numEvents, &viopath_donealloc, &parms); if (system_state != SYSTEM_RUNNING) { - while (atomic_read(&wait_atomic)) + while (atomic_read(&parms.wait_atomic)) mb(); } else - down(&Semaphore); + down(&parms.sem); return parms.number; } @@ -558,8 +556,7 @@ int viopath_close(HvLpIndex remoteLp, in unsigned long flags; int i; int numOpen; - struct doneAllocParms_t doneAllocParms; - DECLARE_MUTEX_LOCKED(Semaphore); + struct alloc_parms parms; if ((remoteLp >= HvMaxArchitectedLps) || (remoteLp == HvLpIndexInvalid)) return -EINVAL; @@ -580,11 +577,11 @@ int viopath_close(HvLpIndex remoteLp, in spin_unlock_irqrestore(&statuslock, flags); - doneAllocParms.used_wait_atomic = 0; - doneAllocParms.sem = &Semaphore; + parms.used_wait_atomic = 0; + init_MUTEX_LOCKED(&parms.sem); mf_deallocate_lp_events(remoteLp, HvLpEvent_Type_VirtualIo, - numReq, &viopath_donealloc, &doneAllocParms); - down(&Semaphore); + numReq, &viopath_donealloc, &parms); + down(&parms.sem); spin_lock_irqsave(&statuslock, flags); for (i = 0, numOpen = 0; i < VIO_MAX_SUBTYPES; i++) From sfr at canb.auug.org.au Tue Mar 15 15:34:46 2005 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 15 Mar 2005 15:34:46 +1100 Subject: [PATCH] PPC64 iSeries: cleanup iSeries_setup Message-ID: <20050315153446.4404919f.sfr@canb.auug.org.au> Hi Andrew, This patch does some trivial cleanups on iSeries_setup.[ch]: - eliminate warning about iommu_init_early_iSeries not being declared - remove trailing whitespace - change some functions to static - remove defunct function declarations Built and booted on iSeries.
Signed-off-by: Stephen Rothwell -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ diff -ruNp linus-cleanup.1/arch/ppc64/kernel/iSeries_setup.c linus-cleanup.2/arch/ppc64/kernel/iSeries_setup.c --- linus-cleanup.1/arch/ppc64/kernel/iSeries_setup.c 2005-03-06 07:08:24.000000000 +1100 +++ linus-cleanup.2/arch/ppc64/kernel/iSeries_setup.c 2005-03-15 15:23:35.000000000 +1100 @@ -15,7 +15,7 @@ * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. */ - + #undef DEBUG #include @@ -39,6 +39,7 @@ #include #include #include +#include #include #include "iSeries_setup.h" @@ -57,6 +58,7 @@ #include #include #include +#include extern void hvlog(char *fmt, ...); @@ -72,7 +74,6 @@ extern void ppcdbg_initialize(void); static void build_iSeries_Memory_Map(void); static void setup_iSeries_cache_sizes(void); static void iSeries_bolt_kernel(unsigned long saddr, unsigned long eaddr); -extern void iSeries_setup_arch(void); extern void iSeries_pci_final_fixup(void); /* Global Variables */ @@ -108,8 +109,8 @@ struct MemoryBlock { * and return the number of physical blocks and fill in the array of * block data. 
*/ -unsigned long iSeries_process_Condor_mainstore_vpd(struct MemoryBlock *mb_array, - unsigned long max_entries) +static unsigned long iSeries_process_Condor_mainstore_vpd( + struct MemoryBlock *mb_array, unsigned long max_entries) { unsigned long holeFirstChunk, holeSizeChunks; unsigned long numMemoryBlocks = 1; @@ -154,7 +155,7 @@ unsigned long iSeries_process_Condor_mai #define MaxSegmentAdrRangeBlocks 128 #define MaxAreaRangeBlocks 4 -unsigned long iSeries_process_Regatta_mainstore_vpd( +static unsigned long iSeries_process_Regatta_mainstore_vpd( struct MemoryBlock *mb_array, unsigned long max_entries) { struct IoHriMainStoreSegment5 *msVpdP = @@ -246,7 +247,7 @@ unsigned long iSeries_process_Regatta_ma printk(" Bitmap range: %016lx - %016lx\n" " Absolute range: %016lx - %016lx\n", mb_array[i].logicalStart, - mb_array[i].logicalEnd, + mb_array[i].logicalEnd, mb_array[i].absStart, mb_array[i].absEnd); mb_array[i].absStart = addr_to_chunk(mb_array[i].absStart & 0x000fffffffffffff); @@ -261,7 +262,7 @@ unsigned long iSeries_process_Regatta_ma return numSegmentBlocks; } -unsigned long iSeries_process_mainstore_vpd(struct MemoryBlock *mb_array, +static unsigned long iSeries_process_mainstore_vpd(struct MemoryBlock *mb_array, unsigned long max_entries) { unsigned long i; @@ -302,7 +303,7 @@ static void __init iSeries_parse_cmdline *p = 0; } -/*static*/ void __init iSeries_init_early(void) +static void __init iSeries_init_early(void) { DBG(" -> iSeries_init_early()\n"); @@ -355,7 +356,7 @@ static void __init iSeries_parse_cmdline #ifdef CONFIG_SMP smp_init_iSeries(); #endif - if (itLpNaca.xPirEnvironMode == 0) + if (itLpNaca.xPirEnvironMode == 0) piranha_simulator = 1; /* Associate Lp Event Queue 0 with processor 0 */ @@ -385,21 +386,21 @@ static void __init iSeries_parse_cmdline /* * The iSeries may have very large memories ( > 128 GB ) and a partition * may get memory in "chunks" that may be anywhere in the 2**52 real - * address space. The chunks are 256K in size. 
To map this to the - * memory model Linux expects, the AS/400 specific code builds a + * address space. The chunks are 256K in size. To map this to the + * memory model Linux expects, the AS/400 specific code builds a * translation table to translate what Linux thinks are "physical" - * addresses to the actual real addresses. This allows us to make + * addresses to the actual real addresses. This allows us to make * it appear to Linux that we have contiguous memory starting at * physical address zero while in fact this could be far from the truth. - * To avoid confusion, I'll let the words physical and/or real address - * apply to the Linux addresses while I'll use "absolute address" to + * To avoid confusion, I'll let the words physical and/or real address + * apply to the Linux addresses while I'll use "absolute address" to * refer to the actual hardware real address. * - * build_iSeries_Memory_Map gets information from the Hypervisor and + * build_iSeries_Memory_Map gets information from the Hypervisor and * looks at the Main Store VPD to determine the absolute addresses * of the memory that has been assigned to our partition and builds * a table used to translate Linux's physical addresses to these - * absolute addresses. Absolute addresses are needed when + * absolute addresses. Absolute addresses are needed when * communicating with the hypervisor (e.g. to build HPT entries) */ @@ -428,13 +429,13 @@ static void __init build_iSeries_Memory_ * otherwise, it might not be returned by PLIC as the first * chunks */ - + loadAreaFirstChunk = (u32)addr_to_chunk(itLpNaca.xLoadAreaAddr); loadAreaSize = itLpNaca.xLoadAreaChunks; /* - * Only add the pages already mapped here. - * Otherwise we might add the hpt pages + * Only add the pages already mapped here. 
+ * Otherwise we might add the hpt pages * The rest of the pages of the load area * aren't in the HPT yet and can still * be assigned an arbitrary physical address @@ -446,7 +447,7 @@ static void __init build_iSeries_Memory_ /* * TODO Do we need to do something if the HPT is in the 64MB load area? - * This would be required if the itLpNaca.xLoadAreaChunks includes + * This would be required if the itLpNaca.xLoadAreaChunks includes * the HPT size */ @@ -454,11 +455,11 @@ static void __init build_iSeries_Memory_ " absolute addr = %016lx\n", chunk_to_addr(loadAreaFirstChunk)); printk("Load area size %dK\n", loadAreaSize * 256); - + for (nextPhysChunk = 0; nextPhysChunk < loadAreaSize; ++nextPhysChunk) msChunks.abs[nextPhysChunk] = loadAreaFirstChunk + nextPhysChunk; - + /* * Get absolute address of our HPT and remember it so * we won't map it to any physical address @@ -475,7 +476,7 @@ static void __init build_iSeries_Memory_ num_ptegs = hptSizePages * (PAGE_SIZE / (sizeof(HPTE) * HPTES_PER_GROUP)); htab_hash_mask = num_ptegs - 1; - + /* * The actual hashed page table is in the hypervisor, * we have no direct access @@ -533,9 +534,9 @@ static void __init build_iSeries_Memory_ } /* - * main store size (in chunks) is + * main store size (in chunks) is * totalChunks - hptSizeChunks - * which should be equal to + * which should be equal to * nextPhysChunk */ systemcfg->physicalMemorySize = chunk_to_addr(nextPhysChunk); @@ -650,7 +651,7 @@ extern unsigned long ppc_tb_freq; /* * Document me. 
*/ -void __init iSeries_setup_arch(void) +static void __init iSeries_setup_arch(void) { void *eventStack; unsigned procIx = get_paca()->lppaca.dyn_hv_phys_proc_index; @@ -669,14 +670,14 @@ void __init iSeries_setup_arch(void) */ eventStack = alloc_bootmem_pages(LpEventStackSize); memset(eventStack, 0, LpEventStackSize); - + /* Invoke the hypervisor to initialize the event stack */ HvCallEvent_setLpEventStack(0, eventStack, LpEventStackSize); /* Initialize fields in our Lp Event Queue */ xItLpQueue.xSlicEventStackPtr = (char *)eventStack; xItLpQueue.xSlicCurEventPtr = (char *)eventStack; - xItLpQueue.xSlicLastValidEventPtr = (char *)eventStack + + xItLpQueue.xSlicLastValidEventPtr = (char *)eventStack + (LpEventStackSize - LpEventMaxSize); xItLpQueue.xIndex = 0; @@ -694,7 +695,7 @@ void __init iSeries_setup_arch(void) tbFreqMhzHundreths = (tbFreqHz / 10000) - (tbFreqMhz * 100); ppc_tb_freq = tbFreqHz; - printk("Max logical processors = %d\n", + printk("Max logical processors = %d\n", itVpdAreas.xSlicMaxLogicalProcs); printk("Max physical processors = %d\n", itVpdAreas.xSlicMaxPhysicalProcs); @@ -706,7 +707,7 @@ void __init iSeries_setup_arch(void) printk("Processor version = %x\n", systemcfg->processor); } -void iSeries_get_cpuinfo(struct seq_file *m) +static void iSeries_get_cpuinfo(struct seq_file *m) { seq_printf(m, "machine\t\t: 64-bit iSeries Logical Partition\n"); } @@ -715,7 +716,7 @@ void iSeries_get_cpuinfo(struct seq_file * Document me. * and Implement me. */ -int iSeries_get_irq(struct pt_regs *regs) +static int iSeries_get_irq(struct pt_regs *regs) { /* -2 means ignore this interrupt */ return -2; @@ -724,7 +725,7 @@ int iSeries_get_irq(struct pt_regs *regs /* * Document me. */ -void iSeries_restart(char *cmd) +static void iSeries_restart(char *cmd) { mf_reboot(); } @@ -732,7 +733,7 @@ void iSeries_restart(char *cmd) /* * Document me. 
*/ -void iSeries_power_off(void) +static void iSeries_power_off(void) { mf_power_off(); } @@ -740,14 +741,11 @@ void iSeries_power_off(void) /* * Document me. */ -void iSeries_halt(void) +static void iSeries_halt(void) { mf_power_off(); } -/* JDH Hack */ -unsigned long jdh_time = 0; - extern void setup_default_decr(void); /* @@ -758,17 +756,17 @@ extern void setup_default_decr(void); * and sets up the kernel timer decrementer based on that value. * */ -void __init iSeries_calibrate_decr(void) +static void __init iSeries_calibrate_decr(void) { unsigned long cyclesPerUsec; struct div_result divres; - + /* Compute decrementer (and TB) frequency in cycles/sec */ cyclesPerUsec = ppc_tb_freq / 1000000; /* * Set the amount to refresh the decrementer by. This - * is the number of decrementer ticks it takes for + * is the number of decrementer ticks it takes for * 1/HZ seconds. */ tb_ticks_per_jiffy = ppc_tb_freq / HZ; @@ -793,7 +791,7 @@ void __init iSeries_calibrate_decr(void) setup_default_decr(); } -void __init iSeries_progress(char * st, unsigned short code) +static void __init iSeries_progress(char * st, unsigned short code) { printk("Progress: [%04x] - %s\n", (unsigned)code, st); if (!piranha_simulator && mf_initialized) { @@ -825,7 +823,7 @@ static void __init iSeries_fixup_klimit( } } -int __init iSeries_src_init(void) +static int __init iSeries_src_init(void) { /* clear the progress line */ ppc_md.progress(" ", 0xffff); diff -ruNp linus-cleanup.1/arch/ppc64/kernel/iSeries_setup.h linus-cleanup.2/arch/ppc64/kernel/iSeries_setup.h --- linus-cleanup.1/arch/ppc64/kernel/iSeries_setup.h 2004-09-24 15:23:06.000000000 +1000 +++ linus-cleanup.2/arch/ppc64/kernel/iSeries_setup.h 2005-03-15 15:22:05.000000000 +1100 @@ -19,19 +19,8 @@ #ifndef __ISERIES_SETUP_H__ #define __ISERIES_SETUP_H__ -extern void iSeries_setup_arch(void); -extern void iSeries_setup_residual(struct seq_file *m, int cpu_id); -extern void iSeries_get_cpuinfo(struct seq_file *m); -extern void 
iSeries_init_IRQ(void); -extern int iSeries_get_irq(struct pt_regs *regs); -extern void iSeries_restart(char *cmd); -extern void iSeries_power_off(void); -extern void iSeries_halt(void); -extern void iSeries_time_init(void); extern void iSeries_get_boot_time(struct rtc_time *tm); extern int iSeries_set_rtc_time(struct rtc_time *tm); extern void iSeries_get_rtc_time(struct rtc_time *tm); -extern void iSeries_calibrate_decr(void); -extern void iSeries_progress( char *, unsigned short ); #endif /* __ISERIES_SETUP_H__ */
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050315/9622fe83/attachment.pgp

From benh at kernel.crashing.org Tue Mar 15 16:32:20 2005
From: benh at kernel.crashing.org (Benjamin Herrenschmidt)
Date: Tue, 15 Mar 2005 16:32:20 +1100
Subject: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC] PCI Error Recovery)
In-Reply-To: <20050314181420.GD498@austin.ibm.com>
References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <20050224011409.GE2088@austin.ibm.com> <421DDEF7.7080103@jp.fujitsu.com> <20050224231455.GH2088@austin.ibm.com> <421E9D16.3000606@jp.fujitsu.com> <20050312013251.GA2609@austin.ibm.com> <4235847F.3080705@jp.fujitsu.com> <20050314181420.GD498@austin.ibm.com>
Message-ID: <1110864741.29124.68.camel@gaston>

> Is there a long-term philosophy for the Linux kernel on a question like
> this? That is, when should changes add callbacks to structures,
> as opposed to notifier-chain based events? The callback is a bit
> simpler, and maybe a tiny bit faster, but it's less flexible in the
> long run (e.g. anyone can listen for the events, but only device
> drivers can get callbacks). Comments, please?

OK, let me propose what I think is a proper API that is simple enough on the driver side; if there is complexity, it's in the platform policy.
That should cover all the needs we discussed so far:

I think we need a callback in pci_driver, as I explained all along, with very simple semantics:

int (*error_handler)(struct pci_dev *dev, int message);

At first, message will be one of:

1) PCIERR_ERROR_DETECTED

Error detected. This is sent once after an error has been detected. At this point, the device might not be accessible anymore, depending on the platform (the slot will be isolated on ppc64). The driver may already have "noticed" the error because of a failing IO, but this is the proper "synchronisation point": it gives the driver a chance to clean up and wait for pending work (timers, etc.) to complete. It can take semaphores, schedule, etc.; everything but touch the device. Within this function and after it returns, the driver shouldn't do any new IOs. Called in task context. This is sort of a "quiesce" point. See the note about interrupts at the end of this doc.

Result codes:

- PCIERR_RESULT_CAN_RECOVER: Return this if you think you might be able to recover the HW by just banging IOs or if you want to be given a chance to extract some diagnostic information (see below).
- PCIERR_RESULT_NEED_RESET: Return this if you think you can't recover unless the slot is reset.
- PCIERR_RESULT_DISCONNECT: Return this if you think you won't recover at all (this will detach the driver? or just leave it dangling? to be decided).

So at this point, we have called PCIERR_ERROR_DETECTED for all drivers on the segment that had the error. On ppc64, the slot is isolated. What happens now typically depends on the result from the drivers. If all drivers on the segment/slot return PCIERR_RESULT_CAN_RECOVER, we would re-enable IOs on the slot (or do nothing special if the platform doesn't isolate slots) and call 2). If not and we can reset slots, we go to 4); if neither, we have a dead slot. If it's a hotplug slot, we might "simulate" reset by triggering HW unplug/replug though.

2) PCIERR_ERROR_RECOVER

This is the "early recovery" call. IOs are allowed again, but DMA is not (hrm... to be discussed, I prefer not), with some restrictions. This is NOT a callback for the driver to start operations again, only to peek/poke at the device, extract diagnostic information if any, and possibly do things like trigger a device-local reset, but not restart operations. This is sent if all drivers on a segment agree that they can try to recover. If the platform can't just re-enable IOs without a slot reset, it doesn't call this callback and goes directly to 4).

All IOs should be done _synchronously_ from within this callback. Errors triggered by them will be returned via the normal pci_check_whatever() API; no new PCIERR_ERROR_DETECTED callback will be issued due to an error happening here, though such an error might cause IOs to be re-blocked for the whole segment (thus invalidating the recovery of other devices on the same segment).

Result codes:

- PCIERR_RESULT_RECOVERED: Return this if you think your device is fully functional and you are ready to start your normal driver job again. There is no guarantee that because you returned that, you'll be allowed to actually proceed, as another driver on the same segment might have failed and thus triggered a slot reset on platforms that support it.
- PCIERR_RESULT_NEED_RESET: Return this if you think your device is not recoverable in its current state and you need a slot reset to proceed.
- PCIERR_RESULT_DISCONNECT: Same as above. Total failure, no recovery even after reset; driver dead. (To be defined more precisely.)

3) PCIERR_ERROR_RESTART

This is called if all drivers on the segment have returned PCIERR_RESULT_RECOVERED from the previous callback. That basically tells the driver to restart activity; everything is back and running. No result code is taken into account here. If a new error happens, it will restart a new error handling process.

4) PCIERR_ERROR_RESET

This is called after the slot has been reset (and PCI BARs re-configured by the platform). As for PCIERR_ERROR_RESTART, drivers here are just supposed to re-init the hardware and restart operations. However, a driver can still return a critical failure from here in case it just can't get its device back from reset. There is just nothing we can do about it though.

Result codes:

- PCIERR_RESULT_DISCONNECT: Same as above.

That's it. I think this covers all the possibilities. The way those callbacks are called is platform policy. A platform with no slot reset capability, for example, may want to just "ignore" drivers that can't recover (disconnect them) and try to let other cards on the same segment recover. Keep in mind that in most real-life cases, though, there will be only one driver per segment.

Now, there is a note about interrupts. If you get an interrupt and your device is dead or has been isolated, there is a problem :) After much thinking, I decided to leave that to the platform. That is, the recovery API only specifies that:

- There is no guarantee that interrupt delivery can proceed from any device on the segment starting from the error detection and until the restart callback is sent, at which point interrupts are expected to be fully operational.

- There is no guarantee that interrupt delivery is stopped. That is, a driver that gets an interrupt after detecting an error, or that detects an error within the interrupt handler such that it prevents proper ack'ing of the interrupt (and thus removal of the source), should just return IRQ_NOTHANDLED. It's up to the platform to deal with that condition, typically by masking the irq source during the duration of the error handling. It is expected that the platform "knows" which interrupts are routed to error-management capable slots and can deal with temporarily disabling that irq number during error processing (this isn't terribly complex).
That means some IRQ latency for other devices sharing the interrupt, but there is simply no other way. High-end platforms aren't supposed to share interrupts between many devices anyway :)

Comments welcome. Linas, I'll give a try at coding something up in the upcoming days unless you beat me to it.

Ben.

From hollis at penguinppc.org Wed Mar 16 01:32:27 2005
From: hollis at penguinppc.org (Hollis Blanchard)
Date: Tue, 15 Mar 2005 08:32:27 -0600
Subject: [PATCH] PPC64 iSeries: cleanup viopath
In-Reply-To: <20050315143412.0c60690a.sfr@canb.auug.org.au>
References: <20050315143412.0c60690a.sfr@canb.auug.org.au>
Message-ID: <0961a209ce72bb9f2a01b163aa6e6fbd@penguinppc.org>

On Mar 14, 2005, at 9:34 PM, Stephen Rothwell wrote:
>
> Since you brought this file to my attention, I figured I might as well do
> some simple cleanups. This patch does:
> - single bit int bitfields are a bit suspect and Andrew pointed
> out recently that they are probably slower to access than ints

> --- linus/arch/ppc64/kernel/viopath.c 2005-03-13 04:07:42.000000000 +1100
> +++ linus-cleanup.1/arch/ppc64/kernel/viopath.c 2005-03-15 14:02:48.000000000 +1100
> @@ -56,8 +57,8 @@
> * But this allows for other support in the future.
> */
> static struct viopathStatus {
> - int isOpen:1; /* Did we open the path? */
> - int isActive:1; /* Do we have a mon msg outstanding */
> + int isOpen; /* Did we open the path? */
> + int isActive; /* Do we have a mon msg outstanding */
> int users[VIO_MAX_SUBTYPES];
> HvLpInstanceId mSourceInst;
> HvLpInstanceId mTargetInst;

Why not use a byte instead of a full int (reordering the members for alignment)?
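
For illustration only, the three layouts under discussion can be compared with a small standalone sketch. It is not the viopath code: EXAMPLE_MAX_SUBTYPES is an assumed value, the struct names are hypothetical, and the HvLpInstanceId members are left out, so the real structure may pad differently.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed value, for illustration only; the real VIO_MAX_SUBTYPES is
 * defined in the iSeries headers. */
#define EXAMPLE_MAX_SUBTYPES 8

/* Layout before the patch: two single-bit signed bitfields. */
struct status_bitfield {
	int isOpen:1;
	int isActive:1;
	int users[EXAMPLE_MAX_SUBTYPES];
};

/* Layout after the patch: two full ints. */
struct status_int {
	int isOpen;
	int isActive;
	int users[EXAMPLE_MAX_SUBTYPES];
};

/* Hypothetical byte-sized variant: the flags move behind the int array
 * so that both of them fit into one padded tail word. */
struct status_byte {
	int users[EXAMPLE_MAX_SUBTYPES];
	unsigned char isOpen;
	unsigned char isActive;
};

/* Print the size of each layout and where the first flag lands. */
void print_status_layouts(void)
{
	printf("bitfield: %zu bytes\n", sizeof(struct status_bitfield));
	printf("int:      %zu bytes, isOpen at offset %zu\n",
	       sizeof(struct status_int),
	       offsetof(struct status_int, isOpen));
	printf("byte:     %zu bytes, isOpen at offset %zu\n",
	       sizeof(struct status_byte),
	       offsetof(struct status_byte, isOpen));
}
```

Calling print_status_layouts() from a test program shows how much the byte variant saves on a given ABI. Whether that saving survives in the real structure depends on the members that follow the flags.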
--
Hollis Blanchard
IBM Linux Technology Center

From olh at suse.de Wed Mar 16 02:12:59 2005
From: olh at suse.de (Olaf Hering)
Date: Tue, 15 Mar 2005 16:12:59 +0100
Subject: [PATCH] enable DEBUG via config option
In-Reply-To: <16948.46307.76370.206088@cargo.ozlabs.ibm.com>
References: <20050211105453.GA31718@suse.de> <16948.46307.76370.206088@cargo.ozlabs.ibm.com>
Message-ID: <20050315151259.GC22412@suse.de>

On Mon, Mar 14, Paul Mackerras wrote:

> Olaf Hering writes:
>
> > It's always boring to edit each file and turn the #undef DEBUG into
> > #define DEBUG. This patch makes it a simple config option.
> > Now the question is, how verbose will the boot be when all the printk
> > are enabled? Appears to be ok so far on a p620.
>
> Having it as a config option seems to be of use only to a few kernel
> developers. Why don't you just edit the Makefile and add -DDEBUG to
> the CFLAGS when you want to do that?

This series of patches changes all DEBUG_FOO to DEBUG (except NUMA_DEBUG) and removes all the remaining #define DEBUG or #undef DEBUG.

Compile-tested on all 5 configs in arch/ppc64, with and without -DDEBUG.

ppc64-undef-debug-LPARCFG_DEBUG.patch
ppc64-undef-debug-module-DEBUGP.patch
ppc64-undef-debug-nvram-DEBUG_NVRAM.patch
ppc64-undef-debug-pmac_feature-DEBUG_FEATURE.patch
ppc64-undef-debug-prom_init-DEBUG_PROM.patch
ppc64-undef-debug-rtasd-DEBUG.patch
ppc64-undef-debug-scanlog-DEBUG.patch
ppc64-undef-debug-signal-DEBUG_SIG.patch
ppc64-undef-debug-time-DEBUG_PPC_ADJTIMEX.patch
ppc64-undef-debug-vdso-__DEBUG.patch
ppc64-undef-debug.patch
-------------- next part --------------
Index: linux-2.6.11-olh/arch/ppc64/kernel/lparcfg.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/lparcfg.c +++ linux-2.6.11-olh/arch/ppc64/kernel/lparcfg.c @@ -38,8 +38,6 @@ #define MODULE_VERS "1.6" #define MODULE_NAME "lparcfg" -/* #define LPARCFG_DEBUG */ - /* find a better place for this function...
*/ void log_plpar_hcall_return(unsigned long rc, char *tag) { @@ -274,7 +272,7 @@ static void parse_system_parameter_strin __FILE__, __FUNCTION__, __LINE__); return; } -#ifdef LPARCFG_DEBUG +#ifdef DEBUG printk(KERN_INFO "success calling get-system-parameter \n"); #endif splpar_strlen = local_buffer[0] * 16 + local_buffer[1]; @@ -328,7 +326,7 @@ static int lparcfg_count_active_processo int count = 0; while ((cpus_dn = of_find_node_by_type(cpus_dn, "cpu"))) { -#ifdef LPARCFG_DEBUG +#ifdef DEBUG printk(KERN_ERR "cpus_dn %p \n", cpus_dn); #endif count++; -------------- next part -------------- Index: linux-2.6.11-olh/arch/ppc64/kernel/module.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/module.c +++ linux-2.6.11-olh/arch/ppc64/kernel/module.c @@ -30,10 +30,10 @@ Using a magic allocator which places modules within 32MB solves this, and makes other things simpler. Anton? --RR. */ -#if 0 -#define DEBUGP printk +#ifdef DEBUG +#define DBG printk #else -#define DEBUGP(fmt , ...) +#define DBG(fmt , ...) #endif /* There's actually a third entry here, but it's unused */ @@ -124,8 +124,8 @@ static unsigned long get_stubs_size(cons /* Every relocated section... */ for (i = 1; i < hdr->e_shnum; i++) { if (sechdrs[i].sh_type == SHT_RELA) { - DEBUGP("Found relocations in section %u\n", i); - DEBUGP("Ptr: %p. Number: %lu\n", + DBG("Found relocations in section %u\n", i); + DBG("Ptr: %p. 
Number: %lu\n", (void *)sechdrs[i].sh_addr, sechdrs[i].sh_size / sizeof(Elf64_Rela)); relocs += count_relocs((void *)sechdrs[i].sh_addr, @@ -134,7 +134,7 @@ static unsigned long get_stubs_size(cons } } - DEBUGP("Looks like a total of %lu stubs, max\n", relocs); + DBG("Looks like a total of %lu stubs, max\n", relocs); return relocs * sizeof(struct ppc64_stub_entry); } @@ -246,7 +246,7 @@ static inline int create_stub(Elf64_Shdr me->name, (void *)reladdr, (void *)my_r2); return 0; } - DEBUGP("Stub %p get data from reladdr %li\n", entry, reladdr); + DBG("Stub %p get data from reladdr %li\n", entry, reladdr); *loc1 = PPC_HA(reladdr); *loc2 = PPC_LO(reladdr); @@ -307,7 +307,7 @@ int apply_relocate_add(Elf64_Shdr *sechd unsigned long *location; unsigned long value; - DEBUGP("Applying ADD relocate section %u to %u\n", relsec, + DBG("Applying ADD relocate section %u to %u\n", relsec, sechdrs[relsec].sh_info); for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rela); i++) { /* This is where to make the change */ @@ -317,7 +317,7 @@ int apply_relocate_add(Elf64_Shdr *sechd sym = (Elf64_Sym *)sechdrs[symindex].sh_addr + ELF64_R_SYM(rela[i].r_info); - DEBUGP("RELOC at %p: %li-type as %s (%lu) + %li\n", + DBG("RELOC at %p: %li-type as %s (%lu) + %li\n", location, (long)ELF64_R_TYPE(rela[i].r_info), strtab + sym->st_name, (unsigned long)sym->st_value, (long)rela[i].r_addend); -------------- next part -------------- Index: linux-2.6.11-olh/arch/ppc64/kernel/nvram.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/nvram.c +++ linux-2.6.11-olh/arch/ppc64/kernel/nvram.c @@ -33,7 +33,7 @@ #include #include -#undef DEBUG_NVRAM +#undef DEBUG static int nvram_scan_partitions(void); static int nvram_setup_partition(void); @@ -200,7 +200,7 @@ static struct miscdevice nvram_dev = { }; -#ifdef DEBUG_NVRAM +#ifdef DEBUG static void nvram_print_partitions(char * label) { struct list_head * p; @@ -591,7 +591,7 @@ static int __init 
nvram_init(void) printk(KERN_WARNING "nvram_init: Could not find nvram partition" " for nvram buffered error logging.\n"); -#ifdef DEBUG_NVRAM +#ifdef DEBUG nvram_print_partitions("NVRAM Partitions"); #endif -------------- next part -------------- Index: linux-2.6.11-olh/arch/ppc64/kernel/pmac_feature.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/pmac_feature.c +++ linux-2.6.11-olh/arch/ppc64/kernel/pmac_feature.c @@ -41,9 +41,9 @@ #include #include -#undef DEBUG_FEATURE +#undef DEBUG -#ifdef DEBUG_FEATURE +#ifdef DEBUG #define DBG(fmt...) printk(KERN_DEBUG fmt) #else #define DBG(fmt...) -------------- next part -------------- Index: linux-2.6.11-olh/arch/ppc64/kernel/prom_init.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/prom_init.c +++ linux-2.6.11-olh/arch/ppc64/kernel/prom_init.c @@ -15,7 +15,7 @@ * 2 of the License, or (at your option) any later version. */ -#undef DEBUG_PROM +#undef DEBUG #include #include @@ -106,7 +106,7 @@ extern const struct linux_logo logo_linu __asm__ __volatile__(".long " BUG_ILLEGAL_INSTR); \ } while (0) -#ifdef DEBUG_PROM +#ifdef DEBUG #define prom_debug(x...) prom_printf(x) #else #define prom_debug(x...) 
@@ -642,11 +642,11 @@ static void __init prom_init_mem(void) p = RELOC(regbuf); endp = p + (plen / sizeof(cell_t)); -#ifdef DEBUG_PROM +#ifdef DEBUG memset(path, 0, PROM_SCRATCH_SIZE); call_prom("package-to-path", 3, 1, node, path, PROM_SCRATCH_SIZE-1); prom_debug(" node %s :\n", path); -#endif /* DEBUG_PROM */ +#endif /* DEBUG */ while ((endp - p) >= (_prom->root_addr_cells + _prom->root_size_cells)) { unsigned long base, size; @@ -1514,7 +1514,7 @@ static void __init flatten_device_tree(v reserve_mem(RELOC(dt_header_start), hdr->totalsize); memcpy(rsvmap, RELOC(mem_reserve_map), sizeof(mem_reserve_map)); -#ifdef DEBUG_PROM +#ifdef DEBUG { int i; prom_printf("reserved memory map:\n"); -------------- next part -------------- Index: linux-2.6.11-olh/arch/ppc64/kernel/rtasd.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/rtasd.c +++ linux-2.6.11-olh/arch/ppc64/kernel/rtasd.c @@ -28,10 +28,10 @@ #include #include -#if 0 -#define DEBUG(A...) printk(KERN_ERR A) +#ifdef DEBUG +#define DBG(A...) printk(KERN_ERR A) #else -#define DEBUG(A...) +#define DBG(A...) 
#endif static DEFINE_SPINLOCK(rtasd_log_lock); @@ -194,7 +194,7 @@ void pSeries_log_error(char *buf, unsign unsigned long s; int len = 0; - DEBUG("logging event\n"); + DBG("logging event\n"); if (buf == NULL) return; @@ -369,7 +369,7 @@ static int get_eventscan_parms(void) return -1; } rtas_event_scan_rate = *ip; - DEBUG("rtas-event-scan-rate %d\n", rtas_event_scan_rate); + DBG("rtas-event-scan-rate %d\n", rtas_event_scan_rate); /* Make room for the sequence number */ rtas_error_log_max = rtas_get_error_log_max(); @@ -419,7 +419,7 @@ static int rtasd(void *unused) printk(KERN_ERR "RTAS daemon started\n"); - DEBUG("will sleep for %d jiffies\n", (HZ*60/rtas_event_scan_rate) / 2); + DBG("will sleep for %d jiffies\n", (HZ*60/rtas_event_scan_rate) / 2); /* See if we have any error stored in NVRAM */ memset(logdata, 0, rtas_error_log_max); @@ -438,9 +438,9 @@ static int rtasd(void *unused) /* First pass. */ lock_cpu_hotplug(); for_each_online_cpu(cpu) { - DEBUG("scheduling on %d\n", cpu); + DBG("scheduling on %d\n", cpu); set_cpus_allowed(current, cpumask_of_cpu(cpu)); - DEBUG("watchdog scheduled on cpu %d\n", smp_processor_id()); + DBG("watchdog scheduled on cpu %d\n", smp_processor_id()); do_event_scan(event_scan); set_current_state(TASK_INTERRUPTIBLE); @@ -449,9 +449,9 @@ static int rtasd(void *unused) unlock_cpu_hotplug(); if (surveillance_timeout != -1) { - DEBUG("enabling surveillance\n"); + DBG("enabling surveillance\n"); enable_surveillance(surveillance_timeout); - DEBUG("surveillance enabled\n"); + DBG("surveillance enabled\n"); } lock_cpu_hotplug(); -------------- next part -------------- Index: linux-2.6.11-olh/arch/ppc64/kernel/scanlog.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/scanlog.c +++ linux-2.6.11-olh/arch/ppc64/kernel/scanlog.c @@ -37,7 +37,7 @@ #define SCANLOG_HWERROR -1 #define SCANLOG_CONTINUE 1 -#define DEBUG(A...) 
do { if (scanlog_debug) printk(KERN_ERR "scanlog: " A); } while (0) +#define DBG(A...) do { if (scanlog_debug) printk(KERN_ERR "scanlog: " A); } while (0) static int scanlog_debug; static unsigned int ibm_scan_log_dump; /* RTAS token */ @@ -85,14 +85,14 @@ static ssize_t scanlog_read(struct file memcpy(data, rtas_data_buf, RTAS_DATA_BUF_SIZE); spin_unlock(&rtas_data_buf_lock); - DEBUG("status=%d, data[0]=%x, data[1]=%x, data[2]=%x\n", + DBG("status=%d, data[0]=%x, data[1]=%x, data[2]=%x\n", status, data[0], data[1], data[2]); switch (status) { case SCANLOG_COMPLETE: - DEBUG("hit eof\n"); + DBG("hit eof\n"); return 0; case SCANLOG_HWERROR: - DEBUG("hardware error reading scan log data\n"); + DBG("hardware error reading scan log data\n"); return -EIO; case SCANLOG_CONTINUE: /* We may or may not have data yet */ @@ -143,9 +143,9 @@ static ssize_t scanlog_write(struct file if (buf) { if (strncmp(stkbuf, "reset", 5) == 0) { - DEBUG("reset scanlog\n"); + DBG("reset scanlog\n"); status = rtas_call(ibm_scan_log_dump, 2, 1, NULL, 0, 0); - DEBUG("rtas returns %d\n", status); + DBG("rtas returns %d\n", status); } else if (strncmp(stkbuf, "debugon", 7) == 0) { printk(KERN_ERR "scanlog: debug on\n"); scanlog_debug = 1; -------------- next part -------------- Index: linux-2.6.11-olh/arch/ppc64/kernel/signal.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/signal.c +++ linux-2.6.11-olh/arch/ppc64/kernel/signal.c @@ -38,7 +38,7 @@ #include #include -#define DEBUG_SIG 0 +#define DEBUG 0 #define _BLOCKABLE (~(sigmask(SIGKILL) | sigmask(SIGSTOP))) @@ -383,7 +383,7 @@ int sys_rt_sigreturn(unsigned long r3, u return regs->result; badframe: -#if DEBUG_SIG +#if DEBUG printk("badframe in sys_rt_sigreturn, regs=%p uc=%p &uc->uc_mcontext=%p\n", regs, uc, &uc->uc_mcontext); #endif @@ -465,7 +465,7 @@ static int setup_rt_frame(int signr, str return 1; badframe: -#if DEBUG_SIG +#if DEBUG printk("badframe in setup_rt_frame, 
regs=%p frame=%p newsp=%lx\n", regs, frame, newsp); #endif Index: linux-2.6.11-olh/arch/ppc64/kernel/signal32.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/signal32.c +++ linux-2.6.11-olh/arch/ppc64/kernel/signal32.c @@ -33,7 +33,7 @@ #include #include -#define DEBUG_SIG 0 +#define DEBUG 0 #define _BLOCKABLE (~(sigmask(SIGKILL) | sigmask(SIGSTOP))) @@ -684,7 +684,7 @@ static int handle_rt_signal32(unsigned l return 1; badframe: -#if DEBUG_SIG +#if DEBUG printk("badframe in handle_rt_signal, regs=%p frame=%p newsp=%lx\n", regs, frame, newsp); #endif @@ -857,7 +857,7 @@ static int handle_signal32(unsigned long return 1; badframe: -#if DEBUG_SIG +#if DEBUG printk("badframe in handle_signal, regs=%p frame=%x newsp=%x\n", regs, frame, *newspp); #endif -------------- next part -------------- Index: linux-2.6.11-olh/arch/ppc64/kernel/time.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/time.c +++ linux-2.6.11-olh/arch/ppc64/kernel/time.c @@ -546,7 +546,7 @@ void __init time_init(void) * adjust the frequency. 
*/ -/* #define DEBUG_PPC_ADJTIMEX 1 */ +#undef DEBUG void ppc_adjtimex(void) { @@ -576,7 +576,7 @@ void ppc_adjtimex(void) /* If there is a single shot time adjustment in progress */ if ( time_adjust ) { -#ifdef DEBUG_PPC_ADJTIMEX +#ifdef DEBUG printk("ppc_adjtimex: "); if ( adjusting_time == 0 ) printk("starting "); @@ -599,7 +599,7 @@ void ppc_adjtimex(void) singleshot_ppm = -singleshot_ppm; } else { -#ifdef DEBUG_PPC_ADJTIMEX +#ifdef DEBUG if ( adjusting_time ) printk("ppc_adjtimex: ending single shot time_adjust\n"); #endif @@ -620,7 +620,7 @@ void ppc_adjtimex(void) new_tb_ticks_per_sec = tb_ticks_per_sec - tb_ticks_per_sec_delta; } -#ifdef DEBUG_PPC_ADJTIMEX +#ifdef DEBUG printk("ppc_adjtimex: ltemp = %ld, time_freq = %ld, singleshot_ppm = %ld\n", ltemp, time_freq, singleshot_ppm); printk("ppc_adjtimex: tb_ticks_per_sec - base = %ld new = %ld\n", tb_ticks_per_sec, new_tb_ticks_per_sec); #endif -------------- next part -------------- Index: linux-2.6.11-olh/arch/ppc64/kernel/vdso.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/vdso.c +++ linux-2.6.11-olh/arch/ppc64/kernel/vdso.c @@ -109,7 +109,7 @@ struct lib64_elfinfo }; -#ifdef __DEBUG +#ifdef DEBUG static void dump_one_vdso_page(struct page *pg, struct page *upg) { printk("kpg: %p (c:%d,f:%08lx)", __va(page_to_pfn(pg) << PAGE_SHIFT), -------------- next part -------------- Index: linux-2.6.11-olh/arch/ppc64/kernel/prom_init.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/prom_init.c +++ linux-2.6.11-olh/arch/ppc64/kernel/prom_init.c @@ -15,8 +15,6 @@ * 2 of the License, or (at your option) any later version. 
*/ -#undef DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/kernel/eeh.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/eeh.c +++ linux-2.6.11-olh/arch/ppc64/kernel/eeh.c @@ -35,8 +35,6 @@ #include #include "pci.h" -#undef DEBUG - /** Overview: * EEH, or "Extended Error Handling" is a PCI bridge technology for * dealing with PCI bus errors that can't be dealt with within the Index: linux-2.6.11-olh/arch/ppc64/kernel/pSeries_smp.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/pSeries_smp.c +++ linux-2.6.11-olh/arch/ppc64/kernel/pSeries_smp.c @@ -12,8 +12,6 @@ * 2 of the License, or (at your option) any later version. */ -#undef DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/kernel/lmb.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/lmb.c +++ linux-2.6.11-olh/arch/ppc64/kernel/lmb.c @@ -22,8 +22,6 @@ struct lmb lmb; -#undef DEBUG - void lmb_dump_all(void) { #ifdef DEBUG Index: linux-2.6.11-olh/arch/ppc64/kernel/pSeries_setup.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/pSeries_setup.c +++ linux-2.6.11-olh/arch/ppc64/kernel/pSeries_setup.c @@ -16,8 +16,6 @@ * bootup setup stuff.. 
*/ -#undef DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/kernel/idle_power4.S =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/idle_power4.S +++ linux-2.6.11-olh/arch/ppc64/kernel/idle_power4.S @@ -22,8 +22,6 @@ #include #include -#undef DEBUG - .text /* Index: linux-2.6.11-olh/arch/ppc64/kernel/pmac_smp.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/pmac_smp.c +++ linux-2.6.11-olh/arch/ppc64/kernel/pmac_smp.c @@ -22,8 +22,6 @@ * 2 of the License, or (at your option) any later version. */ -#undef DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/kernel/smp.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/smp.c +++ linux-2.6.11-olh/arch/ppc64/kernel/smp.c @@ -15,8 +15,6 @@ * 2 of the License, or (at your option) any later version. */ -#undef DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/kernel/nvram.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/nvram.c +++ linux-2.6.11-olh/arch/ppc64/kernel/nvram.c @@ -33,8 +33,6 @@ #include #include -#undef DEBUG - static int nvram_scan_partitions(void); static int nvram_setup_partition(void); static int nvram_create_os_partition(void); Index: linux-2.6.11-olh/arch/ppc64/kernel/pmac_feature.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/pmac_feature.c +++ linux-2.6.11-olh/arch/ppc64/kernel/pmac_feature.c @@ -41,8 +41,6 @@ #include #include -#undef DEBUG - #ifdef DEBUG #define DBG(fmt...) 
printk(KERN_DEBUG fmt) #else Index: linux-2.6.11-olh/arch/ppc64/kernel/prom.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/prom.c +++ linux-2.6.11-olh/arch/ppc64/kernel/prom.c @@ -15,8 +15,6 @@ * 2 of the License, or (at your option) any later version. */ -#undef DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/kernel/setup.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/setup.c +++ linux-2.6.11-olh/arch/ppc64/kernel/setup.c @@ -10,8 +10,6 @@ * 2 of the License, or (at your option) any later version. */ -#undef DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/boot/main.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/boot/main.c +++ linux-2.6.11-olh/arch/ppc64/boot/main.c @@ -73,8 +73,6 @@ void *stdin; void *stdout; void *stderr; -#undef DEBUG - static unsigned long claim_base = PROG_START; static unsigned long try_claim(unsigned long size) Index: linux-2.6.11-olh/arch/ppc64/kernel/vdso.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/vdso.c +++ linux-2.6.11-olh/arch/ppc64/kernel/vdso.c @@ -36,8 +36,6 @@ #include #include -#undef DEBUG - #ifdef DEBUG #define DBG(fmt...) printk(fmt) #else Index: linux-2.6.11-olh/arch/ppc64/mm/hash_utils.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/mm/hash_utils.c +++ linux-2.6.11-olh/arch/ppc64/mm/hash_utils.c @@ -18,8 +18,6 @@ * 2 of the License, or (at your option) any later version. 
*/ -#undef DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/kernel/pmac_low_i2c.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/pmac_low_i2c.c +++ linux-2.6.11-olh/arch/ppc64/kernel/pmac_low_i2c.c @@ -16,8 +16,6 @@ * properties parser */ -#undef DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/kernel/mpic.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/mpic.c +++ linux-2.6.11-olh/arch/ppc64/kernel/mpic.c @@ -12,8 +12,6 @@ * for more details. */ -#undef DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/kernel/maple_time.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/maple_time.c +++ linux-2.6.11-olh/arch/ppc64/kernel/maple_time.c @@ -11,8 +11,6 @@ * */ -#undef DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/kernel/pmac_setup.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/pmac_setup.c +++ linux-2.6.11-olh/arch/ppc64/kernel/pmac_setup.c @@ -23,8 +23,6 @@ * bootup setup stuff.. */ -#undef DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/kernel/pci.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/pci.c +++ linux-2.6.11-olh/arch/ppc64/kernel/pci.c @@ -11,8 +11,6 @@ * 2 of the License, or (at your option) any later version. */ -#undef DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/kernel/iSeries_setup.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/iSeries_setup.c +++ linux-2.6.11-olh/arch/ppc64/kernel/iSeries_setup.c @@ -16,8 +16,6 @@ * 2 of the License, or (at your option) any later version. 
*/ -#undef DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/kernel/pmac_time.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/pmac_time.c +++ linux-2.6.11-olh/arch/ppc64/kernel/pmac_time.c @@ -32,8 +32,6 @@ #include #include -#undef DEBUG - #ifdef DEBUG #define DBG(x...) printk(x) #else Index: linux-2.6.11-olh/arch/ppc64/kernel/iSeries_smp.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/iSeries_smp.c +++ linux-2.6.11-olh/arch/ppc64/kernel/iSeries_smp.c @@ -12,8 +12,6 @@ * 2 of the License, or (at your option) any later version. */ -#undef DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/kernel/pmac_pci.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/pmac_pci.c +++ linux-2.6.11-olh/arch/ppc64/kernel/pmac_pci.c @@ -31,8 +31,6 @@ #include "pci.h" #include "pmac.h" -#define DEBUG - #ifdef DEBUG #define DBG(x...) 
printk(x) #else Index: linux-2.6.11-olh/arch/ppc64/kernel/ras.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/ras.c +++ linux-2.6.11-olh/arch/ppc64/kernel/ras.c @@ -74,8 +74,6 @@ static irqreturn_t ras_epow_interrupt(in static irqreturn_t ras_error_interrupt(int irq, void *dev_id, struct pt_regs * regs); -/* #define DEBUG */ - static void request_ras_irqs(struct device_node *np, char *propname, irqreturn_t (*handler)(int, void *, struct pt_regs *), const char *name) Index: linux-2.6.11-olh/arch/ppc64/kernel/maple_setup.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/maple_setup.c +++ linux-2.6.11-olh/arch/ppc64/kernel/maple_setup.c @@ -11,8 +11,6 @@ * */ -#define DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/kernel/pmac_nvram.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/pmac_nvram.c +++ linux-2.6.11-olh/arch/ppc64/kernel/pmac_nvram.c @@ -29,8 +29,6 @@ #include #include -#define DEBUG - #ifdef DEBUG #define DBG(x...) printk(x) #else Index: linux-2.6.11-olh/arch/ppc64/kernel/maple_pci.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/maple_pci.c +++ linux-2.6.11-olh/arch/ppc64/kernel/maple_pci.c @@ -8,8 +8,6 @@ * 2 of the License, or (at your option) any later version. 
*/ -#define DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/kernel/pSeries_lpar.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/pSeries_lpar.c +++ linux-2.6.11-olh/arch/ppc64/kernel/pSeries_lpar.c @@ -19,8 +19,6 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ -#define DEBUG - #include #include #include Index: linux-2.6.11-olh/arch/ppc64/kernel/signal.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/signal.c +++ linux-2.6.11-olh/arch/ppc64/kernel/signal.c @@ -38,8 +38,6 @@ #include #include -#define DEBUG 0 - #define _BLOCKABLE (~(sigmask(SIGKILL) | sigmask(SIGSTOP))) #ifndef MIN Index: linux-2.6.11-olh/arch/ppc64/kernel/time.c =================================================================== --- linux-2.6.11-olh.orig/arch/ppc64/kernel/time.c +++ linux-2.6.11-olh/arch/ppc64/kernel/time.c @@ -546,8 +546,6 @@ void __init time_init(void) * adjust the frequency. 
  */
 
-#undef DEBUG
-
 void ppc_adjtimex(void)
 {
 	unsigned long den, new_tb_ticks_per_sec, tb_ticks, old_xsec, new_tb_to_xs, new_xsec, new_stamp_xsec;
Index: linux-2.6.11-olh/arch/ppc64/kernel/signal32.c
===================================================================
--- linux-2.6.11-olh.orig/arch/ppc64/kernel/signal32.c
+++ linux-2.6.11-olh/arch/ppc64/kernel/signal32.c
@@ -33,8 +33,6 @@
 #include
 #include
 
-#define DEBUG 0
-
 #define _BLOCKABLE (~(sigmask(SIGKILL) | sigmask(SIGSTOP)))
 
 #define GP_REGS_SIZE32 min(sizeof(elf_gregset_t32), sizeof(struct pt_regs32))

From sfr at canb.auug.org.au Wed Mar 16 02:53:39 2005
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Wed, 16 Mar 2005 02:53:39 +1100
Subject: [PATCH] PPC64 iSeries: cleanup viopath
In-Reply-To: <0961a209ce72bb9f2a01b163aa6e6fbd@penguinppc.org>
References: <20050315143412.0c60690a.sfr@canb.auug.org.au> <0961a209ce72bb9f2a01b163aa6e6fbd@penguinppc.org>
Message-ID: <20050316025339.318fc246.sfr@canb.auug.org.au>

On Tue, 15 Mar 2005 08:32:27 -0600 Hollis Blanchard wrote:
>
> On Mar 14, 2005, at 9:34 PM, Stephen Rothwell wrote:
> >
> > Since you brought this file to my attention, I figured I might as well do
> > some simple cleanups. This patch does:
> > - single bit int bitfields are a bit suspect and Andrew pointed
> >   out recently that they are probably slower to access than ints
>
> > --- linus/arch/ppc64/kernel/viopath.c 2005-03-13 04:07:42.000000000 +1100
> > +++ linus-cleanup.1/arch/ppc64/kernel/viopath.c 2005-03-15 14:02:48.000000000 +1100
> > @@ -56,8 +57,8 @@
> >   * But this allows for other support in the future.
> >   */
> >  static struct viopathStatus {
> > -	int isOpen:1;	/* Did we open the path? */
> > -	int isActive:1;	/* Do we have a mon msg outstanding */
> > +	int isOpen;	/* Did we open the path?
*/
> > +	int isActive;	/* Do we have a mon msg outstanding */
> >  	int users[VIO_MAX_SUBTYPES];
> >  	HvLpInstanceId mSourceInst;
> >  	HvLpInstanceId mTargetInst;
>
> Why not use a byte instead of a full int (reordering the members for
> alignment)?

Because "classical" booleans are ints.

Because I don't know the relative speed of accessing single byte variables.

Because it was easy.

Because we only allocate 32 of these structures. Changing them really
only adds four bytes per structure. I guess using bytes and rearranging
the structure could actually save 4 bytes per structure.

I originally changed them to unsigned int single bit bitfields, but changed my mind - would that be better?

It really makes little difference, I was just trying to get rid of the silly signed single bit bitfields ...

-- 
Cheers,
Stephen Rothwell                    sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

From linas at austin.ibm.com Wed Mar 16 04:43:10 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Tue, 15 Mar 2005 11:43:10 -0600
Subject: [PATCH] PPC64 iSeries: cleanup viopath
In-Reply-To: <20050316025339.318fc246.sfr@canb.auug.org.au>
References: <20050315143412.0c60690a.sfr@canb.auug.org.au> <0961a209ce72bb9f2a01b163aa6e6fbd@penguinppc.org> <20050316025339.318fc246.sfr@canb.auug.org.au>
Message-ID: <20050315174310.GH498@austin.ibm.com>

On Wed, Mar 16, 2005 at 02:53:39AM +1100, Stephen Rothwell was heard to remark:
> On Tue, 15 Mar 2005 08:32:27 -0600 Hollis Blanchard wrote:
> >
> > Why not use a byte instead of a full int (reordering the members for
> > alignment)?
>
> Because "classical" booleans are ints.
>
> Because I don't know the relative speed of accessing single byte variables.
>
> Because it was easy.
>
> Because we only allocate 32 of these structures. Changing them really
> only adds four bytes per structure. I guess using bytes and rearranging
> the structure could actually save 4 bytes per structure.

FWIW, keep in mind that a cache miss due to large structures not fitting
is a zillion times more expensive than byte-aligning in the cpu
(even if byte operands had a cpu perf overhead, which I don't think
they do on ppc).

> It really makes little difference,

Yep. So my apologies for making you read this email.

--linas

From flar at allandria.com Wed Mar 16 04:49:30 2005
From: flar at allandria.com (Brad Boyer)
Date: Tue, 15 Mar 2005 09:49:30 -0800
Subject: [PATCH] PPC64 iSeries: cleanup viopath
In-Reply-To: <20050315174310.GH498@austin.ibm.com>
References: <20050315143412.0c60690a.sfr@canb.auug.org.au> <0961a209ce72bb9f2a01b163aa6e6fbd@penguinppc.org> <20050316025339.318fc246.sfr@canb.auug.org.au> <20050315174310.GH498@austin.ibm.com>
Message-ID: <20050315174929.GC10301@pants.nu>

On Tue, Mar 15, 2005 at 11:43:10AM -0600, Linas Vepstas wrote:
> FWIW, keep in mind that a cache miss due to large structures not fitting
> is a zillion times more expensive than byte-aligning in the cpu
> (even if byte operands had a cpu perf overhead, which I don't think
> they do on ppc).

Actually, there is a small overhead to bytes if you make them signed.
That's why char is unsigned by default on ppc.

	Brad Boyer
	flar at allandria.com

From jschopp at austin.ibm.com Wed Mar 16 05:15:17 2005
From: jschopp at austin.ibm.com (Joel Schopp)
Date: Tue, 15 Mar 2005 12:15:17 -0600
Subject: [PATCH][RFC] unlikely spinlocks
In-Reply-To: <16949.23966.756568.902508@cargo.ozlabs.ibm.com>
References: <20050302163412.0fa52c4b.moilanen@austin.ibm.com> <16949.23966.756568.902508@cargo.ozlabs.ibm.com>
Message-ID: <42372635.2070705@austin.ibm.com>

>>On our raw spinlocks, we currently have an attempt at the lock, and if
>>we do not get it we enter a spin loop.
This spinloop will likely
>>continue for a while, and we predict likely.
>>
>>Shouldn't we predict that we will get out of the loop so our next
>>instructions are already prefetched. Even when we miss because the lock
>>is still held, it won't matter since we are waiting anyways.
>
>
> Possibly the best thing is not to put a static prediction on it at
> all, and let the machine's dynamic branch prediction decide which path
> to predict?

It is better to predict you will get out of the loop than to let the machine predict it. If we are wrong and go back into the loop we have all the time in the world and have lost nothing with the wrong prediction. We could predict wrong 5000 times in a row, get it right on try 5001, and still come out ahead.

From olh at suse.de Wed Mar 16 08:02:46 2005
From: olh at suse.de (Olaf Hering)
Date: Tue, 15 Mar 2005 22:02:46 +0100
Subject: [PATCH] allow xmon=on,off,early
Message-ID: <20050315210246.GA24477@suse.de>

allow 'xmon' or 'xmon=early' to enter xmon very early during boot.
allow 'xmon=on' to just enable it, or 'xmon=off' to disable it.

Signed-off-by: Olaf Hering

Index: linux-2.6.11-olh/arch/ppc64/kernel/setup.c
===================================================================
--- linux-2.6.11-olh.orig/arch/ppc64/kernel/setup.c
+++ linux-2.6.11-olh/arch/ppc64/kernel/setup.c
@@ -1365,6 +1365,12 @@ EXPORT_SYMBOL(check_legacy_ioport);
 static int __init early_xmon(char *p)
 {
 	/* ensure xmon is enabled */
+	if (p) {
+		if (strncmp(p, "on", 2) == 0)
+			xmon_init();
+		if (strncmp(p, "early", 5))
+			return 0;
+	}
 	xmon_init();
 	debugger(NULL);
 

From olh at suse.de Wed Mar 16 08:26:56 2005
From: olh at suse.de (Olaf Hering)
Date: Tue, 15 Mar 2005 22:26:56 +0100
Subject: [PATCH] CONFIG_PM for ppc64, to allow sysrq o
Message-ID: <20050315212656.GA24563@suse.de>

For some weird reason, sysrq o is hidden behind CONFIG_PM.
Why? One can power off just fine without that. Can pm_sysrq_init be
moved to a better place?
I think it used to be in sysrq.c in 2.4.

Too bad, with this patch radeonfb fails to compile.

Index: linux-2.6.11-olh/arch/ppc64/Kconfig
===================================================================
--- linux-2.6.11-olh.orig/arch/ppc64/Kconfig
+++ linux-2.6.11-olh/arch/ppc64/Kconfig
@@ -350,6 +350,8 @@ config CMDLINE
 
 endmenu
 
+source "kernel/power/Kconfig"
+
 source "drivers/Kconfig"
 
 source "fs/Kconfig"

From linas at austin.ibm.com Wed Mar 16 08:44:13 2005
From: linas at austin.ibm.com (Linas Vepstas)
Date: Tue, 15 Mar 2005 15:44:13 -0600
Subject: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC] PCI Error Recovery)
In-Reply-To: <1110864741.29124.68.camel@gaston>
References: <20050223002409.GA10909@austin.ibm.com> <20050223174356.GH13081@kroah.com> <20050224011409.GE2088@austin.ibm.com> <421DDEF7.7080103@jp.fujitsu.com> <20050224231455.GH2088@austin.ibm.com> <421E9D16.3000606@jp.fujitsu.com> <20050312013251.GA2609@austin.ibm.com> <4235847F.3080705@jp.fujitsu.com> <20050314181420.GD498@austin.ibm.com> <1110864741.29124.68.camel@gaston>
Message-ID: <20050315214413.GI498@austin.ibm.com>

On Tue, Mar 15, 2005 at 04:32:20PM +1100, Benjamin Herrenschmidt was heard to remark:
>
> Ok, let's propose what i think is a proper API and simple enough on the
> driver side, ...
> That
> should cover all the needs we discussed so far:
>
> I think we need a callback in pci_driver, as I explained all along, with
> a very simple semantic:
>
> int (*error_handler)(struct pci_dev *dev, int message);

How about enum instead of int? that allows static type checking by the compiler.

> 1) PCIERR_ERROR_DETECTED

Elsewhere in the kernel, enums seem to be lowercase ...

> Comments welcome. Linas, I'll give a try at coding something up in the
> upcoming days unless you beat me to it.

Looks good to me. This is a minor tweak on what I currently have, so I'll take a shot at it. Unfortunately I just lost my regular devel machine. And still haven't debugged the symbios recovery.
It might take a few days, but what you wrote looks eminently workable.

Also, I have not completely read (or understood) what Long Nguyen just sent in... and haven't heard him make any remarks. Sounds to me like his AER patch is a pcie-specific version of what we are talking about. It would be nice to hear from Long about his thoughts on this.

--linas

From moilanen at austin.ibm.com Wed Mar 16 08:51:35 2005
From: moilanen at austin.ibm.com (Jake Moilanen)
Date: Tue, 15 Mar 2005 15:51:35 -0600
Subject: [PATCH 1/2] No-exec support for ppc64
In-Reply-To: <16950.3484.416343.832453@cargo.ozlabs.ibm.com>
References: <20050308165904.0ce07112.moilanen@austin.ibm.com> <20050308170826.13a2299e.moilanen@austin.ibm.com> <20050310032213.GB20789@austin.ibm.com> <20050310162513.74191caa.moilanen@austin.ibm.com> <16949.25552.640180.677985@cargo.ozlabs.ibm.com> <20050314155125.68dcff70.moilanen@austin.ibm.com> <16950.3484.416343.832453@cargo.ozlabs.ibm.com>
Message-ID: <20050315155135.11b942ef.moilanen@austin.ibm.com>

On Tue, 15 Mar 2005 09:18:04 +1100 Paul Mackerras wrote:
> Jake Moilanen writes:
>
> > > I don't think I can push that upstream. What happens if you leave
> > > that out?
> >
> > The bss and the plt are in the same segment, and plt obviously needs to
> > be executable.
>
> Yes... what I was asking was "do things actually break if you leave
> that out, or does the binfmt_elf loader honour the 'x' permission on
> the PT_LOAD entry for the data/bss region, meaning that it all just
> works anyway?"

It does not work w/o the sys_mprotect. It will hang in one of the first few binaries.

I believe the problem is that the last PT_LOAD entry does not have the correct size, and we only mmap up to the sbss. The .sbss, .plt, and .bss do not get mmapped with the section.

Here is /bin/bash on SLES 9:

Section Headers:
  [Nr] Name       Type      Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]            NULL      00000000 000000 000000 00      0   0  0
  [ 1] .interp    PROGBITS  10000174 000174 00000d 00   A  0   0  1
...
...
  [19] .data      PROGBITS  1008ca80 07ca80 001b34 00  WA  0   0  4
  [20] .eh_frame  PROGBITS  1008e5b4 07e5b4 0000b4 00   A  0   0  4
  [21] .got2      PROGBITS  1008e668 07e668 000010 00  WA  0   0  1
  [22] .dynamic   DYNAMIC   1008e678 07e678 0000e8 08  WA  6   0  4
  [23] .ctors     PROGBITS  1008e760 07e760 000008 00  WA  0   0  4
  [24] .dtors     PROGBITS  1008e768 07e768 000008 00  WA  0   0  4
  [25] .jcr       PROGBITS  1008e770 07e770 000004 00  WA  0   0  4
  [26] .got       PROGBITS  1008e774 07e774 000014 04 WAX  0   0  4
  [27] .sdata     PROGBITS  1008e788 07e788 0000d4 00  WA  0   0  4
  [28] .sbss      NOBITS    1008e860 07e860 000704 00  WA  0   0  8
  [29] .plt       NOBITS    1008ef64 07e860 000aa4 00 WAX  0   0  4
  [30] .bss       NOBITS    1008fa10 07e868 0062f0 00  WA  0   0 16

Program Headers:
  Type          Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR          0x000034 0x10000034 0x10000034 0x00120 0x00120 R E 0x4
  INTERP        0x000174 0x10000174 0x10000174 0x0000d 0x0000d R   0x1
      [Requesting program interpreter: /lib/ld.so.1]
  LOAD          0x000000 0x10000000 0x10000000 0x7ca80 0x7ca80 R E 0x10000
  LOAD          0x07ca80 0x1008ca80 0x1008ca80 0x01ddc 0x09280 RWE 0x10000
  DYNAMIC       0x07e678 0x1008e678 0x1008e678 0x000e8 0x000e8 RW  0x4
  NOTE          0x000184 0x10000184 0x10000184 0x00020 0x00020 R   0x4
  NOTE          0x0001a4 0x100001a4 0x100001a4 0x00018 0x00018 R   0x4
  GNU_EH_FRAME  0x07ca54 0x1007ca54 0x1007ca54 0x0002c 0x0002c R   0x4
  STACK         0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.SuSE .hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .text text.unlikely text.hot .fini .rodata .eh_frame_hdr
   03     .data .eh_frame .got2 .dynamic .ctors .dtors .jcr .got .sdata .sbss .plt .bss
   04     .dynamic
   05     .note.ABI-tag
   06     .note.SuSE
   07     .eh_frame_hdr
   08

In the program headers section, the FileSiz for the last PT_LOAD is 0x1ddc. If we go back to the Section Headers and look at .data it is at 0x1008ca80. So the segment should end at 0x1008e85c. We round up for alignment and we get 0x1008e860 or .sbss.
The sbss, plt, and bss are not mmapped. So the sys_mprotect is used to pick it up.

Did I miss something to explain this? Can you think of another way to fix it?

Jake

From olh at suse.de Wed Mar 16 09:03:32 2005
From: olh at suse.de (Olaf Hering)
Date: Tue, 15 Mar 2005 23:03:32 +0100
Subject: [PATCH] CONFIG_PM for ppc64, to allow sysrq o
In-Reply-To: <20050315212656.GA24563@suse.de>
References: <20050315212656.GA24563@suse.de>
Message-ID: <20050315220332.GA24708@suse.de>

On Tue, Mar 15, Olaf Hering wrote:

>
> For some weird reason, sysrq o is hidden behind CONFIG_PM.
> Why? One can power off just fine without that. Can pm_sysrq_init be
> moved to a better place? I think it used to be in sysrq.c in 2.4.
>
> Too bad, with this patch radeonfb fails to compile.

After disabling radeon and this additional change, sysrq o powers off.
Just don't type too fast over hvc (ctrl o o) ;)

Index: linux-2.6.11-olh/arch/ppc64/kernel/setup.c
===================================================================
--- linux-2.6.11-olh.orig/arch/ppc64/kernel/setup.c
+++ linux-2.6.11-olh/arch/ppc64/kernel/setup.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include
 #include