[v0 PATCH 0/4] Add INT mode support for EDAC drivers on Maple

Harry Ciao qingtao.cao at windriver.com
Fri May 15 18:43:50 EST 2009


Comments:		
---------

What is added
-------------

1, Support EDAC INT mode on the Maple platform, where the CPC925
Hypertransport hostbridge controller latches the MPIC INT0 pin on
receiving upstream NMI request messages with vector == 0 posted by
Hypertransport southbridges such as the AMD8131 & AMD8111 chips.

Since multiple southbridges can post NMI request messages, the EDAC core
should be responsible for maintaining the mapping from hwirq == 0 to
the related virq. That is what edac_mpic_irq.c is for: on the very first
call to edac_get_mpic_irq() the mapping is created, and on successive
calls the same virq is returned to the caller with its reference count
increased. On EDAC driver module removal, edac_put_mpic_irq() decreases
the reference count accordingly, and the mapping is disposed of when the
count reaches zero.
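
A minimal user-space sketch of the reference counting described above.
It is not the code from edac_mpic_irq.c: the sketch_* names, the fake
mapping helpers and the starting virq number are all made up for
illustration, and a real kernel version would also need locking around
the counter.

```c
#include <assert.h>

#define SKETCH_NO_IRQ 0u

/* Hypothetical stand-ins for whatever creates and disposes the real
 * hwirq -> virq mapping; they only track state so the sketch runs in
 * user space. */
static unsigned int sketch_next_virq = 18;  /* arbitrary start value */
int sketch_live_mappings;                   /* mappings currently alive */

static unsigned int sketch_create_mapping(unsigned int hwirq)
{
	(void)hwirq;
	sketch_live_mappings++;
	return sketch_next_virq++;
}

static void sketch_dispose_mapping(unsigned int virq)
{
	(void)virq;
	sketch_live_mappings--;
}

/* One shared mapping for hwirq == 0, reference counted. */
static unsigned int sketch_shared_virq = SKETCH_NO_IRQ;
static unsigned int sketch_refs;

/* First caller creates the mapping; later callers get the same virq. */
unsigned int sketch_get_mpic_irq(void)
{
	if (sketch_refs == 0)
		sketch_shared_virq = sketch_create_mapping(0);
	sketch_refs++;
	return sketch_shared_virq;
}

/* Last caller to drop its reference disposes of the mapping. */
void sketch_put_mpic_irq(void)
{
	assert(sketch_refs > 0);
	if (--sketch_refs == 0) {
		sketch_dispose_mapping(sketch_shared_virq);
		sketch_shared_virq = SKETCH_NO_IRQ;
	}
}
```

With this shape, two drivers that both call the get function share one
virq, and the mapping survives until the second of them calls the put
function.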

edac_mpic_irq.c and its exported APIs are controlled by CONFIG_MPIC,
since they would be inert for EDAC drivers whose hardware doesn't
support MPIC.

Now the AMD8111 & AMD8131 EDAC drivers can register their error handlers
on the virtual IRQ that maps to hardware IRQ == 0. If these chips are ever
adopted on a machine other than Maple, or on one where MPIC is not
supported, their new EDAC driver should implement a machine-specific
method to get an IRQ from their NMI request messages.
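
Because both southbridge drivers end up on the same virq, each handler
has to check its own device and decline interrupts that are not its own.
The following user-space sketch illustrates that shared-handler pattern.
Every name in it is hypothetical, and the error_pending flag merely
stands in for a hardware status register; it is not taken from the
AMD8111/AMD8131 drivers.

```c
#include <assert.h>
#include <stddef.h>

/* Local equivalents of "not mine" / "handled" return codes. */
enum sketch_irqreturn { SKETCH_IRQ_NONE = 0, SKETCH_IRQ_HANDLED = 1 };

typedef enum sketch_irqreturn (*sketch_handler_t)(void *dev_id);

#define SKETCH_MAX_ACTIONS 4

struct sketch_action {
	sketch_handler_t handler;
	void *dev_id;
};

/* Handlers registered on the one shared interrupt line. */
static struct sketch_action sketch_actions[SKETCH_MAX_ACTIONS];

/* Add a handler to the shared line; returns 0 on success, -1 if full. */
int sketch_request_shared_irq(sketch_handler_t handler, void *dev_id)
{
	assert(handler != NULL);
	for (size_t i = 0; i < SKETCH_MAX_ACTIONS; i++) {
		if (sketch_actions[i].handler == NULL) {
			sketch_actions[i].handler = handler;
			sketch_actions[i].dev_id = dev_id;
			return 0;
		}
	}
	return -1;
}

/* Remove the handler that was registered with this dev_id. */
void sketch_free_shared_irq(void *dev_id)
{
	for (size_t i = 0; i < SKETCH_MAX_ACTIONS; i++) {
		if (sketch_actions[i].handler != NULL &&
		    sketch_actions[i].dev_id == dev_id) {
			sketch_actions[i].handler = NULL;
			sketch_actions[i].dev_id = NULL;
		}
	}
}

/* Simulate the line firing: call every handler, count who claimed it. */
int sketch_dispatch(void)
{
	int handled = 0;

	for (size_t i = 0; i < SKETCH_MAX_ACTIONS; i++) {
		if (sketch_actions[i].handler != NULL &&
		    sketch_actions[i].handler(sketch_actions[i].dev_id) ==
		    SKETCH_IRQ_HANDLED)
			handled++;
	}
	return handled;
}

/* Hypothetical device state standing in for an error status register. */
struct sketch_dev {
	int error_pending;
	int errors_seen;
};

/* Claim the interrupt only if this device actually latched an error. */
enum sketch_irqreturn sketch_edac_isr(void *dev_id)
{
	struct sketch_dev *dev = dev_id;

	if (!dev->error_pending)
		return SKETCH_IRQ_NONE;
	dev->error_pending = 0;
	dev->errors_seen++;
	return SKETCH_IRQ_HANDLED;
}
```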

2, Add a new EDAC MCE mode for the CPC925 EDAC driver. The CPC925
Hypertransport hostbridge controller may generate an MCE on memory ECC
Errors and Processor Interface Errors; in MCE mode, the EDAC handlers
for these errors are hooked into the generic MCE handler.
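
One plausible shape for such a hook is a single function pointer that
the generic machine check path consults. The sketch below runs in user
space and uses hypothetical names throughout; it is not the interface
that the patches add to arch/powerpc/kernel/traps.c or edac_stub.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hook signature: returns nonzero if the error was handled. */
typedef int (*sketch_mce_hook_t)(void *regs);

/* At most one EDAC hook is installed at a time in this sketch. */
static sketch_mce_hook_t sketch_mce_hook;

/* Install the hook; returns 0 on success, -1 if one is already set. */
int sketch_register_mce_hook(sketch_mce_hook_t hook)
{
	assert(hook != NULL);
	if (sketch_mce_hook != NULL)
		return -1;
	sketch_mce_hook = hook;
	return 0;
}

/* Remove the hook, but only if it is the one currently installed. */
void sketch_unregister_mce_hook(sketch_mce_hook_t hook)
{
	if (sketch_mce_hook == hook)
		sketch_mce_hook = NULL;
}

/* Stand-in for the generic MCE handler: give the EDAC hook a chance
 * first and report whether it handled the event. */
int sketch_machine_check(void *regs)
{
	if (sketch_mce_hook != NULL && sketch_mce_hook(regs))
		return 1;
	return 0;
}

/* Hypothetical driver-side handler that just counts invocations. */
int sketch_mce_calls;

int sketch_cpc925_mce(void *regs)
{
	(void)regs;
	sketch_mce_calls++;
	return 1;
}
```

A driver would install the hook at module initialization in MCE mode and
remove it on module removal, which matches the registration and
unregistration steps listed under the tested aspects.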


Known limitations
------------------
I once tried to trigger memory ECC errors by masking two DIMM data pins
in the way described by the first test method on the EDAC twiki page
(http://bluesmoke.sourceforge.net/testing.html), but this only resulted
in Maple's FRU data being destroyed, and only after reflashing the FRU
data could Maple boot up normally when inserted back into the chassis.
Since Maple is locked in the chassis, the second approach, a heat lamp,
isn't applicable either.

As for the MCE/INT mode support for the CPC925 EDAC driver, the following
aspects have been tested:
1, module initialization and removal in MCE/INT mode;
2, creation and deletion of the mapping between hwirq == 2 and a virq
   for the Hypertransport Link Errors;
3, registration and unregistration of the EDAC MCE handler with the
   generic MCE handler on PPC.

Due to the difficulty and complexity of generating real hardware
ECC/HT Link/CPU Errors, the aspects below have not been tested yet:
1, whether ECC or CPU Errors generate an MCE event;
2, whether an HT Link Error indeed latches the MPIC INT2 pin;
3, whether the EDAC isr/mce methods handle errors correctly.

As for the INT mode support for the AMD8111 & AMD8131 EDAC drivers,
the aspects below have not been tested yet:
1, the code that controls the generation of the NMI Request Message;
2, the mapping from the NMI Request Messages to the MPIC INT0 pin;
3, whether the EDAC isr methods handle errors correctly.

I think I am at the point where I'd like to seek comments and ideas
from others on how to resolve the above test issues. I hope someone knows
a proper method or has an instrument to generate real hardware errors.

Any comments are welcome!


Test steps:
-----------
CONFIG_EDAC=y
CONFIG_EDAC_MM_EDAC=m
CONFIG_EDAC_AMD8111=m
CONFIG_EDAC_AMD8131=m
CONFIG_EDAC_CPC925=m

insmod edac_core.ko
insmod cpc925_edac.ko
insmod amd8111_edac.ko amd8111_op_state=1
insmod amd8131_edac.ko amd8131_op_state=1
cat /proc/interrupts

cd /sys/devices/system/edac/
cat cpu/poll_msec
cat htlink/poll_msec
cat lpc/poll_msec

rmmod cpc925_edac
rmmod amd8111_edac
rmmod amd8131_edac

insmod amd8111_edac.ko amd8111_op_state=1
insmod amd8131_edac.ko amd8131_op_state=1
insmod cpc925_edac.ko
cat /proc/interrupts

rmmod cpc925_edac
rmmod amd8111_edac
rmmod amd8131_edac
cat /proc/interrupts

insmod amd8131_edac.ko
insmod amd8111_edac.ko
cat /proc/interrupts
cd /sys/devices/system/edac/
cat lpc/poll_msec

rmmod amd8111_edac
rmmod amd8131_edac
rmmod edac_core
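
The `cat /proc/interrupts` steps above are checked by eye. If that check
were automated, a small helper along the following lines could test
whether a given row lists an expected handler name. This is only a
sketch: the function name is made up, and it assumes the row layout seen
in the test results (virq number, colon, counters, handler names).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if `line` is the /proc/interrupts row for `virq` and the
 * text `action` appears in it, 0 otherwise. */
int sketch_irq_line_has_action(const char *line, unsigned int virq,
			       const char *action)
{
	char *end;
	unsigned long n;

	assert(line != NULL && action != NULL);

	/* strtoul() skips the leading blanks of right-aligned numbers. */
	n = strtoul(line, &end, 10);
	if (end == line || *end != ':' || n != virq)
		return 0;  /* header row, "BAD:" row, or another virq */

	return strstr(end, action) != NULL;
}
```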

Test results:
-------------

root at localhost:/root> cd /int
root at localhost:/int> dmesg -n 8
root at localhost:/int> lsmod
Module                  Size  Used by
root at localhost:/int> insmod edac_core.ko 
EDAC MC: Ver: 2.1.0 May 12 2009
insmod used greatest stack depth: 4880 bytes left
root at localhost:/int> insmod amd8111_edac.ko amd8111_op_state=1
AMD8111 EDAC driver  Ver: 1.0.0 May 12 2009
	(c) 2008 Wind River Systems, Inc.
amd8111_lpc_bridge_init: port 97 is buggy, not supported by hardware?
amd8111_NMI_global_enable: PM48[NMI2SMI_EN] is cleared
EDAC DEVICE0: Giving out device to module 'amd8111_edac' controller 'lpc': DEV '0000:00:06.0' (INTERRUPT)
added one device on AMD8111 vendor 1022, device 7468, name lpc
EDAC PCI0: Giving out device to module 'amd8111_edac' controller 'AMD8111_PCI_Controller': DEV '0000:00:05.0' (INTERRUPT)
added one device on AMD8111 vendor 1022, device 7460, name AMD8111_PCI_Controller
irq: irq 0 on host /hostbridge@0/interrupt-controller@f8040000 mapped to virtual irq 18
root at localhost:/int> cat /proc/interrupts 
           CPU0       CPU1       
 16:        120        300   MPIC      Edge      serial
 18:          0          0   MPIC      Edge      [EDAC] AMD8111
 22:       6020      23894   MPIC      Level     eth6
 25:          0          0   MPIC      Level     ohci_hcd:usb1, ohci_hcd:usb2
251:          0          0   MPIC      Edge      ipi call function
252:       2912       2595   MPIC      Edge      ipi reschedule
253:          0          0   MPIC      Edge      ipi call function single
254:          0          0   MPIC      Edge      ipi debugger
BAD:          0
root at localhost:/int> insmod amd8131_edac.ko amd8131_op_state=1
AMD8131 EDAC driver  Ver: 1.0.0 May 12 2009
	(c) 2008 Wind River Systems, Inc.
EDAC PCI1: Giving out device to module 'amd8131_edac' controller 'AMD8131_PCIX_NORTH_A': DEV '0000:00:01.0' (INTERRUPT)
added one device on AMD8131 vendor 1022, device 7451, devfn 8, name AMD8131_PCIX_NORTH_A
EDAC PCI2: Giving out device to module 'amd8131_edac' controller 'AMD8131_PCIX_NORTH_B': DEV '0000:00:02.0' (INTERRUPT)
added one device on AMD8131 vendor 1022, device 7451, devfn 10, name AMD8131_PCIX_NORTH_B
EDAC PCI3: Giving out device to module 'amd8131_edac' controller 'AMD8131_PCIX_SOUTH_A': DEV '0000:00:03.0' (INTERRUPT)
added one device on AMD8131 vendor 1022, device 7451, devfn 18, name AMD8131_PCIX_SOUTH_A
EDAC PCI4: Giving out device to module 'amd8131_edac' controller 'AMD8131_PCIX_SOUTH_B': DEV '0000:00:04.0' (INTERRUPT)
added one device on AMD8131 vendor 1022, device 7451, devfn 20, name AMD8131_PCIX_SOUTH_B
root at localhost:/int> cat /proc/interrupts 
           CPU0       CPU1       
 16:        141        420   MPIC      Edge      serial
 18:          0          0   MPIC      Edge      [EDAC] AMD8111, [EDAC] AMD8131
 22:       6031      23955   MPIC      Level     eth6
 25:          0          0   MPIC      Level     ohci_hcd:usb1, ohci_hcd:usb2
251:          0          0   MPIC      Edge      ipi call function
252:       2931       2608   MPIC      Edge      ipi reschedule
253:          0          0   MPIC      Edge      ipi call function single
254:          0          0   MPIC      Edge      ipi debugger
BAD:          0
root at localhost:/int> insmod cpc925_edac.ko 
IBM CPC925 EDAC driver  Ver: 1.0.0 May 12 2009
	(c) 2008 Wind River Systems, Inc
EDAC MC0: Giving out device to 'cpc925_edac' 'cpc925_edac': DEV cpc925_edac.0
EDAC DEVICE1: Giving out device to module 'cpc925_edac' controller 'cpu': DEV 'cpu.0' (INTERRUPT)
irq: irq 2 on host /hostbridge@0/interrupt-controller@f8040000 mapped to virtual irq 19
EDAC DEVICE2: Giving out device to module 'cpc925_edac' controller 'htlink': DEV 'htlink.0' (INTERRUPT)
root at localhost:/int> cat /proc/interrupts 
           CPU0       CPU1       
 16:        172        464   MPIC      Edge      serial
 18:          0          0   MPIC      Edge      [EDAC] AMD8111, [EDAC] AMD8131
 19:          0          0   MPIC      Edge      [EDAC] CPC925 
 22:       6186      24557   MPIC      Level     eth6
 25:          0          0   MPIC      Level     ohci_hcd:usb1, ohci_hcd:usb2
251:          0          0   MPIC      Edge      ipi call function
252:       2971       2632   MPIC      Edge      ipi reschedule
253:          0          0   MPIC      Edge      ipi call function single
254:          0          0   MPIC      Edge      ipi debugger
BAD:          0
root at localhost:/int> cd /sys/devices/system/edac/
root at localhost:/sys/devices/system/edac> ls -lt
total 0
drwxr-xr-x 3 root root 0 Jan  1 05:46 cpu
drwxr-xr-x 3 root root 0 Jan  1 05:46 htlink
drwxr-xr-x 3 root root 0 Jan  1 05:46 lpc
drwxr-xr-x 3 root root 0 Jan  1 05:46 mc
drwxr-xr-x 7 root root 0 Jan  1 05:46 pci
root at localhost:/sys/devices/system/edac> cat cpu/poll_msec 
0
root at localhost:/sys/devices/system/edac> cat htlink/poll_msec 
0
root at localhost:/sys/devices/system/edac> cat lpc/poll_msec 
0
root at localhost:/sys/devices/system/edac> ls -lt mc/mc0
total 0
-r--r--r-- 1 root root 4096 Jan  1 05:46 ce_count
-r--r--r-- 1 root root 4096 Jan  1 05:46 ce_noinfo_count
drwxr-xr-x 2 root root    0 Jan  1 05:46 csrow0
drwxr-xr-x 2 root root    0 Jan  1 05:46 csrow4
lrwxrwxrwx 1 root root    0 Jan  1 05:46 device -> ../../../../platform/cpc925_edac.0
-r--r--r-- 1 root root 4096 Jan  1 05:46 mc_name
--w------- 1 root root 4096 Jan  1 05:46 reset_counters
-rw-r--r-- 1 root root 4096 Jan  1 05:46 sdram_scrub_rate
-r--r--r-- 1 root root 4096 Jan  1 05:46 seconds_since_reset
-r--r--r-- 1 root root 4096 Jan  1 05:46 size_mb
-r--r--r-- 1 root root 4096 Jan  1 05:46 ue_count
-r--r--r-- 1 root root 4096 Jan  1 05:46 ue_noinfo_count
root at localhost:/sys/devices/system/edac> ls -lt pci   
total 0
-rw-r--r-- 1 root root 4096 Jan  1 05:46 check_pci_errors
-rw-r--r-- 1 root root 4096 Jan  1 05:46 edac_pci_log_npe
-rw-r--r-- 1 root root 4096 Jan  1 05:46 edac_pci_log_pe
-rw-r--r-- 1 root root 4096 Jan  1 05:46 edac_pci_panic_on_pe
drwxr-xr-x 2 root root    0 Jan  1 05:46 pci0
drwxr-xr-x 2 root root    0 Jan  1 05:46 pci1
drwxr-xr-x 2 root root    0 Jan  1 05:46 pci2
drwxr-xr-x 2 root root    0 Jan  1 05:46 pci3
drwxr-xr-x 2 root root    0 Jan  1 05:46 pci4
-r--r--r-- 1 root root 4096 Jan  1 05:46 pci_nonparity_count
-r--r--r-- 1 root root 4096 Jan  1 05:46 pci_parity_count
root at localhost:/sys/devices/system/edac> cd /int
root at localhost:/int> rmmod amd8111_edac.ko 
EDAC PCI: Removed device 0 for amd8111_edac AMD8111_PCI_Controller: DEV 0000:00:05.0
EDAC MC: Removed device 0 for amd8111_edac lpc: DEV 0000:00:06.0
root at localhost:/int> cat /proc/interrupts 
           CPU0       CPU1       
 16:        278        792   MPIC      Edge      serial
 18:          0          0   MPIC      Edge      [EDAC] AMD8131
 19:          0          0   MPIC      Edge      [EDAC] CPC925 
 22:       6484      25426   MPIC      Level     eth6
 25:          0          0   MPIC      Level     ohci_hcd:usb1, ohci_hcd:usb2
251:          0          0   MPIC      Edge      ipi call function
252:       3047       2707   MPIC      Edge      ipi reschedule
253:          0          0   MPIC      Edge      ipi call function single
254:          0          0   MPIC      Edge      ipi debugger
BAD:          0
root at localhost:/int> rmmod amd8131_edac.ko 
EDAC PCI: Removed device 4 for amd8131_edac AMD8131_PCIX_SOUTH_B: DEV 0000:00:04.0
EDAC PCI: Removed device 3 for amd8131_edac AMD8131_PCIX_SOUTH_A: DEV 0000:00:03.0
EDAC PCI: Removed device 2 for amd8131_edac AMD8131_PCIX_NORTH_B: DEV 0000:00:02.0
EDAC PCI: Removed device 1 for amd8131_edac AMD8131_PCIX_NORTH_A: DEV 0000:00:01.0
root at localhost:/int> rmmod cpc925_edac.ko 
EDAC MC: Removed device 1 for cpc925_edac cpu: DEV cpu.0
EDAC MC: Removed device 2 for cpc925_edac htlink: DEV htlink.0
EDAC MC: Removed device 0 for cpc925_edac cpc925_edac: DEV cpc925_edac.0
root at localhost:/int> cat /proc/interrupts 
           CPU0       CPU1       
 16:        305        890   MPIC      Edge      serial
 22:       6659      25995   MPIC      Level     eth6
 25:          0          0   MPIC      Level     ohci_hcd:usb1, ohci_hcd:usb2
251:          0          0   MPIC      Edge      ipi call function
252:       3107       2766   MPIC      Edge      ipi reschedule
253:          0          0   MPIC      Edge      ipi call function single
254:          0          0   MPIC      Edge      ipi debugger
BAD:          0
root at localhost:/int> ls -lt /sys/devices/system/edac/
total 0
drwxr-xr-x 2 root root 0 Jan  1 05:46 mc
root at localhost:/int> dmesg -n 4
root at localhost:/int> insmod cpc925_edac.ko 
root at localhost:/int> insmod amd8131_edac.ko amd8131_op_state=1
root at localhost:/int> insmod amd8111_edac.ko amd8111_op_state=1
root at localhost:/int> cat /proc/interrupts 
           CPU0       CPU1       
 16:        404       1163   MPIC      Edge      serial
 18:          0          0   MPIC      Edge      [EDAC] CPC925 
 19:          0          0   MPIC      Edge      [EDAC] AMD8131, [EDAC] AMD8111
 22:       6946      27069   MPIC      Level     eth6
 25:          0          0   MPIC      Level     ohci_hcd:usb1, ohci_hcd:usb2
251:          0          0   MPIC      Edge      ipi call function
252:       3244       2877   MPIC      Edge      ipi reschedule
253:          0          0   MPIC      Edge      ipi call function single
254:          0          0   MPIC      Edge      ipi debugger
BAD:          0
root at localhost:/int> rmmod amd8131_edac.ko 
root at localhost:/int> rmmod amd8111_edac.ko 
root at localhost:/int> rmmod cpc925_edac.ko 
root at localhost:/int> cat /proc/interrupts 
           CPU0       CPU1       
 16:        456       1268   MPIC      Edge      serial
 22:       7097      27525   MPIC      Level     eth6
 25:          0          0   MPIC      Level     ohci_hcd:usb1, ohci_hcd:usb2
251:          0          0   MPIC      Edge      ipi call function
252:       3318       2936   MPIC      Edge      ipi reschedule
253:          0          0   MPIC      Edge      ipi call function single
254:          0          0   MPIC      Edge      ipi debugger
BAD:          0
root at localhost:/int> dmesg -n 8
root at localhost:/int> insmod amd8131_edac.ko 
AMD8131 EDAC driver  Ver: 1.0.0 May 12 2009
	(c) 2008 Wind River Systems, Inc.
EDAC PCI10: Giving out device to module 'amd8131_edac' controller 'AMD8131_PCIX_NORTH_A': DEV '0000:00:01.0' (POLLED)
added one device on AMD8131 vendor 1022, device 7451, devfn 8, name AMD8131_PCIX_NORTH_A
EDAC PCI11: Giving out device to module 'amd8131_edac' controller 'AMD8131_PCIX_NORTH_B': DEV '0000:00:02.0' (POLLED)
added one device on AMD8131 vendor 1022, device 7451, devfn 10, name AMD8131_PCIX_NORTH_B
EDAC PCI12: Giving out device to module 'amd8131_edac' controller 'AMD8131_PCIX_SOUTH_A': DEV '0000:00:03.0' (POLLED)
added one device on AMD8131 vendor 1022, device 7451, devfn 18, name AMD8131_PCIX_SOUTH_A
EDAC PCI13: Giving out device to module 'amd8131_edac' controller 'AMD8131_PCIX_SOUTH_B': DEV '0000:00:04.0' (POLLED)
added one device on AMD8131 vendor 1022, device 7451, devfn 20, name AMD8131_PCIX_SOUTH_B
root at localhost:/int> insmod amd8111_edac.ko 
AMD8111 EDAC driver  Ver: 1.0.0 May 12 2009
	(c) 2008 Wind River Systems, Inc.
amd8111_lpc_bridge_init: port 97 is buggy, not supported by hardware?
EDAC DEVICE8: Giving out device to module 'amd8111_edac' controller 'lpc': DEV '0000:00:06.0' (POLLED)
added one device on AMD8111 vendor 1022, device 7468, name lpc
EDAC PCI14: Giving out device to module 'amd8111_edac' controller 'AMD8111_PCI_Controller': DEV '0000:00:05.0' (POLLED)
added one device on AMD8111 vendor 1022, device 7460, name AMD8111_PCI_Controller
root at localhost:/int> cat /proc/interrupts 
           CPU0       CPU1       
 16:        480       1393   MPIC      Edge      serial
 22:       7130      27610   MPIC      Level     eth6
 25:          0          0   MPIC      Level     ohci_hcd:usb1, ohci_hcd:usb2
251:          0          0   MPIC      Edge      ipi call function
252:       3346       2964   MPIC      Edge      ipi reschedule
253:          0          0   MPIC      Edge      ipi call function single
254:          0          0   MPIC      Edge      ipi debugger
BAD:          0
root at localhost:/int> cd /sys/devices/system/edac/
root at localhost:/sys/devices/system/edac> ls -lt
total 0
drwxr-xr-x 3 root root 0 Jan  1 05:48 lpc
drwxr-xr-x 7 root root 0 Jan  1 05:48 pci
drwxr-xr-x 2 root root 0 Jan  1 05:48 mc
root at localhost:/sys/devices/system/edac> cat lpc/poll_msec 
1000
root at localhost:/sys/devices/system/edac> ls -lt pci
total 0
-rw-r--r-- 1 root root 4096 Jan  1 05:48 check_pci_errors
-rw-r--r-- 1 root root 4096 Jan  1 05:48 edac_pci_log_npe
-rw-r--r-- 1 root root 4096 Jan  1 05:48 edac_pci_log_pe
-rw-r--r-- 1 root root 4096 Jan  1 05:48 edac_pci_panic_on_pe
drwxr-xr-x 2 root root    0 Jan  1 05:48 pci10
drwxr-xr-x 2 root root    0 Jan  1 05:48 pci11
drwxr-xr-x 2 root root    0 Jan  1 05:48 pci12
drwxr-xr-x 2 root root    0 Jan  1 05:48 pci13
drwxr-xr-x 2 root root    0 Jan  1 05:48 pci14
-r--r--r-- 1 root root 4096 Jan  1 05:48 pci_nonparity_count
-r--r--r-- 1 root root 4096 Jan  1 05:48 pci_parity_count
root at localhost:/sys/devices/system/edac> cd /int
root at localhost:/int> rmmod amd8111_edac.ko 
EDAC PCI: Removed device 14 for amd8111_edac AMD8111_PCI_Controller: DEV 0000:00:05.0
EDAC MC: Removed device 8 for amd8111_edac lpc: DEV 0000:00:06.0
root at localhost:/int> rmmod amd8131_edac.ko 
EDAC PCI: Removed device 13 for amd8131_edac AMD8131_PCIX_SOUTH_B: DEV 0000:00:04.0
EDAC PCI: Removed device 12 for amd8131_edac AMD8131_PCIX_SOUTH_A: DEV 0000:00:03.0
EDAC PCI: Removed device 11 for amd8131_edac AMD8131_PCIX_NORTH_B: DEV 0000:00:02.0
EDAC PCI: Removed device 10 for amd8131_edac AMD8131_PCIX_NORTH_A: DEV 0000:00:01.0
root at localhost:/int> rmmod edac_core.ko 
root at localhost:/int> lsmod
Module                  Size  Used by
root at localhost:/int> 


diffstat:
---------
0001-EDAC-MPIC-Hypertransport-IRQ-support.patch
 drivers/edac/Makefile        |    4 +
 drivers/edac/edac_mpic_irq.c |  145 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/edac.h         |   23 ++++++
 3 files changed, 172 insertions(+)

0002-EDAC-MCE-INT-mode-support-for-CPC925-driver.patch
 arch/powerpc/kernel/traps.c |   16 ++
 drivers/edac/cpc925_edac.c  |  280 +++++++++++++++++++++++++++++++++++++++++---
 drivers/edac/edac_stub.c    |    6 
 include/linux/edac.h        |    6 
 4 files changed, 289 insertions(+), 19 deletions(-)

0003-EDAC-INT-mode-support-for-AMD8111-driver.patch
 amd8111_edac.c |  352 +++++++++++++++++++++++++++++++++++++++++++++++++--------
 amd8111_edac.h |   43 ++++++
 2 files changed, 347 insertions(+), 48 deletions(-)

0004-EDAC-INT-mode-support-for-AMD8131-driver.patch
 amd8131_edac.c |  173 +++++++++++++++++++++++++++++++++++++++++++++++++++------
 amd8131_edac.h |   20 ++++++
 2 files changed, 174 insertions(+), 19 deletions(-)


