[Skiboot] Support for DIMM without ECC

Oliver oohall at gmail.com
Fri Apr 19 01:35:21 AEST 2019


On Thu, Apr 18, 2019 at 1:32 AM Alexandre Ghiti <aghiti at upmem.com> wrote:
>
> Hi everyone,
>
> We are currently developing an RDIMM with CPUs inside the DRAM chips. This DIMM does not support ECC, so I would like to disable ECC checking and correction, i.e. set bit 0 of the RECR register (page 210 of POWER9_Registers_vol3_version1.2_pub.pdf). I do that successfully in Hostboot, during DRAM initialization.
>
> My problem is that this register seems to be "reset" during the transition from Hostboot to Skiboot. Does that seem plausible to you?
>
> The platform I'm working on is a TL2MB1 (https://wiki.raptorcs.com/wiki/Talos_II).
> The code I'm working from is a custom hostboot/skiboot developed by the RaptorCS team:
> - hostboot: https://scm.raptorcs.com/scm/git/talos-hostboot 884b60b16009061ab84db88f918902a8c8098a4b
> - skiboot: https://scm.raptorcs.com/scm/git/talos-skiboot bc106a09b0ab298ec71b3ef8337fb5f820a7c454
>
> Below is the trace I get of this transition:
>
>  51.69046|ISTEP 21. 2 - host_verify_hdat
>  51.70166|ISTEP 21. 3 - host_start_payload
> # This is the value I'm interested in; as you can see, bit 63 is set.
>  51.72614|FAPI|call_host_start_payload.C: UPMEM: 0x7010a0a value is : 0x8890a23810800000
>  51.72616|FAPI|call_host_start_payload.C: UPMEM: PIR: 0x80d

The targeting stuff in hostboot is a bit of a headache, so it's
possible that you're just reading from the wrong RECR here. Try
dumping out the RECRs for all four ports on each chip from hostboot
and from skiboot and compare the results. You can also use the getscom
tool from the host, or pdbg from the BMC, to read scoms. pdbg from
the BMC works even when the system is checkstopped.

Not sure how that's done in hostboot, but this should do it for skiboot:

diff --git a/core/init.c b/core/init.c
index 0fe6c16820bb..ce1eeb97a641 100644
--- a/core/init.c
+++ b/core/init.c
@@ -923,6 +923,24 @@ bool verify_romem(void)
 /* Called from head.S, thus no prototype. */
 void main_cpu_entry(const void *fdt);

+static void dump_recr(void)
+{
+    uint64_t addrs[] = { 0x7010A0A, 0x7010A4A, 0x7010A8A, 0x7010ACA };
+    struct proc_chip *c;
+    uint64_t out;
+    int i;
+
+    for_each_chip(c) {
+        for (i = 0; i < 4; i++) {
+            /* if the read fails, 'out' may not hold a valid value */
+            if (xscom_read(c->id, addrs[i], &out))
+                continue;
+            prerror("XXXXX: chip: %x addr: %016llx = %016llx\n",
+                c->id, addrs[i], out);
+        }
+    }
+}
+
 void __noreturn __nomcount main_cpu_entry(const void *fdt)
 {
     /*
@@ -1042,6 +1060,9 @@ void __noreturn __nomcount main_cpu_entry(const void *fdt)
     xscom_init();
     mfsi_init();

+
+    dump_recr();
+
     /*
      * Direct controls facilities provides some controls over CPUs
      * using scoms.


> [   51.501846781,5] OPAL bc106a09-upmem-dirty-c19e811 starting...
> [   51.501851186,7] initial console log level: memory 7, driver 5
> [   51.501853342,6] CPU: P9 generation processor (max 4 threads/core)
> [   51.501855276,7] CPU: Boot CPU PIR is 0x081c PVR is 0x004e1202
> [   51.501857947,7] OPAL table: 0x30101230 .. 0x301017e0, branch table: 0x30002000
> [   51.501860995,7] Assigning physical memory map table for nimbus
> [   51.501864003,7] Parsing HDAT...
> [   51.501865320,5] SPIRA-S found.
> [   51.501867765,6] BMC #0: HW version 3, SW version 2, chip DD1.0
> [   51.501895710,4] SENSORS: Duplicate sensor ID : 0
> [   51.501897519,4] SENSORS: Duplicate sensor ID : 0
> [   51.501906010,4] SENSORS: Duplicate sensor ID : 0
> [   51.501911808,4] SENSORS: Duplicate sensor ID : 0
> [   51.501942832,6] SP Family is ibm,ast2500,openbmc
> [   51.501949550,7] LPC: IOPATH chip id = 0
> [   51.501950989,7] LPC: FW BAR       = f0000000
> [   51.501952683,7] LPC: MEM BAR      = e0000000
> [   51.501954472,7] LPC: IO BAR       = d0010000
> [   51.501956092,7] LPC: Internal BAR = c0012000
> [   51.501968744,7] LPC UART: base addr = 3f8 (3f8) size = 1 clk = 1843200, baud = 115200
> [   51.501971651,7] LPC: BT [0, 0] sms_int: 0, bmc_int: 0
> [   51.502871058,5] UART: Using UART at 0x60300d00103f8
> [   51.504754984,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc00780078
> [   51.504860126,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc000c0010
> [   51.504930258,5] P9 DD2.21 detected
> [   51.504955724,5] CHIP: Chip ID 0000 type: P9N DD2.2
> [   51.504992069,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc38085050

> # I re-read the same address: as you can see, the PIR seems to indicate that I'm on
> # the same socket, but the value I read is not the same as the one I read from Hostboot.

It doesn't really matter what socket the thread doing the reading is
on. Each chip's xscom range has a unique address range in the global
mmio map so you can read the XSCOMs of any chip from any thread.

> [   51.505061382,3] UPMEM: xscom_init: recr value gcid = 0, pir = 81c, @0x7010a0a = 0xa10603810800000
> [   51.505149206,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc00780078
> [   51.505211582,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc000c0010
> [   51.505264687,5] P9 DD2.21 detected
> [   51.505286188,5] CHIP: Chip ID 0008 type: P9N DD2.2
> [   51.505321741,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc38085050
> # I tried the other XSCOM device (?)
> [   51.505396556,3] UPMEM: xscom_init: recr value gcid = 8, pir = 81c, @0x7010a0a = 0xa10603810800000
> [   51.505475821,3] UPMEM: xscom_init
>
> I can provide any needed information :)

Does it boot? If it does then it's probably safe to assume that
disabling ECC worked.

> Thanks in advance,
>
> Alexandre Ghiti
>
> _______________________________________________
> Skiboot mailing list
> Skiboot at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/skiboot

