[Skiboot] Support for DIMM without ECC

Alexandre Ghiti aghiti at upmem.com
Fri Apr 19 02:01:36 AEST 2019


On Thu, Apr 18, 2019 at 17:35, Oliver <oohall at gmail.com> wrote:

> On Thu, Apr 18, 2019 at 1:32 AM Alexandre Ghiti <aghiti at upmem.com> wrote:
> >
> > Hi everyone,
> >
> > We are currently developing an RDIMM with CPUs inside the DRAM chips.
> > This DIMM does not have ECC support, so I would like to disable ECC
> > check and correct, i.e. I want to set bit 0 of the RECR register (page
> > 210 of POWER9_Registers_vol3_version1.2_pub.pdf). I do that successfully
> > in Hostboot, during the DRAM initialization.
> >
> > My problem is that this register seems to be "reset" during the
> > transition from Hostboot to Skiboot: does that seem plausible to you?
> >
> > The platform I'm working on is a TL2MB1 (https://wiki.raptorcs.com/wiki/Talos_II).
> > The code I'm based on is a custom hostboot/skiboot developed by the RaptorCS team:
> > - hostboot: https://scm.raptorcs.com/scm/git/talos-hostboot 884b60b16009061ab84db88f918902a8c8098a4b
> > - skiboot: https://scm.raptorcs.com/scm/git/talos-skiboot bc106a09b0ab298ec71b3ef8337fb5f820a7c454
> >
> > Below is the trace I get of this transition:
> >
> >  51.69046|ISTEP 21. 2 - host_verify_hdat
> >  51.70166|ISTEP 21. 3 - host_start_payload
> > # This is the value I'm interested in: as you can see, bit 63 (the most
> > # significant bit, i.e. bit 0 in IBM numbering) is set.
> >  51.72614|FAPI|call_host_start_payload.C: UPMEM: 0x7010a0a value is : 0x8890a23810800000
> >  51.72616|FAPI|call_host_start_payload.C: UPMEM: PIR: 0x80d
>
> The targeting stuff in hostboot is a bit of a headache, so it's
> possible that you're just reading from the wrong RECR here. Try
> dumping the RECRs for all four ports on each chip from hostboot
> and from skiboot and compare the results. You can also use the getscom
> tool from the host, or pdbg from the BMC, to read scoms. pdbg from
> the BMC works even when the system is checkstopped.
>
> Not sure how that's done in hostboot, but this should do it for skiboot:
>
> diff --git a/core/init.c b/core/init.c
> index 0fe6c16820bb..ce1eeb97a641 100644
> --- a/core/init.c
> +++ b/core/init.c
> @@ -923,6 +923,24 @@ bool verify_romem(void)
>  /* Called from head.S, thus no prototype. */
>  void main_cpu_entry(const void *fdt);
>
> +
> +static void dump_recr(void)
> +{
> +    uint64_t addrs[] = { 0x7010A0A, 0x7010A4A, 0x7010A8A, 0x7010ACA };
> +    struct proc_chip *c;
> +    uint64_t out;
> +    int i;
> +
> +    for_each_chip(c) {
> +        for (i = 0; i < 4; i++) {
> +            xscom_read(c->id, addrs[i], &out);
> +            prerror("XXXXX: chip: %x addr: %016llx = %016llx\n",
> +                c->id, addrs[i], out);
> +        }
> +    }
> +
> +}
> +
>  void __noreturn __nomcount main_cpu_entry(const void *fdt)
>  {
>      /*
> @@ -1042,6 +1060,9 @@ void __noreturn __nomcount main_cpu_entry(const void *fdt)
>      xscom_init();
>      mfsi_init();
>
> +
> +    dump_recr();
> +
>      /*
>       * Direct controls facilities provides some controls over CPUs
>       * using scoms.
>
>
Thanks for this, here is the output:

 51.49558|FAPI|call_host_start_payload.C: UPMEM: 0x7010a0a value is : 0x8890a23810800000
 51.49560|FAPI|call_host_start_payload.C: UPMEM: PIR: 0xc
...
[   51.401722470,5] CHIP: Chip ID 0008 type: P9N DD2.2
[   51.401776856,3] XXXXX: chip: 0 addr: 0000000007010a0a = 0a10603810800000
[   51.401840480,3] XXXXX: chip: 0 addr: 0000000007010a4a = 0a10603810800000
[   51.401900084,3] XXXXX: chip: 0 addr: 0000000007010a8a = 0a10603810800000
[   51.401949960,3] XXXXX: chip: 0 addr: 0000000007010aca = 0a10603810800000
[   51.402001171,3] XXXXX: chip: 8 addr: 0000000007010a0a = 0a10603810800000
[   51.402054501,3] XXXXX: chip: 8 addr: 0000000007010a4a = 0a10603810800000
[   51.402106614,3] XXXXX: chip: 8 addr: 0000000007010a8a = 0a10603810800000
[   51.402159370,3] XXXXX: chip: 8 addr: 0000000007010aca = 0a10603810800000
[   51.402604608,5] PLAT: Using virtual UART
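
For reference, the Hostboot and skiboot reads can be compared bit by bit. The sketch below is not skiboot or Hostboot code: it is a minimal standalone C helper with hypothetical names (ibm_bit, ibm_bit_is_set, report_diff), and it assumes IBM (MSB-0) bit numbering, where bit 0 is the most significant bit of the 64-bit register value.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Mask for bit n of a 64-bit register in IBM (MSB-0) numbering. */
static uint64_t ibm_bit(unsigned int n)
{
	return UINT64_C(1) << (63 - n);
}

/* Non-zero if bit n (IBM numbering) is set in val. */
static int ibm_bit_is_set(uint64_t val, unsigned int n)
{
	return (val & ibm_bit(n)) != 0;
}

/*
 * Print every IBM-numbered bit position where two reads differ and
 * return the number of differing bits.
 */
static unsigned int report_diff(uint64_t a, uint64_t b)
{
	uint64_t diff = a ^ b;
	unsigned int n, count = 0;

	for (n = 0; n < 64; n++) {
		if (diff & ibm_bit(n)) {
			printf("differs at IBM bit %u\n", n);
			count++;
		}
	}
	return count;
}
```

Calling report_diff(0x8890a23810800000, 0x0a10603810800000) with the two values from the traces reports six differing bits, not only bit 0. That may hint that the two reads are not of the same register instance, or that more than the ECC-disable bit changed in between; the traces alone do not settle which.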


I'll dump the same thing from Hostboot tomorrow, but the one value I print
there does not match what skiboot reads.
Just to make sure: there is only one scom device per socket, so what you had
me dump are the values for both sockets on my platform, right?
Could it be that from Hostboot I am reading/writing the MC23 RECR register?
I don't see in the datasheet how to access this device.



> > [   51.501846781,5] OPAL bc106a09-upmem-dirty-c19e811 starting...
> > [   51.501851186,7] initial console log level: memory 7, driver 5
> > [   51.501853342,6] CPU: P9 generation processor (max 4 threads/core)
> > [   51.501855276,7] CPU: Boot CPU PIR is 0x081c PVR is 0x004e1202
> > [   51.501857947,7] OPAL table: 0x30101230 .. 0x301017e0, branch table: 0x30002000
> > [   51.501860995,7] Assigning physical memory map table for nimbus
> > [   51.501864003,7] Parsing HDAT...
> > [   51.501865320,5] SPIRA-S found.
> > [   51.501867765,6] BMC #0: HW version 3, SW version 2, chip DD1.0
> > [   51.501895710,4] SENSORS: Duplicate sensor ID : 0
> > [   51.501897519,4] SENSORS: Duplicate sensor ID : 0
> > [   51.501906010,4] SENSORS: Duplicate sensor ID : 0
> > [   51.501911808,4] SENSORS: Duplicate sensor ID : 0
> > [   51.501942832,6] SP Family is ibm,ast2500,openbmc
> > [   51.501949550,7] LPC: IOPATH chip id = 0
> > [   51.501950989,7] LPC: FW BAR       = f0000000
> > [   51.501952683,7] LPC: MEM BAR      = e0000000
> > [   51.501954472,7] LPC: IO BAR       = d0010000
> > [   51.501956092,7] LPC: Internal BAR = c0012000
> > [   51.501968744,7] LPC UART: base addr = 3f8 (3f8) size = 1 clk = 1843200, baud = 115200
> > [   51.501971651,7] LPC: BT [0, 0] sms_int: 0, bmc_int: 0
> > [   51.502871058,5] UART: Using UART at 0x60300d00103f8
> > [   51.504754984,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc00780078
> > [   51.504860126,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc000c0010
> > [   51.504930258,5] P9 DD2.21 detected
> > [   51.504955724,5] CHIP: Chip ID 0000 type: P9N DD2.2
> > [   51.504992069,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc38085050
>
> > # I re-read the same address: as you can see, the PIR seems to indicate
> > # that I'm on the same socket, but the value I read is not the same as
> > # the one I read from Hostboot.
>
> It doesn't really matter what socket the thread doing the reading is
> on. Each chip's xscom range has a unique address range in the global
> mmio map so you can read the XSCOMs of any chip from any thread.
>

Ok thanks for that, I was not sure.
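
The addresses printed by my __xscom_read debug trace appear consistent with that: the MMIO address looks like a fixed base, plus a per-chip offset, plus the SCOM address shifted left by 3. The sketch below reproduces that mapping. The base, the chip-select shift, and the function name xscom_mmio_addr are assumptions inferred from the two chips (gcid 0 and 8) visible in this trace, not values taken from the skiboot headers, so treat them as illustrative only.

```c
#include <stdint.h>

/*
 * Assumed constants, inferred from the addresses in the trace in this
 * thread rather than taken from skiboot's headers.
 */
#define XSCOM_BASE_ASSUMED  UINT64_C(0x000603fc00000000)
#define CHIP_SELECT_SHIFT   42 /* gcid 8 moves the base by 1 << 45 */
#define PCB_ADDR_SHIFT      3  /* 0x7010a0a << 3 == 0x38085050 */

/*
 * Compute the MMIO address used to reach SCOM register pcb_addr on
 * chip gcid, under the assumptions above.
 */
static uint64_t xscom_mmio_addr(uint32_t gcid, uint32_t pcb_addr)
{
	return XSCOM_BASE_ASSUMED |
	       ((uint64_t)gcid << CHIP_SELECT_SHIFT) |
	       ((uint64_t)pcb_addr << PCB_ADDR_SHIFT);
}
```

With these assumptions, xscom_mmio_addr(0, 0x7010a0a) gives 0x603fc38085050 and xscom_mmio_addr(8, 0x7010a0a) gives 0x623fc38085050, which match the addresses in the trace for the two chips. Since each chip gets its own window, the PIR of the thread doing the read does not enter into the address at all.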


>
> > [   51.505061382,3] UPMEM: xscom_init: recr value gcid = 0, pir = 81c, @0x7010a0a = 0xa10603810800000
> > [   51.505149206,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc00780078
> > [   51.505211582,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc000c0010
> > [   51.505264687,5] P9 DD2.21 detected
> > [   51.505286188,5] CHIP: Chip ID 0008 type: P9N DD2.2
> > [   51.505321741,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc38085050
> > # I tried the other XSCOM device (?)
> > [   51.505396556,3] UPMEM: xscom_init: recr value gcid = 8, pir = 81c, @0x7010a0a = 0xa10603810800000
> > [   51.505475821,3] UPMEM: xscom_init
> >
> > I can provide any needed information :)
>
> Does it boot? If it does then it's probably safe to assume that
> disabling ECC worked.
>

Yes I can boot to Linux without problem.

Thanks Oliver and Stewart for your answers,

Alex