[Skiboot] Support for DIMM without ECC

Alexandre Ghiti aghiti at upmem.com
Fri Apr 19 20:26:45 AEST 2019


On Thu, Apr 18, 2019 at 18:30, Oliver <oohall at gmail.com> wrote:

> On Fri, Apr 19, 2019 at 2:03 AM Alexandre Ghiti <aghiti at upmem.com> wrote:
> >
> >
> > On Thu, Apr 18, 2019 at 17:35, Oliver <oohall at gmail.com> wrote:
> >>
> >> On Thu, Apr 18, 2019 at 1:32 AM Alexandre Ghiti <aghiti at upmem.com> wrote:
> >> >
> >> > Hi everyone,
> >> >
> >> > We are currently developing an RDIMM with CPUs inside the DRAM chips.
> >> > This DIMM does not support ECC, so I would like to disable ECC check
> >> > and correct, i.e. set bit 0 of the RECR register (page 210 of
> >> > POWER9_Registers_vol3_version1.2_pub.pdf). I do that successfully in
> >> > Hostboot, during DRAM initialization.
> >> >
> >> > My problem is that this register seems to be "reset" during the
> >> > transition from Hostboot to Skiboot. Does that seem plausible to you?
> >> >
> >> > The platform I'm working on is a TL2MB1 (https://wiki.raptorcs.com/wiki/Talos_II).
> >> > My code is based on a custom hostboot/skiboot developed by the RaptorCS team:
> >> > - hostboot: https://scm.raptorcs.com/scm/git/talos-hostboot 884b60b16009061ab84db88f918902a8c8098a4b
> >> > - skiboot: https://scm.raptorcs.com/scm/git/talos-skiboot bc106a09b0ab298ec71b3ef8337fb5f820a7c454
> >> >
> >> > Below is the trace I get of this transition:
> >> >
> >> >  51.69046|ISTEP 21. 2 - host_verify_hdat
> >> >  51.70166|ISTEP 21. 3 - host_start_payload
> >> > # This is the value I'm interested in: as you can see, bit 63 (the most significant bit, i.e. bit 0 in IBM numbering) is set.
> >> >  51.72614|FAPI|call_host_start_payload.C: UPMEM: 0x7010a0a value is : 0x8890a23810800000
> >> >  51.72616|FAPI|call_host_start_payload.C: UPMEM: PIR: 0x80d
> >>
> >> The targeting stuff in hostboot is a bit of a headache so it's
> >> possible that you're just reading from the wrong RECR here. Try
> >> dumping out the RECRs for all four ports on each chip from hostboot
> >> and from skiboot and compare the results. You can also use the getscom
> >> tool from the host, or pdbg from the BMC, to read scoms. pdbg from
> >> the BMC works even when the system is checkstopped too.
> >>
> >> Not sure how that's done in hostboot, but this should do it for skiboot:
> >>
> >> diff --git a/core/init.c b/core/init.c
> >> index 0fe6c16820bb..ce1eeb97a641 100644
> >> --- a/core/init.c
> >> +++ b/core/init.c
> >> @@ -923,6 +923,24 @@ bool verify_romem(void)
> >>  /* Called from head.S, thus no prototype. */
> >>  void main_cpu_entry(const void *fdt);
> >>
> >> +
> >> +static void dump_recr(void)
> >> +{
> >> +    uint64_t addrs[] = { 0x7010A0A, 0x7010A4A, 0x7010A8A, 0x7010ACA };
> >> +    struct proc_chip *c;
> >> +    uint64_t out;
> >> +    int i;
> >> +
> >> +    for_each_chip(c) {
> >> +        for (i = 0; i < 4; i++) {
> >> +            xscom_read(c->id, addrs[i], &out);
> >> +            prerror("XXXXX: chip: %x addr: %016llx = %016llx\n",
> >> +                c->id, addrs[i], out);
> >> +        }
> >> +    }
> >> +
> >> +}
> >> +
> >>  void __noreturn __nomcount main_cpu_entry(const void *fdt)
> >>  {
> >>      /*
> >> @@ -1042,6 +1060,9 @@ void __noreturn __nomcount main_cpu_entry(const void *fdt)
> >>      xscom_init();
> >>      mfsi_init();
> >>
> >> +
> >> +    dump_recr();
> >> +
> >>      /*
> >>       * Direct controls facilities provides some controls over CPUs
> >>       * using scoms.
> >>
> >
> > Thanks for this, here is the output:
> >
> >  51.49558|FAPI|call_host_start_payload.C: UPMEM: 0x7010a0a value is : 0x8890a23810800000
> >  51.49560|FAPI|call_host_start_payload.C: UPMEM: PIR: 0xc
> > ...
> > [   51.401722470,5] CHIP: Chip ID 0008 type: P9N DD2.2
> > [   51.401776856,3] XXXXX: chip: 0 addr: 0000000007010a0a = 0a10603810800000
> > [   51.401840480,3] XXXXX: chip: 0 addr: 0000000007010a4a = 0a10603810800000
> > [   51.401900084,3] XXXXX: chip: 0 addr: 0000000007010a8a = 0a10603810800000
> > [   51.401949960,3] XXXXX: chip: 0 addr: 0000000007010aca = 0a10603810800000
> > [   51.402001171,3] XXXXX: chip: 8 addr: 0000000007010a0a = 0a10603810800000
> > [   51.402054501,3] XXXXX: chip: 8 addr: 0000000007010a4a = 0a10603810800000
> > [   51.402106614,3] XXXXX: chip: 8 addr: 0000000007010a8a = 0a10603810800000
> > [   51.402159370,3] XXXXX: chip: 8 addr: 0000000007010aca = 0a10603810800000
> > [   51.402604608,5] PLAT: Using virtual UART
> >
> >
> > I'll dump the same thing from Hostboot tomorrow, but the only value I
> > print there does not match what skiboot reads.
> > Just to make sure: there is only one scom device per socket, so what you
> > had me dump are the values for both sockets on my platform, right?
>
> Yep
>
>
Ok thanks.
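
For the record, here is how I compare the two readings. This is only a minimal sketch: it assumes the IBM (MSB-0) bit numbering used in the POWER9 register documentation, where bit 0 is the most significant bit, and it takes RECR bit 0 as the ECC disable bit as described earlier in this thread. The helper names ibm_bit() and recr_ecc_disabled() are made up for illustration; they are not hostboot or skiboot functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * IBM (MSB-0) numbering: bit 0 is the most significant bit of the
 * 64-bit register, bit 63 the least significant one.
 * ibm_bit() is a hypothetical helper for this sketch.
 */
static uint64_t ibm_bit(unsigned int n)
{
    assert(n < 64);
    return UINT64_C(0x8000000000000000) >> n;
}

/*
 * Assumption taken from this thread: RECR bit 0 set means that ECC
 * check and correct is disabled.
 */
static bool recr_ecc_disabled(uint64_t recr)
{
    return (recr & ibm_bit(0)) != 0;
}
```

With the values above, recr_ecc_disabled() is true for the hostboot reading (0x8890a23810800000) and false for the skiboot reading (0x0a10603810800000).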


> > Is it possible that, from Hostboot, I am reading/writing the MC23 RECR register?
> > I don't see in the datasheet how to access this device.
>
> I had a look at what some of our internal debug tools do and it looks
> like the MC23 register set is identical to the MC01 set with the '7'
> at the top of the xscom address replaced with '8'. The full set of
> RECR registers would be:
>
> 7010A0A
> 7010A4A
> 7010A8A
> 7010ACA
> 8010A0A
> 8010A4A
> 8010A8A
> 8010ACA
>

OK, so 7 and 8 are the chiplet numbers. Perfect, that makes sense, thanks.
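
To convince myself of the pattern, the list above can be rebuilt from the chiplet number and the port index. This is just a sketch under assumptions read off the eight addresses: the chiplet number sits in the top byte of the address and the ports are 0x40 apart. The macro and function names are hypothetical, not taken from hostboot or skiboot.

```c
#include <assert.h>
#include <stdint.h>

/* Both constants are inferred from the address list in this thread. */
#define RECR_PORT0_OFFSET 0x010A0Au /* low 24 bits of 0x7010A0A */
#define RECR_PORT_STRIDE  0x40u     /* distance between two ports */

/*
 * Build the RECR scom address for a memory controller chiplet
 * (7 for MC01, 8 for MC23) and a port index (0 to 3).
 */
static uint32_t recr_scom_addr(unsigned int chiplet, unsigned int port)
{
    assert(chiplet == 7 || chiplet == 8);
    assert(port < 4);
    return ((uint32_t)chiplet << 24) |
           (RECR_PORT0_OFFSET + port * RECR_PORT_STRIDE);
}
```

Looping over chiplets 7 and 8 and ports 0 to 3 gives back the eight addresses listed above, so dump_recr() could iterate over both chiplets instead of using a fixed table.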


>
> >> > [   51.501846781,5] OPAL bc106a09-upmem-dirty-c19e811 starting...
> >> > [   51.501851186,7] initial console log level: memory 7, driver 5
> >> > [   51.501853342,6] CPU: P9 generation processor (max 4 threads/core)
> >> > [   51.501855276,7] CPU: Boot CPU PIR is 0x081c PVR is 0x004e1202
> >> > [   51.501857947,7] OPAL table: 0x30101230 .. 0x301017e0, branch table: 0x30002000
> >> > [   51.501860995,7] Assigning physical memory map table for nimbus
> >> > [   51.501864003,7] Parsing HDAT...
> >> > [   51.501865320,5] SPIRA-S found.
> >> > [   51.501867765,6] BMC #0: HW version 3, SW version 2, chip DD1.0
> >> > [   51.501895710,4] SENSORS: Duplicate sensor ID : 0
> >> > [   51.501897519,4] SENSORS: Duplicate sensor ID : 0
> >> > [   51.501906010,4] SENSORS: Duplicate sensor ID : 0
> >> > [   51.501911808,4] SENSORS: Duplicate sensor ID : 0
> >> > [   51.501942832,6] SP Family is ibm,ast2500,openbmc
> >> > [   51.501949550,7] LPC: IOPATH chip id = 0
> >> > [   51.501950989,7] LPC: FW BAR       = f0000000
> >> > [   51.501952683,7] LPC: MEM BAR      = e0000000
> >> > [   51.501954472,7] LPC: IO BAR       = d0010000
> >> > [   51.501956092,7] LPC: Internal BAR = c0012000
> >> > [   51.501968744,7] LPC UART: base addr = 3f8 (3f8) size = 1 clk = 1843200, baud = 115200
> >> > [   51.501971651,7] LPC: BT [0, 0] sms_int: 0, bmc_int: 0
> >> > [   51.502871058,5] UART: Using UART at 0x60300d00103f8
> >> > [   51.504754984,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc00780078
> >> > [   51.504860126,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc000c0010
> >> > [   51.504930258,5] P9 DD2.21 detected
> >> > [   51.504955724,5] CHIP: Chip ID 0000 type: P9N DD2.2
> >> > [   51.504992069,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc38085050
> >>
> >> > # I re-read the same address: as you can see, the PIR seems to indicate that I'm on
> >> > # the same socket, but the value I read is not the one I read from Hostboot.
> >>
> >> It doesn't really matter what socket the thread doing the reading is
> >> on. Each chip's xscom range has a unique address range in the global
> >> mmio map so you can read the XSCOMs of any chip from any thread.
> >
> >
> > Ok thanks for that, I was not sure.
> >
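
This matches the MMIO addresses printed by my __xscom_read traces. As a sketch only: the base address, the chip shift of 42 and the scom address shift of 3 below are inferred from those log lines (for example 0x603fc38085050 for chip 0 and 0x623fc38085050 for chip 8, both for scom 0x7010a0a). They are not copied from the skiboot sources, and the names are hypothetical.

```c
#include <stdint.h>

/* Base of the XSCOM MMIO window, inferred from the log in this thread. */
#define XSCOM_BASE_FROM_LOG UINT64_C(0x000603fc00000000)

/*
 * Compute the MMIO address used to read a scom register on a given chip.
 * Each chip gets its own window (chip id shifted by 42), and the scom
 * address is shifted left by 3 inside that window.
 */
static uint64_t xscom_mmio_addr(uint32_t gcid, uint32_t pcb_addr)
{
    return XSCOM_BASE_FROM_LOG |
           ((uint64_t)gcid << 42) |
           ((uint64_t)pcb_addr << 3);
}
```

Since the chip id is part of the address itself, the same read works from any thread, whatever its PIR.
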
> >>
> >>
> >> > [   51.505061382,3] UPMEM: xscom_init: recr value gcid = 0, pir = 81c, @0x7010a0a = 0xa10603810800000
> >> > [   51.505149206,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc00780078
> >> > [   51.505211582,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc000c0010
> >> > [   51.505264687,5] P9 DD2.21 detected
> >> > [   51.505286188,5] CHIP: Chip ID 0008 type: P9N DD2.2
> >> > [   51.505321741,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc38085050
> >> > # I tried the other XSCOM device (?)
> >> > [   51.505396556,3] UPMEM: xscom_init: recr value gcid = 8, pir = 81c, @0x7010a0a = 0xa10603810800000
> >> > [   51.505475821,3] UPMEM: xscom_init
> >> >
> >> > I can provide any needed information :)
> >>
> >> Does it boot? If it does then it's probably safe to assume that
> >> disabling ECC worked.
> >
> >
> > Yes I can boot to Linux without problem.
>
> Neat. If you haven't already, then you should look at disabling MCS
> grouping in hostboot to prevent your DIMM from being added to an interleave
> set with other DIMMs. There's a FAPI attribute that controls grouping,
> but it might be easier to just hack it out.
>

That's exactly what I was doing: indeed, we do not want our DIMMs to be
interleaved with 'normal' DIMMs :)

Thanks for your answer,

Alex