[Skiboot] Support for DIMM without ECC

Oliver oohall at gmail.com
Fri Apr 19 02:29:53 AEST 2019


On Fri, Apr 19, 2019 at 2:03 AM Alexandre Ghiti <aghiti at upmem.com> wrote:
>
>
> On Thu, Apr 18, 2019 at 5:35 PM, Oliver <oohall at gmail.com> wrote:
>>
>> On Thu, Apr 18, 2019 at 1:32 AM Alexandre Ghiti <aghiti at upmem.com> wrote:
>> >
>> > Hi everyone,
>> >
>> > We are currently developing an RDIMM with CPUs inside the DRAM chips. This DIMM does not have ECC support, so I would like to disable ECC check and correct, i.e. I want to set bit 0 of the RECR register (page 210 of POWER9_Registers_vol3_version1.2_pub.pdf). I do that successfully in Hostboot, during DRAM initialization.
>> >
>> > My problem is that this register seems to be "reset" during the transition from Hostboot to Skiboot: does that seem plausible to you?
>> >
>> > The platform I'm working on is a TL2MB1 (https://wiki.raptorcs.com/wiki/Talos_II).
>> > My code is based on the custom hostboot/skiboot developed by the RaptorCS team:
>> > - hostboot: https://scm.raptorcs.com/scm/git/talos-hostboot 884b60b16009061ab84db88f918902a8c8098a4b
>> > - skiboot: https://scm.raptorcs.com/scm/git/talos-skiboot bc106a09b0ab298ec71b3ef8337fb5f820a7c454
>> >
>> > Below is the trace I get of this transition:
>> >
>> >  51.69046|ISTEP 21. 2 - host_verify_hdat
>> >  51.70166|ISTEP 21. 3 - host_start_payload
>> > # This is the value I'm interested in: as you can see, bit 63 (the most significant bit, i.e. bit 0 in IBM numbering) is set.
>> >  51.72614|FAPI|call_host_start_payload.C: UPMEM: 0x7010a0a value is : 0x8890a23810800000
>> >  51.72616|FAPI|call_host_start_payload.C: UPMEM: PIR: 0x80d
>>
>> The targeting stuff in hostboot is a bit of a headache so it's
>> possible that you're just reading from the wrong RECR here. Try
>> dumping out the RECRs for all four ports on each chip from hostboot
>> and from skiboot and compare the results. You can also use the getscom
>> tool from the host, or pdbg from the BMC, to read scoms. pdbg from
>> the BMC works even when the system is checkstopped.
>>
>> Not sure how that's done in hostboot, but this should do it for skiboot:
>>
>> diff --git a/core/init.c b/core/init.c
>> index 0fe6c16820bb..ce1eeb97a641 100644
>> --- a/core/init.c
>> +++ b/core/init.c
>> @@ -923,6 +923,24 @@ bool verify_romem(void)
>>  /* Called from head.S, thus no prototype. */
>>  void main_cpu_entry(const void *fdt);
>>
>> +
>> +static void dump_recr(void)
>> +{
>> +    uint64_t addrs[] = { 0x7010A0A, 0x7010A4A, 0x7010A8A, 0x7010ACA };
>> +    struct proc_chip *c;
>> +    uint64_t out;
>> +    int i;
>> +
>> +    for_each_chip(c) {
>> +        for (i = 0; i < 4; i++) {
>> +            xscom_read(c->id, addrs[i], &out);
>> +            prerror("XXXXX: chip: %x addr: %016llx = %016llx\n",
>> +                c->id, addrs[i], out);
>> +        }
>> +    }
>> +
>> +}
>> +
>>  void __noreturn __nomcount main_cpu_entry(const void *fdt)
>>  {
>>      /*
>> @@ -1042,6 +1060,9 @@ void __noreturn __nomcount main_cpu_entry(const void *fdt)
>>      xscom_init();
>>      mfsi_init();
>>
>> +
>> +    dump_recr();
>> +
>>      /*
>>       * Direct controls facilities provides some controls over CPUs
>>       * using scoms.
>>
>
> Thanks for this, here is the output:
>
>  51.49558|FAPI|call_host_start_payload.C: UPMEM: 0x7010a0a value is : 0x8890a23810800000
>  51.49560|FAPI|call_host_start_payload.C: UPMEM: PIR: 0xc
> ...
> [   51.401722470,5] CHIP: Chip ID 0008 type: P9N DD2.2
> [   51.401776856,3] XXXXX: chip: 0 addr: 0000000007010a0a = 0a10603810800000
> [   51.401840480,3] XXXXX: chip: 0 addr: 0000000007010a4a = 0a10603810800000
> [   51.401900084,3] XXXXX: chip: 0 addr: 0000000007010a8a = 0a10603810800000
> [   51.401949960,3] XXXXX: chip: 0 addr: 0000000007010aca = 0a10603810800000
> [   51.402001171,3] XXXXX: chip: 8 addr: 0000000007010a0a = 0a10603810800000
> [   51.402054501,3] XXXXX: chip: 8 addr: 0000000007010a4a = 0a10603810800000
> [   51.402106614,3] XXXXX: chip: 8 addr: 0000000007010a8a = 0a10603810800000
> [   51.402159370,3] XXXXX: chip: 8 addr: 0000000007010aca = 0a10603810800000
> [   51.402604608,5] PLAT: Using virtual UART
>
>
> I'll dump the same thing from Hostboot tomorrow, but the only value I printed there does not match what skiboot reads.
> Just to make sure: there is only one SCOM device per socket, so what you had me dump are the values
> for both sockets on my platform, right?

Yep
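
For comparing the hostboot and skiboot readings quoted in this thread, it helps to remember that the POWER9 register documentation numbers bits IBM-style, so RECR "bit 0" is the most significant bit of the 64-bit value (the same bit that is "bit 63" when counting from the least significant end). A minimal sketch of that check follows; ibm_bit() and recr_bit0_set() are hypothetical helper names introduced here for illustration, not skiboot or hostboot functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * IBM (big-endian) bit numbering for a 64-bit register:
 * bit 0 is the most significant bit, bit 63 the least significant.
 * Hypothetical helper, named here only for illustration.
 */
static inline uint64_t ibm_bit(unsigned int n)
{
	assert(n < 64);
	return UINT64_C(1) << (63 - n);
}

/* Hypothetical helper: true if RECR bit 0 (IBM numbering) is set. */
static inline bool recr_bit0_set(uint64_t recr)
{
	return (recr & ibm_bit(0)) != 0;
}
```

Applied to the values in this thread, recr_bit0_set() is true for the hostboot reading 0x8890a23810800000 and false for the skiboot reading 0x0a10603810800000.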

> Is it possible that, from Hostboot, I am reading/writing the MC23 RECR register? I don't see in the datasheet how to
> access this device.

I had a look at what some of our internal debug tools do and it looks
like the MC23 register set is identical to the MC01 set with the '7'
at the top of the xscom address replaced with '8'. The full set of
RECR registers would be:

7010A0A
7010A4A
7010A8A
7010ACA
8010A0A
8010A4A
8010A8A
8010ACA
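
The list above follows a regular pattern, so it can be generated rather than hard-coded. This is only a sketch: the strides are read off the eight addresses listed above (0x40 between ports, 0x1000000 between MC01 and MC23), and recr_addr() and the macro names are hypothetical, introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values inferred from the address list in this mail, not from a spec. */
#define RECR_MC01_PORT0   0x7010A0Au   /* RECR of MC01, port 0 */
#define RECR_PORT_STRIDE  0x40u        /* distance between ports */
#define RECR_MC_STRIDE    0x1000000u   /* '7' -> '8' at the top: MC01 -> MC23 */

/*
 * Hypothetical helper: SCOM address of the RECR for a given memory
 * controller pair (0 = MC01, 1 = MC23) and port (0..3).
 */
static uint32_t recr_addr(unsigned int mc, unsigned int port)
{
	assert(mc < 2 && port < 4);
	return RECR_MC01_PORT0 + mc * RECR_MC_STRIDE + port * RECR_PORT_STRIDE;
}
```

Looping mc over 0..1 and port over 0..3 and passing recr_addr(mc, port) to xscom_read() in the dump_recr() patch quoted earlier would cover all eight registers per chip.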

>> > [   51.501846781,5] OPAL bc106a09-upmem-dirty-c19e811 starting...
>> > [   51.501851186,7] initial console log level: memory 7, driver 5
>> > [   51.501853342,6] CPU: P9 generation processor (max 4 threads/core)
>> > [   51.501855276,7] CPU: Boot CPU PIR is 0x081c PVR is 0x004e1202
>> > [   51.501857947,7] OPAL table: 0x30101230 .. 0x301017e0, branch table: 0x30002000
>> > [   51.501860995,7] Assigning physical memory map table for nimbus
>> > [   51.501864003,7] Parsing HDAT...
>> > [   51.501865320,5] SPIRA-S found.
>> > [   51.501867765,6] BMC #0: HW version 3, SW version 2, chip DD1.0
>> > [   51.501895710,4] SENSORS: Duplicate sensor ID : 0
>> > [   51.501897519,4] SENSORS: Duplicate sensor ID : 0
>> > [   51.501906010,4] SENSORS: Duplicate sensor ID : 0
>> > [   51.501911808,4] SENSORS: Duplicate sensor ID : 0
>> > [   51.501942832,6] SP Family is ibm,ast2500,openbmc
>> > [   51.501949550,7] LPC: IOPATH chip id = 0
>> > [   51.501950989,7] LPC: FW BAR       = f0000000
>> > [   51.501952683,7] LPC: MEM BAR      = e0000000
>> > [   51.501954472,7] LPC: IO BAR       = d0010000
>> > [   51.501956092,7] LPC: Internal BAR = c0012000
>> > [   51.501968744,7] LPC UART: base addr = 3f8 (3f8) size = 1 clk = 1843200, baud = 115200
>> > [   51.501971651,7] LPC: BT [0, 0] sms_int: 0, bmc_int: 0
>> > [   51.502871058,5] UART: Using UART at 0x60300d00103f8
>> > [   51.504754984,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc00780078
>> > [   51.504860126,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc000c0010
>> > [   51.504930258,5] P9 DD2.21 detected
>> > [   51.504955724,5] CHIP: Chip ID 0000 type: P9N DD2.2
>> > [   51.504992069,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc38085050
>>
>> > # I re-read the same address: as you can see, the PIR seems to indicate that I'm on
>> > # the same socket, but the value I read is not the same as the one I read from Hostboot.
>>
>> It doesn't really matter what socket the thread doing the reading is
>> on. Each chip's xscom range has a unique address range in the global
>> mmio map so you can read the XSCOMs of any chip from any thread.
>
>
> Ok thanks for that, I was not sure.
>
>>
>>
>> > [   51.505061382,3] UPMEM: xscom_init: recr value gcid = 0, pir = 81c, @0x7010a0a = 0xa10603810800000
>> > [   51.505149206,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc00780078
>> > [   51.505211582,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc000c0010
>> > [   51.505264687,5] P9 DD2.21 detected
>> > [   51.505286188,5] CHIP: Chip ID 0008 type: P9N DD2.2
>> > [   51.505321741,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc38085050
>> > # I tried the other XSCOM device (?)
>> > [   51.505396556,3] UPMEM: xscom_init: recr value gcid = 8, pir = 81c, @0x7010a0a = 0xa10603810800000
>> > [   51.505475821,3] UPMEM: xscom_init
>> >
>> > I can provide any needed information :)
>>
>> Does it boot? If it does then it's probably safe to assume that
>> disabling ECC worked.
>
>
> Yes, I can boot to Linux without any problem.

Neat. If you haven't already, you should look at disabling MCS
grouping in hostboot to prevent your DIMM from being added to an interleave
set with other DIMMs. There's a FAPI attribute that controls grouping,
but it might be easier to just hack it out.

>
> Thanks Oliver and Stewart for your answers,
>
> Alex

