<div dir="ltr"><div dir="ltr"></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 18, 2019 at 6:30 PM Oliver <<a href="mailto:oohall@gmail.com">oohall@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Fri, Apr 19, 2019 at 2:03 AM Alexandre Ghiti <<a href="mailto:aghiti@upmem.com" target="_blank">aghiti@upmem.com</a>> wrote:<br>
><br>
><br>
> On Thu, Apr 18, 2019 at 5:35 PM Oliver <<a href="mailto:oohall@gmail.com" target="_blank">oohall@gmail.com</a>> wrote:<br>
>><br>
>> On Thu, Apr 18, 2019 at 1:32 AM Alexandre Ghiti <<a href="mailto:aghiti@upmem.com" target="_blank">aghiti@upmem.com</a>> wrote:<br>
>> ><br>
>> > Hi everyone,<br>
>> ><br>
>> > We are currently developing an RDIMM with CPUs inside the DRAM chips. This DIMM does not have ECC support, so I would like to disable ECC check and correct, i.e. I want to set bit 0 of the register RECR (page 210 of POWER9_Registers_vol3_version1.2_pub.pdf). I successfully do that in Hostboot, during the DRAM initialization.<br>
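IBM register documentation numbers bits from the most significant end, so "bit 0" of RECR is the MSB of the 64-bit value (bit 63 in the usual LSB-0 convention, which is why the Hostboot trace talks about bit 63). A minimal C sketch of the read-modify-write, assuming the raw 64-bit RECR value has already been read; `ibm_bit`, `recr_set_ecc_disable` and `recr_ecc_disabled` are hypothetical helper names, not Hostboot or skiboot APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IBM (MSB-0) bit numbering: bit 0 is the most significant bit. */
static inline uint64_t ibm_bit(unsigned int bit)
{
    assert(bit < 64);
    return 1ULL << (63 - bit);
}

/* Hypothetical helper: set RECR bit 0 (the ECC check/correct disable bit
 * described above) while preserving every other bit of the register. */
static uint64_t recr_set_ecc_disable(uint64_t recr)
{
    return recr | ibm_bit(0);
}

/* Hypothetical helper: report whether RECR bit 0 is set in a raw value. */
static bool recr_ecc_disabled(uint64_t recr)
{
    return (recr & ibm_bit(0)) != 0;
}
```

With these values, `recr_ecc_disabled(0x8890a23810800000)` (the Hostboot reading) is true, while the skiboot reading `0x0a10603810800000` gives false.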
>> ><br>
>> > My problem is that it seems that this register is "reset" during the transition from Hostboot to Skiboot: does that seem plausible to you?<br>
>> ><br>
>> > The platform I'm working on is a TL2MB1 (<a href="https://wiki.raptorcs.com/wiki/Talos_II" rel="noreferrer" target="_blank">https://wiki.raptorcs.com/wiki/Talos_II</a>).<br>
>> > The code I'm based on is a custom hostboot/skiboot developed by RaptorCS team:<br>
>> > - hostboot: <a href="https://scm.raptorcs.com/scm/git/talos-hostboot" rel="noreferrer" target="_blank">https://scm.raptorcs.com/scm/git/talos-hostboot</a> 884b60b16009061ab84db88f918902a8c8098a4b<br>
>> > - skiboot: <a href="https://scm.raptorcs.com/scm/git/talos-skiboot" rel="noreferrer" target="_blank">https://scm.raptorcs.com/scm/git/talos-skiboot</a> bc106a09b0ab298ec71b3ef8337fb5f820a7c454<br>
>> ><br>
>> > Below is the trace I get of this transition:<br>
>> ><br>
>> > 51.69046|ISTEP 21. 2 - host_verify_hdat<br>
>> > 51.70166|ISTEP 21. 3 - host_start_payload<br>
>> > # This is the value I'm interested in; as you can see, bit 63 (IBM bit 0, the MSB) is set.<br>
>> > 51.72614|FAPI|call_host_start_payload.C: UPMEM: 0x7010a0a value is : 0x8890a23810800000<br>
>> > 51.72616|FAPI|call_host_start_payload.C: UPMEM: PIR: 0x80d<br>
>><br>
>> The targeting stuff in hostboot is a bit of a headache so it's<br>
>> possible that you're just reading from the wrong RECR here. Try<br>
>> dumping out the RECRs for all four ports on each chip from hostboot<br>
>> and from skiboot and compare the results. You can also use the getscom<br>
>> tool from the host, or pdbg from the BMC, to read scoms. pdbg from<br>
>> the BMC works even when the system is checkstopped too.<br>
>><br>
>> Not sure how that's done in hostboot, but this should do it for skiboot:<br>
>><br>
>> diff --git a/core/init.c b/core/init.c<br>
>> index 0fe6c16820bb..ce1eeb97a641 100644<br>
>> --- a/core/init.c<br>
>> +++ b/core/init.c<br>
>> @@ -923,6 +923,24 @@ bool verify_romem(void)<br>
>> /* Called from head.S, thus no prototype. */<br>
>> void main_cpu_entry(const void *fdt);<br>
>><br>
>> +<br>
>> +static void dump_recr(void)<br>
>> +{<br>
>> + uint64_t addrs[] = { 0x7010A0A, 0x7010A4A, 0x7010A8A, 0x7010ACA };<br>
>> + struct proc_chip *c;<br>
>> + uint64_t out;<br>
>> + int i;<br>
>> +<br>
>> + for_each_chip(c) {<br>
>> + for (i = 0; i < 4; i++) {<br>
>> + xscom_read(c->id, addrs[i], &out);<br>
>> + prerror("XXXXX: chip: %x addr: %016llx = %016llx\n",<br>
>> + c->id, addrs[i], out);<br>
>> + }<br>
>> + }<br>
>> +<br>
>> +}<br>
>> +<br>
>> void __noreturn __nomcount main_cpu_entry(const void *fdt)<br>
>> {<br>
>> /*<br>
>> @@ -1042,6 +1060,9 @@ void __noreturn __nomcount main_cpu_entry(const void *fdt)<br>
>> xscom_init();<br>
>> mfsi_init();<br>
>><br>
>> +<br>
>> + dump_recr();<br>
>> +<br>
>> /*<br>
>> * Direct controls facilities provides some controls over CPUs<br>
>> * using scoms.<br>
>><br>
><br>
> Thanks for this, here is the output:<br>
><br>
> 51.49558|FAPI|call_host_start_payload.C: UPMEM: 0x7010a0a value is : 0x8890a23810800000<br>
> 51.49560|FAPI|call_host_start_payload.C: UPMEM: PIR: 0xc<br>
> ...<br>
> [ 51.401722470,5] CHIP: Chip ID 0008 type: P9N DD2.2<br>
> [ 51.401776856,3] XXXXX: chip: 0 addr: 0000000007010a0a = 0a10603810800000<br>
> [ 51.401840480,3] XXXXX: chip: 0 addr: 0000000007010a4a = 0a10603810800000<br>
> [ 51.401900084,3] XXXXX: chip: 0 addr: 0000000007010a8a = 0a10603810800000<br>
> [ 51.401949960,3] XXXXX: chip: 0 addr: 0000000007010aca = 0a10603810800000<br>
> [ 51.402001171,3] XXXXX: chip: 8 addr: 0000000007010a0a = 0a10603810800000<br>
> [ 51.402054501,3] XXXXX: chip: 8 addr: 0000000007010a4a = 0a10603810800000<br>
> [ 51.402106614,3] XXXXX: chip: 8 addr: 0000000007010a8a = 0a10603810800000<br>
> [ 51.402159370,3] XXXXX: chip: 8 addr: 0000000007010aca = 0a10603810800000<br>
> [ 51.402604608,5] PLAT: Using virtual UART<br>
><br>
><br>
> I'll dump the same thing from Hostboot tomorrow, but the only value I output does not match what skiboot reads.<br>
> Just to make sure: there is only one scom device per socket, so what you had me dump are the values<br>
> for both sockets on my platform, right?<br>
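One way to see how far apart the Hostboot and skiboot readings are is to XOR them and list the differing bits in IBM numbering. A minimal sketch using the two values from the logs above; `list_diff_bits` is a hypothetical helper. For 0x8890a23810800000 vs 0x0a10603810800000 it reports IBM bits 0, 6, 8, 16, 17 and 22, so more than just the ECC-disable bit differs, which is worth keeping in mind when deciding between "the register was reset" and "Hostboot read a different RECR".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the IBM (MSB-0) positions of the bits that differ between a and b
 * into out[], in ascending order, and return how many were found. */
static size_t list_diff_bits(uint64_t a, uint64_t b,
                             unsigned int *out, size_t cap)
{
    uint64_t diff = a ^ b;
    size_t n = 0;

    for (unsigned int bit = 0; bit < 64; bit++) {
        if (diff & (1ULL << (63 - bit))) {
            assert(n < cap);   /* caller must provide enough room */
            out[n++] = bit;
        }
    }
    return n;
}
```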
<br>
Yep<br>
<br></blockquote><div><br></div><div>Ok thanks.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
> Would it be possible, from Hostboot, to read/write the MC23 RECR register? I don't see in the datasheet how to<br>
> access this device.<br>
<br>
I had a look at what some of our internal debug tools do and it looks<br>
like the MC23 register set is identical to the MC01 set with the '7'<br>
at the top of the xscom address replaced with '8'. The full set of<br>
RECR registers would be:<br>
<br>
7010A0A<br>
7010A4A<br>
7010A8A<br>
7010ACA<br>
8010A0A<br>
8010A4A<br>
8010A8A<br>
8010ACA<br></blockquote><div><br></div><div>OK, 7 and 8 being the chiplet numbers; perfect, that makes sense, thanks.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
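The eight RECR addresses follow a regular pattern: the top byte is the chiplet (0x07 for MC01, 0x08 for MC23) and the four ports are 0x40 apart (0x0A, 0x4A, 0x8A, 0xCA). A small sketch, inferred only from the list above rather than from the register spec, that regenerates them; `recr_addr` and `print_recr_addrs` are hypothetical names. A loop like this could let the skiboot `dump_recr()` patch cover both chiplets instead of only MC01.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: build the RECR XSCOM address for one memory port.
 * chiplet: 0x07 (MC01) or 0x08 (MC23), placed in the top byte.
 * port:    0..3; each port's registers sit 0x40 apart. */
static uint32_t recr_addr(uint32_t chiplet, unsigned int port)
{
    assert(port < 4);
    return (chiplet << 24) | 0x00010A0Au | ((uint32_t)port << 6);
}

/* Print all eight RECR addresses, MC01 first, then MC23. */
static void print_recr_addrs(void)
{
    static const uint32_t chiplets[] = { 0x07, 0x08 };

    for (unsigned int c = 0; c < 2; c++)
        for (unsigned int p = 0; p < 4; p++)
            printf("%" PRIX32 "\n", recr_addr(chiplets[c], p));
}
```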
<br>
>> > [ 51.501846781,5] OPAL bc106a09-upmem-dirty-c19e811 starting...<br>
>> > [ 51.501851186,7] initial console log level: memory 7, driver 5<br>
>> > [ 51.501853342,6] CPU: P9 generation processor (max 4 threads/core)<br>
>> > [ 51.501855276,7] CPU: Boot CPU PIR is 0x081c PVR is 0x004e1202<br>
>> > [ 51.501857947,7] OPAL table: 0x30101230 .. 0x301017e0, branch table: 0x30002000<br>
>> > [ 51.501860995,7] Assigning physical memory map table for nimbus<br>
>> > [ 51.501864003,7] Parsing HDAT...<br>
>> > [ 51.501865320,5] SPIRA-S found.<br>
>> > [ 51.501867765,6] BMC #0: HW version 3, SW version 2, chip DD1.0<br>
>> > [ 51.501895710,4] SENSORS: Duplicate sensor ID : 0<br>
>> > [ 51.501897519,4] SENSORS: Duplicate sensor ID : 0<br>
>> > [ 51.501906010,4] SENSORS: Duplicate sensor ID : 0<br>
>> > [ 51.501911808,4] SENSORS: Duplicate sensor ID : 0<br>
>> > [ 51.501942832,6] SP Family is ibm,ast2500,openbmc<br>
>> > [ 51.501949550,7] LPC: IOPATH chip id = 0<br>
>> > [ 51.501950989,7] LPC: FW BAR = f0000000<br>
>> > [ 51.501952683,7] LPC: MEM BAR = e0000000<br>
>> > [ 51.501954472,7] LPC: IO BAR = d0010000<br>
>> > [ 51.501956092,7] LPC: Internal BAR = c0012000<br>
>> > [ 51.501968744,7] LPC UART: base addr = 3f8 (3f8) size = 1 clk = 1843200, baud = 115200<br>
>> > [ 51.501971651,7] LPC: BT [0, 0] sms_int: 0, bmc_int: 0<br>
>> > [ 51.502871058,5] UART: Using UART at 0x60300d00103f8<br>
>> > [ 51.504754984,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc00780078<br>
>> > [ 51.504860126,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc000c0010<br>
>> > [ 51.504930258,5] P9 DD2.21 detected<br>
>> > [ 51.504955724,5] CHIP: Chip ID 0000 type: P9N DD2.2<br>
>> > [ 51.504992069,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc38085050<br>
>><br>
>> > # I re-read the same address: as you can see, the PIR seems to indicate that I'm on<br>
>> > # the same socket, but the value I read is not the same as the one I read from Hostboot.<br>
>><br>
>> It doesn't really matter what socket the thread doing the reading is<br>
>> on. Each chip's xscom range has a unique address range in the global<br>
>> mmio map so you can read the XSCOMs of any chip from any thread.<br>
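The "UPMEM: __xscom_read" lines in the log make this visible. On these P9 chips the MMIO address printed for each read appears to be the chip's XSCOM base OR'ed with the PCB (SCOM) address shifted left by 3: for example, RECR 0x7010a0a on gcid 0 shows up as 0x603fc38085050, and the read of PCB address 0xf000f as 0x603fc00780078. The sketch below only reproduces that arithmetic; the two base addresses are read off the log (gcid 0 → 0x603fc00000000, gcid 8 → 0x623fc00000000), and `xscom_mmio_addr` is a hypothetical name, not skiboot's internal helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper reproducing the address arithmetic seen in the log:
 * MMIO address = per-chip XSCOM base | (PCB address << 3). */
static uint64_t xscom_mmio_addr(uint64_t chip_base, uint32_t pcb_addr)
{
    uint64_t shifted = (uint64_t)pcb_addr << 3;

    /* The OR only behaves like an add if the base and the shifted
     * address have no bits in common. */
    assert((chip_base & shifted) == 0);
    return chip_base | shifted;
}
```

Because the two bases differ (here by 0x20000000000), the same PCB address maps to a distinct MMIO address per chip, so any thread can reach any chip's registers.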
><br>
><br>
> Ok thanks for that, I was not sure.<br>
><br>
>><br>
>><br>
>> > [ 51.505061382,3] UPMEM: xscom_init: recr value gcid = 0, pir = 81c, @0x7010a0a = 0xa10603810800000<br>
>> > [ 51.505149206,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc00780078<br>
>> > [ 51.505211582,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc000c0010<br>
>> > [ 51.505264687,5] P9 DD2.21 detected<br>
>> > [ 51.505286188,5] CHIP: Chip ID 0008 type: P9N DD2.2<br>
>> > [ 51.505321741,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc38085050<br>
>> > # I tried the other XSCOM device (?)<br>
>> > [ 51.505396556,3] UPMEM: xscom_init: recr value gcid = 8, pir = 81c, @0x7010a0a = 0xa10603810800000<br>
>> > [ 51.505475821,3] UPMEM: xscom_init<br>
>> ><br>
>> > I can provide any needed information :)<br>
>><br>
>> Does it boot? If it does then it's probably safe to assume that<br>
>> disabling ECC worked.<br>
><br>
><br>
> Yes I can boot to Linux without problem.<br>
<br>
Neat. If you haven't already then you should look at disabling MCS<br>
grouping in hostboot to prevent your DIMMs from being added to an interleave<br>
set with other DIMMs. There's a FAPI attribute that controls grouping,<br>
but it might be easier to just hack it out.<br></blockquote><div><br></div><div>That's exactly what I was doing; indeed, we do not want our DIMMs to be interleaved<br></div><div>with 'normal' DIMMs :)<br></div><div> <br></div><div>Thanks for your answer,<br><br></div><div>Alex</div><br></div></div>