<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 18, 2019 at 17:35, Oliver <<a href="mailto:oohall@gmail.com" target="_blank">oohall@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Thu, Apr 18, 2019 at 1:32 AM Alexandre Ghiti <<a href="mailto:aghiti@upmem.com" target="_blank">aghiti@upmem.com</a>> wrote:<br>
><br>
> Hi everyone,<br>
><br>
> We are currently developing an RDIMM with CPUs inside the DRAM chips. This DIMM does not support ECC, so I would like to disable ECC checking and correction, i.e. set bit 0 of the RECR register (page 210 of POWER9_Registers_vol3_version1.2_pub.pdf). I do that successfully in Hostboot during DRAM initialization.<br>
><br>
> My problem is that this register seems to be "reset" during the transition from Hostboot to Skiboot: does that seem plausible to you?<br>
><br>
> The platform I'm working on is a TL2MB1 (<a href="https://wiki.raptorcs.com/wiki/Talos_II" rel="noreferrer" target="_blank">https://wiki.raptorcs.com/wiki/Talos_II</a>).<br>
> My code is based on the custom hostboot/skiboot trees developed by the RaptorCS team:<br>
> - hostboot: <a href="https://scm.raptorcs.com/scm/git/talos-hostboot" rel="noreferrer" target="_blank">https://scm.raptorcs.com/scm/git/talos-hostboot</a> 884b60b16009061ab84db88f918902a8c8098a4b<br>
> - skiboot: <a href="https://scm.raptorcs.com/scm/git/talos-skiboot" rel="noreferrer" target="_blank">https://scm.raptorcs.com/scm/git/talos-skiboot</a> bc106a09b0ab298ec71b3ef8337fb5f820a7c454<br>
><br>
> Below is the trace I get of this transition:<br>
><br>
> 51.69046|ISTEP 21. 2 - host_verify_hdat<br>
> 51.70166|ISTEP 21. 3 - host_start_payload<br>
> # This is the value I'm interested in; as you can see, bit 63 is set.<br>
> 51.72614|FAPI|call_host_start_payload.C: UPMEM: 0x7010a0a value is : 0x8890a23810800000<br>
> 51.72616|FAPI|call_host_start_payload.C: UPMEM: PIR: 0x80d<br>
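A note on bit numbering: IBM register documentation numbers bits MSB-first (bit 0 is the most significant bit of a 64-bit SCOM register), while the trace comment counts from the LSB, so "bit 0 of RECR" and "bit 63 is set" refer to the same bit. A minimal C sketch of that conversion, assuming (as stated in the message) that RECR bit 0 is the ECC check/correct disable bit; the helper names are hypothetical, and `ibm_bit()` computes the same mask as skiboot's `PPC_BIT()` macro:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * IBM documentation numbers register bits MSB-first: bit 0 is the most
 * significant bit of a 64-bit SCOM register and bit 63 the least
 * significant. This computes the same mask as skiboot's PPC_BIT().
 */
static uint64_t ibm_bit(unsigned int bit)
{
	assert(bit < 64);
	return UINT64_C(0x8000000000000000) >> bit;
}

/* Convert an IBM (MSB-0) bit number into the usual LSB-0 bit number. */
static unsigned int ibm_to_lsb0(unsigned int bit)
{
	assert(bit < 64);
	return 63 - bit;
}

/*
 * Hypothetical helper. Assumption (taken from the message): RECR bit 0
 * is the "disable ECC check and correct" bit.
 */
static bool recr_ecc_disabled(uint64_t recr)
{
	return (recr & ibm_bit(0)) != 0;
}
```

With the two values seen in this thread, `recr_ecc_disabled(0x8890a23810800000)` (read from Hostboot) is true, while `recr_ecc_disabled(0x0a10603810800000)` (read from skiboot) is false.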
<br>
The targeting stuff in hostboot is a bit of a headache, so it's<br>
possible that you're just reading from the wrong RECR here. Try<br>
dumping out the RECRs for all four ports on each chip from hostboot<br>
and from skiboot and compare the results. You can also use the getscom<br>
tool from the host, or pdbg from the BMC, to read SCOMs. pdbg on<br>
the BMC works even when the system is checkstopped.<br>
<br>
Not sure how that's done in hostboot, but this should do it for skiboot:<br>
<br>
diff --git a/core/init.c b/core/init.c<br>
index 0fe6c16820bb..ce1eeb97a641 100644<br>
--- a/core/init.c<br>
+++ b/core/init.c<br>
@@ -923,6 +923,24 @@ bool verify_romem(void)<br>
/* Called from head.S, thus no prototype. */<br>
void main_cpu_entry(const void *fdt);<br>
<br>
+<br>
+static void dump_recr(void)<br>
+{<br>
+ uint64_t addrs[] = { 0x7010A0A, 0x7010A4A, 0x7010A8A, 0x7010ACA };<br>
+ struct proc_chip *c;<br>
+ uint64_t out;<br>
+ int i;<br>
+<br>
+ for_each_chip(c) {<br>
+ for (i = 0; i < 4; i++) {<br>
+ xscom_read(c->id, addrs[i], &out);<br>
+ prerror("XXXXX: chip: %x addr: %016llx = %016llx\n",<br>
+ c->id, addrs[i], out);<br>
+ }<br>
+ }<br>
+<br>
+}<br>
+<br>
void __noreturn __nomcount main_cpu_entry(const void *fdt)<br>
{<br>
/*<br>
@@ -1042,6 +1060,9 @@ void __noreturn __nomcount main_cpu_entry(const void *fdt)<br>
xscom_init();<br>
mfsi_init();<br>
<br>
+<br>
+ dump_recr();<br>
+<br>
/*<br>
* Direct controls facilities provides some controls over CPUs<br>
* using scoms.<br>
<br></blockquote><div><br></div><div>Thanks for this; here is the output:<br><br> 51.49558|FAPI|call_host_start_payload.C: UPMEM: 0x7010a0a value is : 0x8890a23810800000<br> 51.49560|FAPI|call_host_start_payload.C: UPMEM: PIR: 0xc<br>...<br></div><div>[ 51.401722470,5] CHIP: Chip ID 0008 type: P9N DD2.2<br>[ 51.401776856,3] XXXXX: chip: 0 addr: 0000000007010a0a = 0a10603810800000<br>[ 51.401840480,3] XXXXX: chip: 0 addr: 0000000007010a4a = 0a10603810800000<br>[ 51.401900084,3] XXXXX: chip: 0 addr: 0000000007010a8a = 0a10603810800000<br>[ 51.401949960,3] XXXXX: chip: 0 addr: 0000000007010aca = 0a10603810800000<br>[ 51.402001171,3] XXXXX: chip: 8 addr: 0000000007010a0a = 0a10603810800000<br>[ 51.402054501,3] XXXXX: chip: 8 addr: 0000000007010a4a = 0a10603810800000<br>[ 51.402106614,3] XXXXX: chip: 8 addr: 0000000007010a8a = 0a10603810800000<br>[ 51.402159370,3] XXXXX: chip: 8 addr: 0000000007010aca = 0a10603810800000<br>[ 51.402604608,5] PLAT: Using virtual UART<br><br><br></div><div>I'll dump the same thing from Hostboot tomorrow, but the one value I did print from Hostboot does not match what skiboot reads.<br></div><div>Just to make sure: there is only one SCOM device per socket, so what you had me dump are the values<br></div><div>for both sockets on my platform, right?<br></div><div>Is it possible that, from Hostboot, I am reading/writing the MC23 RECR register? I don't see in the datasheet how to<br></div><div>access that unit.<br><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
> [ 51.501846781,5] OPAL bc106a09-upmem-dirty-c19e811 starting...<br>
> [ 51.501851186,7] initial console log level: memory 7, driver 5<br>
> [ 51.501853342,6] CPU: P9 generation processor (max 4 threads/core)<br>
> [ 51.501855276,7] CPU: Boot CPU PIR is 0x081c PVR is 0x004e1202<br>
> [ 51.501857947,7] OPAL table: 0x30101230 .. 0x301017e0, branch table: 0x30002000<br>
> [ 51.501860995,7] Assigning physical memory map table for nimbus<br>
> [ 51.501864003,7] Parsing HDAT...<br>
> [ 51.501865320,5] SPIRA-S found.<br>
> [ 51.501867765,6] BMC #0: HW version 3, SW version 2, chip DD1.0<br>
> [ 51.501895710,4] SENSORS: Duplicate sensor ID : 0<br>
> [ 51.501897519,4] SENSORS: Duplicate sensor ID : 0<br>
> [ 51.501906010,4] SENSORS: Duplicate sensor ID : 0<br>
> [ 51.501911808,4] SENSORS: Duplicate sensor ID : 0<br>
> [ 51.501942832,6] SP Family is ibm,ast2500,openbmc<br>
> [ 51.501949550,7] LPC: IOPATH chip id = 0<br>
> [ 51.501950989,7] LPC: FW BAR = f0000000<br>
> [ 51.501952683,7] LPC: MEM BAR = e0000000<br>
> [ 51.501954472,7] LPC: IO BAR = d0010000<br>
> [ 51.501956092,7] LPC: Internal BAR = c0012000<br>
> [ 51.501968744,7] LPC UART: base addr = 3f8 (3f8) size = 1 clk = 1843200, baud = 115200<br>
> [ 51.501971651,7] LPC: BT [0, 0] sms_int: 0, bmc_int: 0<br>
> [ 51.502871058,5] UART: Using UART at 0x60300d00103f8<br>
> [ 51.504754984,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc00780078<br>
> [ 51.504860126,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc000c0010<br>
> [ 51.504930258,5] P9 DD2.21 detected<br>
> [ 51.504955724,5] CHIP: Chip ID 0000 type: P9N DD2.2<br>
> [ 51.504992069,3] UPMEM: __xscom_read: gcid, read address gcid:0x0 0x0x603fc38085050<br>
<br>
> # I re-read the same address: as you can see, the PIR seems to indicate that I'm on<br>
> # the same socket, but the value I read is not the same as the one I read from Hostboot.<br>
<br>
It doesn't really matter which socket the thread doing the reading is<br>
on. Each chip's XSCOMs occupy a unique address range in the global<br>
MMIO map, so you can read the XSCOMs of any chip from any thread.<br></blockquote><div><br></div><div>OK, thanks for that; I was not sure.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
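The per-chip address ranges are visible in the `__xscom_read` lines of the skiboot log: the same PCB address 0x7010a0a maps to 0x603fc38085050 for gcid 0x0 and to 0x623fc38085050 for gcid 0x8. A minimal sketch of that mapping, where the shift-by-3 rule and the two base addresses are inferred from this log rather than taken from skiboot's source, and `xscom_mmio_addr()` and the `XSCOM_BASE_*` names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption inferred from the __xscom_read log lines, not from
 * skiboot's source: on this P9 system a SCOM register's MMIO address is
 * the chip's XSCOM base ORed with the 32-bit PCB address shifted left by 3.
 */
#define XSCOM_BASE_GCID0 UINT64_C(0x603fc00000000) /* gcid 0x0, from the log */
#define XSCOM_BASE_GCID8 UINT64_C(0x623fc00000000) /* gcid 0x8, from the log */

/* Hypothetical helper: MMIO address of a SCOM register on a given chip. */
static uint64_t xscom_mmio_addr(uint64_t chip_base, uint32_t pcb_addr)
{
	uint64_t offset = (uint64_t)pcb_addr << 3;

	/* The shifted PCB address must not overlap the chip base bits. */
	assert((chip_base & offset) == 0);

	/* Chips have disjoint bases, so any thread can address any chip. */
	return chip_base | offset;
}
```

Because only the base differs between chips, a thread running on either socket can form the address of either chip's RECR, which is consistent with the skiboot dump reading both chips from the boot CPU.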
<br>
> [ 51.505061382,3] UPMEM: xscom_init: recr value gcid = 0, pir = 81c, @0x7010a0a = 0xa10603810800000<br>
> [ 51.505149206,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc00780078<br>
> [ 51.505211582,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc000c0010<br>
> [ 51.505264687,5] P9 DD2.21 detected<br>
> [ 51.505286188,5] CHIP: Chip ID 0008 type: P9N DD2.2<br>
> [ 51.505321741,3] UPMEM: __xscom_read: gcid, read address gcid:0x8 0x0x623fc38085050<br>
> # I tried the other XSCOM device (?)<br>
> [ 51.505396556,3] UPMEM: xscom_init: recr value gcid = 8, pir = 81c, @0x7010a0a = 0xa10603810800000<br>
> [ 51.505475821,3] UPMEM: xscom_init<br>
><br>
> I can provide any needed information :)<br>
<br>
Does it boot? If it does, then it's probably safe to assume that<br>
disabling ECC worked.<br></blockquote><div><br></div><div>Yes, I can boot to Linux without any problem.<br></div><div> <br>Thanks Oliver and Stewart for your answers,<br><br></div><div>Alex</div></div></div></div></div>