<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Update: I have dealt with the write leveling issue. I accidentally
shifted a bit twice when trying to set PAR_A17_MASK in
SEQ_CONTROL0, so it was left unmasked.</p>
<p>Now I'm back to the initial issue with the loop in CCS. This time,
however, I see a difference between the original code (refresh on):<br>
</p>
<p> 0x0000000000000000 - APB_ERROR_STATUS0<br>
0x0000000000001000 - RC_ERROR_STATUS0<br>
0x0000000000000000 - SEQ_ERROR_STATUS0<br>
0x0000000000000000 - WC_ERROR_STATUS0<br>
0x0000000000000400 - PC_ERROR_STATUS0<br>
0x0000000000002008 - PC_INIT_CAL_ERROR<br>
0x0000000000000688 - DDRPHY_PC_INIT_CAL_STATUS<br>
0x0000000000000080 - IOM_PHY0_DDRPHY_FIR_REG<br>
</p>
<p>and the run after setting DDRPHY_PC_INIT_CAL_CONFIG1_P0 as described
in my previous mail:<br>
<br>
0x0000000000000000 - APB_ERROR_STATUS0<br>
0x0000000000001000 - RC_ERROR_STATUS0<br>
0x0000000000000000 - SEQ_ERROR_STATUS0<br>
0x0000000000000000 - WC_ERROR_STATUS0<br>
0x0000000000000000 - PC_ERROR_STATUS0<br>
0x0000000000000000 - PC_INIT_CAL_ERROR<br>
0x0000000000000608 - DDRPHY_PC_INIT_CAL_STATUS<br>
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG</p>
<p>PC_INIT_CAL_ERROR no longer reports an error, but
DDRPHY_PC_INIT_CAL_STATUS still doesn't report success. No
DQ/DQS bits are disabled, either with or without refresh.<br>
</p>
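<p>To make the difference between the two dumps above easier to see,
here is a minimal Python sketch that XORs each pair of values and
lists the bits that changed. The dictionary and function names are
mine, and bit positions are counted from the least significant bit
(bit 0 = LSB). That numbering is an assumption made for simplicity,
not the MSB-first convention used in IBM documentation.</p>
<pre>
```python
# Register values copied from the two dumps above (name -> 64-bit value).
REFRESH_ON = {
    "APB_ERROR_STATUS0": 0x0000000000000000,
    "RC_ERROR_STATUS0": 0x0000000000001000,
    "SEQ_ERROR_STATUS0": 0x0000000000000000,
    "WC_ERROR_STATUS0": 0x0000000000000000,
    "PC_ERROR_STATUS0": 0x0000000000000400,
    "PC_INIT_CAL_ERROR": 0x0000000000002008,
    "DDRPHY_PC_INIT_CAL_STATUS": 0x0000000000000688,
    "IOM_PHY0_DDRPHY_FIR_REG": 0x0000000000000080,
}
CONFIG1_MODIFIED = {
    "APB_ERROR_STATUS0": 0x0000000000000000,
    "RC_ERROR_STATUS0": 0x0000000000001000,
    "SEQ_ERROR_STATUS0": 0x0000000000000000,
    "WC_ERROR_STATUS0": 0x0000000000000000,
    "PC_ERROR_STATUS0": 0x0000000000000000,
    "PC_INIT_CAL_ERROR": 0x0000000000000000,
    "DDRPHY_PC_INIT_CAL_STATUS": 0x0000000000000608,
    "IOM_PHY0_DDRPHY_FIR_REG": 0x0000000000000000,
}


def set_bits(value):
    """Positions of set bits, counted from the LSB (assumed numbering)."""
    return [bit for bit in range(64) if (value >> bit) & 1]


def diff_dumps(before, after):
    """Map each register that changed to the bit positions that flipped."""
    return {
        name: set_bits(before[name] ^ after[name])
        for name in before
        if before[name] != after[name]
    }


for name, bits in diff_dumps(REFRESH_ON, CONFIG1_MODIFIED).items():
    print(name, "changed bits (LSB = 0):", bits)
```
</pre>
<p>Only PC_ERROR_STATUS0, PC_INIT_CAL_ERROR, DDRPHY_PC_INIT_CAL_STATUS
and IOM_PHY0_DDRPHY_FIR_REG differ between the runs; RC_ERROR_STATUS0
stays at 0x1000 in both.</p>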
<div class="moz-cite-prefix">On 06.04.2021 12:28, Krystian Hebel
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:5e371e8c-d0f0-a393-3365-2749c606e314@3mdeb.com">
<p>Hi Daniel,<br>
</p>
<div class="moz-cite-prefix">Thanks for the quick and informative
response.</div>
<blockquote type="cite"
cite="mid:OF09EA2B1E.321DECBE-ON002586AB.001A1805-002586AB.001A1866@notes.na.collabserv.com">
<div class="socmaildefaultfont" dir="ltr"
style="font-family:Arial, Helvetica,
sans-serif;font-size:10pt">
<div dir="ltr">I got these answers from one of our memory
experts.</div>
<div dir="ltr"> </div>
<div dir="ltr">
<div dir="ltr">Hi <font size="2">Krystian,</font></div>
<ol dir="ltr">
<li>IBM mostly uses x4 DIMMs. Is it possible to run with
an x4 DIMM for debug purposes to see if the problem
persists? This will help debug configuration issues with
the x8 DIMMs.</li>
</ol>
</div>
</div>
</blockquote>
This may be difficult due to remote work, but I'll see what can be
done.<br>
<blockquote type="cite"
cite="mid:OF09EA2B1E.321DECBE-ON002586AB.001A1805-002586AB.001A1866@notes.na.collabserv.com">
<div class="socmaildefaultfont" dir="ltr"
style="font-family:Arial, Helvetica,
sans-serif;font-size:10pt">
<div dir="ltr">
<ol dir="ltr" start="2">
<li>Have you tried disabling refresh to see if the issues
go away?</li>
</ol>
</div>
</div>
</blockquote>
Is it enough to just modify DDRPHY_PC_INIT_CAL_CONFIG1_P0? If so,
I set REFRESH_COUNT, REFRESH_CONTROL and REFRESH_ALL_RANKS to
all 0's and REFRESH_INTERVAL to all 1's. It still fails the same
way, but a few microseconds sooner than before.<br>
<blockquote type="cite"
cite="mid:OF09EA2B1E.321DECBE-ON002586AB.001A1805-002586AB.001A1866@notes.na.collabserv.com">
<div class="socmaildefaultfont" dir="ltr"
style="font-family:Arial, Helvetica,
sans-serif;font-size:10pt">
<div dir="ltr">
<ol dir="ltr" start="3">
<li>For calibration fails (which it looks like you are
experiencing), I would recommend dumping the following
registers for rank 0<br>
DQS disable bits<br>
0x8000007d0701103f<br>
0x8000047d0701103f<br>
0x8000087d0701103f<br>
0x80000c7d0701103f<br>
0x8000107d0701103f<br>
<br>
DQ disable bits<br>
0x8000007c0701103f<br>
0x8000047c0701103f<br>
0x8000087c0701103f<br>
0x80000c7c0701103f<br>
0x8000107c0701103f<br>
<br>
If calibration is passing on a given DRAM, all of the
bits should be 0's. Fails are noted by 1's in the
register. As per all PHY registers only the right most
16 bits matter.</li>
</ol>
</div>
</div>
</blockquote>
<p>Here I can see some fails: all DQ bits on the first and second
DP16 and all configured DQS bits (0xc300 for the first and 0x3c00
for the second, which is consistent with the settings from [1]).
The remaining DP16s pass. This DIMM works with Hostboot, so I
think the clock bits are selected properly.</p>
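<p>The addresses in the list above differ only by a fixed step per
DP16, so the dump is easy to script. Below is a minimal Python
sketch that rebuilds the address list and keeps only the rightmost
16 bits of each value. The constant and function names are
hypothetical, the stride is simply the difference between
consecutive addresses in the list, and the readings are the DQS
disable values described above rather than a live SCOM read.</p>
<pre>
```python
# First address of each group, taken from the list above.
DQS_DISABLE_BASE = 0x8000007D0701103F
DQ_DISABLE_BASE = 0x8000007C0701103F
# Difference between consecutive addresses in the list (one step per DP16).
DP16_STRIDE = 0x0000040000000000


def dp16_addresses(base, count=5):
    """Addresses of the same register on DP16 0..count-1."""
    return [base + n * DP16_STRIDE for n in range(count)]


def failing_bits(raw):
    """Only the rightmost 16 bits of a PHY register are meaningful."""
    return raw & 0xFFFF


# DQS disable values per DP16 as described above (0 means every bit passed).
dqs_readings = [0xC300, 0x3C00, 0x0000, 0x0000, 0x0000]

for addr, raw in zip(dp16_addresses(DQS_DISABLE_BASE), dqs_readings):
    bits = failing_bits(raw)
    status = "FAIL" if bits else "pass"
    print(f"{addr:#018x}: {bits:#06x} {status}")
```
</pre>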
<p>I hadn't realized that these are updated by hardware and then
used as input for the next steps. Now I know that what I thought
was a successful write leveling was actually skipping bad bits.
I was misled by the fact that the second attempt took more time
than the first one, but it makes sense, as it starts from a
higher initial delay and has a longer way to go down and up
again, if I understand this step correctly.</p>
<p>I went a step further and dumped all WR_DELAY_VALUE_x_RP0_REG
registers. For passing bits the value is somewhere in the range
0x1900-0x2b00, and every set of 8 DQ bits and its accompanying
DQS bit have the same value, which I believe is expected for x8
memory. For failed bits the value is always 0x3a00 for DQ bits
(and for whatever is in DELAY_VALUE_16-22 that isn't configured
as a DQS), but 0x4200 for DQS bits. Unlike on passing DP16s,
these values don't change between boots. They can change slightly
when I modify DDRPHY_WC_CONFIG1_P0, but there is still no pass.<br>
</p>
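<p>The pattern in the delay values can be written down as a small
check. This is only a sketch of the observations above: the
function name and the three labels are hypothetical, and the range
and the two stuck values are just what I saw on this board, not
limits taken from any specification.</p>
<pre>
```python
PASS_RANGE = range(0x1900, 0x2B01)  # values observed on passing bits
FAILED_DQ = 0x3A00                  # value seen on every failing DQ bit
FAILED_DQS = 0x4200                 # value seen on every failing DQS bit


def classify_group(dq_values, dqs_value):
    """Classify one x8 group: 8 DQ delay values plus the DQS delay value."""
    values = set(dq_values) | {dqs_value}
    # Passing groups share a single value that lies in the observed range.
    if len(values) == 1 and dqs_value in PASS_RANGE:
        return "pass"
    # Failing groups sit at the two fixed values noted above.
    if set(dq_values) == {FAILED_DQ} and dqs_value == FAILED_DQS:
        return "known-fail"
    return "unexpected"


# The failing pattern described above.
print(classify_group([0x3A00] * 8, 0x4200))
```
</pre>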
<blockquote type="cite"
cite="mid:OF09EA2B1E.321DECBE-ON002586AB.001A1805-002586AB.001A1866@notes.na.collabserv.com">
<div class="socmaildefaultfont" dir="ltr"
style="font-family:Arial, Helvetica,
sans-serif;font-size:10pt">
<div dir="ltr">
<ol dir="ltr" start="4">
<li>To my knowledge, there should not be an issue sending
the RCW commands via i2c.</li>
<li>Running in our test environment, I am seeing the
following scoms for DQS align:
<div>
<div>CRONUSDEBUG(30807) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 070123A5
4000000000000000 # Stop CCS<br>
CRONUSDEBUG(30818) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 07012315
000000F0CC0000C0 # Configure init calibration<br>
CRONUSDEBUG(30823) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 07012335
0000000000000041 # Go to instruction 1<br>
CRONUSDEBUG(30826) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 07012316
000008F0CC000000 # don't do anything<br>
CRONUSDEBUG(30831) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 07012336
0000000000000020 # End CCS<br>
CRONUSDEBUG(30839) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 070123DB
0400000000000000 # Configure the port to run<br>
CRONUSDEBUG(30848) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 070123A5
8000000000000000 # Kick off CCS<br>
<br>
I hope that this trace helps.</div>
</div>
</li>
</ol>
</div>
</div>
</blockquote>
<p>So DDR_CAL_RANK in ARR1 is a number, and not a bitmap of
selected ranks? That was my initial understanding, but then I
changed the code to treat it as a bitmap. Still, fixing the
code doesn't help, even though the sequence is now identical to
the trace above.<br>
</p>
<p><br>
</p>
[1]
<a class="moz-txt-link-freetext"
href="https://git.raptorcs.com/git/talos-hostboot/tree/src/import/chips/p9/procedures/hwp/memory/lib/phy/dp16.C#n1963"
moz-do-not-send="true">https://git.raptorcs.com/git/talos-hostboot/tree/src/import/chips/p9/procedures/hwp/memory/lib/phy/dp16.C#n1963</a><br>
<pre class="moz-signature" cols="72">--
Krystian Hebel
Firmware Engineer
<a class="moz-txt-link-freetext" href="https://3mdeb.com" moz-do-not-send="true">https://3mdeb.com</a> | @3mdeb_com</pre>
</blockquote>
<pre class="moz-signature" cols="72">--
Krystian Hebel
Firmware Engineer
<a class="moz-txt-link-freetext" href="https://3mdeb.com">https://3mdeb.com</a> | @3mdeb_com</pre>
</body>
</html>