<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi Daniel,<br>
</p>
<div class="moz-cite-prefix">Thanks for the quick and informative
response.</div>
<blockquote type="cite"
cite="mid:OF09EA2B1E.321DECBE-ON002586AB.001A1805-002586AB.001A1866@notes.na.collabserv.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div class="socmaildefaultfont" dir="ltr"
style="font-family:Arial, Helvetica, sans-serif;font-size:10pt">
<div dir="ltr">I got these answers from one of our memory
experts.</div>
<div dir="ltr"> </div>
<div dir="ltr">
<div dir="ltr">Hi <font size="2">Krystian,</font></div>
<ol dir="ltr">
<li>IBM mostly uses x4 DIMM's. Is it possible to run with a
x4 DIMM for debug purposes to see if the problem persists?
This will help debug configuration issues with the x8
DIMM's</li>
</ol>
</div>
</div>
</blockquote>
This may be difficult due to remote work, but I'll see what can be
done.<br>
<blockquote type="cite"
cite="mid:OF09EA2B1E.321DECBE-ON002586AB.001A1805-002586AB.001A1866@notes.na.collabserv.com">
<div class="socmaildefaultfont" dir="ltr"
style="font-family:Arial, Helvetica, sans-serif;font-size:10pt">
<div dir="ltr">
<ol dir="ltr" start="2">
<li>Have you tried disabling refresh to see if the issues go
away?</li>
</ol>
</div>
</div>
</blockquote>
Is it enough to just modify DDRPHY_PC_INIT_CAL_CONFIG1_P0? If so, I
set REFRESH_COUNT, REFRESH_CONTROL and REFRESH_ALL_RANKS to all 0's
and REFRESH_INTERVAL to all 1's. It still fails the same way, but a
few microseconds sooner than before.<br>
<blockquote type="cite"
cite="mid:OF09EA2B1E.321DECBE-ON002586AB.001A1805-002586AB.001A1866@notes.na.collabserv.com">
<div class="socmaildefaultfont" dir="ltr"
style="font-family:Arial, Helvetica, sans-serif;font-size:10pt">
<div dir="ltr">
<ol dir="ltr" start="3">
<li>For calibration fails (which it looks like you are
experiencing), I would recommend dumping the following
registers for rank 0<br>
DQS disable bits<br>
0x8000007d0701103f<br>
0x8000047d0701103f<br>
0x8000087d0701103f<br>
0x80000c7d0701103f<br>
0x8000107d0701103f<br>
<br>
DQ disable bits<br>
0x8000007c0701103f<br>
0x8000047c0701103f<br>
0x8000087c0701103f<br>
0x80000c7c0701103f<br>
0x8000107c0701103f<br>
<br>
If calibration is passing on a given DRAM, all of the bits
should be 0's. Fails are noted by 1's in the register. As
per all PHY registers only the right most 16 bits matter.</li>
</ol>
</div>
</div>
</blockquote>
<p>Here I can see some fails: all DQ bits on the first and second
DP16, and all configured DQS bits (0xc300 for the first and 0x3c00
for the second, which is consistent with the settings from [1]).
The remaining DP16s pass. This DIMM works with Hostboot, so I think
the clock bits are selected properly.</p>
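<p>A minimal Python sketch of how such a dump can be decoded,
following the note above that only the rightmost 16 bits matter. The
function name is hypothetical, and numbering bit 0 as the most
significant of those 16 bits is an assumption based on the usual IBM
convention, not something confirmed in this thread.</p>

```python
def failing_bits(scom_value, width=16):
    """Return positions of bits set in the low 16 bits of a SCOM read.

    Assumption: positions are counted MSB-first (bit 0 is the leftmost
    of the 16 bits). Flip the shift if LSB-first numbering is wanted.
    """
    low = scom_value & 0xFFFF  # only the rightmost 16 bits carry data
    return [i for i in range(width) if (low >> (width - 1 - i)) & 1]


# The two DQS values quoted above:
print(failing_bits(0x000000000000C300))  # [0, 1, 6, 7]
print(failing_bits(0x0000000000003C00))  # [2, 3, 4, 5]
```

<p>An empty list means no disable bits are set in that register.</p>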
<p>
I hadn't realized that these registers are updated by hardware and
then used as input for the next steps. Now I know that what I
thought was a successful write leveling was actually skipping bad
bits. I was misled by the fact that the second attempt took more
time than the first one, but that makes sense: it starts from a
higher initial delay and has a longer way to go down and up again,
if I understand this step correctly.</p>
<p>I went a step further and dumped all WR_DELAY_VALUE_x_RP0_REG
registers. For passing bits the value is somewhere in the range
0x1900-0x2b00, and every set of 8 DQ bits and its accompanying DQS
bit have the same value, which I believe is expected for x8 memory.
For failing bits the value is always 0x3a00 for DQ bits (and for
whatever is in DELAY_VALUE_16-22 that isn't configured as a DQS),
but 0x4200 for DQS bits. Unlike on the passing DP16s, these values
don't change between boots. They can change slightly when I modify
DDRPHY_WC_CONFIG1_P0, but there is still no pass.<br>
</p>
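<p>For sorting through a full dump, a small Python sketch along these
lines may help. The range is simply the one observed above for
passing bits, not a limit from any specification, and the function
name and the register names in the usage example are illustrative
only.</p>

```python
# Range reported above for passing bits; an observation, not a spec limit.
OBSERVED_PASS_RANGE = range(0x1900, 0x2B01)


def split_by_observed_range(dump):
    """Split {register_name: value} into (inside, outside) the observed range.

    Only the low 16 bits of each value are considered.
    """
    inside, outside = {}, {}
    for name, value in dump.items():
        low16 = value & 0xFFFF
        target = inside if low16 in OBSERVED_PASS_RANGE else outside
        target[name] = low16
    return inside, outside


# Illustrative input; the names follow the WR_DELAY_VALUE_x_RP0_REG pattern.
dump = {
    "WR_DELAY_VALUE_0_RP0_REG": 0x3A00,
    "WR_DELAY_VALUE_16_RP0_REG": 0x4200,
    "WR_DELAY_VALUE_1_RP0_REG": 0x1900,
}
inside, outside = split_by_observed_range(dump)
for name in sorted(outside):
    print(name, hex(outside[name]))
```

<p>Values that land outside the observed range and stay identical
across boots are the ones worth comparing against the disable
bits.</p>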
<blockquote type="cite"
cite="mid:OF09EA2B1E.321DECBE-ON002586AB.001A1805-002586AB.001A1866@notes.na.collabserv.com">
<div class="socmaildefaultfont" dir="ltr"
style="font-family:Arial, Helvetica, sans-serif;font-size:10pt">
<div dir="ltr">
<ol dir="ltr" start="4">
<li>To my knowledge, there should not be an issue sending
the RCW commands via i2c.</li>
<li>Running in our test environment, I am seeing the
following scoms for DQS align:
<div>
<div>CRONUSDEBUG(30807) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 070123A5
4000000000000000 # Stop CCS<br>
CRONUSDEBUG(30818) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 07012315
000000F0CC0000C0 # Configure init calibration<br>
CRONUSDEBUG(30823) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 07012335
0000000000000041 # Go to instruction 1<br>
CRONUSDEBUG(30826) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 07012316
000008F0CC000000 # don't do anything<br>
CRONUSDEBUG(30831) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 07012336
0000000000000020 # End CCS<br>
CRONUSDEBUG(30839) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 070123DB
0400000000000000 # Configure the port to run<br>
CRONUSDEBUG(30848) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 070123A5
8000000000000000 # Kick off CCS<br>
<br>
I hope that this trace helps.</div>
</div>
</li>
</ol>
</div>
</div>
</blockquote>
<p>So DDR_CAL_RANK in ARR1 is a number, and not a bit map of
selected ranks? That was my initial understanding, but then I
changed the code to treat it as a bit map. Still, fixing the code
doesn't help, even though the SCOM sequence is now identical to the
trace above.<br>
</p>
<p><br>
</p>
[1]
<a class="moz-txt-link-freetext" href="https://git.raptorcs.com/git/talos-hostboot/tree/src/import/chips/p9/procedures/hwp/memory/lib/phy/dp16.C#n1963">https://git.raptorcs.com/git/talos-hostboot/tree/src/import/chips/p9/procedures/hwp/memory/lib/phy/dp16.C#n1963</a><br>
<pre class="moz-signature" cols="72">--
Krystian Hebel
Firmware Engineer
<a class="moz-txt-link-freetext" href="https://3mdeb.com">https://3mdeb.com</a> | @3mdeb_com</pre>
</body>
</html>