<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
</head>
<body>
<p>No, I haven't. How can I get it?<br>
</p>
<div class="moz-cite-prefix">On 06.04.2021 16:00, Daniel M Crowell
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:OF3A5A2C8C.6340E7C7-ON862586AF.004CC354-862586AF.004CEA27@notes.na.collabserv.com">
<meta http-equiv="content-type" content="text/html;
charset=windows-1252">
<p><font size="2">Have you attempted to get a complete scom trace
from the original Hostboot code and compare it to your new
code? That is a pretty typical debug strategy on our side when
migrating from the initial hardware bringup scripts into the
firmware implementation.</font><br>
<font size="2"><br>
--<br>
Dan Crowell<br>
Senior Software Engineer - Power Systems Enablement Firmware<br>
IBM Rochester: t/l 553-2987<br>
<a class="moz-txt-link-abbreviated" href="mailto:dcrowell@us.ibm.com">dcrowell@us.ibm.com</a></font><br>
<br>
<br>
<font size="2" color="#5F5F5F">From: </font><font size="2">Krystian
Hebel <a class="moz-txt-link-rfc2396E" href="mailto:krystian.hebel@3mdeb.com">&lt;krystian.hebel@3mdeb.com&gt;</a></font><br>
<font size="2" color="#5F5F5F">To: </font><font size="2">Daniel
M Crowell <a class="moz-txt-link-rfc2396E" href="mailto:dcrowell@us.ibm.com">&lt;dcrowell@us.ibm.com&gt;</a></font><br>
<font size="2" color="#5F5F5F">Cc: </font><font size="2"><a class="moz-txt-link-abbreviated" href="mailto:firmware@3mdeb.com">firmware@3mdeb.com</a>,
<a class="moz-txt-link-abbreviated" href="mailto:openpower-firmware@lists.ozlabs.org">openpower-firmware@lists.ozlabs.org</a></font><br>
<font size="2" color="#5F5F5F">Date: </font><font size="2">04/06/2021
07:45 AM</font><br>
<font size="2" color="#5F5F5F">Subject: </font><font size="2">[EXTERNAL]
Re: [OpenPower-Firmware] Problem with CCS</font><br>
</p>
<hr style="color:#8091A5; " width="100%" size="2"
noshade="noshade" align="left"><br>
<br>
<br>
<p>Update: I have dealt with the write leveling issue. I had
accidentally shifted a bit twice when trying to set PAR_A17_MASK in
SEQ_CONTROL0, so it was left unmasked.
</p>
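<p>A minimal sketch of how a double shift can leave a mask bit unset,
assuming IBM bit numbering (bit 0 is the most significant bit of a
64-bit SCOM value). The helper <tt>ppc_bit</tt> and the bit position
62 are hypothetical and chosen only for illustration; the real position
of PAR_A17_MASK in SEQ_CONTROL0 is not given in this thread.</p>

```python
def ppc_bit(n, width=64):
    """Return the mask for bit n in IBM numbering (bit 0 is the MSB)."""
    return 1 << (width - 1 - n)


# Hypothetical bit position, used only to illustrate the bug pattern.
PAR_A17_MASK_BIT = 62

# Correct: convert the bit number to a mask exactly once.
correct = ppc_bit(PAR_A17_MASK_BIT)

# Bug pattern: the value is already a mask, but it is shifted again
# as if it were still a bit number.
shifted_twice = correct << (63 - PAR_A17_MASK_BIT)

reg = 0
reg |= shifted_twice

# The intended bit was never set, so the error stays unmasked.
still_unmasked = (reg & correct) == 0
print(hex(correct), hex(shifted_twice), still_unmasked)
```

<p>Reading the register back and comparing it with the intended mask
is a cheap way to catch this class of mistake.</p>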
<p>Now I'm back to the initial issue with the loop in CCS. This time,
however, I see a difference between the original code (refresh on):
</p>
<p> 0x0000000000000000 - APB_ERROR_STATUS0<br>
0x0000000000001000 - RC_ERROR_STATUS0<br>
0x0000000000000000 - SEQ_ERROR_STATUS0<br>
0x0000000000000000 - WC_ERROR_STATUS0<br>
0x0000000000000400 - PC_ERROR_STATUS0<br>
0x0000000000002008 - PC_INIT_CAL_ERROR<br>
0x0000000000000688 - DDRPHY_PC_INIT_CAL_STATUS<br>
0x0000000000000080 - IOM_PHY0_DDRPHY_FIR_REG
</p>
<p>and the code after setting DDRPHY_PC_INIT_CAL_CONFIG1_P0 as in the
previous mail:<br>
<br>
0x0000000000000000 - APB_ERROR_STATUS0<br>
0x0000000000001000 - RC_ERROR_STATUS0<br>
0x0000000000000000 - SEQ_ERROR_STATUS0<br>
0x0000000000000000 - WC_ERROR_STATUS0<br>
0x0000000000000000 - PC_ERROR_STATUS0<br>
0x0000000000000000 - PC_INIT_CAL_ERROR<br>
0x0000000000000608 - DDRPHY_PC_INIT_CAL_STATUS<br>
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
</p>
<p>PC_INIT_CAL_ERROR no longer reports an error, but
DDRPHY_PC_INIT_CAL_STATUS still doesn't report success. No
DQ/DQS bits are disabled, either with or without refresh.
</p>
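<p>The two dumps above can be compared mechanically. The sketch below
uses only the values quoted in this mail; the dictionary layout and the
function name <tt>changed_bits</tt> are illustrative choices, not part
of any firmware code.</p>

```python
# Register dumps copied from the two listings above.
before = {
    "APB_ERROR_STATUS0": 0x0000,
    "RC_ERROR_STATUS0": 0x1000,
    "SEQ_ERROR_STATUS0": 0x0000,
    "WC_ERROR_STATUS0": 0x0000,
    "PC_ERROR_STATUS0": 0x0400,
    "PC_INIT_CAL_ERROR": 0x2008,
    "DDRPHY_PC_INIT_CAL_STATUS": 0x0688,
    "IOM_PHY0_DDRPHY_FIR_REG": 0x0080,
}
after = {
    "APB_ERROR_STATUS0": 0x0000,
    "RC_ERROR_STATUS0": 0x1000,
    "SEQ_ERROR_STATUS0": 0x0000,
    "WC_ERROR_STATUS0": 0x0000,
    "PC_ERROR_STATUS0": 0x0000,
    "PC_INIT_CAL_ERROR": 0x0000,
    "DDRPHY_PC_INIT_CAL_STATUS": 0x0608,
    "IOM_PHY0_DDRPHY_FIR_REG": 0x0000,
}


def changed_bits(a, b):
    """Map each register that differs to the XOR of its two values."""
    return {name: a[name] ^ b[name] for name in a if a[name] != b[name]}


diff = changed_bits(before, after)
for name, bits in diff.items():
    print(f"{name}: {before[name]:#06x} -> {after[name]:#06x} (changed {bits:#06x})")
```

<p>Four registers differ between the runs, while RC_ERROR_STATUS0 reads
0x1000 in both.</p>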
<p>On 06.04.2021 12:28, Krystian Hebel wrote:
</p>
<ul>
<ul>
Hi Daniel,
<p>Thanks for the quick and informative response.
</p>
<ul>
<ul>
<font size="2" face="Arial">I got these answers from one
of our memory experts.</font><br>
<font size="2" face="Arial"> </font><br>
<font size="2" face="Arial">Hi Krystian,</font>
<ul>
<font size="2">1. </font><font size="2" face="Arial">IBM
mostly uses x4 DIMMs. Is it possible to run with a x4
DIMM for debug purposes to see if the problem
persists? This will help debug configuration issues
with the x8 DIMMs.</font>
</ul>
</ul>
</ul>
This may be difficult due to remote work, but I'll see what
can be done.
<ul>
<ul>
<ul>
<font size="2">2. </font><font size="2" face="Arial">Have
you tried disabling refresh to see if the issues go
away?</font>
</ul>
</ul>
</ul>
Is it enough to just modify DDRPHY_PC_INIT_CAL_CONFIG1_P0? If
yes, I changed all of REFRESH_COUNT, REFRESH_CONTROL and
REFRESH_ALL_RANKS to all 0's and REFRESH_INTERVAL to all 1's.
It still fails the same way, but a few microseconds faster
than before.
<ul>
<ul>
<ul>
<font size="2">3. </font><font size="2" face="Arial">For
calibration fails (which it looks like you are
experiencing), I would recommend dumping the following
registers for rank 0<br>
DQS disable bits<br>
0x8000007d0701103f<br>
0x8000047d0701103f<br>
0x8000087d0701103f<br>
0x80000c7d0701103f<br>
0x8000107d0701103f<br>
<br>
DQ disable bits<br>
0x8000007c0701103f<br>
0x8000047c0701103f<br>
0x8000087c0701103f<br>
0x80000c7c0701103f<br>
0x8000107c0701103f<br>
<br>
If calibration is passing on a given DRAM, all of the
bits should be 0's. Fails are noted by 1's in the
register. As with all PHY registers, only the rightmost
16 bits matter.</font>
</ul>
</ul>
</ul>
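<p>The ten addresses listed above follow a regular pattern, so they can
be generated rather than typed by hand. This is a sketch: the stride
constant is inferred purely from the five addresses in each list, and
the names <tt>DP16_STRIDE</tt> and <tt>dp16_addresses</tt> are
hypothetical.</p>

```python
# Base addresses taken from the lists above (first DP16, rank 0).
DQS_DISABLE_BASE = 0x8000007D0701103F
DQ_DISABLE_BASE = 0x8000007C0701103F

# Assumption: inferred from the difference between consecutive addresses.
DP16_STRIDE = 0x0000040000000000


def dp16_addresses(base, count=5):
    """Return the register address for each of the DP16 instances."""
    return [base + i * DP16_STRIDE for i in range(count)]


for addr in dp16_addresses(DQS_DISABLE_BASE) + dp16_addresses(DQ_DISABLE_BASE):
    print(f"{addr:#018x}")
```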
Here I can see some fails: all DQ bits on the first and second
DP16 and all configured DQS bits (0xc300 for the first and 0x3c00
for the second, which is consistent with the settings from [1]). The
rest of the DP16s pass. This DIMM works with Hostboot, so I think
the clock bits are selected properly.
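<p>A small sketch for turning a disable-register value into a list of
failing bit positions. It keeps only the rightmost 16 bits, as advised
above, and numbers them MSB-first (0 to 15); that numbering is an
assumption based on the usual IBM convention, and
<tt>failing_bits</tt> is a hypothetical helper name.</p>

```python
def failing_bits(value):
    """List the set bits in the low 16 bits, numbered MSB-first (0..15)."""
    low16 = value & 0xFFFF
    return [bit for bit in range(16) if low16 & (0x8000 >> bit)]


# DQS disable values reported for the first and second DP16.
print(failing_bits(0xC300))  # [0, 1, 6, 7]
print(failing_bits(0x3C00))  # [2, 3, 4, 5]
```

<p>An empty list means no bit in that register was marked as failed.</p>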
<p>I hadn't realized that these are updated by hardware and
then used as input for the next steps. Now I know that what I
thought was a successful write leveling was actually skipping
bad bits. I was misled by the fact that the second attempt
took more time than the first one, but that makes sense, as it
starts from a higher initial delay and has a longer way to
go down and up again, if I understand this step correctly.
</p>
<p>I went a step further and dumped all
WR_DELAY_VALUE_x_RP0_REG registers. For passing bits the value
is somewhere in the range 0x1900-0x2b00, where every set of 8 DQ
bits and its accompanying DQS bit have the same value, which I
believe is expected for x8 memory. For failing bits the value is
always 0x3a00 for DQ bits (and for whatever in DELAY_VALUE_16-22
isn't configured as a DQS), but 0x4200 for DQS bits.
Unlike on the passing DP16s, these values don't change between
boots. They can change slightly when I modify
DDRPHY_WC_CONFIG1_P0, but there is still no pass.
</p>
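<p>The observations above can be turned into a quick triage helper for
a dump of write delay values. This is only a sketch: the range and the
two fixed values are what was seen on this one board, not documented
limits, and all names are hypothetical.</p>

```python
# Values observed on this board only; not documented thresholds.
PASS_RANGE = (0x1900, 0x2B00)
STUCK_DQ = 0x3A00
STUCK_DQS = 0x4200


def classify_delay(value):
    """Label a WR_DELAY_VALUE reading based on the observed patterns."""
    v = value & 0xFFFF  # only the rightmost 16 bits matter
    if v == STUCK_DQ:
        return "stuck (DQ pattern)"
    if v == STUCK_DQS:
        return "stuck (DQS pattern)"
    if PASS_RANGE[0] <= v <= PASS_RANGE[1]:
        return "in observed passing range"
    return "unknown"


for sample in (0x1900, 0x2B00, 0x3A00, 0x4200):
    print(f"{sample:#06x}: {classify_delay(sample)}")
```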
<ul>
<ul>
<ul>
<font size="2">4. </font><font size="2" face="Arial">To
my knowledge, there should not be an issue sending the
RCW commands via i2c.</font><br>
<font size="2">5. </font><font size="2" face="Arial">Running
in our test environment, I am seeing the following
scoms for DQS align: </font>
<ul>
<font size="2" face="Arial">CRONUSDEBUG(30807) :
PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 :
070123A5 4000000000000000 # Stop CCS<br>
CRONUSDEBUG(30818) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 07012315
000000F0CC0000C0 # Configure init calibration<br>
CRONUSDEBUG(30823) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 07012335
0000000000000041 # Go to instruction 1<br>
CRONUSDEBUG(30826) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 07012316
000008F0CC000000 # don't do anything<br>
CRONUSDEBUG(30831) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 07012336
0000000000000020 # End CCS<br>
CRONUSDEBUG(30839) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 070123DB
0400000000000000 # Configure the port to run<br>
CRONUSDEBUG(30848) : PUTSCOM :
p9n.mcbist:k0:n0:s0:p01:c1 : 070123A5
8000000000000000 # Kick off CCS<br>
<br>
I hope that this trace helps.</font>
</ul>
</ul>
</ul>
</ul>
So, DDR_CAL_RANK in ARR1 is a number, and not a bit map of
selected ranks? That was my initial understanding, but then I
changed the code to treat it as a bit map. Still, fixing the
code doesn't help, even though now it is identical to the
trace above.
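<p>To check that a new implementation really is identical to the trace
above, the PUTSCOM lines can be parsed and compared entry by entry. The
sketch below assumes each trace entry sits on a single line in the
format quoted above; the regular expression and the helper names are
illustrative, not part of Cronus or Hostboot.</p>

```python
import re

# The quoted trace, with each entry joined onto one line.
TRACE = """\
CRONUSDEBUG(30807) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 070123A5 4000000000000000 # Stop CCS
CRONUSDEBUG(30818) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 07012315 000000F0CC0000C0 # Configure init calibration
CRONUSDEBUG(30823) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 07012335 0000000000000041 # Go to instruction 1
CRONUSDEBUG(30826) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 07012316 000008F0CC000000 # don't do anything
CRONUSDEBUG(30831) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 07012336 0000000000000020 # End CCS
CRONUSDEBUG(30839) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 070123DB 0400000000000000 # Configure the port to run
CRONUSDEBUG(30848) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 070123A5 8000000000000000 # Kick off CCS
"""

PUTSCOM_RE = re.compile(
    r"PUTSCOM\s*:\s*(\S+)\s*:\s*([0-9A-Fa-f]{8})\s+([0-9A-Fa-f]{16})"
)


def parse_putscoms(text):
    """Return (address, data) pairs in the order they appear."""
    return [(int(m.group(2), 16), int(m.group(3), 16))
            for m in PUTSCOM_RE.finditer(text)]


def first_mismatch(reference, candidate):
    """Index of the first differing write, or None if both match."""
    for i, (ref, got) in enumerate(zip(reference, candidate)):
        if ref != got:
            return i
    if len(reference) != len(candidate):
        return min(len(reference), len(candidate))
    return None


reference = parse_putscoms(TRACE)
for addr, data in reference:
    print(f"{addr:08X} {data:016X}")
```

<p>Feeding the writes logged by the new code into
<tt>first_mismatch</tt> together with <tt>reference</tt> points at the
first write that differs, in order or in value.</p>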
<p>[1] <a
href="https://git.raptorcs.com/git/talos-hostboot/tree/src/import/chips/p9/procedures/hwp/memory/lib/phy/dp16.C#n1963"
moz-do-not-send="true"><u><font color="#0000FF">https://git.raptorcs.com/git/talos-hostboot/tree/src/import/chips/p9/procedures/hwp/memory/lib/phy/dp16.C#n1963</font></u></a><br>
<tt>-- <br>
Krystian Hebel<br>
Firmware Engineer<br>
</tt><a href="https://3mdeb.com" moz-do-not-send="true"><tt><u><font
color="#0000FF">https://3mdeb.com</font></u></tt></a><tt> |
@3mdeb_com</tt></p>
</ul>
</ul>
</blockquote>
<pre class="moz-signature" cols="72">--
Krystian Hebel
Firmware Engineer
<a class="moz-txt-link-freetext" href="https://3mdeb.com">https://3mdeb.com</a> | @3mdeb_com</pre>
</body>
</html>