[OpenPower-Firmware] Problem with CCS
Daniel M Crowell
dcrowell at us.ibm.com
Wed Apr 7 00:00:06 AEST 2021
Have you attempted to get a complete scom trace from the original Hostboot
code and compared it to the trace from your new code? That is a pretty
typical debug strategy on our side when migrating from the initial hardware
bringup scripts into the firmware implementation.
--
Dan Crowell
Senior Software Engineer - Power Systems Enablement Firmware
IBM Rochester: t/l 553-2987
dcrowell at us.ibm.com
From: Krystian Hebel <krystian.hebel at 3mdeb.com>
To: Daniel M Crowell <dcrowell at us.ibm.com>
Cc: firmware at 3mdeb.com, openpower-firmware at lists.ozlabs.org
Date: 04/06/2021 07:45 AM
Subject: [EXTERNAL] Re: [OpenPower-Firmware] Problem with CCS
Update: I have dealt with the write leveling issue. I accidentally shifted a
bit twice when trying to set PAR_A17_MASK in SEQ_CONTROL0, so it was left
unmasked.
Now I'm back to the initial issue with the loop in CCS. This time, however,
I see a difference between the original code (refresh on):
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000001000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000400 - PC_ERROR_STATUS0
0x0000000000002008 - PC_INIT_CAL_ERROR
0x0000000000000688 - DDRPHY_PC_INIT_CAL_STATUS
0x0000000000000080 - IOM_PHY0_DDRPHY_FIR_REG
and after setting DDRPHY_PC_INIT_CAL_CONFIG1_P0 as in previous mail:
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000001000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000000608 - DDRPHY_PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
PC_INIT_CAL_ERROR no longer reports an error, but DDRPHY_PC_INIT_CAL_STATUS
still doesn't report a success. No DQ/DQS bits are disabled, either with or
without refresh.
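
To make the difference between the two dumps easier to see, here is a small sketch that compares them register by register. The values are copied from the dumps above; the dictionary names and the diff_dumps helper are only illustrative and are not part of Hostboot or any other tool.

```python
# Register values copied from the two dumps above.
ORIGINAL = {  # original code, refresh on
    "APB_ERROR_STATUS0": 0x0000000000000000,
    "RC_ERROR_STATUS0": 0x0000000000001000,
    "SEQ_ERROR_STATUS0": 0x0000000000000000,
    "WC_ERROR_STATUS0": 0x0000000000000000,
    "PC_ERROR_STATUS0": 0x0000000000000400,
    "PC_INIT_CAL_ERROR": 0x0000000000002008,
    "DDRPHY_PC_INIT_CAL_STATUS": 0x0000000000000688,
    "IOM_PHY0_DDRPHY_FIR_REG": 0x0000000000000080,
}
MODIFIED = {  # after changing DDRPHY_PC_INIT_CAL_CONFIG1_P0
    "APB_ERROR_STATUS0": 0x0000000000000000,
    "RC_ERROR_STATUS0": 0x0000000000001000,
    "SEQ_ERROR_STATUS0": 0x0000000000000000,
    "WC_ERROR_STATUS0": 0x0000000000000000,
    "PC_ERROR_STATUS0": 0x0000000000000000,
    "PC_INIT_CAL_ERROR": 0x0000000000000000,
    "DDRPHY_PC_INIT_CAL_STATUS": 0x0000000000000608,
    "IOM_PHY0_DDRPHY_FIR_REG": 0x0000000000000000,
}


def diff_dumps(before, after):
    """Return {name: (before, after, xor)} for registers whose values differ."""
    return {
        name: (before[name], after[name], before[name] ^ after[name])
        for name in before
        if before[name] != after[name]
    }


changed = diff_dumps(ORIGINAL, MODIFIED)
for name, (old, new, xor) in changed.items():
    print(f"{name:28s} {old:#018x} -> {new:#018x}  (changed bits {xor:#018x})")
```

Only four registers differ between the runs; RC_ERROR_STATUS0 keeps the value 0x1000 in both.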
On 06.04.2021 12:28, Krystian Hebel wrote:
Hi Daniel,
Thanks for the quick and informative response.
I got these answers from one of our memory experts.
Hi Krystian,
1. IBM mostly uses x4 DIMMs. Is it possible to run with a
x4 DIMM for debug purposes to see if the problem
persists? This will help debug configuration issues with
the x8 DIMMs.
This may be difficult due to remote work, but I'll see what can be
done.
2. Have you tried disabling refresh to see if the issues go
away?
Is it enough to just modify DDRPHY_PC_INIT_CAL_CONFIG1_P0? If yes, I
changed all of REFRESH_COUNT, REFRESH_CONTROL and REFRESH_ALL_RANKS
to all 0's and REFRESH_INTERVAL to all 1's. It still fails the same
way, but a few microseconds faster than before.
3. For calibration fails (which it looks like you are
experiencing), I would recommend dumping the following
registers for rank 0
DQS disable bits
0x8000007d0701103f
0x8000047d0701103f
0x8000087d0701103f
0x80000c7d0701103f
0x8000107d0701103f
DQ disable bits
0x8000007c0701103f
0x8000047c0701103f
0x8000087c0701103f
0x80000c7c0701103f
0x8000107c0701103f
If calibration is passing on a given DRAM, all of the
bits should be 0's. Fails are noted by 1's in the
register. As with all PHY registers, only the rightmost 16
bits matter.
Here I can see some fails: all DQ bits on first and second DP16 and
all configured DQS bits (0xc300 for first and 0x3c00 for second,
which is consistent with settings from [1]). The rest of DP16s
passes. This DIMM works with Hostboot so I think clock bits are
selected properly.
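
As a rough illustration of how I read these registers, the sketch below masks off the rightmost 16 bits and lists the positions that are set, i.e. the bits marked as failed. The helper name failed_bits is made up for this mail, and numbering the positions from the left of the 16-bit field (position 0 is the most significant bit) is my assumption about the bit order, not something taken from the register documentation.

```python
def failed_bits(reg_value):
    """List set bit positions in the low 16 bits of a disable register.

    Position 0 is the leftmost (most significant) bit of the 16-bit field.
    This ordering is an assumption made for illustration.
    """
    field = reg_value & 0xFFFF  # only the rightmost 16 bits matter
    return [i for i in range(16) if field & (0x8000 >> i)]


# DQS disable values seen on the two failing DP16s
print(failed_bits(0xC300))  # first DP16
print(failed_bits(0x3C00))  # second DP16
print(failed_bits(0x0000))  # a passing DP16 reports no failed bits
```

A register that reads all zeros in its low 16 bits gives an empty list, which matches the "all of the bits should be 0's" condition for a passing DRAM.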
I hadn't realized that these are updated by hardware and then used
as an input for the next steps. Now I know that what I thought was a
successful write leveling was actually skipping bad bits. I was
misled by the fact that the second attempt took more time than the
first one, but it makes sense, as it starts from a higher initial
delay and has a longer way to go down and up again, if I understand
this step correctly.
I went a step further and dumped all WR_DELAY_VALUE_x_RP0_REG. For
passed bits the value is somewhere in the range 0x1900-0x2b00, where
every set of 8 DQ bits and its accompanying DQS bit have the same value,
which I believe is expected for x8 memory. For failed bits this value is
always 0x3a00 for DQ bits (and for whatever is in DELAY_VALUE_16-22 that
isn't configured as a DQS), but 0x4200 for DQS bits. Unlike on the
passing DP16s, these values don't change between boots. They can
change slightly when I modify DDRPHY_WC_CONFIG1_P0, but still no
pass.
4. To my knowledge, there should not be an issue sending the
RCW commands via i2c.
5. Running in our test environment, I am seeing the
following scoms for DQS align:
CRONUSDEBUG(30807) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 070123A5 4000000000000000 # Stop CCS
CRONUSDEBUG(30818) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 07012315 000000F0CC0000C0 # Configure init calibration
CRONUSDEBUG(30823) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 07012335 0000000000000041 # Go to instruction 1
CRONUSDEBUG(30826) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 07012316 000008F0CC000000 # don't do anything
CRONUSDEBUG(30831) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 07012336 0000000000000020 # End CCS
CRONUSDEBUG(30839) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 070123DB 0400000000000000 # Configure the port to run
CRONUSDEBUG(30848) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 070123A5 8000000000000000 # Kick off CCS
I hope that this trace helps.
So, DDR_CAL_RANK in ARR1 is a number, and not a bit map of selected
ranks? That was my initial understanding, but then I changed the code
to treat it as a bit map. Still, fixing the code doesn't help, even
though now it is identical to the trace above.
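
For reference, this is roughly how the comparison against the trace can be automated. The sketch parses the PUTSCOM lines quoted above into (address, data) pairs and reports the first write that differs from another sequence. parse_putscoms and first_mismatch are hypothetical helper names, the regular expression is only my guess at the line layout based on the seven lines shown, and the list of writes coming from the new code would have to be collected separately.

```python
import re

# Trace text copied from the mail above.
TRACE = """
CRONUSDEBUG(30807) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 070123A5 4000000000000000 # Stop CCS
CRONUSDEBUG(30818) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 07012315 000000F0CC0000C0 # Configure init calibration
CRONUSDEBUG(30823) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 07012335 0000000000000041 # Go to instruction 1
CRONUSDEBUG(30826) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 07012316 000008F0CC000000 # don't do anything
CRONUSDEBUG(30831) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 07012336 0000000000000020 # End CCS
CRONUSDEBUG(30839) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 070123DB 0400000000000000 # Configure the port to run
CRONUSDEBUG(30848) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 : 070123A5 8000000000000000 # Kick off CCS
"""

# target, 8-digit hex address, 16-digit hex data; \s also tolerates wrapped lines
PUTSCOM_RE = re.compile(
    r"PUTSCOM\s*:\s*(\S+)\s*:\s*([0-9A-Fa-f]{8})\s+([0-9A-Fa-f]{16})"
)


def parse_putscoms(text):
    """Return the writes in order as a list of (address, data) integers."""
    return [(int(a, 16), int(d, 16)) for _t, a, d in PUTSCOM_RE.findall(text)]


def first_mismatch(expected, actual):
    """Return the index of the first differing write, or None if identical."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


reference = parse_putscoms(TRACE)
for addr, data in reference:
    print(f"{addr:08X} {data:016X}")
```

Calling first_mismatch(reference, my_writes) with the writes captured from the new code returns None when both sequences are identical, which is the state I am in now.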
[1]
https://git.raptorcs.com/git/talos-hostboot/tree/src/import/chips/p9/procedures/hwp/memory/lib/phy/dp16.C#n1963
--
Krystian Hebel
Firmware Engineer
https://3mdeb.com | @3mdeb_com