[OpenPower-Firmware] Problem with CCS
Krystian Hebel
krystian.hebel at 3mdeb.com
Tue Apr 6 20:28:54 AEST 2021
Hi Daniel,
Thanks for the quick and informative response.
> I got these answers from one of our memory experts.
> Hi Krystian,
>
> 1. IBM mostly uses x4 DIMM's. Is it possible to run with a x4 DIMM
> for debug purposes to see if the problem persists? This will help
> debug configuration issues with the x8 DIMM's
>
This may be difficult due to remote work, but I'll see what can be done.
>
> 2. Have you tried disabling refresh to see if the issues go away?
>
Is it enough to modify only DDRPHY_PC_INIT_CAL_CONFIG1_P0? If so, I
set REFRESH_COUNT, REFRESH_CONTROL and REFRESH_ALL_RANKS to all 0's
and REFRESH_INTERVAL to all 1's. It still fails the same way, only a
few microseconds sooner than before.
>
> 3. For calibration fails (which it looks like you are experiencing),
> I would recommend dumping the following registers for rank 0
> DQS disable bits
> 0x8000007d0701103f
> 0x8000047d0701103f
> 0x8000087d0701103f
> 0x80000c7d0701103f
> 0x8000107d0701103f
>
> DQ disable bits
> 0x8000007c0701103f
> 0x8000047c0701103f
> 0x8000087c0701103f
> 0x80000c7c0701103f
> 0x8000107c0701103f
>
> If calibration is passing on a given DRAM, all of the bits should
> be 0's. Fails are noted by 1's in the register. As per all PHY
> registers only the right most 16 bits matter.
>
Here I can see some fails: all DQ bits on the first and second DP16,
and all configured DQS bits (0xc300 for the first and 0x3c00 for the
second, which is consistent with the settings from [1]). The remaining
DP16s pass. This DIMM works with Hostboot, so I think the clock bits
are selected properly.
I hadn't realized that these registers are updated by the hardware and
then used as input for the next steps. Now I know that what I thought
was a successful write leveling was actually skipping the bad bits. I
was misled by the fact that the second attempt took more time than the
first one, but that makes sense, as it starts from a higher initial
delay and has a longer way to go down and up again, if I understand
this step correctly.
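For anyone repeating this dump, the register list and the bit check can
be scripted roughly as below. This is a minimal sketch, not Hostboot
code: the helper names are hypothetical, the address stride is inferred
only from the ten addresses quoted above, and numbering the leftmost of
the 16 data bits as bit 0 (IBM style) is an assumption.

```python
# Sketch only: helper names are hypothetical, and the address layout is
# inferred from the ten addresses quoted in this thread.

DQS_DISABLE_BASE = 0x8000007D0701103F  # first DQS disable address quoted above
DQ_DISABLE_BASE = 0x8000007C0701103F   # first DQ disable address quoted above
DP16_STRIDE = 0x04 << 40               # difference between consecutive addresses
NUM_DP16 = 5                           # five addresses are listed per register


def disable_reg_addresses(base):
    """Return the five per-DP16 addresses derived from the first one."""
    return [base + n * DP16_STRIDE for n in range(NUM_DP16)]


def failed_bits(reg_value):
    """Return the positions of set bits in the rightmost 16 bits.

    Position 0 is the leftmost of those 16 bits (assumed IBM numbering).
    An empty list means no bit is flagged in that register.
    """
    field = reg_value & 0xFFFF
    return [i for i in range(16) if field & (0x8000 >> i)]
```

With this numbering, failed_bits(0xC300) gives [0, 1, 6, 7] and
failed_bits(0x3C00) gives [2, 3, 4, 5], so each of the two DQS values
above has four bits set.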
I went a step further and dumped all WR_DELAY_VALUE_x_RP0_REG
registers. For passing bits the value is somewhere in the range
0x1900-0x2b00, and every set of 8 DQ bits and its accompanying DQS bit
share the same value, which I believe is expected for x8 memory. For
failed bits the value is always 0x3a00 for DQ bits (and for whatever is
in DELAY_VALUE_16-22 that isn't configured as a DQS), but 0x4200 for
DQS bits. Unlike on the passing DP16s, these values don't change
between boots. They change slightly when I modify
DDRPHY_WC_CONFIG1_P0, but calibration still doesn't pass.
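The pattern described above can be checked mechanically over a whole
dump. A rough sketch follows, assuming the register values have already
been read into plain integers. The function names are hypothetical, and
the constants only restate the values observed in this message; they
are not documented limits.

```python
# Sketch only: thresholds are the values observed in this message,
# not limits taken from any specification.

PASS_RANGE = (0x1900, 0x2B00)  # range seen on passing bits
STUCK_DQ = 0x3A00              # value seen on failed DQ bits
STUCK_DQS = 0x4200             # value seen on failed DQS bits


def classify_wr_delay(value):
    """Label one WR_DELAY_VALUE reading using its rightmost 16 bits."""
    v = value & 0xFFFF
    if v == STUCK_DQ:
        return "stuck-dq"
    if v == STUCK_DQS:
        return "stuck-dqs"
    if PASS_RANGE[0] <= v <= PASS_RANGE[1]:
        return "in-observed-pass-range"
    return "unknown"


def lane_is_consistent(dq_values, dqs_value):
    """True if all 8 DQ delays of an x8 group equal the DQS delay."""
    return len(dq_values) == 8 and all(v == dqs_value for v in dq_values)
```

A reading labelled "unknown" is not necessarily bad; it only means the
value was not among those seen so far.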
> 4. To my knowledge, there should not be an issue sending the RCW
> commands via i2c.
> 5. Running in our test environment, I am seeing the following scoms
> for DQS align:
> CRONUSDEBUG(30807) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 :
> 070123A5 4000000000000000 # Stop CCS
> CRONUSDEBUG(30818) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 :
> 07012315 000000F0CC0000C0 # Configure init calibration
> CRONUSDEBUG(30823) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 :
> 07012335 0000000000000041 # Go to instruction 1
> CRONUSDEBUG(30826) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 :
> 07012316 000008F0CC000000 # don't do anything
> CRONUSDEBUG(30831) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 :
> 07012336 0000000000000020 # End CCS
> CRONUSDEBUG(30839) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 :
> 070123DB 0400000000000000 # Configure the port to run
> CRONUSDEBUG(30848) : PUTSCOM : p9n.mcbist:k0:n0:s0:p01:c1 :
> 070123A5 8000000000000000 # Kick off CCS
>
> I hope that this trace helps.
>
So, DDR_CAL_RANK in ARR1 is a number, and not a bit map of selected
ranks? That was my initial understanding, but then I changed the code
to treat it as a bit map. Still, reverting to the numeric
interpretation doesn't help, even though the SCOM sequence is now
identical to the trace above.
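To compare a firmware's SCOM writes against a reference trace like the
one quoted above without doing it by eye, something along these lines
may help. It is only a sketch: the function names are hypothetical, and
the line format is assumed from the quoted CRONUSDEBUG output (an
8-digit address followed by a 16-digit data word, possibly wrapped onto
the next line and prefixed with email quote markers).

```python
import re

# Matches e.g. "PUTSCOM : <target> : 070123A5 4000000000000000".
# The format is assumed from the trace quoted in this thread.
PUTSCOM_RE = re.compile(
    r"PUTSCOM\s*:\s*\S+\s*:\s*([0-9A-Fa-f]{8})\s+([0-9A-Fa-f]{16})"
)


def parse_putscoms(text):
    """Extract (address, data) pairs from a pasted trace, in order."""
    # Drop email quote markers, then join wrapped lines before matching.
    lines = [re.sub(r"^\s*(>\s?)+", "", ln).strip() for ln in text.splitlines()]
    joined = " ".join(lines)
    return [(int(a, 16), int(d, 16)) for a, d in PUTSCOM_RE.findall(joined)]


def first_mismatch(reference, actual):
    """Return the index of the first differing write, or None if equal."""
    for i, (ref, act) in enumerate(zip(reference, actual)):
        if ref != act:
            return i
    if len(reference) != len(actual):
        return min(len(reference), len(actual))
    return None
```

Feeding the reference trace to parse_putscoms() and passing the result
to first_mismatch() together with the list of writes logged by the
firmware under test points at the first write that differs in address,
data, or order.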
[1]
https://git.raptorcs.com/git/talos-hostboot/tree/src/import/chips/p9/procedures/hwp/memory/lib/phy/dp16.C#n1963
--
Krystian Hebel
Firmware Engineer
https://3mdeb.com | @3mdeb_com