[OpenPower-Firmware] Problem with CCS
Krystian Hebel
krystian.hebel at 3mdeb.com
Thu Apr 1 23:13:11 AEDT 2021
Hello,
I am currently working on an implementation of memory training in coreboot's
port for Talos II (POWER9). I'm stuck at DQS alignment, due to what I believe
is a CCS problem.
During this step CCS misbehaves: the first command gets written in place of the
last one, minus the GOTO_CMD field, which is zeroed. This results in an
infinite loop followed by a timeout. CCS works correctly for the previous
operations: MRS loading, ZQ calibration, write leveling and initial pattern
write. For loading RCD control words I'm using I2C instead of CCS; unlike
MR{0-6}, I haven't seen their state being mirrored in MC registers, so I think
this is acceptable. Please correct me if I'm wrong.
This is an example of a working command (initial pattern write):
Sending PHY calibration command 0x4000 to CCS - 1 instruction(s)
0Last ARR0 (1) = 0x000008f0cc000000
0Last ARR1 (1) = 0x0000000000000020, 38 us timeout
1Last ARR0 (1) = 0x000008f0cc000000
1Last ARR1 (1) = 0x0000000000000020
2Last ARR1 (1) = 0x0000000000000020, took 5 us
And this is the output for DQS alignment:
Sending PHY calibration command 0x2000 to CCS - 1 instruction(s)
0Last ARR0 (1) = 0x000008f0cc000000
0Last ARR1 (1) = 0x0000000000000020, 40 us timeout
1Last ARR0 (1) = 0x000000f0cc0000c0
1Last ARR1 (1) = 0x0000000000000440
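To compare the good and bad ARR1 values field by field, a small decoder can
help. This is only a sketch: the field positions are an assumption modelled on
Hostboot's P9 CCS code (IBM bit numbering, bit 0 = MSB: IDLES 0:15,
REPEAT_CMD_CNT 16:31, READ_OR_WRITE_DATA 32:51, READ_COMPARE_REQUIRED 52,
DDR_CAL_RANK 53:56, DDR_CALIBRATION_ENABLE 57, END 58, GOTO_CMD 59:63) and
should be checked against the MCBIST register documentation. The struct and
function names are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Extract a field from a 64-bit SCOM value using IBM bit numbering
 * (bit 0 is the MSB). Assumes 1 <= len <= 63 and start + len <= 64. */
static uint64_t ibm_field(uint64_t reg, unsigned start, unsigned len)
{
	return (reg >> (64 - start - len)) & ((UINT64_C(1) << len) - 1);
}

/* Hypothetical decoded view of CCS_INST_ARR1. The bit positions are an
 * ASSUMPTION based on Hostboot's P9 CCS code. */
struct ccs_arr1 {
	uint64_t idles;        /* bits 0:15 */
	uint64_t repeat_cnt;   /* bits 16:31 */
	uint64_t rw_data;      /* bits 32:51 */
	uint64_t read_compare; /* bit 52 */
	uint64_t cal_rank;     /* bits 53:56 */
	uint64_t cal_enable;   /* bit 57 */
	uint64_t end;          /* bit 58 */
	uint64_t goto_cmd;     /* bits 59:63 */
};

static struct ccs_arr1 decode_arr1(uint64_t v)
{
	struct ccs_arr1 a = {
		.idles        = ibm_field(v, 0, 16),
		.repeat_cnt   = ibm_field(v, 16, 16),
		.rw_data      = ibm_field(v, 32, 20),
		.read_compare = ibm_field(v, 52, 1),
		.cal_rank     = ibm_field(v, 53, 4),
		.cal_enable   = ibm_field(v, 57, 1),
		.end          = ibm_field(v, 58, 1),
		.goto_cmd     = ibm_field(v, 59, 5),
	};
	return a;
}

/* Print the decoded fields, e.g. print_arr1("bad", 0x440). */
static void print_arr1(const char *label, uint64_t v)
{
	struct ccs_arr1 a = decode_arr1(v);

	printf("%s 0x%016" PRIx64 ": idles=%" PRIu64 " repeat=%" PRIu64
	       " cal_rank=0x%" PRIx64 " cal_en=%" PRIu64 " end=%" PRIu64
	       " goto=%" PRIu64 "\n", label, v, a.idles, a.repeat_cnt,
	       a.cal_rank, a.cal_enable, a.end, a.goto_cmd);
}
```

Under this assumed layout, 0x20 decodes as END=1 with every other field zero,
while 0x440 decodes as DDR_CAL_RANK=0x8 and DDR_CALIBRATION_ENABLE=1, with
END=0 and GOTO_CMD=0.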
'0Last' is read just before setting CCS_CNTLQ_CCS_START, '1Last' after the
program succeeds or times out, and '2Last' is mostly there just to print the
elapsed time. The number in brackets is the index of the last instruction. The
full code can be found at [1], mostly in the files 'ccs.c' and 'istep_13_11.c'.
For even more info, I also read those registers right after setting CCS_START
and between the initial delay and polling. Those reads were removed from the
code as they heavily impacted the time calculation. For DQS alignment the bad
values were present immediately after CCS_START and held until the end, at
least at the points where they were read. What I find surprising is that just
after CCS_START the values change for working CCS programs too, but then they
return to normal in the reads after the initial delay.
I also dumped the error/status registers, which in most cases report no errors
(except for write leveling, which has to be run twice to complete successfully,
but that is another issue). For DQS alignment, the APB, SEQ and WC registers,
as well as all DP16 status registers, are all zeroes. These are the registers
that have any bits set:
0x0000000000001000 - RC_ERROR_STATUS0
0x0000000000000400 - PC_ERROR_STATUS0
0x0000000000002008 - PC_INIT_CAL_ERROR
0x0000000000000688 - DDRPHY_PC_INIT_CAL_STATUS
0x0000000000000080 - IOM_PHY0_DDRPHY_FIR_REG
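To look these bits up in IBM register documentation, it helps to convert the
raw values to IBM bit numbering, where bit 0 is the MSB of the 64-bit SCOM
word. A minimal sketch of such a helper (the name ibm_set_bits is
hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Store the positions of the set bits of value in IBM numbering (bit 0 is
 * the MSB of the 64-bit word), in ascending order, into out[]. At most max
 * positions are stored; the return value is the total number of set bits. */
static size_t ibm_set_bits(uint64_t value, unsigned out[], size_t max)
{
	size_t n = 0;

	for (unsigned bit = 0; bit < 64; bit++) {
		if (value & (UINT64_C(1) << (63 - bit))) {
			if (n < max)
				out[n] = bit;
			n++;
		}
	}
	return n;
}
```

For the dump above this gives bit 51 for RC_ERROR_STATUS0 (0x1000), bit 53 for
PC_ERROR_STATUS0 (0x400), bits 50 and 60 for PC_INIT_CAL_ERROR (0x2008), bits
53, 54, 56 and 60 for DDRPHY_PC_INIT_CAL_STATUS (0x688), and bit 56 for
IOM_PHY0_DDRPHY_FIR_REG (0x80).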
The values of INIT_CAL_STATUS and RC_ERROR_STATUS say there was an overflow of
the refresh pending counter, but whether it is a cause or a result of the CCS
error is beyond my current knowledge.
This is what I've tried, without success:
- playing with the settings in PC_INIT_CAL_CONFIG1: halving and zeroing
  REFRESH_COUNT and changing REFRESH_CONTROL between non-reserved values. I
  haven't touched REFRESH_ALL_RANKS because I'm testing on just one 1R x8
  DIMM anyway, so it shouldn't make a difference
- manually sending REF commands before calibration, both instead of and in
  addition to those configured in PC_INIT_CAL_CONFIG1
- increasing the timeout for this step, both the initial delay and the
  duration of polling
- re-running DQS alignment after the error
- sending CCS_STOP and waiting for completion before starting a new program
- adding delays between calibration steps (500 us, much more than 9*tREFI)
- doing initial pattern write and DQS calibration with one CCS instruction
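For scale: DDR4 allows at most 8 REF commands to be postponed, which is where
the 9*tREFI bound comes from. Assuming the nominal DDR4 tREFI of 7.8 us
(normal temperature range, 1x refresh mode), the arithmetic is sketched below;
the macro and function names are hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: DDR4 nominal tREFI (normal temperature range, 1x mode), in ns. */
#define DDR4_TREFI_NS 7800u
/* JEDEC DDR4 allows up to 8 postponed REF commands. */
#define DDR4_MAX_POSTPONED_REF 8u

/* Longest allowed gap between two REF commands: (postponed + 1) * tREFI. */
static uint32_t max_refresh_gap_ns(uint32_t trefi_ns, uint32_t max_postponed)
{
	return (max_postponed + 1) * trefi_ns;
}
```

With these values max_refresh_gap_ns(DDR4_TREFI_NS, DDR4_MAX_POSTPONED_REF) is
70200 ns (70.2 us), so a 500 us delay spans roughly 64 tREFI intervals.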
What am I missing? Are there any other SCOM registers I can read that would
help with debugging?
[1] https://github.com/3mdeb/coreboot/tree/istep_13_11/src/soc/ibm/power9
--
Krystian Hebel
Firmware Engineer
https://3mdeb.com | @3mdeb_com