[OpenPower-Firmware] Problem with CCS

Krystian Hebel krystian.hebel at 3mdeb.com
Thu Apr 1 23:13:11 AEDT 2021


Hello,

I am currently working on implementing memory training in coreboot's port for
Talos II (POWER9). I'm stuck at DQS alignment because of what I believe is a
CCS problem.

During this step CCS misbehaves: the first command gets written in place of
the last one, minus the GOTO_CMD field, which is zeroed. This results in an
infinite loop followed by a timeout. CCS works correctly for the previous
operations: MRS loading, ZQ calibration, write leveling and initial pattern
write. For loading RCD control words I'm using I2C instead of CCS. Contrary to
MR{0-6}, I haven't seen its state being mirrored in MC registers, so I think
this is acceptable; please correct me if I'm wrong.

This is an example of a working command (initial pattern write):

     Sending PHY calibration command 0x4000 to CCS - 1 instruction(s)
     0Last ARR0 (1) = 0x000008f0cc000000
     0Last ARR1 (1) = 0x0000000000000020, 38 us timeout
     1Last ARR0 (1) = 0x000008f0cc000000
     1Last ARR1 (1) = 0x0000000000000020
     2Last ARR1 (1) = 0x0000000000000020, took 5 us

and this is for DQS alignment:

     Sending PHY calibration command 0x2000 to CCS - 1 instruction(s)
     0Last ARR0 (1) = 0x000008f0cc000000
     0Last ARR1 (1) = 0x0000000000000020, 40 us timeout
     1Last ARR0 (1) = 0x000000f0cc0000c0
     1Last ARR1 (1) = 0x0000000000000440
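
To make the difference between the '0Last' and '1Last' values of the DQS
alignment run easier to see, here is a small offline helper in Python (not part
of the coreboot code) that XORs the two dumps and lists the bits that changed.
It assumes the usual IBM numbering, where bit 0 is the most significant bit of
the 64-bit register. The function name is only for illustration, and the sketch
makes no claim about which CCS fields those bits belong to.

```python
def changed_bits(before, after, width=64):
    """Return IBM-numbered (bit 0 = MSB) positions that differ."""
    diff = before ^ after
    return [i for i in range(width) if diff & (1 << (width - 1 - i))]

# Values copied from the DQS alignment log above ('0Last' vs '1Last').
arr0_changed = changed_bits(0x000008f0cc000000, 0x000000f0cc0000c0)
arr1_changed = changed_bits(0x0000000000000020, 0x0000000000000440)

print("ARR0 changed bits:", arr0_changed)
print("ARR1 changed bits:", arr1_changed)
```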

'0Last' is read just before setting CCS_CNTLQ_CCS_START, '1Last' is read after
the program succeeds or times out, and '2Last' is mostly there just to print
the elapsed time. The number in brackets is the index of the last instruction.
The full code can be found at [1], mostly in the files 'ccs.c' and
'istep_13_11.c'.

For even more info, I also read those registers right after setting CCS_START
and between the initial delay and polling. Those lines have been removed from
the code because they heavily impacted the time calculation. For DQS alignment
the bad values were present immediately after CCS_START and held until the
end, at least at the points where they were read. What I find surprising is
that just after CCS_START the values also change for working CCS programs, but
then they return to normal in reads after the initial timeout.

I also dumped the error/status registers, which in most cases report no errors
(except for write leveling, which has to be run twice to complete
successfully, but that is another issue). For DQS alignment, the APB, SEQ and
WC registers, as well as all DP16 status registers, are all zeroes. This is
the list of registers which have any bits set:

     0x0000000000001000 - RC_ERROR_STATUS0
     0x0000000000000400 - PC_ERROR_STATUS0
     0x0000000000002008 - PC_INIT_CAL_ERROR
     0x0000000000000688 - DDRPHY_PC_INIT_CAL_STATUS
     0x0000000000000080 - IOM_PHY0_DDRPHY_FIR_REG
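
For convenience, the set bits of the values above can be listed with a few
lines of Python. This is only a sketch for reading the dump: it assumes IBM bit
numbering (bit 0 is the MSB of the 64-bit value), the helper name is made up,
and it does not map the bits to field names, which would have to come from the
register documentation.

```python
def set_bits(value, width=64):
    """Return IBM-numbered (bit 0 = MSB) positions of the bits set in value."""
    return [i for i in range(width) if value & (1 << (width - 1 - i))]

# Register values copied from the list above.
regs = {
    "RC_ERROR_STATUS0": 0x0000000000001000,
    "PC_ERROR_STATUS0": 0x0000000000000400,
    "PC_INIT_CAL_ERROR": 0x0000000000002008,
    "DDRPHY_PC_INIT_CAL_STATUS": 0x0000000000000688,
    "IOM_PHY0_DDRPHY_FIR_REG": 0x0000000000000080,
}

for name, value in regs.items():
    print(f"{name}: bits {set_bits(value)}")
```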

The values of INIT_CAL_STATUS and RC_ERROR_STATUS say there was an overflow of
the refresh pending counter, but whether it is a cause or a result of the CCS
error is beyond my current knowledge.

This is what I've tried, without success:
- playing with the settings in PC_INIT_CAL_CONFIG1: halving and zeroing
   REFRESH_COUNT and changing REFRESH_CONTROL between non-reserved values;
   I haven't touched REFRESH_ALL_RANKS because I'm testing on just one 1R x8
   DIMM anyway, so it shouldn't make a difference
- manually sending REF commands before calibration, both instead of and in
   addition to those configured in PC_INIT_CAL_CONFIG1
- increasing the timeout for this step, both the initial delay and the
   duration of polling
- re-running DQS alignment after the error
- sending CCS_STOP and waiting for completion before starting a new program
- adding delays between calibration steps (500 us, much more than 9*tREFI)
- doing the initial pattern write and DQS calibration with one CCS instruction
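
For reference, the 9*tREFI comparison works out as in the snippet below. It
assumes the base DDR4 tREFI of 7.8 us (normal temperature range, 1x refresh
mode); the value actually programmed for this DIMM may differ, so treat the
numbers as a rough sanity check only.

```python
TREFI_US = 7.8   # assumed base DDR4 refresh interval, in microseconds
INTERVALS = 9    # the 9*tREFI figure mentioned in the list above
DELAY_US = 500   # delay added between calibration steps

window_us = INTERVALS * TREFI_US
print(f"9*tREFI = {window_us:.1f} us, delay = {DELAY_US} us")
print("delay exceeds 9*tREFI:", DELAY_US > window_us)
```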

What am I missing? Are there any other SCOM registers I can read that would
help with debugging?

[1] https://github.com/3mdeb/coreboot/tree/istep_13_11/src/soc/ibm/power9

-- 
Krystian Hebel
Firmware Engineer
https://3mdeb.com | @3mdeb_com


